Great Minds Think Together: Code42 Security Guild

How do we scale security in development where resources are scarce? The typical answer is via automation, but how do we automate people?

Having worked in security for a few years now, I often find myself with much more work than time. For instance, when I started as an analyst, I had about 10 application teams working with me to ensure security requirements and testing were completed. When I left that position, I had 40 teams requesting that I help them secure their application.

The problem: I am not scalable. 

This led me to wonder: how can we support individuals who want to be secure in a way that is scalable? One answer is collaboration. 

People can’t be automated, but we can deputize security across the organization. The right way for Code42 was to start by trusting developers to “Do It Right” while they “Get It Done”. We can allow developers the autonomy to learn how to code securely. We can also rely on them to run static code analysis tools and entrust them to include this within their deployment pipelines. If we give developers a safe place to address security concerns, we can achieve more together. 

As an added bonus, security no longer blocks anyone’s work. We can simply be available to developers when needed to review any findings or concerns. 

Because they have addressed the low-hanging fruit, developers are able to tackle the hard questions about security: where and when it can and should be applied.

Developers are the experts of their domains. They know how security modules will impact their application, and they know where that can happen in their implementations. Furthermore, have you ever looked for a security professional with extensive software development experience? It can be like trying to find a unicorn. If you’ve found one and happen to work alongside them, consider yourself lucky. 

At Code42, we have a Security Guild made up of volunteers from each development team along with members from the security department. These are the folks you want participating in the guild, as they have a general curiosity for how security affects the projects they’re working on, and can take their learnings back to their teams.

Each team’s representative becomes the security expert on their team, and the guild meets regularly to discuss security-related topics. Each representative brings questions and concerns from their development team to review as a team. 

The security team benefits, and the developers do too!

The security guild allows developers to grow their skill sets. The security team benefits from the visibility into the security concerns of their developers. Together, both identify and develop solutions as they collaboratively address their concerns.

A great example of this partnership is when we gear up for the annual Secure Code Review training. As a guild, we review the questions, add new ones, and customize them to topics relevant to us in the current moment. We also take time to have honest conversations about whether any questions are confusing or biased, and work toward better ones. This collaboration across departments ensures that we continue to strengthen our relationships and bring value to one another.

How do you scale your security engineers?

My take: I’ve been in this role for a few years now, and I’m still learning and growing. I think the most important thing is to be engaged first. Learn how developers receive and do their work. Choose courage and be the person who asks the “dumb” questions. Find balance and hold each other accountable for security. Lastly, discover ways to be collaborative within the organization, because together we win.

At Code42, we are lucky. Our company’s dedicated and engaged employees drove the success of the Security Guild. We’ve encouraged everyone to see security as part of their responsibilities and inspired our most security-curious employees to educate themselves and grow their skills. With great minds thinking together, security is achievable.  

Using Automation to Ease Security Scanning

Meeting your security requirements for authenticated scanning with a vulnerability scanner can be a challenge. Think about it: as a best practice, we are required to rotate credentials on a specified basis. When creating a static scan, we must populate those credentials and need to update them while ensuring they are in sync with the rotation schedule. Doing this by hand is difficult when you have dozens or hundreds of targets, so automating the process is key. AWS Secrets Manager and a Lambda function, combined with an API key, will allow you to set-and-forget this important, yet tedious process.


What you’ll need:

  • An AWS IAM role with the ability to view Secrets Manager secrets
  • Security tool credentials stored in Secrets Manager
  • A security tool API for updating credentials
  • An AWS Lambda function written in your favorite language

Optional (the suspense!):

  • Role with the ability to query AWS EC2

Once you’ve got these in place, the process is straightforward. Your Lambda pulls the credentials used to authenticate with your security tool and calls the relevant API to update the password. Assuming you have more than one scan credential to rotate, create a dictionary mapping each Secrets Manager secret to the name of the corresponding credential in your security tool, loop through that dictionary, and enjoy the time savings.
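The dictionary-and-loop pattern might look like the sketch below. The ARNs and helper callables are placeholders: in a real Lambda they would wrap boto3’s Secrets Manager client and your scan tool’s SDK.

```python
# Sketch of the dictionary-driven rotation loop (illustrative names only).

# Hypothetical mapping of Secrets Manager secret ARNs to the names/IDs of
# the corresponding credentials in your security tool
CREDENTIAL_MAP = {
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:scan-cred-a": "cred-a",
    "arn:aws:secretsmanager:us-east-1:123456789012:secret:scan-cred-b": "cred-b",
}

def rotate_all(credential_map, fetch_password, update_credential):
    """Push each rotated password from Secrets Manager to the scan tool.

    fetch_password:     callable(secret_arn) -> current password
                        (e.g. a wrapper around secretsmanager.get_secret_value)
    update_credential:  callable(credential_id, password) -> None
                        (e.g. a wrapper around your scan tool's credentials API)
    """
    rotated = []
    for secret_arn, cred_id in credential_map.items():
        update_credential(cred_id, fetch_password(secret_arn))
        rotated.append(cred_id)
    return rotated
```

Inside the Lambda handler you would call `rotate_all(CREDENTIAL_MAP, ...)` with thin wrappers around the real API clients.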

One additional step we take is updating our scan targets prior to scans. If you have frequent cloud releases, you increase the likelihood of omitting newly spun-up instances, leaving a blind spot. While we are updating rotated credentials in our security tool, we update our scan targets as well. This is where the “Optional” step from above comes into play: in your Lambda, query for running EC2 instances and, with the appropriate API, push those hostnames to your security tool.

With one simple Lambda you can rid yourself of tedious, manual, password and scan target updates. We made the switch and scans are no longer something I loathe, because I’ve automated the process. As Ron Popeil, R.I.P., so famously said, “Set it and forget it!”

Simple Python PW rotation example:

import boto3
from tenable.io import TenableIO
import json

# Boto3 Resources
secrets_config = boto3.client('secretsmanager')

# Get ScanTool credentials
# Authenticate to ScanTool prior to taking further actions
scantool_sm_arn = 'arn:aws:secretsmanager:${Region}:${Account}:secret:${SecretId}'
scantool_secrets = secrets_config.get_secret_value(SecretId=scantool_sm_arn)
scantool_secrets = json.loads(scantool_secrets['SecretString'])
scantool_key = scantool_secrets['scantool_key']
scantool_secret = scantool_secrets['scantool_secret']
tio = TenableIO(scantool_key, scantool_secret)

# Get Authenticated Scan Credentials to Rotate
auth_scan_params = 'arn:aws:secretsmanager:${Region}:${Account}:secret:${SecretId}'
# Authenticated Scan UUID in your Scan Tool, which we want to rotate
auth_scan_uuid = '${Auth_Scan_UUID}'
auth_scan_secrets = secrets_config.get_secret_value(SecretId=auth_scan_params)
auth_scan_secrets = json.loads(auth_scan_secrets['SecretString'])
auth_scan_rotate = auth_scan_secrets['password']
# Gather and print the description of the configured scan credential to be rotated
# This can be used to troubleshoot if something goes awry
scantool_details = tio.credentials.details(auth_scan_uuid)
print(scantool_details)

# Rotate PWs
# Update the managed credential with the newly rotated password; the exact
# field name(s) depend on the credential type configured in your scan tool
tio.credentials.edit(auth_scan_uuid, password=auth_scan_rotate)

Simple Python scan target example:

import boto3
from tenable.io import TenableIO
import json

ec2_scan_list = []
auth_params_arn = 'arn:aws:secretsmanager:${Region}:${Account}:secret:${SecretId}'

# Boto Resources
ec2 = boto3.resource('ec2')
secrets_config = boto3.client('secretsmanager')

# Get ScanTool credentials
# Authenticate to ScanTool prior to taking further actions
secrets = secrets_config.get_secret_value(SecretId=auth_params_arn)
secrets = json.loads(secrets['SecretString'])
scan_tool_key = secrets['key']
scan_tool_secret = secrets['secret']
tio = TenableIO(scan_tool_key, scan_tool_secret, url='https://${Scan_Tool_URL_Here}')

# Get information for all EC2 Instances
running_instances = ec2.instances.filter(Filters=[{
    'Name': 'instance-state-name',
    'Values': ['running']}])

# Get private_ip for each instance and add it to ec2_scan_list
for instance in running_instances:
    ec2_scan_list.append(instance.private_ip_address)

# Get the target_group_id
# Scan Target Group to be updated
for tg in tio.target_groups.list():
    if "${Resource_Scan_Group}" in tg['name']:
        target_group = tg['id']

# Update Scan Target Group with running EC2 list
tio.target_groups.edit(target_group, members=ec2_scan_list)

How do you tag at the resource and account level in AWS?

What is Tagging?

A tag is a label that you attach to an AWS resource to meet different requirements. It makes it easy to identify the owner, service, environment, cost center, data classification, and many other details. Each tag has two components, a Key and a Value, and both are case sensitive. The maximum number of tags allowed per resource is 50; you can find other tag restrictions here.

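To make the Key/Value structure concrete, here is a minimal sketch in Python; the tag names and values are illustrative, not prescriptive.

```python
MAX_TAGS_PER_RESOURCE = 50  # AWS limit at the time of writing

def to_aws_tags(tag_dict):
    """Convert a plain dict into the [{'Key': ..., 'Value': ...}] list AWS APIs expect."""
    if len(tag_dict) > MAX_TAGS_PER_RESOURCE:
        raise ValueError("AWS allows at most 50 tags per resource")
    return [{"Key": k, "Value": v} for k, v in tag_dict.items()]

tags = to_aws_tags({
    "Owner": "platform-team",          # who to contact
    "Environment": "production",       # dev / staging / production
    "CostCenter": "1234",              # billing allocation
    "DataClassification": "internal",  # sensitivity of the data handled
})
# These could then be applied with, for example:
#   boto3.client("ec2").create_tags(Resources=["i-0abc12345"], Tags=tags)
```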

What are the different assets which can be tagged in AWS?

At the time of writing this story, AWS allows tagging the following assets:

  • AWS resources (for example: EC2, RDS, EBS, subnet, VPC, and others)
  • The root organizational unit and its child OUs within AWS Organizations
  • AWS accounts within organizational units
  • Policies within AWS Organizations

How to address the tagging problem at the AWS account level?

An organization may have several hundred AWS accounts intended for specific purposes. Some are dedicated to development, so engineers and developers can work on the next big thing. Some are production environments. Staging environments are also separated and mimic production; when code is pushed, it goes to staging before production. Tracking and managing hundreds of accounts for ownership, data classification, cost center, tier, environment, service, and primary contact is very difficult, and it is not efficient to track them in an Excel file, Google Doc, or any other out-of-band method.

One method to solve this problem is to include tagging at the account level. This means you can attach tags to the AWS accounts within AWS Organizations, using either the AWS Organizations console or the API. AWS Organizations helps you centrally manage, consolidate, and govern all of your AWS accounts, allocate resources, apply policies, and simplify billing; its documentation includes instructions for creating tags at the organization level.

How to address the tagging problem at the AWS resource level?

You can tag existing AWS resources or tag them upon creation, but not all resources support tagging; you can find which EC2 resources support tagging here. There are different ways to handle missing tags at the resource level. The most popular method is to declare the tags within Terraform at the AWS provider level; you can read here how to propagate tags from the provider level down to all resources in a Terraform template. Cloud Custodian also provides several ways of solving the missing-tag problem for both existing and newly created AWS resources.

Among them, the auto-tag-user action is very powerful: it automatically tags resources upon creation when the owner tag is missing. This saves the analyst time in identifying all the resources that are missing a mandatory tag and, more importantly, identifies the individual who stood them up so corrective action can be taken. The auto-tag-user action is supported for both of the public cloud providers AWS and Azure. You can read the story here on using this guardrail for auto-remediation of missing tags.
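As a sketch of what that looks like in practice, an auto-tag-user policy for EC2 might resemble the following; the tag name and triggering event are illustrative, following the usual Cloud Custodian pattern rather than our exact production policy.

```yaml
policies:
  - name: ec2-auto-tag-owner
    resource: aws.ec2
    mode:
      type: cloudtrail
      events:
        - RunInstances        # tag instances as soon as they are launched
    filters:
      - "tag:Owner": absent   # only act when the mandatory tag is missing
    actions:
      - type: auto-tag-user
        tag: Owner            # record the IAM identity that created the resource
```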

Final Thoughts

Tagging plays a critical role in knowing and managing your cloud resources. Don’t make your tagging so complex that developers find it difficult to understand when to label what, and don’t introduce so many tags that the scheme becomes unmanageable and irrelevant; that path ends in hours spent trimming the tag count. Simply start with the basics and see what works best for your organization.

How do we contribute back to the Open Source Community?

We have all heard the term “Open Source”. It refers to code that is written by an individual, a group of people, or a community and made available to the public for free. The code is available for anyone to view and modify under an open-source license agreement. One such project that is very popular in the cloud governance community is Cloud Custodian, which originated at Capital One. In this blog we will discuss what Cloud Custodian does, its components, and how we use it and contribute back to the community.

What is Cloud Custodian?

Cloud Custodian is Python-based and has many scripts, tools, and capabilities all in one application. It is a rule engine where you can write policy definitions in YAML. This enables an organization to manage their public cloud resources by writing policies for cost savings, exploring asset tagging, compliance, security, operations-related concerns, and resource inventory. Cloud Custodian supports AWS, Azure, and GCP Cloud Providers.

Open Source | Python-based | Agentless | Serverless | Governance-as-Code | Real-Time Guard Rail | Visibility | Powerful Cloud Security Management Tool

The first step is to write a simple YAML DSL policy that defines the rules: the resource type, filters, mode, and actions. The command below deploys the Cloud Custodian policy as a Lambda function:

custodian run -s . policy.yml --assume arn:aws:iam::123456:role/c7n

Behind the scenes, Cloud Custodian automatically creates the CloudWatch Log Group and CloudWatch Event Rule. Within the policy, you can define the output directory where Cloud Custodian saves its output; in this case, an S3 bucket. From there it can be ingested into your SIEM solution, where you can write queries and build dashboards.

What does a Policy look like?

The example below shows what a Cloud Custodian policy looks like:

The policy contains a name, resource type, filters, mode, and actions. This policy applies to the resource type “aws.rds”. Under filters, we allow an exemption tag in case the business wants the RDS instance to be publicly accessible; otherwise, Cloud Custodian looks for the “PubliclyAccessible” attribute, and when it finds the value to be true, it flags the resource as non-compliant. The mode defines the schedule on which the policy runs; in this case, it triggers on CloudTrail events. The actions delete the RDS instance (skipping the final snapshot) and send an email notification using the default template in HTML format.
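Reconstructed from that description, the policy might look something like this sketch; the exemption tag name, notification recipient, and mailer queue are placeholders, not our production values.

```yaml
policies:
  - name: rds-publicly-accessible
    resource: aws.rds
    mode:
      type: cloudtrail
      events:
        - source: rds.amazonaws.com
          event: CreateDBInstance
          ids: requestParameters.dBInstanceIdentifier
    filters:
      - "tag:PublicExemption": absent   # hypothetical business exemption tag
      - type: value
        key: PubliclyAccessible
        value: true
    actions:
      - type: delete
        skip-snapshot: true
      - type: notify
        template: default                # default HTML email template
        to:
          - security@example.com         # placeholder recipient
        transport:
          type: sqs
          queue: https://sqs.us-east-1.amazonaws.com/123456789012/c7n-mailer
```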

How are we giving back to the community?

The Code42 team responsible for Cloud Custodian has been very active and engaged with the community: leveraging existing knowledge, innovating, and giving back. We share with the community by engaging in real-time chat on Gitter, submitting issues on GitHub, writing blogs, and sharing the policies we have created and had success with. You can find various Cloud Custodian blog posts on Medium created by the Code42 team. Below is a breakdown of the topics we have covered that you may find useful.

Stories at Medium:

  • Policies for AWS
  • Policies for Azure

AWS Dangling Resources, revisited

This topic, again? Yeah, egg on my face; I believe I ended the last blog with something like “constant pursuit of better.” The truth of the matter is, I deployed the code from the original post and, upon review of the results, it became clear that something was amiss. We were seeing results we knew were false positives, so we knew we needed to dig deeper.

The way we originally tackled dangling resources, via Route 53, was by looking for an entry and then attempting to locate an S3 bucket matching that entry’s alias. While this logic may work where a user tells S3 to host the contents of the bucket, it doesn’t necessarily hold true when the CloudFront service is properly utilized. I reached out to the team that handles this type of configuration to get a better understanding of how it’s properly configured and to make sure the change in logic would both eliminate the false positives and report actual dangling resources.

CloudFront allows you to name entries whatever you’d like; they don’t need to match an S3 bucket name. Instead, the “Origin domain” in the entry identifies the associated S3 bucket. So how do we build logic to make sure S3 buckets associated with CloudFront distributions still exist? Much like before, we iterate through CloudFront entries, gathering:

  • Origin domain
  • CloudFront ID
  • CloudFront domain

If the Origin domain contains “s3”, we drop the tail end of the domain and check whether an S3 bucket exists matching that name. If it does exist, awesome. If not, we report a dangling resource which needs to be addressed.

We still check Route 53 for entries pointed at CloudFront domains to validate that each distribution is configured properly. This becomes a bit more straightforward since we already have a dictionary containing all defined CloudFront entries. We iterate through Route 53 records looking for “cloudfront” in the DNS name, then compare the AliasTarget DNS name to our dictionary of CloudFront distributions. If the AliasTarget DNS name is not in the dictionary, we report the useful information as a dangling resource!
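In outline, the two checks reduce to the following stdlib-only sketch; in the deployed script the inputs come from boto3 calls to CloudFront, S3, and Route 53, and the field names here are simplified.

```python
def find_dangling_origins(distributions, existing_buckets):
    """Flag CloudFront entries whose S3 origin bucket no longer exists.

    distributions:    list of dicts with 'origin_domain', 'id', 'domain'
    existing_buckets: set of S3 bucket names in the account
    """
    dangling = []
    for dist in distributions:
        origin = dist["origin_domain"]
        if ".s3" in origin:
            # 'my-bucket.s3.us-east-1.amazonaws.com' -> 'my-bucket'
            bucket = origin.split(".s3")[0]
            if bucket not in existing_buckets:
                dangling.append(dist)
    return dangling

def find_dangling_aliases(records, cloudfront_domains):
    """Flag Route 53 alias records pointing at CloudFront domains we don't own."""
    dangling = []
    for record in records:
        target = record.get("AliasTarget", {}).get("DNSName", "").rstrip(".")
        if "cloudfront" in target and target not in cloudfront_domains:
            dangling.append(record)
    return dangling
```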

“Why do we fall, sir? So we can learn to pick ourselves up.”* It’s easy to get discouraged when things you’ve spent time architecting and building fail, but it’s important to learn from such failures. Rather than telling me my results were wrong, the teams I support helped me better understand the deployment architecture and data. Through that learning I was able to accurately identify misconfigurations and detect dangling resources. I hope you too can accept this blog update for what it is and see the positive side of failure.

* Alfred Pennyworth, Batman Begins (2005)

How We Made Threat Assessments Fun

At Code42, we move fast, but our security process, and the way we do threat assessments in particular, has had a tough job keeping pace with our development teams. Add the pandemic to this challenge and we had a hard time keeping our developers engaged in this critical process. This year, we took the opportunity to rethink how we do threat assessments by making the process virtual and in line with our current development environment.

At Code42, we have been playing Microsoft’s Elevation of Privilege (EoP) game. When it was first created, the game was pretty ideal as a threat assessment tool for application development. It allowed players to use their creativity and think through possible ways to attack their application using the “STRIDE” framework. STRIDE is a mnemonic for Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege, each a mechanism for attacking a service.

As a DevOps shop, we definitely find STRIDE has a lot of potential for threat assessment, but the EoP game as-is doesn’t address all areas of development at Code42.

DevOps has reshaped traditional roles and responsibilities in application development. The lines of who is responsible for what tend to blur between different teams and the cloud computing environments in use. When we used to play the game, some of the attack scenarios were pretty outdated and only focused on a small fraction of the space in which our teams are creating products and services. 

We needed the game to be relevant for the microservice architectures our developers use such as on-demand functions, containers, virtual machines, load-balancers, web stacks, authentication tokens, and the like. And still, we want the game to represent both the endpoint application development environment, and reflect cloud services, micro-services, and distributed computing environments as well.

Furthermore, we want to focus on the challenges of our highly distributed development environment where teams are focused on UI/UX, front-end interaction, back-end databases, metrics and telemetry, and infrastructure deployment. We want to ensure we capture a broad view across all development environments that are needed, and not just focus on how an endpoint application is secured. 

With the greater power that is DevOps comes great responsibility, and we wanted the game to challenge our developers to think more broadly about the systems and environments for which they are developing. This approach gives the developers and the security team more insight into our product so we can better defend against future attacks.

For those reasons, we found a new way to conduct threat assessments, with … another game!

To play, we start off with a review of the type of feature being developed, its infrastructure and architecture, and any additional components in play. 

One of the most crucial roles in the game is The Scribe (aka facilitator). This person guides the conversation, takes notes on any findings, drives the threat modeling conversation, and awards points as needed. The second most crucial role is that of the Subject Matter Expert (SME), who can be called upon to clarify an attack or validate a proposed remediation. The Scribe divides the attendees into two teams and ensures that an SME from the scrum team is on each team.

To get started, the facilitator calls on a player and offers a STRIDE category scenario. To score points, the player reads the listed attack and then provides an example of how an attacker could use that particular vector against the application. If the player cannot think of anything, the question opens up to the other players on their team. Additional points are awarded to the team that can come up with a viable attack, regardless of severity, and to those who can provide the mitigation technique (especially if it’s one we’re already using).

If the player requires help, they can ask for a hint about the application or “phone a friend” and call on the SME. For hints, we draw from previous experience in pentests or the latest CVEs that have been reported. To keep the game moving, each category gets five minutes for discussion and brainstorming, after which the opposing team has one minute to provide further attacks or mitigating techniques. Play continues until all of the letters in the STRIDE framework have been reviewed.

The team with the most points WINS, but really, everybody’s a winner when they play with the security team!

The goal of this game is to get developers to put their security hats on and think like an attacker. Given the nature of DevOps, we will continuously refine the threat modeling scenarios to make this a better learning experience for all involved. 

At the end of the day, we believe playing games will allow our developers to stay engaged with the security team by seeing attack vectors from the perspective of an attacker and at the same time, have fun in the process.

AWS Dangling Resources

Do you know what lurks behind your unattached resources that you thought had been destroyed? Not only are there potential cost implications (you keep paying for resources you no longer use), there are also risks that threat actors may try to exploit these resources. In our cloud journey, we found this to be a common problem we had to deal with. Here is how we resolved dangling resources at Code42.

We utilize Terraform to deploy resources to AWS, and by that same method, Terraform should tear down those resources when they are no longer needed. At the end of last year we found that, despite deploying resources as code, we occasionally had bug bounty researchers notifying us of resources upon which they could squat. 

One easy example of how you could get into this situation: 

  • Create an S3 bucket
  • Host the contents of that bucket with CloudFront
  • Delete your bucket but not the CloudFront entry 

The way AWS handles S3 bucket hosting via CloudFront allows for someone to create an S3 bucket in their account with the same name as one you were hosting, thus potentially hosting a malicious site posing as your company. Their S3 bucket will get attached to your CloudFront entry and the malicious hacker could claim traffic going to your domain. AWS doesn’t appear to have an offering to look for and notify account owners of these dangling resources (interestingly enough, Azure does notify when dangling resources are found within an account). Without an AWS service to handle this we decided to build an in-house solution.

So what are we looking for? As mentioned above, we focused on CloudFront entries which don’t have an associated S3 bucket, and Elastic IPs which aren’t associated with an EC2 instance. I built a pretty simple script in Python, utilizing the AWS SDK Boto3. Since EIPs and CloudFront entries are configured via Route 53, we start by listing all zones configured for the account and pull all records within each zone. We are interested in any records which point at CloudFront, since those will be tied to an S3 bucket, and in ‘A’ records, which will contain EIPs.

For records with CloudFront, we compare the name in the record to a list of all S3 buckets within the account. If there isn’t an S3 bucket containing that name, we’ve found a dangling resource, and we report it. Similarly with Elastic IPs, we call describe_instances and provide the EIP we found in Route 53; if no EC2 instance exists with that IP, we’ve found a dangling resource. Again, report it. It is important to note that you can have ‘A’ records pointing to internal resources, which won’t have an associated EC2 instance. I’ve added logic to filter those internal resources out based on matching an IP string.
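The EIP half of the check can be sketched like this (stdlib-only; in practice instance_ips would come from describe_instances, and the internal-prefix filter is a simplified stand-in for the IP-string matching mentioned above):

```python
# Simplified internal ranges; real RFC 1918 matching would use the ipaddress module
INTERNAL_PREFIXES = ("10.", "192.168.")

def find_dangling_eips(a_records, instance_ips):
    """Flag Route 53 'A' records whose Elastic IP has no running EC2 instance.

    a_records:    list of (record_name, ip) tuples pulled from Route 53
    instance_ips: set of public IPs belonging to running instances
    """
    dangling = []
    for name, ip in a_records:
        if ip.startswith(INTERNAL_PREFIXES):
            continue  # internal resources won't have an associated EC2 instance
        if ip not in instance_ips:
            dangling.append((name, ip))
    return dangling
```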

The linked code is a modified version of what we have deployed; this is a small section of a much larger Python script used for info gathering. This code is meant to be run locally, when authenticated to an account, or from within an AWS account. We have our script deployed as an AWS Lambda. Amazon EventBridge handles the scheduling, running every 24 hours, with the results being output to Amazon S3. We have configured a SumoLogic S3 collector which allows us to review and alert on findings.

Deploying infrastructure as code allows for consistent deployments and easy validation that things are correct. While this should add reassurance that resources are torn down appropriately when no longer needed, manual deletions can let resources slip through the cracks. It’s important to verify you’re not leaving resources out there for opportunists to take advantage of. Even if you think your policies and procedures should prevent dangling resources from occurring, review the script and scan your environments.

Associated code:

Security: be Agile, be effective

In a blog post I wrote not too long ago, I dove into the reasons why security teams need to learn how to code. In short, as the lines between security practitioner and developer continue to blur due to the rise of infrastructure-as-code, automation, and security tools becoming more developer-friendly, it is very hard to be a top-performing security team without coding. Today, I’ll take this topic a bit further and talk about why security teams need to use Agile as well.

What is Agile? There are many different interpretations of Agile and countless books, videos, and blogs on the subject. Since I don’t come from a formal development background, I’ve never done software development following the Agile methodology, but inexperience doesn’t need to be a barrier to using it. The way our security team uses Agile at Code42 encapsulates several key concepts:

  • Iterative development using small chunks of work effort
  • Ongoing and timely feedback from stakeholders
  • A willingness to experiment that focuses on outcomes rather than jumping to solutions

One of the core tenets of Agile is to iterate and break work up into small pieces in order to rapidly develop, test and deploy. In this regard, it is the opposite of the Waterfall method, with its structured project plans and multi-month timelines. Agile doesn’t mean that complex projects or features can’t be implemented, but that each iteration in the journey to the end state should be able to stand on its own.

It is fairly common in security to see a lot of major projects that have a waterfall-like mentality. Think of implementing a SIEM tool, or maybe rolling out a new endpoint security tool across the organization. Those kinds of projects typically rely on everything coming together perfectly over the entire timeline, and when things inevitably go wrong, the result is slipped deadlines, finger pointing, and angry stakeholders.

With Agile, you can still get to the end, but the path is much more flexible and reactive — it is called “Agile” after all! Instead of rigidly defining the exact order of steps in a project and creating a lot of dependencies between large work efforts, which are inherently fragile, Agile’s goal is to break it up. That massive SIEM project timeline becomes smaller, more independent work efforts like stand up a collector, send one kind of log and validate it, and so forth. That way, if the project priorities or needs change, there isn’t a massive amount of planning that needs to be redone.

That brings me to my next concept: Ongoing, rapid feedback from stakeholders. One of the promises of Agile is that it can prevent the divergence between what stakeholders want and what practitioners deliver, which often happens when large projects run into problems. Flexibility is a must in today’s rapidly-changing world. Take our SIEM project example: what if a SIEM project that started in February had a timeline for onboarding VPN logs in June with a lot of dependencies beforehand? The realities on the ground, and the need to reprioritize that VPN work, would have resulted in a lot of rework and wasted effort put into that original timeline.

With Agile, the goal is to finish your work quickly, then sit back down with your stakeholders and ask questions like “Did this deliver? What is your next priority? What has changed since we last spoke?” This ensures that security teams are delivering real value to stakeholders, instead of going down rabbit holes for months at a time and ending up with a solution that doesn’t improve the security posture of the organization.

The last concept is probably the hardest for security teams to be comfortable with: having a willingness to experiment and fail, while focusing on the outcome instead of the solution. Too often security teams jump to the solution (SIEM, EDR, firewall) before identifying the desired outcome (alerting on events of interest, identifying and acting on C2 traffic). This narrow focus can result in poor resource allocation, or even worse a fundamental misunderstanding of the problem that results in a tool that doesn’t solve it.

There are many, many ways to solve security problems, and experimenting with alternate solutions or questioning the status quo may result in less complex and more efficient ways of achieving goals. These simpler solutions may not have the cachet of a major project, but if your desired outcome is to alert on certain events from a security tool, a one-week effort at leveraging a script, an API, and a cron job may be much more effective than a long SIEM implementation.
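As a hedged illustration of that script-plus-API-plus-cron idea, the sketch below polls a hypothetical security tool’s events endpoint and forwards high-severity events to a chat webhook; the URLs, field names, and severity threshold are all made up for the example.

```python
import json
import urllib.request

EVENTS_URL = "https://scantool.example.com/api/events"  # hypothetical endpoint
WEBHOOK_URL = "https://chat.example.com/hooks/secops"   # hypothetical webhook

def events_of_interest(events, min_severity=7):
    """Keep only events at or above the severity threshold."""
    return [e for e in events if e.get("severity", 0) >= min_severity]

def run_once():
    """Fetch recent events and alert on the interesting ones."""
    with urllib.request.urlopen(EVENTS_URL) as resp:
        events = json.load(resp)
    for event in events_of_interest(events):
        body = json.dumps({"text": f"Security event: {event['title']}"}).encode()
        req = urllib.request.Request(
            WEBHOOK_URL, data=body, headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)

# Scheduled from cron rather than a SIEM pipeline, e.g.:
#   */15 * * * * /usr/bin/python3 /opt/secops/poll_alerts.py
```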

How does a security team start to leverage Agile? If you work in an environment where you have Agile development teams, you are in luck: you can reach out to them and start learning the tricks of the trade. Even if you don’t, however, you can become more Agile by breaking your work efforts into small pieces, setting up ongoing feedback both internally and externally, and focusing on the outcome rather than jumping to the solution. Do all of this and you’ll find that you are able to deliver more security value to your organization.