Agent Health on Enterprise Endpoints

Security Ops

An important function of IT and security teams is managing and securing enterprise endpoints. This is a fairly simple task when there are only a few endpoints to oversee. What if there are 100 or even 1,000 endpoints? What if the endpoints being managed vary in their daily use and function? It’s not feasible to manage and secure a large number of endpoints without some sort of tooling that sits on the endpoint itself.

Fortunately, security analysts and IT administrators can use a variety of advanced tools to make their jobs easier. These tools are helpful, but they can also provide a false sense of security. Many of them work by deploying an agent to each endpoint that sends status updates to a central server for viewing. Sometimes an agent malfunctions and is said to be in an unhealthy state. Sometimes a network error occurs and the agent cannot forward data to the central server. Sometimes programs conflict with each other and cause unexpected behavior. Whatever the cause, the administrators can no longer see the true state of the endpoint. Worse still, they may not even be aware that the agent has stopped reporting data.

Imagine that you, as an administrator, have deployed an antivirus (AV) agent to every endpoint in your environment. The AV agents diligently detect and remove any malware found on the endpoints in real time. The agents also send status updates to a central server for viewing. Life is good. Imagine now that an attacker has compromised one of the endpoints and has managed to disable the AV agent. This opens new avenues of attack for the adversary, since they no longer have to worry about being caught by antivirus. Because the agent is disabled, it is no longer sending status updates to the central server, and the security analyst is unaware that there is a problem! The analyst may get lucky and notice that 1 out of 1,000 agents hasn’t reported data in a while, but this is a reactive approach.

At Code42, we use several tools that rely on an agent running on every employee endpoint. We needed a way to combat this issue proactively, as opposed to fixing unhealthy agents only after discovering them by happenstance.

Luckily, we have an ambitious team at Code42, and a solution was quickly developed. Mike Mulcahy, Senior Security Engineer, created a Python script that queries our agent-based tools and then aggregates several data points into a CSV file for inspection. To clarify, the script queries each central service to which the agents report data; it does not query the agents themselves. Viewing or parsing this CSV file allows for quick and easy identification of unhealthy agents, which endpoints those agents are associated with, and how long the agents have been unhealthy.

*Figure 1: Example CSV File Displaying Agent IDs and Last Seen Times*
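
To make the idea concrete, here is a minimal sketch of the aggregation step. It is not the actual Code42 script: the record layout, the function names, and the 24-hour threshold are all assumptions chosen for illustration, and the sample records stand in for whatever each central service would really return from its API.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Assumed threshold: an agent silent for longer than this counts as unhealthy.
STALE_AFTER = timedelta(hours=24)

# Assumed CSV columns; the real report may contain different data points.
FIELDS = ["tool", "agent_id", "hostname", "last_seen", "hours_silent"]


def find_stale_agents(records, now, stale_after=STALE_AFTER):
    """Return one row per agent whose last check-in is older than stale_after.

    Each record is a dict with 'tool', 'agent_id', 'hostname', and 'last_seen'
    (a timezone-aware datetime), in a hypothetical shape that a central
    service might report.
    """
    stale = []
    for rec in records:
        silent_for = now - rec["last_seen"]
        if silent_for > stale_after:
            stale.append({
                "tool": rec["tool"],
                "agent_id": rec["agent_id"],
                "hostname": rec["hostname"],
                "last_seen": rec["last_seen"].isoformat(),
                "hours_silent": round(silent_for.total_seconds() / 3600, 1),
            })
    # Longest-silent agents first, so the worst cases top the report.
    stale.sort(key=lambda row: row["hours_silent"], reverse=True)
    return stale


def to_csv(rows):
    """Render the stale-agent rows as CSV text, header included."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    # Made-up sample data standing in for responses from the central services.
    sample = [
        {"tool": "antivirus", "agent_id": "a-001", "hostname": "laptop-01",
         "last_seen": now - timedelta(minutes=5)},
        {"tool": "antivirus", "agent_id": "a-002", "hostname": "laptop-02",
         "last_seen": now - timedelta(days=3)},
        {"tool": "backup", "agent_id": "b-017", "hostname": "laptop-02",
         "last_seen": now - timedelta(hours=30)},
    ]
    print(to_csv(find_stale_agents(sample, now)), end="")
```

Working from each service's last-seen timestamp, rather than from the agent itself, means a disabled or unreachable agent still shows up in the report: its silence is the signal.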

Although this is a new and simple solution, it is scripted, so further functionality such as automation and alerting can be added! It’s a bit odd to use a tool simply to “watch over” other tools, but it is a very useful solution that provides an understanding of the true state of endpoint agents.