The promise of automation is doing more with less: freeing people from repetitive tasks so they can focus on more interesting work. This makes for a great tagline but can fall short in implementation.
Automation doesn’t have to involve complicated machine learning or deep learning; it could be a simple script. But automation is far from a panacea, and it can create hard-to-rectify issues. In this post, I’ll provide some perspective and a quick, zero-friction gut check on the impacts of automation.
Issues from automation aren’t theoretical: they’ve happened to me, and they’ve happened to you. Before we dive in, let me describe two recent events.
A mobile provider charged me for a phone I didn’t have. Since I had autopay, the charge immediately hit my credit card. I called them countless times. At first, everyone I talked to treated me with skepticism; after all, how could my situation happen? I was transferred around to multiple departments while I could almost hear people’s eyes rolling. Eventually, everyone I talked to in every department saw the issue and was very sorry. But once it was identified as an issue, nobody from any department (including billing) could fix it. This dragged on for weeks. I was in a weird limbo, an outlier. I’d been a customer for 20 years, and it literally took them taking money from me to make me leave.
In another instance, I made an order from an online retailer. My package went out for delivery in my neighborhood and then back to the regional distribution center every day for two weeks before I received a notification that my package was being returned to the sender for having an “Insufficient Address.” Four days later, I received the package with a notification that it had been delivered to the original sender (I was the recipient). When I looked at the label, my address was clearly printed and visible, but above my name there was a single question mark. I used the same ordering system everyone else used, but somehow, without my interaction, something got messed up.
Automation is inevitable, and I’m not suggesting we don’t automate, but we need to understand the negative impacts and implement mitigations. Two things about the future are certain: error rates will increase, and it will become harder to correct errors. Automation removes people from a process, but if you remove the people entirely, small, unanticipated issues can become big ones fast.
An algorithm or process that is 99% accurate sounds great, but consider this: with a million instances, that 1% error rate means 10,000 errors. That is not an insignificant number. If each instance is a person, potentially 10,000 people are affected by an issue.
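The arithmetic is simple but worth making explicit. A minimal sketch (the volumes and accuracy figures are illustrative):

```python
# Rough impact of accuracy at scale: even small error rates affect
# many people when the volume is large.

def affected(instances: int, accuracy: float) -> int:
    """Return the expected number of erroneous decisions."""
    return round(instances * (1 - accuracy))

print(affected(1_000_000, 0.99))   # 10000
print(affected(1_000_000, 0.999))  # 1000
```

Even an order-of-magnitude accuracy improvement still leaves a thousand affected people at this volume, which is why the resolution process matters as much as the error rate.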
Companies are often quick to implement automation, but they don’t consider the adverse effects or how they’d handle them. The negative impact on humans is often thought of only as job loss, and any technical issues that crop up are considered addressable with a future tweak.
A human system without automation may well have a higher error rate, but there is a human in the loop. People are more likely to believe that a human made a mistake than that an automated system did, and because of this, human systems can have more robust resolution processes.
Our confidence in technology will lead to a lack of trust in other humans. The system can’t be wrong, so the human must be wrong. This isn’t a perspective we should encourage.
Automation Impact Audit
We have to realize that technology is not infallible, and mistakes will happen. Given this fallibility, you need to have an appropriate mechanism for people to correct errors and inaccuracies.
To start with, you can perform a simple process that I call an automation impact audit. This audit will help you understand the process being automated and identify potential issues. It looks at a few fundamental elements: Process, Inputs, Impact, Detection, Rollback, and Resolution.
Evaluate the process you are automating. What components make up the system today, and how will it look after implementing automation? What percentage of the process will be automated? This can range from automating small tasks of a larger human process to the complete removal of humans from the loop. Although any inaccuracies can be bad, in a process that has humans completely removed, they can be harder to detect and resolve.
How complex is the process you are trying to automate? Higher complexity systems can lead to a higher number of unexpected issues. You should also evaluate the data you are using and implement data quality standards. Poor quality data can lead to poor decisions, and sometimes you don’t realize it until after the system has launched.
What is the impact of a wrong or inaccurate decision? Will people or businesses incur harm, or would it be relatively inconsequential? Understanding the answer to this question is one of the most important aspects of the audit. The higher the impact of inaccuracy, the more controls you need to put in place. Something that may appear as a minor issue or cost may be a point of frustration for a customer, causing them to discontinue using your service.
Do you have a way of detecting issues that could result from your implemented automation? If not, think of ways or areas that you can measure and determine if a problem arises. You should also implement this detection periodically so that you can look for issues over time. This can be an indicator that something is changing in your data or process that needs to be adjusted. Customers may not report issues and just choose to discontinue using your business.
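The periodic detection described above can be as simple as comparing each period’s error rate against a baseline. A minimal sketch, assuming weekly error-rate measurements; the numbers and tolerance are illustrative:

```python
# A minimal drift check: flag periods whose error rate exceeds the
# baseline by a chosen tolerance. Thresholds here are illustrative
# assumptions, not recommendations.

def flag_periods(rates, baseline, tolerance=0.5):
    """Return indices of periods whose error rate exceeds
    baseline * (1 + tolerance)."""
    limit = baseline * (1 + tolerance)
    return [i for i, r in enumerate(rates) if r > limit]

weekly_error_rates = [0.010, 0.011, 0.012, 0.022, 0.025]
print(flag_periods(weekly_error_rates, baseline=0.010))  # [3, 4]
```

A jump like the one in weeks 4 and 5 is exactly the kind of silent signal customers won’t report; they’ll just leave.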
How do you get out of automation? Do you have the ability to go back to a previous version that worked better or had fewer issues? If you have reassigned people who previously performed the tasks automation now handles, you may not be able to go back to your previous state. This is why implementing automation in phases makes the overall process more robust.
Is there a way for impacted parties to correct issues and inaccuracies? Just having a resolution process isn’t enough. That process needs to be clearly communicated so that people know what to do when an issue arises.
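One way to keep the audit honest is to record the answers for each element in a structured form, so unanswered questions are visible. A lightweight sketch, where the field names mirror the elements above and the example entries are hypothetical:

```python
# Record an automation impact audit so the answers are explicit
# rather than tribal knowledge. Entries below are hypothetical.
from dataclasses import dataclass

@dataclass
class ImpactAudit:
    process: str        # what is automated, and how much of it
    inputs: str         # data sources and their quality controls
    impact: str         # consequence of a wrong decision
    detection: str      # how issues will be noticed over time
    rollback: str       # how to return to the previous state
    resolution: str     # how affected parties get errors corrected

    def gaps(self):
        """Return the elements that have no answer yet."""
        return [k for k, v in vars(self).items() if not v.strip()]

audit = ImpactAudit(
    process="Autopay billing, fully automated",
    inputs="Billing feed; no quality checks yet",
    impact="Customers charged incorrectly",
    detection="",   # not yet decided
    rollback="",    # not yet decided
    resolution="Support ticket with billing escalation",
)
print(audit.gaps())  # ['detection', 'rollback']
```

An audit with open gaps in Detection or Rollback is a signal to delay launch, not a formality to file away.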
Human-level performance doesn’t equal human-level resolution.
When implementing automation, you need robust resolution processes that allow issues to be corrected properly when they present themselves. Preparation and care are required before implementing automation, including consideration of the impacts as well as clearly communicated resolution processes. Automation thrown at a problem for the sake of automation isn’t a winning strategy.
We are hosting a series of events around the U.S. called Integrated Technology Summits where we talk about the real-world struggles facing security leaders and how leveraging integrated security technologies yields practical solutions. We typically feature two or three of our technology vendors and discuss how these can be made to work better together, creating operational efficiencies and security effectiveness – saving time and money, while also helping to reduce risk.
The cohesive message we tell with our vendors is made possible using API and other integrations, which then allow us to do some really cool and exciting things with automation and orchestration. But before we get to that, it is important that we first understand why we want to integrate these solutions. After all, if we do not have a real need and are not solving a real problem for our organization, then this is merely an interesting exercise that may or may not actually help us reduce risk in the enterprise.
Many clients that we talk to have a visibility problem – put simply, they don’t know what they don’t know about what is happening in their environment. Many organizations have some level of visibility at the traditional perimeter thanks to firewalls or IDS, but often lack an appropriate level of fidelity at other critical junctures – at the endpoints, within the network perimeter, or into the cloud, just to name a few. In a world where the traditional network perimeter is eroding, these vantage points provide a wealth of information into assets, risks, and exposure.
Sometimes these blind spots are the result of a technology gap – the organization does not have the appropriate tools implemented to provide the level of granularity desired. In other cases, capable tools do exist in the environment, but the telemetry data that they are (or could be) collecting is not being shared with other solutions in the environment. These other solutions, such as a configuration management database or threat intelligence platform, are inadvertently rendered less effective by virtue of having less information – knowledge is power.
Many organizations we talk to are not entirely comfortable with the idea of fully, or even partially, orchestrating security activities. They fear that something may go wrong without a human in the loop to provide a sanity check against automated actions, which could potentially disrupt business operations. Integrating technologies in the environment for the purposes of data sharing is a good way for organizations to begin exploring automation and orchestration. A sharing-only approach provides an initial value discussed above, lays the technical groundwork for additional, automated capabilities in the future, and provides an opportunity for organizations to self-evaluate whether their maturity and culture will allow more robust usage of these capabilities.
Sharing is Caring
At a recent Integrated Technology Summit in Dallas, Kudelski Security featured three of our technology partners to discuss automation and orchestration – McAfee, Aruba Networks and Illusive Networks. At first glance, these three vendors may seem to have little in common. But as we discussed with the security professionals in attendance, each has a vantage point (or perhaps multiple vantage points) and collects relevant information about the IT environment that could enhance the capability of the other tools, if only shared.
Fortunately, most vendors in today’s market embrace the heterogeneous, best-of-breed ecosystem that defines many enterprises today. But, as we also discussed with the audience, platform-based security vendors leverage automation and orchestration capabilities inherently, and also expose them for customers to use outside the platform. For example, although security tools can be integrated point-to-point, leveraging a common communications layer such as McAfee’s OpenDXL abstracts the exchange of information from the underlying application architecture and reduces the integration and maintenance complexity of McAfee and numerous third-party tools.
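The advantage of a common layer over point-to-point wiring can be illustrated with a toy publish/subscribe fabric. This is not the OpenDXL API, just a minimal sketch of the pattern it embodies: producers and consumers only know topics, never each other:

```python
# A toy message fabric: tools publish to and subscribe from topics,
# so adding a new consumer never requires changing a producer.
from collections import defaultdict

class Fabric:
    def __init__(self):
        self._subs = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)

    def publish(self, topic, payload):
        for handler in self._subs[topic]:
            handler(payload)

fabric = Fabric()
received = []

# A threat-intel platform subscribes to endpoint telemetry without
# knowing which tool produces it.
fabric.subscribe("telemetry/endpoint", received.append)

# A wireless controller publishes a connection event.
fabric.publish("telemetry/endpoint", {"mac": "aa:bb:cc", "ap": "lobby-1"})
print(received)  # the subscriber saw the event
```

With n tools, point-to-point integration means up to n·(n−1)/2 bespoke connections; a shared bus reduces that to n connections to the fabric.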
With the communication groundwork laid, organizations can begin the process of enabling data exchange. Aruba wireless access points can share telemetry data on connecting endpoints, including device information, location, and time. Illusive Networks can share high-fidelity alerts on advanced persistent threats and zero-day exploits it detects when a deployed deception is triggered. McAfee Advanced Threat Defense can share threat intelligence information from indicators of compromise (IoCs) identified through code analysis or malware sandboxing.
Beyond these use cases, the possibilities for sharing contextual data are numerous, especially as organizations integrate additional tools, such as directory services and other security analytics platforms already deployed in the environment. These integrations can begin to address blind spots and establish a path forward for additional integrations and orchestration opportunities, making sure security tools can (you guessed it) work better together.
Be on the lookout for an Integrated Technology Summit soon in a city near you. Our next event will be later this fall in Austin, Texas. For a full list of Kudelski Security events, click here.
In this final part of the Security Automation series, Kudelski Security takes a look at what our clients are doing to take their manual playbooks to the next level using automation. Before we look at playbooks, here is a quick review of the key factors from our previous articles that help ensure automation success.
Keys to Successful Automation
- Understanding your organization’s problem that you are trying to solve
- Understanding your organization’s environment as it exists currently
- Understanding the maturity of your program and mapping it to a cybersecurity framework
- Identifying the business risk areas that automation can solve
- Identifying the common business issues across teams in the organization
- Designing and documenting manual processes to allow for future automation
With a firm grasp on these automation keys, let’s take a look at a common use case that is ripe for automation in most environments: firewall requests.
With most organizations protecting their network perimeter with a firewall (or ten), a question we often get asked is how to make that process more efficient, even without automation, but preferably with it. Let’s take a look at a fairly common workflow for generating firewall changes.
One of the first things we review is how the teams interact with one another in any workflow. In this example, each team is shown in a different color because they interface with each other using different mechanisms, whether by email, chat, or separate IT Service Management (ITSM) tools (e.g. ServiceNow).

While everyone in the technology world has been looking for the holy grail of a single pane of glass for all tools, for processes like these and most other IT operations processes it is entirely realistic to have all parties use the same means for communicating and documenting the tasks in a workflow. Using the same tool or mechanism for communication allows you to create a separate channel or project for each workflow, with each channel or project including only the team members needed for that workflow. For example, most network operations teams use an ITSM and have their own project for handling day-to-day activities, and the same goes for the firewall team, application team, and so on. By leveraging that same ITSM and creating a new project that includes all teams required for the workflow, there is a central location for all communication and interaction, for both a manual and an automated process.
Once a centralized location or mechanism for handling these requests has been agreed upon, it really facilitates open discussions and interactions on what the actual process should look like, where changes can be made, and where parts of the process are being duplicated. We have seen the most success when the teams get together, in the same room (crazy, right?), and iron out the process to where all parties are satisfied and unified. The manual process becomes a much cleaner, simpler process that looks like the figure below.
From this cleaner process, we can apply automation to handle the interactions between the central location or mechanism for communication (i.e. ITSM, email) and the relevant technology to handle the firewall changes via automation platforms or custom API gateways. The final playbook comes out like the figure below.
From this playbook, we can see how everything comes together. The interactions between the ITSM and Firewall Management System are handled via web-hooks or API calls either after the push of a button from the ITSM workflow or via the custom-built automated workflow in the Firewall Management System (e.g. FireMon). This allows for maintaining a human interaction in the loop but drastically reduces the required work for that individual. That individual or team is also only using one system for this interaction, with all data flowing through the ITSM for process handling, thus reducing the number of systems required for access, and creating a much easier audit trail for long-term auditability and accountability.
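The hand-off between the ITSM and the firewall management system described above can be sketched as a small translation step behind the web-hook. The ticket fields, payload shape, and identifiers here are hypothetical; real systems (e.g. ServiceNow, FireMon) define their own schemas:

```python
# Sketch of the web-hook step: an approved ITSM ticket is translated
# into a firewall change payload. Field names are hypothetical.

def build_change_request(ticket: dict) -> dict:
    """Translate an approved ITSM ticket into a firewall change payload,
    keeping the human approval gate and the audit trail intact."""
    if ticket.get("state") != "approved":
        raise ValueError("only approved tickets may generate changes")
    return {
        "source": ticket["src_ip"],
        "destination": ticket["dst_ip"],
        "port": ticket["port"],
        "action": "allow",
        "audit_ref": ticket["ticket_id"],  # ties the change back to the ITSM record
    }

ticket = {
    "ticket_id": "CHG-1234",
    "state": "approved",
    "src_ip": "10.0.1.5",
    "dst_ip": "10.0.2.9",
    "port": 443,
}
print(build_change_request(ticket)["audit_ref"])  # CHG-1234
```

Rejecting anything that is not in an approved state is what preserves the human in the loop; the `audit_ref` field is what preserves long-term auditability.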
Reviewing Keys to Automation Success
Let’s take a look at a few of the key areas for successful automation:
- Understanding your organization’s problem that you are trying to solve
- In our example, the organization’s problem was a decentralized firewall request process that led to many interactions with multiple systems to accomplish the same task.
- Understanding your organization’s environment as it exists currently
- In our example, the organization’s environment was a typical enterprise, with teams working in different segments. These teams must interact with one another to accomplish certain tasks but did not see the inefficiencies in the overall process. Once each team’s process and responsibilities were understood, it became clear that there were many areas they could consolidate.
- Identifying the common business issues across teams in the organization
- In our example, the common business issue was that each team had its own process for handling its portion of the same task. With each team creating and managing its own ITSM workflow and incidents, unable to interact with other teams, many tasks were repeated, and there was no centralized location for communication or data retention.
- Designing and documenting manual processes to allow for future automation
- In our example, the creation of the visual workflow for the legacy process was a key driver in getting each team together to spend quality time redesigning the process. The process was disjointed and repetitive for no real reason, and a simpler, more consolidated process was quickly designed. By consolidating the process and making each team a key stakeholder in it, the benefits of automation could be realized by augmenting what the teams already handle on a daily basis.
Final Thoughts and Questions
Many companies are beginning to understand that automation will not soon replace human interaction in IT, but many processes can be improved by it. Automation should not be used to fix a broken process, but to give information to the people who can fix the problem in a faster, more repeatable, and less expensive fashion. This frees your superstar employees from menial tasks so they can continue to be superstars, and allows less experienced employees to get all the information they need without needlessly searching for the correct resource to engage for assistance. From a CISO’s perspective, it comes down to two questions: what problems am I having, and can automation be leveraged to maximize the return on investment? At Kudelski Security, we are happy to help you understand what issues exist and where automation, or simply more efficient processes, can help; just let us know.
In our previous Security Automation series post, we identified areas that should be reviewed to allow for the most success with automation. Those areas included identifying the problems, dealing with the environment, and looking for frameworks that can apply a solid foundation for the security program and its automation success. In this post, we will look at how to apply those ideas to start building a security program that is designed for automation to have a key role in the program’s success.
Identifying and Determining Risk
After reviewing the problems your organization is facing and observing them in your environment, the next step is to identify areas of risk and quantify them to determine the risk level for each set of problems. How do you quantify these risk levels?
- Data gathered from monitoring tools
- Data gathered from business owners
- Internal or external scans
Data gathered from monitoring tools
Monitoring tools such as SIEMs or log aggregators provide a centralized location for events happening in your environment. These tools provide vital information about the systems in your environment and serve as the launch point for many processes and tasks; with that vast amount of information in one place, automation has the information it needs readily available and accessible. Simple items such as hostnames, users/groups, IP addresses, and application names are things security analysts spend a lot of time gathering for each incident, and they need the same information for every event. The right monitoring tools, combined with automation, can serve that information to analysts automatically for every new event.
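The enrichment step described above can be sketched in a few lines: attach the context an analyst would otherwise look up by hand to every new event. The lookup table and field names are hypothetical stand-ins for real asset and directory data:

```python
# Sketch of automatic event enrichment: merge asset context into each
# incoming event. ASSETS is a hypothetical stand-in for a CMDB or
# directory lookup.

ASSETS = {"10.0.1.5": {"hostname": "hr-laptop-07", "owner": "jdoe"}}

def enrich(event: dict) -> dict:
    """Return the event with any known asset context merged in."""
    context = ASSETS.get(event.get("src_ip"), {})
    return {**event, **context}

event = {"src_ip": "10.0.1.5", "alert": "suspicious login"}
print(enrich(event)["hostname"])  # hr-laptop-07
```

In practice the lookup would query a CMDB, directory service, or the SIEM itself, but the shape of the automation is the same: context arrives with the event instead of being hunted down afterward.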
Data gathered from business owners
Getting information from business owners, managers, or individual teams about how much risk is associated with each product, application, or service is crucial to the overall security picture of your organization. This information is often not well represented, due to teams not communicating with each other, not gathering the information on a scheduled basis, or not having a central place to store or visualize it. This is where frameworks and services such as Kudelski Security’s Secure Blueprint help organizations determine the risk for each business owner and roll those risks up into the organization’s security posture.
Internal or external scans
Running continuous or scheduled scans of your environment for vulnerabilities, new or removed systems, and network changes gives you an understanding of what needs attention. These scans and tools paint a technical picture of the potential holes in your environment, allowing both the business owner and your security team to determine the risk associated with each system and which process to apply to remediate it. Automation can play a large role with these scans, from running them on a schedule, to moving high-risk systems into a quarantined zone, to remediating systems automatically based on their overall risk score.
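Tying scan results to automated actions by risk score can be sketched as a simple policy function. The thresholds and actions here are illustrative assumptions, not recommendations; real policies should come out of the risk quantification above:

```python
# Sketch of risk-driven remediation: map a scan's risk score to the
# next automated action. Thresholds are illustrative policy choices.

def next_action(risk_score: int) -> str:
    if risk_score >= 90:
        return "quarantine"      # move to an isolated network zone
    if risk_score >= 70:
        return "auto-remediate"  # e.g. trigger patch deployment
    return "report"              # surface to owner for review

print([next_action(s) for s in (95, 75, 40)])
# ['quarantine', 'auto-remediate', 'report']
```

Keeping the policy in one explicit function also makes it easy to audit and version-control as thresholds change.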
Identifying Common Business Issues
As areas of risk come into focus, examine them for commonality, identifying common challenges and platforms, even when they span multiple teams. In large organizations, one of the biggest challenges with both security and automation is that teams do not communicate with each other well, often using similar platforms and duplicating work. What common issues are occurring in each risk area?
- Fatigue from overall volume, leading to high mean time to response (MTTR)
- Lack of relevant information, leading to multiple teams responding to same issue
- Lack of documented processes
Fatigue from overall volume, leading to high mean time to response (MTTR)
A large majority of security and IT teams across all verticals deal with alert fatigue, whether from not having enough personnel to handle the events, not properly configuring their security tools, or not having a well-defined process for handling the events. These issues lead to teams not responding quickly enough, or in some cases not responding at all, with big events getting lost in the noise. It is important to recognize when teams are having difficulty responding to events in a timely manner, especially when multiple teams collaborate with each other.
Lack of relevant information, leading to multiple teams responding to same issue
Many security teams spend a large portion of their time searching for the relevant information needed to respond appropriately to an incident. Often, multiple teams receive the same event and work on it in parallel, not knowing what the other teams are doing. Building a security environment with centralized reporting and monitoring, centralized case management, and an executive-level mandate to communicate with other teams really allows the security team to thrive. Adding automated processes puts more time back into analysts’ workflows, augmenting their skills so they can respond to events more efficiently.
Lack of documented processes
All companies have a process for handling events, whether that is simply having an analyst “fix” it or a detailed workflow diagram that is followed religiously. Reliability is the biggest key for security processes: can they be completed over and over, the same way every time? When security teams do not have a reliable, documented process, gaps appear in their ability to handle events. When analysts handle events on their own, they often spend more time on each event, letting other events potentially fall through the cracks, and the same event may not be handled the same way the next time. Another key is that processes must be flexible enough to change, but those changes need to be documented or version controlled as business needs change.
Designing Processes for Automation
With a solid understanding of the risks, the common issues, and the frameworks that help build the case for automating a task to better align with a business objective, the foundation is set for automation to thrive. Beginning automation without these key areas is automating for the wrong reasons, and it generally leads to homegrown scripts and applications that require more maintenance than they provide benefit. When starting to design an automated process, map out some key areas:
- What system(s) is the process targeting?
- What level of human interaction is wanted?
- Reporting success/failures of the process
What system(s) is the process targeting?
Knowing what system(s) to target with a process is vital to designing the process for automation. This allows for boundaries to be set for the automation to work within, keeping it from moving beyond its intended need. Knowing what system(s) also provides a better understanding of what information you will need to gather to interact with those system(s).
What level of human interaction is wanted?
Automation is not designed to replace your security team, only to augment it. Identify key areas within the process where a human should interact: approving the workflow to continue to the next step, inputting a particular device target, or auditing the workflow once it has finished before pushing changes back into production.
Reporting success/failures of the process
After building the simple or elaborate processes that automation can implement to really assist your security team, there has to be a way to measure outcomes and map them back into the overall security posture of the organization. Having taken the automation journey this far, feeding those measures back into the organization’s overall risk score allows for continuity in your security program and its risk posture.
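The three design areas above can be sketched together: a bounded target system, an explicit human approval gate, and outcome reporting that can feed program-level metrics. All names here are hypothetical:

```python
# Sketch of a workflow step with an approval gate and outcome
# reporting. Target names and the change callable are hypothetical.

def run_workflow(target, change, approved_by=None, results=None):
    """Apply a change to one target, requiring human approval, and
    record the outcome so it can feed program-level metrics."""
    results = results if results is not None else []
    if approved_by is None:
        results.append({"target": target, "status": "blocked: needs approval"})
        return results
    try:
        change(target)
        results.append({"target": target, "status": "success",
                        "approved_by": approved_by})
    except Exception as exc:
        results.append({"target": target, "status": f"failure: {exc}"})
    return results

log = run_workflow("fw-edge-01", change=lambda t: None)            # no approval yet
log = run_workflow("fw-edge-01", change=lambda t: None,
                   approved_by="analyst1", results=log)
print([r["status"] for r in log])
# ['blocked: needs approval', 'success']
```

Because every run, including blocked and failed ones, lands in the same result log, success and failure rates can be rolled up into the risk scoring discussed earlier.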
With these areas mapped out, a documented workflow can be automated just by filling in the holes in the workflow with system info and/or adding an analyst approval step. These documented workflows that are being automated allow your team to spend less time getting information and more time responding to the incidents at hand. Don’t have a documented workflow? At Kudelski Security, we have built numerous custom workflows for customers after answering the above questions. Our team thrives on building effective processes for challenging but repetitive tasks, allowing your security team to focus on protecting your business.
In the last part of this series, we look at taking security automation to the next level, improving playbooks, and bringing multiple assets into one workflow to improve overall security efficiency.
Security automation takes planning to succeed. At Kudelski Security, we have assisted many clients through a variety of automation use cases and integrations. In many of these cases, clients begin with a broad strategy of “Let’s Automate.” This is a fantastic strategy to have from leadership; however, it is difficult to get it moving in a positive direction without some tactical goals. In this first part of the series, we discuss how Kudelski Security has been successful with our current clients, mainly by understanding three things: the problem being solved, how bad that problem is in the environment, and whether the security program is mature enough for automation.
First let’s look at common problems companies try and address using automation.
A Lack Of Experienced Personnel
How do you do more with less? The entire cybersecurity industry is dealing with a lack of qualified, experienced personnel. This leads to security teams spending the majority of their time responding to incidents, unable to spend time developing and documenting automated processes that protect business goals. Even with the growth of DevOps culture, qualified security-focused individuals who understand SOC operations and can help security teams automate their processes are still in very short supply.
Along with the shortage of security personnel with automation skills, and of automation developers with security skills, comes the challenge of shifting the culture of security teams and programs to embrace automation. Automation is often perceived as a threat to team members’ job security, and that perception can derail automation strategies unless the perceived job impact is addressed.
How do you reduce the noise so the real issues get in front of experienced security analysts? With enterprises having 50+ security vendors on average, the noise generated from these products puts unneeded pressure on security analysts to decide which alert requires more attention. This leads to truly important alerts falling through the cracks more often, increasing the probability of a larger event happening.
Moving Past Short-Term Needs
How do you shift workloads from teams that are already over-committed? Between the lack of experienced personnel and the alert overload, the largest problem is how to allocate resources away from running day-to-day operations to work through the tactics of the automation strategy. These resources may already automate some short-term needs or projects, but leadership must be on board with an automation strategy and a plan to allocate the needed resources.
After defining the problem, it is key to understand how bad that problem is in your environment. Here are two facets to that challenge.
One approach to problem scope is to analyze metrics that are already in place. For example, use ticket resolution metrics to determine how much time and how many resources are being spent on specific tasks. If usable metrics are not available, you can rely on managers and operations team leads to identify the pain points and bottlenecks within their groups. Questions that help gauge the potential impact of automation include:
- How long does this process take?
- How often does this process happen?
- How many resources does this process require?
- How many technologies are required to be utilized?
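The answers to these questions can be turned into a rough sizing of the opportunity. A minimal sketch; all numbers are hypothetical inputs you would replace with your own ticket metrics:

```python
# Rough sizing of an automation opportunity from process metrics.
# Inputs below are hypothetical examples, not benchmarks.

def hours_per_month(minutes_per_run: float, runs_per_month: int,
                    people_involved: int) -> float:
    """Estimate the monthly person-hours a process consumes."""
    return minutes_per_run * runs_per_month * people_involved / 60

# e.g. a 30-minute process run 200 times a month involving 2 people
print(hours_per_month(30, 200, 2))  # 200.0
```

Two hundred person-hours a month is more than a full-time employee, which is the kind of concrete figure that turns “Let’s Automate” into a funded tactical goal.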
Dealing With Misconceptions
How do you keep automation in check? Many security programs deal with the misconception that automation is designed to replace the human factor in their environment. Leaders and employees who are anxious that automation will replace their work can be less enthusiastic about allowing automation into their environment. Automation is designed to play a supporting role, with employees developing the mindset that automation can:
- Reduce the time spent on repeatable tasks
- Increase ability to accurately log and collect metrics
- Augment, not replace, current employees’ workflows
Security Program Maturity
After identifying and quantifying the problem(s) in your environment, you need to decide whether your security program is mature enough for automation. It’s critical to build automation on a solid foundation in order to succeed. Frameworks organize ideas, strategies, and tactics into that foundation. If your security program already has a solid foundation, take another look at the framework it is built on from the viewpoint of automation. Using the Capability Maturity Model Integration (CMMI) model as an example, here are a few questions that should be answered for automation to thrive:
- Are there defined, repeatable processes that are now being handled manually?
- Are the security and business objectives mapped and aligned?
- Are the security processes being monitored?
Applying The Framework
Applying CMMI to security programs, the figure below represents the maturity levels of security programs and illustrates the stages of a security program’s lifecycle. For example, a newly minted security program may lack defined processes, which can therefore be designed from the onset to transition easily into automated processes. For security programs that are already established, the figure illustrates ways to ensure that automation can thrive: if there are already defined processes and workflows, how to properly monitor and measure those processes. This provides insight into the value automation can provide and how the processes can help align the security program with business objectives.
In the next part of this series, we will look at how to build a cybersecurity program to thrive with automation, and provide a more efficient security team, able to handle more requests and drive the security strategy for the organization.
October is Cybersecurity Awareness Month, a time traditionally focused on empowering individuals and organizations to adopt safer practices online. But October should also be a moment of honest reflection among the professional security community about what is – and isn’t – working in our security arsenals. IT teams are being asked to do more with less, and security can get squeezed in the process. Find out more about how our customizable Automation & Orchestration solutions can help streamline your IT and security operations while reinforcing security at the same time.