Many organizations seem to be suffering from alert fatigue. According to Infosecurity, a recent EMA report found that 80% of organizations that receive 500 or more severe/critical alerts per day investigate less than 1% of them. A shocking number, to say the least! But what obstacles are organizations facing that allow such neglect?
From the EMA report, we can conclude that organizations face four major issues when it comes to their ability to tackle these severe/critical alerts.
Issues Organizations Face
Recent surveys from the EMA report indicate that 92% of organizations receive up to 500 alerts a day. Of all the organizations that took part in the survey, 88% said they receive up to 500 “critical” or “severe” alerts per day. Yet 93% of those respondents rate their endpoint prevention program as “competent”, “strong”, or even “very strong”. So either there is a big gap between perception and reality, or alerts that are considered “severe” or “critical” should not be categorized as such. Either way, alert management does not seem to be representative of reality.
Even if organizations have detection systems in place that create massive alert volumes, what they often lack is the human resources to manage the alerts. Organizations are clearly dealing with a large capacity gap. Of the surveyed organizations that receive 500 to 900 severe/critical alerts per day, 60% have only 3-5 FTEs working on the alerts.
On top of that, 67% of those surveyed indicate that only 10 or fewer severe/critical alerts are investigated per day, and 87% of the participants said that their teams have the capacity to investigate only 25 or fewer severe/critical events per day. For most of the participants the alert volumes are high, while the resources at their disposal are critically low. As a result, less than 1% of the incidents end up being investigated.
The research assumes a need for prioritization and classification into severe/critical buckets, which is understandable given the traditional, manual approach to Incident Response.
However, in doing so, the numbers are even worse and new questions arise. If less than 1% of severe/critical alerts are ever investigated, what percent of all alerts are investigated? What percentage of alerts are incorrectly categorized and how many alerts are classified as benign and ignored completely, yet warrant follow-up?
In truth, any prioritization is a compromise, and the act of classifying by priority is merely a justification to ignore alerts.
The three prior problems seem to indicate a substandard, broken incident response process: there are too many alerts to investigate, not nearly enough people to follow up, and the need to classify all alerts is maintained, all of this just to be able to act on less than 1% of the total number of alerts. However, 92% of respondents indicated that their Incident Response programs for endpoint incidents were “competent” or better.
The only way this makes sense is if respondents felt that, when their Incident Response teams were finally able to take action on the small percentage of alerts that get to this point, they were successful in addressing the issue.
Detailed analysis showed that in aggregate 80% of the organizations were only able to investigate 11 to 25 events per day, leaving them a huge, and frankly insurmountable, daily gap.
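To make the size of that daily gap concrete, here is a minimal back-of-the-envelope sketch. The function name `daily_gap` is hypothetical, and the example figures are illustrative values picked from the ranges quoted above, not numbers taken from the report itself.

```python
from typing import Tuple


def daily_gap(alerts_per_day: int, investigated_per_day: int) -> Tuple[int, float]:
    """Return (alerts left uninvestigated, fraction investigated) for one day."""
    if alerts_per_day <= 0:
        raise ValueError("alerts_per_day must be positive")
    # A team cannot investigate more alerts than it receives.
    investigated = min(investigated_per_day, alerts_per_day)
    gap = alerts_per_day - investigated
    coverage = investigated / alerts_per_day
    return gap, coverage


# Illustrative input: 500 severe/critical alerts, capacity for 25 investigations.
gap, coverage = daily_gap(alerts_per_day=500, investigated_per_day=25)
print(f"{gap} alerts left uninvestigated, {coverage:.0%} investigated")
```

Even with the most generous capacity figure and the lowest alert volume, the bulk of the alerts is never looked at, and the gap only widens as the volume grows.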
Either due to a lack of tools to collect data or a lack of tools with the ability to analyze data, this issue is created by a lack of high-fidelity security information.
Information isn’t the problem. This and similar surveys show the depth and breadth of the problem facing cybersecurity teams today. However, simply gathering more information to hand off to analysts isn’t the answer.
Automation is a key aspect of creating an effective and mature security program. It improves productivity and, given the lack of staff and the abundance of incidents in most organizations, automation should be a priority in the evolution of prevention and detection.
“Automation is the answer!”
When asked about automation of tasks such as data capture and/or analysis as they related to prevention, detection, and response for both network and endpoint security programs, 85% of the respondents said it was either important or very important.
Thus the only viable approach to the increase in alerts and scarcity of capacity is to use security orchestration and automation tools to:
Automatically investigate every alert: as an alternative to prioritizing alerts to match capacity, use a solution to investigate every alert.
Gather additional context from other systems: automate the collection of contextual information from other network detection systems, logs, etc.
Exonerate or incriminate threats: using both known threat information and inspection, decide whether what was detected is benign or malicious.
Automate the remediation process: once a verdict has been made, automatically remediate (quarantine a file, kill a process, shut down a CNC connection, etc.).
While we’re biased, this approach is the only way.
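The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: the `Alert` class, the `KNOWN_BAD_INDICATORS` set and all function names are hypothetical stand-ins, not the API of any orchestration product, and a real system would query actual logs, detection systems and threat feeds.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical threat-intelligence data: indicators already known to be malicious.
KNOWN_BAD_INDICATORS = {"bad-hash-123"}


@dataclass
class Alert:
    alert_id: str
    indicator: str                      # e.g. a file hash or remote address
    context: Dict[str, str] = field(default_factory=dict)
    verdict: str = "unknown"            # becomes "benign" or "malicious"
    action: str = "none"                # remediation taken, if any


def gather_context(alert: Alert) -> None:
    """Collect extra context (stand-in for querying logs and other systems)."""
    alert.context["known_bad"] = str(alert.indicator in KNOWN_BAD_INDICATORS)


def decide(alert: Alert) -> None:
    """Exonerate or incriminate the alert based on the gathered context."""
    is_bad = alert.context.get("known_bad") == "True"
    alert.verdict = "malicious" if is_bad else "benign"


def remediate(alert: Alert) -> None:
    """Act on the verdict (stand-in for quarantining a file, killing a process, etc.)."""
    if alert.verdict == "malicious":
        alert.action = "quarantine"


def triage(alerts: List[Alert]) -> List[Alert]:
    """Investigate every alert; there is deliberately no priority filter."""
    for alert in alerts:
        gather_context(alert)
        decide(alert)
        remediate(alert)
    return alerts


results = triage([Alert("a1", "bad-hash-123"), Alert("a2", "clean-hash-456")])
for alert in results:
    print(alert.alert_id, alert.verdict, alert.action)
```

The point of the sketch is the loop in `triage`: every alert passes through the same steps, so nothing is skipped because of a priority label.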
Hexadite, the only agentless intelligent security orchestration and automation platform for Global 2000 companies, also states that automation is the only real answer, saying “it is impossible for organizations to hire enough people to create an adequate context for the data – and thus provide high fidelity security information.”
“Less Than 1% of Severe/Critical Security Alerts Are Ever Investigated” By Tara Seals for InfoSecurityMagazine.com, Retrieved April 8, 2018.
“White Paper: EMA Report Summary: Achieving High-Fidelity Security” EMA Research, Retrieved April 8, 2018.
Building up to the release of System Center 2016, we wrote multiple articles and provided you with a few webcasts (here & here) to help you prepare your transition from System Center 2012 to System Center 2016 and design your environment according to best practices. You have probably taken your time to evaluate and plan your next steps for this purpose, or maybe you’re not there yet. We at OpsLogix like to provide the System Center community with top shelf Management Packs & OMS solutions (here & here) and additional information in order to help you find your way through the maze.
That’s why we went and found some fascinating keynote speeches you can (re)watch and maybe rethink or better plan your strategy for System Center 2016. So here’s a nice recap!
1. Meet Windows Server 2016 and System Center 2016!
In this demo-heavy presentation for the unveiling of the brand new Windows Server 2016 and Microsoft System Center 2016, Mike Neil & Erin Chapple walk you through how Windows Server and System Center have evolved to support business transformation in a cloud-first world. To accelerate value and stay ahead of the competition, you need infrastructure that unleashes application innovation and you need your IT to meet increasingly demanding business requirements. Learn about Windows Server’s enhanced security capabilities, software-defined data center technologies and application features to build cloud-native applications, and how System Center 2016 streamlines management across your datacenter.
2. Monitor your changing data center using Microsoft System Center 2016 Operations Manager
Rapid innovation in the data center requires an agile approach to management. With System Center 2016 Operations Manager, you can take advantage of all the rich features coming in Windows Server 2016, including Nano Server, new networking capabilities, Storage Spaces Direct, Windows Server Containers, and much more. Operations Manager also offers an improved monitoring experience, with enhanced lifecycle management for management packs, data-driven alert management, and improvements in fundamentals like scale, performance, UI responsiveness, extensibility, and heterogeneity. Hear about the new features and learn how integration between Operations Manager and Operations Management Suite offers insight and analytics across your hybrid cloud.
3. Monitor and diagnose web apps & services with Application Insights & System Center 2016 Operations Manager
Learn how monitoring solutions from Microsoft come together to help you manage, identify, understand, and resolve problems in your web apps & services, irrespective of the platform. With Application Insights’ rich open-source SDKs, machine learning analysis, big-data querying, and topology discovery, you can fix issues in your apps even before they occur. Microsoft System Center 2016 Operations Manager helps you collect, analyze, and search millions of records across all your workloads and servers no matter where they are. You can diagnose problems right from within your development environment and incorporate monitoring in your existing ALM workflows.
4. Take advantage of new capabilities in System Center 2016
Microsoft System Center 2016 keeps you in control of your IT operations, giving you enterprise-grade management across Microsoft Windows Server and Linux. With substantial new capabilities to help you manage and monitor your data center and workloads, System Center 2016 offers you the opportunity to simplify management and increase agility. Learn about the provisioning, configuration, automation, monitoring, protection, self-service and virtualization management capabilities in the latest version of System Center. Find out more about how System Center and Microsoft Operations Management Suite work together to give you the management and security tools for the hybrid cloud.
5. Manage your software-defined data center using System Center 2016 Virtual Machine Manager
To make your software-defined data center strategy a reality, you need the right management tools. Get hands-on with the latest version of Microsoft System Center, and find out how to increase efficiency and simplify the process of running your software-defined data center. Microsoft System Center 2016 Virtual Machine Manager was designed to make it easier to provision and manage the fundamental building blocks of your software-defined data center, including compute, storage, networks, and security. Learn about the central new features in this session filled with demos.
In this second blog post, I’ll be talking about the monitoring possibilities with Microsoft OMS.
If you missed my first blog post ‘System Center 2016 Operations Manager and Beyond Part 1’, then click here to read it.
Disclaimer: Cloud solutions are always on the move so the details in this blog post are of high level. Updates and features occur regularly, so something which isn’t available today might very well be in preview tomorrow. This is completely different from on-premise solutions where you need to wait until the next wave of updates, just like System Center Operations Manager 2016, which brought you all the new goodies or baddies.
Microsoft OMS is Microsoft’s Operations Management Suite, which means that it does a lot more than just monitoring. It’s a management solution for your hybrid cloud, where it manages both private and public cloud solutions. At the same time, it extends your current solutions. It extends, so it doesn’t replace! This is very important.
“No, it extends your current environments. The structure and mechanism of Microsoft OMS is totally different compared to SCOM.”
Microsoft OMS is a cloud solution. It’s basically ‘management as-a-service’, where you have all the benefits of a cloud solution, only without an infrastructure, and where the latest version, updates, and features are added as time goes by with no interruption.
How does Microsoft OMS work?
Before you start, you can add servers to Microsoft OMS by using two different approaches:
Direct agent approach:
Install the Microsoft OMS agent. This agent communicates directly with the OMS platform, so no other requirements are needed. There is also no correlation with SCOM.
SCOM connected approach:
Connect your SCOM management group to Microsoft OMS. This way, the information is gathered by SCOM and sent to the OMS platform.
The main difference between these two approaches is whether you send the data through SCOM to Microsoft OMS or directly to OMS:
The OMS agent connects to the OMS platform over HTTPS, so if your servers cannot access the Internet directly, this can be a reason to choose the SCOM connected approach.
High data volumes, such as security events or Wire Data, require agents that are directly connected to Microsoft OMS.
Another option might be user role-based access. SCOM enables detailed role-based access, meaning that you can scope specific data sets.
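The considerations above can be written down as a simple decision helper. This is a sketch under stated assumptions, not official guidance: the function `choose_connection` and its parameters are hypothetical, and it deliberately ignores options this post does not discuss.

```python
def choose_connection(has_internet_access: bool,
                      high_volume_data: bool,
                      needs_scoped_access: bool) -> str:
    """Pick "direct" (OMS agent) or "scom" (SCOM connected) for one server."""
    if high_volume_data:
        # High-volume data (security events, wire data) goes straight to OMS,
        # which in turn needs outbound HTTPS from the server.
        if not has_internet_access:
            raise ValueError(
                "high-volume data needs a directly connected agent, "
                "which needs outbound HTTPS access"
            )
        return "direct"
    if not has_internet_access:
        # No direct route to the Internet: let SCOM forward the data.
        return "scom"
    if needs_scoped_access:
        # SCOM's role-based access can scope specific data sets.
        return "scom"
    return "direct"


print(choose_connection(has_internet_access=False,
                        high_volume_data=False,
                        needs_scoped_access=False))
```

In practice you will often mix both approaches in one environment, so a helper like this would be applied per server or per group of servers.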
There are several areas which make up Microsoft OMS. I will describe each area and its purpose.
What does Microsoft OMS monitor?
Log Analytics (event logs and events)
The first area is Log Analytics, which collects information from your servers and event logs. With the collected data you can run an analysis across the gathered information to track trends, errors and other information. You can extend this by collecting custom data sources like IIS logs and syslogs for your non-Microsoft solutions or logs. Currently, you can also create alerts based on the collected data. OpsLogix has created great Microsoft OMS Log Analytics Solutions to monitor your Oracle & VMware environment.
Performance Data Collection
This area is focused on collecting performance data from Windows performance counters. It collects predefined counters and sends them to Microsoft OMS for further inspection or analysis. This will deliver a solution to predict trends from your collected performance data. You can generate alerts based on the data gathered. For example, you can send an alert when the performance reaches x.
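The "alert when the performance reaches x" idea can be illustrated with a few lines of code. This is a local sketch only: it does not call Microsoft OMS, and the sample format, the function `threshold_alerts` and the counter values are made up for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical sample format: (computer name, counter name, value).
Sample = Tuple[str, str, float]


def threshold_alerts(samples: Iterable[Sample],
                     counter: str,
                     threshold: float) -> List[str]:
    """Return one alert message per sample of `counter` at or above `threshold`."""
    return [
        f"{computer}: {name} reached {value}"
        for computer, name, value in samples
        if name == counter and value >= threshold
    ]


samples = [
    ("web01", "% Processor Time", 97.5),
    ("web02", "% Processor Time", 41.0),
    ("web01", "Available MBytes", 512.0),
]
for message in threshold_alerts(samples, "% Processor Time", 90.0):
    print(message)
```

In OMS itself the threshold would be expressed as an alert rule on the collected data rather than in your own code.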
Security and Audit logs
Security and audit logs collection will collect your security logs. You can compare this to Audit Collection Services (ACS) in SCOM. It collects security events and uses analysis to investigate your data. One of the reasons why ACS has always been a #$%@! to maintain is because of the amount of data it collects. You need one hell of a DB for all the collected data. The cool part about cloud solutions is that this part is covered by Microsoft, so you only need to worry about how you present your data!
Another thing to note is that this OMS solution generates a lot of data traffic. Sending this data to SCOM first would kill your environment, and therefore every Microsoft OMS connected server will always send this information directly to OMS!
“Although Microsoft now takes care of your data, you need to be aware that your OMS bill will be affected by the amount of data. However, this is only a fraction of the costs you make when you’re setting up and maintaining ACS on-site.”
Wire Data
With Wire Data, you can collect network data and send it to OMS for further analysis. This solution lets you discover patterns in your network communications. Again, this solution generates a lot of data traffic, and therefore agents using the Wire Data solution communicate directly with Microsoft OMS. This solution cannot replace a detailed network traffic analysis tool, but it will give you insight into network communications and which processes communicate over the “wire”.
“Although Microsoft now takes care of your data, you need to be aware that your Microsoft OMS bill will be affected by the amount of data. However, again, this is only a fraction of the costs you make when you’re setting up and maintaining ACS on-site.”
Is this it, you might ask? Well, no! This is just a small overview of the data collected by OMS. Microsoft OMS uses “solutions” that you can compare to SCOM Management Packs. A solution provides pre-configured dashboards and data queries to analyze data. These solutions are created by Microsoft to analyze certain components. The solutions are closely connected to data delivered and maintained by Microsoft, such as premier support data on common issues or security and malware data collected by security teams. This data is completely integrated into these solutions and gives you easy (almost one-click) access to tons of valuable information.
Below you can see a subset of currently active OMS solutions.
Legend:
Available – solutions which you can use currently.
Coming – solutions which are still in private preview.
Preview – solutions which are currently in public preview.
Owned – solutions which are already installed.
There are a lot of types of solutions. However, please note that I only explained four of them in ‘How does Microsoft OMS work?’
Why, you might ask? Well, as I said before, SCOM is NOT Microsoft OMS, and even though this is true, I need to correlate the two to give my personal view on the future. On top of that, there are several Microsoft OMS solutions which require extra components. It’s hard to see when looking at the solutions page, but one would need to divide the solutions and note the prerequisites in a list…hmm wait:
Available solutions (both on-premises and in the public cloud)
AD Assessment: this is the Active Directory Assessment solution, which assesses your AD for common configuration, security and health issues and presents you with options on how to resolve these issues. For this, you only need Microsoft OMS agents on your Domain Controllers.
Alert Management of your OMS and SCOM alerts: you can create alerts based on performance or error logs. The notifications can be sent to an email address. You’ll optionally need a SCOM or Microsoft OMS agent.
Anti-Malware Assessment: this solution uses Microsoft’s Anti-Malware tools to analyze your system. For this, you need a Microsoft OMS agent & Microsoft Anti-Malware tooling.
Change Tracking: track and analyze configuration changes on your servers. For this, only a Microsoft OMS Agent is required.
Security and Audit: this is the ACS solution from Microsoft OMS. Only an OMS agent is required.
SQL Assessment: this solution gathers information from your SQL Servers and informs you regarding common configuration, security and health issues and it will present you with options on how to resolve these issues. For this, again only a Microsoft OMS agent is required.
System Update Assessment: this solution assesses your server and gives you an overview of the current update status. For this again, only a Microsoft OMS agent is required.
Network Performance Monitor: this solution provides you with the ability to monitor and collect network performance data. Only a Microsoft OMS agent is required.
Coming soon solutions
Wire Data: this solution provides you with data for analyzing network traffic. Only an OMS agent is required.
Containers: this solution will provide you with information regarding the performance of your container setup in both the private and the public cloud. A Microsoft OMS agent and, optionally, an Azure subscription are required.
Azure Based Solutions (public Cloud)
Azure Automation: this solution hooks into Azure Automation and shows you the status of Azure Automation. Management and configuration of Azure Automation require the Azure portal. This solution only provides you with the status overview. An Azure subscription with Azure Automation is required.
Azure Site Recovery: this solution hooks into Azure Site Recovery and shows you the status of Azure Site Recovery. Management and configuration of ASR require the Azure portal. This solution also only provides you with the status overview. An Azure subscription with ASR is required.
Backup: this solution hooks into Azure Backup and shows you the status of Azure Backup. Management and configuration of Azure Backup require the Azure portal. This solution, again, only provides you with the status overview. An Azure subscription with Azure Backup is required.
Azure Networking analytics: this solution provides you with information regarding your Application Gateway server logs and Network security groups in Azure. An Azure subscription is required.
Key Vault: this solution hooks into Azure to collect your Key Vault logs. An Azure subscription is required.
Office 365: this solution hooks into Office 365 where it provides you all the Office 365 related data, for example, your user activities. An Office 365 subscription is required.
Service Fabric: this solution will provide you with insight into your service fabric cluster running on Azure. An Azure subscription is required.
Upgrade Analytics: this solution will provide you with information regarding your upgrade strategy. This component requires Microsoft telemetry to be activated in Microsoft OMS.
Please note that solutions in preview do not comply with the SLA levels of generally available solutions! More on Microsoft OMS Solutions.
Monitoring examples with Microsoft OMS
There are several blogs that provide examples of how you can leverage Microsoft OMS to monitor components.
Oracle and Microsoft OMS
The first example is from the guys at OpsLogix. I know they have been playing with Microsoft OMS ever since it was called “Atlanta” and they have in-depth knowledge of the workings and customization of Microsoft OMS. They’ve provided a management pack which extends your OpsLogix Oracle management pack into Microsoft OMS!
VMware with Log Analytics
This blog shows you how to connect Microsoft OMS to your vCenter server and collect the VMware logs into Microsoft OMS.
Tao Yang explains how you can leverage Microsoft OMS to collect your non-default data by using custom management packs in SCOM:
In this blog I’ve tried to capture the capabilities of Microsoft OMS. To be honest, there is a ton of stuff which you can use in Microsoft OMS, so just give it a try. Microsoft OMS is a suite of components for managing your hybrid solutions and provides you with a “Single Pane” for several management solutions, both on-premises and in the public cloud. It’s safe to say that if it is logged on a server, Microsoft OMS can pick up the logs and let you use its analytics services to analyze the data. Microsoft OMS uses metadata and analysis to work with the data collected.
The data collection is almost real time, and when it comes to large amounts of data, Microsoft OMS is a perfect solution. Microsoft OMS is a cloud solution, so changes and features are added at a rapid speed, extending the platform even while I’m writing this blog! For more advanced stuff it is wise to dive into analytics in general, so you can query your data in a smart way, although Microsoft OMS provides default queries for you. Microsoft OMS is an extension of your System Center solution, not a replacement!
If you are serious and want to know more, please read this excellent white paper by my MVP buddies (Stanislav Zhelyazkov, Tao Yang, Pete Zerger and Anders Bengtsson).
Are you wondering what the future of SCOM and Microsoft OMS will look like? And why Microsoft OMS will not be the replacement for SCOM? In the third and final blog of this series, I will give you my personal insights on this and explain how it all ties together. So stay on the lookout and we’ll keep you posted!
System Center Operations Manager 2016 (SCOM 2016) is still in technical preview (TP5).
As Microsoft announced, SCOM 2016 will be launched in September during Microsoft Ignite in Atlanta. This gives you about four weeks to look at how to prepare your environment and design an installation, or an upgrade if you are already running a previous version of Operations Manager. By the way, our booth number during Microsoft Ignite is 1972.
When designing your SCOM 2016 environment, there are 12 key components you should care about.
In this blog we’ll guide you through all of them step by step. If after these steps you still feel that you need assistance, feel free to contact us.
Component 1 – Operational Database
First and foremost, there is going to be a central operational database, holding all the database information regarding your environment.
Component 2 – Management Server
The Management Server is a central part of your environment. It houses your configuration, it talks to agents and provides access to the consoles. You can have one Management Server or multiple. These multiple Management Servers together are called a Resource Pool.
Component 3 – RMS Role
The RMS, also known as the Root Management Server, used to be a separate server in previous versions of SCOM. Since SCOM 2012 R2 it is a role that is dynamically placed in the Management Servers and Resource Pools.
Component 4 – Console
Your operators will be working with the console within SCOM 2016.
Components 5 & 6 – Data Warehouse Database & Reporting Server
Ever since SCOM 2012, this has been a mandatory part of the installation. You get the reporting side of Operations Manager by installing both the Data Warehouse database and the Reporting Server!
Component 7 – Web Console Server
The Web Console Server allows you to access your information through the web.
Component 8 – Agent
You will be running different sorts of agents, such as Windows agents, Linux agents and all kinds of machines providing you with information regarding your environment.
Component 9 – Gateway Server
Depending on your configuration, network or other challenges you might be facing, it’s possible that you will need to add a Gateway Server. A Gateway Server allows you to deal with untrusted domains and helps you with latency.
Components 10, 11 & 12 – Audit Collector, Audit Database & Audit Forwarder
This is probably the least known part of Operations Manager. The Audit Collector receives the information, the Audit Database is where the information is written, and there is also the Audit Forwarder, which is part of the Operations Manager agent. This part of the infrastructure is used for collecting security events.
So if you want to maintain the security of your Windows or Linux environment and you have your Operations Manager agents running there, simultaneously you can actually collect the security event and do reporting.
System Center Operations Manager allows 360° monitoring of your infrastructure
System Center Operations Manager allows you to do 360° monitoring of your infrastructure, applications and end-user experiences. It provides you with a clear UI and makes it possible for you to see the same information in a consistent way, presented the way you need it for the role that you have.
There are three levels in System Center Operations Manager 2016 that correspond to Business Service Monitoring (BSM):
Oracle Performance Dashboard
Oracle Auditing Dashboard
Key parts of System Center Operations Manager 2016 are:
Synthetic Transactions - To measure the end-user experience.
DevOps Integration - Here we can actually integrate with TFS and other systems, so that development and operations can work together on one view.
.NET Monitoring with APM - Helps you to do in-depth application monitoring. The same goes for Java (at this point, Microsoft also provides you with Java APM).
Management Packs - Adding or creating your own management packs allows you to monitor even more components within your infrastructure.
Dashboard Framework - Import your own widgets such as State view & Top N view.
Cloud Monitoring - Now there is even cloud monitoring through management packs that can monitor Azure, Office 365 or other services that you might be using today.
Want to learn more? Watch our webcast on What’s new in SCOM 2016 & Microsoft OMS
In this webcast, 10-year Microsoft System Center MVP Maarten Goet will talk you through all the design points for your SCOM 2016 environment. He will talk about both new installations and preparing for the upgrade, and will share some of the best practices for your Management Group design. There will be an opportunity for Q&A at the end of the session, where you can leverage Maarten’s experience to tailor the design points to your specific environment. This is a webcast that no System Center administrator should miss!
Please note that registrations are limited to 25 people per session, so it’s first come first served!