What’s New With The Latest EZalert Update by OpsLogix?

OpsLogix is happy to announce the latest update of the EZalert solution, version 2. Compared with the first release, it does a lot more than just close alerts.

So What’s New With EZalert?

EZalert is a tool that uses machine learning to help you manage your SCOM environment more efficiently and effectively. It helps you filter out noisy alerts and lower the total cost of ownership.

When starting out with EZalert, you first have to “train” it to handle new incoming alerts. By setting the resolution state on new incoming alerts, you let EZalert “learn” what resolution state you would like to set for the same or similar alerts when a new alert is generated.

Eventually, with enough training, EZalert will start predicting with increasing accuracy what resolution state to set for an alert. When you are confident that EZalert predicts the resolution state for incoming alerts accurately, you can turn on auto-apply and EZalert will automatically apply the predicted resolution state to new incoming alerts in real time. Besides letting EZalert set the resolution state on new incoming alerts, you can also attach a PowerShell script to a specific resolution state as an action, and use properties of the alert as its parameters.

Watch The Demo Here

QuickStart Guide

Mandatory
  1. Configure your resolution states based on where your alerts should be routed, for example the different groups within your IT organization that should work with the alerts.
  2. Train your model with all incoming alerts to match your desired resolution state. To clean up noise, consider closing (resolution state 255) alerts that you only use for reporting purposes and that don’t require immediate attention.
  3. Don’t keep training simple alerts with more and more entries once the confidence level is high. This will only slow down the training and consume more memory.
  4. Enable auto-apply. We recommend a confidence level above 85% in order to get good results. This also ensures that alerts that don’t match won’t be forwarded to a trained resolution state.
  5. Use the low confidence filter to find the alerts that need more training and train them until they reach the configured confidence level.
Optional
  1. Use pre-actions to set custom fields, for example Management Pack or Operated By, that are used by the machine learning algorithm.
  2. Use post-actions to set custom fields that record which resolution state was applied, for statistical purposes.
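
The auto-apply and low confidence steps above can be pictured with a small Python sketch. This is not EZalert's implementation or API: the `Prediction` record, the `split_by_confidence` helper, the sample alerts and the custom state number 10 are all hypothetical. Only the 85% threshold and resolution state 255 (closed) come from the guide.

```python
from dataclasses import dataclass

# Hypothetical record for one prediction; EZalert's real data model is not shown in the guide.
@dataclass
class Prediction:
    alert_name: str
    suggested_state: int  # 255 = closed; 10 is a made-up custom state
    confidence: float     # 0.0 .. 1.0

CONFIDENCE_THRESHOLD = 0.85  # the guide recommends a confidence level above 85%

def split_by_confidence(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Split predictions into those safe to auto-apply and those needing more training."""
    auto_apply = [p for p in predictions if p.confidence > threshold]
    needs_training = [p for p in predictions if p.confidence <= threshold]
    return auto_apply, needs_training

predictions = [
    Prediction("Disk space low", 255, 0.97),
    Prediction("Service stopped", 10, 0.62),
    Prediction("Heartbeat failure", 10, 0.85),  # exactly at the threshold, so not above it
]

auto_apply, needs_training = split_by_confidence(predictions)

for p in auto_apply:
    print(f"auto-apply state {p.suggested_state} to {p.alert_name!r}")
for p in needs_training:
    print(f"train further: {p.alert_name!r} (confidence {p.confidence:.0%})")
```

The second list plays the role of the low confidence filter: it is the set of alerts to keep training until they pass the configured level.
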
VIDEO: AI For IT-Operations: How To Classify, Train & Escalate Alerts From SCOM

WHAT’S IT ALL ABOUT?

The ever-growing number of devices to be monitored, combined with high availability requirements, makes it more urgent to review internal processes.

Introducing machine-learned automation means, in short, removing manual processes that a machine can perform according to predetermined, consistent routines.

In this webinar you will get an introduction and a real-world scenario showing how to:

  • Use pre-actions to classify and enrich your alert data
  • Train a machine learning model
  • Escalate to different channels depending on the predicted destination
  • Integrate with ServiceNow through a bi-directional connector
  • Tag and analyze your escalated alerts
Why Are Less Than 1% Of Critical Alerts Investigated?

Many organizations seem to be suffering from alert fatigue. According to Infosecurity, a recent EMA report found that 80% of organizations that receive 500 or more severe/critical alerts per day investigate less than 1% of them. A shocking number, to say the least! But what obstacles are organizations facing that allow such neglect?

From the EMA report, we can conclude that organizations face four major issues when it comes to their ability to tackle these severe/critical alerts.

Issues Organizations Face

Alert Volume

Recent surveys from the EMA report indicate that 92% of organizations receive up to 500 alerts a day. Of all the organizations that took part in the survey, 88% said they receive up to 500 “critical” or “severe” alerts per day. Yet 93% of those respondents would rate their endpoint prevention program as “competent”, “strong”, or even “very strong”. So either there is a big gap between perception and reality, or alerts that are considered “severe” or “critical” should not be categorized as such. Either way, alert management does not seem to reflect the real situation.

Capacity

Even if organizations have detection systems in place that create massive alert volumes, what they often lack is the human resources to manage the alerts. Organizations are clearly dealing with a large capacity gap. Of the surveyed organizations that receive 500 to 900 severe/critical alerts per day, 60% have only 3-5 FTEs working on the alerts.

On top of that, 67% of those surveyed indicate that only 10 or fewer severe/critical alerts are investigated per day, and 87% of the participants said that their teams have the capacity to investigate only 25 or fewer severe/critical events per day. For most of the participants the alert volumes are high, while the resources at their disposal are critically low. As a result, less than 1% of the incidents end up being investigated.
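
To get a feel for the capacity gap, the figures above (500 to 900 alerts per day, 3 to 5 FTEs) can be turned into a per-analyst workload. The Python sketch below is only back-of-the-envelope arithmetic. The 8-hour working day is an assumption that is not taken from the report, and `workload` is a hypothetical helper name.

```python
SHIFT_MINUTES = 8 * 60  # assumption: one 8-hour working day per analyst

def workload(alerts_per_day, analysts):
    """Return (alerts per analyst per day, minutes available per alert)."""
    per_analyst = alerts_per_day / analysts
    minutes_per_alert = SHIFT_MINUTES / per_analyst
    return per_analyst, minutes_per_alert

# Lightest case in the quoted range: fewest alerts, most analysts.
best_case = workload(500, 5)
# Heaviest case in the quoted range: most alerts, fewest analysts.
worst_case = workload(900, 3)

for label, (per_analyst, minutes) in [("best", best_case), ("worst", worst_case)]:
    print(f"{label} case: {per_analyst:.0f} alerts per analyst, {minutes:.1f} minutes per alert")
```

Under that assumption, every analyst would have to clear between 100 and 300 alerts per day, with only a few minutes for each one.
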

Priority

The research assumes a need for prioritization and classification into severe/critical buckets, which is understandable given the traditional, manual approach to Incident Response.

However, in doing so, the numbers get even worse and new questions arise. If less than 1% of severe/critical alerts are ever investigated, what percentage of all alerts are investigated? What percentage of alerts are incorrectly categorized, and how many alerts are classified as benign and ignored completely, yet warrant follow-up?

In truth, any prioritization is a compromise, and the act of classifying by priority is merely a justification to ignore alerts.

Incident Response

The three prior problems seem to indicate a substandard, broken incident response process. There are too many alerts to investigate and not nearly enough people to follow up, yet the need to classify all alerts is maintained, all of this just to be able to act on less than 1% of the total number of alerts. However, 92% of respondents indicated that their Incident Response programs for endpoint incidents were “competent” or better.

The only way this makes sense is if respondents felt that, when their Incident Response teams were finally able to take action on the small percentage of alerts that get to this point, they were successful in addressing the issue.

Conclusions

  • Detailed analysis showed that in aggregate 80% of the organizations were only able to investigate 11 to 25 events per day, leaving them a huge, and frankly insurmountable, daily gap.
  • Either due to a lack of tools to collect data or a lack of tools with the ability to analyze data, this issue is created by a lack of high-fidelity security information.
  • Information isn’t the problem. This and similar surveys show the depth and breadth of the problem facing cybersecurity teams today. However, simply gathering more information to hand off to analysts isn’t the answer.

The Solution

Automation is a key aspect of creating an effective and mature security program. It improves productivity and, given the lack of staff and the abundance of incidents in most organizations, automation should be a priority in the evolution of prevention and detection.

“Automation is the answer!”

When asked about automation of tasks such as data capture and/or analysis as they related to prevention, detection, and response for both network and endpoint security programs, 85% of the respondents said it was either important or very important.

Thus the only viable approach to the increase in alerts and scarcity of capacity is to use security orchestration and automation tools to:

  • Automatically investigate every alert: as an alternative to prioritizing alerts to match capacity, use a solution that investigates every alert.
  • Gather additional context from other systems by automating the collection of contextual information from other network detection systems, logs, etc.
  • Exonerate or incriminate threats: using both known threat information and inspection, decide whether what was detected is benign or malicious.
  • Automate the remediation process: once a verdict has been made, automatically remediate (quarantine a file, kill a process, shut down a command-and-control connection, etc.).
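
The four steps above can be sketched as one small pipeline in Python. This is a toy illustration and not the design of any particular orchestration product. The `Alert` record, the `KNOWN_BAD_HASHES` and `HOST_LOGS` lookups, and the host names and hashes are all hypothetical stand-ins for real threat intelligence and detection systems.

```python
from dataclasses import dataclass, field

# Hypothetical alert record used only for this sketch.
@dataclass
class Alert:
    host: str
    file_hash: str
    context: dict = field(default_factory=dict)
    verdict: str = "unknown"
    action: str = "none"

# Hypothetical stand-ins for known threat information and for logs from other systems.
KNOWN_BAD_HASHES = {"bad-hash-1"}
HOST_LOGS = {"web01": ["outbound connection to unknown host"]}

def gather_context(alert):
    """Step 2: attach contextual information from other systems."""
    alert.context["logs"] = HOST_LOGS.get(alert.host, [])
    return alert

def decide(alert):
    """Step 3: exonerate or incriminate using known threat info and inspection."""
    known_bad = alert.file_hash in KNOWN_BAD_HASHES
    suspicious_logs = bool(alert.context.get("logs"))
    alert.verdict = "malicious" if known_bad or suspicious_logs else "benign"
    return alert

def remediate(alert):
    """Step 4: act automatically once a verdict has been made."""
    if alert.verdict == "malicious":
        alert.action = f"quarantine file on {alert.host}"
    return alert

def investigate(alert):
    """Step 1: every alert goes through the same pipeline, with no prioritization."""
    return remediate(decide(gather_context(alert)))

alerts = [Alert("web01", "bad-hash-1"), Alert("db01", "good-hash")]
results = [investigate(a) for a in alerts]

for r in results:
    print(f"{r.host}: verdict={r.verdict}, action={r.action}")
```

The point of the sketch is the shape of the flow: no alert is dropped for lack of priority, and a human only needs to step in where the automated verdict is not trusted.
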

While we’re biased, this approach is the only way.

Hexadite, the only agentless intelligent security orchestration and automation platform for Global 2000 companies, also states that automation is the only real answer, saying “it is impossible for organizations to hire enough people to create an adequate context for the data – and thus provide high fidelity security information.”

References

  • Seals, Tara. “Less Than 1% of Severe/Critical Security Alerts Are Ever Investigated.” InfoSecurityMagazine.com. Retrieved April 8, 2018.
  • EMA Research. “White Paper: EMA Report Summary: Achieving High-Fidelity Security.” Retrieved April 8, 2018.

Join Our Webcast With Approved: AI For IT-Operations: How To Classify, Train & Escalate Alerts From SCOM

WHAT’S IT ALL ABOUT?

The ever-growing number of devices to be monitored, combined with high availability requirements, makes it more urgent to review internal processes.

Introducing machine-learned automation means, in short, removing manual processes that a machine can perform according to predetermined, consistent routines.

In this webinar you will get an introduction and a real-world scenario showing how to:

  • Use pre-actions to classify and enrich your alert data
  • Train a machine learning model
  • Escalate to different channels depending on the predicted destination
  • Integrate with ServiceNow through a bi-directional connector
  • Tag and analyze your escalated alerts

WHEN?

WEDNESDAY 4TH OF APRIL 2018

1st session

  • Amsterdam (Netherlands) 10:00 CEST
  • New York (USA – New York) 04:00 EDT
  • London (United Kingdom – England) 09:00 BST
  • Melbourne (Australia – Victoria) 18:00 AEST

2nd session

  • Amsterdam (Netherlands) 19:00 CEST
  • New York (USA – New York) 13:00 EDT
  • London (United Kingdom – England) 18:00 BST
  • Melbourne (Australia – Victoria) 03:00 AEST (the following day)

PLEASE NOTE: ONLY 25 SPOTS PER SESSION. FIRST COME, FIRST SERVED!

CONQUER YOUR SPOT NOW! CLICK HERE

3 Reasons To Implement Automation & Machine Learning For IT-Operations

A guest blog by Jonas Lenntun from Approved Sweden.

Clearly, we’ll automate!

Automation and efficiency go hand in hand, and have been talked about in IT since the 1970s. Nevertheless, 40 years on, the majority of companies have yet to internalize and embrace automated processes.

The growing number of devices to be monitored, in combination with higher availability requirements, makes it more urgent for companies to review their internal processes, especially when digitization introduces more and more critical e-services that are expected to be available 24 hours a day.

Introducing automation means, in short, removing manual processes that a machine can easily perform according to predetermined routines: in less time, and the same way each time.

Some processes have already come a long way here, among them equipment ordering, user setup and server updates, along with a lot of administrative work.

In the IT department, there are three interesting areas with high potential for automating manual processes in order to become more efficient, shorten lead times and reduce repetitive work.

What can machine learning add?

Machine learning has previously been perceived as not directly relevant to traditional monitoring and incident management. But more and more people realize that it is highly relevant for simplifying everyday work in every respect.

Instead of manually escalating incidents, or sending notifications to on-call staff through complex and blunt rule sets, machine learning can be applied.

We can relatively easily train a machine to automatically identify patterns and then perform the actions we want in a very short time.

We have already begun with automation.

Most likely, you have already begun implementing automation in several areas. Since automation is such a wide-ranging area, this article focuses on activities that increase the value of what monitoring delivers and are more relevant to you in IT operations.

Three important automation areas

Escalation

At first sight, escalation looks like a rather simple process to automate. However, the more complex the rules are for distributing different types of alarms to different groups depending on certain criteria, the more difficult it becomes to control this through a static set of rules.

Instead of building complex scripts or programs, you can look at an alarm and train the system on where to send it. How it then reaches its conclusion is where machine learning comes into its own: it finds patterns we did not know about.

Large time savings can be made because cases are sent to the correct group without having to wait for a manual decision, which shortens the processing time.

Recovery

Many errors that occur at the operating system level, or that involve inadvertently stopped services, can easily be restored automatically.

Even though a Windows service can be configured to start up again if it is stopped, it is better to let a monitoring system capture the error. Since a monitoring system can both restore the service and keep statistics, it becomes easier to track any recurring disruption. These statistics also provide a good basis for the problem management process with the supplier, because the dialogue is then based on data instead of rumors and gut feeling.

Many recovery actions need to be clearly defined, but it is also possible to use machine learning to train a model that learns which recovery actions to run, which keeps complexity down.

Diagnostics

Many errors that occur may be difficult to recover from automatically, but this does not mean we should rule out automation.

If a disk indicates that it is running out of space, a human may be needed to determine what can be cleaned up. But that does not prevent us from collecting diagnostic information for the person who will be performing the task.

Automated diagnostics could, for example, look at which of the largest directories contain the largest files, or attach a graph of disk usage to the analysis.

Here too we can use machine learning to determine what to run or not.
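
As an illustration of the disk example, the following Python sketch collects the kind of diagnostic data that could be attached to a low disk space alarm: the largest files under a directory and the overall disk usage. The helper names `largest_files` and `disk_usage_percent` are hypothetical, and the demo runs against a throwaway temporary directory instead of a real volume. How the result would be attached to an alarm is left out, since that depends on the monitoring system.

```python
import os
import shutil
import tempfile

def largest_files(root, top_n=5):
    """Return the top_n largest files under root as (size_bytes, path) tuples."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(path), path))
            except OSError:
                continue  # file vanished or is unreadable, skip it
    return sorted(sizes, reverse=True)[:top_n]

def disk_usage_percent(path):
    """Return how full the disk holding path is, as a percentage."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total if usage.total else 0.0

# Demo on a temporary directory so the sketch is safe to run anywhere.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "logs"))
    samples = [(("logs", "app.log"), 5000), (("logs", "old.log"), 200), (("readme.txt",), 10)]
    for parts, size in samples:
        with open(os.path.join(root, *parts), "wb") as handle:
            handle.write(b"x" * size)
    report = [(size, os.path.relpath(path, root)) for size, path in largest_files(root, top_n=2)]
    used = disk_usage_percent(root)

print(f"disk usage: {used:.1f}%")
for size, rel_path in report:
    print(f"{size:>8} bytes  {rel_path}")
```

With this information already in the ticket, the person doing the cleanup can start from a shortlist instead of searching the disk by hand.
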

How do we show results?

Introducing automation and machine learning in IT operations has many advantages. Since many things then happen without anyone even noticing, follow-up is one of the most important parts of improving results after the introduction.

There are many important key figures to track before and after the introduction, but the most important is of course “Mean Time To Repair”, abbreviated MTTR: in short, the time it takes for an alarm to be resolved and closed.

Because we can divide automation into three different categories, we can measure:

  • Recovery time overall on the alarms that are automated compared to those that are not
  • Automation degree overall – What is the percentage of alarms automated
  • Automation rate per queue – What is the percentage of alarms automated per destination
  • Recovery time of automatically escalated alarms compared to those done manually
  • Recovery time per escalated destination
  • Recovery time of automatic reset compared to manual handling
  • Recovery time of automated diagnostics compared to manual handling

These are just a few key figures that are very useful for showing the results of automation and machine learning.
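
A few of these key figures can be computed with very little code. The Python sketch below uses made-up alarm records to compare MTTR for automated and manual handling and to calculate the automation rate overall and per queue. The record layout, the queue names and the helper names `mttr_minutes` and `automation_rate` are assumptions for illustration, and the data does not come from Operations Manager.

```python
from datetime import datetime, timedelta
from statistics import mean

start = datetime(2018, 4, 4, 8, 0)

# Hypothetical alarm records: destination queue, handling type, raised and closed times.
alarms = [
    {"queue": "Windows", "automated": True, "raised": start, "closed": start + timedelta(minutes=5)},
    {"queue": "Windows", "automated": True, "raised": start, "closed": start + timedelta(minutes=10)},
    {"queue": "Windows", "automated": False, "raised": start, "closed": start + timedelta(minutes=90)},
    {"queue": "Network", "automated": True, "raised": start, "closed": start + timedelta(minutes=15)},
    {"queue": "Network", "automated": False, "raised": start, "closed": start + timedelta(minutes=30)},
]

def mttr_minutes(records):
    """Mean time from raised to closed, in minutes."""
    return mean((r["closed"] - r["raised"]).total_seconds() / 60 for r in records)

def automation_rate(records):
    """Percentage of records that were handled automatically."""
    return 100 * sum(r["automated"] for r in records) / len(records)

mttr_automated = mttr_minutes([r for r in alarms if r["automated"]])
mttr_manual = mttr_minutes([r for r in alarms if not r["automated"]])
overall_rate = automation_rate(alarms)
rate_per_queue = {
    queue: automation_rate([r for r in alarms if r["queue"] == queue])
    for queue in sorted({r["queue"] for r in alarms})
}

print(f"MTTR automated: {mttr_automated:.1f} min, manual: {mttr_manual:.1f} min")
print(f"automation rate overall: {overall_rate:.0f}%")
for queue, rate in rate_per_queue.items():
    print(f"automation rate for {queue}: {rate:.0f}%")
```

Running the same calculation on data from before and after the introduction gives a simple baseline for showing the effect of automation.
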

Below you will find an example of the Approved operational analysis tool “IT Service Analytics” (in Swedish) which, with data from the Microsoft System Center Operations Manager, can show results after the introduction of automation.

Summary

Automation of IT operations is a topic that cannot be ignored if you don’t want to risk falling behind. The challenge at first is to decide how and where to start. Breaking processes down and analyzing where to put the effort is a common tactic. With automation, you essentially get actions that run 24/7 across all your deliveries, reducing the need for on-call staffing.

We hope this gave you a good introduction to why you need to look at automation and machine learning in your organization.

For more information, have a look at Approved’s concept of Digital Operations or email us at info@opslogix.com.

How To Train EZalert For Optimal SCOM Alert Automation

In this post, we’ll show you how to start training EZalert in order for the machine learning software to learn your SCOM alert handling behavior for optimal automation.

When EZalert has been freshly installed and is untrained, its default behavior is to ignore all alerts in Operations Manager. Because EZalert is untrained it is unable to predict what state you would assign to a SCOM alert, so the default “Suggested State” on the “Training” tab is “Unable to predict” as shown below.

Training an Alert State

On the “Training” tab, all the active alerts in Operations Manager are listed. To start training EZalert, right-click the alert, click “Train State As” in the context menu, and then click the state you would usually assign to the alert, for example “Closed”. After doing so, an “apply States” dialog box appears. This box enables you to apply the training state in Operations Manager immediately: if you click “yes”, the state of the alert is set to “Closed”. If you click “no”, the alert will remain unchanged in Operations Manager while EZalert is being trained.

You can also train EZalert to apply other resolution states to the open SCOM alerts in Operations Manager. To do this, select a different state from the “Train State As” context menu, for example “Resolved”, instead of “Closed”.

Assigning the resolution state “New” will cause EZalert to leave a particular new incoming alert in the resolution state “New”. Assigning this state might seem counterintuitive at first, but EZalert needs to be trained to know which alerts should be left open and remain in the “New” resolution state.

Once you have started training EZalert, you will notice that new incoming alerts will have a suggested state. If the suggested state for a particular alert is not the state you would have expected, you can go through the same cycle and assign the desired state for that particular alert. Do this by using the context menu again.

Please note that during a training cycle the suggested resolution state is never applied to the incoming alerts. This is only done once “Enable Auto Apply” is set on the “Settings” tab.

After the training cycle, assuming you are satisfied that the suggested state for the alert is correct, you can set EZalert to automatic by clicking the “Settings” tab, selecting the “Enable Auto Apply” checkbox and clicking Apply.
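
To make the training cycle more concrete, here is a toy Python model of it. It is deliberately not EZalert's algorithm, which this post does not describe: a simple vote count per alert name is swapped in for the real machine learning model, and `ToyAlertTrainer` and its methods are hypothetical names. The sketch only shows the cycle of an untrained suggestion, repeated training and a resulting suggested state.

```python
from collections import Counter, defaultdict

UNABLE_TO_PREDICT = "Unable to predict"

class ToyAlertTrainer:
    """Toy stand-in for a trained model: majority vote per alert name."""

    def __init__(self):
        self._votes = defaultdict(Counter)

    def train(self, alert_name, state):
        """Record that the operator chose this state for this kind of alert."""
        self._votes[alert_name][state] += 1

    def suggest(self, alert_name):
        """Return (suggested state, confidence as the share of votes for it)."""
        votes = self._votes.get(alert_name)
        if not votes:
            return UNABLE_TO_PREDICT, 0.0
        state, count = votes.most_common(1)[0]
        return state, count / sum(votes.values())

trainer = ToyAlertTrainer()

# Untrained: nothing can be suggested yet.
before = trainer.suggest("Disk space low")

# The operator trains the same kind of alert a few times, once with a different state.
trainer.train("Disk space low", "Closed")
trainer.train("Disk space low", "Closed")
trainer.train("Disk space low", "New")

after = trainer.suggest("Disk space low")

print("before training:", before)
print("after training:", after)
```

Correcting an unexpected suggestion works the same way in this toy version: training the alert again with the desired state shifts the vote, and with it the next suggestion.
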

Retraining a SCOM Alert

After “Enable Auto Apply” has been set on the “Settings” tab, the “Training” tab will be disabled. You can click the “History” tab to keep an eye on what state EZalert is automatically applying to the new alerts that are coming in. The “History” tab keeps a log of the state that is applied to each alert. If EZalert learned the wrong behavior and applied the wrong state to an alert, it can be corrected. To do this, select the alert with the wrong state, right-click it and set the correct state by selecting “Retrain State As”. When you repeat this cycle, EZalert will learn and become increasingly accurate over time.

Watch the video on how to use EZalert to learn how to manage your SCOM alerts more easily!

Also, make sure to read the following blog by MVP Tao Yang