3 Reasons To Implement Automation & Machine Learning For IT-Operations

A guest blog by Jonas Lenntun from Approved Sweden.

Of course we’ll automate!

Automation and efficiency go hand in hand, and automation has been talked about in IT since the 1970s. Nevertheless, 40 years on, the majority of companies have yet to internalize and embrace automated processes.

The growing number of devices to monitor, combined with higher availability requirements, makes it increasingly urgent for companies to review their internal processes. This is especially true as digitization introduces more and more critical e-services that are expected to be available 24 hours a day.

Introducing automation means, in short, removing manual processes that a machine can easily perform according to predetermined routines: in less time, and in the same way, every time.

Some processes have already come a long way here, among them equipment ordering, user provisioning and server updates, along with a lot of administrative work.

Within the IT department, there are three particularly interesting areas with high potential for automating manual processes in order to become more efficient, shorten lead times and reduce repetitive work.

What can machine learning add?

Machine learning has previously been perceived as not directly relevant to traditional monitoring and incident management. But more and more people are realizing that it is highly relevant for simplifying everyday operations, in every aspect.

Instead of manually escalating incidents or notifying on-call staff through complex and blunt rule sets, machine learning can be applied.

We can relatively easily train a machine to identify patterns automatically and then perform the actions we want, in a very short time.

We have already begun with automation.

Most likely, you have already begun implementing automation in several areas. Since automation is such a wide-ranging area, this article focuses on the activities that increase the value of what monitoring delivers and that are most relevant to you in IT operations.

Three important automation areas

Escalation

At first sight, escalation seems a rather simple process to automate. However, the more complex the rules for distributing different types of alarms to different groups, depending on various criteria, the harder it becomes to get this right with a static rule set.

Instead of building complex scripts or programs, you can look at historical alarms and train a model on where to send them. How the model reaches its conclusions is where machine learning comes into its own: it finds patterns we did not know existed.

Large time savings can be made by shortening processing time, since cases are sent to the correct group without having to wait for a manual decision.
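
To make this concrete, here is a minimal PowerShell sketch against the SCOM cmdlets. It assumes historical alerts record who handled them in the Owner field, and it uses a simple frequency table as a crude stand-in for a trained model:

    # Minimal sketch: learn the most frequent owner per alert name from closed
    # alerts, then route new unassigned alerts the same way. The frequency
    # table is a crude stand-in for a real trained model.
    Import-Module OperationsManager

    # "Train": group closed alerts (resolution state 255) by alert name and
    # remember the owner who handled each one most often.
    $routing = @{}
    Get-SCOMAlert -ResolutionState 255 | Where-Object { $_.Owner } |
        Group-Object Name | ForEach-Object {
            $top = $_.Group | Group-Object Owner |
                Sort-Object Count -Descending | Select-Object -First 1
            $routing[$_.Name] = $top.Name
        }

    # "Predict": assign open, ownerless alerts to the learned owner.
    Get-SCOMAlert -ResolutionState 0 | Where-Object { -not $_.Owner } |
        ForEach-Object {
            if ($routing.ContainsKey($_.Name)) {
                Set-SCOMAlert -Alert $_ -Owner $routing[$_.Name] `
                    -Comment 'Auto-routed based on historical pattern'
            }
        }

A real implementation would use more features than the alert name alone, but even a majority-vote baseline like this illustrates the idea.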

Recovery

Many errors that occur at the operating-system level, or around inadvertently stopped services, can easily be remediated automatically.

Even though a Windows service can be configured to restart itself if it stops, it is better to let a monitoring system capture the error. Since a monitoring system can both perform the recovery and keep statistics, it becomes easier to spot recurring disturbances. These statistics also provide a good basis for the problem process with the supplier: the dialogue is based on data instead of rumors and gut feeling.

Most recovery actions need to be clearly defined, but through machine learning there is also the possibility of training a model that learns which recovery actions to run, minimizing complexity.
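
As a simple illustration, a recovery action that a monitoring system could run for an inadvertently stopped service might look like the sketch below. The service name and the event source are only examples:

    # Sketch of a recovery action for a stopped Windows service.
    # 'W3SVC' is only an example; the monitoring system would normally pass
    # the service name in as a parameter.
    $serviceName = 'W3SVC'
    $service = Get-Service -Name $serviceName -ErrorAction SilentlyContinue

    if ($service -and $service.Status -ne 'Running') {
        Start-Service -Name $serviceName
        # Log the recovery so recurring failures show up in the statistics.
        # Assumes the 'AutoRecovery' event source has been registered.
        Write-EventLog -LogName Application -Source 'AutoRecovery' `
            -EntryType Information -EventId 1000 `
            -Message "Service $serviceName was stopped and has been restarted automatically."
    }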

Diagnostics

Many errors that occur may be difficult to remediate automatically, but that does not mean we should rule out automation.

If a disk indicates that it is running out of space, a human may still be needed to decide what can be cleaned up. But that does not prevent us from collecting diagnostic information for the person who will perform the task.

Automated diagnostics can, for example, list which directories contain the largest files, or attach a graph of disk usage to the analysis process.

Here too, we can use machine learning to determine which diagnostics to run.
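
For the disk example above, a diagnostic step could be as simple as the sketch below, which gathers the ten largest files on the affected drive so the case starts with data. The drive letter is only an example:

    # Sketch of automated diagnostics for a disk-space alarm: collect the
    # largest files so the person handling the case starts with data.
    $drive = 'C:\'   # example; the alarm would normally supply the drive

    Get-ChildItem -Path $drive -Recurse -File -ErrorAction SilentlyContinue |
        Sort-Object Length -Descending |
        Select-Object -First 10 FullName,
            @{ Name = 'SizeMB'; Expression = { [math]::Round($_.Length / 1MB, 1) } }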

How do we show results?

Introducing automation and machine learning in IT operations has many advantages. But since much of it happens without anyone even noticing, follow-up is one of the most important ways to demonstrate results after the introduction.

There are many important key figures to look at before and after the introduction, but the most important is of course “Mean Time To Repair”, abbreviated MTTR: in short, the time from when an alarm is raised until it is resolved and closed.

Because we can divide automation into three different categories, we can measure:

  • Resolution time overall for automated alarms compared to those that are not
  • Degree of automation overall – what percentage of all alarms are automated
  • Degree of automation per queue – what percentage of alarms are automated per destination
  • Resolution time for automatically escalated alarms compared to manually escalated ones
  • Resolution time per escalation destination
  • Resolution time for automatic recovery compared to manual handling
  • Resolution time for automated diagnostics compared to manual handling

These are just a few key figures that go a long way in showing the results of automation and machine learning. A sketch of how such a comparison might be computed follows below.
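
As a sketch of how MTTR could be compared with the SCOM cmdlets: assuming automated alerts are tagged somewhere (here, hypothetically, in CustomField1), MTTR per category can be computed from the alert timestamps:

    # Sketch: compare MTTR for automated vs. manually handled alerts.
    # Assumes automated alerts are tagged, hypothetically, in CustomField1.
    Import-Module OperationsManager

    Get-SCOMAlert -ResolutionState 255 |
        Where-Object { $_.TimeResolved } |
        Group-Object { if ($_.CustomField1 -eq 'Automated') { 'Automated' } else { 'Manual' } } |
        ForEach-Object {
            $mttr = ($_.Group |
                ForEach-Object { ($_.TimeResolved - $_.TimeRaised).TotalMinutes } |
                Measure-Object -Average).Average
            '{0}: {1:N0} alerts, MTTR {2:N1} minutes' -f $_.Name, $_.Count, $mttr
        }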

Approved’s operational analysis tool “IT Service Analytics” (in Swedish), for example, can use data from Microsoft System Center Operations Manager to show the results after introducing automation.

Summary

Automation of IT operations is a topic that cannot be ignored if you don’t want to risk falling behind. The initial challenge is deciding how and where to start; breaking the work down and analyzing where the effort pays off best is a common tactic. With automation in place, you essentially gain actions that run 24/7 across all your deliveries, reducing the need for on-call readiness.

We hope this has given you a good introduction to why you should look at automation and machine learning in your organization.

For more information, have a look at Approved’s concept of Digital Operations or email us at info@opslogix.com.

20% Discount On Our Capacity Reports Management Pack

IT’S REPORTING SEASON!

Since it’s reporting season again, we’re offering a 20% discount on our Capacity Reports Management Pack, valid until March 15, 2018!

Our Capacity Reports Management Pack accesses the OpsMgr data warehouse and forecasts capacity for a set of selected objects based on their usage.

All OpsLogix products are native to Operations Manager 2012 & 2016 and fully integrate into the System Center IT infrastructure.

BUY IT TODAY

You can’t access the UNIX/Linux computers view in the Administration pane in Microsoft System Center 2012 R2 Operations Manager?

If you can’t access the UNIX/Linux computers view in the Administration pane in Microsoft System Center 2012 R2 Operations Manager, then you probably receive the following error message:

Date: 12/30/2017 7:48:49 PM
Application: Operations Manager
Application Version: 7.1.10226.1360
Severity: Error
Message: System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.SystemCenter.CrossPlatform.UI.OM.Integration.UnixComputerOperatingSystemHelper.JoinCollections(IEnumerable`1 managementServers, IEnumerable`1 resourcePools, IEnumerable`1 unixcomputers, IEnumerable`1 operatingSystems)
   at Microsoft.SystemCenter.CrossPlatform.UI.OM.Integration.UnixComputerOperatingSystemHelper.GetUnixComputerOperatingSystemInstances(String criteria)
   at Microsoft.SystemCenter.CrossPlatform.UI.OM.Integration.Administration.UnixAgentQuery.DoQuery(String criteria)
   at Microsoft.EnterpriseManagement.Mom.Internal.UI.Cache.Query`1.DoQuery(String criteria, Nullable`1 lastModified)
   at Microsoft.EnterpriseManagement.Mom.Internal.UI.Cache.Query`1.FullUpdateQuery(CacheSession session, IndexTable& indexTable, Boolean forceUpdate, DateTime queryTime)
   at Microsoft.EnterpriseManagement.Mom.Internal.UI.Cache.Query`1.InternalSyncQuery(CacheSession session, IndexTable indexTable, UpdateReason reason, UpdateType updateType)
   at Microsoft.EnterpriseManagement.Mom.Internal.UI.Cache.Query`1.InternalQuery(CacheSession session, UpdateReason reason)
   at Microsoft.EnterpriseManagement.Mom.Internal.UI.Cache.Query`1.TryDoQuery(UpdateReason reason, CacheSession session)
   at Microsoft.EnterpriseManagement.Mom.Internal.UI.Console.ConsoleJobExceptionHandler.ExecuteJob(IComponent component, EventHandler`1 job, Object sender, ConsoleJobEventArgs args)

Cause

The issue occurs if the UNIX/Linux monitoring resource pool has been deleted.

How to solve it!

To resolve the issue, follow these steps:

  1. Create a resource pool for UNIX/Linux monitoring. Give the new pool a different name than the name of the deleted resource pool.
  2. Add the management servers that perform UNIX/Linux monitoring to the new resource pool.
  3. Configure the UNIX/Linux Run As accounts to be distributed by the new resource pool (a PowerShell alternative is sketched after these steps). To do this, follow these steps:
    • In the Operations console, go to Administration > Run As Configuration > UNIX/Linux Accounts.
    • For each account, follow these steps:
      – Right-click the account, and then select Properties.
      – On the Distribution Security page of the UNIX/Linux Run As Accounts Wizard, select More Secure.
      – In Selected computers and resource pools, select Add.
      – Select Search by resource pool name, and then select Search.
      – Select the new resource pool created in step 1, select Add, and then select OK.
  4. Run the following PowerShell cmdlet to retrieve the managed UNIX and Linux computers:
    Get-SCXAgent
  5. Verify that the agents that are associated with the deleted resource pool still exist and that the relationship remains.
  6. Run the following command to change the managing resource pool to the one that is created in step 1:

    # Retrieve the new resource pool created in step 1.
    $SCXPool = Get-SCOMResourcePool -DisplayName "<New Resource Pool Name>"
    # Re-home all managed UNIX/Linux agents to the new pool.
    Get-SCXAgent | Set-SCXResourcePool -ResourcePool $SCXPool
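
As an alternative to the console steps in step 3, the Run As distribution can also be set from PowerShell. A minimal sketch, assuming the OperationsManager module; the '*UNIX*' name filter is only an example and should be adjusted to match your accounts:

    # Sketch: set More Secure distribution to the new pool for the
    # UNIX/Linux Run As accounts (step 3), from PowerShell.
    # The '*UNIX*' name filter is only an example.
    $pool = Get-SCOMResourcePool -DisplayName "<New Resource Pool Name>"
    Get-SCOMRunAsAccount | Where-Object { $_.Name -like '*UNIX*' } |
        Set-SCOMRunAsDistribution -Security MoreSecure -SecureDistribution $pool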

Original article.
Education In SCOM Management Pack Development

In May, our own Vincent de Vries from OpsLogix will visit Sweden to host a two-day training in Management Pack development together with Jonas Lenntun from Approved. As in previous years, the event will be held at the Radisson Blu hotel at Lindholmen in Gothenburg. Vincent has over 10 years of experience in developing Management Packs for SCOM and is one of the most prominent developers in the System Center community.

Vincent de Vries, SCOM DAY 2017

The participants will get tips and tricks from real-world scenarios and learn how SCOM works behind the scenes. With a deeper understanding of how Management Packs work, you’ll be better equipped to tune them in the best possible way.

Would you like for us to organize a similar event in Amsterdam?

Fill in the Quick Yes/No Poll below!

In this course, you will learn the most important basic skills to get started with Management Pack development. The training is aimed at those who already work as SCOM administrators, or at IT developers who want to start building their own monitoring tools for their organization. The course will be held in English.

I'd like to join the Amsterdam Event. Please keep me informed.


VMware Monitoring For Service Providers: Local Customer Setup For SCOM 2012 & 2016

In a series of VMware “How To” videos, released weekly, we’ll be showing you how to set up your VMware Monitoring for Service Providers.

In this video, we show you how to do the Local Customer Setup.

Want to try our Management Pack? Go to the VMware Management Pack page and fill in the contact form, or drop us an email at sales@opslogix.com.

Using The OpsLogix Oracle Two-State Monitor Template

Although the OpsLogix Oracle Management Pack covers the most common availability, health and performance metrics that are important to an Oracle environment, occasionally you might need other metrics to suit your particular monitoring needs. With this in mind, the OpsLogix Oracle Management Pack contains templates that allow you to add custom monitors and rules in order to monitor your Oracle environment.

OpsLogix Oracle Two-State Monitor Template

One of those templates is the OpsLogix Oracle Two-State Monitor Template, which allows you to create a rule that checks values from your Oracle environment and generates an alert when a value is detected or missing, depending on the configuration you’ve specified.

Step 1

In the Operations console, go to the Authoring pane and start the Add Monitoring Wizard.

Step 2

Select the Oracle Two-State Monitor Rule from the Select the monitoring type list.

Step 3

Set a name and a description for your Two-State Monitor.

Step 4

Write the query to be executed and pick the column name of the value you would like to monitor. Then pick the target of your monitor and the monitoring frequency. An example query is sketched below.
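
For example, a hypothetical query that watches for blocked sessions could return a single numeric column to monitor (BLOCKED in this sketch):

    -- Hypothetical example: count blocked sessions; monitor the BLOCKED column.
    SELECT COUNT(*) AS BLOCKED
    FROM   v$session
    WHERE  blocking_session IS NOT NULL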

Step 5

Select whether you want the monitor to raise a Warning or Critical alert when it discovers a value Greater than, Greater than or Equal to, Less than or Equal to, or Less than a specified value.

Step 6

Finally, configure the alert details: choose an alert name, a description, and the priority and severity of the alert. To create the Two-State Monitor, press Create.

Step 7

After you have clicked Create, you can then check your target’s Health Explorer on your monitor, shown under Performance.

For any other questions or inquiries, please contact sales@opslogix.com