We’re Going To Vegas! Come Meet Us At Microsoft Inspire 2018

We’re happy to announce that OpsLogix will be exhibiting at Microsoft Inspire 2018 in Las Vegas! The OpsLogix team will be in Las Vegas from June 13th to June 20th, so there may be some delay in our responses to your queries.

Booth 1838

Our booth number is 1838. Come by! Say hi!

Team OpsLogix
Our Ping Management Pack Just Got An Update – V 3.0.14.0

We’re happy to announce a new update release of our Ping Management Pack V 3.0.14.0 for SCOM 2012/2016. We’ve added three more Performance Monitors and reworked all the previous monitors in the Management Pack.

The full list of the changes we’ve made to the Ping Management Pack V 3.0.14.0:

  • Replaced the WMI module with a managed module to handle larger loads
  • Added a performance monitor for average jitter
  • Added a performance monitor for average latency
  • Added a performance monitor for average packet loss
  • Reworked all monitors so the TTL, payload, and number of averaging points can be configured
  • Improved the configuration UI

Update Instructions

  1. Go to the Ping Management Pack product page and download the Management Pack.
  2. Import the updated management packs.
  3. Once imported, wait about 10 minutes for the updated MPs to be distributed across your SCOM environment.
  4. Restart the SCOM agent (see the sketch below).
  5. If you still don’t see the dashboard, please follow the instructions here.
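
If you prefer to script step 4, here is a minimal sketch; it assumes the agent’s Windows service name, HealthService, and an elevated PowerShell session on the agent machine.

## Restart the SCOM agent service on the local machine
Restart-Service -Name HealthService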

Team OpsLogix

The OpsLogix Oracle Managed Application Mentioned During Microsoft Build 2018

During their Microsoft Build 2018 session, “Making production deployments safe and repeatable using declarative infrastructure and Azure Resource Manager”, Vlad Joanovic and Brendan Burns share best practices for using Azure Resource Manager (ARM) to optimize application deployment agility and ensure compliance across infrastructures. ARM enables you to repeatedly deploy your app with confidence that your resources are deployed in a consistent state.

You define the infrastructure and dependencies for your app in a single declarative template that is flexible enough to use for all of your environments, such as test, staging, or production. The session also looks at ways to create, share, and publish both ARM templates and Azure Managed Applications.
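
As a rough illustration of the repeatable-deployment idea, the sketch below deploys a declarative template with the AzureRM PowerShell module that was current at the time; the resource group and file names are placeholders, not material from the session.

## Deploy (or redeploy) the same declarative template to a resource group
Login-AzureRmAccount
New-AzureRmResourceGroupDeployment -ResourceGroupName "demo-rg" -TemplateFile ".\azuredeploy.json" -TemplateParameterFile ".\azuredeploy.parameters.json"

Because the template is declarative, running the same deployment again converges the resource group to the same state instead of duplicating resources.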

OpsLogix Oracle Managed Application

Azure Managed Applications build upon ARM templates to enable Managed Service Providers (MSPs), Independent Software Vendors (ISVs), and corporate central IT teams to deliver turnkey solutions through the Azure Marketplace or Service Catalog. These are self-contained and sealed to the consumer, allowing the provider to deliver a higher quality of service.

At the end of 2017, OpsLogix was one of Microsoft’s key launch partners for Managed Applications, alongside Cisco and Xcalar. At around 1:13:00 into the session, Vlad Joanovic mentions the OpsLogix Oracle Managed Application as part of his demo. We at OpsLogix highly appreciate the mention and would like to thank Vlad and Kristian Nese.

New Update: OpsLogix VMware Management Pack V1.3.8.46

We’re happy to announce a new update release of our VMware Management Pack for SCOM 2012/2016. We’ve added a lot of great new reporting and datastore capacity functionality to the Management Pack, improving investigative analysis and giving more insight into the monitoring of your VMware environment.

This latest release is upgradable starting from V1.3.0.0 or later.

The update is downloadable from the customer download area.

VMware Management Pack V1.3.8.46

Datastore Capacity

The new VMware Management Pack has the following additions regarding Datastore Capacity:

  • Datastore Capacity dashboard
  • Datastore Capacity performance collection
  • Host to datastore performance collection for Highest latency
  • Host to datastore performance collection for Storage I/O Control aggregated
  • Host to datastore performance collection for Storage I/O Control active time
  • Host to datastore performance collection for Storage I/O Control normalized latency
  • Host to datastore performance collection for Storage I/O Control datastore maximum queue depth

Reporting Galore!

So what are the new reporting additions of our OpsLogix VMware Management Pack?

  • New generic Matrix and TopN reports, usable with any performance counter in SCOM
  • Linked report: VMware Datacenter Availability
  • Linked report: VMware Datastore Availability
  • Linked report: VMware Datastore Usage Matrix
  • Linked report: VMware Host CPU and Memory Usage Matrix
  • Linked report: VMware Host CPU and Memory Usage TopN
  • Linked report: VMware VM CPU and Memory Usage Matrix
  • Linked report: VMware VM CPU and Memory Usage TopN

For more additions, changes, and fixes, please refer to the release notes.

Update Instructions

  1. Import the updated management packs.
  2. Once imported, wait about 10 minutes for the updated MPs to be distributed across your SCOM environment.
  3. In the SCOM console, go to the “Administration” folder -> “Resource Pools” and select the resource pool(s) responsible for the VMware monitoring.
  4. Select “View Resource Pool Members…”. For every member, access the server and follow the step below.
  5. Restart the SCOM agent (a scripted sketch follows this list).
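
If you’d rather script steps 4 and 5, a minimal sketch follows; the pool display name is a placeholder, the DisplayName property on pool members is an assumption, and PowerShell remoting must be enabled on the members.

Import-Module OperationsManager

## Placeholder pool name - use the pool responsible for your VMware monitoring
$pool = Get-SCOMResourcePool -DisplayName "OpsLogix VMware Pool"

## Restart the SCOM agent service (HealthService) on every pool member
foreach ($member in $pool.Members) {
    Invoke-Command -ComputerName $member.DisplayName -ScriptBlock { Restart-Service -Name HealthService }
}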

Team OpsLogix

VIDEO: AI For IT-Operations: How To Classify, Train & Escalate Alerts From SCOM

WHAT’S IT ALL ABOUT?

The ever-growing number of devices to be monitored, combined with high availability requirements, makes it increasingly urgent to review internal processes.

Introducing machine-learned automation means, simply put, removing manual processes that a machine can perform according to predetermined, consistent routines.

In this webinar you will get an introduction and a real-world scenario showing how to:

  • Use pre-actions to classify and enrich your alert data (a small sketch follows this list)
  • Train a machine learning model
  • Escalate to different channels depending on the predicted destination
  • Integrate with ServiceNow through a bi-directional connector
  • Tag and analyze your escalated alerts
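
As a minimal sketch of the first bullet, the snippet below stamps new SCOM alerts with a classification label via the OperationsManager cmdlets; the label value and the use of CustomField1 are assumptions for illustration.

Import-Module OperationsManager

## Enrich every new (resolution state 0) alert with a placeholder classification label
Get-SCOMAlert -ResolutionState 0 | Set-SCOMAlert -CustomField1 "Unclassified" -Comment "Enriched by pre-action"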

Why Are Less Than 1% Of Critical Alerts Investigated?

Many organizations seem to be suffering from alert fatigue. According to Infosecurity, a recent EMA report found that 80% of organizations that receive 500 or more severe/critical alerts per day investigate less than 1% of them. A shocking number, to say the least! But what are the obstacles organizations face that allow such neglect?

From the EMA report, we can conclude that organizations face four major issues when it comes to their ability to tackle these severe/critical alerts.

Issues Organizations Face

Alert Volume

Recent surveys from the EMA report indicate that 92% of organizations receive up to 500 alerts a day. Of the organizations that took part in the survey, 88% said they receive up to 500 “critical” or “severe” alerts per day. Yet 93% of those respondents would rate their endpoint prevention program as “competent”, “strong”, or even “very strong”. So either there is a big gap between perception and reality, or alerts that are considered “severe” or “critical” should not be categorized as such. Either way, alert management does not seem to be representative of reality.

Capacity

Even when organizations have detection systems in place that create massive alert volumes, what they often lack is the human resources to manage the alerts. Organizations are clearly dealing with a large capacity gap. Of the surveyed organizations that receive 500 to 900 severe/critical alerts per day, 60% have only 3-5 FTEs working on the alerts.

On top of that, 67% of those surveyed indicate that only 10 or fewer severe/critical alerts are investigated per day, and 87% of the participants said that their teams have the capacity to investigate only 25 or fewer severe/critical events per day. For most of the participants the alert volumes are high, yet the resources at their disposal are critically low. As a result, less than 1% of the incidents end up being investigated.

Priority

The research assumes a need for prioritization and classification into severe/critical buckets, which is understandable given the traditional, manual approach to Incident Response.

“In truth, any prioritization is a compromise, and the act of classifying by priority is merely a justification to ignore alerts.”

However, in doing so, the numbers get even worse and new questions arise. If less than 1% of severe/critical alerts are ever investigated, what percentage of all alerts is investigated? What percentage of alerts is incorrectly categorized, and how many alerts are classified as benign and ignored completely, yet warrant follow-up?

Incident Response

The three prior problems point to a substandard, broken incident response process: there are too many alerts to investigate, not nearly enough people to follow up, and the need to classify all alerts is maintained, all just to be able to act on less than 1% of the total number of alerts. Yet 92% of respondents indicated that their Incident Response programs for endpoint incidents were “competent” or better.

The only way this makes sense is if respondents felt that, when their Incident Response teams were finally able to take action on the small percentage of alerts that got that far, they were successful in addressing the issue.

Conclusions

  • Detailed analysis showed that, in aggregate, 80% of the organizations were only able to investigate 11 to 25 events per day, leaving them a huge, and frankly insurmountable, daily gap.
  • The issue is created by a lack of high-fidelity security information, due either to a lack of tools to collect data or a lack of tools able to analyze it.
  • Information isn’t the problem. This and similar surveys show the depth and breadth of the problem facing cybersecurity teams today; simply gathering more information to hand off to analysts isn’t the answer.

The Solution

Automation is a key aspect of creating an effective and mature security program. It improves productivity and, given the lack of staff and the abundance of incidents in most organizations, automation should be a priority in the evolution of prevention and detection.

“Automation is the answer!”

When asked about automation of tasks such as data capture and/or analysis as they related to prevention, detection, and response for both network and endpoint security programs, 85% of the respondents said it was either important or very important.

Thus the only viable approach to the increase in alerts and scarcity of capacity is to use security orchestration and automation tools to:

  • Automatically investigate every alert: instead of prioritizing alerts to match capacity, use a solution that investigates every alert.
  • Gather additional context from other systems: automate the collection of contextual information from other network detection systems, logs, etc.
  • Exonerate or incriminate threats: using both known threat information and inspection, decide whether what was detected is benign or malicious.
  • Automate the remediation process: once a verdict has been reached, automatically remediate (quarantine a file, kill a process, shut down a CNC connection, etc.). A toy sketch of this step follows the list.
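
As a toy illustration of the last bullet, and not any particular vendor’s implementation, the sketch below quarantines a file and kills its process once an upstream analysis returns a verdict; every name in it is a placeholder.

## Hypothetical remediation step: act on a verdict produced by an upstream analysis
$verdict = "malicious"            ## assumed output of the analysis step
if ($verdict -eq "malicious") {
    ## Kill the offending process if it is still running (placeholder name)
    Stop-Process -Name "suspicious_app" -Force -ErrorAction SilentlyContinue
    ## Move the file into a quarantine folder (placeholder paths)
    Move-Item -Path "C:\Temp\suspicious_app.exe" -Destination "C:\Quarantine\" -Force
}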

While we may be biased, we believe this approach is the only way forward.

Hexadite, the only agentless intelligent security orchestration and automation platform for Global 2000 companies, also states that automation is the only real answer: “it is impossible for organizations to hire enough people to create an adequate context for the data – and thus provide high fidelity security information.”

References

  • “Less Than 1% of Severe/Critical Security Alerts Are Ever Investigated,” by Tara Seals for InfoSecurityMagazine.com. Retrieved April 8, 2018.
  • “White Paper: EMA Report Summary: Achieving High-Fidelity Security,” EMA Research. Retrieved April 8, 2018.

New Update: OpsLogix VMware Management Pack V1.3.8.0

We’re happy to announce a new update release of our VMware Management Pack for SCOM 2012/2016. This latest release is upgradable starting from V1.3.0.0 or later.

The update is downloadable from the customer download area.

VMware Management Pack V1.3.8.0

The new VMware Management Pack has been optimized on multiple fronts, such as:

  • The Snapshot Age monitor no longer fails due to regional settings other than US.
  • The License monitor no longer shows a critical unhealthy state when 100% of the licenses are used.
  • After running a connection test in the UI, the first healthy server pool is selected instead of just the first server pool.
  • The datastore monitor workflows no longer fail when a datastore is set inactive in vCenter.

For more additions, changes, and fixes, please refer to the release notes.

Update Instructions

  1. Import the updated management packs.
  2. Once imported, wait about 10 minutes for the updated MPs to be distributed across your SCOM environment (a quick post-import check is sketched after this list).
  3. In the SCOM console, go to the “Administration” folder -> “Resource Pools” and select the resource pool(s) responsible for the VMware monitoring.
  4. Select “View Resource Pool Members…”. For every member, access the server and follow the step below.
  5. Restart the SCOM agent.
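
To confirm the updated pack landed after step 2, a quick check is sketched below; the "*VMware*" display-name filter is an assumption, so adjust it to the pack names shown in your console.

Import-Module OperationsManager

## List imported management packs whose display name mentions VMware, with their versions
Get-SCOMManagementPack | Where-Object { $_.DisplayName -like "*VMware*" } | Select-Object DisplayName, Version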

Team OpsLogix

Join Our Webcast With Approved: AI For IT-Operations: How To Classify, Train & Escalate Alerts From SCOM

WHAT’S IT ALL ABOUT?

The ever-growing number of devices to be monitored, combined with high availability requirements, makes it increasingly urgent to review internal processes.

Introducing machine-learned automation means, simply put, removing manual processes that a machine can perform according to predetermined, consistent routines.

In this webinar you will get an introduction and a real-world scenario showing how to:

  • Use pre-actions to classify and enrich your alert data
  • Train a machine learning model
  • Escalate to different channels depending on the predicted destination
  • Integrate with ServiceNow through a bi-directional connector
  • Tag and analyze your escalated alerts

WHEN?

WEDNESDAY 4TH OF APRIL 2018

1st session

  • Amsterdam (Netherlands) 10:00 CEST
  • New York (USA – New York) 04:00 EDT
  • London (United Kingdom – England) 09:00 BST
  • Melbourne (Australia – Victoria) 18:00 AEST

2nd session

  • Amsterdam (Netherlands) 19:00 CEST
  • New York (USA – New York) 13:00 EDT
  • London (United Kingdom – England) 18:00 BST
  • Melbourne (Australia – Victoria) 03:00 AEST

PLEASE NOTE: ONLY 25 SPOTS PER SESSION. FIRST COME, FIRST SERVED!

CLAIM YOUR SPOT NOW! CLICK HERE

3 Reasons To Implement Automation & Machine Learning For IT-Operations

A guest blog by Jonas Lenntun from Approved Sweden.

Clearly, we’ll automate!

Automation and efficiency go hand in hand, and automation has been a topic in IT since the ’70s. Nevertheless, 40 years on, the majority of companies have yet to internalize and embrace automated processes.

The growing number of devices to be monitored, combined with higher availability requirements, makes it more urgent for companies to review their internal processes. This is especially true as digitization introduces more and more critical e-services that are expected to be available 24 hours a day.

Introducing automation means, simply put, removing manual processes that can easily be performed by a machine according to predetermined routines, faster and in the same way every time.

Some processes have already come a long way here, among them equipment ordering, user setup, and server updates, along with a lot of administrative work.

Within the IT department, there are three interesting areas with high potential to automate manual processes in order to become more efficient, shorten lead times, and reduce repetitive work.

What can machine learning add?

Machine learning has previously been perceived as not directly relevant to traditional monitoring and incident management. But more and more people are realizing that it is highly relevant for simplifying everyday operations, in every aspect.

Instead of manually escalating incidents or sending out notifications to on-call staff through complex and blunt rule sets, machine learning can be applied.

We can relatively easily train a machine to automatically identify patterns and then perform the actions we want in a very short time.

We have already begun with automation.

Most likely, you have already begun implementing automation in several areas. Since automation is such a wide-ranging area, this article focuses on activities that increase the value of what your monitoring delivers and are most relevant to you in IT operations.

Three important automation areas

Escalation

At first sight, escalation seems a rather simple process to automate. However, the more complex the rules for distributing different types of alarms to different groups based on certain criteria become, the more difficult it is to manage that routing through a static rule set.

Instead of building complex scripts or programs, you can look at an alarm and train a model on where to send it. How the model reaches its conclusion is where machine learning comes into its own: it finds patterns we did not know existed.

Large time savings can be made by shortening the processing time, because cases are sent to the correct group without having to wait for a manual decision.
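
To make this concrete, here is a minimal sketch of the escalation step using SCOM’s PowerShell module; the predicted destination and the choice of CustomField2 are placeholders, and the prediction itself is assumed to come from an externally trained model.

Import-Module OperationsManager

## Assumed output of a separately trained classification model
$predictedQueue = "Network Team"

## Stamp new critical alerts with the predicted destination for a connector to pick up
Get-SCOMAlert -ResolutionState 0 -Severity 2 | Set-SCOMAlert -CustomField2 $predictedQueue -Comment "Escalated automatically"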

Recovery

Many errors that occur at the operating system level, or that involve inadvertently stopped services, can easily be remedied.

Even though a Windows service can be configured to restart automatically if it stops, it is better to let a monitoring system capture the error. Since a monitoring system can both restore the service and keep statistics, it becomes easier to spot any recurring disturbances. These statistics also provide a good basis for the problem process with the supplier: the dialogue is then based on data instead of rumors and gut feeling.

Many recovery actions need to be clearly defined, but machine learning also makes it possible to train a model that learns which recovery actions to run, minimizing complexity. A small sketch of such a recovery action follows.
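
A minimal recovery-task sketch, assuming the service name is passed in as a parameter, the script runs under an account allowed to control services, and the event source and ID shown are placeholders on your agents:

## Restart a stopped service and leave a trace in the event log
param([string]$ServiceName = "Spooler")   ## placeholder default

$svc = Get-Service -Name $ServiceName
if ($svc.Status -eq "Stopped") {
    Start-Service -Name $ServiceName
    Write-EventLog -LogName "Operations Manager" -Source "Health Service Script" -EventId 9000 -EntryType Information -Message "Automated recovery restarted service '$ServiceName'."
}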

Diagnostics

Many errors that occur may be difficult to remediate automatically, but that does not mean we should rule out automation.

If a disk indicates that it is running out of space, a human may be needed to determine what can be cleaned up. But that does not prevent us from collecting diagnostic information for the person who will be performing the task.

Automated diagnostics could, for example, report which directories contain the largest files, or insert a graph of disk usage into the analysis process, as sketched below.

Here too we can use machine learning to determine what to run or not.
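
A minimal diagnostics sketch; the drive path and the number of files listed are placeholders:

## List the ten largest files on a nearly full drive, sizes in megabytes
Get-ChildItem -Path "C:\" -Recurse -File -ErrorAction SilentlyContinue |
    Sort-Object Length -Descending |
    Select-Object -First 10 FullName, @{ Name = "SizeMB"; Expression = { [math]::Round($_.Length / 1MB, 1) } }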

How do we show results?

Introducing automation and machine learning in IT operations has many advantages. Since many things happen without anyone even noticing, follow-up is one of the most important parts of demonstrating results after the introduction.

There are many important key figures to look at before and after the introduction, but the most important is of course Mean Time To Repair (MTTR): in short, the time it takes for an alarm to be resolved and closed.

Because we can divide automation into three different categories, we can measure:

  • Overall recovery time for alarms that are automated compared to those that are not
  • Overall automation degree – the percentage of alarms automated
  • Automation rate per queue – the percentage of alarms automated per destination
  • Recovery time of automatically escalated alarms compared to those escalated manually
  • Recovery time per escalation destination
  • Recovery time of automatic recovery compared to manual handling
  • Recovery time of automated diagnostics compared to manual handling

These are just a few key figures that go a long way in demonstrating the results of automation and machine learning.
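
As a rough illustration of how MTTR could be pulled straight from SCOM alert data, the sketch below averages raise-to-resolve times for closed alerts (resolution state 255); treating that average as MTTR is a simplification.

Import-Module OperationsManager

## Average minutes from raise to resolution across closed alerts
$closed = Get-SCOMAlert -ResolutionState 255
($closed | ForEach-Object { ($_.TimeResolved - $_.TimeRaised).TotalMinutes } | Measure-Object -Average).Average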

Below is an example from Approved’s operational analysis tool, IT Service Analytics (in Swedish), which uses data from Microsoft System Center Operations Manager to show results after the introduction of automation.

Summary

Automation of IT operations is a topic that cannot be ignored if you don’t want to risk falling behind. The challenge at first is deciding how and where to start; breaking the work down and analyzing where to put the effort is a common tactic. With automation, you suddenly have actions running 24/7 across all your deliveries, reducing the need for on-call readiness.

We hope this has been a good introduction to why you should look at automation and machine learning in your organization.

For more information, have a look at Approved’s concept of Digital Operations or email us at info@opslogix.com.

VMware Monitoring For Service Providers: Setting Up A Normal Gateway

In this OpsLogix “How To” video we’ll be showing you how to set up your VMware monitoring as a Service Provider using a normal Gateway.

ALL SCRIPTS & COMMANDS CAN BE FOUND BELOW!

If you’d like to know more, request a quote or get a free evaluation of our VMware Management Pack, visit: opslogix.com/vmware-management-pack

GENERATE CERTIFICATES 

To generate Root, Management Server and GateWay certificates go to https://technet.microsoft.com/en-us/library/hh212810(v=sc.12).aspx

TO ADD A GATEWAY, RUN TWO COMMANDS

COMMAND 1

cd "C:\Program Files\System Center 2012\Operations Manager\Server"

COMMAND 2

Microsoft.EnterpriseManagement.GatewayApprovalTool.exe /ManagementServerName=accom2012sp1ms1.contoso.com /GatewayName=accom2012sp1gw1.contoso.com /SiteName=Customer01 /Action=Create

POWERSHELL SCRIPT TO CREATE A NEW CUSTOMER SCOM RESOURCE POOL

Import-Module OperationsManager

## Get the customer gateway server(s)
$GatewayServer01 = Get-SCOMManagementServer -Name "accom2012sp1gw1.contoso.com"

## Create a new OpsLogix VMware resource pool
New-SCOMResourcePool -DisplayName "Remote Customer Site A – VMWare Pool" -Member $GatewayServer01
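
As an optional sanity check, a minimal sketch (assuming it runs on a management server where the OperationsManager module is available):

## Confirm the pool exists and that the gateway is a member
Get-SCOMResourcePool -DisplayName "Remote Customer Site A – VMWare Pool" | Select-Object DisplayName, Members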

Try it for free

Want to try our Management Pack? Go to the VMware Management Pack page and fill in the contact form, or drop us an email at sales@opslogix.com.

Also, read: 

VMware Monitoring For Service Providers: Local Customer Setup