Alert Escalation Scripts

Define and customize how NetCrunch generates events, triggers alerts, and executes actions based on monitoring detections, metrics, and device status.

Given the size, complexity, and sprawl of modern IT infrastructures, your monitoring system should bring automation and consistency to device discovery, monitoring strategy, and event/alert generation. In NetCrunch, Alert Escalation Scripts describe how 'scripted' actions are taken in response to an undesirable detection or a change in device status. Automation of monitoring system responses is the final and most critical component of a well-executed Business Continuity Strategy. This is where IT administrators across all disciplines benefit from NetCrunch's comprehensive abilities to respond and take automated actions.

A device, or node, exists in NetCrunch for several reasons:

Routinely monitor and collect service responses, performance counters, role-based metrics or any other device attribute that may reflect a condition
Synthesize a measure of a device's overall status, availability, and performance
Automatically generate events, alerts, and actions based on a detection or change in device status

NetCrunch uses two core mechanisms to monitor your devices: Alerts and Data Collections. This article focuses on how NetCrunch uses Alert conditions and their associated condition severity to implement notifications and actions.

[Figure: NetCrunch node Alert Summary]

Monitoring Packs, Alert & Data Collection basics

Nodes (devices) discovered and monitored in NetCrunch are assigned Monitoring Packs automatically in the following categories, available in the device Node Settings:

Node: Device status, Service status, layer-2 location (if using Physical Segments)
Detected Operating System: Windows, VMware, Linux, etc.
SNMP (if detected)

A Monitoring Pack describes a collection of monitoring routines, comprising alerts/escalations and data collections, that can be applied either manually or automatically based on device metadata definitions. By designating a monitored metric as an Alert, NetCrunch associates a granular metric with the logic that describes a condition and assigns a condition severity. The device inherits and reflects this condition severity when the Alert logic evaluates as true. This creates a logged detection and invokes the Alert's associated Escalation Script. The purpose of an Alert is to provide the following key elements:

Definition of Alert condition: metric and associated condition logic to set the alert
Classification of a condition severity: provides action granularity based on the condition
Designation of an Alert Escalation Script: a structured routine of actions supporting severity filtering
Alert condition persistence: maintain the alert condition to support escalations until reset criteria are met
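The four elements above can be pictured with a small conceptual model. This is only a sketch in Python, not NetCrunch's API or internal representation: the `Alert` class, its field names, and the CPU thresholds are all hypothetical and exist purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Alert:
    """Conceptual model of an alert (hypothetical, not a NetCrunch type)."""
    name: str
    condition: Callable[[float], bool]  # logic that sets the alert
    reset: Callable[[float], bool]      # criteria that clear the alert
    severity: str                       # condition severity, e.g. "Warning"
    script: str                         # name of the assigned escalation script
    active: bool = False                # persisted between samples

    def evaluate(self, value: float) -> str:
        """Return 'opened', 'closed', 'active', or 'idle' for one sample."""
        if not self.active and self.condition(value):
            self.active = True
            return "opened"
        if self.active and self.reset(value):
            self.active = False
            return "closed"
        return "active" if self.active else "idle"


# Hypothetical thresholds: open above 90 % CPU, clear below 75 %.
cpu_alert = Alert(
    name="High CPU",
    condition=lambda v: v > 90,
    reset=lambda v: v < 75,
    severity="Warning",
    script="Default Alerting Script",
)

states = [cpu_alert.evaluate(v) for v in (50, 95, 80, 70, 60)]
print(states)  # ['idle', 'opened', 'active', 'closed', 'idle']
```

Note that the sample of 80 keeps the alert active even though it no longer exceeds the opening threshold: the condition persists until the reset criteria are met, which is what allows an escalation to continue.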

Data Collections are a simple mechanism for harvesting metrics and counters to be included in NetCrunch's trend records or reports and are exempt from any condition logic.

Complete Action and Escalation Management

The Alert Escalation Script is a routine that describes a series of sequential actions to be taken once an alert condition is met, as well as subsequent actions to be taken while the condition persists. It is how NetCrunch connects detections to actions, and it provides a comprehensive toolset to coordinate and generate events or actions on behalf of, or on, a targeted device. These actions include:

Simple event logging
Detection-based alerting and notifications
Persistent event generation
Structured escalation of events, actions or notifications
Upstream integrations
- Notification services
- Service Desk software
Self-healing via automated actions on target
- Device reboots
- Service start/stop/restarts
- Custom script execution
Webhooks and API calls

Every individual alert description in NetCrunch is assigned an Alert Escalation Script. This can provide one-to-one granularity per detection and allows NetCrunch to extend beyond simple notifications into complex remediation and self-healing. Alert Escalation Scripts are the primary integration point for NetCrunch's participation in a portfolio approach to managing your IT infrastructure. Whether you use NetCrunch as a stand-alone system or pair its capabilities with legacy systems or existing Business Continuity strategies, you will find a capable system that can work on its own or integrate into your existing software portfolio.

The Default Action Script

All alerts in NetCrunch are associated with the Default Alerting Script unless you assign a different one. For most small to mid-sized IT teams, a single, standardized escalation is more than enough. The Default escalation script provides a 3-step escalation for Critical severity events only. It is important to note that Alert Escalation Actions are filtered according to severity. It is possible to make an action occur regardless of severity, but be warned: NetCrunch will honor this configuration and alert you accordingly. It is considered a best practice to postpone connecting NetCrunch to your email system until:

The NetCrunch Atlas is fully populated with all the devices in your monitoring scope
You are comfortable with the level of alerting, by severity, presented in the Network Atlas Event Log

For a quick list of preconfigured alerts by severity, navigate to Atlas > Run Configuration Manager > Automatic Monitoring Packs. This wizard is the best place to tune NetCrunch's alerting to match your requirements, because it provides an on/off switch for each individual alert across all automatic monitoring packs. Turning an alert off also stops polling of the associated metric, making it unavailable in the trend record. Until you feel comfortable, it is best to reserve email notifications for Critical severity detections only.

Alert Severity: Tuning escalations

NetCrunch provides four alert severity classifications, which give escalation scripts the granularity needed to tune how response actions are taken. Severity is used as a filtering criterion that matches a scripted action to the alert. If you do not designate a severity for an action in an escalation script (the 'Blank' setting below), be aware that the action will be taken regardless of alert severity.

'Blank' - perform on every alert
Critical - reserved for highest importance alerts (node, critical service down)
Warning - important and informative, but not critical (warnings are the bread crumbs in troubleshooting)
Informational - nice to know
Minor - lowest priority
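The filtering rule can be sketched as follows. This is an illustrative Python model, not NetCrunch code: the `Action` type, the `actions_for` helper, and the example actions are hypothetical, and the sketch assumes that a severity filter matches only alerts of exactly that severity, with `None` standing in for the 'Blank' setting.

```python
from typing import List, NamedTuple, Optional


class Action(NamedTuple):
    """One scripted action and its severity filter (hypothetical type)."""
    description: str
    severity: Optional[str]  # None models the 'Blank' setting


def actions_for(alert_severity: str, script: List[Action]) -> List[str]:
    """Return the actions that would run for an alert of the given severity."""
    return [
        action.description
        for action in script
        # A blank filter always matches; otherwise the severity must be equal
        # (exact matching is an assumption of this sketch).
        if action.severity is None or action.severity == alert_severity
    ]


script = [
    Action("Write to event log", None),
    Action("Email on-call admin", "Critical"),
    Action("Post to team chat", "Warning"),
]

print(actions_for("Critical", script))  # ['Write to event log', 'Email on-call admin']
print(actions_for("Minor", script))     # ['Write to event log']
```

The second call shows why an unfiltered action deserves care: it fires even for the lowest-priority alerts.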

Building an Alert Escalation Script

From the Alerting Scripts dialog, select 'Add Alerting Script' to initiate the Edit Alerting Script dialog.

Provide a unique name to the script to be built
Add a minimum of 2 script components
- Action to Run Immediately
- Action to Run on Alert Close
Assign an action or event to each of your script components
Add an Action to Run After...
Be sure to select the 'Until Alert cleared, repeat last every...' checkbox if you wish to receive persistent alerts until the condition is remedied
Click OK to save and make globally available

The last step is to assign the Alerting Script to an alert condition in a Monitoring Pack, a custom alert, or a sensor configured in a device's node settings.

[Figure: Building a custom script]
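To see how the script components combine over the life of an alert, the sketch below computes a timeline of actions. It is a conceptual Python model, not NetCrunch's scheduler: the `escalation_timeline` function, its parameters, and the "Email admin" action are hypothetical, and the sketch assumes that the repeat option re-runs only the final step of the script.

```python
from typing import List, Optional, Tuple

Step = Tuple[int, str]  # (minutes since the alert opened, action name)


def escalation_timeline(
    alert_minutes: int,
    run_after: List[Step],
    repeat_every: Optional[int] = None,
) -> List[Step]:
    """List the actions fired for an alert that stays open alert_minutes."""
    steps = [(0, "Action to Run Immediately")] + sorted(run_after)
    # The immediate action always runs; delayed steps run only if the
    # alert is still open when their delay elapses.
    events = [steps[0]] + [s for s in steps[1:] if s[0] < alert_minutes]
    last_delay, last_name = steps[-1]
    # 'Until Alert cleared, repeat last every...' (assumed semantics):
    # once the final step has run, repeat it until the alert closes.
    if repeat_every and last_delay < alert_minutes:
        t = last_delay + repeat_every
        while t < alert_minutes:
            events.append((t, last_name + " (repeat)"))
            t += repeat_every
    events.append((alert_minutes, "Action to Run on Alert Close"))
    return events


timeline = escalation_timeline(45, [(10, "Email admin")], repeat_every=15)
for minute, action in timeline:
    print(f"{minute:>3} min  {action}")
```

For a condition that clears after 45 minutes, this prints the immediate action at 0, the email at 10, repeats at 25 and 40, and the close action at 45. A condition that clears after 5 minutes would trigger only the immediate and close actions.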
