The SPS/Alert Manager

Automate the detection, classification and resolution of system and application errors

Questionnaire
Test Your Management Style (article)
User Testimonial (AOL)
User Testimonial (HDX)
SPS/Alert Manager Report

Operators of modern, sophisticated systems find it extremely difficult, if not impossible, to notice and resolve every system and application event as it occurs. Applications may produce any number of logs, errors and output files, and the operating system alone may generate thousands of messages every day. Many serious production problems can be prevented if alerts are noticed and acted on in time.

The SPS/Alert Manager is designed to solve this common problem. It monitors system elements, reads messages originating from various systems, modules and applications, and funnels the information to a single terminal. Message filters ensure that only important information is passed to the operator. Critical alerts cannot be ignored: they remain posted on the screen until the operator attends to the problem, acknowledges the alert and records an appropriate entry.
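The filtering and acknowledgment behavior described above can be sketched as follows. This is a minimal illustration, not the product's implementation: the severity scale, the regular-expression `FilterRule`, and the `Alert` and `AlertConsole` classes are all hypothetical, and for simplicity every posted alert is treated as requiring an operator entry before it leaves the screen.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical severity scale; the product's own levels are not specified here.
SEVERITY = {"info": 0, "warning": 1, "critical": 2}


@dataclass(eq=False)  # compare alerts by identity, not by field values
class Alert:
    source: str                      # originating system, module or application
    severity: str                    # "info", "warning" or "critical"
    text: str
    ack_entry: Optional[str] = None  # operator's entry, filled in on acknowledgment


@dataclass
class FilterRule:
    """Hypothetical rule: pass alerts matching `pattern` at or above `min_severity`."""
    pattern: str
    min_severity: str

    def matches(self, alert: Alert) -> bool:
        return (re.search(self.pattern, alert.text) is not None
                and SEVERITY[alert.severity] >= SEVERITY[self.min_severity])


class AlertConsole:
    """A single operator console fed by every message source."""

    def __init__(self, rules: List[FilterRule]) -> None:
        self.rules = rules
        self.posted: List[Alert] = []    # alerts currently on screen
        self.history: List[Alert] = []   # acknowledged alerts, with their entries

    def receive(self, alert: Alert) -> bool:
        # Only alerts that pass at least one rule reach the operator.
        if any(rule.matches(alert) for rule in self.rules):
            self.posted.append(alert)
            return True
        return False

    def acknowledge(self, alert: Alert, entry: str) -> None:
        # A posted alert leaves the screen only with a non-blank operator entry.
        if not entry.strip():
            raise ValueError("acknowledgment requires an operator entry")
        self.posted.remove(alert)
        alert.ack_entry = entry
        self.history.append(alert)


console = AlertConsole([FilterRule(r"disk|queue", "warning")])
console.receive(Alert("syserr", "info", "disk #d01 mounted"))   # filtered out
full = Alert("syserr", "critical", "disk #d01 is 95% full")
console.receive(full)                                            # posted
console.acknowledge(full, "Purged old logs; space now 70%")
```

Keeping acknowledged alerts in `history` rather than discarding them is one way to support the tracking and reporting of operator acknowledgments.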

The SPS/Alert Manager automatically executes user-defined recovery and corrective procedures (macros or programs) and submits critical information to selected users.
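Mapping alert conditions to recovery procedures and notification lists might look like the sketch below. It assumes a hypothetical interface in which each procedure is a Python callable standing in for a user-written macro or program; the condition name `process_not_running`, the `restart_process` function and the user `ops_lead` are illustrative only.

```python
from typing import Callable, Dict, List, Tuple

# Stand-in for a user-written recovery macro or program: takes alert details,
# returns a short status line. (Hypothetical interface.)
RecoveryProc = Callable[[Dict[str, str]], str]


class RecoveryDispatcher:
    def __init__(self) -> None:
        self._procs: Dict[str, RecoveryProc] = {}
        self._recipients: Dict[str, List[str]] = {}
        self.outbox: List[Tuple[str, str]] = []   # (user, message) pairs to send

    def register(self, condition: str, proc: RecoveryProc, notify: List[str]) -> None:
        self._procs[condition] = proc
        self._recipients[condition] = list(notify)

    def handle(self, condition: str, details: Dict[str, str]) -> str:
        proc = self._procs.get(condition)
        if proc is None:
            status = "no recovery procedure defined"
        else:
            try:
                status = proc(details)
            except Exception as exc:  # a failed recovery must still be reported
                status = f"recovery failed: {exc}"
        # Selected users are told about the condition whatever the outcome.
        for user in self._recipients.get(condition, []):
            self.outbox.append((user, f"{condition}: {status}"))
        return status


def restart_process(details: Dict[str, str]) -> str:
    # Hypothetical: a real procedure would start the process again here.
    return f"restarted {details['process']}"


dispatcher = RecoveryDispatcher()
dispatcher.register("process_not_running", restart_process, notify=["ops_lead"])
dispatcher.handle("process_not_running", {"process": "order_server"})
```

Catching exceptions inside `handle` keeps a broken recovery procedure from silencing the notification, which matters most in a hands-free setup.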



Captures application messages and system events in real time and reports critical conditions to the network control center
Monitors: running processes, CPU utilization, page fault rate, I/O rate, interrupt level, available memory, system overhead of individual processes, communication, TP rate, processing bottlenecks, application disk I/O distribution, disk space utilization, application alerts (ON/2, TCAM, file creation/arrival events, analyze_system meters) and user-defined events.
Classifies and reports only significant alerts and critical problems based on simple, user-defined filtering rules
Automatically executes user-defined recovery procedures and creates a hands-free, self-correcting environment
Allows assignment of any number of terminals as system or network consoles
Enforces, tracks and reports operator acknowledgments of critical system alerts
Sends alert messages to selected terminals
Monitors application queuing activity, messages processed, messages pending, queue depth and transaction rate
Provides configuration flexibility in supporting a variety of applications, system software, and interaction with other platforms
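Queue activity figures such as messages processed, messages pending and transaction rate can be derived from two successive samples of cumulative counters. The sketch below assumes hypothetical counters for messages enqueued and processed, and treats "pending" as the queue depth at the later sample; the product's own sampling method is not described here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class QueueSample:
    t: float         # sample time in seconds
    enqueued: int    # cumulative messages written to the queue
    processed: int   # cumulative messages taken off the queue


def queue_activity(prev: QueueSample, cur: QueueSample) -> Dict[str, float]:
    interval = cur.t - prev.t
    if interval <= 0:
        raise ValueError("samples must be in increasing time order")
    processed = cur.processed - prev.processed
    return {
        "processed": processed,                   # messages processed this interval
        "pending": cur.enqueued - cur.processed,  # queue depth at the later sample
        "rate_per_sec": processed / interval,     # transaction rate
    }


stats = queue_activity(QueueSample(0, 1000, 950), QueueSample(60, 1300, 1190))
```

Here 240 messages were processed over 60 seconds (4 per second), with 110 still waiting on the queue.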

Default Settings

System Meters

Monitored Object        Warning Level   Critical Level
CPU                     40%             80%
Empty-Idle              30%             10%
Memory Used             60%             80%
Paging Used             60%             80%
Critical I/O rate       Not Set         Not Set
Critical Page Faults    Not Set         Not Set
Critical Interrupts     Not Set         Not Set
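A two-level threshold check against these defaults can be sketched as below. Two points are assumptions rather than documented behavior: a level is treated as reached when the value equals it, and Empty-Idle is read as an inverted meter that alarms as idle time falls (warning at 30%, critical at 10%). "Not Set" is modeled as `None`, which never triggers.

```python
from typing import Dict, Optional, Tuple

# (warning, critical, higher_is_worse); None stands for "Not Set".
SYSTEM_METER_DEFAULTS: Dict[str, Tuple[Optional[float], Optional[float], bool]] = {
    "CPU": (40, 80, True),
    "Empty-Idle": (30, 10, False),   # assumed: alarm as idle time falls
    "Memory Used": (60, 80, True),
    "Paging Used": (60, 80, True),
    "Critical I/O rate": (None, None, True),
    "Critical Page Faults": (None, None, True),
    "Critical Interrupts": (None, None, True),
}


def classify(meter: str, value: float) -> str:
    """Return 'critical', 'warning' or 'ok' for one meter reading (percent)."""
    warning, critical, higher_is_worse = SYSTEM_METER_DEFAULTS[meter]

    def crossed(limit: Optional[float]) -> bool:
        if limit is None:            # "Not Set": never triggers
            return False
        return value >= limit if higher_is_worse else value <= limit

    # Check the critical level first so the more severe result wins.
    if crossed(critical):
        return "critical"
    if crossed(warning):
        return "warning"
    return "ok"
```

For example, a CPU reading of 85% classifies as critical, while an idle reading of 25% classifies as a warning.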
Other Meters

Monitored Object        Warning Level   Critical Level
Disk Space Used         80%             90%
Disk Read Busy          40%             60%
Disk Write Busy         40%             60%
Disk I/O Rate           Not Set         Not Set
VOS Queues              100 msgs.       200 msgs.

Monitored Object              Settings
Process(es)                   Conditions: running / not running on schedule
                              Resources: CPU, I/O rate, memory, page faults, interrupts, idle too long
File Watchdog                 Conditions: file arrived / created, file missing
VOS System Log (syserr)       Conditions: any user-supplied filters or conditions
VOS Security Log (syserr)     Conditions: any user-supplied filters or conditions
Application Logs              Conditions: any user-supplied filters or conditions
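The File Watchdog conditions (file arrived or created, file missing) can be illustrated with a simple polling check. This is a hypothetical design: the `FileWatchdog` class, its `deadline` parameter and the event strings are illustrative, and the caller supplies the current time so the logic can be tested without waiting.

```python
import os
from typing import Optional


class FileWatchdog:
    """Polls one path and reports 'file arrived' or 'file missing' (hypothetical design)."""

    def __init__(self, path: str, deadline: Optional[float] = None) -> None:
        self.path = path
        self.deadline = deadline          # time by which the file is expected, or None
        self._present = False             # state seen at the previous poll
        self._missing_reported = False    # report a missing file only once

    def poll(self, now: float) -> Optional[str]:
        present = os.path.exists(self.path)
        event = None
        if present and not self._present:
            event = "file arrived"
        elif (not present and self.deadline is not None
              and now >= self.deadline and not self._missing_reported):
            self._missing_reported = True
            event = "file missing"
        self._present = present
        return event
```

In a monitoring loop, `poll(time.time())` would be called once per cycle and any returned event passed to the alert filters like any other message.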


Screenshots: VOS Alerts, VOS Console, System Usage, Performance History, Disk Usage, List Users, Queue Monitor-1, Queue Monitor-2, Explorer-1, Explorer-2, Batch Monitor, Network Monitor #1, Network Monitor - Devices