Software User's Guide
The SPS/Alert Manager

Table of contents.

  1. Introduction
  2. Definitions
    1. Objects
    2. Classes
    3. Thresholds
    4. Time Windows
    5. Intervals
    6. Filters
    7. Acknowledgements
    8. Escalation
    9. Carry-Over
    10. Special Keywords
    11. WEB Reports
    12. Features
    13. Environments
  3. Alert Manager Components
  4. Default Settings
  5. The alert_manager_server.pm program
  6. The Alert Manager Console
  7. The Alerts Monitor
  8. The System Usage Monitor
  9. The Disk Space Monitor
  10. The Grouped List-Users Monitor
  11. The List-Users Monitor
  12. The Queue Monitor
  13. The Batch Monitor
  14. The Network Monitor
  15. The Explorer Menu
  16. The Main Control Table (alert_manager.table)
    1. output_device
    2. max_allowed_subprocesses
    3. max_carry_over
    4. network_normal_class
    5. network_critical_class
    6. user_logins_class
    7. priv_command_class
    8. netstat_device_NN
    9. netstat_critical
    10. netstat_warning
    11. netstat_normal_class
    12. netstat_critical_class
    13. netstat_warning_class
    14. netstat_options
    15. netstat_states
    16. output_ip_addr
    17. output_ip_port
    18. output_ip_udp
    19. output_ip_for_lams
    20. disks_interval
    21. performance_interval
    22. processes_interval
    23. qmon_interval
    24. fwd_interval
    25. files_interval
    26. tcp_interval
    27. web_window_id
    28. web_session_timeout
    29. web_command_timeout
    30. scaling_factor
    31. web_xxx_interval
    32. web_no_lines
    33. web_lui_no_lines
    34. web_show_scrollbar
    35. web_initial_menu
    36. web_require_vos_pass
    37. web_console_menu_text
    38. web_console_menu_command
    39. drms_options
    40. max_processes
    41. retain_logs
    42. trace
  17. The Classes Table (sps_classes.table)
    1. class_code
    2. days_mask
    3. active_from-to
    4. dups_interval
    5. help_text
    6. wait_for_acknowledgment
    7. acknowledgment_escalation_time
    8. acknowledgment_escalation_class
    9. output_q_message
    10. email_user_1-10
    11. start_job_1-10
    12. scheduler_env
    13. start_command
    14. minimum_reporting_interval
    15. minimum_execution_interval
    16. require_min_occurrences
    17. require_min_occurrences_interval
    18. daily_max
    19. color
  18. The Performance Table (sps_performance.table)
    1. logical_name
    2. active_from-to
    3. cpu_usage
    4. empty_idle
    5. io_rate
    6. page_fault
    7. interrupts
    8. core
    9. used_memory
    10. used_paging
    11. normal_class
    12. warning/critical_class
  19. The Processes Table (sps_processes.table)
    1. logical_name
    2. process_name
    3. user_name
    4. active_from-to
    5. no_of_processes
    6. must_be_up_from-to
    7. must_be_down_from-to
    8. critical_cpu
    9. critical_io_rate
    10. critical_memory
    11. process_down_critical_class
    12. process_up_critical_class
    13. process_busy_critical_class
  20. The Queues Table (sps_qmon.table)
    1. logical_name
    2. q_path
    3. active_from-to
    4. pending_alert
    5. normal_class
    6. warning/critical_class
  21. The File Watchdog Table (sps_fwd.table)
    1. logical_name
    2. active_from-to
    3. input_path
    4. must_arrive
    5. allow_lockers
    6. normal_class
    7. file_arrived_class
    8. critical_class
  22. The Disks Table (sps_disks.table)
    1. logical_name
    2. path
    3. active_from-to
    4. critical_percent_used
    5. critical_io_rate
    6. critical_read_busy
    7. critical_write_busy
    8. file_max_blocks
    9. dir_max_objects
    10. normal_class
    11. warning/critical_class
  23. The Files Table (sps_files.table)
    1. logical_name
    2. input_path
    3. input_event_path
    4. input_ip_addr
    5. input_ip_port
    6. start_pos
    7. end_pos
    8. tcam_log
    9. classify_once
    10. use_filter_01..16
    11. include_classes, exclude_classes
    12. an_alert_manager_input
  24. The Filters Table (sps_filters.table)
    1. filter_name
    2. class_code
    3. match_and_01..16
    4. match_or_01..16
    5. omit_and_01..16
    6. omit_or_01..16
    7. msg_prefix
    8. caseless
  25. The WEB Table (sps_web.table)
    1. logical_name
    2. category
    3. report_path
    4. notes
    5. acl_name
  26. The Reports Table (sps_reports.table)
    1. report
    2. from
    3. to
    4. include_class_NN
    5. exclude_class_NN
    6. sources
    7. modules
    8. match
    9. output_path
    10. email
  27. The E-Mail Facility
    1. server_name
    2. server_ip_address
    3. server_port_number
    4. user_name
    5. password
    6. use_domain
    7. nick_name_xx
    8. e_address_xx
    9. debug
    10. trace
  28. The AlertManager-Netview/Tivoli interface
  29. Setup instructions
  30. Using Environments


Introduction

While operating modern sophisticated systems, it is extremely difficult, if not impossible, to notice and resolve all system and application events as they occur. Applications may produce any number of logs, errors and output files. The operating system alone may generate thousands of messages every day. Often, serious production problems may be prevented if alerts are noticed and responded to in time.

The SPS/Alert Manager is designed to solve this common problem. It monitors system elements, reads messages originating from various systems, modules and applications and funnels the information to a single terminal. Message filters assure that only important information is passed to the operator. Critical alerts cannot be ignored; messages will remain posted on the screen until the problem is attended to, acknowledged and given a proper entry by the operator.

The SPS/Alert Manager automatically executes user-defined recovery and corrective procedures and submits critical information to selected terminals, E-Mail users or other presentation platforms.


Definitions

Objects

An object is any monitored element on the system, for example the CPU level, disk space, a file, or a communication line. An object is identified by a unique logical name.

Classes (Alerts)

Classes identify the severity level and define what actions the Server must take for specific conditions. AlertManager assigns each message a Class code. A Class code can also be referred to as the state of a monitored Object, or simply as an Alert. For example, the CPU level can change from Normal (NOR) to Warning (WAR) and then back to Normal. The user may define any number of new Class codes and use them throughout the system.

Access to certain Classes may be restricted on a per-user basis. To restrict access to a class code, create a [class-code].ftr file in the alert_manager directory and set standard access rights (give/remove_access) to determine who has access to this Class.

Thresholds

A Threshold is a set of numeric values used to indicate a certain condition. Most Objects have three Threshold levels: normal, warning and critical.

Time Windows

A Time Window is a period during which a known set of Thresholds applies. Time Windows allow the use of different Thresholds depending on the expected system load.
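
As an illustration only, a Time Window check can be sketched in Python. The hh:mm strings mirror the char (5) active_from and active_to fields that appear in the configuration tables; the function names, the inclusive boundaries and the handling of windows that wrap past midnight are assumptions, not documented AlertManager behavior.

```python
from datetime import time

def parse_hhmm(text: str) -> time:
    """Parse an 'hh:mm' string into a time object."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))

def in_time_window(now: time, active_from: str, active_to: str) -> bool:
    """Return True if 'now' falls inside the window (inclusive at both ends)."""
    start, end = parse_hhmm(active_from), parse_hhmm(active_to)
    if start <= end:
        return start <= now <= end
    # Assumed behavior: a window such as 22:00-06:00 wraps past midnight.
    return now >= start or now <= end
```

With this sketch, a daytime window of 08:00 to 17:00 and a night window of 22:00 to 06:00 could each carry their own set of Thresholds.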

Intervals

Intervals are short timeframes, specified in seconds or minutes, between two monitored events.

Filters

Filters are predefined sets of keywords and phrases. Filters are used to process operating system and application messages that may appear in any file on the system. Filters determine how messages are classified and treated.

Acknowledgements

Certain important conditions may be configured to require the operator's Acknowledgement. AlertManager audits this activity and records the name of the operator and the time of the acknowledgement.

Escalation

Messages that require the operator's acknowledgement may be escalated to a higher severity level if they are not acknowledged in time, thus triggering different actions.
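
The escalation rule can be sketched as follows. The parameter names echo the acknowledgment_escalation_time and acknowledgment_escalation_class fields of the Classes table; treating the time as minutes, and treating 0 or a blank class as "escalation disabled", are assumptions made for this example.

```python
from datetime import datetime, timedelta

def escalated_class(current_class: str,
                    posted_at: datetime,
                    now: datetime,
                    acknowledged: bool,
                    escalation_minutes: int,
                    escalation_class: str) -> str:
    """Return the class code a message should carry at time 'now'."""
    if acknowledged or escalation_minutes <= 0 or not escalation_class:
        return current_class          # nothing to escalate
    if now - posted_at >= timedelta(minutes=escalation_minutes):
        return escalation_class       # deadline passed without an Ack
    return current_class
```

For example, a WAR message that is still un-acknowledged after its escalation time would be re-classified as CRI, and the actions configured for CRI would then apply.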

Carry-Over

Every day at midnight, AlertManager creates a new daily log file. Carry-Over is the process AlertManager performs at this time: it copies all un-acknowledged messages over to the new daily log file.

Special Keywords

AlertManager uses special keywords which are replaced during run-time when recording Alerts and when building command lines for execution (see =start_command). The following Special Keywords are used:

WEB Reports

Alert Manager can be configured to deliver reports, out files, configuration tables and directory listings directly to your web browser. For more information on this WEB-Server functionality, see The WEB Table (sps_web.table).

Features

Features allow the SysAdmin to tailor the Alert-Manager interface by enabling and disabling different features and functions based on the user name. Features are enabled or disabled by setting up standard access rights (give-access) to the FTR (feature) files. For example: you can remove John's access to the Scheduler interface by removing the Scheduler Tab as follows:

      give_access null Scheduler_tab.ftr -user John_Smith.*

A user can be limited to a single function. For example, Tim might need access only to the Scheduler interface. In this case, Tim will have null access to all other FTR files and read access to Scheduler_tab.ftr, and will go directly to the Scheduler window as soon as he logs into the system.

Feature files:

Alerts_Monitor_tab.ftr
Batch_tab.ftr
cpu_performance_box.ftr
Disk_Usage_tab.ftr
drms_operations_menu_box.ftr
DRMS_tab.ftr
Explorer_tab.ftr
List_Users_tab.ftr
Menu_System_tab.ftr
Network_tab.ftr
operations_menu_box.ftr
performance_graphs.ftr
process_breakdown_box.ftr
Queue_Monitor_tab.ftr
recent_status_box.ftr
Scheduler_tab.ftr
System_Usage_tab.ftr
vos_daily_logs_box.ftr

Features can also be used to designate certain Classes of messages to individual users. The idea is that some users may need to look only at specific errors, while others are interested in, or should have access to, other error types. To create a Feature file for a Class of messages, create a [class-code].ftr file and assign standard access rights (ACLs) to it. In the Alerts Monitor and in reports, records with class codes to which the user has no access are masked off and do not appear.
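
The masking effect can be pictured with a short Python sketch. It is illustrative only: in the product the decision comes from the access rights on the [class-code].ftr files, whereas here the set of allowed class codes is simply passed in, and the record layout (a dict with a class_code key) is hypothetical.

```python
def visible_records(records, allowed_classes):
    """Keep only the records whose class code the user may access."""
    return [r for r in records if r["class_code"] in allowed_classes]

# Hypothetical sample records for illustration.
sample = [
    {"class_code": "CRI", "msg": "Disk is at 95% used."},
    {"class_code": "LIO", "msg": "User logged in."},
    {"class_code": "S09", "msg": "Failed login attempt."},
]
operator_view = visible_records(sample, {"CRI", "LIO"})
```

In this example the operator sees the CRI and LIO records, while the S09 record is masked off.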

Environments

Environments are complete or partial sets of configuration files. Using multiple Environments allows running multiple instances of AlertManager servers, each working with its own separate configuration set (Environment).


Alert Manager Components




Default Settings

System Meters
Monitored Object   Warning Level   Critical Level   Alert Message(s)
CPU                40%             80%              CPU level at XX%
                                                    Core level at XX%.
Empty-Idle         30%             10%              Empty Idle level at XX%.
Memory Used        60%             80%              Low on memory. Used: XX%.
Paging Used        60%             80%              Low on paging area. Used: XX%.
I/O rate           Not Set         Not Set          I/O rate at XX per second.
Page Faults        Not Set         Not Set          Page Faults at XX%.
Interrupts         Not Set         Not Set          Interrupts at XX%.
All SYSTEM meters are within normal range           Performance meters are back to normal levels.
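
How a reading maps onto the warning and critical levels above can be sketched in Python. This is a minimal illustration, not the server's implementation: the function name, the inclusive comparisons and the rule that a critical level below the warning level means "lower is worse" (as the Empty-Idle defaults suggest) are assumptions. NOR, WAR and CRI are the default class codes used elsewhere in this guide.

```python
def classify(value: float, warning: float, critical: float) -> str:
    """Map a meter reading to NOR, WAR or CRI.

    If critical < warning the meter is treated as 'lower is worse'
    (as with Empty-Idle: warning 30%, critical 10%).
    """
    if critical >= warning:                 # higher is worse, e.g. CPU
        if value >= critical:
            return "CRI"
        if value >= warning:
            return "WAR"
        return "NOR"
    if value <= critical:                   # lower is worse, e.g. Empty-Idle
        return "CRI"
    if value <= warning:
        return "WAR"
    return "NOR"
```

With the CPU defaults (40% and 80%), a reading of 55% would be classified as WAR; with the Empty-Idle defaults (30% and 10%), a reading of 5% would be classified as CRI.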
 
Other Meters
Monitored Object            Warning Level   Critical Level   Alert Message(s)
Disk Space Used             80%             90%              Disk [disk-name] is at XX% used.
Disk Read Busy              40%             60%              Disk [disk-name] is read-busy at XX%.
Disk Write Busy             40%             60%              Disk [disk-name] is write-busy at XX%.
Disk I/O Rate               Not Set         Not Set          Disk [disk-name] is busy at XX I/O per second.
All DISK meters are within normal range                      Free space on [disk-name] and I/O rate back to normal levels.
VOS Queues                  100 msgs.       200 msgs.        Pending message count on [queue-name] is XXX.
                                                             Pending message count on [queue-name] is back to normal levels.
Process(es)                 Conditions: running / not-running on schedule
                            Resources: CPU, I/O rate, memory, idle too long.
                                                             Process[es] [process-name] not running.
                                                             Process[es] [process-name] is running; Not intended to run now.
                                                             Process[es] [process-name] has been idle in the last XX minute(s).
                                                             Process[es] [process-name] consuming XX% cpu.
                                                             Process[es] [process-name] performing XXX I/Os per second.
                                                             Process[es] [process-name] using XX% of total memory.
                                                             Process[es] [process-name] running now. (as planned!).
                                                             Process[es] [process-name] are not running. (as planned!).
File Watchdog               Conditions: file arrived / created, file missing
                                                             File arrived/created: [nick-name]
                                                             File [nick-name] has not arrived/created.
VOS System Log (syserr)     Conditions: any user-supplied filtering / conditions.
                                                             Original syserr_log.(date) message.
VOS Security Log (syserr)   Conditions: any user-supplied filtering / conditions.
                                                             Original security_log.(date) message.
Application Logs            Conditions: any user-supplied filtering / conditions.
                                                             Original log message.
 
Security Filters
Filter name       Class Code   Action taken / Notes
Internal filter   LIO          Records all login and logout events; records user name and session length in the daily log and in a separate database.
Internal filter   PRI          Records all privileged commands executed by non-privileged users.
Security_01       S01          Detected changes to login_admin; AM automatically restores the default settings.
Security_02       S02          Detected changes to logout_admin; AM automatically restores the default settings.
Security_03       S03          Detected changes to audit_admin; AM automatically restores the default settings.
Security_04       S04          Detected changes to password-security; AM automatically restores the default settings.
Security_05       S05          Detected changes to system tuning parameters; AM automatically restores the default settings.
Security_06       S06          Detected an unauthorized attempt to modify a user's start_up.cm macro. AM automatically logs the user out of the system and optionally bans him from the system.
Security_07       S07          Detected an unauthorized attempt to access one of the SPS audit trace files. AM automatically logs the user out of the system and optionally bans him from the system.
Security_08       S08          Detected an unauthorized or authorized access to a monitored file marked for object-audit.
Security_09       S09          Detected a failed attempt to log into the system. This includes unauthorized attempts, bad passwords and users whose account has been terminated trying to get in.
Security_10       S10          Detected an access-right violation: a user attempting to access a directory or file to which he has insufficient access.


The alert_manager_server.pm program

alert_manager_server.pm

Purpose

This is the server that runs in the background. It is highly recommended that a privileged user start it. Start this program by executing start_alert_manager. Note that only one AlertManager Server can run in any given environment. For this reason, the Console is normally configured as an On-Top window: it will reveal itself after a short period if left covered by other windows.

CRT Form

------------------------------ SPS/Alert Manager -----------------------------
 vos_console: 0                   
 -command:     
 -name:      

Explanation

You can use the following commands:

  • To suspend execution of a specific Class: alert_manager_server.pm -command suspend -name [class-code]

  • To resume execution of a specific Class: alert_manager_server.pm -command resume -name [class-code]

  • To run a predefined report: alert_manager_server.pm -command report -name [report_name]


    The Alert Manager Console

    The Console is the first screen presented to the operator. It is used to open all other system monitors, but its most significant area is the Recent Status box, which lists any objects marked as having a problem, i.e., not in their expected Normal state.

    As a SysAdmin you can configure the Console to disable selected tabs based on a user name. This is done by giving null access to one or more of the *.tab files. For example, the following command will remove Joe's access to the Alerts Monitor tab: give_access null Alerts_Monitor.tab -user joe_smith.dev


    The Alerts Monitor

    The Alerts Monitor is the heart of the system. All system and application events that were recorded into the database can be researched here. The operator can use the monitor to detect potential problems, react to critical events and acknowledge certain important messages.

    The Monitor has two modes of operation: Tail Mode and Search Mode.


    VOS Alerts

    The Search Mode

    The Search Mode can be recognized by a new line that offers some input fields (MsgID, Class and Match) followed by a Go button. The Search Mode allows the operator to move freely around the database and search for any message(s). The Monitor stops tailing today's log and goes into the Search Mode simply by clicking one of the following:

    1. The Search link
    2. One of the navigation links to change the page displayed - first, last, next, back.
    3. Switching the date displayed - previous-day, next-day links.

    In Search Mode you can:

    1. Position on a known record by typing its message-id in MsgID and clicking Go.
    2. Look at a consolidated list of messages by a given Class code. Enter the class code in the Class field and click the Go button.
    3. Use the Match field to look for certain messages.
    4. Switch dates by clicking the navigation arrows next to the log date field.

    To switch back to Tail Mode, either click the Tail link or simply wait for one minute and the system will switch to Tail Mode automatically.

    If there are any messages that require acknowledgement and have not yet been acknowledged, the Pending-Ack counter will turn red and become a clickable link. Clicking it will present a screen with only Ack-Required messages. The operator can then acknowledge individual messages or click the Ack-all-pending link to acknowledge all pending messages in the database.

    Creating reports

    Clicking the Reports link will open the following form. Note that date and time fields must follow VOS standards, i.e., the yy-mm-dd and hh:mm:ss formats. To e-mail the report to an authorized recipient, you may use the Email-To field and specify a Nick-Name as defined in the sps_email_server configuration table.
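
    When preparing report parameters in a script, the two formats can be checked before they are submitted. The Python sketch below is an illustration only; the function name is hypothetical and the form itself performs its own validation.

```python
from datetime import datetime

def parse_report_timestamp(date_text: str, time_text: str) -> datetime:
    """Parse 'yy-mm-dd' and 'hh:mm:ss' strings; raises ValueError if malformed."""
    return datetime.strptime(f"{date_text} {time_text}", "%y-%m-%d %H:%M:%S")
```

    A value such as 24-03-15 with 08:30:00 parses cleanly, while a date written as 2024/03/15 raises ValueError.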

    Here is an example of an HTML-formatted report:


    The System Usage Monitor

    The System Usage Monitor shows standard VOS meters: CPU, page-faults, Interrupts, empty-idle, file-io, disk-io over the last 10 seconds, 1 minute, 5 minutes and one hour. The Monitor refreshes the information every 10 seconds. For more information, see the sps_performance configuration table.

    The second area of the Monitor shows graphs and information on:

    1. Available Cache
    2. Available memory with a breakdown by System, Cache and User.
    3. Available paging partitions.
    4. TP-rate. Transaction processing rate per second.

    The third part of the display lists any performance related conditions that have been identified by Alert Manager. If all meters are within their Normal operating range this area will not be displayed.


    System Usage


    The Disk Space Monitor

    The Disk Space Monitor shows two graphs for every disk on the system: free/used space, and I/O rate with a breakdown of reads and writes. For more information see the sps_disks configuration table. Note that the scale for the I/O rate graphs is determined by the =max_disk_io field in the alert_manager.tin configuration file.

    The second part of the display lists any disk-related conditions that have been identified by Alert Manager. If all meters are within their Normal operating range this area will not be displayed.


    Disk Usage


    The Grouped List-Users Monitor

    The List-Users monitor is based on logical groups of processes that are defined in the sps_processes configuration file. A Group is defined by a standard VOS user name (*.System) and a process-name. Each group is graphed to show its relative CPU, page-faults, memory consumption as well as I/O rate, and number of processes running.

    The information may be sorted by CPU, Faults, Memory, I-O Rate, User-name and number of processes. To change the sort order, simply click on any of the table's headings.

    In the example below we can see that the 3 DRMS servers are taking 26.5% of the module's CPU and 8.3% of its memory.


    List Users-1

    The second part of the display lists any performance related conditions that have been identified by Alert Manager. If all meters are within their Normal operating range this area will not be displayed.

    In the example below we can see that the DRMS servers are not running.


    List Users-2


    The List-Users Monitor

    The List-Users (All) monitor lists the processes that are taking the most resources. The information may be sorted by clicking on one of the table's headers: CPU, Faults, Memory, Reads, Writes etc.


    List Users-3


    The Queue Monitor

    The Queue monitor is based on queues defined in the sps_qmon configuration table. Queues can be server queues (1 or 2 way) and message queues.

    The information may be sorted by Queue-name, ON-queue, Pending count, Highest, Total, TXN (transaction processing rate). To change the sort order, simply click on any of the table's headings.

    In the example below we can see that DRMS-X queues are processing at a rate of 70 messages per second and that there are only one or two messages in the queue (pending).


    Queue Monitor-1

    The second part of the display lists any performance related conditions that have been identified by Alert Manager. If all meters are within their Normal operating range this area will not be displayed.

    In the example below we can see that the DRMS_App queue went into its Critical state because there are 5973 messages pending processing on it.


    Queue Monitor-2


    The Batch Monitor

    The Batch monitor does not require any configuration as it automatically detects the active queues on the system.


    Batch Monitor


    The Network Monitor

    The Network monitor does not require any configuration. It monitors all STCP and UDP activities on the system.


    Network Monitor #1


    Network Monitor - Devices


    The Explorer Menu

    The Explorer Menu is created based on a list of allowed files, reports and directory listings as specified in the sps_web configuration table.


    Explorer-1

    In the example below we can see a directory listing of all TIN files. You may change the sorting order of the table by clicking any of the table's headings.


    Explorer-1


    The Main Control Table (alert_manager.table)

    The Main Control Table is used to define some global configuration parameters.

    organization: relative;
    fields:    
    output_device                 char (66) var,
    max_allowed_subprocesses      bin (15) default ('5'),
    max_carry_over                bin (15) default ('-1'),  
    network_normal_class          char (3) default ('NOR'),
    network_critical_class        char (3) default ('CRI'),  
    user_logins_class             char (3) default ('LIO'),
    priv_command_class            char (3) default ('PRI'),
    netstat_device_01             char (32) var,
    netstat_device_02             char (32) var,
    netstat_device_03             char (32) var,
    netstat_device_04             char (32) var,
    netstat_device_05             char (32) var,
    netstat_critical              bin (15),
    netstat_warning               bin (15),
    netstat_normal_class          char (3) default ('NOR'),
    netstat_critical_class        char (3) default ('CRI'),
    netstat_warning_class         char (3) default ('WAR'),
    netstat_options               bin (15) default ('7'),
    netstat_states                bin (15) default ('1'),
    output_ip_addr                char (32) var,
    output_ip_port                bin (15),
    output_ip_udp                 bit,
    output_ip_for_lams            bit,
    disks_interval                bin (15) default ('60'),
    performance_interval          bin (15) default ('60'),
    processes_interval            bin (15) default ('60'),
    qmon_interval                 bin (15) default ('60'),
    fwd_interval                  bin (15) default ('60'),
    files_interval                bin (15) default ('10'),
    tcp_interval                  bin (15) default ('60'),
    web_window_id                 bin (15) default ('1'),
    web_session_timeout           bin (15) default ('10'),
    web_command_timeout           bin (15) default ('10'),
    scaling_factor                dec (15,3) default (-1),
    web_console_interval          bin (15) default ('10'),
    web_alerts_interval           bin (15) default ('10'),
    web_dsu_interval              bin (15) default ('10'),
    web_ddi_interval              bin (15) default ('10'),
    web_lui_interval              bin (15) default ('10'),
    web_qmon_interval             bin (15) default ('10'),
    web_batch_interval            bin (15) default ('10'),
    web_network_interval          bin (15) default ('60'),
    web_no_lines                  bin (15) default ('20'),
    web_lui_no_lines              bin (15) default ('20'),
    web_show_scrollbar            bit,
    web_initial_menu              char (32) var default ('Operations Menu'),
    web_require_vos_pass          bit  (1),
    web_console_menu_text_01      char (32) var,
    web_console_menu_text_02      char (32) var,
    web_console_menu_text_03      char (32) var,
    web_console_menu_text_04      char (32) var,
    web_console_menu_text_05      char (32) var,
    
    web_console_menu_command_01   char (256) var,
    web_console_menu_command_02   char (256) var,
    web_console_menu_command_03   char (256) var,
    web_console_menu_command_04   char (256) var,
    web_console_menu_command_05   char (256) var,
    drms_options                  bin (15) default (63),
    max_processes                 bin (15) default (2000),
    retain_logs                   bin (15) default (90),
    trace                         bin (15);
    end;
    

    Field Definitions

    output_device

    AlertManager may write messages to a dedicated device, such as a dedicated printer.

    max_allowed_subprocesses

    Based on configuration and filtering rules, the AlertManager server may start additional sub-processes as configured by the user in sps_classes.table. To limit the risk of starting too many sub-processes (for example, when the filtering rules need adjustment), you may specify a limit beyond which AlertManager will not start any additional sub-processes.

    max_carry_over

    max_carry_over defines how ack-required messages are copied over to the next day:

    • -1 Carry over all messages
    • 0 Do not carry over any messages
    • N Carry over N messages
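
    The three settings can be sketched in Python as follows. This is an illustration, not the server's code: the function name is hypothetical, and the guide does not say which N messages are kept when a limit is set, so keeping the most recent N is an assumption.

```python
from typing import List

def select_carry_over(pending: List[str], max_carry_over: int) -> List[str]:
    """Pick the un-acknowledged messages to copy into the new daily log."""
    if max_carry_over < 0:       # -1: carry over all messages
        return list(pending)
    if max_carry_over == 0:      # 0: do not carry over any messages
        return []
    # N: carry over N messages. Keeping the most recent N is an assumption.
    return pending[-max_carry_over:]
```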

    network_normal_class

    The class/state code that should be assigned when all modules are online.

    network_critical_class

    The class/state code that should be assigned when one or more modules go offline.

    user_logins_class

    If given a valid class code, AlertManager will create entries whenever users log into or out of the system. To disable this feature, remove the class code (blank).

    priv_command_class

    If given a valid class code, AlertManager will create entries whenever users execute a Privileged command. To disable this feature, remove the class code (blank).

    netstat_device_NN

    The name of a Streams device you wish to monitor.

    netstat_critical

    The critical threshold of device utilization as a percentage of the line's capacity.

    netstat_warning

    The warning threshold of device utilization as a percentage of the line's capacity.

    netstat_normal_class

    The class code given to a message written to the log once the Streams meters are back to normal levels - below the warning threshold.

    netstat_critical_class

    The class code given to a message written to the log once the Streams meters are higher than the critical threshold.

    netstat_warning_class

    The class code given to a message written to the log once the Streams meters are higher than the warning threshold.

    netstat_options

    A 16-bit encoding that sets the initial options in the Network-monitor window as follows:

    1    Show TCP connections
    2    Show UDP connections
    4    Show Statistics

    netstat_states

    A 16-bit encoding that sets the initial State options in the Network-monitor window as follows:

    Code    State
    1       ALL
    2       LAST_ACK
    4       TIME_WAIT
    8       CLOSING
    16      FIN_WAIT_2
    32      CLOSE_WAIT
    64      FIN_WAIT_1
    128     ESTABLISHED
    256     SYN_RECV
    512     SYN_SEND
    1024    LISTEN
    2048    CLOSED
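
    Because every code is a distinct power of two, the codes appear to combine as bit flags. Under that assumption, the Python sketch below shows how a value for netstat_states could be composed and read back; the class and function names are hypothetical.

```python
from enum import IntFlag

class NetstatState(IntFlag):
    ALL = 1
    LAST_ACK = 2
    TIME_WAIT = 4
    CLOSING = 8
    FIN_WAIT_2 = 16
    CLOSE_WAIT = 32
    FIN_WAIT_1 = 64
    ESTABLISHED = 128
    SYN_RECV = 256
    SYN_SEND = 512
    LISTEN = 1024
    CLOSED = 2048

# Combine states by OR-ing (equivalently, adding) their codes.
value = int(NetstatState.ESTABLISHED | NetstatState.LISTEN)

def selected_states(encoded: int) -> list:
    """List the state names whose bits are set in an encoded value."""
    return [s.name for s in NetstatState if encoded & s.value]
```

    Here value is 1152 (128 + 1024), which would preselect the ESTABLISHED and LISTEN states; the table default of 1 corresponds to ALL.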

    output_ip_addr

    AlertManager may be configured to send messages via TCP/IP. Specify the IP address of the target computer.

    output_ip_port

    Specify the port number of the target computer.

    output_ip_udp

    Set to 1 if you are forwarding messages to a UNIX/syslog system.

    output_ip_for_lams

    By default, when sending messages via TCP/IP, AlertManager formats the message based on the =output_q_message field in the Classes table. Set this switch to 1 if the target module is your Master AlertManager. This applies only to environments with multiple Stratus modules, each running its own copy of AlertManager.

    disks_interval

    An interval, specified in seconds, that defines how often system's disks should be checked (see sps_disks table).

    performance_interval

    An interval, specified in seconds, that defines how often system's performance meters should be checked (see sps_performance table).

    processes_interval

    An interval, specified in seconds, that defines how often running processes should be checked (see sps_processes table).

    qmon_interval

    An interval, specified in seconds, that defines how often application queues should be checked (see sps_qmon table).

    fwd_interval

    An interval, specified in seconds, that defines how often the File-Watchdog layer should be invoked to look for new file arrivals (see sps_fwd table).

    files_interval

    An interval, specified in seconds, that defines how often the application log files should be checked (see sps_files table).

    tcp_interval

    An interval, specified in seconds, that defines how often the Streams devices should be checked.
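
    To see how the default intervals interleave, the Python sketch below lists which checks would fall due at a given number of seconds after start-up. The default values are taken from the table definition above; the scheduling rule (a check is due whenever the elapsed time is a multiple of its interval) and all names are assumptions, since the guide does not describe the server's internal scheduler.

```python
# Default polling intervals in seconds, taken from alert_manager.table.
DEFAULT_INTERVALS = {
    "disks": 60, "performance": 60, "processes": 60,
    "qmon": 60, "fwd": 60, "files": 10, "tcp": 60,
}

def monitors_due(elapsed_seconds: int, intervals=DEFAULT_INTERVALS) -> list:
    """Return the monitors whose interval divides the elapsed time evenly."""
    return [name for name, every in intervals.items()
            if every > 0 and elapsed_seconds % every == 0]
```

    With the defaults, only the files check is due every 10 seconds, and all seven checks coincide once a minute.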

    web_window_id

    A unique-id number for the module.

    web_session_timeout

    An interval, specified in minutes, that defines the session's inactivity timeout, after which the user will be forced to log into the system again.

    web_command_timeout

    An interval, specified in seconds, that defines the maximum allowed execution time for a command triggered by a Menu item.

    scaling_factor

    The scaling factor to be used on V-Series modules. For more information see VOS documentation on display_system_usage and how to adjust the scaling factor for hyper-threaded processors.

    web_xxx_interval

    An interval, specified in seconds, between screen refreshes. These intervals apply to all Browser sub-window monitors: console, alerts, system-usage, disk-info, list-users, queue monitor, batch monitor and network monitor.

    web_no_lines

    The number of lines you wish to set for the Alerts sub-window.

    web_lui_no_lines

    The number of lines you wish to set for the List-Users-All sub-window.

    web_show_scrollbar

    By default, most windows are opened without a scrollbar. Set this switch to enable scrollbars.

    web_initial_menu

    The SPS/Menu can be accessed via the web interface. If you are licensed to use this package, you may specify any existing menu name as the initial menu.

    web_require_vos_pass

    By default, the initial login screen requires the user to enter a valid user-id and a valid password, which are checked against the VOS registration file. If this is not required, reset this switch to zero. The user can then enter any name to identify himself to the system; it does not have to be a real user-id, and a password is not required.

    web_console_menu_text

    You may configure up to 5 commands that the user can execute from the console window. Each command has a text description that appears on the screen and the associated VOS command-line.

    web_console_menu_command

    See the explanation of web_console_menu_text.

    drms_options

    A 16-bit encoding that sets the initial options for the DRMS window as follows:

    1    DRMS Processes
    2    CPU Performance
    4    User Application Processes
    8    Operations Menu
    16   Queue Monitor
    32   I/O Monitor
    64   Critical Files
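
    The values listed above are powers of two, which suggests that several options are selected by adding their values together. The following Python sketch illustrates that reading; the function names are hypothetical and the additive behavior is an assumption, not something this guide states explicitly.

```python
# Option values as listed for drms_options.
DRMS_OPTIONS = {
    1: "DRMS Processes",
    2: "CPU Performance",
    4: "User Application Processes",
    8: "Operations Menu",
    16: "Queue Monitor",
    32: "I/O Monitor",
    64: "Critical Files",
}


def encode_drms_options(names):
    """Sum the values of the selected options (assumes they combine as bit flags)."""
    by_name = {name: value for value, name in DRMS_OPTIONS.items()}
    return sum(by_name[name] for name in set(names))


def decode_drms_options(value):
    """List the option names whose bit is set in value."""
    return [name for bit, name in DRMS_OPTIONS.items() if value & bit]


print(encode_drms_options(["DRMS Processes", "Queue Monitor", "Critical Files"]))  # 81
print(decode_drms_options(19))
```

    Under this assumption, a drms_options value of 81 would open the DRMS window with DRMS Processes, Queue Monitor and Critical Files selected.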

    max_processes

    The maximum number of processes running on the system.

    retain_logs

    As part of Alert-Manager's Midnight processing, it will automatically purge old logs depending on the number of days specified in retain_logs.

    trace

    Use the following values:

    0 no trace
    1 events
    2 file switch trace
    4 messages
    8 memory trace
    16 communications trace


    The Classes Table (sps_classes.table)

    The Classes Table defines how messages and alerts should be treated.

    organization: relative;
    index:         class_code no_duplicates;
    fields:    
    class_code                          char (3),           
    ignore_on_holidays                  bit (1) aligned,
    days_mask                           char (7) default ('yyyyyyy'), 
    active_from                         char (5),           
    active_to                           char (5),           
    dups_interval                       bin  (15),        
    help_text                           char (256)  var,
    wait_for_acknowledgment             bin (15) default ('0'),
    acknowledgment_escalation_time      bin (15) default ('0'),
    acknowledgment_escalation_class     char (3) default (''),
    output_q_message 			char (256) var default ('@time: @msg'),
    email_user_1     			char (32)  var,  
    email_user_2     			char (32)  var,  
    email_user_3     			char (32)  var,
    email_user_4     			char (32)  var, 
    email_user_5     			char (32)  var, 
    email_user_6     			char (32)  var,  
    email_user_7     			char (32)  var,  
    email_user_8     			char (32)  var,
    email_user_9     			char (32)  var, 
    email_user_10    			char (32)  var, 
    start_job_1                         char (32)  var,  
    start_job_2                         char (32)  var,  
    start_job_3                         char (32)  var,
    start_job_4                         char (32)  var, 
    start_job_5                         char (32)  var, 
    start_job_6                         char (32)  var,  
    start_job_7                         char (32)  var,  
    start_job_8                         char (32)  var,
    start_job_9                         char (32)  var, 
    start_job_10                        char (32)  var, 
    scheduler_env                       char (32)  var, 
    start_command    			char (256) var,  
    minimum_reporting_interval		bin (15), 
    minimum_execution_interval		bin (15),
    require_min_occurrences 		bin (15),
    require_min_occurrences_interval     bin (15),
    daily_max                            bin (15) default (2000),
    color 				char (32) var,
    end;
    

    Field Definitions

    class_code

    A unique code that identifies the class. You may make up your own codes or use a numeric scheme. Suggested codes are:

    • INF - Information
    • WAR - Warning
    • SEV - Severe
    • CRI - Critical
    • FAT - Fatal

    ignore_on_holidays

    When set, Class processing will be suspended during holidays.

    days_mask

    Use to suppress the Class on certain days, such as weekends. For example, to suppress on Saturday/Sunday, use nyyyyyn.
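
    The mask can be read as one character per day of the week. The Python sketch below checks whether a Class is active on a given date. It assumes that the first position is Sunday and the last is Saturday, which is inferred from the nyyyyyn example; the function name is hypothetical.

```python
from datetime import date


def class_active_on(days_mask, day):
    """Return True if the mask has 'y' for the weekday of `day`.

    Assumption: position 0 is Sunday and position 6 is Saturday, inferred
    from the example mask 'nyyyyyn' that suppresses Saturday and Sunday.
    """
    if len(days_mask) != 7:
        raise ValueError("days_mask must have exactly 7 characters")
    # date.weekday() returns Monday=0 .. Sunday=6; shift so that Sunday=0.
    index = (day.weekday() + 1) % 7
    return days_mask[index].lower() == "y"


print(class_active_on("nyyyyyn", date(2024, 1, 6)))  # a Saturday
print(class_active_on("nyyyyyn", date(2024, 1, 8)))  # a Monday
```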

    active_from-to

    Use hh:mm format. These fields are optional. By default, Classes are always active. When disabled, there will be no logging and no action taken for the specified Class.

    dups_interval

    A time in minutes. Any duplicate (IDENTICAL) message that arrives within this interval will be ignored.
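
    Duplicate suppression of this kind can be pictured as remembering when each message text was last accepted. The Python sketch below is an illustration only. The class name is hypothetical, and it assumes that the interval is measured from the last accepted message and that a message arriving exactly at the end of the interval is accepted.

```python
class DuplicateSuppressor:
    """Ignore identical messages that repeat within dups_interval minutes."""

    def __init__(self, dups_interval_minutes):
        self.window_seconds = dups_interval_minutes * 60
        self.last_accepted = {}  # message text -> time it was last accepted

    def should_ignore(self, message, now_seconds):
        previous = self.last_accepted.get(message)
        if previous is not None and now_seconds - previous < self.window_seconds:
            return True  # identical message inside the interval
        self.last_accepted[message] = now_seconds
        return False


suppressor = DuplicateSuppressor(dups_interval_minutes=5)
print(suppressor.should_ignore("Disk d01 is full", 0))    # first occurrence
print(suppressor.should_ignore("Disk d01 is full", 120))  # repeat after 2 minutes
```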

    help_text

    If given, the text appears on the terminal's status line (25th) when the user tabs to the message. When using a Browser, clicking on the CLASS code link will open a message box with the help_text.

    wait_for_acknowledgment

    Used to set messages that require the operator's acknowledgement.

    • 0 = No Ack. required.
    • -1 = Wait for Operator's Ack. forever.
    • N = Automatically Ack. message by AlertManager after N seconds.

    acknowledgment_escalation_time

    For alerts that require acknowledgement, you can set an escalation procedure: if an alert is not acknowledged within a certain time, AlertManager produces another alert with a different class code (acknowledgment_escalation_class). acknowledgment_escalation_time is specified in minutes. AlertManager executes the escalation feature every 2 minutes, so a message may be escalated up to 2 minutes after its assigned escalation time.

    acknowledgment_escalation_class

    The new Class code that AlertManager will use for the Escalated message.

    output_q_message

    The message that is sent to the output_q, if one is given in the sps_lam.table control record. It is also used to reply to a Client reading the database. Standard keywords can be used (see below under start_command).

    email_user_1-10

    Specify up to 10 email addresses to receive AlertManager messages.

    start_job_1-10

    Specify up to 10 jobs / process-names to be started by the Scheduler. For more information, refer to The SPS/Application Scheduler & Monitor (PCS) User's Guide.

    scheduler_env

    The name of the Scheduler's Environment that governs the started job(s). For more information, refer to The SPS/Application Scheduler & Monitor (PCS) User's Guide.

    start_command

    Commands may be started by the Logs Manager. You may include the following keywords that will be substituted at execution time:

    • @ID - The message ID
    • @from - the source of the message
    • @obj - The object's logical name
    • @class - The class code
    • @module - The module name
    • @ack - Y=Yes, N=No.
    • @time - Current time
    • @msg - The message buffer
    • @user - The user name (only for security_log msgs)
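
    Keyword substitution of this kind can be sketched in a few lines of Python. This is an illustration, not the Alert Manager's implementation: the function name is hypothetical, keywords are assumed to be case-sensitive as listed, and keywords with no value are assumed to be left unchanged.

```python
import re

# Keywords as listed above, without the leading '@'.
KEYWORDS = ("ID", "from", "obj", "class", "module", "ack", "time", "msg", "user")
_PATTERN = re.compile(r"@(" + "|".join(KEYWORDS) + r")\b")


def expand_keywords(template, values):
    """Replace each @keyword in template with values[keyword].

    Keywords that are missing from `values` are left unchanged (assumption).
    """
    def replace(match):
        key = match.group(1)
        return str(values.get(key, match.group(0)))

    return _PATTERN.sub(replace, template)


# '@time: @msg' is the default of output_q_message in the Classes Table.
print(expand_keywords("@time: @msg", {"time": "07:09:50", "msg": "Disk full"}))
```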

    minimum_reporting_interval

    A time interval, given in minutes, during which a message classified with this Class code will not be logged or executed.

    minimum_execution_interval

    A time interval, given in minutes, during which a message classified with this Class code will be logged, but the associated command (=start_command), if any, will not be executed.

    require_min_occurrences
    require_min_occurrences_interval

    The require_min_occurrences field is used to throttle execution of commands (see =start_command). The interval (require_min_occurrences_interval) is specified in minutes.

      Example:

    =require_min_occurrences               10
    =require_min_occurrences_interval      5
    
    This means that if there are 10 occurrences of the Alert within the last 5 minutes, the start_command will be executed. Otherwise (fewer than 10 messages, or messages spread over more than 5 minutes), AlertManager will only log the alert and will NOT execute the command. Be extra careful with this feature or you WILL lose important messages.
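
    The throttling rule can be illustrated with a sliding-window counter. The Python sketch below is an assumption about how such a rule could work, not the product's code; the class name is hypothetical, and whether the real counter resets after the command runs is not stated in this guide.

```python
from collections import deque


class OccurrenceThrottle:
    """Allow start_command only after enough alerts arrive within the interval."""

    def __init__(self, min_occurrences, interval_minutes):
        self.min_occurrences = min_occurrences
        self.window_seconds = interval_minutes * 60
        self.times = deque()  # arrival times (seconds) still inside the window

    def record(self, now_seconds):
        """Record one alert; return True if the command should be executed."""
        self.times.append(now_seconds)
        # Drop arrivals that are older than the interval.
        while self.times and now_seconds - self.times[0] > self.window_seconds:
            self.times.popleft()
        return len(self.times) >= self.min_occurrences


throttle = OccurrenceThrottle(min_occurrences=10, interval_minutes=5)
results = [throttle.record(t * 20) for t in range(10)]  # one alert every 20 seconds
print(results[-1])  # the 10th alert arrives within 5 minutes of the first
```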

    daily_max

    The maximum number of messages that can be recorded daily for the class code.

    color

    The unique color you wish to assign to the Class. To find out more about choosing colors, go to http://www.w3schools.com/html/html_colornames.


    The Performance Table (sps_performance.table)

    The Performance Table governs all aspects of performance monitoring and alerts. You may define multiple records, each identified by a unique logical_name. Each record may represent a different time frame (see active_from, active_to). The following meters are monitored:

    For every type of meter you may define up to 2 thresholds: Warning and Critical.

    organization: relative;
    index :          			logical_name  no_duplicates;
    fields:    
    logical_name             		char (32) var,      
    
    active_from    			char (5),
    active_to      			char (5), 
    
    critical_cpu_usage            	bin(15) default ('80'),
    warning_cpu_usage             	bin (15) default ('40'),
    critical_empty_idle           	bin (15) default ('10'),
    warning_empty_idle            	bin (15) default ('30'),
    critical_io_rate              	bin (15) default ('0'),
    critical_page_fault           	bin (15) default ('20'),
    warning_io_rate               	bin (15) default ('0'),
    warning_page_fault            	bin (15) default ('0'),
    critical_interrupts           	bin (15) default ('10'),
    warning_interrupts            	bin (15) default ('0'),
    critical_core                 	bin (15) default ('10'),
    warning_core                  	bin (15) default ('0'),
    critical_used_memory          	bin (15) default ('60'),
    warning_used_memory           	bin (15) default ('80'),
    critical_used_paging          	bin (15) default ('60'),
    warning_used_paging            	bin (15) default ('80'),
    normal_class                  	char (3) default ('NOR'),
    critical_class                	char (3) default ('CRI'),
    warning_class                 	char (3) default ('WAR');
    end;
    

    Field Definitions

    logical_name

    A unique name that identifies the Object type. For example, use "pre-market-open" for peak hours.

    active_from-to

    Use the HH:MM format to define the time window for this check. This feature allows multiple settings for peak/low periods.

    cpu_usage

    The percent CPU used by the current module during the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold.

    Example:

    CPU level is at [percent CPU]%.

    empty_idle

    The percent empty-idle used by the current module during the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold.

    Example:

    Empty Idle level is at [percent E.I.]%.

    io_rate

    The critical level of I/O rate performed by the current module during the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold.

    Example:

    I/O rate is at [I/O rate] per second

    page_fault

    The critical level of the page-fault rate on the current module in the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold. AlertManager will use the "performance_critical_class" class code and post the following message: Page Fault rate is at [PF Rate] per second.

    interrupts

    The critical level of interrupt rate performed by the current module in the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold.

    Example:

    Interrupt rate is at [Int. Rate] per second.

    core

    The critical level of percent core used by the current module in the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold.

    Example:

    Core level is at [percent Core]%.

    used_memory

    The critical level of percent-used memory pages. AlertManager will produce an alert when the meter goes beyond this threshold.

    used_paging

    The critical level of percent-used of Paging area. AlertManager will produce an alert when the meter goes beyond this threshold.

    normal_class

    The class/state code that should be assigned to the system performance object when all meters are within the allowed ranges. The code must identify a valid record in the Classes table.

    warning/critical_class

    The class/state code that should be assigned to the system performance object when one of the meters violates its threshold and an alert is produced. The Performance object will remain in the critical state until all meters are back to normal levels. The code must identify a valid record in the Classes table.
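
    The relationship between meters, thresholds and class codes can be sketched as follows. This Python fragment is illustrative only. It follows the wording above (an alert when a meter exceeds its threshold), assumes that a threshold of 0 means "not set", and uses hypothetical function names.

```python
def meter_state(value, warning, critical):
    """Classify one meter reading against its two thresholds.

    Assumption: a threshold of 0 is treated as 'not set'.
    """
    if critical and value > critical:
        return "critical"
    if warning and value > warning:
        return "warning"
    return "normal"


def performance_class(meters, record):
    """Pick the class code for the performance object from all meter states."""
    states = {
        meter_state(value,
                    record.get("warning_" + name, 0),
                    record.get("critical_" + name, 0))
        for name, value in meters.items()
    }
    if "critical" in states:
        return record["critical_class"]
    if "warning" in states:
        return record["warning_class"]
    return record["normal_class"]


record = {
    "critical_cpu_usage": 80, "warning_cpu_usage": 40,
    "critical_io_rate": 0, "warning_io_rate": 0,
    "normal_class": "NOR", "critical_class": "CRI", "warning_class": "WAR",
}
print(performance_class({"cpu_usage": 85, "io_rate": 500}, record))
```

    The object stays in the warning or critical class as long as any meter is over its threshold, and returns to normal_class only when every meter is back within range.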


    The Processes Table (sps_processes.table)

    The Processes Table governs all aspects of Process monitoring and alerts. You may define multiple records, each identified by a unique logical_name. The following conditions are monitored:

    For every type of meter you may define up to three thresholds: Warning, Severe and Critical.

    organization: relative; 
    index           : 			logical_name  no_duplicates; 
    fields:    
    logical_name        		char (32) var, 
    process_name             		char (32) var, 
    user_name           		char (65) var,      
    active_from    			char (5),
    active_to      			char (5), 
    no_of_processes     		bin(15) default ('1'),
    must_be_up_from   		char (5),
    must_be_up_to       		char (5), 
    must_be_down_from 		char (5),
    must_be_down_to     		char (5),
    critical_cpu        		bin  (15),
    critical_io_rate    		bin  (15),
    critical_memory       		bin  (15), 
    process_normal_class  		char (3) default ('NOR'),
    process_down_critical_class   	char (3) default ('CRI'),
    process_up_critical_class     	char (3) default ('CRI'),
    process_busy_critical_class   	char (3) default ('CRI');
    end;
    

    Field Definitions

    logical_name

    A unique logical name for the Object. A record whose name starts with Any_Process creates an alert if any running process exceeds the critical_cpu threshold. Multiple Any_Process records can be set for different time windows using the active_from/active_to fields, for example:

    =logical_name		Any_Process_morning
    =active_from		08:00
    =active_to		17:00
    =critical_cpu		40
     
    =logical_name		Any_Process_night
    =active_from 		17:00
    =active_to		23:00
    =critical_cpu		30
    

    process_name

    A unique name that identifies the process. The name must be identical to the process name as started by VOS. To get an accurate, up-to-date list of valid process names, execute the "list_users" command and record the process names that appear in parentheses. You may use star-names.

    user_name

    The name or the star-name (e.g. *.Operator) of the user that is executing the processes. To get an accurate, up-to-date list of valid user names, execute the "list_users" command.
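
    Star-name matching can be approximated with Python's fnmatch module, as in the sketch below. This is an approximation under stated assumptions: fnmatch also gives special meaning to ? and [ ], which VOS star-names may not, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def matches_star_name(name, star_name):
    """Return True if name matches the star-name, with * matching any text."""
    return fnmatchcase(name, star_name)


print(matches_star_name("Smith.Operator", "*.Operator"))
print(matches_star_name("Smith.SysAdmin", "*.Operator"))
```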

    active_from-to

    Use the HH:MM format to define the time window for this check. This feature allows multiple settings for peak/low periods.
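
    The effect of time windows can be illustrated with the two Any_Process records shown under logical_name. The Python sketch below selects the records that are active at a given time. It assumes that the end of a window is exclusive, that a window whose start is later than its end wraps past midnight, and that empty fields mean "always active"; none of these details is confirmed by this guide.

```python
from datetime import time


def parse_hhmm(text):
    """Convert 'HH:MM' into a datetime.time."""
    hours, minutes = text.split(":")
    return time(int(hours), int(minutes))


def in_window(now, active_from, active_to):
    """Return True if `now` falls inside the active_from/active_to window."""
    if not active_from or not active_to:
        return True  # assumption: no window given means always active
    start, end = parse_hhmm(active_from), parse_hhmm(active_to)
    if start <= end:
        return start <= now < end
    return now >= start or now < end  # assumption: window wraps past midnight


RECORDS = [
    {"logical_name": "Any_Process_morning", "active_from": "08:00",
     "active_to": "17:00", "critical_cpu": 40},
    {"logical_name": "Any_Process_night", "active_from": "17:00",
     "active_to": "23:00", "critical_cpu": 30},
]


def active_records(now):
    """Names of the records whose time window contains `now`."""
    return [r["logical_name"] for r in RECORDS
            if in_window(now, r["active_from"], r["active_to"])]


print(active_records(time(9, 30)))
```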

    no_of_processes

    The number of processes you expect to be up and running under the same user-name and process-name.

    must_be_up_from-to

    Use the hh:mm format to define the time of day the process is expected to be up and running. Note that if "must_be_up_from" and "must_be_up_to" are not specified, then AlertManager assumes that the process should be up and running at all times. When the process violates this condition, AlertManager will post a message with the "process_down_critical_class" class code along with the message:

    Process [logical_name] is not running.

    must_be_down_from-to

    Use the hh:mm format to define the time of day the process is not expected to execute. This feature is useful to cover situations where a running process may interfere with normal production activity. When the process violates this condition, AlertManager will post a message with the "process_up_critical_class" class code along with the message:

    Process [logical_name] is running; Not intended to run now.

    critical_cpu

    The Critical CPU consumption threshold, represented as the percentage of CPU used by the process(es) every interval. AlertManager will bypass this check if you do not supply a value (zero). To find what is the current consumption of a process:

    When the process violates this condition, then AlertManager will post a message with the "process_busy_critical_class" class code along with the message:

    Process [logical_name] is consuming XX %CPU.

    critical_io_rate

    The I/O rate threshold, represented as the number of disk I/Os performed by the process every second. AlertManager will bypass this check if you do not supply a value (zero). To find the current consumption of a process:

    Start "list_users -interval 10". Record the typical number of Reads (DDKR) and Writes (DDKW) the process performs. The average I/O rate is:

    (DDKR + DDKW) / interval-of-list_users (10).
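
    As a worked example of this calculation, the Python sketch below uses made-up sample counts; the function name and the numbers are hypothetical.

```python
def average_io_rate(reads, writes, interval_seconds=10):
    """Average disk I/Os per second over one list_users interval."""
    return (reads + writes) / interval_seconds


# 120 reads and 80 writes observed during a 10-second interval.
print(average_io_rate(120, 80))  # 20.0
```

    A process with these figures would violate a critical_io_rate of 15 but not one of 25.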

    When the process violates this condition, then AlertManager will post a message with the "process_busy_critical_class" class code along with the message:

    Process [logical_name] performing XX I/Os per second.

    critical_memory

    The critical level of memory utilization as a percentage of the memory pages on the system.

    process_normal_class

    The class/state code that should be assigned to the monitored process when the process is in its Normal state. The code must identify a valid record in the Classes table.

    process_down_critical_class

    The class/state code that should be assigned to the monitored process when the process is in its Critical state, specifically when it is NOT running as expected. The code must identify a valid record in the Classes table. When a process is in violation of this condition, AlertManager produces the following message:

    Process [logical name] is not running.

    process_up_critical_class

    The class/state code that should be assigned to the monitored process when the process is in its Critical state, specifically when it is running during times when it is scheduled to be down. The code must identify a valid record in the Classes table. When a process is in violation of this condition, AlertManager produces the following message:

    Process [logical name] is running; Not intended to run now.

    process_busy_critical_class

    The class/state code that should be assigned to the monitored process when the process is in its Critical state, specifically when it is violating its performance related thresholds.


    The Queues Table (sps_qmon.table)

    The Queues Table governs all aspects of monitoring one-way, two-way and message queues. You may define multiple records, each identified by a unique logical_name.

    organization: relative;
    index: 			logical_name no_duplicates;
    fields:    
    logical_name     		char (32) var,   
    q_path           		char (256) var, 
    active_from    		char (5),  
    active_to      		char (5),
    critical_pending_alert 	bin (15),
    warning_pending_alert 	bin (15),
    normal_class        	char (3) default ('NOR'),
    critical_class      	char (3) default ('CRI'),
    warning_class		char (3) default ('WAR');
    end;
    

    Field Definitions

    logical_name

    A unique logical name for the Object.

    q_path

    The relative or full path-name of the monitored queue.

    active_from-to

    Use the HH:MM format to define the time window for this check. This feature allows multiple settings for peak/low periods.

    pending_alert

    Set the "Pending Message count" threshold. An alert will be produced when the queue's actual number of pending messages exceeds this limit.

    normal_class

    The class code that AlertManager assigns to monitored queues that are in the normal state. The code must identify a valid record in the Classes table.

    warning/critical_class

    The class code that AlertManager assigns to monitored queues that are in the critical state. The code must identify a valid record in the Classes table. When a monitored queue is in violation of its threshold, AlertManager produces the following message:

    Pending message count on [logical name] [# of msgs.].


    The File Watchdog Table (sps_fwd.table)

    The File Watchdog Table governs all aspects of monitoring the creation of files during certain timeframes. This is useful if your system receives files from different hosts and requires special processing at that time. You may define multiple records, each identified by a unique logical_name.

    organization: relative; 
    index: 			logical_name  no_duplicates;
    fields:    
    logical_name                char (32) var,      
    active_from                 char (5),       
    active_to                   char (5),    
    input_path                  char (256) var,  
    must_arrive                 bit (1) aligned,
    allow_lockers               bit (1) aligned,
    normal_class                char (3) default ('NOR'),
    file_arrived_class          char (3) default ('FAR'),
    critical_class              char (3) default ('CRI');
    end;
    

    Field Definitions

    logical_name

    A unique logical name for the Object.

    active_from-to

    Use the HH:MM format to define the time window for this check. This feature allows multiple settings for peak/low periods.

    input_path

    The relative or full path-name of the monitored file.

    must_arrive

    Set to "1" if file must appear within the specified time window. If the file does not arrive within the specified window, AlertManager will log and execute the critical_class class-code.

    allow_lockers

    By default, AlertManager will not log or execute the file_arrived_class as long as one or more processes are locking the file. When the switch is set, AlertManager will execute the file_arrived_class regardless of any lockers.

    normal_class

    A class code that AlertManager should set for the object while waiting for the file to arrive.

    file_arrived_class

    A class code that AlertManager should log and execute upon file arrival. Important: It is the responsibility of this class-code to delete or rename the file to prevent unnecessary executions.

    critical_class

    The class/state code that should be assigned to the File Monitor object when the file does not get created within the required time window provided that must_arrive is set.


    The Disks Table (sps_disks.table)

    The Disks Table governs all disk space monitoring and alerts. You may define multiple records, each identified by a unique logical_name. The following conditions are monitored:

    organization: relative;
    index :          			logical_name  no_duplicates;
    fields:    
    logical_name             		char (32) var,      
    path                		char (256) var,
    active_from    			char (5),          
    active_to      			char (5),          
    critical_percent_used    		bin  (15) default ('90'),
    warning_percent_used    		bin  (15) default ('80'),
    critical_io_rate         		bin  (15) default ('0'),
    warning_io_rate         		bin  (15) default ('0'),
    critical_read_busy      		bin  (15) default ('60'),
    warning_read_busy      		bin  (15) default ('40'),
    critical_write_busy     		bin  (15) default ('60'),
    warning_write_busy      		bin  (15) default ('40'),
    file_max_blocks          		bin  (31),
    dir_max_objects          		bin  (15),
    normal_class        		char (3) default ('NOR'),
    critical_class      		char (3) default ('CRI'),
    warning_class       		char (3) default ('WAR');
    end;
    

    Field Definitions

    logical_name

    A unique logical name for the Object.

    path

    The actual name of the disk (%sys#d01, %sys#d02). Paths can also be any directory or file on the system. In the case of a file, AlertManager will create an alert when the number of blocks exceeds either the warning_percent_used or the critical_percent_used meter, measured against =file_max_blocks (star-names are supported). In the case of a directory, AlertManager will create an alert when the number of objects (files, links, dirs) exceeds either the warning_percent_used or the critical_percent_used meter, measured against =dir_max_objects.

    active_from-to

    Use the HH:MM format to define the time window for this check. This feature allows multiple settings for peak/low periods.

    critical_percent_used

    In the case of a disk - the percent of used space. When the disk pack is in violation of this condition, AlertManager produces the following message:

    Disk [logical name] is at [percent used]% used.

    In the case of a file - the percent of disk blocks used compared to =file_max_blocks. When the file is in violation of this condition, AlertManager produces the following message:

    File [logical name] is NNN blocks - NN% of its allowed size.

    In the case of a directory - the percent of number of objects compared to =dir_max_objects. When the directory is in violation of this condition, AlertManager produces the following message:

    Dir [logical name] has NNN objects - NN% of its allowed limit.
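
    For a monitored file, the percentage is the file's block count relative to =file_max_blocks. The Python sketch below shows one way to derive the alert level and the message text. The function name is hypothetical, the defaults are taken from the table definition (80 and 90), and both the integer rounding and the use of "exceeds" rather than "reaches" are assumptions.

```python
def file_size_alert(logical_name, blocks, file_max_blocks,
                    warning_percent_used=80, critical_percent_used=90):
    """Return (level, message) if the file is over a threshold, else None."""
    percent = 100 * blocks // file_max_blocks  # integer percent (assumption)
    if percent > critical_percent_used:
        level = "critical"
    elif percent > warning_percent_used:
        level = "warning"
    else:
        return None
    message = (f"File {logical_name} is {blocks} blocks - "
               f"{percent}% of its allowed size.")
    return level, message


print(file_size_alert("trade_log", 950, 1000))
```

    The same arithmetic applies to a directory, with the object count measured against =dir_max_objects.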

    critical_io_rate

    The I/O rate performed on any given disk on the current module during the current interval. AlertManager will produce an alert when the system meter exceeds the allowed threshold. When a disk pack is in violation of this condition, AlertManager produces the following message:

    Disk [logical name] is busy at [# of I/Os] I/Os per second.

    critical_read_busy

    A threshold for the disk's %read-busy meter.

    critical_write_busy

    A threshold for the disk's %write-busy meter.

    file_max_blocks

    A threshold for file size, given in blocks, when monitoring a file. When the file exceeds the given threshold (actual number of blocks reaches the warning/critical% used settings), AlertManager produces the following message:

    File [logical name] is NNN blocks - NN% of its allowed size.

    dir_max_objects

    A threshold for maximum number of objects (files,links,dirs) that is expected for a directory. When the directory exceeds the given threshold (actual number of objects reaches the warning/critical% used settings), AlertManager produces the following message:

    Dir [logical name] has NNN objects - NN% of its allowed limit.

    normal_class

    The class code that AlertManager assigns to monitored disks that are in the normal state. The code must identify a valid record in the Classes table.

    warning/critical_class

    The class code that AlertManager assigns to monitored disks that are in the critical state. The code must identify a valid record in the Classes table.

    The Files Table (sps_files.table)

    The Files Table lists the files that AlertManager should monitor.

    organization: relative;
    index :          		logical_name  no_duplicates;
    fields:    
    logical_name     		char (32)   var,  
    input_path       		char (256)  var, 
    input_event_path 		char (256)  var,  
    input_ip_addr       	char (32) var,
    input_ip_port       	bin (15),
    start_pos        		bin  (15),   
    end_pos			bin  (15),
    tcam_log			bit (1) aligned,    
    
    classify_once		bit (1) aligned,    
    user_exit			char (32) var,
    use_filter_01   		char (32) var,    
    use_filter_02   		char (32) var,    
    use_filter_03   		char (32) var,    
    .
    .
    use_filter_16   		char (32) var,
    include_classes  		char (256) var,
    exclude_classes  		char (256) var;
    end;
    

    Field Definitions

    logical_name

    A unique logical name for the Object.

    input_path

    The path of a file or a 1-way-server-q from which AlertManager will read incoming messages. You may use star-names (e.g. xxx*.out). AlertManager will automatically use and switch to the most recently created file that matches the star-name. If possible, avoid using star-names; use them only when a time-stamp is part of the file name(s).

    input_event_path

    The path of an event file that is associated with the input file. For the syserr_log. files, the event file is syserr_log_event. This field is optional; if the user does not specify an event file, AlertManager will use the interval parameter.

    input_ip_addr

    The TCP/IP address to use if Alert Manager is required to read alerts from another computer. This could be another system or a networked Stratus running AlertManager.

    input_ip_port

    The TCP/IP port number to use if Alert Manager is required to read alerts from another computer. This could be another system or a networked Stratus running AlertManager.

    start_pos

    Specifies the position from which AlertManager should examine the messages. The VOS syserr_log files have a time stamp in the first 8 positions, followed by two spaces. The user should therefore set the starting position to 11.

    end_pos

    Specifies the last position that AlertManager should use when examining the contents of messages. By default, the server will use the entire message buffer.
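
    The effect of start_pos and end_pos can be pictured as taking a slice of each message. The Python sketch below assumes that positions are 1-based and inclusive and that an end_pos of 0 (or none) means "through the end of the message"; the function name and the sample line are hypothetical.

```python
def examined_text(message, start_pos=1, end_pos=None):
    """Return the part of the message between start_pos and end_pos.

    Assumption: positions are 1-based and inclusive.
    """
    start = max(start_pos, 1) - 1
    if end_pos:
        return message[start:end_pos]
    return message[start:]


line = "07:09:50  Disk d01 is full"  # 8-character time stamp, then two spaces
print(examined_text(line, start_pos=11))
```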

    tcam_log

    When set (on TCAM system only), AlertManager will strip off leading data fields of the message that are not necessary for display.

    Example:

    07:09:50 CTPS: (TPServer_1) [task 5] tp$server.pm :%INFO- Session starting.

    AlertManager will log:

    TPServer_1(i): Session starting.

    classify_once

    If classify_once is set (the default), each message is classified only once: as soon as the message complies with a filter, subsequent filters are not evaluated. Using this feature may simplify the configuration process and is therefore recommended. When using this method, make sure that the more important filters (e.g. Fatal, Critical) appear first, before any information-level filters.

    If classify_once is NOT set, all filters will be examined. With this approach, a message can comply with more than one filter, in which case it is classified more than once.
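
    The difference between the two modes can be shown with a short Python sketch. It is illustrative only: the function name is hypothetical, and each filter is represented here as a pair of a class code and a test function rather than a real Filter record.

```python
def classify(message, filters, classify_once=True):
    """Return the class codes assigned to a message.

    `filters` is an ordered list of (class_code, test) pairs, where test is
    a function that returns True if the message complies with the filter.
    """
    classes = []
    for class_code, test in filters:
        if test(message):
            classes.append(class_code)
            if classify_once:
                break  # first match wins; later filters are not evaluated
    return classes


filters = [
    ("CRI", lambda m: "error" in m),  # more important filter listed first
    ("INF", lambda m: True),          # catch-all information filter
]
print(classify("disk error", filters))
print(classify("disk error", filters, classify_once=False))
```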

    use_filter_01..16

    The use_filter_XX fields are links to Filter records (see the Filters Table). Filters are used to classify messages. You may use up to 16 filters.

    include_classes, exclude_classes

    Include/exclude are used ONLY when processing an AlertManager log on a remote module. These are lists of class codes separated by commas that you wish to be included or ignored (exclude_classes) by the "central" AlertManager Server.

    an_alert_manager_input

    Set to 1 (yes) if this is an incoming flow of messages that originate from a remote AlertManager server.


    The Filters Table (sps_filters.table)

    The Filters Table allows the user to control which messages are included and which should be filtered out.

    organization: relative;
    index :          		filter_name  no_duplicates;
    fields:    
    filter_name      		char (32)  var, 
    class_code       		char (3),
    match_and_01..16     	char (32)  var,  
    match_or_01..16      	char (32)  var,  
    omit_and_01..16      	char (32)  var, 
    omit_or_01..16       	char (32)  var,  
    msg_prefix       		char (32) var,   
    caseless         		bit  (1) aligned;
    end;
    

    Field Definitions

    filter_name

    A unique name that identifies the Filter.

    class_code

    A code that links the Filter to a Class record. This field is required. AlertManager uses it to classify messages and take certain actions.

    match_and_01..16

    All match_and strings must exist in the message, otherwise AlertManager will ignore the message.

    match_or_01..16

    At least one of the match_or strings must exist in the message, otherwise AlertManager will ignore the message.

    omit_and_01..16

    AlertManager will ignore messages if all omit_and keywords appear in the message.

    omit_or_01..16

    AlertManager will ignore messages if at least one of the omit_or keywords appears in the message.

    msg_prefix

    The user may specify a message prefix. AlertManager will add this text to the beginning of each incoming message. This may be useful for easy message identification.

    caseless

    A switch that defines the case sensitivity. Set to 0 for case sensitive matching (default).
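
    Taken together, the match, omit and caseless fields can be read as the test sketched below in Python. This is one interpretation of the field definitions above, not the product's code: the function name is hypothetical, an empty list is assumed to impose no condition, and matching is assumed to be plain substring search.

```python
def filter_accepts(message, match_and=(), match_or=(),
                   omit_and=(), omit_or=(), caseless=False):
    """Return True if the message complies with the filter's strings."""
    if caseless:
        message = message.lower()
        match_and, match_or, omit_and, omit_or = (
            [s.lower() for s in group]
            for group in (match_and, match_or, omit_and, omit_or)
        )
    if not all(s in message for s in match_and):
        return False  # every match_and string must be present
    if match_or and not any(s in message for s in match_or):
        return False  # at least one match_or string must be present
    if omit_and and all(s in message for s in omit_and):
        return False  # ignored when all omit_and strings are present
    if any(s in message for s in omit_or):
        return False  # ignored when any omit_or string is present
    return True


print(filter_accepts("Disk d01 write error", match_and=["Disk", "error"]))
print(filter_accepts("Disk d01 write error", omit_or=["d01"]))
```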


    The WEB Table (sps_web.table)

    The WEB Table lists all reports, configuration, log files and directory listings that should be made available via the WEB/Browser interface.

    organization:  relative;
    index       :  logical_name        no_duplicates;
    fields      :
    logical_name   char (32) var,
    category       char (32) var,
    report_path    char (256) var,
    notes          char (256) var,
    acl_name       char (32) var;
    end;
    

    Field Definitions

    logical_name

    A unique name that identifies the Report.

    category

    User-defined categories, for example: reports, configuration, release_notes etc.

    report_path

    The full path name of the report file or files. Star-names are supported but are limited to 1024 files.

    notes

    Notes, description, instructions etc.

    acl_name

    The name of an FTR control file that governs the user's access to the item. This is best explained by an example. Consider the following entry:

    /
    =logical_name       my_file
    =category           special
    =report_path        #d01>special>file_1
    =notes              This is my file.
    =acl_name           security_1
    

    In the SPS>alert_manager directory, create a security_1.ftr file. If the current user has read or write access to the security_1.ftr file, the item is displayed on the list; otherwise it is not. This provides an easy way to control privileges based on the user-id.
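
    The visibility rule can be illustrated with a small Python sketch. It is an analogy only: the visible_items and default_has_access functions are hypothetical, os.access stands in for the VOS access check, and the sketch assumes that an entry without an acl_name is always shown.

```python
import os
from typing import Callable, Dict, List


def default_has_access(acl_path: str) -> bool:
    """Stand-in access check: read OR write access to the control file."""
    return os.access(acl_path, os.R_OK) or os.access(acl_path, os.W_OK)


def visible_items(entries: List[Dict[str, str]], acl_dir: str,
                  has_access: Callable[[str], bool] = default_has_access
                  ) -> List[str]:
    """Return the logical names the current user is allowed to see."""
    shown = []
    for entry in entries:
        acl = entry.get("acl_name", "")
        # Assumption: entries without an acl_name are unrestricted.
        if not acl or has_access(os.path.join(acl_dir, acl + ".ftr")):
            shown.append(entry["logical_name"])
    return shown
```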


    The Reports Table (sps_reports.table)

    The Reports Table is used to define frequently-used reports. Each report is identified with a unique logical name.
    report                        char (32) var,       
    from                          char (32) var,
    to                            char (32) var,
    include_class_NN              char (3),
    exclude_class_NN              char (3),
    sources_NN                    char (32) var,
    modules_NN                    char (32) var,
    match                         char (256) var,
    output_path                   char (256) var,
    email                         char (256) var;
    end;
    

    Field Definitions

    report

    A unique name that identifies the Report. To produce a report, execute:

    alert_manager_server.pm -command report -name [report-name]

    from

    A standard VOS date-time that defines the start of the date/time selection range.

    to

    A standard VOS date-time that defines the end of the date/time selection range.

    include_class_NN

    A list of one or more classes to be included in the report. Star-names may be used.

    exclude_class_NN

    A list of one or more classes to be excluded from the report. Star-names may be used.

    sources_NN

    You can specify any number of Sources from which messages originated.

    modules_NN

    You can specify any number of module names from which messages originated.

    match

    If you choose a match-string, then only messages that contain the string will appear in the report.

    output_path

    The relative or full path name of the report file.

    email

    You can choose to send the report via your E-Mail Server to selected users. Simply enter their email nick-names as defined in the sps_email_server.table.
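
    The selection criteria above can be sketched in Python as follows. This is an illustration of the documented fields, not the server's actual code. The Message and Report classes and the selected function are hypothetical, fnmatchcase is used as an approximation of VOS star-name matching, and the sketch assumes that an empty criterion selects everything.

```python
from dataclasses import dataclass, field
from datetime import datetime
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class Message:
    """Hypothetical logged message."""
    when: datetime
    class_code: str
    source: str
    module: str
    text: str


@dataclass
class Report:
    """Hypothetical in-memory form of one sps_reports record."""
    start: Optional[datetime] = None                         # "from"
    end: Optional[datetime] = None                           # "to"
    include_class: List[str] = field(default_factory=list)   # include_class_NN
    exclude_class: List[str] = field(default_factory=list)   # exclude_class_NN
    sources: List[str] = field(default_factory=list)         # sources_NN
    modules: List[str] = field(default_factory=list)         # modules_NN
    match: str = ""


def selected(rep: Report, msg: Message) -> bool:
    """Return True if the message belongs in the report."""
    if rep.start is not None and msg.when < rep.start:
        return False
    if rep.end is not None and msg.when > rep.end:
        return False
    # Star-names: approximated here with shell-style patterns.
    if rep.include_class and not any(
            fnmatchcase(msg.class_code, p) for p in rep.include_class):
        return False
    if any(fnmatchcase(msg.class_code, p) for p in rep.exclude_class):
        return False
    if rep.sources and msg.source not in rep.sources:
        return False
    if rep.modules and msg.module not in rep.modules:
        return False
    if rep.match and rep.match not in msg.text:
        return False
    return True
```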


    The E-Mail Facility

    The Email Facility sends selected reports and performance graphs to you and your staff directly from your Stratus. Automated report and graph generators with Email capabilities are integrated into all SPS reports.

    Define your Email Server in SPS>alert_manager>sps_email_server.tin and then create the table.

    organization:            relative;
    index       :            server_name no_duplicates;
    fields      :
    server_name              char (32) var,
    server_ip_address        char (32) var,
    server_port_number       bin (15) default ('0'),
    user_name                char (32) var,
    password                 char (32) var,
    use_domain               char (32) var default ('softmark.com'),
    nick_name_XX             char (32) var,
    e_address_XX             char (256) var,
    debug                    bit (1) aligned default ('0'),
    trace                    bit (1) aligned default ('0');
    end;
    

    server_name

    A logical name for the Email server. The default settings should not be changed.

    server_ip_address

    The IP address of the Email server. You might need to get this from your Network manager.

    server_port_number

    The port number of the Email server. You might need to get this from your Network manager.

    user_name

    The user_name is only required if the Email server requires authentication.

    password

    The password is only required if the Email server requires authentication.

    use_domain

    The domain name from which the message will arrive.

    nick_name_XX

    Nick-names for the persons authorized to receive reports and alerts from the system.

    e_address_XX

    The corresponding email address for every nick-name that was defined.

    debug

    Set to 1 if you need to troubleshoot this service.

    trace

    Set to 1 if you need to troubleshoot this service.

    Example:

    /
    =server_name                  sps_mail_server
    =server_ip_address            outgoing.myserver.net
    =user_name                    Bob
    =password                     my_pass123
    =nick_name_01                 Joe
    =e_address_01                 joe@mycompany.com
    =trace                        0
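
    To show how nick_name_XX and e_address_XX pair up, here is a minimal Python sketch that reads a record in the =field value layout used above and builds a nick-name to address map. It is an illustration only: parse_record and recipients are hypothetical helpers, and the real table loader may treat quoting and separators differently.

```python
import re
from typing import Dict


def parse_record(text: str) -> Dict[str, str]:
    """Parse one record written as '=field value' lines."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("="):
            continue  # skip the "/" record marker and blank lines
        parts = line[1:].split(None, 1)
        if not parts:
            continue  # a bare "=" carries no field name
        fields[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return fields


def recipients(fields: Dict[str, str]) -> Dict[str, str]:
    """Map each nick_name_XX to the e_address_XX with the same suffix."""
    result: Dict[str, str] = {}
    for name, nick in fields.items():
        m = re.fullmatch(r"nick_name_(\d+)", name)
        if m:
            address = fields.get("e_address_" + m.group(1))
            if address:
                result[nick] = address
    return result
```

    A nick-name without a matching e_address_XX is left out of the map, so a report addressed to it would have no destination.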
    


    The AlertManager-Netview/Tivoli interface

    Alert Manager allows you to send alarms and notifications to UNIX/Netview/Tivoli/Openview. You can choose to send all messages or restrict the message flow to a few selected Class Codes. Messages can be formatted using special keywords as follows:

    Example:

    If your Class definition (sps_classes.tin) includes:

    =output_q_message '@time: @msg'

    then your messages on UNIX/Linux will look like the following example:

    Oct 7 14:40:08 localhost 14:40:58: All networked modules are online.
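
    The keyword substitution in the example can be sketched in Python as follows. This is only an illustration: format_alert is a hypothetical function, it handles just the two keywords that appear in the example (@time and @msg), and the HH:MM:SS time format is taken from the sample output above.

```python
import re
from datetime import datetime


def format_alert(template: str, msg: str, when: datetime) -> str:
    """Expand @time and @msg in an output_q_message template."""
    values = {"time": when.strftime("%H:%M:%S"), "msg": msg}
    # A single pass, so keyword-like text inside msg is not expanded again.
    return re.sub(r"@(time|msg)", lambda m: values[m.group(1)], template)


# format_alert("@time: @msg", "All networked modules are online.",
#              datetime(2024, 10, 7, 14, 40, 58))
# returns "14:40:58: All networked modules are online."
```

    The leading "Oct 7 14:40:08 localhost" in the sample line is not part of the template; it is presumably added by the receiving UNIX/Linux logger.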

    Installation:

    In SPS>alert_manager>alert_manager.tin, define the following fields, where =output_ip_addr is the IP address of your UNIX box and =output_ip_port is the port number you have assigned to the LamsClient program.

    =output_ip_addr                XXX.XX.XX.XX
    =output_ip_port                514
    =output_ip_udp                 1
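
    Before pointing Alert Manager at a production collector, it can help to confirm that datagrams actually reach the UNIX box. The Python sketch below is a throwaway UDP listener for that purpose, not a replacement for LamsClient. The open_listener and receive_one names are hypothetical, and the payload is shown as received because its exact layout is not described here.

```python
import socket


def open_listener(host: str = "0.0.0.0", port: int = 514,
                  timeout: float = 30.0) -> socket.socket:
    """Bind a UDP socket on the address given in output_ip_addr/port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)  # raise socket.timeout instead of waiting forever
    sock.bind((host, port))
    return sock


def receive_one(sock: socket.socket, bufsize: int = 4096) -> str:
    """Wait for one datagram and return it as text."""
    data, _sender = sock.recvfrom(bufsize)
    return data.decode("ascii", errors="replace")


# Typical use on the UNIX box:
#   listener = open_listener(port=514)
#   print(receive_one(listener))
#   listener.close()
```

    On most UNIX systems, binding to a port below 1024 (such as 514) requires root privileges, so a higher port may be more convenient for a quick test.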
    


    Setup instructions

    In the sps_files.table you must include the following entry.

    /
    =logical_name       HTML_REPORTS
    =input_ip_addr      0.0.0.0
    =input_ip_port      5001
    =input_http         1
    

    Once you start the Alert Manager Server, you can get all your reports by using the following URL:

    http://[XX.XX.XXX.XXX]:[NNNN]/AlertManager

    XX.XX.XXX.XXX   The IP address of your Stratus
    NNNN            The port assigned in sps_files

    Example:

    http://68.161.237.188:1024/AlertManager
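
    A quick way to verify the setup from a workstation is to build the URL from the two values and request it. The Python sketch below is an assumption-laden convenience check: console_url and is_reachable are hypothetical helpers, and the check assumes the server answers a plain GET without prior login.

```python
import http.client
from urllib.request import urlopen


def console_url(ip_address: str, port: int) -> str:
    """Build the report URL from the Stratus address and the sps_files port."""
    return f"http://{ip_address}:{port}/AlertManager"


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a success or redirect status."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Covers refused connections, timeouts and HTTP error responses.
        return False


# console_url("68.161.237.188", 1024)
# returns "http://68.161.237.188:1024/AlertManager"
```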

      To improve response time, add the following line as the first line of your start_up.cm:

        &if (index (process_info process_name) 'AlertManager') ^= 0 &then &return


    Using Environments

    Environments are complete or partial sets of configuration files. Using multiple Environments allows you to run multiple instances of the AlertManager server, each working with its own separate configuration set (Environment).

    Example: Let's say you wish to create your own QA environment with your own configuration files, alerts etc. To keep it simple, let's just say that you wish to monitor your own queues. All you need to do is:

    1. Create your own table sps_qmon_QA.tin; create the table using sps_qmon.dd.
    2. Create your own table sps_files_QA.tin; change the port number so that you will not interfere with the main Server; create the table using sps_files.dd.
    3. Start the server for the new Environment: start_alert_manager.cm QA
    The following table can be set for specific environments: