Software User's Guide
The SPS/Alert Manager

Table of contents.

  1. Introduction
  2. Definitions
    1. Objects
    2. Classes
    3. Thresholds
    4. Time Windows
    5. Intervals
    6. Filters
    7. Acknowledgements
    8. Escalation
    9. Carry-Over
    10. Special Keywords
    11. WEB Reports
    12. Features
    13. Environments
  3. Alert Manager Components
  4. Default Settings
  5. The alert_manager_server.pm program
  6. The Alert Manager Console
  7. The Alerts Monitor
  8. The System Usage Monitor
  9. The Disk Space Monitor
  10. The Grouped List-Users Monitor
  11. The List-Users Monitor
  12. The Queue Monitor
  13. The Batch Monitor
  14. The Network Monitor
  15. The Explorer Menu
  16. The Main Control Table (alert_manager.table)
    1. module_nick_name
    2. theme
    3. other_module_nick_name_xx
    4. other_module_ip_addr_xx
    5. output_device
    6. max_allowed_subprocesses
    7. max_carry_over
    8. network_normal_class
    9. network_critical_class
    10. user_logins_class
    11. priv_command_class
    12. netstat_device_NN
    13. netstat_critical
    14. netstat_warning
    15. netstat_normal_class
    16. netstat_critical_class
    17. netstat_warning_class
    18. netstat_options
    19. netstat_states
    20. send_to_syslog_ip
    21. send_to_syslog_port
    22. syslog_udp
    23. send_to_am_q
    24. disks_interval
    25. performance_interval
    26. processes_interval
    27. qmon_interval
    28. fwd_interval
    29. files_interval
    30. tcp_interval
    31. web_session_timeout
    32. web_command_timeout
    33. scaling_factor
    34. web_xxx_interval
    35. web_no_lines
    36. web_lui_no_lines
    37. web_show_scrollbar
    38. web_initial_menu
    39. web_require_vos_pass
    40. web_console_menu_text
    41. web_console_menu_command
    42. drms_options
    43. max_processes
    44. retain_logs
    45. trace
  17. The Classes Table (sps_classes.table)
    1. class_code
    2. days_mask
    3. active_from-to
    4. dups_interval
    5. help_text
    6. wait_for_acknowledgment
    7. acknowledgment_escalation_time
    8. acknowledgment_escalation_class
    9. output_q_message
    10. email_user_1-10
    11. start_job_1-10
    12. scheduler_env
    13. start_command
    14. minimum_reporting_interval
    15. minimum_execution_interval
    16. require_min_occurrences
    17. require_min_occurrences_interval
    18. daily_max
    19. send_to_remote_am
    20. color
  18. The Performance Table (sps_performance.table)
    1. logical_name
    2. active_from-to
    3. cpu_usage
    4. empty_idle
    5. io_rate
    6. page_fault
    7. interrupts
    8. core
    9. used_memory
    10. used_paging
    11. normal_class
    12. warning/critical_class
  19. The Processes Table (sps_processes.table)
    1. logical_name
    2. process_name
    3. user_name
    4. active_from-to
    5. no_of_processes
    6. must_be_up_from-to
    7. must_be_down_from-to
    8. critical_cpu
    9. critical_io_rate
    10. critical_memory
    11. process_down_critical_class
    12. process_up_critical_class
    13. process_busy_critical_class
  20. The Queues Table (sps_qmon.table)
    1. logical_name
    2. q_path
    3. active_from-to
    4. pending_alert
    5. normal_class
    6. warning/critical_class
  21. The File Watchdog Table (sps_fwd.table)
    1. logical_name
    2. active_from-to
    3. input_path
    4. must_exist
    5. must_not_exist
    6. must_arrive
    7. allow_lockers
    8. normal_class
    9. file_arrived_class
    10. critical_class
  22. The Disks Table (sps_disks.table)
    1. logical_name
    2. path
    3. active_from-to
    4. critical_percent_used
    5. critical_io_rate
    6. critical_read_busy
    7. critical_write_busy
    8. file_max_blocks
    9. dir_max_objects
    10. normal_class
    11. warning/critical_class
  23. The Files Table (sps_files.table)
    1. logical_name
    2. input_path
    3. input_ip_addr
    4. input_ip_port
    5. start_pos
    6. end_pos
    7. tcam_log
    8. toggle_status_filters
    9. use_filter_01..16
  24. The Filters Table (sps_filters.table)
    1. filter_name
    2. class_code
    3. match_and_01..16
    4. match_or_01..16
    5. omit_and_01..16
    6. omit_or_01..16
    7. msg_prefix
    8. caseless
  25. The Explorer Table (sps_explorer.table)
    1. logical_name
    2. category
    3. report_path
    4. notes
    5. acl_name
  26. The Reports Table (sps_reports.table)
    1. report
    2. from
    3. to
    4. include_class_NN
    5. exclude_class_NN
    6. sources
    7. modules
    8. match
    9. output_path
    10. email
  27. The E-Mail Facility
    1. server_name
    2. server_ip_address
    3. server_port_number
    4. user_name
    5. password
    6. use_domain
    7. nick_name_xx
    8. e_address_xx
    9. debug
    10. trace
  28. The AlertManager-Netview/Tivoli interface
  29. Setup instructions
  30. Using Environments


Introduction

While operating modern sophisticated systems, it is extremely difficult, if not impossible, to notice and resolve all system and application events as they occur. Applications may produce any number of logs, errors and output files. The operating system alone may generate thousands of messages every day. Often, serious production problems may be prevented if alerts are noticed and responded to in time.

The SPS/Alert Manager is designed to solve this common problem. It monitors system elements, reads messages originating from various systems, modules and applications and funnels the information to a single terminal. Message filters assure that only important information is passed to the operator. Critical alerts cannot be ignored; messages will remain posted on the screen until the problem is attended to, acknowledged and given a proper entry by the operator.

The SPS/Alert Manager automatically executes user-defined recovery and corrective procedures and submits critical information to selected terminals, E-Mail users or other presentation platforms.


Definitions

Objects

An object is any monitored element on the system, for example CPU level, disk space, a file or a communication line. Each object is identified by a unique logical name.

Classes (Alerts)

Classes identify the severity level and define what actions the Server must take for specific conditions. AlertManager assigns each message a Class code. A Class code can also be referred to as the state of a monitored Object or simply as an Alert. For example, the CPU level can change from Normal (NOR) to Warning (WAR) and then back to Normal. The user may define any number of new Class codes and use them throughout the system.

Access to certain Classes may be restricted on a per-user basis. To restrict access to a Class code, create a [class-code].ftr file in the alert_manager directory and set standard access rights (give/remove_access) to determine who has access to this Class.

Thresholds

A Threshold is a set of numeric values used to indicate a certain condition. Most Objects have three threshold levels: normal, warning and critical.

Time Windows

A Time Window is a period during which a known set of Thresholds applies. Time Windows allow the use of different Thresholds depending on the expected system load.

Intervals

Intervals are short timeframes, specified in seconds or minutes, between two monitored events.

Filters

Filters are predefined sets of keywords and phrases. Filters are used to process operating system and application messages that may appear in any file on the system. Filters determine how messages are classified and treated.

Acknowledgements

Certain important conditions may be configured to require the operator's Acknowledgement. AlertManager audits this activity and records the name of the operator and the time of the acknowledgement.

Escalation

Messages that require the operator's acknowledgement may be escalated to a higher severity level if they are not acknowledged in time, thus triggering different actions.
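The escalation rule can be pictured with a short Python sketch. It is an illustration only, not AlertManager code: the Alert record and the escalate_if_overdue helper are hypothetical names, the escalation time is assumed to be expressed in minutes, and a value of 0 is assumed to mean that escalation is disabled.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class Alert:
    """Hypothetical in-memory view of a logged message."""
    class_code: str
    posted_at: datetime
    acknowledged_by: Optional[str] = None


def escalate_if_overdue(alert, now, escalation_minutes, escalation_class):
    """Return the alert, re-classed if it stayed unacknowledged too long.

    Assumptions: the time is in minutes and 0 (or a blank class) disables
    escalation.
    """
    if escalation_minutes <= 0 or not escalation_class:
        return alert
    overdue = now - alert.posted_at >= timedelta(minutes=escalation_minutes)
    if alert.acknowledged_by is None and overdue:
        return replace(alert, class_code=escalation_class)
    return alert


posted = datetime(2024, 1, 1, 9, 0)
warning = Alert("WAR", posted)
later = posted + timedelta(minutes=15)
print(escalate_if_overdue(warning, later, 10, "CRI").class_code)
```

Because the escalated message carries a different Class code, whatever actions are configured for that Class would then apply to it.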

Carry-Over

Every day at midnight, AlertManager creates a new daily log file. Carry-Over is the process, performed at this time, of copying all unacknowledged messages to the new daily log file.

Special Keywords

AlertManager uses special keywords which are replaced during run-time when recording Alerts and when building command lines for execution (see =start_command). The following Special Keywords are used:

WEB Reports

Alert Manager can be configured to deliver reports, out files, configuration tables and directory listings directly to your web browser. For more information on this web-server functionality, see The Explorer Table (sps_explorer.table).

Features

Features allow the SysAdmin to tailor the Alert-Manager interface by enabling and disabling different features and functions based on the user name. Features are enabled or disabled by setting up standard access rights (give-access) to the FTR (feature) files. For example: you can remove John's access to the Scheduler interface by removing the Scheduler Tab as follows:

      give_access null Scheduler_tab.ftr -user John_Smith.*

A user can be limited to a single function. For example, Tim might need access only to the Scheduler interface. In this case, Tim will have null access to all FTR files and read access to the Scheduler_tab.ftr and go directly to the Scheduler window as soon as he logs into the system.

Features files:

Alerts_Monitor_tab.ftr
Batch_tab.ftr
cpu_performance_box.ftr
other_modules_box.ftr
Disk_Usage_tab.ftr
drms_operations_menu_box.ftr
DRMS_tab.ftr
Explorer_tab.ftr
List_Users_tab.ftr
Menu_System_tab.ftr
Network_tab.ftr
operations_menu_box.ftr
performance_graphs.ftr
process_breakdown_box.ftr
Queue_Monitor_tab.ftr
recent_status_box.ftr
Scheduler_tab.ftr
System_Usage_tab.ftr
vos_daily_logs_box.ftr

Features can also be used to designate certain Classes of messages to individual users. Some users may need to look only at specific errors, while others are interested in, or should have access to, other error types. To create a Feature file for a Class of messages, create a [class-code].ftr file and assign standard access rights (ACLs) to it. On the Alerts Monitor and in reports, records with Class codes to which the user has no access are masked off and do not appear.
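The masking described above can be sketched in Python as a simple filter. This is an illustration under stated assumptions, not the product's implementation: real access is decided by the access rights on the [class-code].ftr files, which the hypothetical class_acl dictionary stands in for here, and a Class with no .ftr file is assumed to be visible to everyone.

```python
def visible_records(records, user, class_acl):
    """Return only the records whose Class code the user may see.

    class_acl (hypothetical) maps a restricted Class code to the set of
    users with access. Class codes missing from class_acl are treated as
    unrestricted, which is an assumption.
    """
    result = []
    for record in records:
        allowed = class_acl.get(record["class_code"])
        if allowed is None or user in allowed:
            result.append(record)
    return result


log = [
    {"msg_id": 1, "class_code": "INF", "text": "Batch job finished."},
    {"msg_id": 2, "class_code": "S09", "text": "Failed login attempt."},
    {"msg_id": 3, "class_code": "CRI", "text": "Disk is at 95% used."},
]
acl = {"S09": {"Security_Admin"}}

for row in visible_records(log, "John_Smith", acl):
    print(row["msg_id"], row["class_code"])
```

In this example John_Smith sees messages 1 and 3 only, because S09 is restricted to another user.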

Environments

Environments are complete or partial sets of configuration files. Using multiple Environments allows running multiple instances of AlertManager servers, each working with its own separate configuration set (Environment).


Alert Manager Components


Image
Alert Manager Components


Default Settings

System Meters

Monitored Object   Warning Level   Critical Level   Alert Message(s)
CPU                40%             80%              CPU level at XX%.
                                                    Core level at XX%.
Empty-Idle         30%             10%              Empty Idle level at XX%.
Memory Used        60%             80%              Low on memory. Used: XX%.
Paging Used        60%             80%              Low on paging area. Used: XX%.
I/O rate           Not Set         Not Set          I/O rate at XX per second.
Page Faults        Not Set         Not Set          Page Faults at XX%.
Interrupts         Not Set         Not Set          Interrupts at XX%.
All SYSTEM meters are within normal range           Performance meters are back to normal levels.
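The default levels above can be read as a simple classification rule, sketched below in Python. The Threshold class is a hypothetical helper, not part of AlertManager; the class codes NOR, WAR and CRI are the defaults used elsewhere in this guide. Two assumptions are made: a reading equal to a level counts as having reached it, and a meter whose critical level is lower than its warning level (Empty-Idle) gets worse as the value falls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """Warning and critical levels for one monitored Object (percent)."""
    warning: float
    critical: float

    def classify(self, value: float) -> str:
        """Return NOR, WAR or CRI for a meter reading."""
        if self.critical < self.warning:
            # Inverted meter such as Empty-Idle: lower readings are worse.
            if value <= self.critical:
                return "CRI"
            if value <= self.warning:
                return "WAR"
            return "NOR"
        if value >= self.critical:
            return "CRI"
        if value >= self.warning:
            return "WAR"
        return "NOR"


# Default levels copied from the System Meters table.
DEFAULTS = {
    "CPU": Threshold(warning=40, critical=80),
    "Empty-Idle": Threshold(warning=30, critical=10),
    "Memory Used": Threshold(warning=60, critical=80),
    "Paging Used": Threshold(warning=60, critical=80),
}

print(DEFAULTS["CPU"].classify(55))        # WAR
print(DEFAULTS["Empty-Idle"].classify(8))  # CRI
```

Meters listed as Not Set have no default levels and are left out of the dictionary.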
 
Other Meters

Monitored Object          Warning Level   Critical Level   Alert Message(s)
Disk Space Used           80%             90%              Disk [disk-name] is at XX% used.
Disk Read Busy            40%             60%              Disk [disk-name] is read-busy at XX%.
Disk Write Busy           40%             60%              Disk [disk-name] is write-busy at XX%.
Disk I/O Rate             Not Set         Not Set          Disk [disk-name] is busy at XX I/O per second.
All DISK meters are within normal range                    Free space on [disk-name] and I/O rate back to normal levels.
VOS Queues                100 msgs.       200 msgs.        Pending message count on [queue-name] is XXX.
                                                           Pending message count on [queue-name] is back to normal levels.
Process(es)               Conditions: running / not-running on schedule.
                          Resources: CPU, I/O rate, memory, idle too long.
                                                           Process[es] [process-name] not running.
                                                           Process[es] [process-name] is running; Not intended to run now.
                                                           Process[es] [process-name] has been idle in the last XX minute(s).
                                                           Process[es] [process-name] consuming XX% cpu.
                                                           Process[es] [process-name] performing XXX I/Os per second.
                                                           Process[es] [process-name] using XX% of total memory.
                                                           Process[es] [process-name] running now. (as planned!).
                                                           Process[es] [process-name] are not running. (as planned!).
File Watchdog             Conditions: file arrived / created, file missing.
                                                           File arrived/created: [nick-name]
                                                           File [nick-name] has not arrived/created.
VOS System Log (syserr)   Conditions: any user-supplied filtering / conditions.
                                                           Original syserr_log.(date) message.
VOS Security Log (syserr) Conditions: any user-supplied filtering / conditions.
                                                           Original security_log.(date) message.
Application Logs          Conditions: any user-supplied filtering / conditions.
                                                           Original log message.
 
Security Filters

Filter name       Class Code   Action taken / Notes
Internal filter   LIO          Records all login and logout events; records user name and session length in the daily log and in a separate database.
Internal filter   PRI          Records all privileged commands executed by non-privileged users.
Security_01       S01          Detected changes to login_admin; AM automatically restores the default settings.
Security_02       S02          Detected changes to logout_admin; AM automatically restores the default settings.
Security_03       S03          Detected changes to audit_admin; AM automatically restores the default settings.
Security_04       S04          Detected changes to password-security; AM automatically restores the default settings.
Security_05       S05          Detected changes to system tuning parameters; AM automatically restores the default settings.
Security_06       S06          Detected an unauthorized attempt to modify user's start_up.cm macro. AM automatically logs the user out of the system and optionally bans him from the system.
Security_07       S07          Detected an unauthorized attempt to access one of the SPS audit trace files. AM automatically logs the user out of the system and optionally bans him from the system.
Security_08       S08          Detected an unauthorized or authorized access to a monitored file, marked for object-audit.
Security_09       S09          Detected a failed attempt to log into the system. This includes unauthorized attempts, bad passwords and users whose account has been terminated trying to get in.
Security_10       S10          Detected an access-right violation: a user attempting to access a directory or file to which he has insufficient access.


The alert_manager_server.pm program

alert_manager_server.pm

Purpose

This is the server that runs in the background. It is highly recommended that it be started by a privileged user. Start this program by executing start_alert_manager. Note that only one AlertManager Server can run in any given environment. For this reason, the Console is normally configured as an On-Top window: it will reveal itself after a short period if left covered by other windows.

CRT Form

------------------------------ SPS/Alert Manager -----------------------------
vos_console: 0
-command:
-name:

Explanation

You can use the following Commands:


The Alert Manager Console

The Console is the first screen presented to the operator. It is used to open all other system monitors, but its most significant area is the Recent Status box, which lists any objects that are marked as having a problem, i.e., not in their expected Normal State.

As a SysAdmin you can configure the Console to disable selected tabs based on a user name. This is done by giving null access to one or more of the *.tab files. For example, the following command will remove Joe's access to the Alerts Monitor tab: give_access null Alerts_Monitor.tab -user joe_smith.dev


The Alerts Monitor

The Alerts Monitor is the heart of the system. All system and application events that were recorded into the database can be researched here. The operator can use the monitor to detect potential problems, react to critical events and acknowledge certain important messages.

The Monitor has two modes of operation:

Image
VOS Alerts

The Search Mode

The Search Mode can be recognized by a new line that offers some input fields (MsgID, Class and Match) followed by a Go button. The Search Mode allows the operator to move freely around the database and search for any message(s). The Monitor stops tailing today's log and goes into Search Mode when you click one of the following:

  1. The Search link
  2. One of the navigation links to change the page displayed - first, last, next, back.
  3. Switching the date displayed - previous-day, next-day links.

In Search Mode you can:

  1. Position on a known record by typing its message-id in MsgID and clicking Go.
  2. Look at a consolidated list of messages by a given Class code. Enter the class code in the Class field and click the Go button.
  3. Use the Match field to look for certain messages.
  4. Switch dates by clicking the navigation arrows next to the log date field.

To switch back to Tail Mode, either click the Tail link or simply wait for one minute and the system will switch back to Tail Mode automatically.

If there are any messages that require acknowledgement and have not yet been acknowledged, the Pending-Ack counter will turn red and become a clickable link. Clicking it will present a screen with only Ack-Required messages. The operator can then acknowledge individual messages or click the Ack-all-pending link to acknowledge all pending messages in the database.

Creating reports

Clicking the Reports link will open the following form. Note that date and time fields must follow VOS standards, i.e., yy-mm-dd and hh:mm:ss format. To email the report to an authorized recipient, you may use the Email-To field and specify a Nick-Name as defined in the sps_email_server configuration table.

Here is an example of an HTML-formatted report:


The System Usage Monitor

The System Usage Monitor shows standard VOS meters: CPU, page-faults, Interrupts, empty-idle, file-io, disk-io over the last 10 seconds, 1 minute, 5 minutes and one hour. The Monitor refreshes the information every 10 seconds. For more information, see the sps_performance configuration table.

The second area of the Monitor shows graphs and information on:

  1. Available Cache
  2. Available memory with a breakdown by System, Cache and User.
  3. Available paging partitions.
  4. TP-rate. Transaction processing rate per second.

The third part of the display lists any performance related conditions that have been identified by Alert Manager. If all meters are within their Normal operating range this area will not be displayed.

Image
System Usage


The Disk Space Monitor

The Disk Space Monitor shows two graphs for every disk on the system: free/used space and I/O rate with a breakdown of reads and writes. For more information see the sps_disks configuration table. Note that the scale for the I/O rate graphs is determined by the =max_disk_io field in the alert_manager.tin configuration file.

The second part of the display lists any disk-related conditions that have been identified by Alert Manager. If all meters are within their Normal operating range this area will not be displayed.

Image
Disk Usage


The Grouped List-Users Monitor

The List-Users monitor is based on logical groups of processes that are defined in the sps_processes configuration file. A Group is defined by a standard VOS user name (*.System) and a process-name. Each group is graphed to show its relative CPU, page-faults, memory consumption as well as I/O rate, and number of processes running.

The information may be sorted by CPU, Faults, Memory, I-O Rate, User-name and number of processes. To change the sort order, simply click on any of the table's headings.

In the example below we can see that the 3 DRMS servers are taking 26.5% of the module's CPU and 8.3% of its memory.

Image
List Users-1

The second part of the display lists any performance related conditions that have been identified by Alert Manager. If all meters are within their Normal operating range this area will not be displayed.

In the example below we can see that the DRMS servers are not running.

Image
List Users-2


The List-Users Monitor

The List-Users (All) monitor lists the processes that are taking the most resources. The information may be sorted by clicking on one of the table's headers: CPU, Faults, Memory, Reads, Writes etc.

Image
List Users-3


The Queue Monitor

The Queue monitor is based on queues defined in the sps_qmon configuration table. Queues can be server queues (1 or 2 way) and message queues.

The information may be sorted by Queue-name, ON-queue, Pending count, Highest, Total, TXN (transaction processing rate). To change the sort order, simply click on any of the table's headings.

In the example below we can see that DRMS-X queues are processing at a rate of 70 messages per second and that there are only one or two messages in the queue (pending).

Image
Queue Monitor-1

The second part of the display lists any performance related conditions that have been identified by Alert Manager. If all meters are within their Normal operating range this area will not be displayed.

In the example below we can see that the DRMS_App queue went into its Critical state because there are 5973 messages pending processing on it.

Image
Queue Monitor-2


The Batch Monitor

The Batch monitor does not require any configuration as it automatically detects the active queues on the system.

Image
Batch Monitor


The Network Monitor

The Network monitor does not require any configuration. It monitors all STCP and UDP activities on the system.

Image
Network Monitor #1

Image
Network Monitor - Devices


The Explorer Menu

The Explorer Menu is created based on a list of allowed files, reports and directory listings as specified in the sps_explorer configuration table.

Image
Explorer-1

In the example below we can see a directory listing all TIN files. You may change the sorting order of the table by clicking any of the table's headings.

Image
Explorer-1


The Main Control Table (alert_manager.table)

The Main Control Table is used to define some global configuration parameters.

organization: relative;
fields:

module_nick_name char (32) var,
theme char (32) var,
other_module_nick_name_xx char (32) var,
other_module_ip_addr_xx char (32) var,

output_device char (66) var,
max_allowed_subprocesses bin (15) default ('5'),
max_carry_over bin (15) default ('-1'),
network_normal_class char (3) default ('NOR'),
network_critical_class char (3) default ('CRI'),
user_logins_class char (3) default ('LIO'),
priv_command_class char (3) default ('PRI'),
netstat_device_01 char (32) var,
netstat_device_02 char (32) var,
netstat_device_03 char (32) var,
netstat_device_04 char (32) var,
netstat_device_05 char (32) var,
netstat_critical bin (15),
netstat_warning bin (15),
netstat_normal_class char (3) default ('NOR'),
netstat_critical_class char (3) default ('CRI'),
netstat_warning_class char (3) default ('WAR'),
netstat_options bin (15) default ('7'),
netstat_states bin (15) default ('1'),
send_to_syslog_ip char (32) var,
send_to_syslog_port bin (15) default ('514'),
syslog_udp bit (1) default ('1'),
send_to_am_q char (256) var,
disks_interval bin (15) default ('60'),
performance_interval bin (15) default ('60'),
processes_interval bin (15) default ('60'),
qmon_interval bin (15) default ('60'),
fwd_interval bin (15) default ('60'),
files_interval bin (15) default ('10'),
tcp_interval bin (15) default ('60'),
web_session_timeout bin (15) default ('10'),
web_command_timeout bin (15) default ('10'),
scaling_factor dec (15,3) default (-1),
web_console_interval bin (15) default ('10'),
web_alerts_interval bin (15) default ('10'),
web_dsu_interval bin (15) default ('10'),
web_ddi_interval bin (15) default ('10'),
web_lui_interval bin (15) default ('10'),
web_qmon_interval bin (15) default ('10'),
web_batch_interval bin (15) default ('10'),
web_network_interval bin (15) default ('60'),
web_no_lines bin (15) default ('20'),
web_lui_no_lines bin (15) default ('20'),
web_show_scrollbar bit,
web_initial_menu char (32) var default ('Operations Menu'),
web_require_vos_pass bit (1),
web_console_menu_text_01 char (32) var,
web_console_menu_text_02 char (32) var,
web_console_menu_text_03 char (32) var,
web_console_menu_text_04 char (32) var,
web_console_menu_text_05 char (32) var,
web_console_menu_command_01 char (256) var,
web_console_menu_command_02 char (256) var,
web_console_menu_command_03 char (256) var,
web_console_menu_command_04 char (256) var,
web_console_menu_command_05 char (256) var,
drms_options bin (15) default (63),
max_processes bin (15) default (2000),
retain_logs bin (15) default (90),
trace bin (15);
end;

Field Definitions

module_nick_name The name of the module. By default, the VOS module-name will be used.
theme The name of the report_template.html to be used. For example if you specify "blue" then the report_template_blue.html file will be used.
other_module_nick_name_xx The name of a remote module that is expected to report to AlertManager. Either a VOS module name or a nick name can be used. Up to 10 modules can be incorporated into this interface. A separate AlertManager server running under a different Environment on the same physical module can also act as a "remote module".
other_module_ip_addr_xx The TCP/IP address and port number of the remote module as follows: nnn.nnn.nnn.nn:NN where NN is the port number.
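As an illustration of the address:port layout, the following Python sketch splits and validates such a value. The parse_module_address helper and the sample address are hypothetical and only show the expected shape of the field.

```python
import ipaddress


def parse_module_address(value):
    """Split 'nnn.nnn.nnn.nn:NN' into (ip_address, port_number).

    Raises ValueError if the port is missing or either part is invalid.
    """
    host, separator, port_text = value.strip().rpartition(":")
    if not separator or not host:
        raise ValueError("expected address:port, got %r" % value)
    port = int(port_text)  # raises ValueError for non-numeric text
    if not 0 < port < 65536:
        raise ValueError("port out of range: %d" % port)
    ipaddress.IPv4Address(host)  # raises a ValueError subclass if malformed
    return host, port


# Hypothetical sample value for other_module_ip_addr_01.
print(parse_module_address("192.168.10.25:5001"))
```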
output_device AlertManager may write messages to a dedicated device like a dedicated printer device.
max_allowed_subprocesses Based on configuration and filtering rules, the AlertManager server may start additional sub-processes as configured by the user in the sps_classes.table. To limit the risk of starting too many sub-processes (for example, when filtering rules need adjustment), you may specify a limit beyond which AlertManager will not start any additional sub-processes.
max_carry_over max_carry_over defines how ack-required messages are copied over to the next day:

  • -1 Carry over all messages
  • 0 Do not carry over any messages
  • N Carry over N messages
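The three max_carry_over settings can be sketched in Python as follows. The carry_over function is a hypothetical illustration; in particular, the guide does not say which N messages are kept when a limit applies, so the sketch assumes the first N in log order.

```python
def carry_over(pending_messages, max_carry_over=-1):
    """Select the ack-required messages to copy into the new daily log.

    -1 carries everything, 0 carries nothing, N carries N messages.
    Assumption: with a limit of N, the first N messages in log order
    are kept.
    """
    if max_carry_over < 0:
        return list(pending_messages)
    return list(pending_messages[:max_carry_over])


pending = ["msg-101", "msg-102", "msg-103"]
print(carry_over(pending, -1))  # ['msg-101', 'msg-102', 'msg-103']
print(carry_over(pending, 0))   # []
print(carry_over(pending, 2))   # ['msg-101', 'msg-102']
```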
network_normal_class The class/state code that should be assigned when all modules are online.
network_critical_class The class/state code that should be assigned when one or more modules become offline.
user_logins_class If given a valid class code, AlertManager will create entries whenever users log into or out of the system. To disable this feature, remove the class code (blank).
priv_command_class If given a valid class code, AlertManager will create entries whenever users execute a Privileged command. To disable this feature, remove the class code (blank).
netstat_device_NN The name of a Streams device you wish to monitor.
netstat_critical The critical threshold of device utilization as a percentage of the line's capacity.
netstat_warning The warning threshold of device utilization as a percentage of the line's capacity.
netstat_normal_class The class code given to a message written to the log once the Streams meters are back to normal levels - below the warning threshold.
netstat_critical_class The class code given to a message written to the log once the Streams meters are higher than the critical threshold.
netstat_warning_class The class code given to a message written to the log once the Streams meters are higher than the warning threshold.
netstat_options A 16-bit encoding that sets the initial options in the Network-monitor window as follows:

1 Show TCP connections
2 Show UDP connections
4 Show Statistics
netstat_states A 16-bit encoding that sets the initial State options in the Network-monitor window as follows:

Code State
1 ALL
2 LAST_ACK
4 TIME_WAIT
8 CLOSING
16 FIN_WAIT_2
32 CLOSE_WAIT
64 FIN_WAIT_1
128 ESTABLISHED
256 SYN_RECV
512 SYN_SEND
1024 LISTEN
2048 CLOSED
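Both netstat_options and netstat_states appear to be bit masks: each code is a power of two, and the default netstat_options value of 7 equals 1 + 2 + 4. Under that reading, the Python sketch below converts between a field value and the list of selected items. The decode and encode helpers are hypothetical names used for illustration.

```python
NETSTAT_OPTIONS = {
    1: "Show TCP connections",
    2: "Show UDP connections",
    4: "Show Statistics",
}

NETSTAT_STATES = {
    1: "ALL", 2: "LAST_ACK", 4: "TIME_WAIT", 8: "CLOSING",
    16: "FIN_WAIT_2", 32: "CLOSE_WAIT", 64: "FIN_WAIT_1",
    128: "ESTABLISHED", 256: "SYN_RECV", 512: "SYN_SEND",
    1024: "LISTEN", 2048: "CLOSED",
}


def decode(value, table):
    """List the names whose bit is set in value."""
    return [name for bit, name in table.items() if value & bit]


def encode(names, table):
    """Combine the codes for the given names into one field value."""
    codes = {name: bit for bit, name in table.items()}
    value = 0
    for name in names:
        value |= codes[name]
    return value


print(decode(7, NETSTAT_OPTIONS))
print(decode(1, NETSTAT_STATES))                          # ['ALL']
print(encode(["ESTABLISHED", "LISTEN"], NETSTAT_STATES))  # 1152
```

For example, a netstat_states value of 1152 would preselect the ESTABLISHED and LISTEN states, assuming the codes combine by addition.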
send_to_syslog_ip AlertManager may be configured to send messages to a remote Syslog server. Specify the IP address of the target computer.
send_to_syslog_port Specify the port number of the Syslog server (the default is 514).
syslog_udp
If the switch is set to 1 (ON), AlertManager opens a UDP connection to the remote Syslog server.
If it is set to 0 (OFF), it opens a TCP connection.
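The effect of the three syslog fields can be sketched with Python's standard socket module. This shows the transport choice only and is not AlertManager's implementation: the send_to_syslog and format_message helpers are hypothetical, and the message layout (a conventional syslog priority prefix, with facility "user" and severity "informational" giving 14) is an assumption, since this guide does not describe the wire format.

```python
import socket


def format_message(text, facility=1, severity=6):
    """Prefix text with a syslog priority value (facility * 8 + severity).

    The defaults (user facility, informational severity) are assumptions.
    """
    return "<%d>%s" % (facility * 8 + severity, text)


def send_to_syslog(text, ip, port=514, syslog_udp=True):
    """Send one message over UDP (syslog_udp=1) or TCP (syslog_udp=0)."""
    payload = format_message(text).encode("utf-8")
    kind = socket.SOCK_DGRAM if syslog_udp else socket.SOCK_STREAM
    with socket.socket(socket.AF_INET, kind) as sock:
        if syslog_udp:
            sock.sendto(payload, (ip, port))
        else:
            sock.connect((ip, port))
            # Newline-terminated framing is a common choice for TCP syslog.
            sock.sendall(payload + b"\n")


print(format_message("Disk is at 95% used."))
```

A call such as send_to_syslog("Disk is at 95% used.", ip, 514, True) would then correspond to the default settings, where ip is the value of send_to_syslog_ip.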
send_to_am_q The path name of the input queue of the primary AlertManager server. All status messages and alerts will be forwarded to this queue.
disks_interval An interval, specified in seconds, that defines how often system's disks should be checked (see sps_disks table).
performance_interval An interval, specified in seconds, that defines how often system's performance meters should be checked (see sps_performance table).
processes_interval An interval, specified in seconds, that defines how often running processes should be checked (see sps_processes table).
qmon_interval An interval, specified in seconds, that defines how often application queues should be checked (see sps_qmon table).
fwd_interval An interval, specified in seconds, that defines how often the File-Watchdog layer should be invoked to look for new file arrivals (see sps_fwd table).
files_interval An interval, specified in seconds, that defines how often the application log files should be checked (see sps_files table).
tcp_interval An interval, specified in seconds, that defines how often the Streams devices should be checked.
web_session_timeout An interval, specified in minutes, that defines the session's inactivity timeout, after which the user will be forced to log into the system again.
web_command_timeout An interval, specified in seconds, that defines the maximum allowed execution time for a command triggered by a Menu item.
scaling_factor The scaling factor to be used on V-Series modules. For more information see VOS documentation on display_system_usage and how to adjust the scaling factor for hyper-threaded processors.

web_xxx_interval An interval, specified in seconds, between each screen refresh. These intervals apply to all Browser sub-windows monitors - console, alerts, system-usage, disk-info, list-users, queue monitor, batch monitor and network monitor.

web_no_lines The number of lines you wish to set for the Alerts sub-window.

web_lui_no_lines The number of lines you wish to set for the List-Users-All sub-window.

web_show_scrollbar By default, most windows are opened without a scrollbar. Set this switch to enable scrollbars.

web_initial_menu The SPS/Menu can be accessed via the web interface. If you are licensed to use this package, you may specify any existing menu name as the initial menu.

web_require_vos_pass By default, the initial login screen requires the user to enter a valid user-id and a valid password, which are checked against the VOS registration file. If this is not required, reset this switch to zero. The user can then enter any name to identify himself to the system; it does not have to be a real user-id, and no password is required.

web_console_menu_text You may configure up to 5 commands that the user can execute from the console window. Each command has a text description that appears on the screen and the associated VOS command-line.
web_console_menu_command See the explanation of web_console_menu_text.
drms_options A 16-bit encoding that sets the initial options for the DRMS window as follows:

1 DRMS Processes
2 CPU Performance
4 User Application Processes
8 Operations Menu
16 Queue Monitor
32 I/O Monitor
64 Critical Files
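Assuming the drms_options codes combine as bit flags, the table default of 63 (1 + 2 + 4 + 8 + 16 + 32) selects every option except Critical Files. The short Python check below makes that explicit; the enabled_options helper is a hypothetical name.

```python
DRMS_OPTIONS = {
    1: "DRMS Processes",
    2: "CPU Performance",
    4: "User Application Processes",
    8: "Operations Menu",
    16: "Queue Monitor",
    32: "I/O Monitor",
    64: "Critical Files",
}


def enabled_options(value):
    """Return the DRMS window options whose bit is set in value."""
    return [name for bit, name in DRMS_OPTIONS.items() if value & bit]


default_selection = enabled_options(63)
print("Critical Files" in default_selection)  # False
print(len(enabled_options(127)))              # 7
```

Under the same assumption, setting drms_options to 127 would enable all seven options.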
max_processes The maximum number of processes running on the system.
retain_logs As part of Alert-Manager's Midnight processing, it will automatically purge old logs depending on the number of days specified in retain_logs.
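The retention rule can be sketched in Python as a date comparison. The logs_to_purge helper is hypothetical, and the exact boundary is an assumption: here a log dated exactly retain_logs days ago is still kept.

```python
from datetime import date, timedelta


def logs_to_purge(log_dates, today, retain_logs=90):
    """Return the daily-log dates that are older than retain_logs days.

    Assumption: a log dated exactly retain_logs days before today is kept.
    """
    cutoff = today - timedelta(days=retain_logs)
    return [log_date for log_date in log_dates if log_date < cutoff]


today = date(2024, 4, 10)
logs = [today - timedelta(days=age) for age in (1, 89, 90, 91, 200)]
for old in logs_to_purge(logs, today):
    print("purge", old.isoformat())
```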
trace Use the following values:

0 no trace
1 events
2 file switch trace
4 messages
8 memory trace
16 communications trace


The Classes Table (sps_classes.table)

The Classes Table defines how messages and alerts should be treated.

organization: relative;
index: class_code no_duplicates;
fields:
class_code char (3),
ignore_on_holidays bit (1) aligned,
days_mask char (7) default ('yyyyyyy'),
active_from char (5),
active_to char (5),
dups_interval bin (15),
help_text char (256) var,
wait_for_acknowledgment bin (15) default ('0'),
acknowledgment_escalation_time bin (15) default ('0'),
acknowledgment_escalation_class char (3) default (''),
output_q_message char (256) var default ('@time: @msg'),
email_user_1 char (32) var,
email_user_2 char (32) var,
email_user_3 char (32) var,
email_user_4 char (32) var,
email_user_5 char (32) var,
email_user_6 char (32) var,
email_user_7 char (32) var,
email_user_8 char (32) var,
email_user_9 char (32) var,
email_user_10 char (32) var,
start_job_1 char (32) var,
start_job_2 char (32) var,
start_job_3 char (32) var,
start_job_4 char (32) var,
start_job_5 char (32) var,
start_job_6 char (32) var,
start_job_7 char (32) var,
start_job_8 char (32) var,
start_job_9 char (32) var,
start_job_10 char (32) var,
scheduler_env char (32) var,
start_command char (256) var,
minimum_reporting_interval bin (15),
minimum_execution_interval bin (15),
require_min_occurrences bin (15),
require_min_occurrences_interval bin (15),
daily_max bin (31) default (2000),
send_to_remote_am bit,
color char (32) var,
end;

Field Definitions

class_code A unique code that identifies the class. You may make up your own codes or use a numeric scheme. Suggested codes are:

  • INF - Information
  • WAR - Warning
  • SEV - Severe
  • CRI - Critical
  • FAT - Fatal
ignore_on_holidays When set, Class processing will be suspended during holidays.
days_mask Used to suppress the Class on certain days, such as weekends. For example, to suppress on Saturday/Sunday, use nyyyyyn.
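
As an illustration of how such a mask could be read, here is a small Python sketch. It assumes the seven positions run Sunday through Saturday, which is only inferred from the nyyyyyn example; the function name is hypothetical and this is not AlertManager's own code.

```python
import datetime


def class_active_on(days_mask, day):
    """Return True if days_mask has 'y' in the position for the weekday of day.

    Assumption: position 0 is Sunday and position 6 is Saturday, as suggested
    by the nyyyyyn example for suppressing Saturday/Sunday.
    """
    if len(days_mask) != 7:
        raise ValueError("days_mask must have exactly 7 characters")
    # date.weekday() returns 0 for Monday; shift so that Sunday becomes 0.
    index = (day.weekday() + 1) % 7
    return days_mask[index].lower() == "y"


if __name__ == "__main__":
    # 2024-01-06 is a Saturday, 2024-01-08 is a Monday.
    print(class_active_on("nyyyyyn", datetime.date(2024, 1, 6)))
    print(class_active_on("nyyyyyn", datetime.date(2024, 1, 8)))
```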
active_from-to Use the hh:mm format. These fields are optional. By default, Classes are always active. When disabled, there will be no logging and no action taken for the specified Class.
dups_interval A time in minutes. Any duplicate (IDENTICAL) message that arrives within this interval will be ignored.
help_text If given, the text appears on the terminal's status line (25th) when the user tabs to the message. When using a Browser, clicking on the CLASS code link will open a message box with the help_text.
wait_for_acknowledgment Used to set messages that require the operator's acknowledgement.

  • 0 = No Ack. required.
  • -1 = Wait for Operator's Ack. forever.
  • N = Automatically Ack. message by AlertManager after N seconds.
acknowledgment_escalation_time For alerts that require acknowledgement, you can set an escalation procedure: if the alert is not acknowledged within a certain time, AlertManager produces another alert with a different class code, taken from acknowledgment_escalation_class. acknowledgment_escalation_time is specified in minutes. AlertManager executes the escalation feature every 2 minutes, so a message may be escalated up to 2 minutes after its assigned escalation time.
acknowledgment_escalation_class The new Class code that AlertManager will use for the Escalated message.
output_q_message Sent to output_q if one is given in the sps_lam.table control record. Also used to reply to a Client reading the database. Standard keywords can be used (see below under start_command).
email_user_1-10 Specify up to 10 email nicknames from sps_email_server.tin to receive AlertManager messages.
start_job_1-10 Specify up to 10 jobs / process-names to be started by the Scheduler. For more information, refer to The SPS/Application Scheduler & Monitor (PCS) User's Guide.

scheduler_env The name of the Scheduler's Environment that governs the started job(s). For more information, refer to The SPS/Application Scheduler & Monitor (PCS) User's Guide.

start_command Commands may be started by the Logs Manager. You may include the following keywords that will be substituted at execution time:

  • @ID - The message ID
  • @from - The source of the message
  • @obj - The object's logical name
  • @class - The class code
  • @module - The module name
  • @ack - Y=Yes, N=No.
  • @time - Current time
  • @msg - The message buffer
  • @user - The user name (only for security_log msgs)
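
The substitution described above can be pictured with the following Python sketch. It is not the Logs Manager's implementation: the expand_keywords name is hypothetical, keywords are assumed to be case-sensitive, and unknown keywords are simply left in place.

```python
import re


def expand_keywords(template, values):
    """Replace @keyword tokens in template with the entries of values.

    Keywords missing from values are left untouched. Longer keywords are
    tried first so that no keyword is cut short by a shorter one that
    shares its prefix.
    """
    names = sorted(values, key=len, reverse=True)
    if not names:
        return template
    pattern = re.compile("@(" + "|".join(re.escape(n) for n in names) + ")")
    return pattern.sub(lambda m: str(values[m.group(1)]), template)


if __name__ == "__main__":
    # '@time: @msg' is the default output_q_message format of the Classes Table.
    print(expand_keywords("@time: @msg", {
        "time": "14:40:58",
        "msg": "All networked modules are online.",
    }))
```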
minimum_reporting_interval A time interval, given in minutes, during which a message classified with this Class code will not be logged or executed.
minimum_execution_interval A time interval, given in minutes, during which a message classified with this Class code will be logged, but the associated command (=start_command), if any, will not be executed.
require_min_occurrences
require_min_occurrences_interval
require_min_occurrences is used to throttle the execution of commands (see =start_command). The interval, require_min_occurrences_interval, is specified in minutes.

  Example:

=require_min_occurrences 10
=require_min_occurrences_interval 5

Means that if there are 10 occurrences of the Alert within the last 5 minutes, then the start_command will be executed. Otherwise (fewer than 10 messages, or messages spread over more than a 5-minute timeframe), AlertManager will only log the alert and will NOT execute the command. Be extra careful with this feature or you WILL lose important messages.
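
The throttling rule can be sketched as a sliding time window. The Python class below is a minimal illustration under stated assumptions, not AlertManager's algorithm: the class name is hypothetical, time is passed in as seconds, and the count is not reset after the command runs (the guide does not say whether it is).

```python
from collections import deque


class OccurrenceThrottle:
    """Allow a command only after enough alerts arrive within the interval."""

    def __init__(self, min_occurrences, interval_minutes):
        self.min_occurrences = min_occurrences
        self.window_seconds = interval_minutes * 60
        self.times = deque()

    def record(self, now):
        """Record an alert at time now (seconds); return True if the command may run."""
        self.times.append(now)
        # Forget occurrences that are older than the window.
        while self.times and now - self.times[0] > self.window_seconds:
            self.times.popleft()
        return len(self.times) >= self.min_occurrences


if __name__ == "__main__":
    throttle = OccurrenceThrottle(min_occurrences=10, interval_minutes=5)
    # Ten alerts, ten seconds apart: only the tenth reaches the minimum.
    print([throttle.record(i * 10) for i in range(10)])
```

With one alert per minute, the window never holds more than six alerts, so the command would never run under these settings.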
daily_max The maximum number of messages that can be recorded daily for the class code.
send_to_remote_am If =send_to_remote_am is set, messages will be sent to the Primary AlertManager server via its input queue.
color The unique color you wish to assign to the Class. To find out more about choosing colors, go to http://www.w3schools.com/html/html_colornames.


The Performance Table (sps_performance.table)

The Performance Table governs all aspects of performance monitoring and alerts. You may define multiple records, each identified by a unique logical_name. Each record may represent a different time frame (see active_from, active_to). The following meters are monitored: CPU usage, empty idle, I/O rate, page faults, interrupts, core, used memory and used paging.

For every type of meter you may define up to 2 thresholds: Warning and Critical.

organization: relative;
index : logical_name no_duplicates;
fields:
logical_name char (32) var,

active_from char (5),
active_to char (5),

critical_cpu_usage bin(15) default ('80'),
warning_cpu_usage bin (15) default ('40'),
critical_empty_idle bin (15) default ('10'),
warning_empty_idle bin (15) default ('30'),
critical_io_rate bin (15) default ('0'),
critical_page_fault bin (15) default ('20'),
warning_io_rate bin (15) default ('0'),
warning_page_fault bin (15) default ('0'),
critical_interrupts bin (15) default ('10'),
warning_interrupts bin (15) default ('0'),
critical_core bin (15) default ('10'),
warning_core bin (15) default ('0'),
critical_used_memory bin (15) default ('60'),
warning_used_memory bin (15) default ('80'),
critical_used_paging bin (15) default ('60'),
warning_used_paging bin (15) default ('80'),
normal_class char (3) default ('NOR'),
critical_class char (3) default ('CRI'),
warning_class char (3) default ('WAR');
end;

Field Definitions

logical_name A unique name that identifies the Object type. For example, use "pre-market-open" for peak hours.
active_from-to Use the HH:MM format to define the time window for this check. This feature allows multiple settings for peak/low periods.
cpu_usage The percent CPU used by the current module during the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold.

Example:

CPU level is at [percent CPU]%.

empty_idle The percent empty-idle used by the current module during the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold.

Example:

Empty Idle level is at [percent E.I.]%.

io_rate The critical level of I/O rate performed by the current module during the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold.

Example:

I/O rate is at [I/O rate] per second

page_fault The critical level of the page-fault rate incurred by the current module in the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold. AlertManager will use the "performance_critical_class" class code and post the following message: Page Fault rate is at [PF Rate] per second.
interrupts The critical level of interrupt rate performed by the current module in the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold.

Example:

Interrupt rate is at [Int. Rate] per second.

core The critical level of percent core used by the current module in the last minute. AlertManager will produce an alert when the system meter exceeds the specified threshold.

Example:

Core level is at [percent Core]%.

used_memory The critical level of percent-used memory pages. AlertManager will produce an alert when the meter goes beyond this threshold.
used_paging The critical level of percent-used of Paging area. AlertManager will produce an alert when the meter goes beyond this threshold.
normal_class The class/state code that should be assigned to the system performance object when all meters are within the allowed ranges. The code must identify a valid record in the Classes table.
warning/critical_class The class/state code that should be assigned to the system performance object when one of the meters violates its threshold and an alert is produced. The Performance object will remain in the critical state until all meters are back to normal levels. The code must identify a valid record in the Classes table.


The Processes Table (sps_processes.table)

The Processes Table governs all aspects of Process monitoring and alerts. You may define multiple records, each identified by a unique logical_name. The following conditions are monitored:

For every type of meter you may define up to three thresholds: Warning, Severe and Critical.

organization: relative; 
index : logical_name no_duplicates;
fields:
logical_name char (32) var,
process_name char (32) var,
user_name char (65) var,
active_from char (5),
active_to char (5),
no_of_processes bin(15) default ('1'),
must_be_up_from char (5),
must_be_up_to char (5),
must_be_down_from char (5),
must_be_down_to char (5),
critical_cpu bin (15),
critical_io_rate bin (15),
critical_memory bin (15),
process_normal_class char (3) default ('NOR'),
process_down_critical_class char (3) default ('CRI'),
process_up_critical_class char (3) default ('CRI'),
process_busy_critical_class char (3) default ('CRI');
end;

Field Definitions

logical_name A unique logical name for the Object. Any record whose name starts with Any_Process creates an alert if any running process exceeds the critical_cpu threshold. Multiple Any_Process records can be set for different time windows using the active_from/active_to fields. For example:

=logical_name Any_Process_morning
=active_from 08:00
=active_to 17:00
=critical_cpu 40

=logical_name Any_Process_night
=active_from 17:00
=active_to 23:00
=critical_cpu 30
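
To show how the two records above divide the day, here is a Python sketch that picks the critical_cpu threshold in force at a given time. It is illustrative only: the helper names are hypothetical, a window is assumed to include active_from and exclude active_to, and windows are assumed not to wrap past midnight.

```python
# The two example records above, expressed as dictionaries.
RECORDS = [
    {"logical_name": "Any_Process_morning", "active_from": "08:00",
     "active_to": "17:00", "critical_cpu": 40},
    {"logical_name": "Any_Process_night", "active_from": "17:00",
     "active_to": "23:00", "critical_cpu": 30},
]


def is_active(record, hhmm):
    """Return True if hhmm (zero-padded HH:MM) falls inside the record's window.

    A record without both window fields is treated as always active.
    """
    start = record.get("active_from")
    end = record.get("active_to")
    if not start or not end:
        return True
    # Zero-padded HH:MM strings compare correctly as plain strings.
    return start <= hhmm < end


def cpu_threshold_at(records, hhmm):
    """Return critical_cpu of the first record active at hhmm, or None."""
    for record in records:
        if is_active(record, hhmm):
            return record["critical_cpu"]
    return None


if __name__ == "__main__":
    for moment in ("09:30", "18:00", "23:30"):
        print(moment, cpu_threshold_at(RECORDS, moment))
```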

process_name A unique name that identifies the process. The name must be identical to the process name as started by VOS. To get an accurate, up-to-date list of valid process names, execute the "list_users" command and record the process names that appear in parentheses. You may use star-names.
user_name The name or the star-name (e.g. *.Operator) of the user that is executing the processes. To get an accurate, up-to-date list of valid user names, execute the "list_users" command. You may use star-names.
active_from-to Use the HH:MM format to define the time window for this check. This feature allows multiple settings for peak/low periods.
no_of_processes The number of processes you expect to be up and running under the same user-name and process-name.
must_be_up_from-to Use the hh:mm format to define the time of day the process is expected to be up and running. Note that if "must_be_up_from" and "must_be_up_to" are not specified, then AlertManager assumes that the process should be up and running at all times. When the process violates this condition, AlertManager will post a message with the "process_up_critical_class" class code along with the message:

Process [logical_name] is not running.

must_be_down_from-to Use the hh:mm format to define the time of day the process is not expected to execute. This feature is useful to cover situations where a running process may interfere with normal production activity. When the process violates this condition, AlertManager will post a message with the "process_down_critical_class" class code along with the message:

Process [logical_name] is running; Not intended to run now.

critical_cpu The Critical CPU consumption threshold, expressed as the percentage of CPU used by the process(es) in each interval. AlertManager will bypass this check if you do not supply a value (zero). To find what is the current consumption of a process:

When the process violates this condition, then AlertManager will post a message with the "process_busy_critical_class" class code along with the message:

Process [logical_name] is consuming XX %CPU.

critical_io_rate The I/O rate threshold, expressed as the number of disk I/Os performed by the process every second. AlertManager will bypass this check if you do not supply a value (zero). To find what is the current consumption of a process:

Start "list_users -interval 10". Record the typical number of Reads (DDKR) and Writes (DDKW) the process performs. The average I/O rate is:

(DDKR + DDKW) / interval-of-list_users (10).
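
As a worked example of this calculation, the Python sketch below adds the read and write counts before dividing by the list_users interval. The function name and the sample counts are made up for illustration.

```python
def average_io_rate(reads, writes, interval_seconds=10):
    """Average I/Os per second: (reads + writes) / interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return (reads + writes) / interval_seconds


if __name__ == "__main__":
    # Hypothetical counts recorded over one 10-second list_users interval.
    print(average_io_rate(reads=120, writes=80, interval_seconds=10))
```

A value somewhat above the typical rate measured this way is a reasonable starting point for critical_io_rate.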

When the process violates this condition, then AlertManager will post a message with the "process_busy_critical_class" class code along with the message:

Process [logical_name] performing XX I/Os per second.

critical_memory The critical level of memory utilization as a percentage of the memory pages on the system.
process_normal_class The class/state code that should be assigned to the monitored process when the process is in its Normal state. The code must identify a valid record in the Classes table.
process_down_critical_class The class/state code that should be assigned to the monitored process when the process is in its Critical state, specifically when it is NOT running as expected. The code must identify a valid record in the Classes table. When a process is in violation of this condition, AlertManager produces the following message:

Process [logical name] is not running.

process_up_critical_class The class/state code that should be assigned to the monitored process when the process is in its Critical state, specifically when it is running during times when it is scheduled to be down. The code must identify a valid record in the Classes table. When a process is in violation of this condition, AlertManager produces the following message:

Process [logical name] is running; Not intended to run now.

process_busy_critical_class The class/state code that should be assigned to the monitored process when the process is in its Critical state, specifically when it is violating its performance related thresholds.


The Queues Table (sps_qmon.table)

The Queues Table governs all aspects of monitoring one-way, two-way and message queues. You may define multiple records, each identified by a unique logical_name.

organization: relative;
index: logical_name no_duplicates;
fields:
logical_name char (32) var,
q_path char (256) var,
active_from char (5),
active_to char (5),
critical_pending_alert bin (31),
warning_pending_alert bin (31),
normal_class char (3) default ('NOR'),
critical_class char (3) default ('CRI'),
warning_class char (3) default ('WAR');
end;

Field Definitions

logical_name A unique logical name for the Object.
q_path The relative or full path-name of the monitored queue.
active_from-to Use the HH:MM format to define the time window for this check. This feature allows multiple settings for peak/low periods.
pending_alert Set the "Pending Message count" threshold. An alert will be produced when the queue's actual pending message count exceeds this limit.
normal_class The class code that AlertManager assigns to monitored queues that are in the normal state. The code must identify a valid record in the Classes table.
warning/critical_class The class code that AlertManager assigns to monitored queues that are in the critical state. The code must identify a valid record in the Classes table. When a monitored queue is in violation of its threshold, AlertManager produces the following message:

Pending message count on [logical name] [# of msgs.].


The File Watchdog Table (sps_fwd.table)

The File Watchdog Table governs all aspects of monitoring the creation of files during certain timeframes. This is useful if your system receives files from different hosts and they require special processing at that time. You may define multiple records, each identified by a unique logical_name.

organization: relative; 
index: logical_name no_duplicates;
fields:
logical_name char (32) var,
active_from char (5),
active_to char (5),
input_path char (256) var,
must_exists bit (1) aligned,
must_not_exist bit (1) aligned,
must_arrive bit (1) aligned,
allow_lockers bit (1) aligned,
normal_class char (3) default ('NOR'),
file_arrived_class char (3) default ('FAR'),
critical_class char (3) default ('CRI');
end;

Field Definitions

logical_name A unique logical name for the Object.
active_from-to Use the HH:MM format to define the time window for this check. This feature allows multiple settings for peak/low periods.
input_path The relative or full path-name of the monitored file.
must_exists If =must_exists is set and the file is not there, the =critical_class will be executed. Once the file is created, the =normal_class will be executed.
must_not_exist If =must_not_exist is set and the file is there, the =critical_class will be executed. Once the file is deleted, the =normal_class will be executed.
must_arrive Set to "1" if the file must appear within the specified time window. If the file does not arrive within the specified window, AlertManager will log and execute the critical_class class-code.
allow_lockers By default, AlertManager will not log or execute the file_arrived_class as long as one or more processes are locking the file. When the switch is set, AlertManager will execute the file_arrived_class regardless of whether there are any lockers.
normal_class A class code that AlertManager should set for the object while waiting for the file to arrive.
file_arrived_class A class code that AlertManager should log and execute upon file arrival. Important: It is the responsibility of this class-code to delete or rename the file to prevent unnecessary executions.
critical_class The class/state code that should be assigned to the File Monitor object when the file does not get created within the required time window provided that must_arrive is set.


The Disks Table (sps_disks.table)

The Disks Table governs all disk space monitoring and alerts. You may define multiple records, each identified by a unique logical_name. The following conditions are monitored:

organization: relative;
index : logical_name no_duplicates;
fields:
logical_name char (32) var,
path char (256) var,
active_from char (5),
active_to char (5),
critical_percent_used bin (15) default ('90'),
warning_percent_used bin (15) default ('80'),
critical_io_rate bin (15) default ('0'),
warning_io_rate bin (15) default ('0'),
critical_read_busy bin (15) default ('60'),
warning_read_busy bin (15) default ('40'),
critical_write_busy bin (15) default ('60'),
warning_write_busy bin (15) default ('40'),
file_max_blocks bin (31),
dir_max_objects bin (15),
normal_class char (3) default ('NOR'),
critical_class char (3) default ('CRI'),
warning_class char (3) default ('WAR');
end;

Field Definitions

logical_name A unique logical name for the Object.
path The actual name of the disk (%sys#d01, %sys#d02). Paths can also be any directory or file on the system. In the case of a file (star-names are supported), AlertManager will create an alert when the number of blocks, measured against =file_max_blocks, exceeds either the warning_percent_used or the critical_percent_used threshold. In the case of a directory, AlertManager will create an alert when the number of objects (files, links, dirs), measured against =dir_max_objects, exceeds either the warning_percent_used or the critical_percent_used threshold.
active_from-to Use the HH:MM format to define the time window for this check. This feature allows multiple settings for peak/low periods.
critical_percent_used In the case of a disk - the percent of used space. When the disk pack is in violation of this condition, AlertManager produces the following message:

Disk [logical name] is at [percent used]% used.

In the case of a file - the percent of disk blocks used compared to =file_max_blocks. When the file is in violation of this condition, AlertManager produces the following message:

File [logical name] is NNN blocks - NN% of its allowed size.

In the case of a directory - the percent of the number of objects compared to =dir_max_objects. When the directory is in violation of this condition, AlertManager produces the following message:

Dir [logical name] has NNN objects - NN% of its allowed limit.

critical_io_rate The I/O rate on any given disk on the current module during the current interval. AlertManager will produce an alert when the system meter exceeds the allowed threshold. When a disk pack is in violation of this condition, AlertManager produces the following message:

Disk [logical name] is busy at [# of I/Os] I/Os per second.

critical_read_busy A threshold for the disk's %read-busy meter.

critical_write_busy A threshold for the disk's %write-busy meter.

file_max_blocks A threshold for file size, given in blocks, used when monitoring a file. When the file exceeds the given threshold (actual number of blocks reaches the warning/critical% used settings), AlertManager produces the following message:

File [logical name] is NNN blocks - NN% of its allowed size.

dir_max_objects A threshold for maximum number of objects (files,links,dirs) that is expected for a directory. When the directory exceeds the given threshold (actual number of objects reaches the warning/critical% used settings), AlertManager produces the following message:

Dir [logical name] has NNN objects - NN% of its allowed limit.
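
The percentage arithmetic behind the file and directory checks can be sketched in Python as follows. This is an illustration rather than the product's logic: the function names are hypothetical, the defaults mirror the 80/90 defaults of warning_percent_used and critical_percent_used, and a state is assumed to change only when the percentage strictly exceeds its threshold.

```python
def classify_usage(actual, maximum, warning_percent=80, critical_percent=90):
    """Return (percent_used, state) for a file's blocks or a directory's objects.

    maximum stands for =file_max_blocks or =dir_max_objects.
    """
    if maximum <= 0:
        raise ValueError("maximum must be positive")
    percent = 100.0 * actual / maximum
    if percent > critical_percent:
        state = "critical"
    elif percent > warning_percent:
        state = "warning"
    else:
        state = "normal"
    return percent, state


def file_message(logical_name, blocks, percent):
    """Format the file alert text in the layout shown above."""
    return f"File {logical_name} is {blocks} blocks - {int(percent)}% of its allowed size."


if __name__ == "__main__":
    # Hypothetical file of 950 blocks with file_max_blocks set to 1000.
    percent, state = classify_usage(950, 1000)
    print(state, file_message("big_log", 950, percent))
```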

normal_class The class code that AlertManager assigns to monitored disks that are in the normal state. The code must identify a valid record in the Classes table.
warning/critical_class The class code that AlertManager assigns to monitored disks that are in the critical state. The code must identify a valid record in the Classes table.

The Files Table (sps_files.table)

The Files Table lists the files that AlertManager should monitor.

organization: relative;
index : logical_name no_duplicates;
fields:
logical_name char (32) var,
input_path char (256) var,
input_ip_addr char (32) var,
input_ip_port bin (15),
input_http bit,
start_pos bin (15),
end_pos bin (15),
tcam_log bit (1) aligned,
toggle_status_filters bit (1) aligned,

use_filter_01 char (32) var,
use_filter_02 char (32) var,
use_filter_03 char (32) var,
.
.
use_filter_16 char (32) var;
end;

Field Definitions

logical_name A unique logical name for the Object.
input_path A path of a file or a 1-way-server-q from which AlertManager will read incoming messages. You may use star-names (e.g. xxx*.out). AlertManager will automatically use and switch to the most recently created file that matches the star-name. If possible, avoid using star-names; use them when a time-stamp is part of the file name(s).
input_ip_addr The TCP/IP address, if AlertManager is required to read alerts from another computer. This could be another system or a networked Stratus running AlertManager.
input_ip_port The TCP/IP port number, if AlertManager is required to read alerts from another computer. This could be another system or a networked Stratus running AlertManager.
start_pos Specifies the position from which AlertManager should examine the messages. The VOS syserr_log files have a time stamp in the first 8 positions followed by two spaces. The user should therefore set the starting position to 11.
end_pos Specifies the last position that AlertManager should use when examining the contents of messages. By default, the server will use the entire message buffer.
tcam_log When set (on TCAM systems only), AlertManager will strip off leading data fields of the message that are not necessary for display.

Example:

07:09:50 CTPS: (TPServer_1) [task 5] tp$server.pm :%INFO- Session starting.

AlertManager will log:

TPServer_1(i): Session starting.

toggle_status_filters The idea behind this switch is to create a logical Object that is either in a Critical-State or in a Normal-State. This is like all other objects (disks, cpu etc.) only that now we're dealing with TEXT filters and something that is detected by looking at a log file. The first filter detects the critical mode; the second filter detects the normal mode. In the following example ABC will either be in a A_1 or A_0 mode.

/
=logical_name ABC
=input_path (master_disk)>system>syserr_log.(date)
=start_pos 11
=toggle_status_filters 1
=use_filter_01 ABC_BROKEN
=use_filter_02 ABC_FIXED

sps_filters.tin:

/
=filter_name ABC_BROKEN
=class_code A_1
=match_and_01 abc_broken

/
=filter_name ABC_FIXED
=class_code A_0
=match_and_01 abc_fixed
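
The toggling behaviour can be pictured as a two-state machine driven by log lines. The Python sketch below is a rough model under stated assumptions, not AlertManager's code: the function name is hypothetical, the object is assumed to start in the normal state (A_0), and matching is reduced to a plain substring test on the two example strings.

```python
def track_state(lines, critical_text="abc_broken", normal_text="abc_fixed",
                critical_class="A_1", normal_class="A_0"):
    """Follow the object's state through lines; return (final_state, changes).

    changes lists every class code the object switched to, in order.
    """
    state = normal_class  # assumed starting state
    changes = []
    for line in lines:
        if critical_text in line:
            new_state = critical_class
        elif normal_text in line:
            new_state = normal_class
        else:
            continue  # the line matches neither filter
        if new_state != state:
            state = new_state
            changes.append(state)
    return state, changes


if __name__ == "__main__":
    log = [
        "10:00:01  abc_broken detected",
        "10:00:05  unrelated message",
        "10:04:30  abc_fixed by operator",
    ]
    print(track_state(log))
```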

use_filter_01..16 The use_filter_XX fields are links to Filter records (see the Filters Table). Filters are used to classify messages. You may use up to 16 filters.


The Filters Table (sps_filters.table)

The Filters Table allows the user to control which messages are included and which are filtered out.

organization: relative;
index : filter_name no_duplicates;
fields:
filter_name char (32) var,
class_code char (3),
match_and_01..16 char (32) var,
match_or_01..16 char (32) var,
omit_and_01..16 char (32) var,
omit_or_01..16 char (32) var,
msg_prefix char (32) var,
caseless bit (1) aligned;
end;

Field Definitions

filter_name A unique name that identifies the Filter.
class_code A code that links the Filter to a Class record. This field is required. AlertManager uses it to classify messages and take certain actions.
match_and_01..16 All match_and strings must exist in the message, otherwise AlertManager will ignore the message.
match_or_01..16 At least one of the match_or strings must exist in the message, otherwise AlertManager will ignore the message.
omit_and_01..16 AlertManager will ignore messages if all omit_and keywords appear in the message.
omit_or_01..16 AlertManager will ignore messages if at least one of the omit_or keywords appears in the message.
msg_prefix The user may specify a message prefix. AlertManager will add this text to the beginning of each incoming message. This may be useful for easy message identification.
caseless A switch that defines the case sensitivity. Set to 0 for case sensitive matching (default).
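
The four matching rules can be combined as in the following Python sketch. It is an illustration only: the filter_accepts name is hypothetical, an empty list is assumed to impose no restriction, and the order in which AlertManager evaluates the rules is not documented here.

```python
def filter_accepts(message, match_and=(), match_or=(), omit_and=(), omit_or=(),
                   caseless=False):
    """Return True if message passes the filter, False if it is ignored."""
    if caseless:
        message = message.lower()

    def present(text):
        # Fold the filter string too when matching is case-insensitive.
        return (text.lower() if caseless else text) in message

    if match_and and not all(present(s) for s in match_and):
        return False  # a required string is missing
    if match_or and not any(present(s) for s in match_or):
        return False  # none of the alternative strings is present
    if omit_and and all(present(s) for s in omit_and):
        return False  # every omit_and keyword appears
    if omit_or and any(present(s) for s in omit_or):
        return False  # at least one omit_or keyword appears
    return True


if __name__ == "__main__":
    print(filter_accepts("disk abc_broken on m1", match_and=["abc_broken"]))
    print(filter_accepts("disk abc_broken on m1", match_and=["abc_broken"],
                         omit_or=["m1"]))
```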


The Explorer Table (sps_explorer.table)

The Explorer Table lists all reports, configuration files, log files and directory listings that should be made available via the WEB/Browser interface.

organization:  relative;
index : logical_name no_duplicates;
fields :
logical_name char (32) var,
category char (32) var,
report_path char (256) var,
notes char (256) var,
acl_name char (32) var;
end;

Field Definitions

logical_name A unique name that identifies the Report.
category User-defined categories, for example: reports, configuration, release_notes etc.
report_path The full path name of the report file or files (star-names are supported but limited to 1024 files).
notes Notes, description, instructions etc.
acl_name The name of an FTR control file that governs the user's access to the item. This is best explained by an example. Consider the following entry:

/
=logical_name my_file
=category special
=report_path #d01>special>file_1
=notes This is my file.
=acl_name security_1

In the SPS>alert_manager directory, create a security_1.ftr file. If the current user has access (read or write) to the security_1.ftr file, the item will be displayed on the list; otherwise, it will not. This provides an easy way to control privileges based on the user-id.


The Reports Table (sps_reports.table)

The Reports Table is used to define frequently-used reports. Each report is identified with a unique logical name.
report char (32) var,
from char (32) var,
to char (32) var,
include_class_NN char (3),
exclude_class_NN char (3),
sources_NN char (32) var,
modules_NN char (32) var,
match char (256) var,
output_path char (256) var,
email char (256) var;
end;

Field Definitions

report A unique name that identifies the Report. To produce a report execute:

alert_manager_server.pm -command report -name [report-name]

from A standard VOS date-time that defines the start of the date/time selection criteria.
to A standard VOS date-time that defines the end of the date/time selection criteria.
include_class_NN A list of one or more classes to be included in the report. Star-names may be used.
exclude_class_NN A list of one or more classes to be excluded from the report. Star-names may be used.
sources You can specify any number of Sources from which messages originated.
modules You can specify any number of module names from which messages originated.
match If you choose a match-string, then only messages that contain the string will appear in the report.
output_path The relative or full path name of the report file.
email You can choose to send the report via your E-Mail Server to selected users. Simply enter their email nick-names as defined in the sps_email_server.table.


The E-Mail Facility

The Email Facility sends you and your staff selected reports and performance graphs directly from your Stratus. Automated report and graph generators with Email capabilities are now integrated into all SPS reports.

You'll need to define your Email Server in the SPS>alert_manager>sps_email_server.tin and then create the table.

organization: relative;
index : server_name no_duplicates;
fields :
server_name char (32) var,
server_ip_address char (32) var,
server_port_number bin (15) default ('0'),
user_name char (32) var,
password char (32) var,
use_domain char (32) var default ('softmark.com'),
nick_name_XX char (32) var,
e_address_XX char (256) var,
debug bit (1) aligned default ('0'),
trace bit (1) aligned default ('0');
end;
server_name A logical name for the Email server. The default settings should not be changed.
server_ip_address The IP address of the Server. You might need to get this from your Network manager.
server_port_number The port number of the Server. You might need to get this from your Network manager.
user_name The user_name is only required if the Email server requires authentication.
password The password is only required if the Email server requires authentication.
use_domain The domain name from which the message will arrive.
nick_name_xx Set up nick-names for the persons authorized to receive reports and alerts from the system.
e_address_xx The corresponding email address for every nick-name that was defined.
debug Set to 1 if you need to troubleshoot this service.
trace Set to 1 if you need to troubleshoot this service.

Example:

/
=server_name sps_mail_server
=server_ip_address outgoing.myserver.net
=user_name Bob
=password my_pass123
=nick_name_01 Joe
=e_address_01 job@mycompany.com
=trace 0


The AlertManager-Netview/Tivoli interface

Alert Manager allows you to send alarms and notifications to UNIX/Netview/Tivoli/Openview. You can choose to send all messages or restrict the message flow to a few selected Class Codes. Messages can be formatted using special keywords as follows:

Example:

If your Class definition (sps_classes.tin) includes:

=output_q_message '@time: @msg'

then your messages on UNIX/Linux will look like the following example:

Oct 7 14:40:08 localhost 14:40:58: All networked modules are online.

Installation:

In SPS>alert_manager>alert_manager.tin, define the following fields, where =send_to_syslog_ip is the IP address of your UNIX box and =send_to_syslog_port is the port number of its syslog daemon. If your syslog daemon is using UDP (mostly the case), set the syslog_udp switch to 1. If it is using TCP, set the syslog_udp switch to 0.

=send_to_syslog_ip               XXX.XX.XX.XX
=send_to_syslog_port             514
=syslog_udp                      1
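
Before pointing AlertManager at the syslog daemon, it can help to confirm that the UNIX box accepts UDP datagrams on the configured port. The Python sketch below sends one hand-built test message in the classic BSD syslog style (a priority value in angle brackets followed by the text). It is a generic test aid with hypothetical function names; the exact packet layout that AlertManager itself sends is not documented here.

```python
import socket


def build_syslog_packet(facility, severity, message):
    """Build a BSD-style syslog datagram: '<PRI>message', PRI = facility * 8 + severity."""
    if not 0 <= facility <= 23 or not 0 <= severity <= 7:
        raise ValueError("facility must be 0..23 and severity 0..7")
    return f"<{facility * 8 + severity}>{message}".encode("ascii", "replace")


def send_test_message(host, port, message, facility=1, severity=6):
    """Send a single test datagram to the syslog daemon over UDP."""
    packet = build_syslog_packet(facility, severity, message)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


if __name__ == "__main__":
    # Facility 1 (user-level) and severity 6 (informational) give priority 14.
    print(build_syslog_packet(1, 6, "14:40:58: All networked modules are online."))
    # To send for real, substitute the values used in alert_manager.tin:
    # send_test_message("<send_to_syslog_ip>", 514, "AlertManager syslog test")
```

If the test line shows up in the UNIX system log, the address and port are reachable over UDP.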


Setup instructions

In the sps_files.table you must include the following entry.

/
=logical_name HTML_REPORTS
=input_ip_addr 0.0.0.0
=input_ip_port 5001
=input_http 1

Once you start the Alert Manager Server, you can get all your reports by using the following URL:

http://[XX.XX.XXX.XXX]:[NNNN]/AlertManager

XX.XX.XXX.XXX The IP address of your Stratus
NNNN The port assigned in sps_files

Example:

http://68.161.237.188:1024/AlertManager

  To improve response time, add the following line as the first line of your start_up.cm:

    &if (index (process_info process_name) 'AlertManager') ^= 0 &then &return


Using Environments

Environments are complete or partial sets of configuration files. Using multiple Environments allows you to run multiple instances of the AlertManager server, each working with its own separate configuration set (Environment).

Example: Let's say you wish to create your own QA environment with your own configuration files, alerts etc. To keep it simple, let's just say that you wish to monitor your own queues. All you need to do is:

  1. Create your own table sps_qmon_QA.tin; create the table using sps_qmon.dd.
  2. Create your own table sps_files_QA.tin; change the port number so that you will not interfere with the main Server; create the table using sps_files.dd.
  3. start_alert_manager.cm QA
The following tables can be set for specific environments: