The Contrail solution sends alarms and UVEs (User Visible Entities) for monitoring and analytics, giving the administrator a complete view of the system. Alarms are set and reset depending on whether their alarm rules are satisfied. UVEs are added, updated, and removed as internal state changes. These objects show the current state of the system.
If an alarm is set and then reset after mitigation steps, the system returns to its original good state, but the history of alarm set/reset events over time is lost. UVEs have the same problem.
During alarm and UVE processing, increment the set/reset (alarm) or add/update/remove (UVE) counters and save them in the database. The database can later be queried using the contrail-stats script.
None
Two new structures have been added to display alarm and UVE stats:
struct AlarmgenUVEStats {
    1: u32 add_count
    2: u32 change_count
    3: u32 remove_count
}

struct AlarmgenAlarmStats {
    1: u32 set_count
    2: u32 reset_count
}

objectlog sandesh AlarmgenUpdate {
    /** @display_name:Alarmgen UVE Stats */
    7: map<string,AlarmgenUVEStats> uve_stats (tags="partition,table,.__key")
    /** @display_name:Alarmgen Alarm Stats */
    8: map<string,AlarmgenAlarmStats> alarm_stats (tags="partition,table,.__key")
}
Monitor >> Alarms >> Dashboard
The current dashboard has an alarms page that displays the currently active alarms. In addition, a new page will display alarm history, showing alarm set/reset counts. UVE history display in the UI is not planned for this release, but the data will be available via the contrail-stats script.
None
The alarmgen process in the analytics package collects set/reset counters (for alarms) and add/update/remove counters (for UVEs). These are pushed to the database for storage. The counters are collected along the following dimensions:
- alarm-stats[partition][table][alarm-name]
- uve-stats[partition][table][uve-type]
As a result, contrail-stats queries can be made per table (node type) and per UVE or alarm type.
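The counter layout described above can be sketched in Python as follows. This is an illustrative sketch only, not the alarmgen implementation: the class `StatsCollector`, its method names, and the event strings are assumptions made for the example, and the table name `ObjectCollectorInfo` is used purely as sample data. The counter field names mirror the AlarmgenUVEStats and AlarmgenAlarmStats structures.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UVEStats:
    # Mirrors AlarmgenUVEStats.
    add_count: int = 0
    change_count: int = 0
    remove_count: int = 0


@dataclass
class AlarmStats:
    # Mirrors AlarmgenAlarmStats.
    set_count: int = 0
    reset_count: int = 0


class StatsCollector:
    """Hypothetical in-memory collector keyed by (partition, table), then key."""

    def __init__(self):
        # uve_stats[(partition, table)][uve_type] -> UVEStats
        self.uve_stats = defaultdict(lambda: defaultdict(UVEStats))
        # alarm_stats[(partition, table)][alarm_name] -> AlarmStats
        self.alarm_stats = defaultdict(lambda: defaultdict(AlarmStats))

    def record_uve(self, partition, table, uve_type, event):
        stats = self.uve_stats[(partition, table)][uve_type]
        if event == "add":
            stats.add_count += 1
        elif event == "change":
            stats.change_count += 1
        elif event == "remove":
            stats.remove_count += 1
        else:
            raise ValueError(f"unknown UVE event: {event}")

    def record_alarm(self, partition, table, alarm_name, event):
        stats = self.alarm_stats[(partition, table)][alarm_name]
        if event == "set":
            stats.set_count += 1
        elif event == "reset":
            stats.reset_count += 1
        else:
            raise ValueError(f"unknown alarm event: {event}")


collector = StatsCollector()
collector.record_uve(0, "ObjectCollectorInfo", "NodeStatus", "add")
collector.record_uve(0, "ObjectCollectorInfo", "NodeStatus", "change")
collector.record_alarm(0, "ObjectCollectorInfo", "process-connectivity", "set")
collector.record_alarm(0, "ObjectCollectorInfo", "process-connectivity", "reset")
```

Keying the outer map by (partition, table) and the inner map by alarm name or UVE type matches the `partition,table,.__key` tags on the Sandesh maps, so each inner map could be sent as one `uve_stats` or `alarm_stats` field.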
Add support for alarms history display.
None
None
None
struct UVETableCount is removed.
None
contrail-stats --table AlarmgenUpdate.uve_stats \
    --select Source uve_stats.__key table T=30 "SUM(uve_stats.add_count)" "SUM(uve_stats.change_count)" "SUM(uve_stats.remove_count)" \
    --where "uve_stats.__key=NodeStatus"

contrail-stats --table AlarmgenUpdate.uve_stats \
    --select Source uve_stats.__key table uve_stats.add_count uve_stats.change_count uve_stats.remove_count \
    --where "uve_stats.__key=NodeStatus"

contrail-stats --table AlarmgenUpdate.alarm_stats \
    --select Source T=30 alarm_stats.__key table "SUM(alarm_stats.set_count)" "SUM(alarm_stats.reset_count)" \
    --where "alarm_stats.__key=process-connectivity"

contrail-stats --table AlarmgenUpdate.alarm_stats \
    --select Source alarm_stats.__key table alarm_stats.set_count alarm_stats.reset_count \
    --where "alarm_stats.__key=process-connectivity"
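
For scripted use, the queries above could be assembled programmatically. The following is a minimal sketch that only rebuilds the argument list for the example commands; the helper `build_stats_query` is hypothetical, and it uses only the `--table`, `--select`, and `--where` options that appear in the examples.

```python
import shlex


def build_stats_query(table, select, where=None):
    """Return an argv list for contrail-stats (hypothetical helper)."""
    argv = ["contrail-stats", "--table", table, "--select", *select]
    if where:
        argv += ["--where", where]
    return argv


# Rebuild the aggregated alarm query from the examples above.
argv = build_stats_query(
    "AlarmgenUpdate.alarm_stats",
    [
        "Source",
        "T=30",
        "alarm_stats.__key",
        "table",
        "SUM(alarm_stats.set_count)",
        "SUM(alarm_stats.reset_count)",
    ],
    where="alarm_stats.__key=process-connectivity",
)

# shlex.join quotes arguments such as SUM(...) so the line is shell-safe.
print(shlex.join(argv))
```

Passing `argv` as a list to `subprocess.run` would avoid shell quoting altogether; the printed form is intended only for logging or copy and paste.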