Sunday, May 6, 2012

Creating Alert Recovery with Hyperic HQApi

Some documentation on the Alert Condition topic

In general, you should set up two AlertDefinitions: a [Down] and a [Fixed - Down] which recovers [Down]:

[Down]
If Condition: JMS Destination Availability = 0.0%
Enable Action(s): Each time conditions are met.
Generate one alert and then disable alert definition until fixed

and

[Fixed - Down]
If Condition: JMS Destination Availability = 100.0%
Recovery Alert: for [Down]
Enable Action(s): Each time conditions are met.


As XML, the two AlertDefinitions look like this:

<AlertDefinition mtime="1336300002609" ctime="1336300002609" id="10737" name="[Down]" description="" priority="2" enabled="true" active="true" frequency="0" count="0" range="0" willRecover="true" notifyFiltered="false" controlFiltered="false">
    <Resource id="12469" name="/tmp/hyperictestdir"/>
    <AlertCondition required="true" type="1" thresholdValue="0.0" thresholdComparator="=" thresholdMetric="Availability"/>
</AlertDefinition>
<AlertDefinition mtime="1336300075132" ctime="1336300075132" id="10738" name="[Fixed- Down]" description="" priority="2" enabled="true" active="true" frequency="0" count="0" range="0" willRecover="false" notifyFiltered="false" controlFiltered="false">
    <Resource id="12469" name="/tmp/hyperictestdir"/>
    <AlertCondition required="true" type="1" thresholdValue="1.0" thresholdComparator="=" thresholdMetric="Availability"/>
    <AlertCondition required="true" type="5" recover="[Down]" recoverId="10737"/>
</AlertDefinition>
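The link between the two definitions is the type="5" condition in [Fixed- Down], whose recoverId points at the id of [Down]. As a rough sketch, the JDK's built-in DOM parser can follow that link. The class RecoveryLinks and its method are hypothetical (not part of HQApi), and the XML is trimmed to the attributes used here and wrapped in an assumed <AlertDefinitions> root element so the fragment parses on its own.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RecoveryLinks {

    // The two definitions above, trimmed and wrapped in a hypothetical root element.
    public static final String XML =
        "<AlertDefinitions>"
      + "<AlertDefinition id=\"10737\" name=\"[Down]\" willRecover=\"true\">"
      + "<AlertCondition required=\"true\" type=\"1\" thresholdValue=\"0.0\" thresholdComparator=\"=\" thresholdMetric=\"Availability\"/>"
      + "</AlertDefinition>"
      + "<AlertDefinition id=\"10738\" name=\"[Fixed- Down]\" willRecover=\"false\">"
      + "<AlertCondition required=\"true\" type=\"1\" thresholdValue=\"1.0\" thresholdComparator=\"=\" thresholdMetric=\"Availability\"/>"
      + "<AlertCondition required=\"true\" type=\"5\" recover=\"[Down]\" recoverId=\"10737\"/>"
      + "</AlertDefinition>"
      + "</AlertDefinitions>";

    /** Maps each recovery definition's name to the name of the definition it recovers. */
    public static Map<String, String> recoveryLinks(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // First pass: index every definition name by its id.
        Map<String, String> namesById = new HashMap<>();
        NodeList defs = doc.getElementsByTagName("AlertDefinition");
        for (int i = 0; i < defs.getLength(); i++) {
            Element def = (Element) defs.item(i);
            namesById.put(def.getAttribute("id"), def.getAttribute("name"));
        }

        // Second pass: a type="5" condition carries the recoverId of the recovered definition.
        Map<String, String> links = new HashMap<>();
        for (int i = 0; i < defs.getLength(); i++) {
            Element def = (Element) defs.item(i);
            NodeList conds = def.getElementsByTagName("AlertCondition");
            for (int j = 0; j < conds.getLength(); j++) {
                Element cond = (Element) conds.item(j);
                if ("5".equals(cond.getAttribute("type"))) {
                    links.put(def.getAttribute("name"), namesById.get(cond.getAttribute("recoverId")));
                }
            }
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(recoveryLinks(XML)); // prints {[Fixed- Down]=[Down]}
    }
}
```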



The line "Generate one alert and then disable alert definition until fixed" corresponds to the attribute willRecover="true" in the [Down] alert definition.

If you create an alert that fires whenever a FileServerDirectory (or any other resource) goes DOWN, you will keep getting alerts until you fix it. You might prefer to have the DOWN alert triggered only once, and then a second, UP alert when the resource is fixed.
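To make the difference concrete, here is a small plain-Java model of both behaviors. It is not HQApi code: the class RecoverySimulation, its methods and the sample series are hypothetical, and it assumes one availability sample per collection interval, with 0.0 meaning DOWN and 1.0 meaning UP (matching the thresholdValue attributes of the two definitions).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical model (not HQApi) of two alerting styles:
 * a plain DOWN alert that fires on every DOWN sample, and a
 * [Down]/[Fixed - Down] pair where [Down] has willRecover=true.
 */
public class RecoverySimulation {

    /** Plain alert: fires every time availability == 0.0. */
    public static List<String> plainAlert(double[] samples) {
        List<String> alerts = new ArrayList<>();
        for (int i = 0; i < samples.length; i++) {
            if (samples[i] == 0.0) {
                alerts.add("DOWN@" + i);
            }
        }
        return alerts;
    }

    /** [Down] fires once, then stays disabled until [Fixed - Down] fires on availability == 1.0. */
    public static List<String> recoveryPair(double[] samples) {
        List<String> alerts = new ArrayList<>();
        boolean downEnabled = true; // is the [Down] definition currently enabled?
        for (int i = 0; i < samples.length; i++) {
            if (downEnabled && samples[i] == 0.0) {
                alerts.add("DOWN@" + i);
                downEnabled = false; // willRecover: disable until fixed
            } else if (!downEnabled && samples[i] == 1.0) {
                alerts.add("UP@" + i);
                downEnabled = true; // recovery alert fixes [Down], which re-enables it
            }
        }
        return alerts;
    }

    public static void main(String[] args) {
        // up, then down for four intervals, then up again
        double[] samples = {1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0};
        System.out.println("plain:    " + plainAlert(samples));   // [DOWN@1, DOWN@2, DOWN@3, DOWN@4]
        System.out.println("recovery: " + recoveryPair(samples)); // [DOWN@1, UP@5]
    }
}
```

The plain alert produces one alert per DOWN sample, while the pair produces exactly one DOWN and one UP alert for the whole outage.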

In the first scenario, you do:
Resource res = ....; // retrieve or create the resource (can be a platform, a server or a service)
String name = "my first alert";
// create the AlertDefinition
AlertDefinition alertDefinition = new AlertDefinition();
alertDefinition.setName(name);
alertDefinition.setDescription(name);
alertDefinition.setPriority(AlertPriority.MEDIUM.getPriority());
alertDefinition.setActive(true);
alertDefinition.setWillRecover(true);

// assign the AlertDefinition to the Resource
alertDefinition.setResource(res);
alertDefinition.setEnabled(true);

// create the AlertCondition: Availability = 0.0 (DOWN); ALERT_METRIC_AVAILABILITY holds the metric name "Availability"
AlertCondition thresholdCondition0 = AlertDefinitionBuilder.createThresholdCondition(true, ALERT_METRIC_AVAILABILITY, AlertComparator.EQUALS, 0.0);

// assign the AlertCondition to the AlertDefinition 
alertDefinition.getAlertCondition().add(thresholdCondition0);

List<AlertDefinition> definitions = new ArrayList<AlertDefinition>();
definitions.add(alertDefinition);

AlertDefinitionsResponse response = api.syncAlertDefinitions(definitions);

So far this works, but you will get an alert every minute or so (depending on your aggregation time) until the resource is back UP.

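To enable recovery, you must also create a second AlertDefinition whose conditions are Availability = 100% plus a recovery condition pointing at the first definition: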


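// create another AlertDefinition for the same resource, set up like the first one (name, priority, active, enabled ....)
AlertDefinition recoveryDefinition = new AlertDefinition();
recoveryDefinition.setResource(res);
// recovery condition that points at the [Down] definition
AlertCondition recoveryCondition = AlertDefinitionBuilder.createRecoveryCondition(true, alertDefinition);
// Availability = 1.0, i.e. 100% (UP)
AlertCondition thresholdCondition100 = AlertDefinitionBuilder.createThresholdCondition(true, ALERT_METRIC_AVAILABILITY, AlertComparator.EQUALS, 1.0);
recoveryDefinition.getAlertCondition().add(thresholdCondition100);
recoveryDefinition.getAlertCondition().add(recoveryCondition);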


Then also add recoveryDefinition to the List of AlertDefinitions to sync:


definitions.add(recoveryDefinition);
response = api.syncAlertDefinitions(definitions); // reuses the response variable declared earlier


This way you have two alert definitions for the same resource: one fires when it goes DOWN, and the other fires when it comes back UP and automatically fixes the DOWN alert.
