Documentation/etcd-mixin: Fix EtcdInsufficientMembers alerting

Currently the EtcdInsufficientMembers alert fires when more than (X/2)-1
instances are unavailable. This change makes it fire at the correct limit,
when more than (X-1)/2 instances are unavailable, and $value now contains
the number of available instances instead of unavailable ones. It also adds
a unit test for the EtcdInsufficientMembers alert.
Christian Beneke 7 years ago
parent
commit
c75ba98f81
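
The two thresholds in the commit message are easier to compare with concrete numbers. The following is a minimal sketch, not part of the commit: the function names are hypothetical, and it assumes the usual etcd quorum size of floor(X/2)+1 members. It models the old and new expressions as plain arithmetic on `total` members with `down` of them unavailable.

```python
# Illustrative arithmetic only; these helpers are hypothetical and are not
# part of the mixin. They model the two PromQL expressions from the diff.

def old_rule_fires(total: int, down: int) -> bool:
    # Old expr: count(up == 0) > count(up) / 2 - 1
    # With no member down, count(up == 0) returns no series, so nothing fires.
    return down > 0 and down > total / 2 - 1

def new_rule_fires(total: int, down: int) -> bool:
    # New expr: sum(up == bool 1) < (count(up) + 1) / 2
    available = total - down
    return available < (total + 1) / 2

def quorum_lost(total: int, down: int) -> bool:
    # Assumption: quorum requires floor(total / 2) + 1 available members.
    return total - down < total // 2 + 1

if __name__ == "__main__":
    print("total down old new quorum_lost")
    for total in (3, 4, 5):
        for down in range(total + 1):
            print(total, down,
                  old_rule_fires(total, down),
                  new_rule_fires(total, down),
                  quorum_lost(total, down))
```

For a three-member cluster, the old condition is already true with a single member down, although two of three members still form a quorum. The new condition becomes true only once two members are down.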

+ 14 - 0
Documentation/etcd-mixin/README.md

@@ -9,3 +9,17 @@ Instructions for use are the same as the [kubernetes-mixin](https://github.com/k
 ## Background
 
 * For more information about monitoring mixins, see this [design doc](https://docs.google.com/document/d/1A9xvzwqnFVSOZ5fD3blKODXfsat5fg6ZhnKu9LK3lB4/edit#).
+
+## Testing alerts
+
+Make sure to have [jsonnet](https://jsonnet.org/) and [gojsontoyaml](https://github.com/brancz/gojsontoyaml) installed.
+
+First compile the mixin to a YAML file for promtool to read:
+```
+jsonnet -e '(import "mixin.libsonnet").prometheusAlerts' | gojsontoyaml > mixin.yaml
+```
+
+Then run the unit test:
+```
+promtool test rules test.yaml
+```

+ 1 - 1
Documentation/etcd-mixin/mixin.libsonnet

@@ -11,7 +11,7 @@
           {
             alert: 'EtcdInsufficientMembers',
             expr: |||
-              count(up{%(etcd_selector)s} == 0) by (job) > (count(up{%(etcd_selector)s}) by (job) / 2 - 1)
+              sum(up{%(etcd_selector)s} == bool 1) by (job) < ((count(up{%(etcd_selector)s}) by (job) + 1) / 2)
             ||| % $._config,
             'for': '3m',
             labels: {

+ 35 - 0
Documentation/etcd-mixin/test.yaml

@@ -0,0 +1,35 @@
+rule_files:
+  - mixin.yaml
+
+evaluation_interval: 1m
+
+tests:
+  - interval: 1m
+    input_series:
+      - series: 'up{job="etcd",instance="10.10.10.0"}'
+        values: '1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0'
+      - series: 'up{job="etcd",instance="10.10.10.1"}'
+        values: '1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0'
+      - series: 'up{job="etcd",instance="10.10.10.2"}'
+        values: '1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0'
+    alert_rule_test:
+      - eval_time: 3m
+        alertname: EtcdInsufficientMembers
+      - eval_time: 7m
+        alertname: EtcdInsufficientMembers
+      - eval_time: 11m
+        alertname: EtcdInsufficientMembers
+        exp_alerts:
+          - exp_labels:
+              job: etcd
+              severity: critical
+            exp_annotations:
+              message: 'Etcd cluster "etcd": insufficient members (1).'
+      - eval_time: 15m
+        alertname: EtcdInsufficientMembers
+        exp_alerts:
+          - exp_labels:
+              job: etcd
+              severity: critical
+            exp_annotations:
+              message: 'Etcd cluster "etcd": insufficient members (0).'
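
The expected results in test.yaml can be checked by hand with a rough replay of the input series. The sketch below is a simplified model, not promtool itself: it assumes one evaluation per minute, aligned with the samples, and treats the alert as firing once the new expression has been true for at least the rule's `for` duration of 3 minutes. The names `SERIES`, `FOR_MINUTES`, and `replay` are hypothetical.

```python
# Simplified replay of the test's input_series; not a substitute for promtool.
SERIES = {
    "10.10.10.0": "1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0",
    "10.10.10.1": "1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0",
    "10.10.10.2": "1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0",
}
FOR_MINUTES = 3  # mirrors 'for': '3m' in the alert rule

def replay(series, for_minutes):
    """Return {minute: (available, firing)} for each one-minute sample."""
    samples = [[int(v) for v in values.split()] for values in series.values()]
    total = len(samples)
    active_since = None  # minute at which the condition first became true
    states = {}
    for minute in range(len(samples[0])):
        available = sum(s[minute] for s in samples)
        if available < (total + 1) / 2:
            if active_since is None:
                active_since = minute
            firing = minute - active_since >= for_minutes
        else:
            active_since = None
            firing = False
        states[minute] = (available, firing)
    return states

states = replay(SERIES, FOR_MINUTES)
for minute in (3, 7, 11, 15):
    available, firing = states[minute]
    print(f"{minute}m: available={available} firing={firing}")
```

Under these assumptions the condition first holds at minute 8, when only one member is up, so the alert is firing at 11m with a value of 1 and at 15m with a value of 0. Nothing fires at 3m or 7m, which matches the test's empty expectations for those evaluation times.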