Merge pull request #444 from philips/document-stats-api

Documentation: document the stats API
Brandon Philips 12 years ago
parent
commit
68546de2cb
1 changed files with 142 additions and 4 deletions

Documentation/api.md

@@ -1,6 +1,6 @@
-## etcd API
+# etcd API
 
-### Running a Single Machine Cluster
+## Running a Single Machine Cluster
 
 These examples will use a single machine cluster to show you the basics of the etcd REST API.
 Let's start etcd:
@@ -13,6 +13,10 @@ This will bring up etcd listening on port 4001 for client communication and on p
 The `-data-dir machine0` argument tells etcd to write machine configuration, logs and snapshots to the `./machine0/` directory.
 The `-name machine0` argument tells the rest of the cluster that this machine is named machine0.
 
+## Key Space Operations
+
+The primary API of etcd is a hierarchical key space.
+There are directories and keys which are generically referred to as "nodes".
 
 ### Setting the value to a key
 
@@ -353,7 +357,6 @@ curl -X GET http://127.0.0.1:4001/v2/keys/dir/asdf\?consistent\=true\&wait\=true
 }
 ```
 
-
 ### Atomic Compare-and-Swap (CAS)
 
 Etcd can be used as a centralized coordination service in a cluster and `CompareAndSwap` is the most basic operation used to build a distributed lock service.
@@ -581,7 +584,6 @@ curl -L http://127.0.0.1:4001/v2/keys/dir?recursive=true -XDELETE
 }
 ```
 
-
 ### Creating a hidden node
 
 We can create a hidden key-value pair or directory by adding a `_` prefix.
@@ -656,4 +658,140 @@ curl -L http://127.0.0.1:4001/v2/keys/
 
 Here we see the `/message` key but our hidden `/_message` key is not returned.
 
+## Statistics
+
+An etcd cluster keeps track of a number of statistics including latency, bandwidth and uptime.
+These statistics are used in the `/mod/dashboard` to generate tables and graphs about the cluster state.
+
+### Leader Statistics
+
+The leader has a view of the entire cluster and keeps track of two interesting statistics: latency to each peer in the cluster and the number of failed and successful Raft RPC requests.
+You can grab these statistics from the `/v2/stats/leader` endpoint:
+
+```sh
+curl -L 127.0.0.1:4001/v2/stats/leader
+```
+
+```json
+{
+    "followers": {
+        "etcd-node1": {
+            "counts": {
+                "fail": 1212,
+                "success": 4163176
+            },
+            "latency": {
+                "average": 2.7206299430775007,
+                "current": 1.486487,
+                "maximum": 2018.410279,
+                "minimum": 1.011763,
+                "standardDeviation": 6.246990702203536
+            }
+        },
+        "etcd-node3": {
+            "counts": {
+                "fail": 1378,
+                "success": 4164598
+            },
+            "latency": {
+                "average": 2.707100125761001,
+                "current": 1.666258,
+                "maximum": 1409.054765,
+                "minimum": 0.998415,
+                "standardDeviation": 5.910089773061448
+            }
+        }
+    },
+    "leader": "etcd-node2"
+}
+```
+
+### Self Statistics
+
+Each node keeps a number of internal statistics:
+
+- `leaderInfo.leader`: name of the current leader machine
+- `leaderInfo.uptime`: amount of time the leader has been leader
+- `name`: this machine's name
+- `recvAppendRequestCnt`: number of append requests this node has processed
+- `recvBandwidthRate`: number of bytes per second this node is receiving (follower only)
+- `recvPkgRate`: number of requests per second this node is receiving (follower only)
+- `sendAppendRequestCnt`: number of requests that this node has sent
+- `sendBandwidthRate`: number of bytes per second this node is sending (leader only)
+- `sendPkgRate`: number of requests per second this node is sending (leader only)
+- `state`: either leader or follower
+- `startTime`: the time when this node was started
+
+This is an example response from a follower machine:
+
+```sh
+curl -L 127.0.0.1:4001/v2/stats/self
+```
 
+```json
+{
+    "leaderInfo": {
+        "leader": "machine1",
+        "uptime": "1m18.544996775s"
+    },
+    "name": "machine0",
+    "recvAppendRequestCnt": 5871307,
+    "recvBandwidthRate": 630.3121596542599,
+    "recvPkgRate": 19.272654323628185,
+    "sendAppendRequestCnt": 3175763,
+    "startTime": "2014-01-01T15:26:24.96569404Z",
+    "state": "follower"
+}
+```
+
+And this is an example response from a leader machine:
+
+```sh
+curl -L 127.0.0.1:4001/v2/stats/self
+```
+
+```json
+{
+    "leaderInfo": {
+        "leader": "machine0",
+        "uptime": "24.648619798s"
+    },
+    "name": "machine0",
+    "recvAppendRequestCnt": 5901116,
+    "sendAppendRequestCnt": 3212344,
+    "sendBandwidthRate": 1254.3151237301615,
+    "sendPkgRate": 38.71342974475808,
+    "startTime": "2014-01-01T15:26:24.96569404Z",
+    "state": "leader"
+}
+```
+
+### Store Statistics
+
+The store statistics include information about the operations that this node has handled.
+
+Operations that modify the store's state, such as create, delete, set and update, are seen by the entire cluster, so their counters increase on all nodes.
+Operations like get and watch are node-local and are only counted on the node that served them.
+
+```sh
+curl -L 127.0.0.1:4001/v2/stats/store
+```
+
+```json
+{
+    "compareAndSwapFail": 0,
+    "compareAndSwapSuccess": 0,
+    "createFail": 0,
+    "createSuccess": 2,
+    "deleteFail": 0,
+    "deleteSuccess": 0,
+    "expireCount": 0,
+    "getsFail": 4,
+    "getsSuccess": 75,
+    "setsFail": 2,
+    "setsSuccess": 4,
+    "updateFail": 0,
+    "updateSuccess": 0,
+    "watchers": 0
+}
+```
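
The store statistics shown above pair a `...Success` and a `...Fail` counter for most operations. As a rough illustration of how a client might consume that payload, the following Python sketch groups the counters by operation and derives a failure ratio. It is not part of this commit or of etcd itself: the helper names `summarize_store_stats` and `failure_ratio` are hypothetical, and the sample data is simply copied from the example response rather than fetched from a live node.

```python
import json

# Sample payload copied from the /v2/stats/store example response above.
SAMPLE = """
{
    "compareAndSwapFail": 0,
    "compareAndSwapSuccess": 0,
    "createFail": 0,
    "createSuccess": 2,
    "deleteFail": 0,
    "deleteSuccess": 0,
    "expireCount": 0,
    "getsFail": 4,
    "getsSuccess": 75,
    "setsFail": 2,
    "setsSuccess": 4,
    "updateFail": 0,
    "updateSuccess": 0,
    "watchers": 0
}
"""


def summarize_store_stats(stats):
    """Group the *Success / *Fail counters by operation name (hypothetical helper).

    Keys without either suffix, such as "expireCount" and "watchers", are skipped.
    """
    summary = {}
    for key, value in stats.items():
        for suffix, field in (("Success", "success"), ("Fail", "fail")):
            if key.endswith(suffix):
                operation = key[: -len(suffix)]
                counts = summary.setdefault(operation, {"success": 0, "fail": 0})
                counts[field] = value
    return summary


def failure_ratio(counts):
    """Return fail / (success + fail), or None when nothing was recorded."""
    total = counts["success"] + counts["fail"]
    return counts["fail"] / total if total else None


if __name__ == "__main__":
    summary = summarize_store_stats(json.loads(SAMPLE))
    for operation, counts in sorted(summary.items()):
        print(operation, counts, failure_ratio(counts))
```

Because get and watch counters are node-local, a summary like this describes only the node that was queried; the create, delete, set and update counters should look the same on every node.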