10 years ago · 2efbc76689
--- a/Documentation/glossary.md
+++ b/Documentation/glossary.md
@@ -22,6 +22,10 @@ The node in each member follows raft consensus protocol to replicate logs. Clust
 
				 
			
 
				 Peer is another member of the same cluster.
			
 
				 
			
 
				+### Proposal
			
 
				+
			
 
				+A proposal is a request (for example a write request, a configuration change request) that needs to go through raft protocol.
			
 
				+
			
 
				 ### Client
			
 
				 
			
 
				 Client is a caller of the cluster's HTTP API.
			
--- a/Documentation/metrics.md
+++ b/Documentation/metrics.md
@@ -0,0 +1,48 @@
 
				+## Metrics
			
 
				+
			
 
				+**NOTE: The metrics feature is considered as an experimental. We might add/change/remove metrics without warning in the future releases.**
			
 
				+
			
 
				+etcd uses [Prometheus](http://prometheus.io/) for metrics reporting in the server. The metrics can be used for real-time monitoring and debugging.
			
 
				+
			
 
				+The simplest way to see the available metrics is to cURL the metrics endpoint `/metrics` of etcd. The format is described [here](http://prometheus.io/docs/instrumenting/exposition_formats/).
			
 
				+
			
 
				+
			
 
				+You can also follow the doc [here](http://prometheus.io/docs/introduction/getting_started/) to start a Promethus server and monitor etcd metrics.
			
 
				+
			
 
				+The naming of metrics follows the suggested [best practice of Promethus](http://prometheus.io/docs/practices/naming/). A metric name has an `etcd` prefix as its namespace and a subsystem prefix (for example `wal` and `etcdserver`).
			
 
				+
			
 
				+etcd now exposes the following metrics:
			
 
				+
			
 
				+### etcdserver
			
 
				+
			
 
				+| Name                                    | Description                                      | Type    |
			
 
				+|-----------------------------------------|--------------------------------------------------|---------|
			
 
				+| file_descriptors_used_total             | The total number of file descriptors used        | Gauge   |
			
 
				+| proposal_durations_milliseconds         | The latency distributions of committing proposal | Summary |
			
 
				+| pending_proposal_total                  | The total number of pending proposals            | Gauge   |
			
 
				+| proposal_failed_total                   | The total number of failed proposals             | Counter |
			
 
				+
			
 
				+High file descriptors (`file_descriptors_used_total`) usage (near the file descriptors limitation of the process) indicates a potential out of file descriptors issue. That might cause etcd fails to create new WAL files and panics.
			
 
				+
			
 
				+[Proposal](glossary.md#proposal) durations (`proposal_durations_milliseconds`) give you an summary about the proposal commit latency. Latency can be introduced into this process by network and disk IO.
			
 
				+
			
 
				+Pending proposal (`pending_proposal_total`) gives you an idea about how many proposal are in the queue and waiting for commit. An increasing pending number indicates a high client load or an unstable cluster.
			
 
				+
			
 
				+Failed proposals (`proposal_failed_total`) are normally related to two issues: temporary failures related to a leader election or longer duration downtime caused by a loss of quorum in the cluster.
			
 
				+
			
 
				+### wal
			
 
				+
			
 
				+| Name                               | Description                                      | Type    |
			
 
				+|------------------------------------|--------------------------------------------------|---------|
			
 
				+| fsync_durations_microseconds       | The latency distributions of fsync called by wal | Summary |
			
 
				+| last_index_saved                   | The index of the last entry saved by wal         | Gauge   |
			
 
				+
			
 
				+Abnormally high fsync duration (`fsync_durations_microseconds`) indicates disk issues and might cause the cluster to be unstable.
			
 
				+
			
 
				+### snapshot
			
 
				+
			
 
				+| Name                                       | Description                                                | Type    |
			
 
				+|--------------------------------------------|------------------------------------------------------------|---------|
			
 
				+| snapshot_save_total_durations_microseconds | The total latency distributions of save called by snapshot | Summary |
			
 
				+
			
 
				+Abnormally high snapshot duration (`snapshot_save_total_durations_microseconds`) indicates disk issues and might cause the cluster to be unstable.