@@ -15,16 +15,16 @@ etcd now exposes the following metrics:

## etcdserver

-| Name                                    | Description                                      | Type    |
-|-----------------------------------------|--------------------------------------------------|---------|
-| file_descriptors_used_total             | The total number of file descriptors used        | Gauge   |
-| proposal_durations_milliseconds         | The latency distributions of committing proposal | Summary |
-| pending_proposal_total                  | The total number of pending proposals            | Gauge   |
-| proposal_failed_total                   | The total number of failed proposals             | Counter |
+| Name                                    | Description                                      | Type      |
+|-----------------------------------------|--------------------------------------------------|-----------|
+| file_descriptors_used_total             | The total number of file descriptors used        | Gauge     |
+| proposal_durations_seconds              | The latency distributions of committing proposal | Histogram |
+| pending_proposal_total                  | The total number of pending proposals            | Gauge     |
+| proposal_failed_total                   | The total number of failed proposals             | Counter   |
-High file descriptors (`file_descriptors_used_total`) usage (near the file descriptors limitation of the process) indicates a potential out of file descriptors issue. That might cause etcd fails to create new WAL files and panics.
+High file descriptor usage (`file_descriptors_used_total`), near the process's file descriptor limit, indicates a potential out-of-file-descriptors issue. That might cause etcd to fail to create new WAL files and panic.

-[Proposal](glossary.md#proposal) durations (`proposal_durations_milliseconds`) give you an summary about the proposal commit latency. Latency can be introduced into this process by network and disk IO.
+[Proposal](glossary.md#proposal) durations (`proposal_durations_seconds`) give you a histogram of the proposal commit latency. Latency can be introduced into this process by network and disk IO.

-Pending proposal (`pending_proposal_total`) gives you an idea about how many proposal are in the queue and waiting for commit. An increasing pending number indicates a high client load or an unstable cluster.
+Pending proposals (`pending_proposal_total`) give you an idea of how many proposals are in the queue and waiting for commit. An increasing pending number indicates a high client load or an unstable cluster.
@@ -32,12 +32,12 @@ Failed proposals (`proposal_failed_total`) are normally related to two issues: t

## wal

-| Name                               | Description                                      | Type    |
-|------------------------------------|--------------------------------------------------|---------|
-| fsync_durations_microseconds       | The latency distributions of fsync called by wal | Summary |
-| last_index_saved                   | The index of the last entry saved by wal         | Gauge   |
+| Name                               | Description                                      | Type      |
+|------------------------------------|--------------------------------------------------|-----------|
+| fsync_durations_seconds            | The latency distributions of fsync called by wal | Histogram |
+| last_index_saved                   | The index of the last entry saved by wal         | Gauge     |
-Abnormally high fsync duration (`fsync_durations_microseconds`) indicates disk issues and might cause the cluster to be unstable.
+Abnormally high fsync duration (`fsync_durations_seconds`) indicates disk issues and might cause the cluster to be unstable.


## http requests
@@ -73,22 +73,22 @@ Example Prometheus queries that may be useful from these metrics (across all etc

## snapshot

-| Name                                       | Description                                                | Type    |
-|--------------------------------------------|------------------------------------------------------------|---------|
-| snapshot_save_total_durations_microseconds | The total latency distributions of save called by snapshot | Summary |
+| Name                                       | Description                                                | Type      |
+|--------------------------------------------|------------------------------------------------------------|-----------|
+| snapshot_save_total_durations_seconds      | The total latency distributions of save called by snapshot | Histogram |
-Abnormally high snapshot duration (`snapshot_save_total_durations_microseconds`) indicates disk issues and might cause the cluster to be unstable.
+Abnormally high snapshot duration (`snapshot_save_total_durations_seconds`) indicates disk issues and might cause the cluster to be unstable.


## rafthttp
-| Name                              | Description                                | Type    | Labels                         |
-|-----------------------------------|--------------------------------------------|---------|--------------------------------|
-| message_sent_latency_microseconds | The latency distributions of messages sent | Summary | sendingType, msgType, remoteID |
-| message_sent_failed_total         | The total number of failed messages sent   | Summary | sendingType, msgType, remoteID |
+| Name                              | Description                                | Type         | Labels                         |
+|-----------------------------------|--------------------------------------------|--------------|--------------------------------|
+| message_sent_latency_seconds      | The latency distributions of messages sent | HistogramVec | sendingType, msgType, remoteID |
+| message_sent_failed_total         | The total number of failed messages sent   | CounterVec   | sendingType, msgType, remoteID |


-Abnormally high message duration (`message_sent_latency_microseconds`) indicates network issues and might cause the cluster to be unstable.
+Abnormally high message duration (`message_sent_latency_seconds`) indicates network issues and might cause the cluster to be unstable.
An increase in message failures (`message_sent_failed_total`) indicates more severe network issues and might cause the cluster to be unstable.
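
Because this change moves the latency metrics from Summary to Histogram, quantiles are no longer precomputed by the server and have to be estimated from cumulative bucket counts at query time. The following is a minimal sketch of that estimation, assuming linear interpolation within a bucket (similar in spirit to what Prometheus does for histogram quantiles). The function name `histogram_quantile_estimate` and the bucket bounds in the usage note are hypothetical and are not taken from etcd.

```python
import math


def histogram_quantile_estimate(q, buckets):
    """Estimate the q-quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound_seconds, cumulative_count) pairs,
    sorted by upper bound, whose last upper bound is +Inf.
    This is an illustrative sketch, not etcd or Prometheus code.
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not buckets or not math.isinf(buckets[-1][0]):
        raise ValueError("last bucket must have an infinite upper bound")

    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded yet

    rank = q * total
    prev_bound, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # The open-ended bucket has no width to interpolate over,
                # so fall back to the highest finite bound.
                return prev_bound
            if count == prev_count:
                return bound  # empty bucket reached only when rank is 0
            # Assume observations are spread evenly inside the bucket.
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound
```

With hypothetical buckets `[(0.001, 10), (0.002, 30), (0.004, 70), (0.008, 95), (float("inf"), 100)]`, calling `histogram_quantile_estimate(0.5, buckets)` gives roughly `0.003`, that is, a median of about 3 ms. Note that the values are in seconds, matching the renamed `*_seconds` metrics, so any alert thresholds written for the old millisecond or microsecond metrics would need to be rescaled.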