Merge pull request #7622 from heyitsanthony/faq-disk-leader

Documentation: add disk latency leader loss question to FAQ
Xiang Li 8 years ago
parent
commit
36735d52a4
1 changed file with 4 additions and 0 deletions

+ 4 - 0
Documentation/faq.md

@@ -78,6 +78,10 @@ On the other hand, if the downed member is removed from cluster membership first
 
 etcd sets `strict-reconfig-check` in order to reject reconfiguration requests that would cause quorum loss. Abandoning quorum is really risky (especially when the cluster is already unhealthy). Although it may be tempting to disable quorum checking if there's quorum loss to add a new member, this could lead to full fledged cluster inconsistency. For many applications, this will make the problem even worse ("disk geometry corruption" being a candidate for most terrifying).
 
+### Why does etcd lose its leader from disk latency spikes?
+
+This is intentional; disk latency is part of leader liveness. Suppose the cluster leader takes a minute to fsync a raft log update to disk, but the etcd cluster has a one second election timeout. Even though the leader can process network messages within the election interval (e.g., send heartbeats), it's effectively unavailable because it can't commit any new proposals; it's waiting on the slow disk. If the cluster frequently loses its leader due to disk latencies, try [tuning][tuning] the disk settings or etcd time parameters.
+
 ### Performance
 
 #### How should I benchmark etcd?