Merge pull request #1883 from xiang90/member_migration

doc: add doc for member migration
Xiang Li 11 years ago
parent
commit
091cc237e3
2 changed files with 19 additions and 2 deletions
  1. Documentation/0.5/admin_guide.md (+13, -0)
  2. Documentation/0.5/runtime-configuration.md (+6, -2)

+ 13 - 0
Documentation/0.5/admin_guide.md

@@ -32,6 +32,19 @@ The data directory has two sub-directories in it:
 If you are spinning up multiple clusters for testing it is recommended that you specify a unique initial-cluster-token for the different clusters.
 This can protect you from cluster corruption in case of mis-configuration because two members started with different cluster tokens will refuse members from each other.
 
+### Member Migration
+
+When a machine is scheduled for maintenance or retirement, you might want to migrate an etcd member to another machine without losing its data or changing its member ID.
+
+The data directory contains all the data to recover a member to its point-in-time state. To migrate a member:
+
+* Stop the member process
+* Copy the data directory of the now-idle member to the new machine
+* Update the peer URLs for that member to reflect the new machine, using the [member API][change peer url]
+* Start etcd on the new machine, using the same configuration and the copy of the data directory
+
+[change peer url]: https://github.com/coreos/etcd/blob/master/Documentation/0.5/other_apis.md#change-the-peer-urls-of-a-member 
+
 ### Disaster Recovery
 
 etcd is designed to be resilient to machine failures. An etcd cluster can automatically recover from any number of temporary failures (for example, machine reboots), and a cluster of N members can tolerate up to _(N-1)/2_ permanent failures, rounded down (where a member can no longer access the cluster, due to hardware failure or disk corruption). However, in extreme circumstances, a cluster might permanently lose enough members such that quorum is irrevocably lost. For example, if a three-node cluster suffered two simultaneous and unrecoverable machine failures, it would normally be impossible for the cluster to restore quorum and continue functioning.
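
Reviewer sketch for the Member Migration steps added in this hunk: stopping the process, copying the data directory, and restarting are operational steps, so the only part that needs an API call is the peer URL update. The Python below builds that request without sending it. The endpoint path (`/v2/members/<id>`), the `peerURLs` JSON field, and the example addresses and member ID are assumptions based on the linked member API document, not something this diff states, so check them against the etcd version in use.

```python
import json
import urllib.request


def peer_url_update_request(client_url, member_id, new_peer_urls):
    """Build (but do not send) a request that changes a member's peer URLs.

    Assumption: the member API accepts PUT /v2/members/<id> with a JSON
    body of the form {"peerURLs": [...]}. Verify this against the member
    API document for your etcd version before relying on it.
    """
    body = json.dumps({"peerURLs": list(new_peer_urls)}).encode("utf-8")
    return urllib.request.Request(
        url=f"{client_url.rstrip('/')}/v2/members/{member_id}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # Hypothetical values: a client URL of a member that is still running,
    # the ID of the member being migrated, and the new machine's peer URL.
    req = peer_url_update_request(
        "http://10.0.0.10:2379", "272e204152", ["http://10.0.0.11:2380"]
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # urllib.request.urlopen(req) would send it to a live cluster.
```

The request goes to a member that is still running, after the migrating member has been stopped and its data directory copied, and before etcd is started on the new machine.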

+ 6 - 2
Documentation/0.5/runtime-configuration.md

@@ -6,12 +6,16 @@ etcd comes with support for incremental runtime reconfiguration, which allows us
 
 Let us walk through the four use cases for re-configuring a cluster: replacing a member, increasing or decreasing cluster size, and restarting a cluster from a majority failure.
 
-### Replace a Member
+### Replace a Non-recoverable Member
 
-The most common use case of cluster reconfiguration is to replace a member because of a permanent failure of the existing member: for example, hardware failure, loss of network address, or data directory corruption.
+The most common use case of cluster reconfiguration is to replace a member because of a permanent failure of the existing member: for example, hardware failure or data directory corruption.
 It is important to replace failed members as soon as the failure is detected.
 If etcd falls below a simple majority of members it can no longer accept writes: e.g. in a 3 member cluster the loss of two members will cause writes to fail and the cluster to stop operating.
 
+If you want to migrate a running member to another machine, please refer to the [member migration section][member migration].
+
+[member migration]: https://github.com/coreos/etcd/blob/master/Documentation/0.5/admin_guide.md#member-migration
+
 ### Increase Cluster Size
 
 To make your cluster more resilient to machine failure you can increase the size of the cluster.
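
Reviewer sketch of the simple-majority rule this file relies on when it discusses failed members and cluster size: a cluster of N members needs a majority (more than half) to accept writes, so the number of permanent failures it can survive is N minus that majority. The function names are illustrative only and are not part of etcd.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a simple majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority still remains."""
    return members - quorum(members)


if __name__ == "__main__":
    for size in (1, 3, 4, 5):
        print(size, quorum(size), tolerated_failures(size))
```

Under this rule, growing a cluster from 3 to 4 members does not raise the number of failures it tolerates, while growing from 3 to 5 does, which is one reason odd cluster sizes are usually preferred.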