Browse Source

Merge pull request #3889 from gyuho/raft_doc.go_20151118

raft: doc, debugging instruction on MessageType
Gyu-Ho Lee 10 years ago
parent
commit
53d6aede82
1 changed files with 98 additions and 1 deletions
  1. 98 1
      raft/doc.go

+ 98 - 1
raft/doc.go

@@ -13,7 +13,8 @@
 // limitations under the License.
 
 /*
-Package raft provides an implementation of the raft consensus algorithm.
+Package raft sends and receives messages in the Protocol Buffer format
+defined in the raftpb package.
 
 Raft is a protocol with which a cluster of nodes can maintain a replicated state machine.
 The state machine is kept in sync through the use of a replicated log.
@@ -184,5 +185,101 @@ cannot be removed any more since the cluster cannot make progress.
 For this reason it is highly recommended to use three or more nodes in
 every cluster.
 
+MessageType
+
+Package raft sends and receives message in Protocol Buffer format (defined
+in raftpb package). Each state (follower, candidate, leader) implements its
+own 'step' method ('stepFollower', 'stepCandidate', 'stepLeader') when
+advancing with the given raftpb.Message. Each step is determined by its
+raftpb.MessageType. Note that every step is checked by one common method
+'Step' that safety-checks the terms of node and incoming message to prevent
+stale log entries:
+
+	'MsgHup' is used for election. If a node is a follower or candidate, the
+	'tick' function in 'raft' struct is set as 'tickElection'. If a follower or
+	candidate has not received any heartbeat before the election timeout, it
+	passes 'MsgHup' to its Step method and becomes (or remains) a candidate to
+	start a new election.
+
+	'MsgBeat' is an internal type that signals leaders to send a heartbeat of
+	the 'MsgHeartbeat' type. If a node is a leader, the 'tick' function in
+	the 'raft' struct is set as 'tickHeartbeat', and sends periodic heartbeat
+	messages of the 'MsgBeat' type to its followers.
+
+	'MsgProp' proposes to append data to its log entries. This is a special
+	type to redirect proposals to leader. Therefore, send method overwrites
+	raftpb.Message's term with its HardState's term to avoid attaching its
+	local term to 'MsgProp'. When 'MsgProp' is passed to the leader's 'Step'
+	method, the leader first calls the 'appendEntry' method to append entries
+	to its log, and then calls 'bcastAppend' method to send those entries to
+	its peers. When passed to candidate, 'MsgProp' is dropped. When passed to
+	follower, 'MsgProp' is stored in follower's mailbox(msgs) by the send
+	method. It is stored with sender's ID and later forwarded to leader by
+	rafthttp package.
+
+	'MsgApp' contains log entries to replicate. A leader calls bcastAppend,
+	which calls sendAppend, which sends soon-to-be-replicated logs in 'MsgApp'
+	type. When 'MsgApp' is passed to candidate's Step method, candidate reverts
+	back to follower, because it indicates that there is a valid leader sending
+	'MsgApp' messages. Candidate and follower respond to this message in
+	'MsgAppResp' type.
+
+	'MsgAppResp' is response to log replication request('MsgApp'). When
+	'MsgApp' is passed to candidate or follower's Step method, it responds by
+	calling 'handleAppendEntries' method, which sends 'MsgAppResp' to raft
+	mailbox.
+
+	'MsgVote' requests votes for election. When a node is a follower or
+	candidate and 'MsgHup' is passed to its Step method, then the node calls
+	'campaign' method to campaign itself to become a leader. Once 'campaign'
+	method is called, the node becomes candidate and sends 'MsgVote' to peers
+	in cluster to request votes. When passed to leader or candidate's Step
+	method and the message's Term is lower than leader's or candidate's,
+	'MsgVote' will be rejected ('MsgVoteResp' is returned with Reject true).
+	If leader or candidate receives 'MsgVote' with higher term, it will revert
+	back to follower. When 'MsgVote' is passed to follower, it votes for the
+	sender only when sender's last term is greater than MsgVote's term or
+	sender's last term is equal to MsgVote's term but sender's last committed
+	index is greater than or equal to follower's.
+
+	'MsgVoteResp' contains responses from voting request. When 'MsgVoteResp' is
+	passed to candidate, the candidate calculates how many votes it has won. If
+	it's more than majority (quorum), it becomes leader and calls 'bcastAppend'.
+	If candidate receives majority of votes of denials, it reverts back to
+	follower.
+
+	'MsgSnap' requests to install a snapshot message. When a node has just
+	become a leader or the leader receives 'MsgProp' message, it calls
+	'bcastAppend' method, which then calls 'sendAppend' method to each
+	follower. In 'sendAppend', if a leader fails to get term or entries,
+	the leader requests snapshot by sending 'MsgSnap' type message.
+
+	'MsgSnapStatus' tells the result of snapshot install message. When a
+	follower rejected 'MsgSnap', it indicates the snapshot request with
+	'MsgSnap' had failed from network issues which causes the network layer
+	to fail to send out snapshots to its followers. Then leader considers
+	follower's progress as probe. When 'MsgSnap' were not rejected, it
+	indicates that the snapshot succeeded and the leader sets follower's
+	progress to probe and resumes its log replication.
+
+	'MsgHeartbeat' sends heartbeat from leader. When 'MsgHeartbeat' is passed
+	to candidate and message's term is higher than candidate's, the candidate
+	reverts back to follower and updates its committed index from the one in
+	this heartbeat. And it sends the message to its mailbox. When
+	'MsgHeartbeat' is passed to follower's Step method and message's term is
+	higher than follower's, the follower updates its leaderID with the ID
+	from the message.
+
+	'MsgHeartbeatResp' is a response to 'MsgHeartbeat'. When 'MsgHeartbeatResp'
+	is passed to leader's Step method, the leader knows which follower
+	responded. And only when the leader's last committed index is greater than
+	follower's Match index, the leader runs 'sendAppend` method.
+
+	'MsgUnreachable' tells that request(message) wasn't delivered. When
+	'MsgUnreachable' is passed to leader's Step method, the leader discovers
+	that the follower that sent this 'MsgUnreachable' is not reachable, often
+	indicating 'MsgApp' is lost. When follower's progress state is replicate,
+	the leader sets it back to probe.
+
 */
 package raft