Add 'Raft in Practice' to README.

pull/820/head
Ben Johnson 2013-10-25 10:47:10 -06:00
parent 90f816546e
commit 1e79a0d5a6
1 changed files with 32 additions and 0 deletions


@@ -83,6 +83,38 @@ By ensuring that this log is replicated identically between all the nodes in the
Replicating the log under normal conditions is done by sending an `AppendEntries` RPC from the leader to each of the other servers in the cluster (called Peers).
Each peer will append the entries from the leader through a 2-phase commit process which ensures that a majority of servers in the cluster have the entries written to their logs.
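The majority requirement above can be sketched as a simple quorum check. This is a hedged illustration of the idea, not go-raft's actual code; the function name `committed` is an assumption:

```go
package main

import "fmt"

// committed reports whether a log entry that has been acknowledged by
// ackCount servers (including the leader itself) has reached a majority
// of a cluster of clusterSize nodes. Illustrative sketch only, not
// go-raft's actual implementation.
func committed(ackCount, clusterSize int) bool {
	return ackCount > clusterSize/2
}

func main() {
	fmt.Println(committed(2, 3)) // 2 of 3 is a majority: true
	fmt.Println(committed(2, 5)) // 2 of 5 is not: false
}
```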
## Raft in Practice
### Optimal Cluster Size
The primary consideration when choosing the node count in your Raft cluster is the number of nodes that can simultaneously fail.
Because Raft requires a majority of nodes to be available to make progress, the number of node failures the cluster can tolerate is `(n - 1) / 2` (integer division, i.e. rounded down).
This means that a 3-node cluster can tolerate 1 node failure.
If 2 nodes fail, the cluster cannot commit entries or elect a new leader, so progress stops.
A 5-node cluster can tolerate 2 node failures. A 9-node cluster can tolerate 4 node failures.
It is unlikely that 4 nodes will fail simultaneously, so clusters larger than 9 nodes are not common.
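The tolerance figures above follow directly from the formula. A minimal sketch (the function name `faultTolerance` is an assumption for illustration):

```go
package main

import "fmt"

// faultTolerance returns how many simultaneous node failures a Raft
// cluster of n nodes can survive while still retaining a majority:
// (n - 1) / 2 with integer division.
func faultTolerance(n int) int {
	return (n - 1) / 2
}

func main() {
	for _, n := range []int{3, 5, 9} {
		fmt.Printf("%d-node cluster tolerates %d failure(s)\n", n, faultTolerance(n))
	}
	// 3-node cluster tolerates 1 failure(s)
	// 5-node cluster tolerates 2 failure(s)
	// 9-node cluster tolerates 4 failure(s)
}
```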
Another consideration is performance.
The leader must replicate log entries to each follower node, so CPU and network resources can quickly become a bottleneck in a large cluster under load.
### Scaling Raft
Once you grow beyond the practical maximum size of your cluster, there are a few options for scaling Raft:
1. *Core nodes with dumb replication.*
This option requires you to maintain a small cluster (e.g. 5 nodes) that is involved in the Raft process and then replicate only committed log entries to the remaining nodes in the cluster.
This works well if your system can tolerate stale reads.
2. *Sharding.*
This option requires that you segment your data into different clusters.
This option works well if you need very strong consistency and therefore need to read and write heavily from the leader.
If you have a very large cluster to replicate to using option 1, you may want to perform hierarchical replication so that nodes can better share the load.
## History
Ben Johnson started this library for use in his behavioral analytics database called [Sky](https://github.com/skydb/sky).