Xiang Li 1b88b71aa2 Merge pull request #114 from coreos/upstream
fix snapshot
2013-09-21 22:24:22 -07:00
protobuf add connectionstring 2013-08-15 16:35:01 -07:00
.gitignore Add gocov support. 2013-05-02 22:11:50 -06:00
.travis.yml Add Travis CI tests for go-raft. 2013-08-02 11:26:12 +02:00
LICENSE LICENSE: initial commit 2013-06-02 20:43:59 -07:00
Makefile fix(misc): update to new URL 2013-08-09 22:19:56 -07:00
README.md fix(misc): update to new URL 2013-08-09 22:19:56 -07:00
append_entries_request.go expose protobuf encoding/decoing API 2013-09-18 21:07:12 -04:00
append_entries_request_test.go expose protobuf encoding/decoing API 2013-09-18 21:07:12 -04:00
append_entries_response.go expose protobuf encoding/decoing API 2013-09-18 21:07:12 -04:00
append_entries_response_test.go expose protobuf encoding/decoing API 2013-09-18 21:07:12 -04:00
command.go [Fix #74] Refactor to use binary log and binary RPCs. 2013-07-17 07:45:53 -06:00
config.go add connectionstring 2013-08-15 16:35:01 -07:00
debug.go change debug prefix from raft to [raft] 2013-07-28 20:19:02 -07:00
http_transporter.go expose protobuf encoding/decoing API 2013-09-18 21:07:12 -04:00
http_transporter_test.go sleep time should not be hardcoded 2013-08-02 03:03:55 -07:00
join_command.go add connectionstring 2013-08-15 16:35:01 -07:00
leave_command.go fix spelling issue 2013-07-26 16:47:51 -07:00
log.go fix snapshot related issue 2013-09-18 00:19:46 -04:00
log_entry.go chore(*.go): fix imports for protobuf 2013-08-09 21:47:51 -07:00
log_test.go move buf to log struct 2013-08-01 17:58:03 -07:00
nop_command.go with assertion problem at server.go L567 2013-07-25 14:26:27 -07:00
peer.go check sendSRR response 2013-09-17 19:18:50 -04:00
request_vote_request.go expose protobuf encoding/decoing API 2013-09-18 21:07:12 -04:00
request_vote_response.go expose protobuf encoding/decoing API 2013-09-18 21:07:12 -04:00
server.go Merge https://github.com/goraft/raft into upstream 2013-09-22 01:16:46 -04:00
server_test.go fix snapshot related issue 2013-09-18 00:19:46 -04:00
snapshot.go Merge pull request #110 from coreos/conf 2013-08-19 08:48:33 -07:00
snapshot_recovery_request.go fix takesnapshot 2013-09-22 01:15:54 -04:00
snapshot_recovery_response.go fix takesnapshot 2013-09-22 01:15:54 -04:00
snapshot_request.go fix takesnapshot 2013-09-22 01:15:54 -04:00
snapshot_response.go fix takesnapshot 2013-09-22 01:15:54 -04:00
sort.go [Fix #47] Clean up external interface. 2013-07-05 22:49:47 -06:00
statemachine.go add comments and gofmt 2013-06-24 09:52:51 -07:00
test.go fix snapshot related issue 2013-09-18 00:19:46 -04:00
time.go Fix timer cleanup. 2013-05-26 18:02:31 -06:00
transporter.go add snapshotting state 2013-07-16 17:40:19 -07:00
z_test.go gofmt 2013-05-26 20:06:08 -06:00

README.md

go-raft

Overview

This is a Go implementation of the Raft distributed consensus protocol. Raft is a protocol by which a cluster of nodes can maintain a replicated state machine. The state machine is kept in sync through the use of a replicated log.

For more details on Raft, you can read In Search of an Understandable Consensus Algorithm by Diego Ongaro and John Ousterhout.

Project Status

This library is feature complete but should be considered experimental until it has seen more usage. If you have any questions about using go-raft in your project, please file an issue. There is an active community of developers who can help. go-raft is released under the MIT license.

Features

  • Leader election
  • Log replication
  • Configuration changes
  • Log compaction
  • Unit tests
  • Fast Protobuf log encoding
  • HTTP transport

Projects

These projects are built on go-raft:

  • coreos/etcd - A highly-available key value store for shared configuration and service discovery
  • benbjohnson/raftd - A reference implementation for using the go-raft library for distributed consensus.

If you have a project that you're using go-raft in, please add it to this README so others can see implementation examples.

The Raft Protocol

This section provides a high-level summary of the Raft protocol. For a more detailed explanation of the failover process and election terms, please see the full paper describing the protocol: In Search of an Understandable Consensus Algorithm.

Overview

Maintaining state in a single process on a single server is easy. Your process is a single point of authority so there are no conflicts when reading and writing state. Even multi-threaded processes can rely on locks or coroutines to serialize access to the data.

However, in a distributed system there is no single point of authority. Servers can crash, the network between two machines can become unavailable, or any number of other problems can occur.

A distributed consensus protocol is used to maintain a consistent state across multiple servers in a cluster. Many distributed systems are built upon the Paxos protocol, but Paxos can be difficult to understand and there are many gaps between Paxos and real-world implementations.

An alternative is the Raft distributed consensus protocol by Diego Ongaro and John Ousterhout. Raft is a protocol built with understandability as a primary tenet, and it centers around two things:

  1. Leader Election
  2. Replicated Log

With these two constructs, you can build a system that can maintain state across multiple servers -- even in the event of multiple failures.

Leader Election

The Raft protocol effectively works as a master-slave system: state changes are written to a single server in the cluster and then distributed to the rest of the servers. This simplifies the protocol since there is only one data authority and no conflicts need to be resolved.

Raft ensures that there is only one leader at a time. It does this by holding elections among the nodes in the cluster and requiring that a node receive a majority of the votes in order to become leader. For example, if you have 3 nodes in your cluster, then a single node would need 2 votes in order to become the leader. For a 5-node cluster, a server would need 3 votes to become leader.
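
The majority rule can be written down in a few lines of Go. This is only an illustrative sketch: the names quorum and wonElection are hypothetical and are not part of the go-raft API.

```go
package main

import "fmt"

// quorum returns the smallest number of votes that forms a strict
// majority in a cluster of n nodes. (Hypothetical helper, not go-raft API.)
func quorum(n int) int {
	return n/2 + 1
}

// wonElection reports whether a candidate holding the given number of
// votes (including its own) has a majority of the cluster.
func wonElection(votes, clusterSize int) bool {
	return votes >= quorum(clusterSize)
}

func main() {
	for _, n := range []int{3, 5} {
		fmt.Printf("%d-node cluster: %d votes needed\n", n, quorum(n))
	}
}
```

Because any two majorities of the same cluster share at least one node, two candidates cannot both collect a quorum from the same set of votes.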

Replicated Log

To maintain state, a log of commands is kept. Each command makes a change to the state of the server, and each command is deterministic. By ensuring that this log is replicated identically across all the nodes in the cluster, we can reproduce the state at any point in the log by running each command sequentially.
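
To make the replay idea concrete, here is a minimal sketch that rebuilds a key/value state by applying log entries in order. The command type and the replay function are assumptions made for illustration; they do not mirror go-raft's own Command interface.

```go
package main

import "fmt"

// command is a hypothetical deterministic state change: set Key to Value.
type command struct {
	Key   string
	Value string
}

// replay applies the first n entries of the log, in order, to an empty
// state and returns the result. The same log prefix always yields the
// same state because every command is deterministic.
func replay(log []command, n int) map[string]string {
	if n < 0 {
		n = 0
	}
	if n > len(log) {
		n = len(log)
	}
	state := make(map[string]string)
	for _, c := range log[:n] {
		state[c.Key] = c.Value
	}
	return state
}

func main() {
	log := []command{{"x", "1"}, {"y", "2"}, {"x", "3"}}

	full := replay(log, len(log))
	fmt.Println(full["x"], full["y"])

	// Replaying only a prefix reproduces the state at that point in the log.
	fmt.Println(replay(log, 2)["x"])
}
```

Two servers that hold the same log and replay it from the start will arrive at the same state, which is why the protocol only needs to agree on the log itself.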

Replicating the log under normal conditions is done by sending an AppendEntries RPC from the leader to each of the other servers in the cluster (called Peers). Each peer appends the entries from the leader through a 2-phase commit process, which ensures that a majority of servers in the cluster have the entries written to their logs.
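
One way to picture the majority requirement is to track, for every server, the index of the last entry known to be written to its log, and to treat an entry as committed once a majority holds it. The sketch below assumes exactly that bookkeeping. The commitIndex function is hypothetical, it omits details such as the term check described in the paper, and it is not how go-raft necessarily implements the step.

```go
package main

import (
	"fmt"
	"sort"
)

// commitIndex returns the highest log index that is stored on a majority
// of servers. matchIndex holds one value per server, leader included:
// the index of the last entry known to be written to that server's log.
// (Hypothetical helper for illustration, not go-raft API.)
func commitIndex(matchIndex []int) int {
	if len(matchIndex) == 0 {
		return 0
	}
	// Copy before sorting so the caller's slice is left untouched.
	sorted := append([]int(nil), matchIndex...)
	sort.Sort(sort.Reverse(sort.IntSlice(sorted)))

	// After a descending sort, the value at position quorum-1 is held
	// by at least quorum servers.
	quorum := len(sorted)/2 + 1
	return sorted[quorum-1]
}

func main() {
	// Five servers: two hold entries up to 7, one up to 4, and two lag behind.
	fmt.Println(commitIndex([]int{7, 7, 4, 3, 2}))
}
```

In this example the result is 4: entries up to index 4 exist on three of the five servers, while entries 5 through 7 exist on only two and therefore cannot yet be considered committed.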

History

Ben Johnson started this library for use in his behavioral analytics database called Sky. He put it under the MIT license in the hopes that it would be useful for other projects too.