This pull request adds recovery to messaging.Topic when it is opened. If
a partial message is found, the file is truncated at that point and
writing resumes from there. Partial messages can occur when a server shuts
down ungracefully and leaves half-written messages at the end of a segment.
When a data node started up, the broker URLs were not set before
they were used. The call to client.Open() in
turn triggers the raft streamer and heartbeat, which try to connect
to the broker. If those started before the subsequent client.SetURLs()
call, you would see the following error in the logs at startup:
[messaging] 2015/04/01 11:59:22 reconnecting to broker: url={ <nil> /messaging/messages index=2&streaming=true&topicID=0 }, err=Get /messaging/messages?index=2&streaming=true&topicID=0: unsupported protocol scheme ""
Fixing this race uncovered another bug where the join URLs would be
cleared the first time the broker was started. In this case, the
join URLs should be left alone since they were set properly with SetURLs.
Fixes #2152
This is a prerequisite for #1934. When running separate
broker and data nodes, you currently need to know what role
a host is performing. This complicates cluster setup in
that you must configure separate broker URLs and data node
URLs.
This change allows a broker-only node to redirect data node endpoints
to a valid data node, and a data-only node to redirect broker
endpoints to a valid broker.
Each data node now sends its URLs to the broker via the broker
heartbeat. The URLs are tracked on the broker to support simpler
cluster setup as well as distributed queries.
This commit adds the "influxd restore" command to the CLI. It restores
a snapshot produced by "influxd backup" to a config location and
bootstraps the broker and raft directories from the state of the
snapshot.
This commit fixes the broker recovery so that it determines the last index
from the various topic logs instead of persisting the snapshot on every
message that comes in.
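The recovery rule amounts to taking the highest index found across all
topic logs. A small sketch, assuming a hypothetical in-memory topicLog that
lists its message indexes in append order (the real topics read these from
their segment files):

```go
package main

// topicLog is a hypothetical view of one topic's log.
type topicLog struct {
	ID      uint64
	Indexes []uint64 // message indexes in append order
}

// lastIndex returns the index of the final message, or 0 for an empty log.
func (t topicLog) lastIndex() uint64 {
	if len(t.Indexes) == 0 {
		return 0
	}
	return t.Indexes[len(t.Indexes)-1]
}

// recoverLastIndex returns the highest last index across all topic logs,
// which is where the broker would resume after a restart.
func recoverLastIndex(topics []topicLog) uint64 {
	var max uint64
	for _, t := range topics {
		if idx := t.lastIndex(); idx > max {
			max = idx
		}
	}
	return max
}
```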
* Update the influxdb broker so that it does not need a server and works on non-data node servers
* Update messaging broker to keep track of connect URLs for replicas
This commit refactors the raft package to more loosely couple the individual parts. The clock is now broken out into
an interface that signals individual actions in the log. The transport has been mocked to allow more control over
the log tests. The handler's log has been mocked to separate its testing from the log itself.
Panic on travis:
=== RUN TestBroker_Join
[raft] 2015/01/30 06:46:16 log open: created at /tmp/influxdb-messaging-119432971/raft, with ID 0, term 0, last applied index of 0
[raft] 2015/01/30 06:46:16 log state change: stopped => leader
[raft] 2015/01/30 06:46:16 log initialize: promoted to 'leader' with cluster ID 3337066551442961397, log ID 1, term 1
[raft] 2015/01/30 06:46:16 log open: created at /tmp/influxdb-messaging-071763182/raft, with ID 0, term 0, last applied index of 0
[raft] 2015/01/30 06:46:17 log state change: stopped => follower
[raft] 2015/01/30 06:46:17 log join: entered 'follower' state for cluster at http://127.0.0.1:33257 with log ID 2
[raft] 2015/01/30 06:46:17 log state change: follower => follower
[raft] 2015/01/30 06:46:17 log state change: follower => stopped
panic: write to: add stream writer: write to: replica unavailable
goroutine 410 [running]:
github.com/influxdb/influxdb/messaging_test.func·003()
/home/travis/gopath/src/github.com/influxdb/influxdb/messaging/broker_test.go:260 +0x140
created by github.com/influxdb/influxdb/messaging_test.(*Broker).MustReadAll
/home/travis/gopath/src/github.com/influxdb/influxdb/messaging/broker_test.go:262 +0x14e