Drop database did not close any open shard files or close
any topic readers/heartbeats. In the tests, we create and drop new
databases during each test run, so these open files and connections
slowed things down and consumed a lot of RAM as the tests progressed.
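A minimal sketch of the cleanup, assuming a simplified database type.
The names database, closer, and resource are hypothetical and do not
mirror the real server types; the point is only that dropping a
database should release every shard file and topic reader/heartbeat
it holds.

```go
package main

import "fmt"

// closer is anything that holds a file or connection open.
type closer interface{ Close() error }

// database is a hypothetical stand-in for the resources named above.
type database struct {
	shards  []closer // open shard files
	readers []closer // topic readers and heartbeats
}

// close releases every resource and returns the first error seen, if any.
func (db *database) close() error {
	var first error
	for _, group := range [][]closer{db.shards, db.readers} {
		for _, c := range group {
			if err := c.Close(); err != nil && first == nil {
				first = err
			}
		}
	}
	db.shards, db.readers = nil, nil
	return first
}

// resource is a fake closer used only for the demonstration.
type resource struct{ closed bool }

func (r *resource) Close() error { r.closed = true; return nil }

func main() {
	shard, reader := &resource{}, &resource{}
	db := &database{shards: []closer{shard}, readers: []closer{reader}}
	fmt.Println(db.close(), shard.closed, reader.closed)
}
```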
If a data node requests a topic index that is earlier than what is present for
a topic, tombstones allow the broker to know that the data node should
be redirected to another node that has the topic's data already
replicated. If no tombstone exists, then the broker can simply restart
replaying the topic data it has.
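The decision described above can be sketched roughly as follows. The
topic struct, its fields, and the action names are assumptions made
for illustration; the broker's actual types are not shown here.

```go
package main

import "fmt"

// topic is a hypothetical, simplified view of a broker topic.
type topic struct {
	firstIndex uint64 // lowest index the broker still holds
	tombstoned bool   // true if a tombstone exists for the topic
}

type action int

const (
	serve           action = iota // requested index is available locally
	redirect                      // send the data node to a replica
	replayFromStart               // restart replay from the data on hand
)

// decide picks what the broker does for a requested index.
func decide(t topic, requested uint64) action {
	if requested >= t.firstIndex {
		return serve
	}
	if t.tombstoned {
		return redirect
	}
	return replayFromStart
}

func main() {
	t := topic{firstIndex: 100, tombstoned: true}
	fmt.Println(decide(t, 5) == redirect)
}
```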
This commit changes raft so that term changes are made immediately and
term change signals are made afterward. Previously, election timeouts
were invalidated by incoming term changes, which caused an election loop.
A stale term bug was also fixed and http/pprof was added.
This commit changes the binary format of messaging.Message to encode
a 4-byte checksum at the beginning of it. This is used when reading
data back out to verify that it is not corrupt.
Corrupted messages are truncated on recovery so the broker can
restart from the previous message.
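A minimal sketch of this kind of framing, assuming CRC32 (IEEE) and
big-endian byte order. The commit does not name the checksum algorithm
or the byte order, and encode/decode are hypothetical helpers, not the
messaging.Message methods.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// ErrChecksum is returned when the stored checksum does not match the payload.
var ErrChecksum = errors.New("checksum mismatch")

// encode prepends a 4-byte checksum of payload (CRC32 is an assumption).
func encode(payload []byte) []byte {
	buf := make([]byte, 4+len(payload))
	binary.BigEndian.PutUint32(buf[:4], crc32.ChecksumIEEE(payload))
	copy(buf[4:], payload)
	return buf
}

// decode verifies the leading checksum and returns the payload.
func decode(buf []byte) ([]byte, error) {
	if len(buf) < 4 {
		return nil, errors.New("short buffer")
	}
	want := binary.BigEndian.Uint32(buf[:4])
	payload := buf[4:]
	if crc32.ChecksumIEEE(payload) != want {
		return nil, ErrChecksum
	}
	return payload, nil
}

func main() {
	b := encode([]byte("hello"))
	b[len(b)-1] ^= 0xff // simulate on-disk corruption
	_, err := decode(b)
	fmt.Println(err)
}
```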
This pull request adds recovery to the messaging.Topic when opening. If
any partial messages are found, the file is truncated at that point
and writing resumes from there. This can occur when a server is shut
down ungracefully, which can leave half-written messages at the end of segments.
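As a rough illustration of finding the truncation point, the sketch
below assumes each message is stored as a 4-byte big-endian length
followed by the payload. That framing and the validLength helper are
hypothetical; the real segment format may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// validLength returns how many leading bytes of data form complete records.
// Anything past that offset is a partial record and can be truncated.
func validLength(data []byte) int64 {
	var off int64
	for {
		rest := data[off:]
		if len(rest) < 4 {
			return off // missing or partial length header
		}
		n := int64(binary.BigEndian.Uint32(rest[:4]))
		if int64(len(rest)-4) < n {
			return off // payload was only partly written
		}
		off += 4 + n
	}
}

func main() {
	// One complete record ("hi") followed by a half-written one.
	segment := []byte{0, 0, 0, 2, 'h', 'i', 0, 0, 0, 5, 'p', 'a'}
	fmt.Println(validLength(segment)) // prints 6
}
```

The returned offset could then be passed to (*os.File).Truncate before
the segment is reopened for appending.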
When a data node started up, the broker URLs were not set before
they were actually used. The call to client.Open() in
turn triggers the raft streamer and heartbeat which try to connect
to the broker. If those started before the subsequent client.SetURLs()
call, you would see the following error in the logs at startup:
[messaging] 2015/04/01 11:59:22 reconnecting to broker: url={ <nil> /messaging/messages index=2&streaming=true&topicID=0 }, err=Get /messaging/messages?index=2&streaming=true&topicID=0: unsupported protocol scheme ""
Fixing this race uncovered another bug where the join URLs would be
cleared the first time the broker was started. In this case, the
join URLs should be left alone since they were set properly with SetURLs.
Fixes #2152
This is a prerequisite for #1934. When running separate
broker and data nodes, you currently need to know what role
a host is performing. This complicates cluster setup in
that you must configure separate broker URLs and data node
URLs.
This change allows a broker-only node to redirect data node endpoints
to a valid data node and a data-only node to redirect broker
endpoints to a valid broker.
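A minimal sketch of such a redirect using net/http. The redirectTo and
probe helpers, the peer host name, and the example path are
assumptions for illustration; how a node chooses a valid peer is not
shown.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// redirectTo returns a handler that sends every request to peer,
// keeping the original path and query string.
func redirectTo(peer string) (http.Handler, error) {
	base, err := url.Parse(peer)
	if err != nil {
		return nil, err
	}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		u := *base
		u.Path = r.URL.Path
		u.RawQuery = r.URL.RawQuery
		// 307 asks the client to repeat the request with the same method.
		http.Redirect(w, r, u.String(), http.StatusTemporaryRedirect)
	}), nil
}

// probe sends a GET for target to h and reports status and Location.
func probe(h http.Handler, target string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, target, nil))
	return rec.Code, rec.Header().Get("Location")
}

func main() {
	// A broker-only node receiving a request meant for a data node.
	h, err := redirectTo("http://data1.example.com:8086")
	if err != nil {
		panic(err)
	}
	fmt.Println(probe(h, "/query?q=SHOW+DATABASES"))
}
```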
This sends data node URLs via the broker heartbeat from each data
node. The URLs are tracked on the broker to support simpler
cluster setup as well as distributed queries.