This change means that lock control can use a deferred call, so there
is no chance the RLock will be left locked at function exit.
Previously this code was more complex because it managed locks manually:
the RLock had to be released to allow the "drop series" broadcast
message to go through.
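As an illustration of the difference, here is a minimal sketch of manual
versus deferred read-lock release. The seriesStore type and both method
names are hypothetical and are not taken from the actual code:

```go
package main

import "sync"

// seriesStore is a hypothetical stand-in for the state guarded by the lock.
type seriesStore struct {
	mu     sync.RWMutex
	series map[string]int
}

// lookupManual shows the older style: every return path unlocks by hand.
func (s *seriesStore) lookupManual(name string) (int, bool) {
	s.mu.RLock()
	id, ok := s.series[name]
	if !ok {
		s.mu.RUnlock() // easy to forget on an early return
		return 0, false
	}
	s.mu.RUnlock()
	return id, true
}

// lookupDeferred releases the read lock on every exit path via defer.
func (s *seriesStore) lookupDeferred(name string) (int, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	id, ok := s.series[name]
	return id, ok
}
```

The deferred form only works when nothing inside the function needs the
lock released midway, which is what this change makes possible.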
With this change a data node can stream the requested shard to the
client. An error is returned if the shard does not exist or the
shard is not local to that node.
A data node can hit this endpoint to request data for a given shard if
the data no longer resides on the broker.
Drop database did not close any open shard files or close
any topic readers/heartbeats. In the tests, we create and drop new
databases during each test run, so these open files and connections
slowed things down and consumed a lot of RAM as the tests progressed.
Adds a Balancer interface to allow RemoteMappers to send data node
requests to multiple nodes. It also allows failed requests to mark
the data node as offline, using exponential backoff with a 5 minute
maximum wait time.
Fixes #2242
This commit changes raft so that term changes are made immediately and
term change signals are made afterward. Previously, election timeouts
were invalidated by incoming term changes which caused an election loop.
A stale term bug was also fixed, and http/pprof was added.
2015/04/08 22:27:01 no broker or server configured to handle messaging endpoints
2015/04/08 22:27:02 join: failed to connect data node: http://box296:9012: unable to join
2015/04/08 22:27:02 join: failed to connect data node to any specified server
There is a race between the data node heartbeater and the join operation
when joining a data-only node to a broker and another data-only node.
If the heartbeater fires before the join attempt, it's possible for the
booting data node to be selected as the first data node for redirection
by the broker. The join attempt would request a data node endpoint on the
broker ("/data_nodes"), but since the broker cannot handle it, it would
redirect to a valid broker. During this race, the broker would redirect
the request back to the same server. If this happens, the data node would
get stuck and not be able to join because it's still booting.
To work around this, the redirect is randomized, and the join call will not
attempt to call itself and instead re-requests the original URL. A better fix
might be to not start the heartbeater until after the data node has joined or
initialized.
3 was fairly arbitrary and would cause errors such as:
2015/04/08 14:01:12 join: failed to connect data node: {http <nil> influxdb.local:8191 }: unable to join
2015/04/08 14:01:12 join: failed to connect data node to any specified server
in the tests. This can happen when the nodes are slow to start up. The limit is set
arbitrarily higher to avoid this error but still give up if a node can't connect
after a minute.