Closing the broker before the raft log can trigger this panic since the
raft log depends on the broker via the FSM.
panic: apply: broker apply: broker already closed
goroutine 29164 [running]:
github.com/influxdb/influxdb/raft.(*Log).applier(0xc20833b040, 0xc20802bd40)
/Users/jason/go/src/github.com/influxdb/influxdb/raft/log.go:1386 +0x278
created by github.com/influxdb/influxdb/raft.func·002
/Users/jason/go/src/github.com/influxdb/influxdb/raft/log.go:389 +0x764
By setting it, data node requests can be served by the http handler
before the data node is actually ready.
Possible fix for:
2015/04/14 11:33:54 http: panic serving 10.0.1.8:62661: runtime error: invalid memory address or nil pointer dereference
goroutine 11467 [running]:
net/http.func·011()
/usr/local/go/src/net/http/server.go:1130 +0xcc
github.com/influxdb/influxdb.(*Server).broadcast(0xc20805cc00, 0xc208220000, 0x5d25e0, 0xc208869e80, 0x0, 0x0, 0x0)
/Users/jason/go/src/github.com/influxdb/influxdb/server.go:568 +0x227
github.com/influxdb/influxdb.(*Server).CreateDataNode(0xc20805cc00, 0xc2081c6e70, 0x0, 0x0)
/Users/jason/go/src/github.com/influxdb/influxdb/server.go:859 +0xe6
github.com/influxdb/influxdb/httpd.(*Handler).serveCreateDataNode(0xc20842ea00, 0x19378c0, 0xc2082207e0, 0xc2083191e0)
This commit fixes an issue where the second node joins but the first node
cannot commit because it doesn't have the HTTP endpoint running yet. This
is a side effect of streaming raft since we don't synchronize the quorum
set with the heartbeats. We should cache the config per term in the future.
Refactored query engine to have different processing pipeline for raw queries. This enables queries that have a large offset to not keep everything in memory. It also makes it so that queries against raw data that have a limit will only process up to that limit and then bail out.
Raw data queries will only read up to a certain point in the map phase before yielding to the engine for further processing.
Fixes #2029 and fixes #2030
This commit changes raft so that term changes are made immediately and
term change signals are made afterward. Previously, election timeouts
were invalidated by incoming term changes which caused an election loop.
Stale term was also fixed and http/pprof was added too.
2015/04/08 22:27:01 no broker or server configured to handle messaging endpoints
2015/04/08 22:27:02 join: failed to connect data node: http://box296:9012: unable to join
2015/04/08 22:27:02 join: failed to connect data node to any specified server
When joining a data only node to a broker and another data only node, there is
a race between the data node heartbeater and the join operation. If the heartbeater
fires before the join attempt, it's possible for the booting data node
to be selected as the first data node for redirection by the broker.
The join attempt would request a data node endpoint on the broker "/data_nodes"
but since the broker cannot handle it, it would redirect to a valid broker.
During this race, the broker would redirect the request back to the same server. If
this happens, the data node would get stuck and not be able to join because it's
still booting.
To work around this, the redirect is randomized and the join calls will not attempt
to call itself and instead re-request the original URL. A better fix might be to
not start the heartbeater until after the data node has joined or initialized.
If the node is running a broker and a data node, always have the
data node client connect to the local broker since it will already
be initialized or joined.
Removing this option causes issues when deploying influxd
via configuration management. We can now define the same
set of join URLs in the config file across nodes.
This also ensures that the `-flag` option overrides the
config file setting if passed.
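A rough sketch of the precedence rule, assuming the flag is named -join and
carries a comma-separated list. resolveJoinURLs is a hypothetical helper, not
the real influxd code:

```go
package main

import (
	"flag"
	"fmt"
)

// resolveJoinURLs returns the join URLs to use. A non-empty -join flag in
// args wins; otherwise the value from the config file is kept.
func resolveJoinURLs(configValue string, args []string) (string, error) {
	fs := flag.NewFlagSet("run", flag.ContinueOnError)
	join := fs.String("join", "", "comma-separated join URLs (assumed format)")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	if *join != "" {
		return *join, nil
	}
	return configValue, nil
}

func main() {
	fromConfig := "http://a:8086,http://b:8086"

	v, _ := resolveJoinURLs(fromConfig, []string{"-join", "http://c:8086"})
	fmt.Println(v) // flag value

	v, _ = resolveJoinURLs(fromConfig, nil)
	fmt.Println(v) // config value
}
```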
The timeout goroutine would continue to run (until the timeout)
even after queryAndWait returned. This causes thousands of extra
goroutines to linger around and makes the test stack traces very
difficult to read.
NewTestConfig() would enable broker and data nodes so running
influxd w/o a config file would start the nodes. If you ran
influxd w/ a config file but did not explicitly set Data.Enabled
or Broker.Enabled, the server would not start. This is not
intuitive when moving to a config file setup.
Instead, broker and data are enabled w/ the config file (like w/o)
and they must be explicitly disabled to run in a data or broker
only mode. This will help w/ backwards compatibility with existing
config files.
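A small sketch of the defaulting behavior. JSON stands in for the real config
format here, and the Config type and loadConfig are hypothetical. Defaults are
set before decoding, so a file that omits a section leaves it enabled:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Config is a hypothetical, trimmed-down config with only the two flags.
type Config struct {
	Broker struct {
		Enabled bool `json:"enabled"`
	} `json:"broker"`
	Data struct {
		Enabled bool `json:"enabled"`
	} `json:"data"`
}

// loadConfig enables both roles, then applies whatever the file sets.
// Keys missing from raw keep their default of true.
func loadConfig(raw []byte) (*Config, error) {
	c := &Config{}
	c.Broker.Enabled = true
	c.Data.Enabled = true
	if err := json.Unmarshal(raw, c); err != nil {
		return nil, err
	}
	return c, nil
}

func main() {
	// Only the data node is disabled explicitly: broker-only mode.
	c, err := loadConfig([]byte(`{"data": {"enabled": false}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(c.Broker.Enabled, c.Data.Enabled) // true false
}
```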
If a node is restarted after it has already joined the cluster,
log that the join urls are being ignored and use the existing
cluster state instead.
When starting multiple servers concurrently, they can race to connect
to each other. This change just has the join attempts retry to make
cluster setup easier.
This removes all join URLs from the config. To join a node to a
cluster, the URL of another member of the cluster should be passed
on the command line w/ the -join flag. The join URLs can now be
any node regardless of whether the node is a broker only or data
only node. At join time, the receiving node will redirect the
request to a valid broker or data node if it cannot handle the request
itself.
How a cluster is set up has changed and this test is failing w/
panic: assert failed: invalid initial server id: 2 [recovered]
There is an existing multi-node test w/ a broker and two data
nodes so we're still covering this case and will need to come
back to it.
To add a new data node, it currently needs a broker
and another data node to join. Temporarily adding
a JoinURLs option to the Data node section so a
standalone data node can be created but the intent is
that this will be removed.
Ideally, the joinURL could point to either a data node
or a broker and it would get the required URLs from that
host but that is not possible currently.
The previous behavior caused "[srvr]" to print out during usage, e.g. in
`influxd help run`:
```
[srvr] 2015/04/06 11:58:04 usage: run [flags]
run starts the broker and data node server....
```
When a data node starts up, the broker URLs were not set before
they were actually being used. The call to client.Open() in
turn triggers the raft streamer and heartbeat which try to connect
to the broker. If those started before the subsequent client.SetURLs()
call, you would see the following error in the logs at startup:
[messaging] 2015/04/01 11:59:22 reconnecting to broker: url={ <nil> /messaging/messages index=2&streaming=true&topicID=0 }, err=Get /messaging/messages?index=2&streaming=true&topicID=0: unsupported protocol scheme ""
Fixing this race uncovered another bug where the join urls would be
cleared the first time the broker was started. In this case, the
join urls should be left alone since they were set properly w/ SetURLs.
Fixes #2152
This is a pre-requisite for #1934. When running separate
broker and data nodes, you currently need to know what role
a host is performing. This complicates cluster setup in
that you must configure separate broker URLs and data node
URLs.
This change allows a broker only node to redirect data node endpoints
to a valid data node and a data only node to redirect broker
endpoints to a valid broker.
This sends data node urls via the broker heartbeat from each data
node. The urls are tracked on the broker to support simpler
cluster setup as well as distributed queries.
This commit adds incremental backup support. Snapshotting from the server
now creates a full backup if one does not exist and creates numbered
incremental backups after that.
For example, if you ran:
$ influxd backup /tmp/snapshot
Then you'll see a full snapshot in /tmp/snapshot. If you run the same
command again then an incremental snapshot will be created at
/tmp/snapshot.0. Running it again will create /tmp/snapshot.1.
This commit adds the "influxd restore" command to the CLI. This allows
a snapshot that has been produced by "influxd backup" to be restored
to a config location and the broker and raft directories will be
bootstrapped based on the state of the snapshot.
This commit adds the backup command to the influxd binary as well as
implements a SnapshotWriter in the influxdb package.
By default the snapshot handler binds to 127.0.0.1 so it cannot be
accessed outside of the local machine.
*config was always non-nil, since code at a higher level ensures that a
default config is passed down if no config is specified. So this logic
was pointless.