Commit Graph

249 Commits (c1d6c14c47dfb4d60878a979fbdd8b526281d3d6)

Author SHA1 Message Date
linearb 7212bfce83 add RENAME DATABASE 2015-10-09 13:55:38 -04:00
Philip O'Toole eb28817afe Don't panic when DROPing non-existent nodes 2015-10-05 16:56:19 -07:00
Philip O'Toole 2ac0357406 Support dropping non-Raft nodes 2015-10-04 00:19:52 -07:00
Philip O'Toole d74e0690c7 Revert "Merge pull request #4233 from influxdb/drop-server"
This reverts commit 0bdb36f6dc, reversing
changes made to 3085fbc138.
2015-10-02 08:39:57 -07:00
Cory LaNou e294c89308 use concurrency-safe Peers method 2015-10-02 09:17:15 -05:00
Cory LaNou 42896fee0f use close instead of shutdown 2015-10-02 09:10:01 -05:00
Cory LaNou efc8a57a75 checkRaftState -> enableLocalRaftIfNecessary 2015-10-01 17:13:47 -05:00
Cory LaNou 785ff6a235 use safe call that is already locked 2015-10-01 17:04:08 -05:00
Cory LaNou 2379a0a406 check for nil raftState 2015-10-01 16:54:46 -05:00
Cory LaNou 114b20ec5c need some locking, remove redundant check 2015-10-01 16:52:14 -05:00
Cory LaNou def2551d65 too much code depends on setting this to nil yet 2015-10-01 15:57:55 -05:00
Cory LaNou eaae1cbe24 remove comment 2015-10-01 15:48:17 -05:00
Cory LaNou 98de77eeaf attempt to heal a cluster, shut down raft after removing a node 2015-10-01 15:39:16 -05:00
Cory LaNou c108c6eb96 fix remove peer, add leader to show servers 2015-10-01 15:39:16 -05:00
Cory LaNou 3071eec2ca only the leader can remove a peer 2015-10-01 15:39:16 -05:00
Cory LaNou df08a070f6 do not remove raft files unless it is the node being removed 2015-10-01 15:39:16 -05:00
Cory LaNou 96d63cf9f0 simpler remove peer 2015-10-01 15:39:16 -05:00
Cory LaNou 6f74a64bf6 clarify comment and refactor 2015-10-01 15:39:16 -05:00
Cory LaNou b9a7f51f12 must always test... 2015-10-01 15:39:16 -05:00
Cory LaNou 50ac00f378 log errors for removePeer 2015-10-01 15:39:16 -05:00
Cory LaNou 0191319370 ErrNodeDataLossImminent -> ErrShardNotReplicated 2015-10-01 15:39:15 -05:00
Cory LaNou 76c45fdf55 not dealing with pointers, need to update original 2015-10-01 15:39:15 -05:00
Cory LaNou f50813460e protobuf update.. :-( 2015-10-01 15:39:15 -05:00
Cory LaNou 99da67007d no more shutdown 2015-10-01 15:39:15 -05:00
Cory LaNou 73372ed907 [Ee]xecuteShutdown -> [Mm]onitorShutdown 2015-10-01 15:39:15 -05:00
Cory LaNou 7a3e1f6b27 removing peer wip 2015-10-01 15:39:15 -05:00
Cory LaNou 93507c0b51 add force for drop server, misc fixes, more wip 2015-10-01 15:39:15 -05:00
Cory LaNou e50eb3172a drop server command 2015-10-01 15:39:14 -05:00
Philip O'Toole 591e33b1d8 Initial work, does not address issue 2015-09-30 11:58:42 -07:00
Philip O'Toole 82f866702a Remove obsolete comment 2015-09-28 20:56:43 -07:00
Cory LaNou 72f6f7d268 Merge pull request #4134 from influxdb/issue-3447
Refactor Points and Rows to dedicated packages
2015-09-17 15:27:48 -05:00
Philip O'Toole 9b900a0b3b Add RP create-and-get test cases 2015-09-16 16:21:41 -07:00
Cory LaNou ba830be3b9 actually move influxql.Row* -> models.Row* 2015-09-16 16:32:50 -05:00
Philip O'Toole ca07b86254 Prohibit dropping the default retention policy
This prevents users from putting their system into an awkward
state. It is a policy that all databases must have at least a default
retention policy.

Fixes issue #3699.
2015-09-08 23:00:19 -07:00
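A minimal sketch of the guard this commit describes; the type, field, and error names below are assumptions for illustration, not the actual influxdb meta package code:

```go
package meta

import "errors"

// ErrDropDefaultRetentionPolicy is a hypothetical error name for this sketch.
var ErrDropDefaultRetentionPolicy = errors.New("retention policy is the default; cannot drop")

// DatabaseInfo is reduced to the fields needed for the check.
type DatabaseInfo struct {
	Name                   string
	DefaultRetentionPolicy string
	RetentionPolicies      map[string]struct{}
}

// DropRetentionPolicy refuses to remove the database's default policy.
func (di *DatabaseInfo) DropRetentionPolicy(name string) error {
	if name == di.DefaultRetentionPolicy {
		return ErrDropDefaultRetentionPolicy
	}
	delete(di.RetentionPolicies, name)
	return nil
}
```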
Philip O'Toole db120f4c0e Log database and retention policy creation 2015-09-08 11:24:59 -07:00
Philip O'Toole 52737917e6 No error required if policy does not exist
This is the same way Database() works, and allows the caller to know it
should access the Raft leader.
2015-09-08 11:23:43 -07:00
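A sketch of the lookup convention the commit above describes (assumed, simplified types): a missing policy yields (nil, nil) instead of an error, matching how Database() behaves, so the caller knows it may need to ask the Raft leader:

```go
package meta

type RetentionPolicyInfo struct{ Name string }

type Data struct {
	RetentionPolicies map[string]*RetentionPolicyInfo
}

// RetentionPolicy returns nil, nil when the policy is not known locally.
func (d *Data) RetentionPolicy(name string) (*RetentionPolicyInfo, error) {
	rpi, ok := d.RetentionPolicies[name]
	if !ok {
		return nil, nil
	}
	return rpi, nil
}
```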
Jason Wilder 380d82b078 Fix race in local node creation
It was possible for the metastore Open call to return before it actually
created its local node.
2015-09-05 09:07:37 -06:00
Jason Wilder 08656f515e Merge pull request #4000 from influxdb/jw-3960
Fix cluster restarting issues
2015-09-04 15:16:51 -06:00
Jason Wilder 3404fe3872 Wait for meta-store to find the leader before returning
The meta-store would open but may not have finished loading the raft log. If write
requests came in, they could fail or be dropped because of missing shard group
info.  This change makes the meta store only return after it has found the leader
and is really ready.

This change also fixed a race in the ClusterRestart test that may be causing it
to fail sporadically.

Fixes #3677 #3960
2015-09-04 14:51:57 -06:00
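Roughly how "only return after the leader is found" could look; the names below (Store, WaitForLeader, leader) are assumptions for illustration, not the actual meta.Store internals:

```go
package meta

import (
	"errors"
	"time"
)

type Store struct {
	leader func() string // returns "" until a raft leader is elected
}

// WaitForLeader polls until a leader is known or the timeout elapses.
func (s *Store) WaitForLeader(timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	ticker := time.NewTicker(100 * time.Millisecond)
	defer ticker.Stop()
	for range ticker.C {
		if s.leader() != "" {
			return nil
		}
		if time.Now().After(deadline) {
			return errors.New("timed out waiting for raft leader")
		}
	}
	return nil
}
```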
Cory LaNou d060f3aba9 move all aggregate validations to the parser validation from map/reduce functions 2015-09-04 13:30:40 -05:00
Philip O'Toole cf5a655249 Don't precreate shard groups entirely in past
Fixes issue #3722
2015-09-04 08:31:50 -07:00
Ben Johnson bbc5539517 add SHOW SHARDS statement
This commit adds the ability to list all shards in the cluster
and return their id, start time, end time, expiry time, and
owner ids. Shards are grouped by database.

Fixes #3562
2015-09-03 15:46:52 -06:00
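A simplified sketch of the grouping SHOW SHARDS performs per the commit message above; the types are pared down and the function name is an assumption:

```go
package meta

import "time"

type ShardInfo struct {
	ID       uint64
	OwnerIDs []uint64
}

type ShardGroupInfo struct {
	StartTime, EndTime time.Time
	Shards             []ShardInfo
}

type DatabaseInfo struct {
	Name        string
	ShardGroups []ShardGroupInfo
}

// shardsByDatabase collects every shard under its owning database,
// roughly the grouping SHOW SHARDS returns.
func shardsByDatabase(dbs []DatabaseInfo) map[string][]ShardInfo {
	out := make(map[string][]ShardInfo)
	for _, db := range dbs {
		for _, sg := range db.ShardGroups {
			out[db.Name] = append(out[db.Name], sg.Shards...)
		}
	}
	return out
}
```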
Philip O'Toole a50b7b55f3 Fix race by replacing entire Data instance 2015-09-02 11:20:30 -07:00
Philip O'Toole 14c04eb4d6 Merge pull request #3916 from influxdb/new_stats_diags
Statistics and Diagnostics service
2015-09-01 18:30:53 -07:00
Philip O'Toole f05dc20b58 Hook new monitor service to server
2015-09-01 15:03:52 -07:00
Philip O'Toole b423599776 Update comment re shard group creation and sorting
[ci skip]
2015-09-01 13:30:59 -07:00
Philip O'Toole 8f700c36bc Store ShardGroups sorted by time 2015-09-01 13:21:51 -07:00
Ben Johnson 767307eed6 convert meta shard owners to objects
This commit converts meta.ShardInfo.OwnerIDs from a slice of ids
to a slice of objects. This is to support adding statuses for a
shard for a given node. For example, a node may have a shard
assigned to it but it is currently copying the shard and is not
ready to serve data for it.

The old `OwnerIDs` is marked as deprecated, however, the code
still supports loading from older protobuf-encoded data.
2015-08-31 16:33:13 -06:00
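The data-model change described above, sketched with an assumed field layout; the real protobuf handling is omitted:

```go
package meta

// ShardOwner wraps a node id so a per-node status (e.g. "copying")
// can be added later without another schema change.
type ShardOwner struct {
	NodeID uint64
}

type ShardInfo struct {
	ID     uint64
	Owners []ShardOwner

	// OwnerIDs is deprecated but still populated when decoding older
	// protobuf-encoded data, then migrated into Owners.
	OwnerIDs []uint64
}

// convertOwners migrates legacy ids into the new Owners slice on load.
func (si *ShardInfo) convertOwners() {
	for _, id := range si.OwnerIDs {
		si.Owners = append(si.Owners, ShardOwner{NodeID: id})
	}
	si.OwnerIDs = nil
}
```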
Philip O'Toole 1a55951f36 Backend support for database IF NOT EXISTS 2015-08-28 19:04:54 -07:00
Jason Wilder 0286a3e7fe Fix deadlock in metastore
The interaction of continuous query service, the meta-store loading
and initializing raft state, and syncing node info could cause a
deadlock in some instances.  There was an extra read-lock taken by isLeader()
when it already had a read-lock.  Removing this extra lock fixes the startup
deadlock.

Fixes #3607
2015-08-26 14:43:17 -06:00
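A reduced illustration of the re-entrant read-lock problem the commit above removes (hypothetical code, not the actual store): sync.RWMutex read locks are not re-entrant, so taking RLock inside a method that already holds RLock can deadlock once a writer is queued between the two acquisitions.

```go
package meta

import "sync"

type store struct {
	mu     sync.RWMutex
	leader string
}

// isLeader takes its own read lock.
func (s *store) isLeader() bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.leader != ""
}

// broken already holds the read lock and then calls isLeader, which tries
// to RLock again; with a pending writer this can block forever. The fix is
// to read the field directly while the lock is already held.
func (s *store) broken() bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.isLeader()
}
```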
David Norton fca932b943 skip deleted shard groups 2015-08-25 10:56:41 -04:00
David Norton 88f556af72 convert SHOW MEASUREMENTS to a distributed query 2015-08-23 23:09:51 -04:00
Philip O'Toole 878d7fc5f5 Update shard retention time when policy changes
Fixes issue #3702.
2015-08-19 12:42:05 -07:00
Jason Wilder 0e59568825 Change how cluster is started in tests
Instead of trying to start all the nodes with dynamic peer addresses
set, always start one, then join the rest to this one.  The SetPeers
in the test may be causing leadership changes and sporadic failures.
2015-08-14 13:17:38 -06:00
Jason Wilder 3d203885f0 Remove extraneous join error logging 2015-08-13 16:58:25 -06:00
Jason Wilder 22550dce54 Fix starting a cluster with self in the join urls
If you started a 3 node cluster and passed the same join URLs to
all three nodes, the first node started would not bootstrap correctly.
2015-08-13 16:20:34 -06:00
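One plausible shape of the fix (assumed helper, not the actual code): drop the node's own address from the join list so the first node can bootstrap instead of waiting to join itself.

```go
package meta

// filterSelf removes this node's own address from the -join list.
func filterSelf(joinAddrs []string, selfAddr string) []string {
	var out []string
	for _, addr := range joinAddrs {
		if addr != selfAddr {
			out = append(out, addr)
		}
	}
	return out
}
```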
Jason Wilder 9b16353893 Shutdown raft transport and layer first
The raft.Shutdown() call was deadlocking if operations were still
being applied to the log sometimes.  Change the shutdown behavior to
match how consul shuts down raft:

e37b5ecb69/consul/server.go (L471-L477)
2015-08-13 15:31:31 -06:00
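Sketch of the shutdown ordering described above; the raftState fields and close method are assumptions, though the hashicorp/raft calls (NetworkTransport.Close, Raft.Shutdown) are real:

```go
package meta

import "github.com/hashicorp/raft"

type raftState struct {
	raft      *raft.Raft
	transport *raft.NetworkTransport
	raftLayer interface{ Close() error }
}

// close stops the inbound paths before shutting raft down, mirroring how
// consul closes raft, so no new operations arrive while raft drains.
func (r *raftState) close() error {
	if r.transport != nil {
		r.transport.Close()
	}
	if r.raftLayer != nil {
		r.raftLayer.Close()
	}
	if r.raft != nil {
		// Shutdown returns a future; Error() blocks until it completes.
		if err := r.raft.Shutdown().Error(); err != nil {
			return err
		}
	}
	return nil
}
```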
Jason Wilder 5796aec703 Fix race when closing local raft state 2015-08-13 10:02:05 -06:00
Jason Wilder 668181d275 Make log statements more consistent
* Capitalize first letter of message
* Log all services starting consistently
* Remove some extraneous log statements in meta.Store
* Log data dirs for meta, data and hinted handoff
2015-08-13 10:01:42 -06:00
Jason Wilder 29c6094a54 Log raft leader state changes
Makes it much easier to determine when a cluster is in a healthy state,
as well as who the current leader is.
2015-08-13 10:01:42 -06:00
Jason Wilder 5280b20e66 Make host rename on single node more seamless
Renaming a host that is a raft peer member is pretty difficult, but
we can special-case single-node renames since we know all the members
in the cluster and can update the peer store directly on all nodes
(just one).

Fixes #3632
2015-08-13 10:01:42 -06:00
Jason Wilder ffcca1ceff Cap auto-created retention policy replica count at 3
Defaulting to the number of nodes in the cluster doesn't make
sense with larger clusters (e.g. 10 nodes = RF 10).
2015-08-12 14:18:02 -06:00
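The rule in the commit above as a tiny sketch; the function and constant names are assumptions:

```go
package meta

const maxAutoCreatedRetentionPolicyReplicaN = 3

// defaultReplicaN uses the cluster's node count, capped at 3.
func defaultReplicaN(nodeCount int) int {
	if nodeCount > maxAutoCreatedRetentionPolicyReplicaN {
		return maxAutoCreatedRetentionPolicyReplicaN
	}
	if nodeCount < 1 {
		return 1
	}
	return nodeCount
}
```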
Jason Wilder 17583f7c5d Remove [meta].peers config option
Adding a new peer must happen via the -join flag.
2015-08-12 13:01:27 -06:00
Jason Wilder 3b0b227d31 Wait for raft to close before meta store close returns
Fixes #3516
2015-08-05 15:41:39 -06:00
Jason Wilder e0b25c723d Code review fixes 2015-08-05 14:48:30 -06:00
Jason Wilder b5b8754904 Fix comments and whitespace issues 2015-08-05 14:17:26 -06:00
Jason Wilder d521b625ed Update show servers output to show address, not url
The addresses listed by show servers are not http endpoints, so just
show the address without the http:// prefix.
2015-08-05 14:17:26 -06:00
Jason Wilder 13052e60f2 Sync hostname to metastore after startup
If the -hostname flag is passed, the node will start up and be accessible from
remote nodes using the specified hostname.  At startup, we attempt to update
the hostname if it's different.  For data-only nodes, this is pretty straightforward.
For nodes that are part of the raft cluster, it is much more complicated, as the cluster
must be up and stable (with a leader) for the update to take place.  The main
complication in this case is that the node starting up will have a different
hostname and will fail to take part in the raft cluster because the other nodes
do not have this new name in their raft peers lists.  Since this is very problematic
and can very easily break a cluster, this PR just aborts startup and alerts the operator that
some manual action must be taken to update the raft peer on all raft members before
the hostname can be fully updated.

Fixes #3421
2015-08-05 14:17:26 -06:00
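A rough sketch of the startup check described above (assumed names and signature): a changed hostname is applied automatically only for data-only nodes, while raft members abort with instructions for the operator.

```go
package meta

import "fmt"

// checkHostnameChange decides whether a changed -hostname can be applied.
func checkHostnameChange(isRaftMember bool, storedAddr, configuredAddr string) error {
	if storedAddr == configuredAddr {
		return nil
	}
	if isRaftMember {
		return fmt.Errorf("hostname changed from %q to %q: update the raft peers on all raft members before restarting", storedAddr, configuredAddr)
	}
	// Data-only nodes can simply update their node entry in the metastore.
	return nil
}
```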
Jason Wilder 2b76dac479 Don't resolve hostname when creating node
Hostnames were always being resolved to an IP address and the IP
address was used as the host address and raft peer address.  There
was no way to use an actual hostname instead of an IP address.
2015-08-05 14:17:26 -06:00
Jason Wilder ce26a3097a Add UpdateNode raft command 2015-08-05 14:17:26 -06:00
Jason Wilder 90c85cb933 Fix restart single node
Restarting a single node would not bootstrap its raft state
2015-07-28 13:17:59 -06:00
Jason Wilder 95c98d1ab7 Fix data race in WaitForDataChanged 2015-07-28 09:40:25 -06:00
Jason Wilder f5d86b95b3 Add raft column to show servers statement
Reports whether the node is part of the raft consensus cluster or not.
2015-07-28 09:40:25 -06:00
Jason Wilder 06d8ff7c13 Use config.Peers when passing -join flag
Removes the two separate variables in the meta.Config.  -join will
now override the Peers var.
2015-07-28 09:40:25 -06:00
Jason Wilder 2938601e9e Add more meta store cluster tests
* Test add new nodes that become raft peers
* Test restarting a cluster w/ 3 raft nodes and 3 non-raft nodes
2015-07-28 09:40:25 -06:00
Jason Wilder c93e46d569 Support add new raft nodes
This change adds the first 3 nodes to the cluster as raft peers. Other
nodes are data-only.
2015-07-28 09:40:25 -06:00
Jason Wilder f5705aebe1 Rename raftState.openRaft to open 2015-07-28 09:40:25 -06:00
Jason Wilder 9dd66fa4ad Make meta RPC private 2015-07-23 10:21:25 -06:00
Jason Wilder e9044166d6 Invalidate raft member by fetching from leader 2015-07-23 10:21:25 -06:00
Jason Wilder 47b8de7ce8 Hide Meta.Join from config command using toml skip annotation 2015-07-23 10:21:25 -06:00
Jason Wilder eb7d18125e Fix race in test code 2015-07-23 10:21:25 -06:00
Jason Wilder 29011c5cf2 Code review fixes 2015-07-23 10:21:25 -06:00
Jason Wilder b78ac4bf15 Add RPC tests 2015-07-23 10:21:24 -06:00
Jason Wilder 84a8d7d24b Add cluster-tracing option to meta config
Useful for troubleshooting but too verbose for regular use.
2015-07-23 10:21:24 -06:00
Jason Wilder c1fc83e3d5 Make join private so it does not show up in config command 2015-07-23 10:21:24 -06:00
Jason Wilder 29b11a20a2 Support multiple comma-separated join addresses
Will try each once until one succeeds
2015-07-23 10:21:24 -06:00
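Sketch of the join behavior implied above (assumed helper): split the flag on commas and try each address until one join succeeds.

```go
package meta

import (
	"errors"
	"strings"
)

// joinCluster tries each comma-separated address once, in order.
func joinCluster(joinFlag string, join func(addr string) error) error {
	for _, addr := range strings.Split(joinFlag, ",") {
		addr = strings.TrimSpace(addr)
		if addr == "" {
			continue
		}
		if err := join(addr); err == nil {
			return nil
		}
	}
	return errors.New("unable to join any meta node")
}
```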
Jason Wilder 85db9c46e8 Move remaining raft impl details to local raft state 2015-07-23 10:21:24 -06:00
Jason Wilder 790733daad Move snapshot to raft state 2015-07-23 10:21:24 -06:00
Jason Wilder 54e116507f Move apply to raft state 2015-07-23 10:21:24 -06:00
Jason Wilder a9314d6bb7 Move raft index to raft state 2015-07-23 10:21:24 -06:00
Jason Wilder 17a9bb041b Remove raftEnabled func
Not needed since it was just used as a safeguard for seeing if we
are the leader.
2015-07-23 10:21:24 -06:00
Jason Wilder 72e2e1a6f2 Move addPeer to raft state 2015-07-23 10:21:24 -06:00
Jason Wilder 80248f9b53 Remove leaderCh
Not used
2015-07-23 10:21:24 -06:00
Jason Wilder b86fecfd80 Move setPeers to raft state 2015-07-23 10:21:24 -06:00
Jason Wilder 9e4339753f Move leaderCh() to raft state 2015-07-23 10:21:23 -06:00
Jason Wilder 33730da32b Move isLeader to raft state 2015-07-23 10:21:23 -06:00
Jason Wilder fb8a4db74f Move raft closing to localRaft state 2015-07-23 10:21:23 -06:00
Jason Wilder 5ea8342892 Move raft state to separate file
store.go is getting big.
2015-07-23 10:21:23 -06:00
Jason Wilder f3fcfebf83 Make raftState interface private 2015-07-23 10:21:23 -06:00
Jason Wilder a7fa5eb634 Propagate metadata changes from raft nodes to non-raft nodes
Non-raft nodes need to be notified when the metastore changes. For
example, a database could be dropped on node 1 (non-raft) and node 2
would not know.  Since queries for that database would not be a cache
miss, node 2 would never refresh and would not get updated.

To propagate changes to non-raft nodes, each non-raft node maintains
a blocking connection to a raft node that blocks until a metadata
change occurs.  When the change is triggered, the updated metadata
is returned to the client and the client idempotently updates its local
cache.  It then reconnects and waits for another change.  This is
similar to watches in ZooKeeper or etcd.  Since the blocking request is
always recreated, it also serves as a polling mechanism that will retry
another raft member if the current connection is lost.
2015-07-23 10:21:23 -06:00
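A simplified sketch of the blocking-fetch loop the commit above describes (hypothetical names; the real code goes through the meta RPC layer): the non-raft node long-polls a raft node for metadata newer than its cached index, applies it idempotently, then reconnects.

```go
package meta

import (
	"log"
	"time"
)

type Snapshot struct {
	Index uint64
	// ... databases, retention policies, shard groups ...
}

// fetchAfter blocks on a raft node until its metadata index exceeds idx.
type fetchAfter func(idx uint64) (*Snapshot, error)

// watchMetadata keeps a non-raft node's cache in sync with the raft nodes.
func watchMetadata(fetch fetchAfter, apply func(*Snapshot)) {
	var index uint64
	for {
		snap, err := fetch(index) // returns only on change or error
		if err != nil {
			log.Printf("metadata watch error, retrying: %v", err)
			time.Sleep(time.Second) // back off, then retry another raft member
			continue
		}
		apply(snap) // idempotent cache update
		index = snap.Index
	}
}
```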