Commit Graph

219 Commits (fd840f242c7a3afc90e9198d68b4a97225f78299)

Author SHA1 Message Date
Ben Johnson eb221a5adb Merge pull request #5663 from benbjohnson/query-executor
Refactor QueryExecutor (WIP)
2016-02-17 16:30:20 -07:00
Ben Johnson e3b4b71c13 refactor query executor
This commit moves the `QueryExecutor` to the `cluster` package
and provides an interface to it inside the `influxql` package.
2016-02-17 15:13:56 -07:00
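Putting the implementation in `cluster` and only an interface in `influxql` is a form of dependency inversion. Below is a minimal sketch with both packages collapsed into one file. The method name and signature `ExecuteQuery(query, database string)` and the type `clusterExecutor` are hypothetical, not the real InfluxDB API.

```go
package main

import "fmt"

// QueryExecutor is the kind of interface the query-language package could
// expose. The method set here is hypothetical.
type QueryExecutor interface {
	ExecuteQuery(query, database string) (string, error)
}

// clusterExecutor stands in for the implementation that would live in the
// cluster package.
type clusterExecutor struct{}

func (clusterExecutor) ExecuteQuery(query, database string) (string, error) {
	return fmt.Sprintf("ran %q on %s", query, database), nil
}

// Compile-time check that the implementation satisfies the interface.
var _ QueryExecutor = clusterExecutor{}

// runQuery depends only on the interface, so code written against it does
// not need to import the package that provides the implementation.
func runQuery(e QueryExecutor, q string) (string, error) {
	return e.ExecuteQuery(q, "mydb")
}
```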
Ady af17b9d8a7 Zero timeout set to support all platforms 2016-02-16 21:21:39 +05:30
Ady 58cccaa202 Update timeout for Dial so that it works on Windows 2016-02-15 22:22:44 +05:30
Ady b850c13dc5 Modify Dial timeout in test to 0 ns to make it pass on Windows 2016-02-15 21:40:13 +05:30
Ady 80a2874361 Merge branch 'master' of https://github.com/influxdata/influxdb into mvadu-patch-ErrDialTimeout
Accommodate name change from influxdb->influxdata
2016-02-13 23:01:01 +05:30
Ady 0a00a59f28 test with more data points 2016-02-11 07:50:15 +05:30
Ady e68dd9a85a Modify WriteTLV with debug prints. Revert changes to shard_writer 2016-02-11 07:29:47 +05:30
Ben Johnson d9a6a7340f add canonical paths 2016-02-10 11:30:52 -07:00
Ben Johnson 5a0d1ab7c1 rename influxdb/influxdb to influxdata/influxdb
This commit changes all the import and URL references from:

    github.com/influxdb/influxdb

to:

    github.com/influxdata/influxdb
2016-02-10 10:26:18 -07:00
Ben Johnson 5c33b9d786 remove Mapper test references 2016-02-10 09:40:30 -07:00
Ben Johnson cde973f409 refactor query engine 2016-02-10 09:40:24 -07:00
Adarsha 1c7d29b79b Add 1ns delay
Windows does not time out when the timeout is set to 1 ns. This causes TestShardWriter_Write_ErrDialTimeout to fail on Windows.

```text
=== RUN TestShardWriter_Write_ErrDialTimeout
[cluster] 2016/02/10 09:32:30 Starting cluster service
[cluster] 2016/02/10 09:32:30 accept remote connection from 127.0.0.1:57034
[cluster] 2016/02/10 09:32:30 unable to read type-length-value read message type: WSARecv tcp 127.0.0.1:57033: use of closed network connection
[cluster] 2016/02/10 09:32:30 close remote connection from 127.0.0.1:57034
[cluster] 2016/02/10 09:32:30 cluster service accept error: network connection closed
--- FAIL: TestShardWriter_Write_ErrDialTimeout (0.00s)
	shard_writer_test.go:162: expected error <nil>, to contain i/o timeout
```
2016-02-10 09:34:53 +05:30
Michael Mattioli 28f80a79e7 Removed unused code from balancer.go
Removed a block of code that was commented out
without explanation and provides no benefit
whatsoever
2016-02-03 21:33:28 -05:00
Jason Wilder d54f930c2d Don't parse points twice when receiving remote writes
The monitoring stats were causing points to be parsed twice, spending
extra CPU time just parsing points.
2016-01-27 14:24:56 -07:00
Jason Wilder 47c5ade858 Use faster point parsing for remote writes
Parsing the line protocol again on the receiving side of the remote
write consumes a lot of CPU.  This uses a different marshaling format
that is much faster to parse after we already parsed the point on
the write side.
2016-01-27 14:24:09 -07:00
Jason Wilder 5abdb42a7d Use a bounded pool for remote writes
Under highly concurrent write load, the coordinating node would
create a connection to any other node that is part of the replica
group.  Since each connection can be expensive, OOM situations could
occur because there was no bound on the number of new connections
that would be created.  If writes on a remote node were slow, connections
could pile up and exacerbate the problem.

This switches the pool to be bounded, with a checkout that blocks
with a timeout.  If a connection is available, it's returned immediately.
If the pool still has room for more connections, it will create one if needed.
Otherwise, the call will block until a connection becomes available or
the timeout expires.  In the case of a timeout, the error is propagated back up
to the PointsWriter, which determines what to return to the client.
2016-01-26 17:08:36 -07:00
Jason Wilder f58f0f5373 Fix cluster tests 2016-01-21 15:28:34 -05:00
Jason Wilder e901b648a6 Use TCPHost for writing and query to other nodes 2016-01-21 15:28:34 -05:00
Paul Dix fb9181d240 Fix meta-service build 2016-01-21 15:28:33 -05:00
Paul Dix f385945058 Update Server to work with new metaservice/client 2016-01-21 15:28:33 -05:00
Cory LaNou 8d878fff91 buildable meta -> services/meta 2016-01-21 15:28:32 -05:00
liang@qiniu.com c13f8e9128 Fix wrong results of distributed aggregate query 2015-12-08 07:08:46 +08:00
liang@qiniu.com 4026236659 fix issue 4801 2015-11-18 00:56:05 +08:00
Nathaniel Cook 1719a6107c PointsWriter will drop writes to subscriber service for any in-flight writes 2015-11-05 16:25:00 -07:00
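Dropping instead of blocking is commonly done with a non-blocking channel send. Below is a minimal sketch, assuming points are handed to the subscriber service over a buffered channel; `Point` and `sendToSubscriber` are hypothetical names.

```go
package main

// Point is a hypothetical stand-in for a parsed point.
type Point struct{ Line string }

// sendToSubscriber forwards a batch without ever blocking the write path:
// if the subscriber's channel is full, the batch is dropped and false is
// returned so the caller can count the drop.
func sendToSubscriber(ch chan<- []Point, points []Point) bool {
	select {
	case ch <- points:
		return true
	default:
		return false
	}
}
```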
Philip O'Toole de7919240f Migrate internal stats to consistent names
Go style -- and existing runtime stats -- do not use underscores, but
instead use camel case. This change makes the internal stats adhere to
that convention.
2015-10-28 21:07:45 -07:00
Philip O'Toole 2f80e68b2a Move node balancer into cluster package
Initial work for #3377.
2015-10-28 14:35:03 -07:00
Jason Wilder 0926b19e6b Prevent creating points with NaN float values
NaN float values are not supported in the existing engine or the tsm1
engine.  This changes NewPoint to return an error if a field contains
a NaN value.  It also allows us to validate fields to prevent
other unsupported types from sneaking in through other input plugins.
2015-10-27 17:12:52 -06:00
MrLee.Kun 883640a288 change cluster logger tag 2015-10-27 15:32:51 +08:00
Charles Chan 9382d5b534 Fix typos.
* non-existant --> non-existent
* propogate --> propagate
2015-10-17 07:36:56 -07:00
Nathaniel Cook cb1aaa8e42 Merge pull request #4375 from influxdb/subscriptions
Feature add subscriber service for creating/dropping subscriptions
2015-10-15 09:17:26 -06:00
Sean Beckett 82f104a8b1 Merge pull request #4436 from influxdb/tag-names-to-keys
WIP tag name --> tag key, field name --> field key
2015-10-14 16:02:46 -07:00
Nathaniel Cook 8b31007aa7 Adds subscriber service for creating/dropping subscriptions to the
InfluxDB data stream.
2015-10-14 15:23:45 -06:00
Sean Beckett fd342ed411 Update rpc.go 2015-10-13 16:56:05 -07:00
Daniel Morsing 62dff895e2 wire up INTO queries.
Since INTO queries need to have absolute information about the database
to work, we need to create a loopback interface back to the cluster
in order to perform them.
2015-10-13 15:00:36 +00:00
Philip O'Toole faad42c1da Log a more accurate connection message
Not all connections are for writes, some are for mapping shards.
2015-10-06 13:39:51 -07:00
Philip O'Toole 2ac0357406 Support dropping non-Raft nodes 2015-10-04 00:19:52 -07:00
Philip O'Toole d74e0690c7 Revert "Merge pull request #4233 from influxdb/drop-server"
This reverts commit 0bdb36f6dc, reversing
changes made to 3085fbc138.
2015-10-02 08:39:57 -07:00
Cory LaNou f50813460e protobuf update.. :-( 2015-10-01 15:39:15 -05:00
Mint 9c6da2417e Fixed comments.
Issue: Enable golint on the code base #4098 (changes only for the cluster subpackage)

- [ ] CHANGELOG.md updated
- [X] Rebased/mergable
- [X] Tests pass
- [X] Sign [CLA](http://influxdb.com/community/cla.html) (if not already signed)
2015-09-28 23:38:21 -05:00
Mint 3cbc1936e5 Changes to make the cluster sub package golint-able
Issue: Enable golint on the code base #4098
2015-09-28 21:40:58 -05:00
Ben Johnson 1b8b625787 refactor SelectMapper 2015-09-22 13:09:26 -06:00
Philip O'Toole 1084d73092 Add cluster-service stats 2015-09-22 10:27:54 -07:00
Cory LaNou 72f6f7d268 Merge pull request #4134 from influxdb/issue-3447
Refactor Points and Rows to dedicated packages
2015-09-17 15:27:48 -05:00
Philip O'Toole 19384efde7 Return an error-on-write if RP does not exist 2015-09-16 18:40:29 -07:00
Cory LaNou d19a510ad2 refactor Points and Rows to dedicated packages 2015-09-16 15:33:08 -05:00
Jason Wilder ab164c20a2 Fix race in cluster RPC serialization
Point was accessed from multiple goroutines and there was a race on the internal
cachedFields and cachedName fields.  Accessing these fields is unnecessary work as it
requires the point to be unmarshaled into Go types and then remarshaled back into protobuf
types.  Instead, just send the line protocol version already available on the point via
the protobuf.  This avoids accessing these cached fields and eliminates some extra work.

Possible fix for #4069
2015-09-15 16:21:39 -06:00
Philip O'Toole f0bbec6699 Add stats to PointsWriter 2015-09-08 19:30:07 -07:00
Jason Wilder ab0b2231a6 Wait for all the cluster connections to complete 2015-09-08 11:04:00 -06:00
Jason Wilder 99d02e3d62 Log the reason a remote write request might be dropped to the error message 2015-09-04 13:14:46 -06:00
Jason Wilder 1d4ee6c3fa Add tests for influx consistency level parsing 2015-09-02 09:22:15 -06:00
Takayuki Usui da8efa56e1 Fix writes possibly blocked with relaxed write consistency level
Immediately return once the required number of writes are completed,
otherwise requests running with relaxed consistency levels (e.g. any
or one) would be blocked unexpectedly, for instance, waiting for dead
nodes to respond.
2015-09-02 11:08:04 +09:00
Ben Johnson 767307eed6 convert meta shard owners to objects
This commit converts meta.ShardInfo.OwnerIDs from a slice of ids
to a slice of objects. This is to support adding statuses for a
shard for a given node. For example, a node may have a shard
assigned to it but it is currently copying the shard and is not
ready to serve data for it.

The old `OwnerIDs` is marked as deprecated; however, the code
still supports loading from older protobuf-encoded data.
2015-08-31 16:33:13 -06:00
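A sketch of the data-model change, assuming a `ShardOwner` record keyed by node ID and a helper that upgrades data decoded from the older format. The field and method names are illustrative, and the per-node status the commit anticipates is left out.

```go
package main

// ShardOwner is one node's ownership record. Unlike a bare ID it leaves
// room for per-node state, such as whether the shard is still copying.
type ShardOwner struct {
	NodeID uint64
}

// ShardInfo keeps the deprecated OwnerIDs field so older encoded data can
// still be loaded.
type ShardInfo struct {
	ID       uint64
	Owners   []ShardOwner
	OwnerIDs []uint64 // Deprecated: use Owners.
}

// upgradeOwners converts legacy OwnerIDs into Owners when only the old
// field was populated, then clears the deprecated field.
func (si *ShardInfo) upgradeOwners() {
	if len(si.Owners) == 0 {
		for _, id := range si.OwnerIDs {
			si.Owners = append(si.Owners, ShardOwner{NodeID: id})
		}
	}
	si.OwnerIDs = nil
}
```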
Jason Wilder 027b6e36e7 Fix inconsistent results from show measurements
Running show measurements in a partially replicated cluster produces inconsistent
results due to the connection pooling.  When running remote meta-data queries,
the cluster service ends up keeping the map shard request open but still checks the connection
back into the pool. This causes inconsistent results because data from the last request
interferes with the new request.

This removes the connection pool, which fixes the issue.  It also has the side effect of fixing
pooled connections to a node that have gone bad when that node restarts.  For example, in a 3 node cluster
that has been responding to queries correctly, restarting 1 node would cause all the others to fail
to query that node indefinitely.  This is now fixed as well.
2015-08-31 14:31:00 -06:00
David Norton 244948dc8d update shard mapper test 2015-08-25 10:20:58 -04:00
David Norton 88f556af72 convert SHOW MEASUREMENTS to a distributed query 2015-08-23 23:09:51 -04:00
David Norton 5d26cfa4d7 return interface{} from nextChunk* functions 2015-08-22 10:59:29 -04:00
David Norton c8f88f9a61 refactor remote mapping 2015-08-22 10:16:41 -04:00
Jason Wilder a7cb0df4af Fix typos/spacing 2015-08-13 10:02:05 -06:00
Jason Wilder 668181d275 Make log statements more consistent
* Capitalize first letter of message
* Log all services starting consistently
* Remove some extraneous log statements in meta.Store
* Log data dirs for meta, data and hinted handoff
2015-08-13 10:01:42 -06:00
David Norton d661bf1a06 fix #3414: shard mappers perform query re-writing 2015-08-04 09:49:50 -04:00
Philip O'Toole 10eecb441d Allow remote mapping to be forced
This is useful primarily for testing.
2015-07-20 10:44:45 -07:00
Philip O'Toole 425a65fca1 RemoteShard mapping now performed over TCP
With this change remote mapping no longer uses HTTP, as the HTTP ports
exposed by nodes on the cluster are not known cluster wide. The TCP
ports exposed by the cluster service are, so this change uses that
functionality. Each RemoteMapper has its own dedicated connection pool
for each node, and remote mapping TCP connections are in no way coupled
with query TCP connections.
2015-07-20 10:44:38 -07:00
Philip O'Toole a19cea36bd Rename cluster unit test function
Makes future tests, related to shard mapping, clearer.
2015-07-17 13:05:15 -07:00
Philip O'Toole 2dc8bb947e Correctly hook up RemoteMapper's MetaStore 2015-07-16 14:00:10 -07:00
Philip O'Toole 284a9ac0ff Add RemoteMapper implementation 2015-07-15 21:57:23 -07:00
Philip O'Toole e254245f2f Implement simple remote node choice policy 2015-07-15 19:53:10 -07:00
Philip O'Toole f41d2bab5d Start move to unified query executor 2015-07-15 19:31:13 -07:00
Philip O'Toole 74cb96646c Refactor query engine for distributed query support
With this change, the query engine code gathers information about
shards and tagsets by working with individual shards, collating the
information, and returning that to the client. It does not assume that any
particular shard is local, and accesses all shards through abstracted
Mappers, of which there are two types -- a Mapper type for Raw queries
and a second type for Aggregate queries. There are corresponding
Executors for each type of Mapper, but both types of Executors share the
same interface.
2015-07-15 12:54:55 -07:00
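The Mapper abstraction can be sketched as an interface that the engine drains without knowing whether a shard is local or remote. The method set (`Open`, `NextChunk`, `Close`), the in-memory `sliceMapper`, and the `collect` helper are hypothetical; `NextChunk` returns `interface{}` to mirror the later commit that made the `nextChunk*` functions return `interface{}`.

```go
package main

// Mapper is how the engine reads one shard, local or remote.
type Mapper interface {
	Open() error
	NextChunk() (interface{}, error) // a nil chunk means the shard is drained
	Close()
}

// sliceMapper is an in-memory Mapper used only for illustration.
type sliceMapper struct {
	chunks []interface{}
}

func (m *sliceMapper) Open() error { return nil }

func (m *sliceMapper) NextChunk() (interface{}, error) {
	if len(m.chunks) == 0 {
		return nil, nil
	}
	c := m.chunks[0]
	m.chunks = m.chunks[1:]
	return c, nil
}

func (m *sliceMapper) Close() {}

// collect drains every mapper and collates the chunks, the way an Executor
// gathers results from all shards through the same interface.
func collect(mappers []Mapper) ([]interface{}, error) {
	var out []interface{}
	for _, m := range mappers {
		if err := m.Open(); err != nil {
			return nil, err
		}
		for {
			c, err := m.NextChunk()
			if err != nil {
				m.Close()
				return nil, err
			}
			if c == nil {
				break
			}
			out = append(out, c)
		}
		m.Close()
	}
	return out, nil
}
```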
Philip O'Toole a84c48bff6 Allow the PointWriter timeout to be configurable 2015-07-02 12:50:12 -04:00
Joseph Crail 5fccee3d16 Fix spelling errors in comments and strings. 2015-06-28 02:54:34 -04:00
David Norton 7c39ede6ba fix #2920: create collectd database on startup 2015-06-11 09:40:42 -04:00
Jason Wilder 67d4ef0e28 Don't queue write failures that are due to type conflicts
These will never succeed and will stay in the queue indefinitely.
2015-06-10 14:52:59 -06:00
Jason Wilder 999f4a4c41 Return field type errors as client write errors
Fixes #2849
2015-06-10 14:52:26 -06:00
Philip O'Toole 952fb49368 Move parsing consistency levels to cluster package
Errors parsing the consistency level are indicated by returning a nil
Graphite input.
2015-06-09 14:21:12 -07:00
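A minimal sketch of parsing a consistency level from a request parameter. The `any` and `one` levels appear elsewhere in this log; `quorum` and `all`, the constant names, and the use of a Go `error` (rather than the nil-return convention mentioned above) are assumptions.

```go
package main

import (
	"fmt"
	"strings"
)

// ConsistencyLevel says how many replicas must acknowledge a write.
type ConsistencyLevel int

const (
	ConsistencyLevelAny ConsistencyLevel = iota
	ConsistencyLevelOne
	ConsistencyLevelQuorum
	ConsistencyLevelAll
)

// ParseConsistencyLevel maps a case-insensitive name to a level.
func ParseConsistencyLevel(s string) (ConsistencyLevel, error) {
	switch strings.ToLower(s) {
	case "any":
		return ConsistencyLevelAny, nil
	case "one":
		return ConsistencyLevelOne, nil
	case "quorum":
		return ConsistencyLevelQuorum, nil
	case "all":
		return ConsistencyLevelAll, nil
	default:
		return 0, fmt.Errorf("invalid consistency level: %q", s)
	}
}
```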
Philip O'Toole fd0de2fb1b Add function to parse consistency levels 2015-06-09 14:21:12 -07:00
Jason Wilder eb1cd44b8d Log write errors
Since the client only receives a "write failed" or "partial write" error
message, log more context in the logs.
2015-06-09 14:49:22 -06:00
Jason Wilder 5e515fbeda Don't log EOF as an error
It's expected when a client disconnects
2015-06-08 16:39:39 -06:00
Jason Wilder 8323d6aa9e Log when TCP clients connect/disconnect 2015-06-08 16:39:02 -06:00
Jason Wilder 8cbda9694e Ensure unusable connections get closed
Fixes a bug where marking a connection as unusable didn't
prevent it from being checked back into the pool.
2015-06-08 11:26:56 -06:00
Jason Wilder 0c6ea32540 Use read locks instead of write locks for connection pool checkout 2015-06-08 11:21:07 -06:00
Ben Johnson 6e40f869fe Fix formatting directive. 2015-06-05 23:06:52 -06:00
Ben Johnson 617e214a49 Add remote write logging. 2015-06-05 22:49:03 -06:00
Ben Johnson 607c352412 Add remote write logging. 2015-06-05 22:34:30 -06:00
Jason Wilder 1024965db7 Create shard received from cluster writer 2015-06-05 22:16:51 -06:00
Jason Wilder 1638ff8b6c Handle nil node returned from meta store in shard writer 2015-06-05 22:16:51 -06:00
Jason Wilder 75b72c60fe Add hinted handoff service
The hinted handoff service will queue a write to a remote node if
that write fails and periodically retry the write.
2015-06-05 22:16:51 -06:00
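A minimal in-memory sketch of the queue-and-retry loop. A production service would likely persist its queue to disk; here `HintedHandoff`, `Flush`, `Run`, and `ErrNodeUnreachable` are hypothetical, and queued writes are retried strictly in order.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

// ErrNodeUnreachable is an example error a send function can return when
// the remote node cannot be reached.
var ErrNodeUnreachable = errors.New("remote node unreachable")

// HintedHandoff queues writes that failed and retries them periodically.
type HintedHandoff struct {
	mu    sync.Mutex
	queue [][]byte
	send  func([]byte) error // writes one payload to the remote node
}

// Enqueue records a write that could not be delivered.
func (h *HintedHandoff) Enqueue(p []byte) {
	h.mu.Lock()
	defer h.mu.Unlock()
	h.queue = append(h.queue, p)
}

// Flush retries queued writes in order and stops at the first failure so
// ordering is preserved for the next attempt. It returns how many were sent.
func (h *HintedHandoff) Flush() int {
	h.mu.Lock()
	defer h.mu.Unlock()
	sent := 0
	for len(h.queue) > 0 {
		if err := h.send(h.queue[0]); err != nil {
			break
		}
		h.queue = h.queue[1:]
		sent++
	}
	return sent
}

// Run calls Flush every interval until done is closed.
func (h *HintedHandoff) Run(interval time.Duration, done <-chan struct{}) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-t.C:
			h.Flush()
		case <-done:
			return
		}
	}
}
```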
Ben Johnson fb06549552 remove bind address from cluster config 2015-06-05 17:07:54 -06:00
Ben Johnson abbcf15bb2 integrate mux into influxd cluster service 2015-06-05 17:02:32 -06:00
Ben Johnson 5a5c077790 refactor cluster to use mux 2015-06-05 16:54:12 -06:00
Ben Johnson b925e1c1af Multi-node clustering.
This commit adds the ability to cluster multiple nodes together to share
the same metadata through raft consensus.
2015-06-05 14:41:19 -06:00
Cory LaNou 21af1ded6b messages over 1gb are probably not valid 2015-06-04 19:40:48 -06:00
Cory LaNou 5c52c4cda1 add ability to set logger for testing 2015-06-03 09:58:39 -06:00
Jason Wilder 156e7df346 Rename PointsWriter.Store to TSDBStore
Matches MetaStore naming convention better.
2015-06-02 14:47:59 -06:00
Jason Wilder 3957e096f8 Remove ownerID from protobufs
Not needed since the node that processes the request is the owner.
2015-06-02 14:45:52 -06:00
Jason Wilder e400e8f2d6 Use default retention policy if not specified during writes 2015-06-01 17:16:44 -06:00
Jason Wilder 497cd506f9 Remove temporary INFLUXDB_ALPHA write path enable flag
Real thing exists now.
2015-06-01 16:45:08 -06:00
Cory LaNou 17bdf1c114 get both json/line protocol endpoints working 2015-06-01 12:35:57 -06:00
Cory LaNou 3597565955 reading and writing yo! 2015-06-01 11:59:58 -06:00
Ben Johnson bf823d9887 Integrating cmd/influxd/run. 2015-05-30 14:06:36 -06:00