I believe this change addresses the issues with hinted-handoff not fully replicating all data to nodes that come back online after an outage. A detailed explanation follows.
During testing of hinted-handoff (HH) under various scenarios, HH stats showed that the HH Processor was occasionally encountering errors while unmarshalling hinted data. This error was not handled correctly, and in clusters with more than 3 nodes it could cause the HH service to stall until the node was restarted. This was the high-level reason why HH data was not being replicated.
Furthermore, by examining the hinted-handoff data at the byte level, it could be seen that HH segment block lengths were randomly being set to 0, even though the block data itself (which contains the hinted writes) was fine. This was the root cause of the unmarshalling errors outlined above. It was, in turn, tracked down to the HH system opening each segment file multiple times concurrently, which is not thread-safe at the file level, so these multiple open calls were corrupting the file.
Finally, the reason a segment file was being opened multiple times in parallel was that WriteShard on the HH Processor was checking for node queues in an unsafe manner. Since WriteShard can be called concurrently, this could add a queue for the same node more than once, and each queue addition opens segment files.
This change fixes the locking in WriteShard so that the check for an existing HH queue for a given node is performed in a synchronized manner.
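As a rough sketch of the synchronized check (the type, field, and method names below are illustrative and may not match the actual HH code), the queue lookup and creation now happen under a single mutex:

    package hh

    import "sync"

    // queue stands in for the real segment-backed per-node queue type.
    type queue struct{}

    type Processor struct {
        mu     sync.Mutex
        queues map[uint64]*queue // one hinted-handoff queue per node
    }

    // openNewQueue opens the node's segment files; it must run at most
    // once per node.
    func (p *Processor) openNewQueue(nodeID uint64) (*queue, error) {
        return &queue{}, nil
    }

    // queueForNode returns the existing queue for nodeID, creating it
    // under the lock so that concurrent WriteShard calls cannot open the
    // same segment files twice.
    func (p *Processor) queueForNode(nodeID uint64) (*queue, error) {
        p.mu.Lock()
        defer p.mu.Unlock()

        if q, ok := p.queues[nodeID]; ok {
            return q, nil
        }
        q, err := p.openNewQueue(nodeID)
        if err != nil {
            return nil, err
        }
        if p.queues == nil {
            p.queues = make(map[uint64]*queue)
        }
        p.queues[nodeID] = q
        return q, nil
    }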
Without this change, if hinted-handoff was disabled the service
would correctly reject writes, but it would still process any data
already sitting in hinted-handoff queues. With this change the
service is completely disabled.
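A minimal sketch of what "completely disabled" means here, with illustrative names rather than the real service and config fields: both incoming writes and the draining of existing queues are gated on the same flag.

    package hh

    import "errors"

    var ErrHintedHandoffDisabled = errors.New("hinted handoff disabled")

    type Service struct {
        Enabled bool
    }

    // WriteShard rejects hinted writes while the service is disabled.
    func (s *Service) WriteShard(shardID, ownerID uint64, data []byte) error {
        if !s.Enabled {
            return ErrHintedHandoffDisabled
        }
        // ... enqueue the hinted write for later delivery ...
        return nil
    }

    // Open previously started processing even when disabled, which drained
    // any data already sitting in the queues. Skipping it disables the
    // service entirely.
    func (s *Service) Open() error {
        if !s.Enabled {
            return nil
        }
        // ... open queues and start the processor goroutine ...
        return nil
    }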
With this change Graphite TCP connections are tracked on a
per-service basis. This allows a closing Graphite service to first
shut down any active connections, thereby unblocking the rest of
shutdown.
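A sketch of the per-service tracking, using illustrative names rather than the actual Graphite service fields:

    package graphite

    import (
        "net"
        "sync"
    )

    type Service struct {
        mu    sync.Mutex
        conns map[net.Conn]struct{} // active TCP connections for this service
        wg    sync.WaitGroup        // one entry per connection handler
    }

    func (s *Service) addConn(c net.Conn) {
        s.mu.Lock()
        defer s.mu.Unlock()
        if s.conns == nil {
            s.conns = make(map[net.Conn]struct{})
        }
        s.conns[c] = struct{}{}
    }

    func (s *Service) removeConn(c net.Conn) {
        s.mu.Lock()
        defer s.mu.Unlock()
        delete(s.conns, c)
    }

    // Close first closes every active connection so that handler
    // goroutines blocked on reads return, then waits for them to finish.
    func (s *Service) Close() error {
        s.mu.Lock()
        for c := range s.conns {
            c.Close()
        }
        s.mu.Unlock()

        s.wg.Wait()
        return nil
    }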
This work exposed small shortcomings in the existing Diagnostics
system, so that code has also been tweaked.
Fixes issue #4017
With this change, the generic batcher used by many inputs can now be
buffered. Testing shows that this improves the performance of the
Graphite input by 10-100%, with the biggest improvements at lower
numbers of connections.
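The gist of the buffering, sketched with made-up names (the real batcher has a different API): the input channel is given a capacity so producers are rarely blocked while a batch is being assembled or flushed.

    package batcher

    import "time"

    type Batcher struct {
        size int
        in   chan string   // buffered: producers rarely block
        out  chan []string // completed batches
    }

    // New creates a batcher that emits batches of up to size items;
    // bufferLen is the capacity of the input channel, which is the new knob.
    func New(size, bufferLen int) *Batcher {
        return &Batcher{
            size: size,
            in:   make(chan string, bufferLen),
            out:  make(chan []string),
        }
    }

    func (b *Batcher) In() chan<- string    { return b.in }
    func (b *Batcher) Out() <-chan []string { return b.out }

    // Start assembles batches by size or by flush interval, whichever
    // comes first.
    func (b *Batcher) Start(flushInterval time.Duration) {
        go func() {
            ticker := time.NewTicker(flushInterval)
            defer ticker.Stop()

            var batch []string
            for {
                select {
                case item := <-b.in:
                    batch = append(batch, item)
                    if len(batch) >= b.size {
                        b.out <- batch
                        batch = nil
                    }
                case <-ticker.C:
                    if len(batch) > 0 {
                        b.out <- batch
                        batch = nil
                    }
                }
            }
        }()
    }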
If a timestamp larger than the max epoch value was sent via
Graphite, it would overflow when it was marshaled/unmarshaled back
from the raft log. The overflow caused the shard group to be created
with the wrong timestamp, which caused a panic when writing the
point. The panic occurred because the timestamp that was supposed to
exist in the map created by MapShards did not actually exist, so a
nil ShardGroup was used.
The change prevents creating a point with an invalid timestamp. Since
Graphite uses timestamps in seconds, the maximum valid value is known
and out-of-range values can be rejected. A check for the minimum
value is added as well.
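A sketch of the check, assuming timestamps arrive as whole seconds; the constant and function names here are illustrative:

    package graphite

    import (
        "fmt"
        "math"
        "time"
    )

    // The largest and smallest second values that still fit in an int64
    // once converted to the nanosecond resolution used internally.
    const (
        maxEpochSeconds = math.MaxInt64 / int64(time.Second)
        minEpochSeconds = math.MinInt64 / int64(time.Second)
    )

    // parseTimestamp rejects values that would overflow when converted to
    // nanoseconds, instead of creating a point with a bogus time.
    func parseTimestamp(secs int64) (time.Time, error) {
        if secs > maxEpochSeconds || secs < minEpochSeconds {
            return time.Time{}, fmt.Errorf("timestamp out of range: %d", secs)
        }
        return time.Unix(secs, 0), nil
    }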
Fixes #3785
This change adds support for diagnostics by decomposing the existing
interface into two interfaces -- one for stats, and the other for
diags. It also adds some basic monitoring of the system, network, and
the Go runtime.
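An illustrative decomposition (the actual interface and method names in the monitor code may differ):

    package monitor

    // StatsClient is implemented by components that expose runtime
    // statistics.
    type StatsClient interface {
        Statistics() (map[string]interface{}, error)
    }

    // DiagsClient is implemented by components that expose diagnostic
    // details such as system, network, or Go runtime information.
    type DiagsClient interface {
        Diagnostics() (map[string]interface{}, error)
    }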