Commit Graph

51 Commits (697f48b4e62e514e701ffec39978b864a3c666e6)

Author SHA1 Message Date
Jason Wilder ad52d0fbd9 Fix tests 2016-01-21 15:30:09 -05:00
Paul Dix f385945058 Update Server to work with new metaservice/client 2016-01-21 15:28:33 -05:00
Cory LaNou 9ec7a710c9 some misc refactoring on influxd startup 2016-01-21 15:28:32 -05:00
Cory LaNou 8d878fff91 buildable meta -> services/meta 2016-01-21 15:28:32 -05:00
Edd Robinson 8384ba3e84 Update comments. 2015-12-02 12:35:58 +00:00
Edd Robinson ffbd6037e2 Initial lint for all packages under services. Supports #4098 2015-11-22 19:23:56 +00:00
ch33hau 8bfdfbda0b Disable HintedHandoff if configuration is not set. #4283 2015-11-11 01:12:34 +08:00
Philip O'Toole de7919240f Migrate internal stats to consistent names
Go style -- and existing runtime stats -- do not use underscores, but
instead use camel case. This change makes the internal stats adhere to
that convention.
2015-10-28 21:07:45 -07:00
Jason Wilder 0926b19e6b Prevent creating points with NaN float values
Float values are not supported in the existing engine and the tsm1
engines.  This changes NewPoint to return an error if a field value
contains a NaN field.  It also allows us to validate fields to prevent
other unsupported types from sneaking in through other input plugins.
2015-10-27 17:12:52 -06:00
Philip O'Toole f38c53695d Add node's active state to diagnostic output 2015-10-26 18:59:58 -07:00
Philip O'Toole f703f58d22 Add HH diagnostics 2015-10-26 18:59:58 -07:00
Philip O'Toole 87299caad1 Add HH statistics 2015-10-26 18:59:58 -07:00
Philip O'Toole 9a73d26bfb Implement NodeProcessor
A NodeProcessor wraps an on-disk queue and the goroutine that attempts
to drain that queue and send the data to the associated target node.
2015-10-26 18:59:55 -07:00
Philip O'Toole 7d22fc75a3 Support configurable purge interval 2015-10-26 13:07:25 -07:00
Philip O'Toole 37cf9a1610 Deletion while iterating is OK in Go 2015-10-09 16:30:20 -07:00
Philip O'Toole f12470a99e If there are no HH segments, then nothing to purge 2015-10-09 14:29:21 -07:00
Philip O'Toole c06ac8f94c Don't add a new segment every purge check
Everytime the purge check was running, a new segment was being added.
This meant the list of almost-empty files in the HH directories would
grow continually.
2015-10-09 14:26:47 -07:00
Philip O'Toole b009f25e3d Delete queues for inactive nodes
Deletion only takes place if all data in the queue is older than the
configured time.
2015-10-08 20:34:24 -07:00
Philip O'Toole 5b0a8ed306 HH should not process dropped nodes 2015-10-08 18:23:12 -07:00
Philip O'Toole 44d52ac138 Fully lock HH node queue creation
I believe this change address the issues with hinted-handoff not fully replicating all data to nodes that come back online after an outage.. A detailed explanation follows.

During testing of of hinted-handoff (HH) under various scenarios, HH stats showed that the HH Processor was occasionally encountering errors while unmarshalling hinted data. This error was not handled completely correctly, and in clusters with more than 3 nodes, this could cause the HH service to stall until the node was restarted. This was the high-level reason why HH data was not being replicated.

Furthermore by watching, at the byte-level, the hinted-handoff data it could be seen that HH segment block lengths were getting randomly set to 0, but the block data itself was fine (Block data contains hinted writes). This was the root cause of the unmarshalling errors outlined above. This, in turn, was tracked down to the HH system opening each segment file multiple times concurrently, which was not file-level thread-safe, so these mutiple open calls were corrupting the file.

Finally, the reason a segment file was being opened multiple times in parallel was because WriteShard on the HH Processor was checking for node queues in an unsafe manner. Since WriteShard can be called concurrently this was adding queues for the same node more than once, and each queue-addition results in opening segment files.

This change fixes the locking in WriteShard such the check for an existing HH queue for a given node is performed in a synchronized manner.
2015-10-07 02:33:43 -07:00
Philip O'Toole 5b0767c30b EOF is OK in HH processor 2015-10-07 01:56:55 -07:00
Philip O'Toole 8b49c37120 Count HH errors 2015-10-06 20:49:40 -07:00
Philip O'Toole 5d5515a497 If HH can't unmarshal a block, skip that block 2015-10-06 20:49:40 -07:00
Philip O'Toole 8a1e5a9e53 Clamp initial value of HH retry interval
This could happen due to misconfiguration, so do something sensible in
that case.
2015-10-01 12:04:33 -07:00
Philip O'Toole 878f776403 Exponential backoff if any hinted-handoff fails 2015-09-30 21:27:13 -07:00
Philip O'Toole 4eba2c1725 Add config support for max HH retry interval 2015-09-30 21:10:03 -07:00
Philip O'Toole 235714755c HH processor-level stats
This change maintains stats on a per-shard and per-node basis.
2015-09-28 18:39:39 -07:00
Philip O'Toole 14db3ce9f5 Add service-level stats for hinted-handoff 2015-09-28 18:08:35 -07:00
Philip O'Toole a4a8fa0ff0 Fully disable hinted-handoff service if requested
Without this change if hinted-handoff was disabled the service would
correctly reject writes, but it would process any data sitting in
hinted-handoff queues. With this change the service is completely
disabled.
2015-09-25 18:03:43 -07:00
Cory LaNou d19a510ad2 refactor Points and Rows to dedicated packages 2015-09-16 15:33:08 -05:00
Jason Wilder 7cf31a74cd Prevent out of memory range slices from being created
If the hinted handoff segment is corrupt, the size read could be
invalid and attempting to create a slice using that size causes
a panic.  Ideally, we'd have a checksum on the seqment record but
for now just return an error when the size is larger than the
segment file.

Fixes #3687
2015-08-17 10:48:01 -06:00
Jason Wilder 668181d275 Make log statements more consistent
* Capitalize first letter of message
* Log all services staring consistently
* Remove some extraneous log statements in meta.Store
* Log data dirs for meta, data and hinted handoff
2015-08-13 10:01:42 -06:00
Jason Wilder 398ffabab7 Fix panic in hinted handoff processor
A short write has occurred and we do not have enough bytes to determine
the size of the payload.  This is corrupted record that we should drop.
Instead of panicing, log the error and advance the queue since the error
at this location is unreoverable currently.

Fixes #3436
2015-08-06 14:06:41 -06:00
Joseph Crail 5fccee3d16 Fix spelling errors in comments and strings. 2015-06-28 02:54:34 -04:00
Jason Wilder 67d4ef0e28 Don't queue write failures that due to type conflicts
These will never succeed and will stay in the queue indefinitely.
2015-06-10 14:52:59 -06:00
Cory LaNou 8a5cf394d8 add ability to silence logging for testing 2015-06-10 10:27:57 -05:00
Jason Wilder 5dab8de7e0 Log hinted handoff processing duration 2015-06-09 14:49:18 -06:00
Jason Wilder c9f9b37753 Add hinted handoff queue rate limiting
Basic throughput limiting to dynamically maintain a bytes/sec
limit if configured for hinted handoff retry queues.  If a batch is
larger than the limit, the limit will slow the processing down to one
write per second.

By default, the limit is unbounded.  It can be enabled to help prevent
retstarting nodes that have queued writes for them from being overloaded
at startup.
2015-06-09 14:46:13 -06:00
Jason Wilder 07d1aac50f Fix comment in hinted handoff processor 2015-06-08 16:51:48 -06:00
Jason Wilder ede254484d Process hinted handoff queues concurrently 2015-06-08 15:05:44 -06:00
Jason Wilder d86b15953c Fix queue advance not writing updated position
The advance function was not writing the updated position in the
queue until after the next advance call was called.  Resulted in
the very last message would get replayed on restart each time.
2015-06-08 12:09:31 -06:00
Jason Wilder e75208f15c Log remote write erros 2015-06-08 09:09:10 -06:00
Jason Wilder 2ccf97e6a0 Add wait group to hinted handoff service 2015-06-05 22:16:52 -06:00
Jason Wilder 4ec77d8b84 Fix comments 2015-06-05 22:16:52 -06:00
Jason Wilder e2f3ff26a5 Fix data race when close hinted handoff service 2015-06-05 22:16:51 -06:00
Jason Wilder f4cea559d3 Purge hinted handoff queue if entries stick around past a cutoff 2015-06-05 22:16:51 -06:00
Jason Wilder 9d67f9bf62 Add hinted handoff max age and throughput rate limit config options 2015-06-05 22:16:51 -06:00
Jason Wilder 809b9b8a83 Add basic hinted handoff support
If a remote write fails, it will be queued to a per-node, local disk
queue and retried later.
2015-06-05 22:16:51 -06:00
Jason Wilder 89c01cd37d Add hinted handoff queue
This adds a disk backed queue that will be to persist hinted
handoff writes.
2015-06-05 22:16:51 -06:00
Jason Wilder 75b72c60fe Add hinted handoff service
The hinted handoff service will queue a write to a remote node if
that write fails and periodically retry the write.
2015-06-05 22:16:51 -06:00