Commit Graph

8301 Commits (66827524081d1e97558d0384d84789a337c9cc87)

Author SHA1 Message Date
Philip O'Toole 44d52ac138 Fully lock HH node queue creation
I believe this change addresses the issues with hinted-handoff not fully replicating all data to nodes that come back online after an outage. A detailed explanation follows.

During testing of hinted-handoff (HH) under various scenarios, HH stats showed that the HH Processor was occasionally encountering errors while unmarshalling hinted data. This error was not handled correctly, and in clusters with more than 3 nodes, this could cause the HH service to stall until the node was restarted. This was the high-level reason why HH data was not being replicated.

Furthermore, by watching the hinted-handoff data at the byte level, it could be seen that HH segment block lengths were getting randomly set to 0, but the block data itself was fine (block data contains hinted writes). This was the root cause of the unmarshalling errors outlined above. This, in turn, was tracked down to the HH system opening each segment file multiple times concurrently, which was not file-level thread-safe, so these multiple open calls were corrupting the file.

Finally, a segment file was being opened multiple times in parallel because WriteShard on the HH Processor was checking for node queues in an unsafe manner. Since WriteShard can be called concurrently, this was adding queues for the same node more than once, and each queue addition results in opening segment files.

This change fixes the locking in WriteShard such that the check for an existing HH queue for a given node is performed in a synchronized manner.
2015-10-07 02:33:43 -07:00
Philip O'Toole fb83158f38 Merge pull request #4349 from influxdb/continue_on_unmarshal
If HH can't unmarshal a block, skip that block
2015-10-07 02:03:08 -07:00
Philip O'Toole 5b0767c30b EOF is OK in HH processor 2015-10-07 01:56:55 -07:00
Philip O'Toole 8b49c37120 Count HH errors 2015-10-06 20:49:40 -07:00
Philip O'Toole 5d5515a497 If HH can't unmarshal a block, skip that block 2015-10-06 20:49:40 -07:00
Philip O'Toole 6f80d690dd Update CHANGELOG for PR4342
[ci skip]
2015-10-06 20:26:37 -07:00
Philip O'Toole 11675df981 Merge pull request #4342 from kostya-sh/4325_ast-bug
Fix aggregates validation in presence of arithmetic expressions
2015-10-06 20:25:11 -07:00
Cameron Sparr 883d32cfd0 Add public function to graphite parser to apply template 2015-10-06 17:42:36 -06:00
Michael Desa 897a5effff Merge pull request #4329 from influxdb/md-stress-timestamps
Add support for evenly spaced timestamps
2015-10-06 16:26:08 -07:00
Nathaniel Cook 6f1a44bd07 Merge pull request #4345 from influxdb/remove_iterator
tsdb.Iterator is no longer used. Removing
2015-10-06 17:00:50 -06:00
Paul Dix f041939a1c Merge pull request #4308 from influxdb/pd-storage-engine
The TSM storage engine
2015-10-06 15:54:56 -07:00
Paul Dix b11308133a Only limit field count for non-tsm engines 2015-10-06 15:49:37 -07:00
Paul Dix 40ff4f4a86 Change default to bz1 2015-10-06 15:30:34 -07:00
Philip O'Toole cd191a645e Merge pull request #4347 from influxdb/fix_connection_log
Log a more accurate connection message
2015-10-06 13:47:56 -07:00
Philip O'Toole faad42c1da Log a more accurate connection message
Not all connections are for writes, some are for mapping shards.
2015-10-06 13:39:51 -07:00
Jason Wilder 41e3294d4a Fix panic: assignment to entry in nil map
Closing the store did not properly return an error for in-flight
writes because the closing channel was set to nil when closed.  A
nil channel is not selectable so writes continue on past the guard
checks and trigger panics.
2015-10-06 14:03:52 -06:00
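
The channel behaviour this commit message relies on can be shown in isolation. The sketch below is not the store's code; `isClosing` is a hypothetical helper that mimics a guard check. A receive from a closed channel is always ready, while a receive from a nil channel never is, so a guard written as a `select` with a `default` branch stops firing once the channel is set to nil.

```go
package main

import "fmt"

// isClosing is a hypothetical guard: it reports whether the closing
// channel currently signals shutdown, without blocking.
func isClosing(closing chan struct{}) bool {
	select {
	case <-closing:
		return true
	default:
		return false
	}
}

func main() {
	closed := make(chan struct{})
	close(closed)
	// A closed channel is always ready to receive, so the guard fires.
	fmt.Println("closed channel:", isClosing(closed))

	// A nil channel is never ready, so the guard falls through to default
	// and an in-flight write would continue past the check.
	var nilled chan struct{}
	fmt.Println("nil channel:", isClosing(nilled))
}
```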
Paul Dix be477b2aab Fix cursor bug on index 2015-10-06 12:26:45 -07:00
Nathaniel Cook d380ee37a2 tsdb.Iterator is no longer used. Removing 2015-10-06 10:34:07 -06:00
Konstantin Shaposhnikov 95a0e149b0 Fix aggregates validation in presence of arithmetic expressions
Fixes #4325
2015-10-06 21:24:50 +08:00
Samer Kanjo 26cab4f327 Complete lint of project root. 2015-10-05 23:10:25 -05:00
Philip O'Toole 23fb7e29fc Merge pull request #4335 from influxdb/drop_server_panic
Don't panic when DROPing non-existent nodes
2015-10-05 20:51:04 -07:00
dgnorton a42fb7874b Merge pull request #4336 from influxdb/dgn-fix-4276
fix #4276: shouldn't drop all series when regex doesn't match
2015-10-05 20:23:20 -04:00
Paul Dix 27d0db33c1 Merge branch 'pd-storage-engine' of github.com:influxdb/influxdb into pd-storage-engine 2015-10-05 20:12:36 -04:00
Paul Dix 267f34b94e Updates based on PR feedback 2015-10-05 20:09:56 -04:00
Paul Dix 26a93ec23e Fix deletes not kept if shutdown before flush on tsm1 2015-10-05 20:09:56 -04:00
Paul Dix bb398daf75 Updates based on @otoolp's PR comments 2015-10-05 20:09:56 -04:00
Jason Wilder c6f2f9cec2 Avoid duplicating values slice when encoding 2015-10-05 20:09:56 -04:00
Jason Wilder cb28dabf62 Make DecodeBlock panic if block size is too small
Should never get a block size under 9 bytes since Encode always returns the min
timestamp and a 1 byte header.  If we get this, the engine is confused.
2015-10-05 20:09:56 -04:00
Jason Wilder b0449702e5 Fix comment typos 2015-10-05 20:09:56 -04:00
Paris Holley 36898f9451 do not include empty tags in hash 2015-10-05 20:09:56 -04:00
Paul Dix d9f94bdeeb Add db crash recovery 2015-10-05 20:09:56 -04:00
Jason Wilder 1d754db00b Propagate all encoding errors to engine
Avoid panicking in lower level code and allow the engine to decide what
it should do.
2015-10-05 20:09:56 -04:00
Jason Wilder 4c54c78009 Move compression encoding constants to encoders
Will make it less error-prone to add new encodings in the future
since each encoder has its own set of constants.  There are some placeholder
constants for uncompressed encodings which are not in all encoders currently.
2015-10-05 20:09:56 -04:00
Jason Wilder b1a57e1628 Fix go vet errors 2015-10-05 20:09:56 -04:00
Jason Wilder 5d9b89d601 Disable copier test
Not implemented for tsm1 engine
2015-10-05 20:09:56 -04:00
Jason Wilder ab791ba913 Fix TestStoreOpenShardCreateDelete
Shard path can be a directory.
2015-10-05 20:09:56 -04:00
Paul Dix d47ddb5454 Cleanup after pd1 -> tsm1 name change. 2015-10-05 20:09:55 -04:00
Paul Dix 594253cbba Rename storage engine to tsm1, for Time Structured Merge Tree! 2015-10-05 20:09:55 -04:00
Paul Dix 0a11a2fdbc Add deletes to new storage engine 2015-10-05 20:09:55 -04:00
Paul Dix 4beca1a245 Implement reverse cursor direction on pd1 2015-10-05 20:09:55 -04:00
Jason Wilder dbf6228817 Fix go vet 2015-10-05 20:09:55 -04:00
Jason Wilder d9499f0598 Remove zig zag encoding from timestamp encoder
Not needed since all timestamps will be sorted in ascending order.  Negatives
are not possible.
2015-10-05 20:09:55 -04:00
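
For context on why zig zag encoding becomes unnecessary here, the sketch below contrasts it with plain delta encoding of sorted timestamps. It is an illustration only and does not reproduce the tsm1 timestamp encoder; `zigZagEncode`, `zigZagDecode`, and `deltas` are hypothetical helpers. Zig zag maps signed integers to unsigned ones so that small negative values stay small. When the input is sorted in ascending order and non-negative, every delta is already non-negative and can be stored as an unsigned value directly.

```go
package main

import "fmt"

// zigZagEncode maps a signed integer to an unsigned one so that values
// of small magnitude, positive or negative, get small encodings.
func zigZagEncode(v int64) uint64 { return uint64(v<<1) ^ uint64(v>>63) }

// zigZagDecode reverses zigZagEncode.
func zigZagDecode(u uint64) int64 { return int64(u>>1) ^ -int64(u&1) }

// deltas returns the differences between consecutive timestamps.
// It assumes ts is sorted ascending and non-negative, so no delta
// is negative and the conversion to uint64 loses nothing.
func deltas(ts []int64) []uint64 {
	out := make([]uint64, len(ts))
	var prev int64
	for i, t := range ts {
		out[i] = uint64(t - prev)
		prev = t
	}
	return out
}

func main() {
	ts := []int64{1000, 1010, 1025, 1025, 1100}
	fmt.Println("deltas:", deltas(ts))
	fmt.Println("zig zag of -3 and 3:", zigZagEncode(-3), zigZagEncode(3))
}
```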
Paul Dix a2b139e006 Fix compaction and multi-write bugs.
* Fix bug with locking when the interval completely covers or is totally inside another one.
* Fix bug with full compactions running when the index is actively being written to.
2015-10-05 20:09:55 -04:00
Jason Wilder 2366baaf0b Handle partial reads when loading WAL
If reading into fixed sized buffer using io.ReadFull, the func can
return io.ErrUnexpectedEOF if the read was short.  This was slipping
through the error handling causing the shard to fail to load.
2015-10-05 20:09:55 -04:00
Paul Dix 3332236527 Fix bugs with writing old data and compaction. 2015-10-05 20:09:55 -04:00
Jason Wilder 5d938d0a8b Add test with duplicate timestamps
Should not happen but makes sure that the same values are encoded
and decoded correctly.
2015-10-05 20:09:55 -04:00
Jason Wilder c47d14540d Add compressed string encoding
Uses snappy to compress multiple strings into a block
2015-10-05 20:09:55 -04:00
Paul Dix 861a15b3e6 Fix panic when data file has small index 2015-10-05 20:09:55 -04:00
Paul Dix be011b8da9 Add logging to pd1 2015-10-05 20:09:54 -04:00
Paul Dix c1213ba367 Update WAL to deduplicate values on Cursor query.
Added test and have failing section for single value encoding.
2015-10-05 20:09:54 -04:00