Commit Graph

1400 Commits (5263070632eeed600119ad96f865d2562e1133c4)

Author SHA1 Message Date
Jason Wilder 25206c729c Add compactor type 2015-11-24 08:50:07 -07:00
Philip O'Toole f8b4950ea9 Enhance tsm1dev logging 2015-11-23 14:24:39 -08:00
Jason Wilder f70323cb89 Add MergeIterator
MergeIterator will be used to merge multiple TSM KeyIterators and the
WAL KeyIterator using a stream based iteration approach.  Each iteration
cycle returns a key and values ordered in way to write a new TSM file
optimally.
2015-11-23 14:59:15 -07:00
Jason Wilder 5334271e26 Add KeyIterator for WAL segments
This provides and interface and type to combine multiple WAL segments
in order and then allow the values to be read in an order suitable for
writing to a TSM file.
2015-11-23 14:59:15 -07:00
Jason Wilder d2b045f89b Code cleanup 2015-11-23 14:03:50 -07:00
Jason Wilder 697cfe604b Add stubbed out dev tsm engine
Starting to integrate some of the components into a engine that is
usable for development purposes.  This allows the code to evolve while
keeping the existing TSM engine in tact for reference.

Currently, just the WAL is wired up so writes can be tested.  Other engine
functions will panic the server if called.
2015-11-23 13:55:34 -07:00
Jason Wilder 7461b61bf2 Fix race in WAL and WALSegmentWriter
WAL currentSegmentWriter was not accessed under a mutex.  The WALSegmentWriter
also did not use a mutex to protect the underlying writer.
2015-11-23 13:55:34 -07:00
Jason Wilder aa00ef953a Fix typo in func names 2015-11-23 13:55:34 -07:00
Jason Wilder e2b1a09ece Implemment WAL write/delete functions 2015-11-23 13:55:33 -07:00
Jason Wilder afc0d5bfb9 Add WALSegmentReader/Writer
Basic types for reading and writing WAL segment files.
2015-11-23 13:55:33 -07:00
Jason Wilder 151b33d000 Rename wal.go to log.go
This is the existing WAL + cache implementation.  Moving it to a separate file
so that it can remain intact while a refactoring to a independent WAL can occur.

The WAL was also named Log in the code so this names file more closely to the concept
in the code.
2015-11-23 13:53:30 -07:00
Philip O'Toole 19f53d8a75 Add some simple benchmarks 2015-11-20 21:09:44 -08:00
Philip O'Toole 5b573b9248 Move to simpler cache
This cache simply evicts as much as possible whenever a checkpoint is
set.
2015-11-20 21:09:24 -08:00
Jason Wilder 0d1508a7c6 Add comments for search 2015-11-17 23:24:10 -07:00
Jason Wilder a7d7c280ed Add block type to index
This will faciliate loading a block into a type specific result without
first loading the block.  This will also allow us to populate the database
index solely from the index.
2015-11-17 23:24:09 -07:00
Jason Wilder e5022a898d Support decoding into type specific slices
There is a lot of allocations performed when decoding blocks.  These
types can be re-used to reduce allocations in many cases.  This change
allows a type specific slice to be passed in to decode funcs to be re-used
if it is large enough.

The existing decode is is left for backwards compatibility but is not
very efficient right now.  It may be removed.
2015-11-17 23:24:09 -07:00
Jason Wilder 5a12c49475 Make type specific decoders exported 2015-11-17 23:24:09 -07:00
Jason Wilder d517bad6f2 Add BlockType func
Allows the block type to be determined without decoding all the values.
2015-11-17 23:24:09 -07:00
Philip O'Toole 6aede8f562 Clone should sort values
This code may actually change soon due to internal design changes, but
this will ensure testing output is constant.
2015-11-17 11:59:50 -08:00
Philip O'Toole 76b02c9143 Merge pull request #4812 from influxdb/checkpointed_wal_tsm_cache
Checkpointed WAL tsm1 cache
2015-11-17 11:27:00 -08:00
Jason Wilder bd0a89bb00 Merge pull request #4808 from influxdb/jw-tsm
Add FileStore for TSM files
2015-11-17 12:11:26 -07:00
Jason Wilder 7c7a68d783 Small cleanups 2015-11-17 11:30:29 -07:00
CrazyJvm 6e60e3226a check point without fields when NewPoint 2015-11-17 13:21:52 +08:00
Philip O'Toole d8ea132c53 Add WAL cache 2015-11-16 19:52:49 -08:00
Jason Wilder 9c2be12b65 Add FileStore.Remove func
Allows a TSMFile to be removed from the active set of files managed
by the FileStore.
2015-11-16 09:16:10 -07:00
Jason Wilder c2530e93d7 Add mutexes around seeker usage
These are not goroutine safe.
2015-11-16 09:05:27 -07:00
Jason Wilder ef18f8afb2 Handle TSM key deletions
This writes a tombstone file containing a line per deleted key. This
file is read when a TSMReader is created and any keys listed in the file
are removed from the index.
2015-11-16 08:44:52 -07:00
Jason Wilder ed7cfb6df3 Add Keys function to TSMIndex
Useful for testing
2015-11-16 08:44:52 -07:00
Jason Wilder d8c0c26934 Return error if number of blocks would overflow 2015-11-16 08:44:52 -07:00
Jason Wilder 16c5e0a2e0 Add Close to TSMWriter interface 2015-11-16 08:44:52 -07:00
Jason Wilder b279534f2a Remove type specific casts in encoders
This prevented the encoders from using other implementations of the Value
interface because it would always cast one of the types to our specific
implementations.
2015-11-16 08:44:52 -07:00
Jason Wilder 0ab423c7ff Initial FileStore implementation
Provides functionality to load a directory of TSM files (or add them manually)
as well as reading blocks of values for individual key and times.
2015-11-16 08:44:52 -07:00
Jason Wilder e4312d854c Fix typo 2015-11-12 09:53:38 -07:00
Jason Wilder 8e14877e98 Add diagrams/documentation about indirect indexing strategy 2015-11-11 17:01:20 -07:00
Jason Wilder 7df1c2dd31 Add an indirectIndex implementation
Implements the indirect indexing approach to allow indexes on MMAPed files
that may be larger than available RAM
2015-11-11 15:16:11 -07:00
Jason Wilder 06016882ab Add revised TSM file writer/readers
This adds some basic file reader/writers for creating the updated TSM file format.  It uses a simple
in-memory index without MMAP for now, but will be extended to use and indirect indexing approach as well as MMAPed file access as described in the design doc.

This code is not integrated into the TSM engine yet
2015-11-11 12:52:34 -07:00
Jason Wilder 9312921ae2 Add non-mmap file indexing option 2015-11-09 16:04:00 -07:00
Jason Wilder 44077851ca Design review updates
* Add file/block design option ideas
* Update cache eviction policy
2015-11-09 15:56:26 -07:00
Jason Wilder 9239e3132f Add design doc 2015-11-09 15:08:47 -07:00
Philip O'Toole de7919240f Migrate internal stats to consistent names
Go style -- and existing runtime stats -- do not use underscores, but
instead use camel case. This change makes the internal stats adhere to
that convention.
2015-10-28 21:07:45 -07:00
Jason Wilder 239f43b529 Remove commented out test skip
[ci skip]
2015-10-28 16:14:54 -06:00
Jason Wilder 1cd30501fd Ensure calling Delete on a deleted file does not panic 2015-10-28 16:11:42 -06:00
Jason Wilder 0db2192f10 Replace use of reflect.DeepEquals with dataFilesEquals
Since the reflect.DeepEquals seems to reach into the dataFile without
acquiring a lock, it's not safe to use it.
2015-10-28 16:02:13 -06:00
Jason Wilder 9d8018297d Prevent writing to a deleted file
When a dataFile is deleted, the f file pointer is set to nil.  Since deleting
a file happens asynchronously, code that had a reference when it was valid may
run when it's gone.
2015-10-28 15:56:46 -06:00
Jason Wilder a1d8af2441 Fix data race in tsm dataFile
dataFile was not protected by a mutex which causes a data race and live
code and tests.  filesAndLock used reflect.DeepEqual on a copy of dataFile
slices.  reflect.DeepEqual appears to access unexported dataFile fields
which can't be protected.  This was changed to use a equals func that will
require a mutex to be acquired.

The other issue was that many of the dataFile funcs access the mmap without
acquiring a lock.  When a dataFile is deleted (possibly during rewriting),
reads from the mmap could return invalid data because references to the dataFile
are still in use by other goroutines.

Fixes #4534
2015-10-28 15:21:36 -06:00
Jason Wilder 4d24b05ac1 Log WAL points that fail to parse
Mainly for debugging as since this should not happen going forward.  Since
there may be points with NaN already stored in the WAL, this is helpful for
troubleshooting panics.
2015-10-27 17:12:56 -06:00
Jason Wilder 7f4a3f516b Return error if NaN is encoded in a block 2015-10-27 17:12:56 -06:00
Jason Wilder 0926b19e6b Prevent creating points with NaN float values
Float values are not supported in the existing engine and the tsm1
engines.  This changes NewPoint to return an error if a field value
contains a NaN field.  It also allows us to validate fields to prevent
other unsupported types from sneaking in through other input plugins.
2015-10-27 17:12:52 -06:00
Jason Wilder 56d85d44ad Use RemoveAll instead of Remove
When a database is dropped, removing old segments returns an error
because the files are already gone.  Using RemoveAll handles this
case more gracefully.
2015-10-26 13:16:32 -06:00
Jason Wilder 8af066b8ee Add more context to errors when flushes fail 2015-10-26 13:08:06 -06:00
Jason Wilder 4d277e7772 Ensure WAL is closed when closing engine
If a database is dropped, the WAL maintenance goroutines could still
kick in an fail becase the DB dirs are gone.
2015-10-26 10:38:52 -06:00
Jason Wilder 6240972121 Log errors returned from failed compactions 2015-10-26 10:37:09 -06:00
Jason Wilder 562ccb6492 Add missing cacheLock/writeLock.Unlock calls
If an error occurred in this code path, the locks would not be released.
2015-10-26 10:17:47 -06:00
Jason Wilder 4afb98ba8b Return error instead of panicing if we can't create a new WAL segment
If a drop database is executed while writes are in flight, a panic
could occur because the WAL would fail to write to the DB dirs where
had been removed.

Partil fix for #4538
2015-10-26 09:53:47 -06:00
Ben Johnson e9d303531e reuse tsm1 decode buffer
This commit changes `tsm1.DecodeBlock()` to reuse the same
slice of `[]tsm1.Value` instead of reallocating a new one each time.
2015-10-23 12:51:55 -06:00
Jason Wilder 7beb4f7fec Remove Stat+Remove before Rename calls
The Stat+Remove calls are unnecessary because Rename will replace
the destination file if it exist or not.  There is no need to remove
the destination file before calling Rename.
2015-10-23 12:18:49 -06:00
Jason Wilder 61cbb5b6ff Use RemoveAll instead of Remove
Several places use os.Remove and check for os.ErrNotExist.  os.Remove
does not return os.ErrNotExit, it returns a *PathError so these remove
calls will panic if the file does not exist.

Instead use os.RemoveAll that will not return an error if the file does
not exist.

Fixes #4545
2015-10-23 12:06:20 -06:00
Jason Wilder 827c51384c Merge pull request #4543 from influxdb/jw-wal-tests
Fix WAL Write/Close concurrency issues
2015-10-22 10:23:46 -06:00
Jason Wilder 8dec255b15 Move closing check for flush before cacheLock
Lock does not need to be acquired.
2015-10-22 10:12:07 -06:00
Paul Dix 529985964f Ensure compactions don't create files much larger than 2GB.
* refactor compaction
* rework compaction cleanup logic to work with multiple resulting files
* ensure the uint64 number for a series key doesn't use 0 or MaxInt64 for sentinel values
2015-10-22 08:34:02 -04:00
Jason Wilder f66de17e8a Move closing channel init before first usage
Fix a test that closes and re-opens the same WAL.
2015-10-21 23:47:29 -06:00
Jason Wilder 6db5429d06 Fix deadlock when closing WAL and flush is running
Close acquired the cacheLock and writeLock in a different order than flush.  If addToCache was also
running in a goroutine (acquiring cacheLock), a deadlock could happen.
2015-10-21 23:46:59 -06:00
Jason Wilder 4f31b8fab9 Fix panic: error opening new segment file for wal
panic: error opening new segment file for wal: open /var/folders/lj/vlbynqp52pxdxxlxx64j6bk80000gn/T/tsm1-test709000715/_00002.wal: no such file or directory

goroutine 8 [running]:
github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).writeToLog(0xc820098500, 0x1, 0xc8201584b0, 0x1c, 0x45, 0x0, 0x0)
	/Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:427 +0xc19
2015-10-21 23:46:09 -06:00
Jason Wilder 4ac67259f7 Fix data race in filesAndLock/Compact
WARNING: DATA RACE
Write by goroutine 10:
  github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Compact()
      /Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:716 +0x1a3e
  github.com/influxdb/influxdb/tsdb/engine/tsm1_test.TestEngine_WriteCompaction_Concurrent.func2()
      /Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1_test.go:422 +0xc8

Previous read by goroutine 9:
  github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).filesAndLock()
      /Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:476 +0xe8
  github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).Write()
      /Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:370 +0x216
  github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).flush()
      /Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:604 +0xef4
  github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Log).WritePoints()
      /Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/wal.go:243 +0x794
  github.com/influxdb/influxdb/tsdb/engine/tsm1.(*Engine).WritePoints()
      /Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1.go:350 +0xb0
  github.com/influxdb/influxdb/tsdb/engine/tsm1_test.TestEngine_WriteCompaction_Concurrent.func1()
      /Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1_test.go:401 +0x432

Goroutine 10 (running) created at:
  github.com/influxdb/influxdb/tsdb/engine/tsm1_test.TestEngine_WriteCompaction_Concurrent()
      /Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1_test.go:426 +0x225
  testing.tRunner()
      /private/var/folders/q8/bf_4b1ts2zj0l7b0p1dv36lr0000gp/T/workdir/go/src/testing/testing.go:456 +0xdc

Goroutine 9 (running) created at:
  github.com/influxdb/influxdb/tsdb/engine/tsm1_test.TestEngine_WriteCompaction_Concurrent()
      /Users/jason/go/src/github.com/influxdb/influxdb/tsdb/engine/tsm1/tsm1_test.go:406 +0x182
  testing.tRunner()
      /private/var/folders/q8/bf_4b1ts2zj0l7b0p1dv36lr0000gp/T/workdir/go/src/testing/testing.go:456 +0xdc
2015-10-21 16:38:35 -06:00
Jason Wilder cf8a7c6d9a Fix panic: assignment to entry in nil map
WAL.addToCache could if WritePoints call is running when the WAL is
closed.
2015-10-21 16:33:36 -06:00
Jason Wilder 4c4ffda30a Add WAL write/flush concurrently test 2015-10-21 16:33:26 -06:00
Jason Wilder 965827c8e8 Add test for writing/querying series with mixed field numbers 2015-10-21 15:21:28 -06:00
Jason Wilder 1922a4010f Add test for writin/querying point earlier than first stored 2015-10-21 15:21:28 -06:00
Jason Wilder 11bcb0f95a Add test for data races in tsm1 Write
Running this test with -race triggers a number of data races.
2015-10-21 12:26:39 -06:00
Jason Wilder e2ada3c59b Log errors when flushes fail
There errors were getting silently dropped which might be hiding
problems.
2015-10-21 09:12:34 -06:00
Jason Wilder ba73b1fac6 Fix panic: runtime error: index out of range - Values.MinTime
When rewriting a tsm file, a panice on the Values slice could happen
if there were no values in the slice and the conditions of the rewrite
causes DecodeAndCombine to be called with the empty slice.  This could
happen is the sizes of the points new values was equal to
the MaxPointsInBlock config options and there were no future blocks after
the current one being written.

When this happens, DecodeAndCombine returns a zero length remaining values
slice which is passed back into DecodeAndCombine one last time.  In this case,
we now just return the original block since there is nothing new to combine.

Fixes #4444 #4365
2015-10-19 13:31:43 -06:00
Philip O'Toole 100ca62f5c Fix build after go metalinter cleanup 2015-10-16 07:02:39 -07:00
Paul Dix f4d8271006 Merge pull request #4402 from influxdb/pd-tsm-file-size-limit
TSM1 fixes
2015-10-16 07:46:08 -04:00
Paul Dix ba41d6fd92 remove failing test 2015-10-16 07:25:33 -04:00
Ben Johnson c27f8ae3a4 tsm1 meta lint 2015-10-15 15:03:10 -06:00
Jason Wilder ae925625ce Merge pull request #4451 from influxdb/jw-int64
Int64 encoding enhancements
2015-10-15 13:44:55 -06:00
Jason Wilder 3db28cfe99 Fix typos 2015-10-15 13:36:42 -06:00
Jason Wilder c037ae1416 Add run-length encoding to int64 encoder
This encoding can really help counter values with large runs of the same
value (e.g. a counter that is 0 for long periods of time).
2015-10-14 19:23:58 -06:00
Jason Wilder 014c51f32a Delta-encode int64 before bit packing
This will help large integer counters type fields that increment by
small amounts over time.  Instead of storing the larger raw value
in a compressed format, we store the difference from the prior value
in compressed format which allows the value to be stored using
fewer bits.
2015-10-14 19:23:51 -06:00
Ben Johnson f2d23b070b add tsm1 wal quickcheck
This commit adds quickcheck testing for the tsm1 WAL.
2015-10-14 09:38:38 -06:00
Paul Dix a99c9ec5af Ensure rewrite index doesn't write corrupt file.
Fixes #4401
2015-10-10 16:58:25 -04:00
Jason Wilder 629219951a Fix timestamp encoding not using run-length encoding when possible
influx_inpsect uncovered some scenarios where timestamps could be stored using
run-length encoding but were being stored using simple8 which uses more space.
2015-10-09 22:38:17 -06:00
Jason Wilder 758359accc Prevent panic in DecodeSameTypeBlock
If DecodeSameTypeBlock is called on on an empty Values slice, it would
panic with an index out of bounds error.  This func can actually be removed
because DecodeBlock can determine what type of values are encoded already.

This will still panic if the block cannot be decoded due to other reasons.

Fixes #4365
2015-10-09 12:52:23 -06:00
Ben Johnson 2b3bb5336d add tsm1 quickcheck tests 2015-10-08 11:59:57 -06:00
Jason Wilder 79185fc1dc Fix index out of bounds panic in int64Decoder
Code was missing a check for when we did not have anymore bytes to decode
so it panic when we tried to decode the empty slice.
2015-10-08 11:21:19 -06:00
Jason Wilder b3343a6d2a Fix similar float values encoding overflow
If similar float values were encoded, the number of leading bits would
overflow the 5 available bits to store them (e.g. store 33 in 5 bits).  When
decoding, the values after the overflowed value would spike to very large and
small values.

To prevent the overflow, we clamp the value to 31 which is the maximum
number of leading zero bits we can encoded.

Fixes #4357
2015-10-07 15:05:56 -06:00
Paul Dix b11308133a Only limit field count for non-tsm engines 2015-10-06 15:49:37 -07:00
Paul Dix 40ff4f4a86 Change default to bz1 2015-10-06 15:30:34 -07:00
Paul Dix be477b2aab Fix cursor bug on index 2015-10-06 12:26:45 -07:00
Paul Dix 267f34b94e Updates based on PR feedback 2015-10-05 20:09:56 -04:00
Paul Dix 26a93ec23e Fix deletes not kept if shutdown before flush on tsm1 2015-10-05 20:09:56 -04:00
Paul Dix bb398daf75 Updates based on @otoolp's PR comments 2015-10-05 20:09:56 -04:00
Jason Wilder c6f2f9cec2 Avoid duplicating values slice when encoding 2015-10-05 20:09:56 -04:00
Jason Wilder cb28dabf62 Make DecodeBlock panic if block size is too small
Should never get a block size 9 bytes since Encode always returns the min
timestampe and a 1 byte header.  If we get this, the engine is confused.
2015-10-05 20:09:56 -04:00
Jason Wilder b0449702e5 Fix comment typos 2015-10-05 20:09:56 -04:00
Paul Dix d9f94bdeeb Add db crash recovery 2015-10-05 20:09:56 -04:00
Jason Wilder 1d754db00b Propogate all encoding errors to engine
Avoid panicing in lower level code and allow the engine to decide what
it should do.
2015-10-05 20:09:56 -04:00
Jason Wilder 4c54c78009 Move compression encoding constants to encoders
Will make it less error-prone to add new encodings int the future
since each encoder has it's set of constants.  There are some placeholder
contants for uncompressed encodings which are not in all encoder currently.
2015-10-05 20:09:56 -04:00
Jason Wilder b1a57e1628 Fix go vet errors 2015-10-05 20:09:56 -04:00
Paul Dix d47ddb5454 Cleanup after pd1 -> tsm1 name change. 2015-10-05 20:09:55 -04:00
Paul Dix 594253cbba Rename storage engine to tsm1, for Time Structured Merge Tree! 2015-10-05 20:09:55 -04:00
Paul Dix 0a11a2fdbc Add deletes to new storage engine 2015-10-05 20:09:55 -04:00
Paul Dix 4beca1a245 Implement reverse cursor direction on pd1 2015-10-05 20:09:55 -04:00
Jason Wilder dbf6228817 Fix go vet 2015-10-05 20:09:55 -04:00
Jason Wilder d9499f0598 Remove zig zag encoding from timestamp encoder
Not needed since all timestamps will be sorted in ascending order.  Negatives
are not possible.
2015-10-05 20:09:55 -04:00
Paul Dix a2b139e006 Fix compaction and multi-write bugs.
* Fix bug with locking when the interval completely covers or is totally inside another one.
* Fix bug with full compactions running when the index is actively being written to.
2015-10-05 20:09:55 -04:00
Jason Wilder 2366baaf0b Handle partial reads when loading WAL
If reading into fixed sized buffer using io.ReadFull, the func can
return io.ErrUnexpectedEOF if the read was short.  This was slipping
through the error handling causing the shard to fail to load.
2015-10-05 20:09:55 -04:00
Paul Dix 3332236527 Fix bugs with writing old data and compaction. 2015-10-05 20:09:55 -04:00
Jason Wilder 5d938d0a8b Add test with duplicate timestamps
Should not happen but makes sure that the same values are encoded
and decoded correctly.
2015-10-05 20:09:55 -04:00
Jason Wilder c47d14540d Add compressed string encoding
Uses snappy to compress multiple strings into a block
2015-10-05 20:09:55 -04:00
Paul Dix 861a15b3e6 Fix panic when data file has small index 2015-10-05 20:09:55 -04:00
Paul Dix be011b8da9 Add logging to pd1 2015-10-05 20:09:54 -04:00
Paul Dix c1213ba367 Update WAL to deduplicate values on Cursor query.
Added test and have failing section for single value encoding.
2015-10-05 20:09:54 -04:00
Jason Wilder 9f9692acdf Rename float encoding tests 2015-10-05 20:09:54 -04:00
Jason Wilder a4d92162ef Add documentation about compression 2015-10-05 20:09:54 -04:00
Jason Wilder 2da52ec4fe Fix deadlock in pd1_test.go
The defer tx.Rollback() tries to free the queryLock but the defer e.Cleanup() runs
before it and tries to take a write lock on the query lock (which blocks) and prevents
tx.Rollback() from acquring the read lock.
2015-10-05 20:09:54 -04:00
Jason Wilder 7e0df18e1a Update simple8b api usage 2015-10-05 20:09:54 -04:00
Jason Wilder cb23f5ac53 Add a compressed boolean encoding
Packs booleans into bytes using 1 bit per value.
2015-10-05 20:09:54 -04:00
Jason Wilder 1196587dc4 Keep track of the type of the block encoded
Allowes decode to decode an arbitrary block correctly.
2015-10-05 20:09:54 -04:00
Jason Wilder 731ae27123 Remove unnecessary allocations from int64 decoder
The decoder was creating a large slice and decoding all values when
instead, it could decode one packed value as needed.
2015-10-05 20:09:54 -04:00
Jason Wilder 95046c1e37 Add test assertions for time encoding type 2015-10-05 20:09:54 -04:00
Jason Wilder e42d8660d0 Fix run length encoding check
Values were run length encoded even when they should not have been
2015-10-05 20:09:54 -04:00
Jason Wilder 092689c131 Reduce memory allocations
Converting between different encoders is wasting a lot of memory allocating different
typed slices.
2015-10-05 20:09:54 -04:00
Jason Wilder ce1d45ecda Use zigzag encoding for timestamp deltas
Previously were using a frame of reference approach where we would
transform the (possibly negative) deltas into positive values from
the minimum.  That required an extra pass over the values as well
as a large slice allocation so we could encode the originals in uncompressed
form if they were too large.

This switches the encoding to use zigzag encoding for the deltas which
removes the extra slice allocation as well as the extra loops.

Improves encoding performane by ~4x.
2015-10-05 20:09:54 -04:00
Jason Wilder 4a37ba868d Add int64 compression
This is using zig zag encoding to convert int64 to uint64s and then using simple8b
to compress them, falling back to uncompressed if the value exceeds 1 << 60.  A
patched encoding scheme would likely be better in general but this provides decent
compression for integers that are not at the ends of the int64 range.
2015-10-05 20:09:53 -04:00
Jason Wilder 42e1babe7f Add time and float compression
Time compression uses an adaptive approach using delta-encoding,
frame-of-reference, run length encoding as well as compressed integer
encoding.

Float compression uses an implementation of the Gorilla paper encoding
for timestamps based on XOR deltas and leading and trailing null suppression.
2015-10-05 20:09:53 -04:00
Jason Wilder 112a03f24c Fix go vet errors 2015-10-05 20:08:58 -04:00
Jason Wilder 88248f3f81 Ensure we have files when iterating in cursor
Prevents index out of bounds panic
2015-10-05 20:08:58 -04:00
Paul Dix db4ad33f3c Update tests to use transactions. Add test for single series 10k points. 2015-10-05 20:06:22 -04:00
Paul Dix 0b33a71bb7 Add recover to maintenance. Change snapshot writer to not use bolt on shard. 2015-10-05 20:06:22 -04:00
Paul Dix 0fd116d1f2 Ensure data files can't be deleted while query is running.
Also ensure that queries don't try to use files that have been deleted.
2015-10-05 20:06:22 -04:00
Paul Dix b1bdb4f15a Make compaction run at most at set duration. 2015-10-05 20:06:22 -04:00
Paul Dix 1c8eac1523 Add PerformMaintenance to store for flushes and compactions.
Also fixed shard to work again with b1 and bz1 engines.
2015-10-05 20:06:22 -04:00
Paul Dix d694454f47 Fix wal flushing, compacting, and write lock 2015-10-05 20:06:22 -04:00
Paul Dix 6c94e738a0 Add support for multiple fields 2015-10-05 20:06:22 -04:00
Paul Dix 667b3e6c08 Handle hash collisions on keys 2015-10-05 20:06:22 -04:00
Paul Dix 48069e782c Add compaction and time range based write locks. 2015-10-05 20:06:22 -04:00
Paul Dix 2eb2a647d6 Add multicursor to combine wal and index 2015-10-05 20:06:22 -04:00
Paul Dix 7baba84a21 Ensure we don't have duplicate values. Fix panic in compaction. 2015-10-05 20:06:22 -04:00
Paul Dix 0770ccc87d Make writes to historical areas possible 2015-10-05 20:06:21 -04:00
Paul Dix 982c28b947 Update to work with new cursor definitiono and Point in models 2015-10-05 20:06:21 -04:00
Paul Dix 365a631b53 Update wal to only open new segment file on flush if its not an idle flush 2015-10-05 20:06:21 -04:00
Paul Dix 7c8ab4f1d8 Add test for close and restart of engine and fix errors. 2015-10-05 20:06:21 -04:00
Paul Dix c5f6c57d7f Update engine to put index at the end of data files 2015-10-05 20:06:21 -04:00
Paul Dix fe1f9a51e5 Add memory settings and WAL backpressure 2015-10-05 20:06:21 -04:00
Paul Dix 5e59cb9393 Update encoding test to work with new interface. 2015-10-05 20:06:21 -04:00
Paul Dix 2100e66437 Add full durability to WAL and flush on startup 2015-10-05 20:06:21 -04:00
Paul Dix 82e1be7527 WIP: more WAL work 2015-10-05 20:06:21 -04:00
Paul Dix 2ba032b7a8 WIP: finish basics of PD1. IT WORKS! (kind of) 2015-10-05 20:06:21 -04:00
Paul Dix 7555ccbd70 WIP: engine work 2015-10-05 20:06:21 -04:00
Paul Dix 12ea1cb26f Add comment about encoding float 2015-10-05 20:06:21 -04:00
Paul Dix fb2a1cb2f3 WIP: skeleton for encoding for new engine 2015-10-05 20:06:20 -04:00
Ben Johnson 96715d7d90 rename Cursor.Seek() to Cursor.SeekTo() 2015-09-22 13:23:16 -06:00
Ben Johnson b213ddad78 refactor cursor 2015-09-22 13:10:12 -06:00
Ben Johnson a5269e9cc7 rename direction to ascending. 2015-09-22 13:09:26 -06:00
Philip O'Toole f9bfb2fcc5 Merge pull request #4142 from influxdb/nil_partition
If partition is nil return on Close immediately
2015-09-17 16:37:37 -07:00
Philip O'Toole 5e991f1703 If partition is nil return on Close immediately 2015-09-16 19:38:02 -07:00
Cory LaNou d19a510ad2 refactor Points and Rows to dedicated packages 2015-09-16 15:33:08 -05:00
Philip O'Toole 40b1068c81 Use unified statMap for WAL
Don't declare distinct stat map for partitions. It's more useful to see
the stats collated together per-WAL. This may need further change in the
future.
2015-09-10 14:23:40 -07:00
Philip O'Toole 13a302e533 WAL tag keys are "path" not "bind". 2015-09-10 14:10:45 -07:00
Philip O'Toole bf55f61edd Add stats for the WAL 2015-09-10 12:30:47 -07:00
Philip O'Toole 5086ea42fa Update WAL comments
[ci skip]
2015-09-10 11:29:43 -07:00
Philip O'Toole 101a4d2a55 Merge pull request #4066 from influxdb/pd-fail-writes-on-memory-pressure
Update WAL to fail writes if pressure too high.
2015-09-10 11:27:32 -07:00
Ben Johnson 733fa0a109 disable bz1 recompression
This commit only appends new blocks of points and disables checks for
recompressing small blocks at the end of a series.
2015-09-10 11:26:29 -06:00
Paul Dix 2d67a9ea22 Update WAL to fail writes if pressure too high.
If the memory gets 5x above the partition size threshold, the WAL will start returning write failures to the clients. This will allow them to backoff their write volume.

Also updated the stress script to track failed requests and output messages on failure and when it returns to success.
2015-09-09 22:41:32 -07:00
Paul Dix 482e00d3e3 Merge pull request #4011 from influxdb/pd-simplify-wal
Simplify WAL to not compact
2015-09-08 22:32:53 -07:00
Paul Dix ecbc79e7e3 Fix disksize to work with new WAL 2015-09-08 19:37:33 -07:00
Paul Dix dfd6b11dda Fix memory compaction logic.
* Only fire a go routine to flush and compact if it isn't already running
* Have a sleep backoff time that scales up as the percentage of memory used goes up
2015-09-08 19:28:29 -07:00
Paul Dix a1fb77198b Simplify WAL to not compact since it doesn't really help the engine anyway 2015-09-08 19:28:29 -07:00
Philip O'Toole 76903f7440 Instrument bz1 engine 2015-09-08 19:09:39 -07:00
Jason Wilder 6b4926257a Add inspect tool
Start of a lower-level file inspection tool.  This currently dumps
summary statistics for the shards, index and WAL that can be used to
understand the shape of the data is in the local shards.  This util
operates on the shards itself and not through the server and is intended
more for debugging/troubleshooting.
2015-09-04 10:38:59 -06:00
Jason Wilder df70a1c8ce Update tests to use Direction enum 2015-09-04 09:00:11 -06:00
Jason Wilder e767feb8d9 Fix order by desc with aggregate function not return any values 2015-09-03 22:31:58 -06:00
Jason Wilder 7fa3d445f7 Support reverse iteration for b1 engine 2015-09-03 22:31:58 -06:00
Jason Wilder 2725757dba Simplify WAL cursor seek movement logic 2015-09-03 22:31:58 -06:00
Jason Wilder 5a6b0afc4b Replace cursor direction with a type 2015-09-03 22:31:48 -06:00
Jason Wilder 7c67e60c4f Add bz1 reverse cursor test 2015-09-03 22:28:36 -06:00
Jason Wilder 5e481181bc Add WAL reverse cursor test 2015-09-03 22:28:36 -06:00
Jason Wilder 266bdc1c2b Support sort by time DESC in wal and bz1 engines 2015-09-03 22:28:36 -06:00
Cory LaNou 6592dcc699 EnableLogging -> LoggingEnabled 2015-09-03 16:56:07 -05:00
Ben Johnson deff06f850 add copier service
This commit adds the copier service which allows one server to
copy shards from another server. This will be used for moving
shards in the cluster.
2015-09-03 13:07:35 -06:00
Ben Johnson b63ebb72a5 limit bz1 quickcheck tests to 10 iterations on CI
This commit checks the `CI` environment variable in the bz1
test suite and limits the quickcheck runs if the value is `true`.
2015-09-02 11:27:11 -06:00
Ben Johnson d52fe89035 add WAL lock to prevent timing lock contention
This commit adds a lock to the WAL log to prevent timing how long
it takes to obtain the Bolt write lock.
2015-09-01 11:08:39 -06:00
Paul Dix 040fa060df Add more detailed logging for compactions 2015-09-01 09:52:20 -04:00
Jason Wilder af2531b373 Use read lock to check current memory size of partition
A write lock was being taken to read the memory size to determine if writes
should be paused.  What happens is that writers get blocked indefintely when
trying to acquire a write lock which makes writes pause (or stop) for long periods
of time.
2015-08-28 15:11:30 -06:00
Jason Wilder 6ba17eca36 Reduce lock contention on Log.WritePoints
The log was deferring the release of the read lock on the WAL.  This had
the affect that a read-lock was held until after the partition finished writing
(which maintains it's own locks).  The read lock is only needed around the call
to pointsToPartions so it can get a consistent copy of the points to write.  After
that calls returns, a lock is not needed so free it immediatedly.
2015-08-28 15:11:30 -06:00
Jason Wilder f5f8f04116 Fix panic in addToCache
addToCache is called in a goroutine and can panic if the server is closed while opening.  If
part of the open func errors, it returns an error and immediately calls close.  close sets
p.cache to nil which causes the goroutine trying to initialized the cache to panic as well.  The
goroutine should run under a write lock to avoid this race/panic.
2015-08-28 13:01:17 -06:00
Jason Wilder eb4a8d4f4a Fix panic when logging error in WAL
If LoadMetadataIndex() tries to log an error, it causes a panic because the
logger is not set until Open() is called, which is after LoadMetaDataIndex() returns.
Instead, just set the logger up when the WAL is created.
2015-08-28 12:59:38 -06:00
Ben Johnson 3ce001929c Use 4KB default block size for bz1
This commit changes the default block size from 64KB to 4KB for
bz1. This was lowered because small blocks were being uncompressed,
merged, recompressed, and inserted for a large portion of updates.
This became slower and slower over time until it reached the 64KB
threshold. We moved to the 4KB threshold in order to lower the
impact of this recompression.
2015-08-26 11:05:01 -06:00
dgnorton 2cf6233cbc Merge pull request #3808 from influxdb/dmq-show-measurements2
convert SHOW MEASUREMENTS to a distributed query
2015-08-26 11:43:38 -04:00
Paul Dix d903cc351e Merge pull request #3845 from influxdb/pd-fix-wal-meta-panic
Fix metafile so it doesn't get trampled by other goroutines.
2015-08-25 18:35:03 -04:00
Paul Dix 0d744dafed Fix metafile so it doesn't get trampled by other goroutines.
Fixes #3832 and fixes #3833
2015-08-25 18:23:24 -04:00
Daniel Morsing 71a83b7f9d Remove unused buffer allocation
The buffer allocation in bz1 was unused and I'm fairly certain that it
was harmful to performance if used. For queries that run through a bz1
block, needing to hold on to a 64kb block is expensive. Better to churn
on the allocator and have the blocks be released when they are unused
than to have 64kb hanging around for each series regardless of size.

Thanks to @jwilder for brainstorming this issue with me.
2015-08-25 14:51:17 -06:00
Paul Dix a4735624f8 Merge pull request #3829 from influxdb/pd-fix-missing-data-after-flush
Fix missing data in aggregates with bz1
2015-08-25 16:27:03 -04:00
Paul Dix 8c6af91e93 Fix bug with bz1 where some data would get hidden.
Seeking to the middle of a compressed block wasn't working properly. Fixes #3781
2015-08-25 16:16:59 -04:00
Daniel Morsing 40dab87ac9 Merge pull request #3817 from influxdb/walmem
Walmem
2015-08-25 13:29:42 -06:00
David Norton 6f0ba18904 fix TestDropMeasurementStatement 2015-08-25 10:01:38 -04:00
Daniel Morsing 5455851ac7 move allocation outside struct + gofmt 2015-08-24 15:28:30 -06:00
Daniel Morsing 35b6c7867d reuse memory buffers for marshaling wal entries
By using preallocated buffers for marshaling WAL entries, we can
reduce the amount of memory we allocate.

On a run of `influx_stress -series 10000 -points 1000` this cuts
total allocations from 18684.15MB to 15200.73MB
2015-08-24 14:49:25 -06:00
Daniel Morsing b7bbe8b5e0 remove unused backoffcount field 2015-08-24 10:25:38 -06:00
Paul Dix 981d7175fb Improve WAL flush log output. 2015-08-23 11:28:06 -04:00
Paul Dix 15cf803b57 Ensure WAL cache gets sorted when needed.
Fixes #3792
2015-08-21 17:48:42 -04:00
Daniel Morsing 27162dd904 only convert key to string once. 2015-08-21 11:01:34 -07:00
Paul Dix 73f3dc1e14 Update store to properly manage WAL create/delete.
* Update the store to remove the WAL directories associated with a shard or database when they are deleted.
* Fix the Store so that it creates separate WAL directories for databases and retention policies.
2015-08-21 11:22:04 -04:00
Paul Dix 2882ef88dc Merge pull request #3766 from influxdb/pd-close-wal-before-bolt
Make bz1 close the WAL before closing bolt so it can flush
2015-08-20 15:25:51 -04:00
Paul Dix 51c565e461 Ensure partition only closes current segment if its there 2015-08-20 14:37:02 -04:00
Ben Johnson 9e336bacf9 fix wal close deadlock 2015-08-20 11:56:50 -06:00
Paul Dix 9567b2c8a6 Fix logic with closing partitions 2015-08-20 13:53:59 -04:00
Ben Johnson 8f12cef883 Merge pull request #3735 from benbjohnson/append-threshold
Append to small bz1 blocks
2015-08-20 11:47:34 -06:00
Paul Dix 4e7631a135 Merge pull request #3765 from influxdb/pd-fix-wal-io-reads
Fix reads of metadata file in WAL
2015-08-20 13:08:29 -04:00
Ben Johnson e57d60210a Append to small bz1 blocks
This commit changes the bz1 append to check for a small
ending block first. If the block is below the threshold
for block size then it is rewritten with the new data
points instead of having a new block written.
2015-08-20 10:52:52 -06:00
Paul Dix e817036952 Make bz1 close the WAL before closing bolt so it can flush, fix locking on write. 2015-08-20 12:51:47 -04:00
Ben Johnson 6c4297ece5 Add bz1 size benchmarks
This commit add benchmarks to show the size difference between
different block sizes.
2015-08-20 10:22:29 -06:00
Paul Dix 72da8d9741 Merge pull request #3750 from influxdb/pd-fix-wal-logging
Fix WAL logging enable.
2015-08-20 12:05:01 -04:00
Paul Dix 370f008220 Fix reads of metadata file in WAL 2015-08-20 10:52:29 -04:00
Paul Dix 1f21d50005 Fix logging in segments and style on log messages 2015-08-20 10:43:25 -04:00
Paul Dix 13d606eaf6 Fix bug querying data from WAL while compacting.
If a flush is happening and you bring up a cursor for a series, if that series didn't have any data in the cache (after the flush started) then it would return no data. What it should have done instead is return the data that is in the flush cache, which is held in separate area of memory until it is committed to the index.
2015-08-20 09:34:02 -04:00
Paul Dix 564625eef7 Fix WAL logging enable. 2015-08-19 18:45:12 -04:00
Paul Dix 4c1f7110f8 Make the WAL cursor create a copy of the cache 2015-08-19 17:25:44 -04:00
Paul Dix c31b88de60 Merge pull request #3569 from influxdb/pd-wal
Add initial WAL implementation and tests
2015-08-18 20:45:32 -04:00
Paul Dix 028d0a6d7d Fix compaction logging, make default idle flush interval 5 minutes. 2015-08-18 20:41:03 -04:00
Ben Johnson 0f2d66fb70 add WAL recovery 2015-08-18 15:08:01 -06:00
Paul Dix 9df3b7d828 Add WAL configuration options 2015-08-18 16:59:54 -04:00
Paul Dix 30bcd3e0e4 Combine all WAL partition cache maps into one 2015-08-18 10:18:06 -04:00
Paul Dix a3cdf0b97c Ensure that metadata is always loaded out of the index in sorted order 2015-08-18 08:27:09 -04:00
Paul Dix 41cf76f722 Fix vet 2015-08-18 08:15:02 -04:00
Paul Dix a509df0484 Compress metadata, add Delete to WAL.
* All metadata for each shard is now stored in a single key with compressed value
* Creation of new metadata no longer requires a syncrhnous write to Bolt. It is passed to the WAL and written to Bolt periodically outside the write path
* Added DeleteSeries to WAL and updated bz1 to remove series there when DeleteSeries or DropMeasurement are called
2015-08-18 08:10:51 -04:00
Paul Dix 3348dab4e0 Fix bug with new shards not getting series data persisted. 2015-08-16 15:45:09 -04:00
Paul Dix 9a53406e55 remove extraneous debug stuff 2015-08-16 12:46:50 -04:00
Paul Dix 6776014047 Fix bug in stress script, remove extraneous printlns 2015-08-16 12:46:50 -04:00
Paul Dix a77a91933e WIP: fix bug with how bz writes index. fix bug with wal not having index set. 2015-08-16 12:46:50 -04:00
Paul Dix b583b896ce Integrate WAL and BZ1 and make BZ1 the default engine. 2015-08-16 12:46:50 -04:00
Paul Dix 301b014f3f Make WAL flush after inactive for writes for a given interval. 2015-08-16 12:46:50 -04:00
Paul Dix d4b04510ab Make flush check configurable to avoid race in tests 2015-08-16 12:46:49 -04:00
Paul Dix 006403ce1d Add WAL back pressure when over memory threshold 2015-08-16 12:46:49 -04:00
Paul Dix 1bffb70a61 Refactoring and cleanup based on PR comments 2015-08-16 12:46:49 -04:00
Paul Dix eebdd5b7db Add initial WAL implementation and tests 2015-08-16 12:46:49 -04:00
Daniel Morsing 432fa31060 protect engine points cache from concurrent modifications.
Creating a cursor would access the engine cache concurrently with
writes, causing data races. Fix by adding a mutex around cache
accesses.
2015-08-14 14:02:03 -07:00
Ben Johnson 10c1ae782a fix duplicate points in b1/cursor
This commit fixes the b1 cursor so that reads from either the cache
or bolt buffer will check against the previously read key to ensure
that two of the same keys are not returned.

Fixes #3571.
2015-08-11 13:43:44 -06:00
Ben Johnson 25293052b6 add b1 test harness 2015-08-10 12:46:57 -06:00
Ben Johnson 394e9635cf fix bz1 quickcheck bugs
This commit fixes issues found from using a more complex `testing/quick`
implementation of the `WriteIndex()` test. The newer test inserts
multiple sets of random data that's confined to a smaller random space
so there's more chance of overlapping data.

The fixes were primarily around inserting old data or inserting the same
timestamp multiple times for a single write. The block splitting was not
working correctly before and the sorting and deduping was not handled
correctly.
2015-08-06 15:12:48 -06:00
Ben Johnson f7111e037b add bz1 testing/quick coverage 2015-08-04 18:36:14 -06:00
Ben Johnson 4077148245 refactor bz1 to integrate with WAL 2015-08-03 14:32:17 -06:00
Ben Johnson 6be31e7f15 2015-08-03 14:32:17 -06:00
Ben Johnson de09c02874 add benchmarks 2015-08-03 14:32:17 -06:00
Ben Johnson 1ada790de7 add bz1 storage engine 2015-08-03 14:32:17 -06:00
Ben Johnson a9cbf6c857 Rename v1 engine to b1
This commit changes the 'v1' engine to 'b1' to represent "bolt v1".
2015-07-29 08:55:07 -06:00
Jason Wilder 37c971bb82 Fix querying measurements with spaces
Fixes #3319
2015-07-22 14:49:54 -06:00
Ben Johnson a7f50ae03c refactor storage to engine 2015-07-22 11:08:10 -06:00
Ben Johnson de1f9a3736 refactor tsdb tests into test package 2015-07-22 11:07:06 -06:00