24 Commits (10db0aafeb8b21459a537d455afe48f8b19e22c4)

Author SHA1 Message Date
Jason Wilder d99c5e26f6 Fix memory spike when compacting overwritten points
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally
similar to the fix done in #6556.
2016-05-05 22:31:30 -06:00
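The following is a minimal Go sketch of the idea behind merging "more incrementally." It assumes that a series' blocks, sorted by start time, only need to be decoded together when their time ranges overlap, and that non-overlapping blocks can pass through still encoded. The `block` type and the `overlapGroups` function are hypothetical illustrations, not the tsm1 compactor's actual code.

```go
package main

import "fmt"

// block is a hypothetical encoded TSM block, described only by its time range.
type block struct {
	minTime, maxTime int64
}

// overlapGroups partitions blocks (sorted by minTime) into runs whose time
// ranges overlap. A singleton run can be copied to the output without being
// decoded; only multi-block runs must be decoded and merged, so memory use is
// bounded by one overlapping run instead of the whole series.
func overlapGroups(blocks []block) [][]block {
	var groups [][]block
	for i := 0; i < len(blocks); {
		j := i + 1
		end := blocks[i].maxTime
		// Extend the run while the next block starts before the run ends.
		for j < len(blocks) && blocks[j].minTime <= end {
			if blocks[j].maxTime > end {
				end = blocks[j].maxTime
			}
			j++
		}
		groups = append(groups, blocks[i:j])
		i = j
	}
	return groups
}

func main() {
	blocks := []block{{0, 9}, {10, 19}, {15, 25}, {30, 39}}
	for _, g := range overlapGroups(blocks) {
		fmt.Println(len(g), g)
	}
}
```

Here only the run `{10 19} {15 25}` would need decoding. The other two blocks can be written out unchanged.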
Jason Wilder a0ac754802 Fix loading huge series into RAM when points are overwritten
In some query scenarios, if there are a lot of points on disk spread
across many blocks in TSM files and a point is overwritten near the
beginning of the shard's time range, the full series could be loaded
into RAM triggering OOMs and huge allocations.

The issue was that the KeyCursor code that handles overwriting points
had a simple implementation that just deduped the whole series in this
case.  This falls over when the series is quite large.

Instead, the KeyCursor has been changed to only decode blocks with
updated points.  It then keeps track of what section of the blocks
have been read so they are not re-read when the later points are
decoded.

Since the points in a block are always sorted, the code was also changed
to remove the Deduplicate calls since they end up
reallocating the slice.  Instead, we do a sorted merge and re-use
the slice as much as we can.
2016-05-05 09:34:44 -06:00
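A sorted merge that resolves overwritten points while reusing a destination slice might look like the Go sketch below. It assumes that both inputs are already sorted by timestamp with no duplicates inside either one, and that on a timestamp tie the newer slice wins. The `value` type and `mergeSorted` are hypothetical names, not the KeyCursor's real code.

```go
package main

import "fmt"

// value is a hypothetical stand-in for a decoded TSM point.
type value struct {
	unixNano int64
	v        float64
}

// mergeSorted merges a (older points) and b (newer points), both sorted by
// timestamp with no duplicates inside either slice. On a timestamp tie the
// point from b wins, which resolves overwritten points without a separate
// dedupe pass. dst is truncated and reused, so nothing is allocated when its
// capacity already fits the result. dst must not share memory with a or b.
func mergeSorted(dst, a, b []value) []value {
	dst = dst[:0]
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].unixNano < b[j].unixNano:
			dst = append(dst, a[i])
			i++
		case a[i].unixNano > b[j].unixNano:
			dst = append(dst, b[j])
			j++
		default: // same timestamp: the newer point overwrites the older one
			dst = append(dst, b[j])
			i++
			j++
		}
	}
	// At most one of these tails is non-empty.
	dst = append(dst, a[i:]...)
	return append(dst, b[j:]...)
}

func main() {
	older := []value{{1, 1}, {2, 2}, {4, 4}}
	newer := []value{{2, 20}, {3, 30}}
	buf := make([]value, 0, 8) // reused across merges
	buf = mergeSorted(buf, older, newer)
	fmt.Println(buf) // [{1 1} {2 20} {3 30} {4 4}]
}
```

Because both inputs are already ordered, the merge is a single linear pass. It never needs the sort-then-dedupe step that a general `Deduplicate` would perform.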
Jason Wilder 97504a552c Support time range tombstones in FileStore/KeyCursor 2016-04-27 13:09:52 -06:00
Jason Wilder a789e819a3 Remove NewTSMReaderWithOptions
There are two TSMIndex implementations, the directIndex and the
indirectIndex.  Originally, we only had the directIndex and later
added the indirectIndex and NewTSMReaderWithOptions in order to
allow both indexes to be used in tests and code.  This has created
a problem since we really only use the directIndex for writing and
always use the indirectIndex for reading.

This change removes the NewTSMReaderWithOptions func so that it is
no longer possible to create a TSMReader with a directIndex.  This
will allow a lot of the block reading code used by the directIndex
to be removed and simplify maintenance.  It also gives better test
coverage of the code that is actually used by the TSM engine now.
2016-04-27 13:09:52 -06:00
Ben Johnson 286072f65a update dep: simple8b @ b421ab40 2016-04-22 09:46:05 -06:00
Ben Johnson 525e22c92b tsm1 query engine alloc reduction
This commit makes a number of performance improvements to
reduce allocations during query execution. Several objects
and buffers are now reused across the components to avoid
allocations.

Previously a simple `count(value)` query across 1M points
would require 26,000+ allocations. After the changes in
this commit that number has been reduced to 88.
2016-04-11 14:50:59 -06:00
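The buffer-reuse pattern described in this commit can be illustrated with the small Go sketch below. It assumes a cursor that owns a decode buffer, truncates it, and refills it for every block, so it does not allocate a fresh slice each time. `floatCursor`, `decode`, and `count` are hypothetical names. The "decoder" simply copies values and does not implement real TSM decoding.

```go
package main

import "fmt"

// floatCursor is a hypothetical cursor that decodes blocks into a buffer it
// owns. The buffer is truncated and refilled on every call, so once it has
// grown to the largest block size, decoding stops allocating.
type floatCursor struct {
	buf []float64
}

// decode stands in for a real block decoder: the "encoded" block is already
// a []float64 here, and decoding just copies it into the reused buffer.
func (c *floatCursor) decode(block []float64) []float64 {
	c.buf = append(c.buf[:0], block...)
	return c.buf
}

// count mimics a simple count(value) aggregate over many blocks.
func count(c *floatCursor, blocks [][]float64) int {
	n := 0
	for _, b := range blocks {
		n += len(c.decode(b))
	}
	return n
}

func main() {
	blocks := [][]float64{{1, 2, 3}, {4, 5}, {6, 7, 8, 9}}
	c := &floatCursor{}
	fmt.Println(count(c, blocks)) // 9
}
```

The returned slice is only valid until the next `decode` call. That aliasing is the usual trade-off when buffers are shared across components to save allocations.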
Jason Wilder 9984cd5d6d Fix skipping blocks at query time when overlaps exist
Depending on how data is written across TSM files, it was possible
to skip over some blocks at query time, making it look like data was missing.
2016-03-14 13:11:11 -06:00
Jason Wilder 8d70d65a82 Convert time.Time to int64 2016-02-25 15:15:01 -07:00
Ben Johnson 5a0d1ab7c1 rename influxdb/influxdb to influxdata/influxdb
This commit changes all the import and URL references from:

    github.com/influxdb/influxdb

to:

    github.com/influxdata/influxdb
2016-02-10 10:26:18 -07:00
Ben Johnson 00806de9b8 refactor query engine 2016-02-10 09:40:25 -07:00
Jason Wilder 756421ec4a Look for fully compacted block in addition to max size during compaction
Some data shapes would cause files to grow larger than the max size more
quickly which resulted in them getting skipped by the full compaction planner
at times.  Some datasets that could make this happen are very large keys or
very large numbers of keys (10M).  When this happened, multiple max sized
files would accumulate but the blocks would not be full.  When the shard went
cold for writes, these files would get recompacted down to the optimal size, but
a lot of space would be wasted in the meantime.
2016-01-07 15:18:42 -07:00
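One plausible reading of this planner change is sketched in Go below. Under this reading, a file is only left alone when it is both at the maximum size and made of full blocks, so a large file with sparse blocks remains a compaction candidate. `fileStat`, `fullyCompacted`, `maxFileSize`, and `maxPointsPerBlock` are hypothetical names, and the constant values are assumptions rather than tsm1's actual settings.

```go
package main

import "fmt"

// Hypothetical limits; the real names and values in tsm1 may differ.
const (
	maxFileSize       = 2 << 30 // bytes (2 GiB)
	maxPointsPerBlock = 1000
)

// fileStat is a hypothetical summary of one TSM file as seen by the planner.
type fileStat struct {
	size           int64
	pointsPerBlock int // points in the file's first block, used as a fullness proxy
}

// fullyCompacted reports whether the planner can skip a file. Size alone is
// not enough: a file can reach the maximum size while its blocks are still
// only partly filled, so the blocks must be full as well.
func fullyCompacted(f fileStat) bool {
	return f.size >= maxFileSize && f.pointsPerBlock >= maxPointsPerBlock
}

func main() {
	files := []fileStat{
		{size: maxFileSize, pointsPerBlock: maxPointsPerBlock}, // large and dense: skip
		{size: maxFileSize, pointsPerBlock: 10},                // large but sparse: recompact
		{size: 1 << 20, pointsPerBlock: maxPointsPerBlock},     // small: recompact
	}
	for _, f := range files {
		fmt.Println(fullyCompacted(f))
	}
}
```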
Jason Wilder a38c95ec85 Update compactions to run concurrently
This has a few changes in it (unfortunately).  The main change is to run compactions
concurrently.  While implementing this, a few query and performance bugs showed up that
are also fixed by this commit.
2015-12-23 18:01:11 -07:00
Jason Wilder 9d82e24ca0 Fix performance of dropping large number of keys 2015-12-08 10:47:06 -07:00
Jason Wilder 87892d79da Dedupe points at query time if there are overlapping blocks 2015-12-07 21:10:10 -07:00
Paul Dix 1bee7d1512 Update TSM, remove old version, add config
* remove rolloverTSMFileSize constant that is no longer used
* remove the maxGenerationFileCount since it is no longer a limitation that's necessary with the new compaction scheme. We no longer read WAL segments as part of the compaction so memory is only used as we read in each individual key
* remove minFileCount and switch to a user configurable variable
* remove the mutex from WALSegmentWriter. There's never more than one open in the WAL at one time and it's not exported through any function so the lock on the WAL should be used. This simplified keeping track of the last write time and removed a bunch of unnecessary locks.
* update WALSegmentWriter.Write to take the compressed bytes so that encoding and compression can occur before the call to write (while we don't hold the WAL lock)
* remove a bunch of unnecessary locking in WAL.writeToLog
* Add check for TSM file magic number and version
* Remove old tsm, log, and unused cursor code
* Remove references to tsm1dev everywhere except in the inspector
* Clean up config options for compaction and snapshotting
* Remove old TSM configuration options
* Update the config.sample.toml with TSM options
* Update WAL compaction to force a compaction if it has been cold for writes for a configurable period of time (1h by default)
2015-12-06 18:50:39 -05:00
Jason Wilder 52bec1f7f6 Change TSM file naming to generation-sequence.tsm 2015-12-04 11:51:33 -07:00
Jason Wilder c7e37766e7 Avoid repetitive index searches when iterating over cursors
First pass at TSM cursor iteration ended up searching the file indexes
too frequently and hurt performance.  This changes that to search it once
and then have the cursor hold onto the block locations to seek
to.  This doubles query performance compared to the first iteration, but there is
still a lot of room for improvement.
2015-12-04 10:02:59 -07:00
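The "search once, then hold onto the block locations" approach can be sketched in Go as follows. It assumes an index that maps a key to its block locations, sorted by time and non-overlapping within one file. The cursor searches that index only when it is created, and afterward seeks with a binary search over its own copy. `blockLoc`, `keyCursor`, and its methods are hypothetical, not the FileStore API.

```go
package main

import (
	"fmt"
	"sort"
)

// blockLoc is a hypothetical index entry: a block's time range and file offset.
type blockLoc struct {
	minTime, maxTime int64
	offset           int64
}

// keyCursor holds the block locations for one key, looked up from the index
// once, so seeking and stepping never go back to the index.
type keyCursor struct {
	locs []blockLoc // sorted by time, non-overlapping
	pos  int
}

func newKeyCursor(index map[string][]blockLoc, key string) *keyCursor {
	return &keyCursor{locs: index[key]} // the only index search
}

// seek positions the cursor at the first block whose maxTime is >= t,
// i.e. the first block that may contain t or later points.
func (c *keyCursor) seek(t int64) (blockLoc, bool) {
	c.pos = sort.Search(len(c.locs), func(i int) bool { return c.locs[i].maxTime >= t })
	return c.current()
}

// next advances to the following block, if any.
func (c *keyCursor) next() (blockLoc, bool) {
	if c.pos < len(c.locs) {
		c.pos++
	}
	return c.current()
}

func (c *keyCursor) current() (blockLoc, bool) {
	if c.pos >= len(c.locs) {
		return blockLoc{}, false
	}
	return c.locs[c.pos], true
}

func main() {
	index := map[string][]blockLoc{
		"cpu": {{0, 9, 0}, {10, 19, 100}, {20, 29, 200}},
	}
	c := newKeyCursor(index, "cpu")
	for b, ok := c.seek(12); ok; b, ok = c.next() {
		fmt.Println(b.offset) // prints 100, then 200
	}
}
```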
Jason Wilder adf5c5b223 Replace Next/Prev with Scan 2015-12-03 12:39:13 -07:00
Jason Wilder be59ba3455 Add Prev support to FileStore
Allows reading the previous block of values given a timestamp and key.
2015-12-03 12:39:12 -07:00
Jason Wilder 6fba01df89 Implement single field TSM queries 2015-12-03 12:35:36 -07:00
Jason Wilder 4a03469662 Integrate TSM compaction into dev engine 2015-12-02 09:45:23 -07:00
Jason Wilder 9c2be12b65 Add FileStore.Remove func
Allows a TSMFile to be removed from the active set of files managed
by the FileStore.
2015-11-16 09:16:10 -07:00
Jason Wilder ef18f8afb2 Handle TSM key deletions
This writes a tombstone file containing a line per deleted key. This
file is read when a TSMReader is created and any keys listed in the file
are removed from the index.
2015-11-16 08:44:52 -07:00
Jason Wilder 0ab423c7ff Initial FileStore implementation
Provides functionality to load a directory of TSM files (or add them manually)
as well as reading blocks of values for individual keys and times.
2015-11-16 08:44:52 -07:00