409 Commits (10db0aafeb8b21459a537d455afe48f8b19e22c4)

Author SHA1 Message Date
Cory LaNou 4d30ea1eb3 minor PR feedback refactor 2016-05-10 08:14:51 -05:00
Cory LaNou a3bf3e2ef1 added baseline backup/restore plumbing 2016-05-10 08:14:51 -05:00
Ben Johnson 078e561820 parallelize iterators 2016-05-09 10:25:30 -06:00
Jason Wilder d99c5e26f6 Fix memory spike when compacting overwritten points
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally
similar to the fix done in #6556.
2016-05-05 22:31:30 -06:00
Ben Johnson 4c45f8ec32 Merge pull request #6560 from benbjohnson/optimize-tsm1-call-iterator
Move call iterator to series level
2016-05-05 11:13:53 -06:00
Ben Johnson fdf34d4356 move call iterator to series level
This commit moves the `CallIterator` to wrap the individual series
instead of wrapping a shard. This allows individual points to be
aggregated before being merged.

This will cause a small increase in memory usage per series but
it shows a 20% decrease in query time when there are a moderate
number of points per series.
2016-05-05 09:59:03 -06:00
Jason Wilder a0ac754802 Fix loading huge series into RAM when points are overwritten
In some query scenarios, if there are a lot of points on disk spread
across many blocks in TSM files and a point is overwritten near the
beginning of the shard's timerange, the full series could be loaded
into RAM triggering OOMs and huge allocations.

The issue was that the KeyCursor code that handles overwriting points
had a simple implementation that just deduped the whole series in this
case.  This falls over when the series is quite large.

Instead, the KeyCursor has been changed to only decode blocks with
updated points.  It then keeps track of what section of the blocks
have been read so they are not re-read when the later points are
decoded.

Since the points in a block are always sorted, the code was also changed
to remove the Deduplicate calls since they end up
reallocating the slice.  Instead, we do a sorted merge and re-use
the slice as much as we can.
2016-05-05 09:34:44 -06:00
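The sorted merge mentioned in this commit can be illustrated with a minimal Go sketch. The `Value` type and `mergeSorted` function are hypothetical names for illustration, not the actual KeyCursor code, and the sketch allocates one output slice rather than re-using an existing one as the commit describes.

```go
package main

import "fmt"

// Value is a hypothetical stand-in for a decoded point.
type Value struct {
	Time int64
	V    float64
}

// mergeSorted merges two slices that are each sorted by Time.
// When both slices contain the same timestamp, the value from
// updated wins, which models an overwritten point.
func mergeSorted(existing, updated []Value) []Value {
	out := make([]Value, 0, len(existing)+len(updated))
	i, j := 0, 0
	for i < len(existing) && j < len(updated) {
		switch {
		case existing[i].Time < updated[j].Time:
			out = append(out, existing[i])
			i++
		case existing[i].Time > updated[j].Time:
			out = append(out, updated[j])
			j++
		default:
			// Same timestamp: keep only the newer value.
			out = append(out, updated[j])
			i++
			j++
		}
	}
	// At most one of these tails is non-empty.
	out = append(out, existing[i:]...)
	out = append(out, updated[j:]...)
	return out
}

func main() {
	existing := []Value{{1, 1.0}, {2, 2.0}, {4, 4.0}}
	updated := []Value{{2, 20.0}, {3, 3.0}}
	fmt.Println(mergeSorted(existing, updated))
}
```

Because both inputs are already sorted, a single pass is enough and no separate deduplication step is needed.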
Jason Wilder 57cb3fdbc0 Merge pull request #6522 from influxdata/tp-tsm-dump
Dump TSM files to line protocol
2016-05-03 10:44:33 -06:00
Jason Wilder 4196554f51 Fix overwriting points returning wrong value
The cursors were returning the wrong value in the case when points
existed in both the cache and tsm files with the same timestamp. The
cache value should have been returned, but the tsm value was returned
incorrectly.

Fixes #6439
2016-05-03 09:21:31 -06:00
Edd Robinson fd77dbe648 Merge pull request #6546 from influxdata/er-build-tag
Fix invalid build tag
2016-05-03 16:00:39 +01:00
Jonathan A. Sternberg a2a5c32770 Merge pull request #6539 from influxdata/js-6495-fix-aggregates-with-empty-shards
Fix aggregate returns when data is missing from some shards
2016-05-03 10:56:21 -04:00
Jonathan A. Sternberg d6d0addcec Fix aggregate returns when data is missing from some shards
If a shard is empty for a specific field and the field type is something
other than a float, a nil iterator would get returned from one of the
empty shards and cause the combined iterators to be cast to the float
type and all other iterator types to be discarded (or for integers, to
be cast).

This is rare since most aggregates don't accept strings or booleans, but
for queries like:

    SELECT distinct(string) FROM mydata

It would result in nothing getting returned if one of the shards didn't
have a value for `string`.

This change modifies the query engine to return nil for the shards
instead of a fake iterator and then to only use the fake iterator if the
final aggregate iterator is nil (meaning that no iterators could be
constructed for the field from any shard).

Fixes #6495.
2016-05-03 10:41:22 -04:00
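The fallback described in this commit might look roughly like the following Go sketch. The `Iterator` interface, `stringIterator`, `placeholderIterator`, and `combine` are hypothetical names for illustration, not the query engine's actual types.

```go
package main

import "fmt"

// Iterator is a hypothetical, heavily reduced iterator interface.
type Iterator interface {
	Type() string
}

// stringIterator stands in for a real iterator over string values.
type stringIterator struct{}

func (stringIterator) Type() string { return "string" }

// placeholderIterator stands in for the "fake" float iterator that is
// only used when no shard could produce a real iterator.
type placeholderIterator struct{}

func (placeholderIterator) Type() string { return "float" }

// combine takes one iterator per shard, where nil means the shard has
// no data for the field. Nil entries are skipped, so an empty shard can
// no longer force the combined result to the float type.
func combine(perShard []Iterator) Iterator {
	var found []Iterator
	for _, it := range perShard {
		if it != nil {
			found = append(found, it)
		}
	}
	if len(found) == 0 {
		// No shard had data: fall back to the placeholder.
		return placeholderIterator{}
	}
	// A real engine would merge all of found; the sketch returns the first.
	return found[0]
}

func main() {
	fmt.Println(combine([]Iterator{nil, stringIterator{}}).Type())
	fmt.Println(combine([]Iterator{nil, nil}).Type())
}
```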
Edd Robinson d35fa1ec97 Remove redundant windows build tags 2016-05-03 14:22:02 +01:00
Jason Wilder e0304ae3d5 Fix shards not getting assigned to series on restart
Also, simplifies the LoadMetaDataIndex func to not require a *Shard
2016-05-02 11:36:05 -06:00
Jason Wilder 2d09937fd2 Fix removing fully deleted index blocks
If multiple tombstone entries happen to exist for the same key in a
tombstone file, it was possible to panic.  The first application
would remove all index entries and the second time around the code
still assumed entries would exist and would index into the nil slice.

Also fixes a case where the range of time would fully delete all index
entries, but it did not align with math.MinInt64 and math.MaxInt64.  This
would cause the index locations to still exist in the offset slice.  This
is inefficient because the BlockIterator would still scan and decode the block
only to discover that all the values are deleted.  We now just remove it from
the offsets slice in this case since the whole range of values is deleted.
2016-05-02 11:36:05 -06:00
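A minimal Go sketch of the two cases in this commit, assuming a simplified index where each entry only records the time range of its block. `IndexEntry` and `deleteRange` are hypothetical names, not the TSM index API.

```go
package main

import "fmt"

// IndexEntry is a hypothetical index entry covering one block.
type IndexEntry struct {
	MinTime, MaxTime int64
}

// deleteRange drops every entry whose block lies entirely inside
// [min, max]. It is safe to call repeatedly for the same key: a second
// call on an already empty (or nil) slice returns without indexing it.
func deleteRange(entries []IndexEntry, min, max int64) []IndexEntry {
	if len(entries) == 0 {
		return entries
	}
	kept := entries[:0] // filter in place, re-using the backing array
	for _, e := range entries {
		if e.MinTime >= min && e.MaxTime <= max {
			continue // block is fully deleted, so drop its entry
		}
		kept = append(kept, e)
	}
	return kept
}

func main() {
	entries := []IndexEntry{{0, 10}, {11, 20}}
	entries = deleteRange(entries, 0, 20) // removes everything
	entries = deleteRange(entries, 0, 20) // duplicate tombstone, no panic
	fmt.Println(len(entries))
}
```

Note that the range check does not depend on `math.MinInt64` or `math.MaxInt64`; any range that covers a block fully removes its entry.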
Jason Wilder 58aa65d5a8 Optimize applyTombstones
When a large tombstone file existed on disk, this code was slow since
it would apply each tombstone to the index one at a time causing the
index to be scanned for each key.

Instead, we group all the tombstones together by timestamp and apply
in bulk so that the index is scanned once for each set of tombstones.

If we change to immutable tombstone files, it might be better to just
write a file where all the keys have the same tombstone so we can re-apply
them efficiently.
2016-05-02 11:36:05 -06:00
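The grouping step described in this commit can be sketched as follows. `Tombstone`, `timeRange`, and `groupByRange` are hypothetical names, and the sketch only shows the grouping, not the bulk application to the index.

```go
package main

import "fmt"

// Tombstone is a hypothetical tombstone: a key plus the deleted range.
type Tombstone struct {
	Key      string
	Min, Max int64
}

// timeRange is used as a map key to group tombstones.
type timeRange struct {
	Min, Max int64
}

// groupByRange collects all keys that share the same deleted time
// range, so each group can be applied to the index in a single scan
// instead of one scan per tombstone.
func groupByRange(ts []Tombstone) map[timeRange][]string {
	groups := make(map[timeRange][]string)
	for _, t := range ts {
		r := timeRange{t.Min, t.Max}
		groups[r] = append(groups[r], t.Key)
	}
	return groups
}

func main() {
	ts := []Tombstone{
		{"cpu", 0, 100},
		{"mem", 0, 100},
		{"disk", 50, 60},
	}
	for r, keys := range groupByRange(ts) {
		fmt.Println(r, keys)
	}
}
```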
Jason Wilder c73c7cea25 Revert filtering index entries in BlockIterator
This was the wrong fix.  The real issue was the tombstones were
being read incorrectly and also applied incorrectly at times.  This
code is slower and not necessary so reverting it.
2016-05-02 11:36:04 -06:00
Jason Wilder f9ace932c0 Fix V2 tombstone reading file position
Each iteration of the loop was incrementing the position by 4 incorrectly.
The position should start at four since the header is 4 bytes.  This
caused tombstones at the end of the file to not be read because the counter
was out of sync with the actual file position, which caused the loop to exit early.

Probably better to refactor this to check for io.EOF instead of using the counter.
2016-05-02 11:36:04 -06:00
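The refactoring suggested at the end of this commit, looping until `io.EOF` instead of tracking a position counter, could be sketched like this. Only the 4-byte header comes from the commit message; the entry layout (uint32 key length, key bytes, int64 min, int64 max, big endian) and all function names are assumptions made for illustration.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Tombstone is a hypothetical in-memory form of one entry.
type Tombstone struct {
	Key      string
	Min, Max int64
}

const headerSize = 4 // the commit states the header is 4 bytes

// readTombstones skips the header, then reads entries until a clean
// io.EOF at an entry boundary. The entry layout is an assumption.
func readTombstones(r io.Reader) ([]Tombstone, error) {
	var header [headerSize]byte
	if _, err := io.ReadFull(r, header[:]); err != nil {
		return nil, err
	}
	var out []Tombstone
	for {
		var keyLen uint32
		err := binary.Read(r, binary.BigEndian, &keyLen)
		if err == io.EOF {
			return out, nil // no more entries
		}
		if err != nil {
			return nil, err
		}
		key := make([]byte, keyLen)
		if _, err := io.ReadFull(r, key); err != nil {
			return nil, err
		}
		var rng [2]int64
		if err := binary.Read(r, binary.BigEndian, &rng); err != nil {
			return nil, err // truncated entry
		}
		out = append(out, Tombstone{Key: string(key), Min: rng[0], Max: rng[1]})
	}
}

// encode writes the same hypothetical layout, for demonstration only.
func encode(ts []Tombstone) []byte {
	var buf bytes.Buffer
	buf.Write(make([]byte, headerSize))
	for _, t := range ts {
		binary.Write(&buf, binary.BigEndian, uint32(len(t.Key)))
		buf.WriteString(t.Key)
		binary.Write(&buf, binary.BigEndian, [2]int64{t.Min, t.Max})
	}
	return buf.Bytes()
}

// decode is a convenience wrapper around readTombstones.
func decode(b []byte) ([]Tombstone, error) {
	return readTombstones(bytes.NewReader(b))
}

func main() {
	data := encode([]Tombstone{{"cpu", 0, 10}, {"mem", 5, 15}})
	ts, err := decode(data)
	fmt.Println(ts, err)
}
```

With this shape there is no counter to keep in sync with the file position, so entries at the end of the file cannot be skipped by an off-by-four error.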
Jason Wilder bd1009080e Prevent writing empty tombstone files
If you delete from a measurement with a tag that does not match
any series, we would write an empty tombstone file and fail to load
it back.
2016-05-02 11:36:04 -06:00
Jason Wilder 8082fc61ba Fix parsing keys when loading database index
The code for parsing a key out of the WAL or TSM files in the engine
was naive and didn't account for measurements with escape chars. This
uses the correct parsing code to parse and load them correctly.

Fixes #6496
2016-04-30 14:47:19 -06:00
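To illustrate why a naive split breaks on escape characters, here is a small Go sketch. It assumes a series key of the form `measurement,tag=value` in which a backslash escapes the following character; `measurementOf` is a hypothetical helper, not the parsing function the engine uses.

```go
package main

import "fmt"

// measurementOf returns the measurement part of a series key, that is,
// everything before the first unescaped comma. The returned name still
// contains its escape characters.
func measurementOf(key string) string {
	for i := 0; i < len(key); i++ {
		if key[i] == '\\' {
			i++ // skip the escaped character as well
			continue
		}
		if key[i] == ',' {
			return key[:i]
		}
	}
	return key // no tags present
}

func main() {
	fmt.Println(measurementOf(`cpu,host=a`))
	fmt.Println(measurementOf(`cpu\,load,host=a`))
}
```

Splitting on the first comma without checking for the backslash would cut the second key at `cpu\` and load the wrong measurement name.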
Todd Persen 9eb4c1ec57 Fix typo in comment. 2016-04-29 16:26:27 -07:00
Jason Wilder abcb559b09 Remove index meta data when series and measurements are gone
This removes the dropMeta param from the tsdb.Store.DeleteSeries and
lets the shard determine when to remove the meta data from the index
based on what series still have data in the shard.

This uncovered a nasty bug in compactions where a fully deleted series would
prematurely end the compactions and not carry forward the rest of the data
in the TSM file.  This is now fixed as well.
2016-04-29 16:31:57 -06:00
Jason Wilder 4e353867d5 Fix first block not getting purged when deleting series 2016-04-27 17:08:00 -06:00
Ben Johnson f7af787aef add DELETE query support
This commit adds query language support for deleting series with a
`DELETE` query.
2016-04-27 15:16:23 -06:00
Jason Wilder aefd2ad08b Add DeleteSeries and DeleteSeriesRange 2016-04-27 13:09:53 -06:00
Jason Wilder c306090361 Fix tombstone rename on windows 2016-04-27 13:09:53 -06:00
Jason Wilder 86d37614e4 Remove debugging from test output 2016-04-27 13:09:53 -06:00
Jason Wilder bf3aa5857d Don't add tombstone for timerange not contained by file 2016-04-27 13:09:53 -06:00
Jason Wilder 6042e114a1 Remove tombstoned values during compaction
This will skip blocks that are fully tombstoned as well as remove
points that have been removed within a block.
2016-04-27 13:09:53 -06:00
Jason Wilder 23bbfb2192 Prevent truncated WAL entries from panicking 2016-04-27 13:09:53 -06:00
Jason Wilder 0de21ade40 Add delete range of values support to WAL and cache loader 2016-04-27 13:09:53 -06:00
Jason Wilder d13d01b516 Allow deleting series by time on a shard 2016-04-27 13:09:53 -06:00
Jason Wilder 4d71d2b01f Add support for deleting cache values using time range 2016-04-27 13:09:52 -06:00
Jason Wilder c154cd4b4a Remove TSMReaderOptions
Not used
2016-04-27 13:09:52 -06:00
Jason Wilder c8bd41c2d8 Remove TSM reader Keys func
It's very inefficient and should never be used.
2016-04-27 13:09:52 -06:00
Jason Wilder 7e06d558d5 Update ContainsValue to handle tombstones 2016-04-27 13:09:52 -06:00
Jason Wilder 97504a552c Support time range tombstones in FileStore/KeyCursor 2016-04-27 13:09:52 -06:00
Jason Wilder 27c2bc3f15 Separate IndexWriter from TSMIndex
Allows for future versioning of the TSMIndex as well as removing
a lot of unnecessary code.
2016-04-27 13:09:52 -06:00
Jason Wilder bb82331db7 Move TSMIndex defn to reader.go 2016-04-27 13:09:52 -06:00
Jason Wilder 1ac0b01c5a Remove fileAccessor
No longer used
2016-04-27 13:09:52 -06:00
Jason Wilder a789e819a3 Remove NewTSMReaderWithOptions
There are two TSMIndex implementations, the directIndex and the
indirectIndex.  Originally, we only had the directIndex and later
added the indirectIndex and NewTSMReaderWithOptions in order to
allow both indexes to be used in tests and code.  This has created
a problem since we really only use the directIndex for writing and
always use the indirectIndex for reading.

This change removes the NewTSMReaderWithOptions func so that it is
no longer possible to create a TSMReader with a directIndex.  This
will allow a lot of the block reading code used by the directIndex
to be removed and simplify maintenance.  It also gives better test
coverage of the code that is actually used by the TSM engine now.
2016-04-27 13:09:52 -06:00
Jason Wilder bc6328d196 Add time range support to tombstone files
This adds support for a time range to tombstone files to allow a subset
of points to be deleted instead of the whole series.  It changes the
tombstone file format to a binary format and maintains backwards compatibility
with the old text format tombstone files.
2016-04-27 13:09:52 -06:00
Ben Johnson 286072f65a update dep: simple8b @ b421ab40 2016-04-22 09:46:05 -06:00
Ben Johnson d204a8b683 optimize tsm1.FloatDecoder
This commit changes the `FloatDecoder.val` from a `float64` type
to a `uint64` to avoid an additional type conversion during read.
Now the type gets converted to a `float64` only on call to `Values()`.
2016-04-21 08:49:12 -06:00
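The idea of keeping the decoder state as raw bits can be sketched in Go as follows. The sketch assumes an XOR-based float encoding in which each step applies a bit delta to the previous value; `floatDecoder`, `next`, and `xorDelta` are hypothetical simplifications, not the actual `tsm1.FloatDecoder`.

```go
package main

import (
	"fmt"
	"math"
)

// floatDecoder keeps its current value as raw IEEE 754 bits so that
// each decoding step works on a uint64 directly.
type floatDecoder struct {
	val uint64
}

// newFloatDecoder seeds the decoder with the first value of a block.
func newFloatDecoder(first float64) *floatDecoder {
	return &floatDecoder{val: math.Float64bits(first)}
}

// next applies an XOR delta to the stored bits. No float conversion
// happens here (the XOR scheme is an assumption of this sketch).
func (d *floatDecoder) next(xor uint64) {
	d.val ^= xor
}

// Values converts the bits to a float64 only when the caller asks.
func (d *floatDecoder) Values() float64 {
	return math.Float64frombits(d.val)
}

// xorDelta computes the delta an encoder would emit between two values.
func xorDelta(prev, cur float64) uint64 {
	return math.Float64bits(prev) ^ math.Float64bits(cur)
}

func main() {
	d := newFloatDecoder(1.5)
	d.next(xorDelta(1.5, 2.25))
	fmt.Println(d.Values())
}
```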
Jason Wilder 87ceb7426a Don't lock the cache while adding entries
Entries have their own locking so the cache doesn't need to be locked
when adding to them.
2016-04-20 16:08:58 -06:00
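A minimal Go sketch of the locking split described in this commit: the cache lock guards only the map of entries, while each entry guards its own values. `Cache`, `entry`, `Write`, and `Len` here are hypothetical simplifications, not the tsm1 cache implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// entry holds the values for one key and has its own lock.
type entry struct {
	mu     sync.Mutex
	values []int64
}

func (e *entry) add(vs []int64) {
	e.mu.Lock()
	e.values = append(e.values, vs...)
	e.mu.Unlock()
}

// Cache maps keys to entries; its lock protects only the map.
type Cache struct {
	mu    sync.RWMutex
	store map[string]*entry
}

func NewCache() *Cache { return &Cache{store: make(map[string]*entry)} }

// Write looks up (or creates) the entry under the cache lock, then
// releases that lock before appending, so writers to different keys
// do not block each other while adding values.
func (c *Cache) Write(key string, vs []int64) {
	c.mu.RLock()
	e := c.store[key]
	c.mu.RUnlock()
	if e == nil {
		c.mu.Lock()
		if e = c.store[key]; e == nil { // re-check after taking the write lock
			e = &entry{}
			c.store[key] = e
		}
		c.mu.Unlock()
	}
	e.add(vs) // the cache lock is not held here
}

// Len reports how many values are stored for key.
func (c *Cache) Len(key string) int {
	c.mu.RLock()
	e := c.store[key]
	c.mu.RUnlock()
	if e == nil {
		return 0
	}
	e.mu.Lock()
	defer e.mu.Unlock()
	return len(e.values)
}

func main() {
	c := NewCache()
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(n int64) {
			defer wg.Done()
			c.Write("cpu", []int64{n})
		}(int64(i))
	}
	wg.Wait()
	fmt.Println(c.Len("cpu"))
}
```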
Jason Wilder fbaa7db54f Don't lock entry when scanning new values to add 2016-04-20 16:00:26 -06:00
Jason Wilder bfa225f149 Merge pull request #6430 from influxdata/jw-cache-load-size
Disable cache max memory size when reloading the cache
2016-04-20 14:35:23 -06:00
Stephen Gutekanst 9dc09c5257 Make logging output location more programmatically configurable (#6213)
This has various benefits:

- Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily.
- More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.*` fields, etc).
- This is also more efficient, because it means `executeQuery` no longer allocates a single `*log.Logger` each time it is called.
2016-04-20 21:07:08 +01:00
Jason Wilder f679787080 Disable cache max memory size when reloading the cache
The cache max memory size is an approximate size and can prevent a
shard from loading at startup.  This change disables the max size
at startup to prevent this problem and sets the limit back after
reloading.

Fixes #6109
2016-04-20 10:41:30 -06:00
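The approach in this commit, lifting the limit for the duration of the reload and restoring it afterwards, can be sketched like this. The `Cache` type, its fields, the convention that a `maxSize` of 0 means "no limit", and the `reload` function are all assumptions for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Cache is a hypothetical cache that tracks an approximate size.
type Cache struct {
	maxSize uint64 // 0 means no limit (an assumption of this sketch)
	size    uint64
}

var errCacheFull = errors.New("cache maximum memory size exceeded")

// Write rejects data once the approximate size would pass the limit.
func (c *Cache) Write(n uint64) error {
	if c.maxSize > 0 && c.size+n > c.maxSize {
		return errCacheFull
	}
	c.size += n
	return nil
}

// reload replays previously accepted entries with the limit disabled,
// then restores the configured limit, so an approximate size check
// cannot prevent a shard from loading at startup.
func reload(c *Cache, entrySizes []uint64) error {
	limit := c.maxSize
	c.maxSize = 0
	defer func() { c.maxSize = limit }()

	for _, n := range entrySizes {
		if err := c.Write(n); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	c := &Cache{maxSize: 10}
	fmt.Println(reload(c, []uint64{8, 8})) // succeeds although 16 > 10
	fmt.Println(c.Write(1))                // limit is enforced again
}
```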
Jonathan A. Sternberg c8c38e15cd Merge pull request #6386 from influxdata/js-iterator-next-error
Modify all of the iterators to allow returning an error on Next()
2016-04-20 10:39:53 -04:00