Commit Graph

1126 Commits (2cbddb3efd851d408b9eb0676e168b27dbe7eb79)

Author SHA1 Message Date
Jonathan A. Sternberg 32e42b93ae Merge pull request #6705 from influxdata/js-6701-duplicate-points-with-select
Filter out sources that do not match the shard database/retention policy
2016-05-24 09:48:31 -04:00
Jonathan A. Sternberg 5e7e0bd19b Filter out sources that do not match the shard database/retention policy
If you use a statement like this:

    SELECT value FROM one..cpu, two..cpu

It will access both the `one` and `two` databases as if you had selected
the `cpu` measurement twice for both of them. Updated the `tsdb.Shard`
create iterator function to filter out any sources that do not apply to
that shard so this duplication doesn't happen.

Fixes #6701.
2016-05-23 17:05:33 -04:00
Jason Wilder f48a106860 Optimized timestamp run-length decoding
Removes the up-front allocation of decoded values and return them
as needed.
2016-05-23 14:05:25 -06:00
Edd Robinson 0b2a806789 Merge pull request #6690 from influxdata/jw-shard-size
Fix panic in shard.DiskSize()
2016-05-20 15:29:53 +01:00
Edd Robinson 40732a35d0 Merge pull request #6660 from influxdata/er-vet
Fix vet issues
2016-05-20 11:12:25 +01:00
Jason Wilder d324777bfc Fix panic in shard.DiskSize()
If the wal or data dir is not accessible (possibly deleted), the
DiskSize walk funcs could panic because they did not check the
error passed in.
2016-05-19 23:19:44 -06:00
Jonathan A. Sternberg 5621ccc2ce Remove limit optimization when using an aggregate
The limit optimization was put into the wrong place and caused only part
of the shard to be read when a limit was used. The optimization is
possible, but requires a bit of refactoring to the code here so the call
iterator is created per series before handed to the limit iterator.

Fixes #6661.
2016-05-19 10:29:38 -04:00
Jason Wilder 4c089a56f4 Fix read tombstones: EOF
Due to an bug in TSM tombstone files, it was possible to create
empty tombstone files.  At startup, the TSM file would error out
and not load the TSM file.

Instead, treat it as an empty v1 file so the TSM file can load
correctly.

Fixes #6641
2016-05-18 23:29:25 -06:00
Jason Wilder 7fb7faaaca Fix points already read from being returned more than once
If there were duplicate points in multiple blocks, we would correctly
dedup the points and mark the regions of the blocks we've read.
Unfortunately, we were not excluding the already points as the cursor
moved to points in the later blocks which could cause points to be
return twice incorrectly.

Fixes #6611
2016-05-18 17:21:10 -06:00
Jason Wilder 9f89420b4c Merge pull request #6653 from influxdata/jw-compact-fix
Compaction fixes
2016-05-18 16:10:10 -06:00
Jason Wilder 121195a865 Merge pull request #6665 from influxdata/jw-series-stats
Reload series count stat at startup
2016-05-18 15:58:15 -06:00
Edd Robinson 09dc48b847 Merge pull request #6664 from influxdata/jw-shard-size
Store shard size on disk statistic
2016-05-18 22:39:12 +01:00
Jason Wilder 209dd005c5 Merge pull request #6627 from influxdata/jw-deadlock
Fix possible deadlock when queries and delete series run concurrently
2016-05-18 15:30:37 -06:00
Jason Wilder f2bcf9d9ab Code review fixes 2016-05-18 15:25:56 -06:00
Jason Wilder d32ad26d27 Fix data not getting reloaded
The optimization to speed up shard loading had the side effect of
skipping adding series to the index that already exist.  The skipping
was in the wrong location and also skipped the shards measurementFields
index which is required in order to query that series in the shard.
2016-05-18 15:25:56 -06:00
Jason Wilder e859141b75 Speed up tests
Switched the max keys test to write int64 of the same value so RLE
would kick in and the file size will be smaller (84MB vs 3.8MB).

Removed the chunking test which was skipped because the code will
not downsize a block into smaller chunks now.

Skip MaxKeys tests in various environments because it needs to
write too much data to run reliably.
2016-05-18 15:25:56 -06:00
Jason Wilder eff71cbe23 Rollover to new TSM file when max blocks exceeded
Fixes #6406
2016-05-18 15:25:55 -06:00
Jason Wilder 8fda621d8b Fix memory spike when compacting overwritten points
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally
similar to the fix done in #6556.

Fixes #6557
2016-05-18 15:25:55 -06:00
Jason Wilder f1ab89561a Reload series count stat at startup 2016-05-18 15:21:57 -06:00
Edd Robinson 28ad7c687b Add const for interval 2016-05-18 22:14:59 +01:00
Jason Wilder cbc551f9dc Collect shard size stats 2016-05-18 22:14:59 +01:00
Jonathan A. Sternberg 946968ba23 Fixing panic in SHOW FIELD KEYS caused by 733a17d
The list of field keys in the index may have differed from the field
keys in the actual shard. Fixing `SHOW FIELD KEYS` so it relies only on
the shard rather than the index.

Fixes #6659.
2016-05-18 14:43:50 -04:00
Edd Robinson f78e67d09c Fix concurrent map access panic 2016-05-18 17:56:50 +01:00
Edd Robinson f680ab0f0d Fix vet issues 2016-05-18 13:34:11 +01:00
Joe LeGasse af432e7d12 Fix loop variable reuse in database close
Fixes #6650
2016-05-17 11:25:39 -04:00
Jonathan A. Sternberg 42cdaf0365 Merge pull request #6529 from influxdata/js-6519-select-tag-key-specifier
Support cast syntax for selecting a specific type
2016-05-16 12:30:14 -04:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Jason Wilder ce141eae37 Merge pull request #6637 from influxdata/jw-revert-compact
Revert "Fix memory spike when compacting overwritten points"
2016-05-16 09:46:24 -06:00
Jason Wilder 23fc9ff748 Revert "Fix memory spike when compacting overwritten points"
This reverts commit d99c5e26f6.
2016-05-16 09:30:34 -06:00
Jonathan A. Sternberg a17f3d960a SHOW TAG VALUES accepts != and !~ in WHERE clause
Fixes #6607.
2016-05-16 08:51:09 -04:00
Jason Wilder 57d4becaec Fix possible deadlock when queries and delete series run concurrently
This locks showeed up in a deadlock systems running queries and
delete series across a large dataset.  Queries should not need to
lock the tsdb.Store for writes
2016-05-13 17:04:12 -06:00
Jason Wilder 5b6f3afefa Limit concurrent shards loading to number of cores available 2016-05-13 15:41:32 -06:00
Jason Wilder 11871958c6 Merge pull request #6618 from influxdata/jw-shard-load
Optimize shard index loading
2016-05-13 14:16:17 -06:00
Jason Wilder 9e54adc719 Speed up drop database
Drop database was closing and deleting each shard dir individually and
serially.  It would then delete the empty database dirs.

This changes drop database to close all shards in parallel and run
one os.RemoveAll to remove everything under the db dir which is more
efficient.

This also reworked the locking to avoid locking the tsdb.Store for
long periods of time.  That can cause queries and writes for other
databases to block as well.
2016-05-13 10:26:28 -06:00
Jason Wilder 0dbd4893da Optimize shard index loading
On data sets with many series and potentially large series keys,
the cost of parsing the key and re-indexing can be high.

Loading the TSM keys into the index was being done repeatedly for
series that were already index by an earlier TSM file.  This was
wasted worked and slows down shard loading.

Parsing the key was also innefficient and allocated a new string
slice.  This was simplified to remove that allocation.
2016-05-12 14:02:42 -06:00
Ben Johnson 7afb73aa99 Merge pull request #6598 from benbjohnson/parallelize-planning
Parallelize query planning
2016-05-12 09:00:58 -06:00
Jonathan A. Sternberg 89346bb618 Merge pull request #6600 from influxdata/0.13
Merge 0.13 release candidate back to master
2016-05-11 13:04:26 -04:00
Ben Johnson 668bae57df
parallelize query planning
This commit changes the `tsm1.Engine` to create individual series
iterators in batches so that it can be parallelized. Iterators
are combined at the end so they can be redistributed to the
parallelized merge iterator.
2016-05-11 10:38:11 -06:00
Cory LaNou c32906a366 Merge pull request #6593 from influxdata/cjl-copyshard
create shard snapshot
2016-05-10 20:01:59 -05:00
Jonathan A. Sternberg 8353b0c20f Merge pull request #6592 from influxdata/js-3451-show-field-keys-with-field-type
Update SHOW FIELD KEYS to return the field type with the field key
2016-05-10 14:13:17 -04:00
Jason Wilder d8490f1170 Merge pull request #6587 from influxdata/jw-validate-fields
Fix for merge values
2016-05-10 11:56:07 -06:00
Jonathan A. Sternberg 733a17d9e9 Update SHOW FIELD KEYS to return the field type with the field key
Fixes #3451.
2016-05-10 13:16:57 -04:00
Cory LaNou f415cf89ad wip 2016-05-10 11:01:03 -05:00
Jason Wilder 9b86bfea2a Merge pull request #6582 from eleme/fix_engine_cache_size
fix cache size of engine
2016-05-10 09:01:03 -06:00
Jason Wilder 8839cabd41 Add benchmark for Merge 2016-05-10 08:39:55 -06:00
Cory LaNou 4d30ea1eb3 minor PR feedback refactor 2016-05-10 08:14:51 -05:00
Cory LaNou a3bf3e2ef1 added baseline backup/restore plumbing 2016-05-10 08:14:51 -05:00
Jason Wilder 4f39cb2f97 Fix case where Merge return unsorted values 2016-05-09 15:40:34 -06:00
Ben Johnson 078e561820
parallelize iterators 2016-05-09 10:25:30 -06:00
Jonathan A. Sternberg 3f4072be7a Fix SHOW TAG VALUES condition to not filter "name" erroneously
Before #6038 was merged, we needed to filter "name" so that it didn't
accidentally hit the code path that used "name" to check the name of a
measurement. This was changed to "_name" to avoid a conflict with a
legitimate tag that used "name" as the key.

SHOW TAG VALUES was never modified to remove the code that filtered out
"name". This removes that line of code so a condition with "name"
doesn't get removed erroneously.

Example:

    SHOW TAG VALUES WITH KEY = host WHERE "name" = 'jsternberg'

Fixes #6581.
2016-05-09 10:27:53 -04:00