Commit Graph

801 Commits (7bd1bd8ab394fbb6e3bd03e323cdb9c91ae6cf90)

Author SHA1 Message Date
Jason Wilder 20f1fb3f7f Replace gotos with anonymous functions 2016-10-03 12:08:53 -06:00
Jason Wilder 750c8b3932 Reduce lock contention in cache.Values
The cache read lock was held for the whole duration of the call when it
only needs to be held at the beginning since entries have their
own locks.
2016-10-03 10:21:54 -06:00
Jason Wilder 1b462312a9 Re-use decoder pools
The decoders were held onto each iterator to avoid creating them all
the time.  Some of them use quite a bit of memory so they can
be expensive to create when querying across many series.

Instead, move them to a re-usable pool where we create the minimum that
could actively be in use.  This reduces garbage as well as makes the iterators
less expensive to create.
2016-10-03 10:21:54 -06:00
Jason Wilder a15a416eaa Fix decoding RLE integer blocks with negative deltas
Integer blocks that were run length encoded could produce the wrong
value when read back out because the deltas were not zig zag decoded
before scaling the final value.  If the deltas were negative, as would
be seen in a counter that decrements by a constant value, the results
would be random with some negative and positive values.

Fixes #7391
2016-10-02 23:51:29 -06:00
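
The ordering issue described in this commit can be illustrated with a minimal sketch. The block layout here (a first value, one zig-zag encoded delta, and a repeat count) is a simplifying assumption, and `decodeRLE` is a hypothetical name, not the engine's actual function. The point is only that the delta is zig-zag decoded before it is scaled.

```go
package main

// zigZagEncode maps signed integers to unsigned ones so that values of
// small magnitude, negative or positive, stay small.
func zigZagEncode(v int64) uint64 {
	return uint64(v<<1) ^ uint64(v>>63)
}

// zigZagDecode reverses zigZagEncode.
func zigZagDecode(u uint64) int64 {
	return int64(u>>1) ^ -int64(u&1)
}

// decodeRLE expands a run-length encoded block (hypothetical layout):
// a first value, a zig-zag encoded delta, and the number of repeats.
// The delta is zig-zag decoded before it is scaled by the index.
func decodeRLE(first int64, encodedDelta uint64, repeat int) []int64 {
	delta := zigZagDecode(encodedDelta)
	out := make([]int64, 0, repeat+1)
	out = append(out, first)
	for i := 1; i <= repeat; i++ {
		out = append(out, first+int64(i)*delta)
	}
	return out
}
```

With a first value of 100 and a delta of -5 repeated three times, this yields 100, 95, 90, 85. Scaling the still-encoded delta instead would move the values in the wrong direction.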
Jason Wilder dcb65865a2 Merge pull request #7376 from influxdata/jw-revert
Revert re-using byte slices during compactions
2016-09-28 08:24:35 -06:00
joelegasse 87ecd97e7b Merge pull request #7371 from influxdata/2016-09-27--rw--use-gotos-for-encoding-cleanup
Gotos to simplify uses of the new encoder pools.
2016-09-28 08:57:33 -04:00
Jason Wilder 1755f20d2a Revert re-using byte slices during compactions
This is causing a fatal error: fault panic when packing blocks.
2016-09-27 23:41:06 -06:00
rw c3fc87b619 Remove dangling named return value. 2016-09-27 14:18:32 -07:00
rw fcd425c8c6 Incorporate style feedback from Joe. 2016-09-27 14:07:06 -07:00
rw 47c1c6763c Use encoder reset to save on allocs. 2016-09-27 13:31:35 -07:00
rw 9429a2f96a Gotos to simplify uses of the new encoder pools.
For maintainability.
2016-09-27 11:47:25 -07:00
rw f131d3cc77 Fix off-by-one error that could panic. 2016-09-26 17:03:03 -07:00
Jason Wilder 4b5d989905 Merge pull request #7335 from influxdata/jw-tsm-syscalls
Avoid stat syscall when planning compactions
2016-09-26 12:30:05 -06:00
Jason Wilder 139ef8062e Simplify encoder buffer usage 2016-09-26 12:19:16 -06:00
Jason Wilder 658149a6ff Removed commented out code 2016-09-26 12:19:15 -06:00
Jason Wilder 7f96d78b79 Make encoder re-usable
This allows encoders to be re-used and maintained in a pool to
avoid allocating new ones on every compaction and write of an encoded
block.  The pool used is not a sync.Pool to ensure that the encoders
will not be garbage collected.
2016-09-26 12:19:15 -06:00
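
One common way to build a pool whose contents survive garbage collection is a buffered channel. The sketch below assumes that approach; the `encoder` and `encoderPool` types are hypothetical stand-ins, and the commit does not say how the actual pool is implemented.

```go
package main

// encoder is a hypothetical stand-in for a reusable block encoder.
type encoder struct {
	buf []byte
}

// Reset clears the encoder's state but keeps its allocated buffer.
func (e *encoder) Reset() { e.buf = e.buf[:0] }

// encoderPool holds idle encoders in a buffered channel. Unlike
// sync.Pool, nothing stored here is dropped by the garbage collector.
type encoderPool struct {
	idle chan *encoder
}

func newEncoderPool(size int) *encoderPool {
	return &encoderPool{idle: make(chan *encoder, size)}
}

// Get returns an idle encoder, or allocates one if none is available.
func (p *encoderPool) Get() *encoder {
	select {
	case e := <-p.idle:
		return e
	default:
		return &encoder{buf: make([]byte, 0, 1024)}
	}
}

// Put resets the encoder and returns it to the pool. If the pool is
// already full, the encoder is dropped.
func (p *encoderPool) Put(e *encoder) {
	e.Reset()
	select {
	case p.idle <- e:
	default:
	}
}
```

The channel capacity caps how many idle encoders are retained, so memory use stays bounded while steady-state callers avoid new allocations.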
Jason Wilder 0401527093 Pre-allocate cache store and entries
These were not sized so they always had to be grown causing
garbage to be created.
2016-09-26 12:19:15 -06:00
Jason Wilder 730ceeea46 Re-use allocated byte slices during compactions 2016-09-26 12:19:15 -06:00
Jason Wilder c2cfd63091 Avoid stat syscall when planning compactions
When the planner runs, it needs to determine if any files have tombstones.
The code to determine if a tombstone existed involved stating the .tombstone
file.  Since the planner runs very frequently when there are many shards, this
causes a lot of system calls that are unnecessary.

Instead, cache the results of the stats calls and only refresh them when we
haven't checked at least once or we write new tombstone data.

This also caches the results of the TSMReader.Stats call to avoid creating
garbage.
2016-09-24 15:53:28 -06:00
Jason Wilder 95682faec2 Merge branch '1.0' into jw-merge-10 2016-09-08 09:00:51 -06:00
Jason Wilder 1a35c0a3fc Fix neverending full compactions
The full compaction planner could return a plan that only included
one generation.  If this happened, a full compaction would run on that
generation producing just one generation again.  The planner would then
repeat the plan.

This could happen if there were two generations that were both over
the max TSM file size and the second one happened to be in level 3 or
lower.

When this situation occurs, one cpu is pegged running a full compaction
continuously and the disks become very busy basically rewriting the
same files over and over again.  This can eventually cause disk and CPU
saturation if it occurs with more than one shard.

Fixes #7074
2016-09-03 17:35:14 -06:00
Jason Wilder a6f6fda415 Fix DeleteSeries when multiple fields exist
The logic for determining whether a series key was already in the
set of TSM series was too restrictive.  It allowed only the first
field of a series to be added leaving all the remaining fields.
2016-08-31 20:53:10 -06:00
Jason Wilder 190537a557 Fix DeleteSeries when multiple fields exist
The logic for determining whether a series key was already in the
set of TSM series was too restrictive.  It allowed only the first
field of a series to be added leaving all the remaining fields.
2016-08-31 20:35:35 -06:00
Jonathan A. Sternberg dc2527ce86 Merge branch '1.0' 2016-08-31 14:45:57 -05:00
Jason Wilder 3d411371f2 Merge pull request #7233 from influxdata/jw-stats2
Write path stats
2016-08-29 10:15:23 -06:00
Jason Wilder d878d30d18 Fix shard write stats
* Rename *Fail to *Err for consistency with other metrics
* Use index Series count instead of separate counter
2016-08-29 09:46:11 -06:00
Jason Wilder e203323776 Add wal write success/error stats 2016-08-29 09:38:48 -06:00
Jason Wilder 83ca8c3867 Decrement cache memory stat when deleting series 2016-08-29 09:38:41 -06:00
Jason Wilder 03326f993f Add cache write success/error stats 2016-08-29 09:38:32 -06:00
Jason Wilder b31bf798f1 Fix runtime: goroutine stack exceeds 1000000000-byte limit
Fixes #7225
2016-08-29 09:26:48 -06:00
Jonathan A. Sternberg 8b234546a8 Merge pull request #7204 from influxdata/1.0
Merge 1.0 branch to master
2016-08-25 15:20:30 -05:00
Jonathan A. Sternberg 10029caf2f Support negative timestamps in the query engine
Negative timestamps are now supported. We also now refuse two
nanoseconds that are at the edge of the minimum time window. One of the
nanoseconds we do not accept is because we need MinInt64 to be used for
some internal comparisons in the TSM engine and it was causing an
underflow when we subtracted one from the minimum time. The second is so
we can have one minimum time that signifies the default minimum that
nobody can write to (so we can implicitly rewrite the timestamp on
aggregate queries) but still use the explicit timestamp if it is given
to us by the user. We aren't able to tell the difference between if the
user provided it or if it was implicit without those values being
different.

If the default minimum time is used with an aggregate query, we rewrite
the time to be the epoch for backwards compatibility since we believe
that's more important than supporting that extra nanosecond.
2016-08-25 12:52:41 -05:00
Ben Johnson cc628a1097 Fix mmap dereferencing
Adds a missing dereference call to `Close()` as well as fixes
a tag copy issue.
2016-08-24 10:48:07 -06:00
Edd Robinson 90ff713f21 Fix base64 encoding issue in stats
Fixes #7177.
2016-08-22 15:21:31 +01:00
Ben Johnson 8aa224b22d reduce memory allocations in index
This commit changes the index to point to index data in the shards
instead of keeping it in-memory on the heap.
2016-08-16 14:09:00 -06:00
Jason Wilder 0ea645642b Remove compaction assert that should not be there
This assert was not removed when the issue that caused the assert
to trigger was fixed in 0f5e994.

Fixes #7121
2016-08-08 09:59:45 -06:00
Jason Wilder 19546faab3 Release cursor/iterator resources aggressively 2016-08-03 00:21:39 -06:00
Jason Wilder e8e6bc44a7 Remove defers in TSM reader read path 2016-08-02 16:39:45 -06:00
Jason Wilder 5576e7fedb Simplifications 2016-07-28 20:25:37 -06:00
Jason Wilder 8367771d35 Fix go vet 2016-07-28 20:25:37 -06:00
Jason Wilder 030f1ef622 Include full path for tombstone files
The path info only contained the file name which caused tombstone
files to not be removed if there were queries running against
a file that was compacted.

This is now consistent with the TSMReader.Path which returns the
full path info.
2016-07-28 20:25:37 -06:00
Jason Wilder c3fda24cf9 Make sure all in-use files are tracked
A break caused only the first one to be tracked; all others would
leak as temp files that would not be removed until the server
restarted.
2016-07-28 20:25:37 -06:00
Jason Wilder c1a94e8861 Remove temp TSM files when disabling compactions
If they were left around, re-enabling them again could cause
future compactions to continuously fail.  A restart of the
server would clean them up correctly though.
2016-07-28 20:25:37 -06:00
Jason Wilder 602a2e80ce Ensure aux and cond cursors are closed when iterator is closed 2016-07-28 20:25:37 -06:00
Jason Wilder 5764a730d5 Prevent tombstoning series keys more than once
If there were multiple TSM files and a delete/drop was run,
we would write the delete series to the tombstone file N
times for each file.  This occurred because FileStore.WalkKeys walks
every key in every TSM file which can return duplicate keys.

This issue caused TSM files to be much larger than they should be
and also caused large memory usage during the delete.
2016-07-28 20:25:36 -06:00
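
A minimal sketch of the deduplication idea follows, assuming the keys from every file can be gathered as slices. `uniqueKeys` and its input shape are hypothetical and stand in for walking keys across TSM files.

```go
package main

// uniqueKeys returns the series keys from all files with duplicates
// removed, preserving first-seen order. perFile is a hypothetical
// stand-in for walking the keys of every TSM file.
func uniqueKeys(perFile [][]string) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, keys := range perFile {
		for _, k := range keys {
			if _, ok := seen[k]; ok {
				continue // already recorded from an earlier file
			}
			seen[k] = struct{}{}
			out = append(out, k)
		}
	}
	return out
}
```

Writing each key once keeps the tombstone size proportional to the number of distinct series rather than to series times files.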
Jason Wilder ef8ecf0e90 Apply reload tombstones in batches
This keeps some memory bounds when reloading a TSM file's tombstones
so that the heap does not grow exceedingly fast and stay there
after the deletes are applied.
2016-07-28 20:25:36 -06:00
Jason Wilder 4436e65fb9 Apply deletes to TSM files concurrently 2016-07-28 20:25:36 -06:00
Jason Wilder a8c69e222a Use scanner for reading v1 tombstones
Use a bufio.Scanner to read v1 tombstones instead of reading in
the whole file and parsing it from memory.
2016-07-28 20:25:36 -06:00
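
As a rough illustration of streaming with `bufio.Scanner`, the sketch below assumes one series key per line; the commit does not describe the v1 file layout, so that format and the function names are assumptions.

```go
package main

import (
	"bufio"
	"io"
	"strings"
)

// walkTombstones calls fn for each non-empty line read from r. Only one
// line is held in memory at a time. One key per line is an assumption.
func walkTombstones(r io.Reader, fn func(key string)) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := sc.Text()
		if line == "" {
			continue
		}
		fn(line)
	}
	return sc.Err()
}

// readKeys is a small usage helper that collects keys from in-memory data.
func readKeys(data string) ([]string, error) {
	var keys []string
	err := walkTombstones(strings.NewReader(data), func(k string) {
		keys = append(keys, k)
	})
	return keys, err
}
```

Note that `bufio.Scanner` rejects lines longer than its buffer limit (64 KiB by default), so very long keys would need a larger buffer set via `Scanner.Buffer`.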
Jason Wilder 7b8959f6f2 Apply tombstones iteratively at startup
Tombstones were read fully into memory at startup which could consume
a lot of RAM and OOM the process if there were a lot of deleted
series and many TSM files.

This now walks the tombstone file and iteratively applies the tombstone
which uses significantly less RAM.  This may be slightly slower in the
general case, but should scale better.
2016-07-28 20:25:36 -06:00
Jason Wilder 7c3d1aac68 Simplify purger.add logic 2016-07-26 13:02:08 -06:00
Jason Wilder cab84ae279 Prevent concurrent compactions from stepping on each other
Normally, compactions do not conflict on the files they are compacting.
If the full cold threshold is set very low, it can cause conflicts where
two compactions compact the same files.  The full compaction was the
only place this could happen as its planning is greedy.

To make this safer for concurrent execution, the compaction tracks which
files are currently being compacted and prevents any new compactions from
starting if the file set overlaps.

Fixes #6595
2016-07-26 12:58:25 -06:00
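
The tracking described above might look something like the following sketch. The `compactionTracker` type and its methods are hypothetical names chosen for illustration.

```go
package main

import "sync"

// compactionTracker records which files are part of a running
// compaction. The type and method names are hypothetical.
type compactionTracker struct {
	mu    sync.Mutex
	inUse map[string]struct{}
}

func newCompactionTracker() *compactionTracker {
	return &compactionTracker{inUse: make(map[string]struct{})}
}

// Acquire claims all files atomically. It returns false, claiming
// nothing, if any file already belongs to a running compaction.
func (c *compactionTracker) Acquire(files []string) bool {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, f := range files {
		if _, busy := c.inUse[f]; busy {
			return false
		}
	}
	for _, f := range files {
		c.inUse[f] = struct{}{}
	}
	return true
}

// Release frees the files once the compaction finishes or aborts.
func (c *compactionTracker) Release(files []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, f := range files {
		delete(c.inUse, f)
	}
}
```

Checking every file before claiming any of them means a rejected plan leaves no partial claims behind.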
Jason Wilder ded6e40d47 Remove lastPlanCheck var
This causes full compactions to not run if the server is running, but
after a restart they do run.
2016-07-26 12:58:25 -06:00
Jason Wilder 2f78c4ec83 Fix race when creating temp file
Using os.O_EXCL is safer than checking and then creating the file.
2016-07-26 12:58:25 -06:00
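
For illustration, an exclusive create with the standard library looks roughly like this. The file name and the `tryTwice` helper are made up for the example.

```go
package main

import (
	"os"
	"path/filepath"
)

// createExclusive creates path for writing and fails if it already
// exists. The existence check and the create happen in a single open
// call, so two callers cannot both succeed.
func createExclusive(path string) (*os.File, error) {
	return os.OpenFile(path, os.O_CREATE|os.O_EXCL|os.O_RDWR, 0o666)
}

// tryTwice is a usage helper: it attempts to create the same
// (hypothetically named) file twice in a fresh temporary directory.
// created reports whether the first attempt succeeded; rejected reports
// whether the second failed because the file already existed.
func tryTwice() (created, rejected bool) {
	dir, err := os.MkdirTemp("", "excl")
	if err != nil {
		return false, false
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "000001.tsm.tmp")

	f, err := createExclusive(path)
	if err == nil {
		created = true
		f.Close()
	}
	_, err = createExclusive(path)
	rejected = os.IsExist(err)
	return created, rejected
}
```

A separate stat followed by a create leaves a window in which another goroutine or process can create the file; the single `O_EXCL` open closes that window.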
Cory LaNou 063675b928 updates to make snappy compression tests work again 2016-07-22 14:33:20 -05:00
Cory LaNou 968d322d6d finish tsm file exporter 2016-07-21 17:20:51 -05:00
Jason Wilder fb5a143b08 Fix typos 2016-07-21 12:13:04 -06:00
Jason Wilder 13147efb24 Close underlying cursors when closing iterators
If a query is interrupted via kill query, the tsm files managed
by the file store purger would never get removed because
KeyCursor.Close was never called.

KeyCursor.Close should always be called now.
2016-07-21 12:13:04 -06:00
Jason Wilder 822f409b31 Allow queries to complete before closing TSM files
If a query was running against a file being compacted, we close the file
and the query would end wherever it had read up to.  This could result
in queries that randomly lost data, but running them again showed the
full results.

We now use a reference counting approach and move the in-use files out
of the way in the filestore and allow the queries to complete against
the old tsm files.  The new files are installed and new queries will
use them.

Fixes #5501
2016-07-21 12:13:04 -06:00
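
A minimal sketch of the reference counting idea, under the assumption that each file tracks its own readers. The `tsmFile` type and its fields are hypothetical, and the `removed` flag stands in for actually closing and deleting the file.

```go
package main

import "sync"

// tsmFile is a hypothetical handle to an on-disk file that may only be
// removed once no query is reading it.
type tsmFile struct {
	mu      sync.Mutex
	refs    int
	retired bool // set when a compaction has replaced this file
	removed bool // stands in for closing and deleting the file
}

// Ref marks the file as in use by one more query.
func (f *tsmFile) Ref() {
	f.mu.Lock()
	f.refs++
	f.mu.Unlock()
}

// Unref releases one reference and removes a retired file when the
// last reader finishes.
func (f *tsmFile) Unref() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.refs--
	if f.refs == 0 && f.retired {
		f.removed = true
	}
}

// Retire is called after a compaction installs replacement files. The
// file is removed immediately only if nothing is reading it.
func (f *tsmFile) Retire() {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.retired = true
	if f.refs == 0 {
		f.removed = true
	}
}
```

A query would call `Ref` before reading and `Unref` when its cursor closes, so a file retired mid-query stays readable until that query completes.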
Edd Robinson f37e726869 Add trace logging statements to tsdb 2016-07-21 11:14:29 +01:00
Edd Robinson 44231abcbd Add trace logger controlled via DataLoggingEnabled 2016-07-21 11:14:29 +01:00
Edd Robinson 83cc580ff8 Tidy up logging 2016-07-21 11:14:29 +01:00
Mark Rushakoff 518bd3b565 Micro-optimize BooleanDecoder for 20% speedup
benchmark                          old ns/op     new ns/op     delta
BenchmarkBooleanDecoder_2048-4     9954          7846          -21.18%

benchmark                          old allocs     new allocs     delta
BenchmarkBooleanDecoder_2048-4     0              0              +0.00%

benchmark                          old bytes     new bytes     delta
BenchmarkBooleanDecoder_2048-4     0             0             +0.00%
2016-07-20 08:43:05 -07:00
Mark Rushakoff 523aea715a Protect against bounds errors in FloatDecoder 2016-07-19 15:59:27 -07:00
Mark Rushakoff e483689563 Protect against bounds errors in BooleanDecoder 2016-07-19 15:59:27 -07:00
Mark Rushakoff 35e3adc890 Protect against bounds errors in IntegerDecoder 2016-07-19 15:43:27 -07:00
Mark Rushakoff 42b35ca068 Protect against bounds errors in TimeDecoder 2016-07-19 15:43:27 -07:00
Mark Rushakoff be589a6760 Protect against bounds errors in StringDecoder 2016-07-19 15:43:27 -07:00
Mark Rushakoff 5b549ffdfe Handle bounds errors in UnpackBlock 2016-07-19 15:43:27 -07:00
Mark Rushakoff 39f12e376c Defend against some boundary errors in TSM reading 2016-07-19 15:43:27 -07:00
Mark Rushakoff 28f31b4a0c Add test cases to repro corruption panics 2016-07-19 15:36:17 -07:00
Jason Wilder b692ef4f48 Rename throttle package to limiter 2016-07-18 12:00:58 -06:00
Jason Wilder c2370b437b Limit in-flight wal writes/encodings
A slower disk can cause excessive allocations to occur when
writing to the WAL because the slower encoding and compression occurs
before taking the write lock.  The encoding/compression grabs a large
byte slice from a pool and ultimately waits until it can acquire the
write lock.

This adds a throttle to limit how many inflight WAL writes can be queued
up to prevent OOMing the process with slower disks and heavy writes.
2016-07-17 23:53:12 -06:00
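
A limit on in-flight work is often expressed as a counting semaphore over a buffered channel. The sketch below assumes that technique; the `limiter` type here is illustrative and is not taken from the repository's package.

```go
package main

// limiter bounds how many operations may be in flight at once. This is
// a generic counting semaphore built on a buffered channel.
type limiter chan struct{}

func newLimiter(n int) limiter { return make(limiter, n) }

// Take blocks until a slot is free.
func (l limiter) Take() { l <- struct{}{} }

// TryTake claims a slot without blocking and reports whether it succeeded.
func (l limiter) TryTake() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot claimed by Take or TryTake.
func (l limiter) Release() { <-l }
```

A writer would call `Take` before grabbing a large buffer for encoding and `Release` after the write completes, so at most n buffers are allocated while waiting on the write lock.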
Jason Wilder 46fdcba6e3 Remove compaction enabled logging
Too verbose
2016-07-17 23:53:12 -06:00
Jason Wilder 2fa28ba1d3 Don't log error when compactions are aborted 2016-07-17 23:53:12 -06:00
Jason Wilder b48d88ce9e Abort running compactions when series are deleted
If a delete is issued while a compaction is running, a newly
deleted series could re-appear after the compaction completed. This
could occur if the compaction had already written the blocks for series
that were just deleted.  When the compaction completes, the newly
written tombstone files would be deleted, essentially undeleting the
series.
2016-07-17 23:53:12 -06:00
Jason Wilder 0f5e994383 Fix panic in full compactions due to duplicate data in blocks
Due to a bug in compactions, it's possible some blocks may have duplicate
points stored.  If those blocks are decoded and re-compacted, an assertion
panic could trigger.

We now dedup those blocks if necessary to remove the duplicate points
and avoid the panic.
2016-07-14 11:32:36 -06:00
Jason Wilder 0264966f5c Add index optimize planning step
For larger datasets, it's possible for shards to get into a state where
many large, dense TSM files exist.  While the shard is still hot for
writes, full compactions will skip these files since they are already
fairly optimized and full compactions are expensive.  If the write volume
is large enough, the shard can accumulate lots of these files.  When
a file is in this state, its index can contain every series which
causes startup times to increase since each file must parse the full
set of series keys for every file.  If the number of series is high,
the index can be quite large causing a large amount of disk IO at startup.

To fix this, an optimize compaction is run when a full compaction planning
step decides there is nothing to do.  The optimize compaction combines
and spreads the data and series keys across all files resulting in each
file containing the full series data for that shard and a subset of the
total set of keys in the shard.

This allows a shard to only store a series key once in the shard reducing
storage size as well as allowing a shard to only load each key once at startup.
2016-07-14 11:32:36 -06:00
Jason Wilder 5ee20e04a8 Fix compaction level planner
Large files created early in the leveled compactions could cause
a shard to get into a bad state.  This reworks the level planner
to handle those cases as well as splits large compactions up into
multiple groups to leverage more CPUs when possible.
2016-07-14 11:14:09 -06:00
Jonathan A. Sternberg 12a33fe0d3 Add stats and diagnostics to the TSM engine
Track the number of TSM files in the file store and keep engine
statistics related to the number of TSM compactions.
2016-07-07 19:35:55 -05:00
Jonathan A. Sternberg 837a9804cf Refactoring the monitor service to avoid expvar
Truncate the time interval output of the monitor service to be on even
time intervals rather than on every minute based on the start time. This
normalizes the output from the monitor service.
2016-07-07 11:13:58 -05:00
Jason Wilder 2f82d9a525 Truncate the slice when merging the caches 2016-07-05 12:12:21 -05:00
Jason Wilder fdf0bac717 Fix panic: runtime error: index out of range
Fixes #6829
2016-06-27 18:50:48 -06:00
Jason Wilder ca6bfac01a Fix out of order blocks returned during query
If there were blocks in later TSM files that were for overwritten
points or writes into the past, they could be returned more than
once or out of order causing the cursor values to be unsorted.

One effect of this is that graphs in Grafana would render with
the line going all over the place in spots.

This might also cause duplicate data to be returned.

Fixes #6738
2016-06-22 17:34:44 -06:00
Jonathan A. Sternberg 7bdcd669a8 Merge pull request #6879 from influxdata/js-prune-deadcode
Removing dead code from every package except influxql
2016-06-22 08:12:19 -05:00
Jonathan A. Sternberg 497db2a6d3 Removing dead code from every package except influxql
The tsdb package had a substantial amount of dead code related to the
old query engine still in there. It is no longer used, so it was removed
since it was left unmaintained. There is likely still more code that is
the same, but wasn't found as part of this code cleanup.

influxql has dead code show up because of the code generation so it is
not included in this pruning.
2016-06-20 22:41:07 -05:00
Jonathan A. Sternberg 8812bc8a93 Remove a double lock in the tsm1 index writer 2016-06-20 17:32:34 -05:00
Jonathan A. Sternberg 6e205ce135 Set the condition cursor instead of aux iterator when creating a nil condition cursor
A copy/paste error had nil cursors destined for a condition cursor get
set to the auxiliary cursor instead. When the number of conditions
exceeded the number of auxiliary fields, this would result in a stack
trace in some situations. When the number of conditions was less than or
equal to the number of auxiliary fields, it means that an auxiliary
cursor may have been overwritten with a nil cursor accidentally and a
leak might have happened since it was never closed.

Fixes #6859.
2016-06-17 14:54:48 -05:00
Jason Wilder ac6addd0b5 Ensure restore doesn't write broken files
Restore would try to open the shard if there was an error.  If there
was an error, the files written are very likely to be partially written
and they can cause the server to panic.

To prevent a shard from trying to open broken files, we now write to
a temp file and rename it to the actual name only after fully writing
and fsyncing the file.
2016-06-07 14:36:46 -06:00
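
The write-then-rename pattern described here can be sketched as follows. The `.tmp` suffix, the file name, and both function names are assumptions made for the example.

```go
package main

import (
	"os"
	"path/filepath"
)

// writeFileAtomic writes data to path+".tmp", syncs it to disk, and
// only then renames it to path. A failure part-way through leaves at
// most a .tmp file behind, never a partially written file at path.
func writeFileAtomic(path string, data []byte) error {
	tmp := path + ".tmp"
	f, err := os.Create(tmp)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		os.Remove(tmp)
		return err
	}
	if err := f.Close(); err != nil {
		os.Remove(tmp)
		return err
	}
	return os.Rename(tmp, path)
}

// roundTrip is a usage helper: it writes data into a fresh temporary
// directory and reads it back.
func roundTrip(data string) (string, error) {
	dir, err := os.MkdirTemp("", "restore")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	path := filepath.Join(dir, "000001.tsm")
	if err := writeFileAtomic(path, []byte(data)); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}
```

Because a reader only ever looks for the final name, it sees either the complete file or no file at all.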
Jason Wilder 838a29cca8 Fix race in cache
If cache.Deduplicate is called while writes are in-flight on the cache, a data race
could occur.

WARNING: DATA RACE
Write by goroutine 15:
  runtime.mapassign1()
      /usr/local/go/src/runtime/hashmap.go:429 +0x0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).entry()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:482 +0x27e
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).WriteMulti()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:207 +0x3b2
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent.func1()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:421 +0x73

Previous read by goroutine 16:
  runtime.mapiterinit()
      /usr/local/go/src/runtime/hashmap.go:607 +0x0
  github.com/influxdata/influxdb/tsdb/engine/tsm1.(*Cache).Deduplicate()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache.go:272 +0x7c
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent.func2()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:429 +0x69

Goroutine 15 (running) created at:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:423 +0x3f2
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:473 +0xdc

Goroutine 16 (finished) created at:
  github.com/influxdata/influxdb/tsdb/engine/tsm1.TestCache_Deduplicate_Concurrent()
      /Users/jason/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/cache_test.go:431 +0x43b
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:473 +0xdc
2016-06-06 15:45:01 -06:00
Jason Wilder bc76048371 Fix panic in cache.DeleteRange
Deleting keys that did not exist in the cache could cause a panic
because the entry returned would be nil and was not checked.
2016-06-06 14:48:53 -06:00
Jason Wilder a74ea4cbf4 Allow creating shards in a disabled state
For restoring a shard, we need to be able to have the shard open,
but disabled.  It was racy to open it and then disable it separately
since writes/queries could occur in between that time.
2016-06-01 16:17:18 -06:00
Jason Wilder d0023dee5d Convert inline errors to constants 2016-05-31 10:51:54 -06:00
Jason Wilder 1ff8ecf4fb Add ability to disable shards
Disabling a shard causes all writes and queries to a shard to return
an error.  This also disables compactions for the shard.
2016-05-31 10:51:54 -06:00
Edd Robinson baf5d505e6 Merge pull request #6754 from influxdata/er-fs
Prevent ReadFloatBlock from panicking when no values
2016-05-31 16:41:29 +01:00
Edd Robinson 003c30989a Check for no values 2016-05-31 16:28:17 +01:00
rw dcec206f2e Dedup `.RUnlock` between two conditionals. 2016-05-29 10:20:58 -07:00
rw 1b160d1af0 Low-contention path for pre-existing cache entries.
This change appears to increase bulk ingestion throughput by 2x-3x in
multiprocessor environments.
2016-05-28 23:50:11 -07:00
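
One way to get a low-contention path for existing entries is to look them up under a shared read lock and fall back to the write lock only when an entry must be created. The sketch below assumes that approach; the `cache` and `entry` types are simplified, hypothetical stand-ins.

```go
package main

import "sync"

// entry and cache are hypothetical, simplified stand-ins.
type entry struct {
	mu     sync.Mutex
	values []int64
}

type cache struct {
	mu    sync.RWMutex
	store map[string]*entry
}

func newCache() *cache {
	return &cache{store: make(map[string]*entry)}
}

// entryFor returns the entry for key, creating it if needed.
func (c *cache) entryFor(key string) *entry {
	// Fast path: an existing entry only needs the shared read lock.
	c.mu.RLock()
	e := c.store[key]
	c.mu.RUnlock()
	if e != nil {
		return e
	}

	// Slow path: take the write lock and re-check, since another
	// goroutine may have created the entry in the meantime.
	c.mu.Lock()
	defer c.mu.Unlock()
	if e = c.store[key]; e != nil {
		return e
	}
	e = &entry{}
	c.store[key] = e
	return e
}

// write appends a value under the entry's own lock, not the cache lock.
func (c *cache) write(key string, v int64) {
	e := c.entryFor(key)
	e.mu.Lock()
	e.values = append(e.values, v)
	e.mu.Unlock()
}
```

Writers to different existing keys then contend only on the read lock and on their own entry, rather than serializing on the cache-wide write lock.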
Jason Wilder 11959005f4 Switch backup to use shard.Snapshot
This switches the backup shard call to use the shard Snapshot that
internally creates a snapshot by hardlinking all of the TSM and
tombstone files instead.  This reduces the time that the FileStore
is locked and will allow for larger shards to be backed up more easily.
2016-05-27 09:30:25 -06:00
David Norton 381059a55c Merge pull request #6736 from influxdata/benchmark-write-points-allocs
Benchmarks to count allocs in WritePoints.
2016-05-27 10:13:17 -04:00
Edd Robinson 6a7f9527e3 Revert d2672a3 and 1e0a4e9 2016-05-27 10:34:14 +01:00
rw 92e7fec5cf Benchmarks to count allocs in WritePoints. 2016-05-26 17:13:14 -07:00
Edd Robinson d2672a3280 Update Go version 2016-05-26 15:26:09 +01:00
Edd Robinson 1e0a4e9119 Move fields under mutex 2016-05-26 12:00:46 +01:00
Jason Wilder d6661060a3 Merge pull request #6719 from shurcooL/fix-tombstone-open-error-check
tsdb/engine/tsm1: Check os.Open error before using file.
2016-05-25 12:11:26 -06:00
Jason Wilder a77dd4fe4c Merge pull request #6725 from influxdata/jw-tsm-query
Fix pathological TSM query case
2016-05-25 11:23:38 -06:00
Jason Wilder 7d50970631 Fix continuous compaction edge case
The level planner would keep including the same TSM files to be
recompacted even if they were already quite compacted and split
across several TSM files.

Fixes #6683
2016-05-25 10:36:24 -06:00
Jason Wilder 0b481ff627 Fix pathological TSM query case
This fixes a pathological query condition caused by a problematic
structuring of TSM files based on how points were written.  The
condition can occur when there are multiple TSM files and a large
number of points are written into the past.  The earlier existing
TSM files must also have points in the past and close to the present
causing their time range to eclipse the later files.

When this condition occurs, some queries can spend an excessive amount
of time merging all the overlapping blocks.

The fix was to constrain the window of overlapping blocks based on
the first one we ran into.  There was also a simple case in the Merge
where we could skip the binary search path and just append the two
inputs.
2016-05-25 09:14:17 -06:00
Dmitri Shuralyov c03ebf896b tsdb/engine/tsm1: Check os.Open error before using file.
os.Open is documented as:

> Open opens the named file for reading. If successful, methods on
> the returned file can be used for reading;

That suggests the file's methods should only be called if opening
was successful. The original code would defer f.Close() right after
os.Open, before ensuring that err is nil, so f.Close() would run
even if os.Open did not return successfully.

Apply https://github.com/golang/go/wiki/CodeReviewComments#indent-error-flow
suggestion to keep the normal path at minimal indentation, and indent
the error handling code instead. This improves code readability.
2016-05-24 21:08:35 -07:00
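
The pattern this commit describes, in a small standalone form. `readAll` is a hypothetical function written for illustration and is not the code that was changed.

```go
package main

import (
	"io"
	"os"
)

// readAll opens path and returns its contents.
func readAll(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		// The open failed, so there is no usable file to close.
		return nil, err
	}
	// Deferred only after the open is known to have succeeded.
	defer f.Close()
	return io.ReadAll(f)
}
```

The error branch returns early and is the indented one, which keeps the normal path at the left margin.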
Jason Wilder f48a106860 Optimized timestamp run-length decoding
Removes the up-front allocation of decoded values and return them
as needed.
2016-05-23 14:05:25 -06:00
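
To illustrate decoding on demand rather than up front, here is a sketch that assumes a run is fully described by a first timestamp, a constant delta, and a count. The `rleTimes` type and that layout are assumptions, not the actual decoder.

```go
package main

// rleTimes iterates over n timestamps described by a first value and a
// constant delta, computing each one on demand instead of decoding all
// of them into a slice up front. The layout is a simplifying assumption.
type rleTimes struct {
	first, delta int64
	n            int // total number of timestamps
	i            int // number of timestamps already returned
}

// Next advances to the next timestamp and reports whether one exists.
func (r *rleTimes) Next() bool {
	if r.i >= r.n {
		return false
	}
	r.i++
	return true
}

// Read returns the current timestamp. Call it only after Next has
// returned true.
func (r *rleTimes) Read() int64 {
	return r.first + int64(r.i-1)*r.delta
}
```

The iterator holds a few integers regardless of how long the run is, so no slice of decoded values is ever allocated.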
Edd Robinson 40732a35d0 Merge pull request #6660 from influxdata/er-vet
Fix vet issues
2016-05-20 11:12:25 +01:00
Jonathan A. Sternberg 5621ccc2ce Remove limit optimization when using an aggregate
The limit optimization was put into the wrong place and caused only part
of the shard to be read when a limit was used. The optimization is
possible, but requires a bit of refactoring to the code here so the call
iterator is created per series before handed to the limit iterator.

Fixes #6661.
2016-05-19 10:29:38 -04:00
Jason Wilder 4c089a56f4 Fix read tombstones: EOF
Due to a bug in TSM tombstone files, it was possible to create
empty tombstone files.  At startup, the TSM file would error out
and not load the TSM file.

Instead, treat it as an empty v1 file so the TSM file can load
correctly.

Fixes #6641
2016-05-18 23:29:25 -06:00
Jason Wilder 7fb7faaaca Fix points already read from being returned more than once
If there were duplicate points in multiple blocks, we would correctly
dedup the points and mark the regions of the blocks we've read.
Unfortunately, we were not excluding the already read points as the cursor
moved to points in the later blocks which could cause points to be
returned twice incorrectly.

Fixes #6611
2016-05-18 17:21:10 -06:00
Jason Wilder f2bcf9d9ab Code review fixes 2016-05-18 15:25:56 -06:00
Jason Wilder d32ad26d27 Fix data not getting reloaded
The optimization to speed up shard loading had the side effect of
skipping adding series to the index that already exist.  The skipping
was in the wrong location and also skipped the shard's measurementFields
index which is required in order to query that series in the shard.
2016-05-18 15:25:56 -06:00
Jason Wilder e859141b75 Speed up tests
Switched the max keys test to write int64 of the same value so RLE
would kick in and the file size will be smaller (84MB vs 3.8MB).

Removed the chunking test which was skipped because the code will
not downsize a block into smaller chunks now.

Skip MaxKeys tests in various environments because it needs to
write too much data to run reliably.
2016-05-18 15:25:56 -06:00
Jason Wilder eff71cbe23 Rollover to new TSM file when max blocks exceeded
Fixes #6406
2016-05-18 15:25:55 -06:00
Jason Wilder 8fda621d8b Fix memory spike when compacting overwritten points
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally
similar to the fix done in #6556.

Fixes #6557
2016-05-18 15:25:55 -06:00
Edd Robinson f78e67d09c Fix concurrent map access panic 2016-05-18 17:56:50 +01:00
Edd Robinson f680ab0f0d Fix vet issues 2016-05-18 13:34:11 +01:00
Jonathan A. Sternberg 42cdaf0365 Merge pull request #6529 from influxdata/js-6519-select-tag-key-specifier
Support cast syntax for selecting a specific type
2016-05-16 12:30:14 -04:00
Jonathan A. Sternberg 23f6a706bb Support cast syntax for selecting a specific type
Casting syntax is done with the PostgreSQL syntax `field1::float` to
specify which type should be used when selecting a field. You can also
do `field1::field` or `tag1::tag` to specify that a field or tag should
be selected.

This makes it possible to select a tag when a field key and a tag key
conflict with each other in a measurement. It also means it's possible
to choose a field with a specific type if multiple shards disagree. If
no types are given, the same ordering for how a type is chosen is used
to determine which type to return.

The FieldDimensions method has been updated to return the data type for
the fields that get returned. The SeriesKeys function has also been
removed since it is no longer needed. SeriesKeys was originally used for
the fill iterator, but then expanded to be used by auxiliary iterators
for determining the channel iterator types. The fill iterator doesn't
need it anymore and the auxiliary types are better served by
FieldDimensions implementing that functionality, so SeriesKeys is no
longer needed.

Fixes #6519.
2016-05-16 12:08:29 -04:00
Jason Wilder 23fc9ff748 Revert "Fix memory spike when compacting overwritten points"
This reverts commit d99c5e26f6.
2016-05-16 09:30:34 -06:00
Jason Wilder 0dbd4893da Optimize shard index loading
On data sets with many series and potentially large series keys,
the cost of parsing the key and re-indexing can be high.

Loading the TSM keys into the index was being done repeatedly for
series that were already indexed by an earlier TSM file.  This was
wasted work and slowed down shard loading.

Parsing the key was also inefficient and allocated a new string
slice.  This was simplified to remove that allocation.
2016-05-12 14:02:42 -06:00
Ben Johnson 668bae57df parallelize query planning
This commit changes the `tsm1.Engine` to create individual series
iterators in batches so that it can be parallelized. Iterators
are combined at the end so they can be redistributed to the
parallelized merge iterator.
2016-05-11 10:38:11 -06:00
Cory LaNou c32906a366 Merge pull request #6593 from influxdata/cjl-copyshard
create shard snapshot
2016-05-10 20:01:59 -05:00
Jason Wilder d8490f1170 Merge pull request #6587 from influxdata/jw-validate-fields
Fix for merge values
2016-05-10 11:56:07 -06:00
Cory LaNou f415cf89ad wip 2016-05-10 11:01:03 -05:00
Jason Wilder 9b86bfea2a Merge pull request #6582 from eleme/fix_engine_cache_size
fix cache size of engine
2016-05-10 09:01:03 -06:00
Jason Wilder 8839cabd41 Add benchmark for Merge 2016-05-10 08:39:55 -06:00
Cory LaNou 4d30ea1eb3 minor PR feedback refactor 2016-05-10 08:14:51 -05:00
Cory LaNou a3bf3e2ef1 added baseline backup/restore plumbing 2016-05-10 08:14:51 -05:00
Jason Wilder 4f39cb2f97 Fix case where Merge returns unsorted values 2016-05-09 15:40:34 -06:00
Ben Johnson 078e561820 parallelize iterators 2016-05-09 10:25:30 -06:00
thbourlove 22c2e7e1c5 fix cache memory size of engine 2016-05-09 21:29:34 +08:00
Jason Wilder d99c5e26f6 Fix memory spike when compacting overwritten points
If a large series contains a point that is overwritten, the compactor
would load the whole series into RAM during a full compaction.  If
the series was large, it could cause very large RAM spikes and OOMs.

The change reworks the compactor to merge blocks more incrementally,
similar to the fix done in #6556.
2016-05-05 22:31:30 -06:00
Ben Johnson 4c45f8ec32 Merge pull request #6560 from benbjohnson/optimize-tsm1-call-iterator
Move call iterator to series level
2016-05-05 11:13:53 -06:00
Ben Johnson fdf34d4356 move call iterator to series level
This commit moves the `CallIterator` to wrap the individual series
instead of wrapping a shard. This allows individual points to be
aggregated before being merged.

This will cause a small increase in memory usage per series but
it shows a 20% decrease in query time when there are a moderate
number of points per series.
2016-05-05 09:59:03 -06:00
Jason Wilder a0ac754802 Fix loading huge series into RAM when points are overwritten
In some query scenarios, if there are a lot of points on disk spread
across many blocks in TSM files and a point is overwritten near the
beginning of the shard's timerange, the full series could be loaded
into RAM triggering OOMs and huge allocations.

The issue was that the KeyCursor code that handles overwriting points
had a simple implementation that just deduped the whole series in this
case.  This falls over when the series is quite large.

Instead, the KeyCursor has been changed to only decode blocks with
updated points.  It then keeps track of what section of the blocks
have been read so they are not re-read when the later points are
decoded.

Since the points in a block are always sorted, the code was also changed
to remove the Deduplicate calls since they end up
reallocating the slice.  Instead, we do a sorted merge and re-use
the slice as much as we can.
2016-05-05 09:34:44 -06:00
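The sorted merge described in a0ac754802 can be sketched roughly as follows. This is a minimal illustration and not the actual KeyCursor code: the `Value` type and the `mergeSorted` function are hypothetical names, and the sketch assumes both inputs are sorted by timestamp with no duplicate timestamps within a single input. On a timestamp collision the value from the second input wins, and the caller-supplied destination slice is re-used instead of allocating a new one.

```go
package main

import "fmt"

// Value is a hypothetical timestamped point used only for this sketch.
type Value struct {
	Time int64
	V    float64
}

// mergeSorted merges a and b into dst and returns the result.
// Both inputs must be sorted by Time. When both contain the same
// timestamp, the value from b replaces the value from a.
// dst is truncated and re-used; it must not alias a or b.
func mergeSorted(dst, a, b []Value) []Value {
	dst = dst[:0]
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i].Time < b[j].Time:
			dst = append(dst, a[i])
			i++
		case a[i].Time > b[j].Time:
			dst = append(dst, b[j])
			j++
		default:
			// Same timestamp: keep the newer value from b.
			dst = append(dst, b[j])
			i++
			j++
		}
	}
	// At most one of these tails is non-empty.
	dst = append(dst, a[i:]...)
	dst = append(dst, b[j:]...)
	return dst
}

func main() {
	older := []Value{{1, 1.0}, {2, 2.0}, {4, 4.0}}
	newer := []Value{{2, 20.0}, {3, 3.0}}
	buf := make([]Value, 0, len(older)+len(newer))
	for _, v := range mergeSorted(buf, older, newer) {
		fmt.Println(v.Time, v.V)
	}
}
```

Because each input is walked exactly once, the merge is linear in the number of points and never needs a separate sort or dedup pass over the combined slice.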
Jason Wilder 57cb3fdbc0 Merge pull request #6522 from influxdata/tp-tsm-dump
Dump TSM files to line protocol
2016-05-03 10:44:33 -06:00
Jason Wilder 4196554f51 Fix overwriting points returning wrong value
The cursors were returning the wrong value in the case when points
existed in both the cache and tsm files with the same timestamp. The
cache value should have been returned, but the tsm value was returned
incorrectly.

Fixes #6439
2016-05-03 09:21:31 -06:00
Edd Robinson fd77dbe648 Merge pull request #6546 from influxdata/er-build-tag
Fix invalid build tag
2016-05-03 16:00:39 +01:00
Jonathan A. Sternberg a2a5c32770 Merge pull request #6539 from influxdata/js-6495-fix-aggregates-with-empty-shards
Fix aggregate returns when data is missing from some shards
2016-05-03 10:56:21 -04:00
Jonathan A. Sternberg d6d0addcec Fix aggregate returns when data is missing from some shards
If a shard is empty for a specific field and the field type is something
other than a float, a nil iterator would get returned from one of the
empty shards and cause the combined iterators to be cast to the float
type and all other iterator types to be discarded (or for integers, to
be cast).

This is rare since most aggregates don't accept strings or booleans, but
for queries like:

    SELECT distinct(string) FROM mydata

it would result in nothing getting returned if one of the shards didn't
have a value for `string`.

This change modifies the query engine to return nil for the shards
instead of a fake iterator and then to only use the fake iterator if the
final aggregate iterator is nil (meaning that no iterators could be
constructed for the field from any shard).

Fixes #6495.
2016-05-03 10:41:22 -04:00
Edd Robinson d35fa1ec97 Remove redundant windows build tags 2016-05-03 14:22:02 +01:00
Jason Wilder e0304ae3d5 Fix shards not getting assigned to series on restart
Also, simplifies the LoadMetaDataIndex func to not require a *Shard
2016-05-02 11:36:05 -06:00
Jason Wilder 2d09937fd2 Fix removing fully deleted index blocks
If multiple tombstone entries happen to exist for the same key in a
tombstone file, it was possible to panic.  The first application
would remove all index entries and the second time around the code
still assumed entries would exist and would index into the nil slice.

Also fixes a case where the range of time would fully delete all index
entries, but it did not align with math.MinInt64 and math.MaxInt64.  This
would cause the index locations to still exist in the offset slice.  This
is inefficient because the BlockIterator would still scan and decode the block
only to discover that all the values are deleted.  We now just remove it from
the offsets slice in this case since the whole range of values is deleted.
2016-05-02 11:36:05 -06:00
Jason Wilder 58aa65d5a8 Optimize applyTombstones
When a large tombstone file existed on disk, this code was slow since
it would apply each tombstone to the index one at a time causing the
index to be scanned for each key.

Instead, we group all the tombstones together by timestamp and apply
in bulk so that the index is scanned once for each set of tombstones.

If we change to immutable tombstone files, it might be better to just
write a file where all the keys have the same tombstone so we can re-apply
them efficiently.
2016-05-02 11:36:05 -06:00
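The bulk application described in 58aa65d5a8 might look something like the sketch below. It is not the real applyTombstones code: `Tombstone`, `timeRange`, `groupByRange` and `applyGrouped` are hypothetical names, and the sketch assumes "grouping by timestamp" means grouping tombstones that share the same min/max time range. The `apply` callback stands in for whatever performs the single index scan per group.

```go
package main

import (
	"fmt"
	"sort"
)

// Tombstone is a hypothetical record: delete Key between Min and Max.
type Tombstone struct {
	Key      string
	Min, Max int64
}

// timeRange is the grouping key used in this sketch.
type timeRange struct{ Min, Max int64 }

// groupByRange collects the keys of all tombstones sharing a time range.
func groupByRange(ts []Tombstone) map[timeRange][]string {
	groups := make(map[timeRange][]string)
	for _, t := range ts {
		r := timeRange{t.Min, t.Max}
		groups[r] = append(groups[r], t.Key)
	}
	return groups
}

// applyGrouped invokes apply once per distinct time range instead of
// once per tombstone, and returns the number of invocations.
func applyGrouped(ts []Tombstone, apply func(keys []string, min, max int64)) int {
	groups := groupByRange(ts)

	// Sort the ranges so the call order is deterministic.
	ranges := make([]timeRange, 0, len(groups))
	for r := range groups {
		ranges = append(ranges, r)
	}
	sort.Slice(ranges, func(i, j int) bool {
		if ranges[i].Min != ranges[j].Min {
			return ranges[i].Min < ranges[j].Min
		}
		return ranges[i].Max < ranges[j].Max
	})

	for _, r := range ranges {
		apply(groups[r], r.Min, r.Max)
	}
	return len(ranges)
}

func main() {
	ts := []Tombstone{{"cpu", 0, 10}, {"mem", 0, 10}, {"disk", 5, 20}}
	n := applyGrouped(ts, func(keys []string, min, max int64) {
		fmt.Println("one index scan for", keys, "range", min, max)
	})
	fmt.Println("scans:", n)
}
```

With many keys deleted over the same range, the number of index scans drops from one per tombstone to one per distinct range.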
Jason Wilder c73c7cea25 Revert filtering index entries in BlockIterator
This was the wrong fix.  The real issue was the tombstones were
being read incorrectly and also applied incorrectly at times.  This
code is slower and not necessary so reverting it.
2016-05-02 11:36:04 -06:00
Jason Wilder f9ace932c0 Fix V2 tombstone reading file position
Each iteration of the loop was incrementing the position by 4 incorrectly.
The position should start at four since the header is 4 bytes.  This
caused tombstones at the end of the file to not be read because the counter
was out of sync with the actual file position, which caused the loop to exit early.

Probably better to refactor this to check for io.EOF instead of using the counter.
2016-05-02 11:36:04 -06:00
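The refactor suggested at the end of f9ace932c0 (stop on io.EOF rather than tracking a byte counter) could be sketched as below. This is an illustration only: the record layout is an assumption made for the sketch (a 4-byte header, then records of a 4-byte big-endian key length, the key bytes, and two 8-byte timestamps), and `Tombstone`, `readTombstones`, `encodeTombstones` and `decodeBytes` are hypothetical names rather than the tsm1 API.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// Tombstone is a hypothetical record for this sketch.
type Tombstone struct {
	Key      string
	Min, Max int64
}

// readTombstones skips the 4-byte header, then reads records until a
// clean io.EOF at a record boundary. No position counter is needed.
func readTombstones(r io.Reader) ([]Tombstone, error) {
	var header [4]byte
	if _, err := io.ReadFull(r, header[:]); err != nil {
		return nil, err
	}
	var out []Tombstone
	for {
		var keyLen uint32
		if err := binary.Read(r, binary.BigEndian, &keyLen); err != nil {
			if err == io.EOF {
				return out, nil // no more records
			}
			return nil, err // truncated length field or I/O error
		}
		key := make([]byte, keyLen)
		if _, err := io.ReadFull(r, key); err != nil {
			return nil, err
		}
		t := Tombstone{Key: string(key)}
		if err := binary.Read(r, binary.BigEndian, &t.Min); err != nil {
			return nil, err
		}
		if err := binary.Read(r, binary.BigEndian, &t.Max); err != nil {
			return nil, err
		}
		out = append(out, t)
	}
}

// encodeTombstones writes the assumed layout; the header bytes are arbitrary.
func encodeTombstones(ts []Tombstone) []byte {
	var buf bytes.Buffer
	buf.Write([]byte{0x15, 0x02, 0x00, 0x00})
	for _, t := range ts {
		binary.Write(&buf, binary.BigEndian, uint32(len(t.Key)))
		buf.WriteString(t.Key)
		binary.Write(&buf, binary.BigEndian, t.Min)
		binary.Write(&buf, binary.BigEndian, t.Max)
	}
	return buf.Bytes()
}

// decodeBytes is a convenience wrapper around readTombstones.
func decodeBytes(b []byte) ([]Tombstone, error) {
	return readTombstones(bytes.NewReader(b))
}

func main() {
	data := encodeTombstones([]Tombstone{{"cpu", 0, 10}, {"mem", 5, 20}})
	ts, err := decodeBytes(data)
	fmt.Println(ts, err)
}
```

A record that is cut off partway through surfaces as io.ErrUnexpectedEOF instead of being silently dropped, which is the failure mode the counter-based loop had for tombstones at the end of the file.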