Commit Graph

460 Commits (452d77cbaf073dc9862c6f0782211b675bcebe91)

Author SHA1 Message Date
Jason Wilder 452d77cbaf tsm: cache: introduce entry locks.
Based on @jwilder's alternative to the 'dirty' slice that featured
in previous iterations of this fix.

Suggested-by: Jason Wilder <jason@influxdb.com>
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-26 00:05:38 +11:00
Jon Seymour eb7eec078d tsm: cache: introduce commit lock to Cache
Currently two compactors can execute Engine.WriteSnapshot at once.

This isn't thread safe since both threads want to make modifications to
Cache.snapshot at the same time.

This commit introduces a lock which is acquired during Snapshot() and
released during ClearSnapshot(), ensuring that at most one thread
executes within Engine.WriteSnapshot() at once.

To ensure that we always release this lock, but only release the
snapshot resources on a successful commit, we modify ClearSnapshot() to
accept a boolean which indicates whether the write was successful or not
and guarantee to call this function if Snapshot() has been called.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-25 12:10:37 +11:00
Jon Seymour 45d025db99 tsm: cache: add a tests to demonstrate thread safety vulnerabilities
There are two tests that show two different one vulnerability.

One test shows that Cache.Deduplicate modifies entries in a snapshot's
store without a lock while cache readers are deduplicating those same
entries while correctly locked.

A second test shows that two threads trying to execute the methods
that Engine.WriteSnapshot calls will cause concurrent, unsynchronized
mutating access to the snapshot's store and entries.

The tests fail at this commit and are fixed by subsequent commits.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-25 12:10:31 +11:00
Jon Seymour d7d81f79da tsm: cache: add a test that demonstrates concurrent reads are safe
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-25 12:06:10 +11:00
Jon Seymour 530b86ba7d tsm: cache: restore the semantics of cachedBytes and memSize stats
Fixes #5805.

This commit undoes a regression introduced by #5789.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-24 06:16:46 +11:00
Jon Seymour 3475356dc9 tsm: cache: fix semantics of snapshotCount statistic to make it useful.
Fix for #5804.

The commit for #5789 rendered the semantics of snapshotCount statistic
useless. This commit restores semantics that have diagnostic value to
this statistic.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-24 06:13:54 +11:00
Jason Wilder 017c24c98e Simplify cache snapshotting
The Cache had support for taking multiple snapshots to support writing
multiple snapshots to TSM files concurrently if that happened to be
a bottleneck.  In practice, this is never a bottleneck and we only
run one snappshoting goroutine continously per shard which has worked
well for all workloads.

The multiple snapshot support introduces some unhandled failure scenarios
where wal segments could be removed without writing them to TSM files.  If
a snapshot compaction fails to write due to transient disk errors, subsequent
snapshots will continue, but the failed one will not be retried.  When the
subsequent ones succeeded, all closed wal segments are removed causing data
loss.

This change simplifies the snapshotting capability to ensure that there is only
ever one snapshot.  If one fails, the next snapshot will update the existing
snapshot and retry all of old and new data.

Fixes #5686
2016-02-23 09:38:51 -07:00
Jonathan A. Sternberg 50753de032 Merge pull request #5782 from influxdata/js-5777-audit-panics-in-influxql
Remove the non-unreachable panics in the new query engine
2016-02-22 17:18:57 -05:00
Mark Rushakoff 191de2670c Fix non-compiling test 2016-02-22 13:49:11 -08:00
Mark Rushakoff fc5c8597ab Merge pull request #5758 from influxdata/mr-disk-stats
Track cache, WAL, filestore stats within tsm1 engine
2016-02-22 13:01:55 -08:00
Jason Wilder aa2e878019 Fix cache not deduplicating points in some cases
The cache had some incorrect logic for determine when a series needed
to be deduplicated.  The logic was checking for unsorted points and
not considering duplicate points.  This would manifest itself as many
points (duplicate) points being returned from the cache and after a
snapshot compaction run, the points would disappear because snapshot
compaction always deduplicates and sorts the points.

Added a test that reproduces the issue.

Fixes #5719
2016-02-22 13:24:42 -07:00
Jonathan A. Sternberg 7a03df2af1 Remove the non-unreachable panics in the new query engine
The only panics left are ones that should be unreachable unless there is
a bug.

Fixes #5777.
2016-02-22 12:52:43 -05:00
Jon Seymour c93da21a61 tsm: cache: only use NewCache for engine cache's snapshots use a simpler constructor
The intent of this change is to avoid writing caches created for
snapshot cache instances into the tsm1_cache measurement. We can do
this by avoiding use of the NewCache constructor. All other methods
are only intended to be called from on the engine cache - never
on a snapshot.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-22 15:17:43 +11:00
Jon Seymour 510ee2c790 tsm: cache: during writes, update the memSize statistic outside the lock
Since we are not locking but relying on atomic arithmetic,
use Add rather than Set. Will also result in slightly less garbage
being created.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-22 08:26:35 +11:00
Jon Seymour 9c6efe99f1 tsm: cache: ensure all statistics are initialised on cache creation.
The intent of this change is to ensure that all statistic fields of the
resulting tsm1_cache measurement are initialized on initialization of
the cache. That way, any consumer of those measurements doesn't
have to deal with the null case.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-21 15:33:50 +11:00
Jon Seymour 6697c721fb tsm: cache: add cache throughput related statistics.
Complementing and extending the changes in #5758.

Add 2 level statistics:

  * snapshotCount
  * cacheAgeMs

Add 2 counter statistics

  * cachedBytes
  * WALCompactionTimeMs

snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate

cacheAgeMs can be used to guage the level of write activity into the cache

The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates

The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used calculate WAL compaction throughput.

The ratio of difference between first and last WAL compaction time over the interval
length is an estimate of percentage of cache throughput consumed.

Signed-off-by: Jon Seymour <jon@wildducktheories.com>
2016-02-20 22:18:57 +11:00
Mark Rushakoff 602043e11b Add disk stats for FileStore 2016-02-19 16:37:34 -08:00
Mark Rushakoff d99c09cedd Add stats for current and old WAL segment sizes 2016-02-19 16:37:34 -08:00
Mark Rushakoff e76967efb6 Add stats to tsm1.Cache 2016-02-19 16:37:34 -08:00
Joe LeGasse dc8ed7953d Remove custom binary-conversion functions
Also cleaned up some excess allocations, and other cruft from the code
2016-02-18 13:56:35 -05:00
Ben Johnson f7e04abef7 remove NaN from query engine
This commit removes `math.NaN` returns from float iterators.
2016-02-17 14:11:31 -07:00
liang@qiniu.com 1ad0f933f4 Remove redundant wal files 2016-02-16 20:45:13 +08:00
Ben Johnson 0b3d367e5c Merge pull request #5623 from influxdata/jw-query-panic
Fix panic: runtime error: index out of range
2016-02-10 14:59:04 -07:00
Jason Wilder 0ce6dd1304 Fix panic: runtime error: index out of range
There was a fix in 5b1791, but is not present in the current branch likely due to a rebase issue.
The current code panics with a query like:

select value from cpu group by host order by time desc limit 1

This fixes the panic as well as prevents #5193 from re-occurring.  The issue is that agressively
closing the cursors clears out the seeks slice so re-seeking will fail.
2016-02-10 14:00:58 -07:00
Justin Nuß 82c276756a Lint tsdb and tsdb/engine package 2016-02-10 21:33:46 +01:00
Ben Johnson d9a6a7340f add canonical paths 2016-02-10 11:30:52 -07:00
Ben Johnson 5a0d1ab7c1 rename influxdb/influxdb to influxdata/influxdb
This commit changes all the import and URL references from:

    github.com/influxdb/influxdb

to:

    github.com/influxdata/influxdb
2016-02-10 10:26:18 -07:00
Jonathan A. Sternberg d1f7c445e7 Modify iterators to work across shards
Aux iterators now ask the iterator creator what series will be returned
and determine which aux fields to create based on the results.

The `tsdb.Shards` struct also creates a call iterator around the
iterators returned from each shard.
2016-02-10 09:40:29 -07:00
Jonathan A. Sternberg c2d1206177 Implement the fill iterator
Fill requires an additional function for IteratorCreator to retrieve the
series that will be returned from the iterator. When fill is required
for an aggregate, the IteratorCreator will be asked what series will be
returned by the created iterator.
2016-02-10 09:40:29 -07:00
Ben Johnson 6204350d65 fix math operations 2016-02-10 09:40:27 -07:00
Ben Johnson b4cb770a7f refactor aux iterators 2016-02-10 09:40:27 -07:00
Ben Johnson b8918a780c integer support 2016-02-10 09:40:25 -07:00
Jonathan A. Sternberg 583477064c Check for `tsdb.EOF` when looking for the lowest timestamp of aux fields 2016-02-10 09:40:25 -07:00
Jonathan A. Sternberg 34f14424dd Filter tags from the condition when building cursors on tsm1 2016-02-10 09:40:25 -07:00
Ben Johnson 00806de9b8 refactor query engine 2016-02-10 09:40:25 -07:00
Ben Johnson cde973f409 refactor query engine 2016-02-10 09:40:24 -07:00
Gabriel Levine 7d4217ab97 enabled golint for tsdb/engine/wal.go and wal_test.go and updated changelog. 2016-02-09 10:29:09 -05:00
Jason Wilder 2b3c640695 Fix reading too far in fileAccess.readBytes
Fixes #5566
2016-02-08 09:08:57 -07:00
Jason Wilder 28ae8b6fe0 Merge pull request #5434 from runner-mei/tsm_tombstone_windows
fix TSMReader.Delete() and all unit tests is pass in the windows
2016-02-04 16:27:26 -07:00
Jason Wilder b635e516e5 Merge pull request #5485 from runner-mei/patch-7
fix munmap bug in the windows
2016-02-04 13:47:51 -07:00
Jason Wilder 5a124e0e0b Merge pull request #5431 from runner-mei/patch-5
fix determine the file size
2016-02-04 10:24:05 -07:00
INADA Naoki 80a637904d tsm1: Use unixnano instead of time.Time 2016-02-03 10:05:40 +09:00
INADA Naoki 771253256b FloatValue uses unixnano instead of time.Time 2016-02-03 09:57:00 +09:00
INADA Naoki 898babf616 add float bench 2016-02-03 03:12:16 +09:00
runner.mei 4ca47103b1 fix TSMReader.Delete() and all unit tests is pass in the windows 2016-01-31 11:32:08 +08:00
runner bc992fea5e fix munmap bug in the windows
fix munmap bug in the windows

fix munmap bug in the windows

fix munmap bug in the windows

fix munmap bug in the windows
2016-01-31 10:46:46 +08:00
runner 4b7fe70cd3 fix determine the file size
fix determine the file size
2016-01-30 14:16:53 +08:00
runner.mei 53f7e03f72 fix TSMReader.Delete() and all unit tests is pass in the windows 2016-01-30 14:15:46 +08:00
Jason Wilder 924275b337 Fix panic preventing wal file truncation
Fixes #5455
2016-01-28 21:50:51 -07:00
Jason Wilder 9528c3ea70 Merge pull request #5465 from influxdata/jw-remote-writes
Optimize remote writes
2016-01-27 15:47:02 -07:00