Based on @jwilder's alternative to the 'dirty' slice that featured
in previous iterations of this fix.
Suggested-by: Jason Wilder <jason@influxdb.com>
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Currently two compactors can execute Engine.WriteSnapshot at once.
This isn't thread-safe, since both threads want to modify
Cache.snapshot at the same time.
This commit introduces a lock that is acquired during Snapshot() and
released during ClearSnapshot(), ensuring that at most one thread
executes within Engine.WriteSnapshot() at a time.
To ensure that we always release this lock, but only release the
snapshot resources on a successful commit, we modify ClearSnapshot() to
accept a boolean indicating whether the write succeeded, and we
guarantee that ClearSnapshot() is called whenever Snapshot() has been
called.
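A minimal sketch of the shape of this change (field names such as
snapshotter are illustrative, not the actual implementation):

    package sketch

    import "sync"

    type Cache struct {
        mu          sync.RWMutex
        snapshotter sync.Mutex // serializes Snapshot()/ClearSnapshot() pairs
    }

    // Snapshot acquires the snapshotter lock and holds it until
    // ClearSnapshot is called, so only one WriteSnapshot can be in
    // flight at a time.
    func (c *Cache) Snapshot() *Cache {
        c.snapshotter.Lock()
        c.mu.Lock()
        defer c.mu.Unlock()
        // ... move the current store into the snapshot ...
        return &Cache{}
    }

    // ClearSnapshot is guaranteed to be called after Snapshot; it always
    // releases the lock, but only releases snapshot resources when the
    // write succeeded.
    func (c *Cache) ClearSnapshot(success bool) {
        defer c.snapshotter.Unlock()
        c.mu.Lock()
        defer c.mu.Unlock()
        if success {
            // ... drop the snapshot's entries ...
        }
    }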
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
There are two tests that show two different manifestations of the same
vulnerability.
One test shows that Cache.Deduplicate modifies entries in a snapshot's
store without holding a lock, while correctly locked cache readers are
deduplicating those same entries.
A second test shows that two threads trying to execute the methods
that Engine.WriteSnapshot calls will cause concurrent, unsynchronized
mutating access to the snapshot's store and entries.
The tests fail at this commit and are fixed by subsequent commits.
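A rough sketch of the second scenario, not the actual test: the
constructor signature and loop bounds are assumptions, and it assumes
the tsm1 package context. Run with `go test -race` to surface the
unsynchronized access.

    func TestCache_ConcurrentWriteSnapshot(t *testing.T) {
        c := NewCache(1 << 20) // assumed constructor signature
        var wg sync.WaitGroup
        for i := 0; i < 2; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for j := 0; j < 1000; j++ {
                    snap := c.Snapshot() // both goroutines mutate the
                    snap.Deduplicate()   // snapshot's store and entries
                    c.ClearSnapshot(true)
                }
            }()
        }
        wg.Wait()
    }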
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Fix for #5804.
The commit for #5789 rendered the semantics of the snapshotCount
statistic useless. This commit restores semantics with diagnostic value
to this statistic.
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
The Cache had support for taking multiple snapshots so that multiple
snapshots could be written to TSM files concurrently, if that happened
to be a bottleneck. In practice, this is never a bottleneck and we only
run one snapshotting goroutine continuously per shard, which has worked
well for all workloads.
The multiple-snapshot support introduces some unhandled failure
scenarios where WAL segments could be removed without being written to
TSM files. If a snapshot compaction fails to write due to transient
disk errors, subsequent snapshots will continue, but the failed one
will not be retried. When the subsequent ones succeed, all closed WAL
segments are removed, causing data loss.
This change simplifies the snapshotting capability to ensure that there
is only ever one snapshot. If one fails, the next snapshot will update
the existing snapshot and retry with both the old and the new data.
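A rough sketch of the single-snapshot model (types and field names are
illustrative, not the actual implementation):

    package sketch

    import "sync"

    type Value interface{}

    type Cache struct {
        mu       sync.Mutex
        store    map[string][]Value
        snapshot *Cache
    }

    // Snapshot folds the current store into the one and only snapshot. If a
    // previous snapshot compaction failed, its data is still present and the
    // retry will write both the old and the new data.
    func (c *Cache) Snapshot() *Cache {
        c.mu.Lock()
        defer c.mu.Unlock()

        if c.snapshot == nil {
            c.snapshot = &Cache{store: make(map[string][]Value)}
        }
        for k, vals := range c.store {
            c.snapshot.store[k] = append(c.snapshot.store[k], vals...)
        }
        c.store = make(map[string][]Value)
        return c.snapshot
    }

    // ClearSnapshot drops the snapshot's contents only after a successful
    // TSM write, so closed WAL segments are never removed for data that was
    // not persisted.
    func (c *Cache) ClearSnapshot(success bool) {
        c.mu.Lock()
        defer c.mu.Unlock()
        if success {
            c.snapshot.store = make(map[string][]Value)
        }
    }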
Fixes #5686
The cache had some incorrect logic for determining when a series needed
to be deduplicated. The logic was checking for unsorted points but not
considering duplicate points. This would manifest itself as many
duplicate points being returned from the cache; after a snapshot
compaction ran, the points would disappear because snapshot compaction
always deduplicates and sorts the points.
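A sketch of the kind of check involved (types and names are
illustrative, not the actual tsm1 code):

    package sketch

    type Value interface{ UnixNano() int64 }

    type entry struct {
        values   []Value
        needSort bool // set when values must be deduplicated/sorted on read
    }

    func (e *entry) add(vals []Value) {
        for _, v := range vals {
            // Comparing with "<" only detects out-of-order points; "<=" also
            // catches duplicate timestamps, so duplicates are removed when
            // the entry is read instead of surviving until a snapshot
            // compaction.
            if n := len(e.values); n > 0 && v.UnixNano() <= e.values[n-1].UnixNano() {
                e.needSort = true
            }
            e.values = append(e.values, v)
        }
    }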
Added a test that reproduces the issue.
Fixes #5719
The intent of this change is to avoid writing statistics for caches
created as snapshot instances into the tsm1_cache measurement. We can
do this by avoiding the NewCache constructor when creating snapshots.
All other methods are only intended to be called on the engine's cache,
never on a snapshot.
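A sketch of the idea; the unexported constructor is a hypothetical
name, and the expvar map stands in for however the statistics are
actually registered:

    package sketch

    import "expvar"

    type Value interface{}

    type Cache struct {
        maxSize uint64
        store   map[string][]Value
        statMap *expvar.Map // nil for snapshot caches
    }

    // NewCache is the only constructor that wires up statistics, so only
    // the engine's cache reports into the tsm1_cache measurement.
    func NewCache(maxSize uint64) *Cache {
        c := newCache(maxSize)
        c.statMap = new(expvar.Map).Init() // stand-in for registering tsm1_cache stats
        return c
    }

    // newCache builds a bare cache with no statistics; snapshots use this path.
    func newCache(maxSize uint64) *Cache {
        return &Cache{
            maxSize: maxSize,
            store:   make(map[string][]Value),
        }
    }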
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Since we are not locking but relying on atomic arithmetic, use Add
rather than Set. This also results in slightly less garbage being
created.
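For illustration with the standard library's expvar (assuming the
statistics live in an expvar-style map, which may not match the engine
exactly):

    package sketch

    import "expvar"

    // recordCachedBytes bumps the counter with a single atomic Add. Using
    // Set instead would require reading the current value and allocating a
    // fresh expvar.Int, which is racy without a lock and creates extra
    // garbage.
    func recordCachedBytes(statMap *expvar.Map, n int64) {
        statMap.Add("cachedBytes", n)
    }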
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
The intent of this change is to ensure that all statistic fields of the
resulting tsm1_cache measurement are initialized when the cache is
created. That way, any consumer of those measurements doesn't have to
deal with the null case.
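A sketch of the intent, under the same expvar-style assumption (the key
list here is illustrative and incomplete):

    package sketch

    import "expvar"

    // initCacheStats touches every statistic once at construction time so
    // the tsm1_cache measurement always carries a full set of fields, even
    // before any cache activity.
    func initCacheStats(statMap *expvar.Map) {
        for _, k := range []string{
            "snapshotCount", "cacheAgeMs", "cachedBytes",
            "WALCompactionTimeMs", "diskBytes",
        } {
            statMap.Add(k, 0) // creates the key with an initial value of zero
        }
    }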
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
Complementing and extending the changes in #5758.
Add 2 level statistics:
* snapshotCount
* cacheAgeMs
Add 2 counter statistics:
* cachedBytes
* WALCompactionTimeMs
snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate.
cacheAgeMs can be used to gauge the level of write activity into the cache.
The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates.
The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used to calculate WAL compaction throughput.
The ratio of the difference between the first and last WAL compaction
times to the interval length is an estimate of the percentage of cache
throughput consumed.
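A quick worked example with made-up numbers: if, over a 60s sampling
interval, cachedBytes grows by 200 MB, diskBytes by 40 MB and
WALCompactionTimeMs by 12,000 ms, then cache throughput is roughly
200 MB / 60 s ≈ 3.3 MB/s, WAL compaction throughput is
(200 - 40) MB / 12 s ≈ 13.3 MB/s, and the ratio
12,000 ms / 60,000 ms = 20% estimates the share of cache throughput
consumed by WAL compaction.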
Signed-off-by: Jon Seymour <jon@wildducktheories.com>
There was a fix in 5b1791, but it is not present in the current branch, likely due to a rebase issue.
The current code panics with a query like:
select value from cpu group by host order by time desc limit 1
This fixes the panic and also prevents #5193 from recurring. The issue is that aggressively
closing the cursors clears out the seeks slice, so re-seeking will fail.
Aux iterators now ask the iterator creator what series will be returned
and determine which aux fields to create based on the results.
The `tsdb.Shards` struct also creates a call iterator around the
iterators returned from each shard.
Fill requires an additional function for IteratorCreator to retrieve the
series that will be returned from the iterator. When fill is required
for an aggregate, the IteratorCreator will be asked what series will be
returned by the created iterator.
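A hypothetical sketch of the shape of that addition (the method name
and the placeholder types are illustrative, not the actual influxql
API):

    package sketch

    // Placeholder types standing in for the real query engine types.
    type (
        Iterator        interface{}
        IteratorOptions struct{}
        SeriesList      []string
    )

    // An IteratorCreator can be asked up front which series an iterator
    // would return, so the planner can decide which aux fields to create
    // and what fill() must emit before the iterator itself is built.
    type IteratorCreator interface {
        CreateIterator(opt IteratorOptions) (Iterator, error)
        SeriesKeys(opt IteratorOptions) (SeriesList, error)
    }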