influxdb

Commit Graph

Author	SHA1	Message	Date
Jason Wilder	3fd40d48a1	Merge pull request #6006 from influxdata/jw-deadlock Fix deadlock when running backup	2016-03-14 13:36:45 -06:00
Jason Wilder	9984cd5d6d	Fix skipping blocks at query time when overlaps exist Depending on how data is written across TSM files, it was possible to skip over some blocks at query time, making it look like data was missing.	2016-03-14 13:11:11 -06:00
Jason Wilder	000459e350	Fix deadlock when running backup A deadlock occurs under write load if a backup is run in between the time when a snapshot compaction has snapshotted the cache and successfully written it to disk. The issue is that the second snapshot call will block on the commit lock while it is holding the engine write lock. This causes all writes to block and also prevents the currently running snapshot compaction from completing because it needs to acquire a read-lock. This PR removes the commit lock and just returns an error if a snapshot is in progress to allow any locks being held to be released. The caller can determine whether to retry or give up.	2016-03-14 12:36:48 -06:00
Joe LeGasse	344e5abd41	Changed type-switch in a few places to reduce allocations. Slices of tsm1.Value interfaces are only ever used with all the same types, and the previous code would switch on the type returned from a call to Value(), which allocated and returned an interface{} object for the underlying value. This change instead type-switches on the tsm1.Value object itself, allowing it direct access to the underlying value field, eliminating the unnecessary allocations.	2016-03-11 15:57:05 -05:00
Jason Wilder	992c78ee22	Remove periodic shard maintenance goroutine This is no longer used in tsm and just periodically locks everything for no reason now.	2016-03-09 17:31:02 -07:00
Edd Robinson	58c03448aa	Merge pull request #5514 from influxdata/er-engine-panic Ensure shards and engine are safely closed	2016-03-09 18:56:36 +00:00
Jason Wilder	e3fef5593c	Merge pull request #5855 from jonseymour/jss-5854-go-master-breaks-build fix tests to cope with future changes to testing.quick.Check - see #5854	2016-03-01 19:03:21 -07:00
Mark Rushakoff	cdcb079769	Tag TSM stats with database, retention policy ... by extracting the db/rp from the given path. Now that the code has "standardized" on extracting db/rp this way, the ShardLocation struct is no longer necessary and thus has been removed. We're back on the previous style of passing the path and walPath to NewShard.	2016-02-29 09:17:34 -08:00
Jon Seymour	73b3a2a056	Merge #5855 (issue: #5854 ). RHS merges cleanly with 0.10.0 Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-29 20:37:32 +11:00
Jon Seymour	716cdd7f41	tsm: modify encoding tests to deal with possible nil slices from testing.quick.Check in go master The current go compiler at the tip of the go master (1d5001af) has a modified implementation of testing.quick.Check that now generates nil slices as test data. (See: https://gophers.slack.com/archives/general/p14567053570110). The existing tests expect round tripping to work in this case but it does not. So, in these cases we change the expectation to reflect actual behaviour. This needs to be checked for reasonableness.	2016-02-29 20:36:19 +11:00
Jason Wilder	8d70d65a82	Convert time.Time to int64	2016-02-25 15:15:01 -07:00
Jon Seymour	11123d2694	Merge #5833 (issue: #5832 ). Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-26 07:59:03 +11:00
Jon Seymour	2c7cd06b99	tsm: cache: need to check that snapshot has been sorted. Previously, the for loop at the end of the method assumed that all entries had been deduplicated, including the entry discovered in the snapshot. However, this wasn't actually true. With this change, we make it true. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-26 07:56:25 +11:00
Jon Seymour	7eabae68de	tsm: cache: add a test for the write sequence {6,1,snapshot,7,2} Consider the write sequence: 6,1,snapshot,7,2. The hot cache gets deduplicated, so is 2,7. Now consider the test if 1 >= 2, this is false, so needSort is not set to true. The problem is the implicit assumption that the snapshot is always sorted by the time that merged() runs, but this may not be true. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-26 07:43:50 +11:00
Jason Wilder	6ebc192298	Merge pull request #5678 from jonseymour/typo doc: typographical, spelling, grammar, word-choice and phrasing improvements.	2016-02-25 09:33:41 -07:00
Jason Wilder	daf68dbbd2	Merge pull request #5701 from jonseymour/js-deduplicate-safety tsm: cache: improve thread safety of Cache.Deduplicate (see #5699)	2016-02-25 09:18:10 -07:00
Jon Seymour	4d98a1cf28	tsm: cache: remove unnecessary lock escalation. Previously, we needed a write lock on the cache because it was the only lock we had available to guard updates to entry.values and entry.needSort. However, now that we have an entry-scoped lock for this purpose, we don't need the cache write lock. Since merged() doesn't modify the .store or the c.snapshot.sort, there is no need for a write lock on the cache to protect the cache. So, we don't need to escalate here - we simply rely on the entry lock to protect the entries we are iterating over. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-26 01:31:54 +11:00
Jason Wilder	452d77cbaf	tsm: cache: introduce entry locks. Based on @jwilder's alternative to the 'dirty' slice that featured in previous iterations of this fix. Suggested-by: Jason Wilder <jason@influxdb.com> Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-26 00:05:38 +11:00
Jon Seymour	eb7eec078d	tsm: cache: introduce commit lock to Cache Currently two compactors can execute Engine.WriteSnapshot at once. This isn't thread safe since both threads want to make modifications to Cache.snapshot at the same time. This commit introduces a lock which is acquired during Snapshot() and released during ClearSnapshot(), ensuring that at most one thread executes within Engine.WriteSnapshot() at once. To ensure that we always release this lock, but only release the snapshot resources on a successful commit, we modify ClearSnapshot() to accept a boolean which indicates whether the write was successful or not and guarantee to call this function if Snapshot() has been called. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-25 12:10:37 +11:00
Jon Seymour	45d025db99	tsm: cache: add tests to demonstrate thread safety vulnerabilities There are two tests, each showing a different vulnerability. One test shows that Cache.Deduplicate modifies entries in a snapshot's store without a lock while cache readers are deduplicating those same entries while correctly locked. A second test shows that two threads trying to execute the methods that Engine.WriteSnapshot calls will cause concurrent, unsynchronized mutating access to the snapshot's store and entries. The tests fail at this commit and are fixed by subsequent commits. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-25 12:10:31 +11:00
Jon Seymour	d7d81f79da	tsm: cache: add a test that demonstrates concurrent reads are safe Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-25 12:06:10 +11:00
Mark Rushakoff	fb83374389	Track stats for number of series, measurements Per database: track number of series and measurements Per measurement: track number of series	2016-02-24 08:10:16 -08:00
Jon Seymour	530b86ba7d	tsm: cache: restore the semantics of cachedBytes and memSize stats Fixes #5805. This commit undoes a regression introduced by #5789. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-24 06:16:46 +11:00
Jon Seymour	3475356dc9	tsm: cache: fix semantics of snapshotCount statistic to make it useful. Fix for #5804. The commit for #5789 rendered the semantics of snapshotCount statistic useless. This commit restores semantics that have diagnostic value to this statistic. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-24 06:13:54 +11:00
Jason Wilder	017c24c98e	Simplify cache snapshotting The Cache had support for taking multiple snapshots to support writing multiple snapshots to TSM files concurrently if that happened to be a bottleneck. In practice, this is never a bottleneck and we only run one snapshotting goroutine continuously per shard, which has worked well for all workloads. The multiple snapshot support introduces some unhandled failure scenarios where wal segments could be removed without writing them to TSM files. If a snapshot compaction fails to write due to transient disk errors, subsequent snapshots will continue, but the failed one will not be retried. When the subsequent ones succeed, all closed wal segments are removed, causing data loss. This change simplifies the snapshotting capability to ensure that there is only ever one snapshot. If one fails, the next snapshot will update the existing snapshot and retry all of the old and new data. Fixes #5686	2016-02-23 09:38:51 -07:00
Jonathan A. Sternberg	50753de032	Merge pull request #5782 from influxdata/js-5777-audit-panics-in-influxql Remove the non-unreachable panics in the new query engine	2016-02-22 17:18:57 -05:00
Mark Rushakoff	191de2670c	Fix non-compiling test	2016-02-22 13:49:11 -08:00
Mark Rushakoff	fc5c8597ab	Merge pull request #5758 from influxdata/mr-disk-stats Track cache, WAL, filestore stats within tsm1 engine	2016-02-22 13:01:55 -08:00
Jason Wilder	aa2e878019	Fix cache not deduplicating points in some cases The cache had some incorrect logic for determining when a series needed to be deduplicated. The logic was checking for unsorted points and not considering duplicate points. This would manifest itself as many (duplicate) points being returned from the cache and, after a snapshot compaction run, the points would disappear because snapshot compaction always deduplicates and sorts the points. Added a test that reproduces the issue. Fixes #5719	2016-02-22 13:24:42 -07:00
Jonathan A. Sternberg	7a03df2af1	Remove the non-unreachable panics in the new query engine The only panics left are ones that should be unreachable unless there is a bug. Fixes #5777.	2016-02-22 12:52:43 -05:00
Jon Seymour	c93da21a61	tsm: cache: only use NewCache for engine cache's snapshots use a simpler constructor The intent of this change is to avoid writing caches created for snapshot cache instances into the tsm1_cache measurement. We can do this by avoiding use of the NewCache constructor. All other methods are only intended to be called on the engine cache - never on a snapshot. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-22 15:17:43 +11:00
Jon Seymour	510ee2c790	tsm: cache: during writes, update the memSize statistic outside the lock Since we are not locking but relying on atomic arithmetic, use Add rather than Set. Will also result in slightly less garbage being created. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-22 08:26:35 +11:00
Jon Seymour	9c6efe99f1	tsm: cache: ensure all statistics are initialised on cache creation. The intent of this change is to ensure that all statistic fields of the resulting tsm1_cache measurement are initialized on initialization of the cache. That way, any consumer of those measurements doesn't have to deal with the null case. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-21 15:33:50 +11:00
Jon Seymour	6697c721fb	tsm: cache: add cache throughput related statistics. Complementing and extending the changes in #5758. Add 2 level statistics: * snapshotCount * cacheAgeMs Add 2 counter statistics: * cachedBytes * WALCompactionTimeMs snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate. cacheAgeMs can be used to gauge the level of write activity into the cache. The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates. The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used to calculate WAL compaction throughput. The ratio of difference between first and last WAL compaction time over the interval length is an estimate of percentage of cache throughput consumed. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-20 22:18:57 +11:00
Mark Rushakoff	602043e11b	Add disk stats for FileStore	2016-02-19 16:37:34 -08:00
Mark Rushakoff	d99c09cedd	Add stats for current and old WAL segment sizes	2016-02-19 16:37:34 -08:00
Mark Rushakoff	e76967efb6	Add stats to tsm1.Cache	2016-02-19 16:37:34 -08:00
Joe LeGasse	dc8ed7953d	Remove custom binary-conversion functions Also cleaned up some excess allocations, and other cruft from the code	2016-02-18 13:56:35 -05:00
Ben Johnson	f7e04abef7	remove NaN from query engine This commit removes `math.NaN` returns from float iterators.	2016-02-17 14:11:31 -07:00
Jon Seymour	ab702eb44a	doc: remove the implication that the wal directory is inside the shard directory. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-15 05:33:22 +11:00
Jon Seymour	ed0a112f8e	doc: Add an Errata section intended to capture clarifications prior to full revisions of the text. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-15 00:29:02 +11:00
Jon Seymour	5e563d53c1	doc: revise discussion about cache design The description of the cache design was out of date - reflecting an older design based on checkpoints and evictions. This revision updates the design to describe snapshots and also clarifies that if compaction performance falls behind the inbound write rate then writes will fail. Updates based in part on clarifications provided by Jason Wilder. See https://goo.gl/L7AzVu Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-15 00:29:02 +11:00
Jon Seymour	cdc7e28338	doc: rephrasing of how sets of SeriesIterators are generated. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-15 00:29:02 +11:00
Jon Seymour	58d1b7223a	doc: refine TSM file system layout description Minor improvements to phrasing to use the English word 'directory' and slight improvements to grammar.	2016-02-15 00:29:02 +11:00
Jon Seymour	285e0ad17a	doc: refine description of the conclusion of the compaction process. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-15 00:29:02 +11:00
Jon Seymour	008af05f7b	doc: various grammar/word-choice improvements in TSM design document Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-15 00:29:02 +11:00
Jon Seymour	88598f78dc	doc: fix up some spelling errors/typos in .MD files Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-15 00:29:02 +11:00
Jason Wilder	0ce6dd1304	Fix panic: runtime error: index out of range There was a fix in 5b1791, but it is not present in the current branch, likely due to a rebase issue. The current code panics with a query like: select value from cpu group by host order by time desc limit 1 This fixes the panic as well as prevents #5193 from re-occurring. The issue is that aggressively closing the cursors clears out the seeks slice so re-seeking will fail.	2016-02-10 14:00:58 -07:00
Ben Johnson	d9a6a7340f	add canonical paths	2016-02-10 11:30:52 -07:00
Ben Johnson	5a0d1ab7c1	rename influxdb/influxdb to influxdata/influxdb This commit changes all the import and URL references from: github.com/influxdb/influxdb to: github.com/influxdata/influxdb	2016-02-10 10:26:18 -07:00

327 Commits (0703b85589d55e2ea6fcd42312a6d40e1cd46caa)