influxdb

Author	SHA1	Message	Date
Jon Seymour	4d98a1cf28	tsm: cache: remove unnecessary lock escalation. Previously, we needed a write lock on the cache because it was the only lock we had available to guard updates to entry.values and entry.needSort. However, now that we have an entry-scoped lock for this purpose, we don't need the cache write lock. Since merged() doesn't modify the .store or the c.snapshot.sort, there is no need for a write lock on the cache to protect the cache. So, we don't need to escalate here; we simply rely on the entry lock to protect the entries we are iterating over. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-26 01:31:54 +11:00
Jason Wilder	452d77cbaf	tsm: cache: introduce entry locks. Based on @jwilder's alternative to the 'dirty' slice that featured in previous iterations of this fix. Suggested-by: Jason Wilder <jason@influxdb.com> Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-26 00:05:38 +11:00
Jon Seymour	eb7eec078d	tsm: cache: introduce commit lock to Cache. Currently, two compactors can execute Engine.WriteSnapshot at once. This isn't thread-safe, since both threads want to modify Cache.snapshot at the same time. This commit introduces a lock which is acquired in Snapshot() and released in ClearSnapshot(), ensuring that at most one thread executes within Engine.WriteSnapshot() at a time. To ensure that we always release this lock, but only release the snapshot resources on a successful commit, we modify ClearSnapshot() to accept a boolean indicating whether the write was successful, and guarantee that this function is called whenever Snapshot() has been called. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-25 12:10:37 +11:00
Jon Seymour	45d025db99	tsm: cache: add tests to demonstrate thread safety vulnerabilities. There are two tests, each showing a different vulnerability. One test shows that Cache.Deduplicate modifies entries in a snapshot's store without a lock while cache readers are deduplicating those same entries while correctly locked. The second test shows that two threads trying to execute the methods that Engine.WriteSnapshot calls will cause concurrent, unsynchronized mutating access to the snapshot's store and entries. The tests fail at this commit and are fixed by subsequent commits. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-25 12:10:31 +11:00
Jon Seymour	d7d81f79da	tsm: cache: add a test that demonstrates concurrent reads are safe. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-25 12:06:10 +11:00
Jason Wilder	e32e5ff481	Merge pull request #5807 from jonseymour/jss-5804+5805 tsm: cache: undo statistics regressions #5804, #5805.	2016-02-23 13:46:27 -07:00
Jon Seymour	530b86ba7d	tsm: cache: restore the semantics of cachedBytes and memSize stats Fixes #5805. This commit undoes a regression introduced by #5789. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-24 06:16:46 +11:00
Jon Seymour	3475356dc9	tsm: cache: fix semantics of snapshotCount statistic to make it useful. Fix for #5804. The commit for #5789 rendered the snapshotCount statistic useless. This commit restores semantics to this statistic that have diagnostic value. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-24 06:13:54 +11:00
Jason Wilder	e0b23fd5b0	Merge pull request #5765 from oiooj/master No need to check Meta.Dir twice	2016-02-23 11:20:24 -07:00
Jason Wilder	92ae2a0e2d	Merge pull request #5787 from chris-ramon/handler-query-authorizer Improvement on `run.NewServer` related to `meta.QueryAuthorizer`.	2016-02-23 11:11:56 -07:00
Gunnar	54957a4838	Merge pull request #5785 from influxdata/ga-build Implement --generate option in build script	2016-02-23 09:54:39 -08:00
Jason Wilder	e7c29d5a37	Merge pull request #5789 from influxdata/jw-5686 Simplify cache snapshotting	2016-02-23 10:46:54 -07:00
gunnaraasen	e2d83e53cc	Implement -generate option in build script	2016-02-23 09:08:25 -08:00
Jason Wilder	017c24c98e	Simplify cache snapshotting. The Cache had support for taking multiple snapshots to support writing multiple snapshots to TSM files concurrently if that happened to be a bottleneck. In practice, this is never a bottleneck, and we only run one snapshotting goroutine continuously per shard, which has worked well for all workloads. The multiple snapshot support introduces some unhandled failure scenarios where WAL segments could be removed without being written to TSM files. If a snapshot compaction fails to write due to transient disk errors, subsequent snapshots will continue, but the failed one will not be retried. When the subsequent ones succeed, all closed WAL segments are removed, causing data loss. This change simplifies the snapshotting capability to ensure that there is only ever one snapshot. If one fails, the next snapshot will update the existing snapshot and retry all of the old and new data. Fixes #5686	2016-02-23 09:38:51 -07:00
Jason Wilder	0df6d558c2	Merge pull request #5800 from influxdata/jw-5757-regression Fix data nodes not getting created	2016-02-23 09:22:03 -07:00
Jason Wilder	9ead458399	Fix data nodes not getting created. This fixes a regression introduced in #5757, caused by node.ID being assigned by both the meta and data services. When both roles are active, the data service's CreateDataNode path was not getting called because a node ID was already assigned. This fixes the issue by checking whether a DataNode already exists for our node ID and creating one if it does not.	2016-02-23 09:01:02 -07:00
Chris Ramón	f235852c0b	updates changelog	2016-02-23 00:03:32 -05:00
Chris Ramón	e52accaf90	adds missing srv.Handler.QueryAuthorizer	2016-02-23 00:02:48 -05:00
Jason Wilder	2894234b1e	Merge pull request #5757 from influxdata/jw-cluster Meta node only fixes	2016-02-22 15:44:07 -07:00
Jonathan A. Sternberg	50753de032	Merge pull request #5782 from influxdata/js-5777-audit-panics-in-influxql Remove the non-unreachable panics in the new query engine	2016-02-22 17:18:57 -05:00
Jason Wilder	6f39b355bc	Code cleanups	2016-02-22 15:06:05 -07:00
Jason Wilder	a2d3d44505	Fix creating meta-only nodes. This fixes a couple of issues with starting meta-only nodes. 1. We were always calling CreateDataNode regardless of whether the node is running data services. We now only call it when the node is data-enabled. 2. The node.json was created alongside the data node. Since we are no longer creating a data node, this didn't happen anymore. There wasn't a simple way to do this in one place, so it is now handled when creating either a meta or a data node. Since the ID assigned to the node is the same regardless of role, this works in all combinations of roles. 3. JoinMetaServer didn't return the ID of the joining node, which created some races when multiple nodes were joining. The join call now returns that information to the caller. Fixes #5754	2016-02-22 15:06:05 -07:00
Jason Wilder	194d8d4693	Ensure monitor store is disabled for meta-only nodes. We can't store points locally, so ensure it's disabled for now.	2016-02-22 15:05:47 -07:00
Jason Wilder	a437002969	Fix join option in config file. The join option was incorrectly exposed on the meta config. It should be at the top level as a string and propagate down to the meta config as a slice.	2016-02-22 15:05:46 -07:00
Mark Rushakoff	7f457b8852	Merge pull request #5786 from influxdata/mr-fix-tsm1-test-compilation Fix non-compiling test	2016-02-22 14:04:05 -08:00
Mark Rushakoff	191de2670c	Fix non-compiling test	2016-02-22 13:49:11 -08:00
Mark Rushakoff	fc5c8597ab	Merge pull request #5758 from influxdata/mr-disk-stats Track cache, WAL, filestore stats within tsm1 engine	2016-02-22 13:01:55 -08:00
Mark Rushakoff	688863cec5	Update changelog	2016-02-22 12:51:52 -08:00
Jason Wilder	e25b5abf61	Merge pull request #5751 from influxdata/jw-5719 Fix cache not deduplicating points in some cases	2016-02-22 13:41:17 -07:00
Jason Wilder	aa2e878019	Fix cache not deduplicating points in some cases. The cache had some incorrect logic for determining when a series needed to be deduplicated. The logic was checking for unsorted points and not considering duplicate points. This would manifest itself as many (duplicate) points being returned from the cache, and after a snapshot compaction run, the points would disappear because snapshot compaction always deduplicates and sorts the points. Added a test that reproduces the issue. Fixes #5719	2016-02-22 13:24:42 -07:00
Jonathan A. Sternberg	7a03df2af1	Remove the non-unreachable panics in the new query engine. The only panics left are ones that should be unreachable unless there is a bug. Fixes #5777.	2016-02-22 12:52:43 -05:00
Mark Rushakoff	c7223157a6	Merge commit 'c93da21' into mr-disk-stats	2016-02-22 09:32:56 -08:00
Jonathan A. Sternberg	b6a0b6a65a	Merge pull request #5742 from influxdata/js-ensure-non-empty-column-names Ensure column names get implicitly renamed with conflicts	2016-02-22 08:55:38 -05:00
Jonathan A. Sternberg	87e04b1a46	Merge pull request #5776 from influxdata/js-5773-unsupported-call-panic Replace a panic with returning an error when an unsupported call is used	2016-02-22 08:17:59 -05:00
Edd Robinson	10b9befd82	Merge pull request #5716 from jonseymour/js-tolerate-empty-field-names models: tolerate empty field names when unpacking binary points	2016-02-22 12:44:45 +00:00
Jon Seymour	c93da21a61	tsm: cache: only use NewCache for the engine cache; snapshots use a simpler constructor. The intent of this change is to avoid writing statistics for caches created as snapshot instances into the tsm1_cache measurement. We can do this by not using the NewCache constructor for them. All other methods are only intended to be called on the engine cache, never on a snapshot. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-22 15:17:43 +11:00
Jonathan A. Sternberg	6982d5310e	Replace a panic with returning an error when an unsupported call is used. Fixes #5773.	2016-02-21 19:39:14 -05:00
Mark Rushakoff	2ab79e75eb	Merge pull request #5775 from jonseymour/jss-5499-extend-tsm-cache-stats tsm: cache: during writes, update the memSize statistic outside the lock	2016-02-21 14:36:56 -08:00
Jon Seymour	510ee2c790	tsm: cache: during writes, update the memSize statistic outside the lock. Since we are not locking but relying on atomic arithmetic, use Add rather than Set. This will also result in slightly less garbage being created. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-22 08:26:35 +11:00
Mark Rushakoff	feceb4dae1	Merge pull request #5769 from jonseymour/jss-5499-extend-tsm-cache-stats tsm: cache: ensure all statistics are initialised on cache creation.	2016-02-21 07:25:49 -08:00
Jon Seymour	9c6efe99f1	tsm: cache: ensure all statistics are initialised on cache creation. The intent of this change is to ensure that all statistic fields of the resulting tsm1_cache measurement are initialized when the cache is created. That way, consumers of those measurements don't have to deal with the null case. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-21 15:33:50 +11:00
Mark Rushakoff	04645188fa	Merge pull request #5762 from jonseymour/jss-5499-extend-tsm-cache-stats tsm: cache: add cache throughput related statistics.	2016-02-20 10:15:17 -08:00
Jon Seymour	a8877badcd	Update CHANGELOG for #5716, #5664. Note that this series also includes a cherry-pick of #5697. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-21 03:56:37 +11:00
oiooj	f1c027543c	No need to check Meta.Dir twice	2016-02-20 23:54:24 +08:00
Jon Seymour	d46e0407a0	Merge #5716. The RHS merges cleanly with the 0.10.0 maintenance branch. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-20 22:24:03 +11:00
Jon Seymour	9491846047	models: improve handling of points with empty field names or with no fields. Influx does not support fields with empty names or points with no fields. NewPoint is changed to validate that all field names are non-empty. AddField is removed because we now require that all fields are specified on construction. NewPointFromByte is changed to return an error if an unmarshaled binary point does not have any fields. newFieldsFromBinary is changed to prevent an infinite loop that can arise while attempting to parse corrupt binary point data. TestNewPointsWithBytesWithCorruptData is changed to reflect the change in the behaviour of NewPointFromByte. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-20 22:22:26 +11:00
Jon Seymour	6697c721fb	tsm: cache: add cache throughput related statistics. Complementing and extending the changes in #5758. Add 2 level statistics: snapshotCount and cacheAgeMs. Add 2 counter statistics: cachedBytes and WALCompactionTimeMs. snapshotCount can be used to measure transient write errors that are causing snapshots to accumulate. cacheAgeMs can be used to gauge the level of write activity into the cache. The differences between cachedBytes stats sampled at different times can be used to calculate cache throughput rates. The ratio (cachedBytes-diskBytes)/WALCompactionTimeMs can be used to calculate WAL compaction throughput. The ratio of the difference between the first and last WALCompactionTimeMs samples to the interval length is an estimate of the percentage of cache throughput consumed. Signed-off-by: Jon Seymour <jon@wildducktheories.com>	2016-02-20 22:18:57 +11:00
Mark Rushakoff	602043e11b	Add disk stats for FileStore	2016-02-19 16:37:34 -08:00
Mark Rushakoff	d99c09cedd	Add stats for current and old WAL segment sizes	2016-02-19 16:37:34 -08:00
Mark Rushakoff	e76967efb6	Add stats to tsm1.Cache	2016-02-19 16:37:34 -08:00

9526 Commits (4d98a1cf28c5eb64caa194ea284f190ee380fce6)