Commit Graph

163 Commits (aefe32d70bec661e00396238a1fa301f1fc016da)

Author SHA1 Message Date
Jonas Hahnfeld 89ced057cb Fix compaction logic on infrequent cache snapshots
This change fixes #10511, which manifests when a shard is considered cold
faster than its cache is snapshotted. Previously the code only looked at
the last modification of compacted tsm1 files. Instead the (restored)
Engine.LastModified() also takes the cache into account.

Ports #10522 to master where engine.go has moved and Engine.LastModified()
was deleted because it was unused.
2019-03-28 22:21:59 +01:00
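
The idea behind 89ced057cb can be sketched roughly as follows. This is a minimal illustration only: the function names `lastModified` and `isCold`, the `at` helper, and the `coldAfter` parameter are all hypothetical and are not taken from the engine code. It shows why a shard can look cold when only the TSM file timestamps are consulted, and no longer looks cold once the last cache write is taken into account.

```go
package main

import (
	"fmt"
	"time"
)

// at is a small helper that builds a timestamp from Unix seconds.
func at(sec int64) time.Time { return time.Unix(sec, 0) }

// lastModified returns the later of the newest TSM file modification and
// the last write to the cache. Both inputs are hypothetical stand-ins for
// whatever the engine actually tracks.
func lastModified(filesModified, cacheLastWrite time.Time) time.Time {
	if cacheLastWrite.After(filesModified) {
		return cacheLastWrite
	}
	return filesModified
}

// isCold reports whether nothing has been modified for at least coldAfter.
func isCold(lastMod, now time.Time, coldAfter time.Duration) bool {
	return now.Sub(lastMod) >= coldAfter
}

func main() {
	files, cache, now := at(100), at(200), at(250)
	coldAfter := 100 * time.Second

	// Looking only at the files, the shard appears cold.
	fmt.Println(isCold(files, now, coldAfter))
	// Taking the cache into account, it is still warm.
	fmt.Println(isCold(lastModified(files, cache), now, coldAfter))
}
```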
Edd Robinson 9a42202b53 PR feedback 2019-03-26 09:57:01 +00:00
Edd Robinson aa4e652e43 Add reason to total compaction metric
This commit adds a reason label to the total compaction metric. For
snapshots, the reason will indicate why the cache was snapshotted. For
other compactions, the reason label will be blank.
2019-03-25 15:25:03 +00:00
Edd Robinson dbca30dac5 Add integration tests for cache snapshotting 2019-03-25 11:44:01 +00:00
Edd Robinson 55e9ed689f Allow the tsm1.Cache to be snapshotted due to age
This commit adds a new Cache option, via the
`tsm1.CacheConfig.SnapshotAgeDuration` field, which controls the maximum
age the cache can reach before it is snapshotted to a TSM file.

The default value for this option is `0`, which means that the cache
will never be snapshotted based only on age. Setting this value to, for
example, 10 seconds, would result in the cache snapshotting every 10
seconds.

Snapshotting the cache more frequently can provide better durability
guarantees in some circumstances, though more, smaller TSM files will
lead to more work needed to compact them down to larger, more dense
files.

When using InfluxDB with a WAL there isn't really a strong reason to
alter `tsm1.CacheConfig.SnapshotAgeDuration` from `0`.
2019-03-25 11:44:01 +00:00
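
The age rule described in 55e9ed689f can be illustrated with a short sketch. Only the field name `SnapshotAgeDuration` comes from the commit message; the `CacheConfig` struct here is a stand-in, and `shouldSnapshotForAge` is a hypothetical helper, not the function the cache actually uses.

```go
package main

import (
	"fmt"
	"time"
)

// CacheConfig is a hypothetical stand-in for tsm1.CacheConfig. Only the
// SnapshotAgeDuration field name is taken from the commit message.
type CacheConfig struct {
	SnapshotAgeDuration time.Duration
}

// shouldSnapshotForAge reports whether a cache of the given age should be
// snapshotted purely because of its age. A zero duration disables the check.
func shouldSnapshotForAge(cfg CacheConfig, age time.Duration) bool {
	if cfg.SnapshotAgeDuration == 0 {
		return false
	}
	return age >= cfg.SnapshotAgeDuration
}

func main() {
	off := CacheConfig{} // default: never snapshot based only on age
	tenSec := CacheConfig{SnapshotAgeDuration: 10 * time.Second}

	fmt.Println(shouldSnapshotForAge(off, time.Hour))
	fmt.Println(shouldSnapshotForAge(tenSec, 3*time.Second))
	fmt.Println(shouldSnapshotForAge(tenSec, 12*time.Second))
}
```

Other snapshot triggers (for example, cache size) are deliberately left out of this sketch.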
Edd Robinson af3f7bc9cb Add new cache configuration value 2019-03-25 11:44:01 +00:00
Edd Robinson 4022db03c2 Provide explicit cache snapshot reasons 2019-03-25 11:44:01 +00:00
Edd Robinson c4cc3ca7bc Fix 2019-03-19 15:12:35 +00:00
Edd Robinson f383ec9225 Add ability to use report-tsm programmatically 2019-03-19 14:29:25 +00:00
Edd Robinson 3b39832ba5 Reduce garbage 2019-03-19 14:28:51 +00:00
Edd Robinson a6447b6ca5 Refactor tsm report for 2.0 2019-03-19 14:25:53 +00:00
Edd Robinson fdae1ae5ea Expose field key sep 2019-03-19 14:25:53 +00:00
Jacob Marble 603a1f26e0 use tracing.StartSpanFromContext 2019-03-07 12:12:31 -07:00
Jeff Wendling f53f9cd949 storage: detect conflicting types in a single batch of points
When the WAL was moved up, the validation that happened at the cache
was skipped. This moves the field type validation for a batch of
points up ahead of the WAL again.
2019-03-06 10:30:52 -07:00
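
As a rough illustration of the validation described in f53f9cd949, the sketch below rejects a batch in which the same field key appears with two different types. All names here (`FieldType`, `Point`, `validateBatch`, `ErrFieldTypeConflict`) are hypothetical and are not the storage package's API; the real check also has to consult types that are already stored, which this sketch ignores.

```go
package main

import (
	"errors"
	"fmt"
)

// FieldType is a hypothetical enumeration of field value types.
type FieldType int

const (
	Float FieldType = iota
	Integer
	String
)

// Point is a simplified point carrying one field key and its type.
type Point struct {
	Key  string
	Type FieldType
}

// ErrFieldTypeConflict is returned when one batch disagrees with itself.
var ErrFieldTypeConflict = errors.New("field type conflict within batch")

// validateBatch returns an error if any key appears with more than one
// type. It is meant to run before the batch is handed to the WAL.
func validateBatch(points []Point) error {
	seen := make(map[string]FieldType, len(points))
	for _, p := range points {
		if prev, ok := seen[p.Key]; ok && prev != p.Type {
			return fmt.Errorf("%w: key %q", ErrFieldTypeConflict, p.Key)
		}
		seen[p.Key] = p.Type
	}
	return nil
}

func main() {
	batch := []Point{
		{Key: "cpu#usage", Type: Float},
		{Key: "cpu#usage", Type: String},
	}
	err := validateBatch(batch)
	fmt.Println(errors.Is(err, ErrFieldTypeConflict))
}
```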
Jacob Marble b9c7ec439e feat(influxd): Tracing refactor (#12318)
* feat(launcher): Tracing to log disabled by default

* remove traceLogger and use opentracing directly

* add Jaeger tracing

* go vet && go fmt
2019-03-04 11:48:11 -08:00
Jeff Wendling 0fae44e219 storage: fix problems with keeping resources alive
This commit adds the pkg/lifecycle.Resource to help manage opening,
closing, and leasing out references to some resource. A resource
cannot be closed until all acquired references have been released.
If the debug_ref tag is enabled, all resource acquisitions keep
track of the stack trace that created them and have a finalizer
associated with them to print on stderr if they are leaked. It also
registers a handler on SIGUSR2 to dump all of the currently live
resources.

Having resources tracked in a uniform way with a data type allows us
to do more sophisticated tracking with the debug_ref tag, as well.
For example, we could panic the process if a resource cannot be
closed within a certain time frame, or attempt to figure out the
DAG of resource ownership dynamically.

This commit also fixes many issues around resources, correctness
during error scenarios, reporting of errors, idempotency of
close, tracking of memory for some data structures, resource leaks
in tests, and out of order dependency closes in tests.
2019-02-28 10:22:01 -07:00
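
The core rule stated in 0fae44e219, that a resource cannot be closed until all acquired references have been released, can be sketched as below. This is not the pkg/lifecycle API: the `Resource` and `Reference` types and their methods are hypothetical, and the debug_ref stack tracking, finalizers, and SIGUSR2 handler are omitted.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errClosed = errors.New("resource is closed")

// Resource is a hypothetical reference-tracked resource.
type Resource struct {
	mu     sync.Mutex
	opened bool
	refs   sync.WaitGroup
}

// Reference is a lease on a Resource. Release is safe to call repeatedly.
type Reference struct {
	res  *Resource
	once sync.Once
}

// Open marks the resource as available for Acquire.
func (r *Resource) Open() {
	r.mu.Lock()
	r.opened = true
	r.mu.Unlock()
}

// Acquire hands out a reference, or fails if the resource is not open.
func (r *Resource) Acquire() (*Reference, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if !r.opened {
		return nil, errClosed
	}
	r.refs.Add(1)
	return &Reference{res: r}, nil
}

// Close stops new acquisitions and blocks until every outstanding
// reference has been released.
func (r *Resource) Close() {
	r.mu.Lock()
	r.opened = false
	r.mu.Unlock()
	r.refs.Wait()
}

// Release gives the reference back; only the first call has an effect.
func (ref *Reference) Release() {
	ref.once.Do(ref.res.refs.Done)
}

func main() {
	res := &Resource{}
	res.Open()
	ref, err := res.Acquire()
	if err != nil {
		fmt.Println(err)
		return
	}

	done := make(chan struct{})
	go func() { res.Close(); close(done) }()

	ref.Release() // Close cannot finish before this
	<-done

	_, err = res.Acquire()
	fmt.Println(err)
}
```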
Jacob Marble 4e5253d581 Feat/add zeros to tsm filename (#12174)
* unit tests to confirm The Old Way®

* feat: Increase TSM generation max value to 1 trillion
2019-02-27 14:59:38 -08:00
Jeff Wendling 3bb765279b storage: respond to review comments 2019-02-04 12:26:26 -07:00
Jeff Wendling b4823d11bf storage: double check the cache to avoid deleting keys that still exist 2019-02-04 10:58:17 -07:00
Jeff Wendling 3014733b20 chore: fix staticcheck issues 2019-02-04 10:32:52 -07:00
Jeff Wendling a424bf3e4c tsm1: implement DeleteBucketRange for the Cache 2019-02-04 10:32:52 -07:00
Jeff Wendling 376b347d56 wal: change deletes to be based on DeleteBucket 2019-02-04 10:32:52 -07:00
Jeff Wendling 7f54e816e3 refactor: have retention use DeleteBucketRange 2019-02-04 10:32:52 -07:00
Jeff Wendling aa12144fc7 storage: replay the WAL through the whole engine 2019-02-04 10:32:52 -07:00
Jeff Wendling 6deced1215 refactor: make the WAL part of snapshots again 2019-02-04 10:32:52 -07:00
Jeff Wendling 2989936d5a refactor: write to the WAL again 2019-02-04 10:32:52 -07:00
Jeff Wendling a3e66755ca refactor: move value aliases into its own file 2019-02-04 10:32:52 -07:00
Jeff Wendling 2f46937527 refactor: move value package up to tsdb 2019-02-04 10:32:52 -07:00
Jeff Wendling d2ddd48eea refactor: hook up metrics and wal to storage engine
It turns out that LastModified and DiskSize are unused, and so it
was easy to change to not care about the WAL.

This hooks up metrics and starts the WAL again.
2019-02-04 10:32:52 -07:00
Jeff Wendling 95de3d52b2 refactor: use concrete WAL in tsm1
At the cost of some nil checks, we no longer need an interface, an empty
implementation, or defenses against subtle bugs with nils in non-nil interfaces.

Also, the tsm1 engine is losing the WAL anyway.
2019-02-04 10:32:52 -07:00
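
The "nils in non-nil interfaces" problem mentioned in 95de3d52b2 is a general Go behavior: an interface value that holds a nil pointer is itself not nil. The sketch below shows it with hypothetical `Log` and `WAL` types (not the real tsm1 types), together with the kind of nil check on a concrete pointer that the commit trades for.

```go
package main

import "fmt"

// Log is a hypothetical interface that a WAL might have satisfied.
type Log interface {
	Write(b []byte) error
}

// WAL is a hypothetical concrete type used through a pointer.
type WAL struct {
	writes int
}

// Write tolerates a nil receiver, which is the "nil check" cost of
// using the concrete type directly.
func (w *WAL) Write(b []byte) error {
	if w == nil {
		return nil // WAL disabled: nothing to do
	}
	w.writes++
	return nil
}

// asInterface wraps a (possibly nil) *WAL in the interface.
func asInterface(w *WAL) Log { return w }

// isNil reports whether the interface value itself is nil.
func isNil(l Log) bool { return l == nil }

func main() {
	var w *WAL // nil pointer

	fmt.Println(w == nil)                    // the pointer is nil
	fmt.Println(isNil(asInterface(w)))       // the interface holding it is not
	fmt.Println(asInterface(w).Write(nil))   // still callable thanks to the nil check
}
```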
Jeff Wendling c9bb55b889 refactor: move the tsm1/wal into the storage/wal package
Because the WAL relies on the tsm1.Value type, we move that into its own
tsm1/value package and set up some aliases forwarding them into tsm1. This
also required adding some methods and changing consumers to avoid the
unexported fields. I imagine this step will be useful one day when we make
the write path more efficient with respect to consuming points.

This commit additionally fixes some issues with generation. The iterator.tmpldata
and generation for array_cursor_* were removed accidentally when removing
iterators, making those generated files stale. Restore that and regenerate.

No change in functionality.
2019-02-04 10:32:52 -07:00
Edd Robinson 07b8eacf34 Fix bucket delete for all buckets
If a bucket had bytes in it that would be escaped by the models
parser/package, then the index would not be correctly purged of those
series data when the bucket was dropped.
2019-01-18 17:28:58 +00:00
Edd Robinson 9ff65f6016 Track deleted series ids to remove from series file
Previously series that were being removed were tracked at the key level.
This means that when removing them from the series file, the series id
first had to be looked up. This can cause lock thrashing when there are
many series ids to look up (such as with a bulk delete), because there
are no bulk methods to do this.

This commit changes how the series file delete is done by extracting
the series ids from the index before we remove the index entries. It's
then possible to delete all those series ids from the series file
without having to look up the ids.
2019-01-15 11:45:10 +00:00
Edd Robinson b025d9afa9 Improve efficiency of TSI index series drop
This commit improves the performance of a mass delete on the TSI index
by deleting at the measurement level instead of deleting each series
individually.
2019-01-14 12:46:55 +00:00
Edd Robinson c7d26d8950 Rename delete method 2019-01-14 11:23:13 +00:00
Mark Rushakoff d73d73c0d4 chore: rename imports from platform to influxdb
I did this with a dumb editor macro, so some comments changed too.

Also rename root package from platform to influxdb.

In the interest of minimizing risk, anyone importing the root package has
now aliased it to "platform" so that no changes beyond imports were
necessary in those files.

Lastly, replace the old platform module to local path /dev/null so that
nobody can accidentally reintroduce a platform dependency while
migrating platform code to influxdb.
2019-01-09 20:51:47 -08:00
Jeff Wendling 703c3c15ca Hook up DeleteBucket to the tsm1 engine 2019-01-09 15:24:26 -07:00
Jeff Wendling b5bfb836c0 tsm1: remove unsafe in prefixTree 2019-01-09 12:43:01 -07:00
Jeff Wendling e503ef40d1 tsm1: add comments responding to review feedback 2019-01-09 11:35:06 -07:00
Jeff Wendling 73c0ea410e tsm1: add test for engine DeletePrefix 2019-01-09 10:56:10 -07:00
Jeff Wendling 0a85e3b0dd tsm1: add initial index cleanup to DeletePrefix 2019-01-08 16:32:43 -07:00
Jeff Wendling 0fe2f02812 tsm1: initial DeletePrefix impl 2019-01-08 16:03:34 -07:00
Jeff Wendling f712828016 tsm1: refactor and rename some methods 2019-01-08 14:52:30 -07:00
Jeff Wendling 8744a82665 tsm1: add DeletePrefix to the reader 2019-01-07 21:11:49 -07:00
Jeff Wendling f65b0933f6 tsm1: move code around into smaller files and add tests 2019-01-07 21:11:49 -07:00
Jeff Wendling fed3154506 tsm1: DeletePrefix on the indirectIndex 2019-01-07 21:08:32 -07:00
Jeff Wendling ad5352926f tsm1: log when error reading entries for tsm key 2019-01-07 11:00:35 -07:00
Jeff Wendling 9cdefa8e4f tsm1: fix staticcheck and refactor closure out 2019-01-07 11:00:35 -07:00
Jeff Wendling 1ffcd77342 tsm1: fix remaining issues and add small benchmarks
- notice when keys are deleted during iteration and return an error
- make sure all the consumers check the error
- add some benchmarks for small indexes to compare
- allow concurrent readers to flag deletes

benchmarks against base:

name                                           old time/op    new time/op    delta
IndirectIndex_UnmarshalBinary-8                  70.0ms ±17%    71.0ms ±12%      ~     (p=1.000 n=8+8)
IndirectIndex_DeleteRangeLast-8                  1.48µs ± 1%    0.28µs ± 5%   -81.29%  (p=0.000 n=8+7)
IndirectIndex_DeleteRangeFull/Large-8             786ms ± 1%     363ms ± 3%   -53.89%  (p=0.000 n=7+8)
IndirectIndex_DeleteRangeFull/Small-8            2.37ms ± 0%    1.14ms ± 3%   -52.02%  (p=0.000 n=7+8)
IndirectIndex_DeleteRangeFull_Covered/Large-8     384ms ± 2%     188ms ± 3%   -51.04%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull_Covered/Small-8     470µs ± 1%     190µs ± 1%   -59.71%  (p=0.000 n=8+7)
IndirectIndex_Delete/Large-8                     74.0ms ± 1%   128.7ms ± 1%   +73.80%  (p=0.001 n=7+7)
IndirectIndex_Delete/Small-8                      142µs ± 1%     130µs ± 1%    -8.24%  (p=0.000 n=8+8)

name                                           old alloc/op   new alloc/op   delta
IndirectIndex_UnmarshalBinary-8                  11.6MB ± 0%    11.7MB ± 0%    +0.02%  (p=0.000 n=8+7)
IndirectIndex_DeleteRangeLast-8                  3.26kB ± 0%   0.00kB ±NaN%  -100.00%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Large-8             233MB ± 0%     161MB ± 0%   -30.75%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Small-8            2.13MB ± 0%    1.40MB ± 0%   -34.53%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull_Covered/Large-8    12.4MB ± 0%     0.4MB ± 0%   -96.82%  (p=0.002 n=7+8)
IndirectIndex_DeleteRangeFull_Covered/Small-8     120kB ± 0%       0kB ± 0%   -99.89%  (p=0.000 n=8+8)
IndirectIndex_Delete/Large-8                     4.54kB ± 0%    0.21kB ± 0%   -95.26%  (p=0.000 n=8+8)
IndirectIndex_Delete/Small-8                      80.0B ± 0%     0.0B ±NaN%  -100.00%  (p=0.000 n=8+8)

name                                           old allocs/op  new allocs/op  delta
IndirectIndex_UnmarshalBinary-8                    35.0 ± 0%      42.0 ± 0%   +20.00%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeLast-8                    3.00 ± 0%     0.00 ±NaN%  -100.00%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Large-8             1.53M ± 0%     0.52M ± 0%   -65.98%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull/Small-8             15.2k ± 0%      5.2k ± 0%   -65.97%  (p=0.000 n=8+8)
IndirectIndex_DeleteRangeFull_Covered/Large-8       620 ± 0%       124 ± 0%   -80.00%  (p=0.002 n=7+8)
IndirectIndex_DeleteRangeFull_Covered/Small-8      10.0 ± 0%       2.0 ± 0%   -80.00%  (p=0.000 n=8+8)
IndirectIndex_Delete/Large-8                        246 ± 0%         1 ± 0%   -99.59%  (p=0.000 n=8+8)
IndirectIndex_Delete/Small-8                       4.00 ± 0%     0.00 ±NaN%  -100.00%  (p=0.000 n=8+8)
2019-01-07 11:00:35 -07:00
Jeff Wendling 14cf01911e tsm1: change TSMFile to use an iterator style api 2019-01-07 11:00:35 -07:00