507 Commits (dd34f5fd9d43c87f690baba0927642f1e76befe6)

Author SHA1 Message Date
Stuart Carnie 8753a7fd08 chore: Fix invalid string casts from integers
Newer Go versions generate a compile time error
2020-09-16 11:55:20 -07:00
Ayan George ca2055c16c
refactor: Replace ctx.Done() with ctx.Err() (#19546)
* refactor: Replace ctx.Done() with ctx.Err()

Prior to this commit we checked for context cancellation with a select
block and context.Context.Done() without multiplexing over any other
channel like:

  select {
    case <-ctx.Done():
      // handle cancellation
    default:
      // fallthrough
  }

This commit replaces those types of blocks with a simple check of
ctx.Err().  This has the following benefits:

* Calling ctx.Err() is much faster than entering a select block.

* ctx.Done() allocates a channel when called for the first time.

* Testing the result of ctx.Err() is a reliable way of determining if
  a context.Context value has been canceled.

* fix: Fix data race in execDeleteTagValueEntry()
2020-09-16 12:20:09 -04:00
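The two cancellation checks compared in the commit message above can be sketched side by side. This is an illustrative sketch only: `canceledSelect`, `canceledErr`, and `compare` are hypothetical helper names, not functions from the repository.

```go
package main

import (
	"context"
	"fmt"
)

// canceledSelect reports cancellation using the older pattern: a select
// over ctx.Done() with a default case and no other channel.
func canceledSelect(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// canceledErr reports cancellation with a direct ctx.Err() check.
// ctx.Err() returns nil until the context is canceled or times out.
func canceledErr(ctx context.Context) bool {
	return ctx.Err() != nil
}

// compare runs both checks before and after cancelling a fresh context.
func compare() (selBefore, errBefore, selAfter, errAfter bool) {
	ctx, cancel := context.WithCancel(context.Background())
	selBefore, errBefore = canceledSelect(ctx), canceledErr(ctx)
	cancel()
	selAfter, errAfter = canceledSelect(ctx), canceledErr(ctx)
	return
}

func main() {
	selBefore, errBefore, selAfter, errAfter := compare()
	fmt.Println(selBefore, errBefore, selAfter, errAfter)
}
```

Both helpers agree before and after cancellation; the `ctx.Err()` form simply avoids the select statement.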
Stuart Carnie 04cff2a8d2 fix: Unit test 2020-09-04 15:56:57 -07:00
Stuart Carnie a24edb2b1c
chore: Skip tests on circleci
This is derived from 2fd8264 and 4f850b5, which skips tests on appveyor
2020-08-31 12:14:27 -07:00
Stuart Carnie f2205b37aa
chore: Skip TSI cardinality tests on circleci
This is derived from 793635d, which skips tests on appveyor
2020-08-31 12:11:04 -07:00
Stuart Carnie b1b6c1047a
chore: remove t.Parallel() in an attempt to make CircleCI happy 2020-08-31 10:45:23 -07:00
Brett Buddin b917d8d9b0
chore(influxdb): Placate the linter. 2020-08-27 15:46:32 -04:00
Stuart Carnie dee8977d2c
chore: move v2/v1/tsdb → v2/tsdb 2020-08-26 10:46:47 -07:00
Edd Robinson 2b175291be
refactor: WIP removing tsdb 2020-08-03 09:18:34 -07:00
Stuart Carnie e3060c291c
refactor: tsdb store builds and runs 2020-08-03 09:18:32 -07:00
Stuart Carnie 92efddbfbe
chore(tsdb): Initial commit of tsdb package
* pulls in 1.x tsdb, compiles and passes test
2020-08-03 09:17:23 -07:00
Ben Johnson 14a82ee65d fix(tsdb): Fix mincore wait() out of bounds calls 2020-07-27 11:48:39 -06:00
Ben Johnson 3cc2638bbf feat(tsi1): Add optional mincore limiter to TSI 2020-07-22 10:17:42 -06:00
Gavin Cabbage 3c6b728702
chore: use go generate to download large tsdb testdata (#18993)
* chore: use go generate to download large tsdb testdata

* chore(gitignore): TSM/TSI verbiage
2020-07-22 11:29:22 -04:00
Ben Johnson c476da2153
Merge pull request #18982 from influxdata/mincore-limiter
feat(mincore): Add page fault limiter
2020-07-17 12:22:54 -06:00
Ben Johnson c28eb70856 feat(mincore): Add page fault limiter
This commit adds `mincore.Limiter` which throttles page faults caused
by mmap() data. It works by periodically calling `mincore()` to determine
which pages are not resident in memory and using `rate.Limiter` to
throttle accessing using a token bucket algorithm.
2020-07-17 09:37:31 -06:00
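The throttling idea in the commit above (spend a token for each access to a page that is not resident) can be sketched with a hand-rolled token bucket. Everything here is an assumption made for illustration: `tokenBucket` stands in for `rate.Limiter`, the `resident` slice stands in for the result of a `mincore()` call, and the sketch rejects an access when no token is available, whereas a real limiter would more likely wait.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// tokenBucket is a minimal, hypothetical stand-in for rate.Limiter.
type tokenBucket struct {
	tokens float64   // tokens currently available
	burst  float64   // maximum number of tokens the bucket can hold
	perSec float64   // refill rate in tokens per second
	last   time.Time // time of the previous refill
}

// allow refills the bucket for the time elapsed since the last call and
// then tries to take one token.
func (b *tokenBucket) allow(now time.Time) bool {
	elapsed := now.Sub(b.last).Seconds()
	b.last = now
	b.tokens = math.Min(b.burst, b.tokens+elapsed*b.perSec)
	if b.tokens >= 1 {
		b.tokens--
		return true
	}
	return false
}

// admitted counts how many non-resident page accesses get a token at time
// now. resident[i] is an assumed residency flag for page i; resident pages
// cause no page fault and therefore cost nothing.
func admitted(resident []bool, b *tokenBucket, now time.Time) int {
	n := 0
	for _, inMemory := range resident {
		if !inMemory && b.allow(now) {
			n++
		}
	}
	return n
}

// simulate touches two batches of pages one second apart with a bucket
// that holds 2 tokens and refills at 1 token per second.
func simulate() (first, second int) {
	start := time.Unix(0, 0)
	b := &tokenBucket{tokens: 2, burst: 2, perSec: 1, last: start}
	first = admitted([]bool{true, false, false, false, true}, b, start)
	second = admitted([]bool{false, false}, b, start.Add(time.Second))
	return first, second
}

func main() {
	first, second := simulate()
	fmt.Println(first, second)
}
```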
Gavin Cabbage ef3ee96eea
chore: download tsi1 testdata with go generate (#18972)
* chore: remove tsi1 testdata and add go generate file to download

* chore: fix testdata url and rename gen file

* fix: add testdata generate command to Makefile

* chore: add testdata dir to gitignore

* refactor(tsdb): improve error message when missing testdata

* refactor(tsdb): tagged testdata and avoid stacktrace when missing
2020-07-17 11:31:29 -04:00
ricky dcf995922c test: set bigger max size of cache in TestConcurrentReadAfterWrite 2020-07-16 10:05:30 +08:00
ricky 9e82797a38 fix: missing data when reading after writing 2020-07-15 14:49:42 +08:00
Phil Bracikowski 25461dddcd
chore(testing): add missing defer to clean up test temp files (#18948) 2020-07-14 13:52:28 -07:00
Stuart Carnie 99bbbd3e4e
fix(storage): Reduce the check frequency
Checking a channel too regularly could cause
context switching to other goroutines. In tight loops,
it is prudent to check, but to do so less frequently so
as to avoid thrashing.
2020-07-09 18:44:00 -07:00
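A tight loop that checks for cancellation only every so many iterations, as the commit above describes, might look like the following sketch. The interval `checkEvery` and the function names are hypothetical; the commit message does not state the interval it uses.

```go
package main

import (
	"context"
	"fmt"
)

// checkEvery is an assumed interval chosen for illustration only.
const checkEvery = 1024

// sumUntilCanceled adds up values but polls the context's Done channel
// only once every checkEvery iterations instead of on every iteration.
func sumUntilCanceled(ctx context.Context, values []int64) (int64, error) {
	var sum int64
	for i, v := range values {
		if i%checkEvery == 0 {
			select {
			case <-ctx.Done():
				return sum, ctx.Err()
			default:
			}
		}
		sum += v
	}
	return sum, nil
}

// run sums n ones, optionally cancelling the context before the loop starts.
func run(n int, cancelFirst bool) (int64, bool) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	values := make([]int64, n)
	for i := range values {
		values[i] = 1
	}
	sum, err := sumUntilCanceled(ctx, values)
	return sum, err != nil
}

func main() {
	sum, canceled := run(5000, false)
	fmt.Println(sum, canceled)
	sum, canceled = run(5000, true)
	fmt.Println(sum, canceled)
}
```

The trade-off is latency: a cancellation may go unnoticed for up to `checkEvery - 1` iterations.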
Brett Buddin 51406f4f62
feat(tsdb): SHOW TAG KEYS (no time) query using only TSI data. (#18905)
* feat(tsdb): SHOW TAG KEYS (no time) query using only TSI data.

* fix(tsdb): Allow for earlier return when scanning during show tag keys.

* fix(tsdb): Speed things up by using the key merger to reduce allocs.

* chore(tsm1): Fix golint.

* fix(tsdb): Remove sorting, because these keys should already be sorted.

* fix(tsdb): Remove dead code to placate the linter.
2020-07-09 18:01:42 -04:00
Ben Johnson be98fe3a81
Merge pull request #18901 from influxdata/tsm1-file-stat-created-at
feat(tsdb): Add CreatedAt field for tsm1.FileStat
2020-07-09 14:13:00 -06:00
jlapacik 49bdad8681 fix: descending array cursor should include end time
Fixes https://github.com/influxdata/influxdb/issues/18897.
2020-07-09 12:22:25 -07:00
jlapacik e6e55038e8 test: descending array cursor should include end time 2020-07-09 12:22:25 -07:00
Stuart Carnie d2dd19b70e
feat(storage): InfluxQL schema APIs without time range
These changes introduce optimized schema APIs for InfluxQL that
utilize the time series index (TSI) exclusively for significant
performance gains.
2020-07-09 10:09:19 -07:00
Ben Johnson 3fe7c63a0a feat(tsdb): Add CreatedAt field for tsm1.FileStat
This commit adds a "created at" field to `tsm1.FileStat` which
uses the `ModTime()` of the TSM file but excludes any updates
for tombstone files.
2020-07-09 10:38:59 -06:00
Gavin Cabbage 34ebc852c0
fix(tsm1): delimit tsmKeyPrefix with appended comma (#18785)
* fix(tsm1): delimit tsmKeyPrefix with appended comma

Fixes #7589.

Append a comma to the TSM key prefix when matching a full measurement name to avoid erroneously matching other measurement names that include the prefix in their own name. For example, this prevents matching a measurement "cpu1" when targeting "cpu" by updating the prefix to "cpu,". This relies on the fact that tag key-value pairs are separated by commas.

* fix(tsm1): regression tests for tsmKeyPrefix comma delimiting
2020-07-01 12:24:54 -04:00
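The prefix problem fixed by the commit above can be reproduced in a few lines. This is a simplified sketch: `naiveMatch` and `delimitedMatch` are hypothetical names, and the series keys are plain `measurement,tag=value` strings rather than the engine's actual key encoding.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveMatch treats the bare measurement name as the key prefix, so "cpu"
// also matches keys that belong to "cpu1".
func naiveMatch(seriesKey, measurement string) bool {
	return strings.HasPrefix(seriesKey, measurement)
}

// delimitedMatch appends a comma to the measurement name before matching.
// It assumes every series key has at least one tag after the measurement,
// so the name is always followed by a comma.
func delimitedMatch(seriesKey, measurement string) bool {
	return strings.HasPrefix(seriesKey, measurement+",")
}

func main() {
	for _, key := range []string{"cpu,host=a", "cpu1,host=a"} {
		fmt.Println(key, naiveMatch(key, "cpu"), delimitedMatch(key, "cpu"))
	}
}
```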
Brett Buddin 0c268e205b
fix(storage): Push-down a predicate to match tags for SHOW MEASUREMENTS calls (#18740)
* fix(storage): Push-down a predicate to match tags for SHOW MEASUREMENTS calls.

* chore: Address feedback.

* fix(tsm1): Split behavior based on existence of predicate for show measurements.

* fix(tsm1): Allow parenthesis expression on the LHS of a predicate.

* fix(tsm1): Create a separate tag predicate verifier that rejects negative comparisons.

* fix(tsm1): Additional test cases for show measurements with predicate.
2020-06-29 14:31:54 -04:00
Jonathan A. Sternberg 5aeca082c8
chore: update staticcheck and fix newly identified lint checks (#18737) 2020-06-26 18:54:09 -05:00
Ben Johnson 171f6586a0 fix(tsdb): Add refs for file-sourced tag keys
This commit adds ref counting for files that we pull tag keys from.
Previously, files were only ref counted during the time we extracted
tag keys but this commit adds additional ref counting for the life of
the `Engine.tagKeysNoPredicate()` function.
2020-06-17 10:27:23 -06:00
Ben Johnson 69fe9ed1ba
Merge pull request #17769 from patriczek/iss17257
fix: Migrated bucket should have correct retention policy.
2020-04-20 13:40:15 -06:00
Patrik Helia 07c89c9188 Fix fmt and reduce code
Signed-off-by: Patrik Helia <patrik.helia@kiwi.com>
2020-04-20 21:25:38 +02:00
Stuart Carnie c76f30682c
fix(storage): Feedback in response to PR review
* Adds clarifying documentation
* Regenerate protocol buffers with updated documentation
2020-04-16 15:19:28 -07:00
Stuart Carnie 6325591deb
feat(storage): New data types for measurement schema gRPC APIs
This commit

* adds new request and response data types for schema gRPC calls
* adds fmt.Stringer implementation to cursors.FieldType
* adds APIs to sort a slice of MeasurementField values
* upgrades the gogo protobuf package to v1.3.1, which
  includes improvements to serialization.
2020-04-16 14:51:31 -07:00
Stuart Carnie 69820c08a4
feat(tsdb): Add maximum timestamp to MeasurementField
This is required in order to correctly merge results from multiple
sources.
2020-04-16 14:51:30 -07:00
Patrik Helia 7ce7e62f60 fix: Migrated bucket should have correct retention policy.
Signed-off-by: Patrik Helia <patashelia@gmail.com>
2020-04-16 21:35:48 +02:00
Stuart Carnie 21e339a32f
chore(storage): Fix documentation to reflect correct time interval 2020-04-14 11:04:56 -07:00
Stuart Carnie fe0ed6cb7e
feat(storage): Provide public MeasurementFields API 2020-04-14 10:49:16 -07:00
Stuart Carnie cb618efc65
feat(tsm1): Implementation of MeasurementFields
This commit provides an implementation of the MeasurementFields
API per the design previously outlined.
2020-04-08 16:15:34 -07:00
Stuart Carnie 7de6383adf
refactor(tsm1): Allow race-free access to cache
This commit adds a new API to `Cache` to address data races
with the `TagKeys` and `TagValues` APIs.

`Cache` and `entry` provide `AppendTimestamps`, which
appends the current timestamps to the provided slice
to reduce allocations. As noted in the documentation,
it is the responsibility of the caller to sort and deduplicate
the values, if required.

The `cursors.TimestampArray` type was extended to permit
use of the `sort.Sort` API.
2020-04-08 16:15:05 -07:00
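The caller-side contract described in the commit above (append timestamps into a reusable slice, then sort and deduplicate if needed) can be sketched as follows. The types and functions here are hypothetical stand-ins: `timestamps` imitates a slice type that satisfies `sort.Interface`, and `appendTimestamps` imitates an append-style API; neither is the repository's actual code.

```go
package main

import (
	"fmt"
	"sort"
)

// timestamps is a hypothetical stand-in for a timestamp array type that
// implements sort.Interface so it can be passed to sort.Sort.
type timestamps []int64

func (t timestamps) Len() int           { return len(t) }
func (t timestamps) Less(i, j int) bool { return t[i] < t[j] }
func (t timestamps) Swap(i, j int)      { t[i], t[j] = t[j], t[i] }

// appendTimestamps appends src to dst and returns the extended slice. It
// reuses dst's capacity when possible and leaves ordering and duplicates
// to the caller.
func appendTimestamps(dst, src []int64) []int64 {
	return append(dst, src...)
}

// sortDedup sorts ts in place and drops adjacent duplicates, reusing the
// same backing array for the result.
func sortDedup(ts []int64) []int64 {
	sort.Sort(timestamps(ts))
	out := ts[:0]
	for _, v := range ts {
		if len(out) == 0 || out[len(out)-1] != v {
			out = append(out, v)
		}
	}
	return out
}

func main() {
	buf := make([]int64, 0, 8) // reusable buffer owned by the caller
	buf = appendTimestamps(buf, []int64{30, 10})
	buf = appendTimestamps(buf, []int64{20, 10, 30})
	fmt.Println(sortDedup(buf))
}
```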
Stuart Carnie 31df76e1e9
refactor(tsm1): Add TimeRangeMaxTimeIterator
This commit introduces a new API for finding the maximum
timestamp of a series when iterating over the keys in a
set of TSM files.

This API will be used to determine the field type of a single
field key by selecting the series with the maximum timestamp.

It has also refactored the common functionality for iterating
TSM keys into `timeRangeBlockReader`, which is shared
between `TimeRangeIterator` and `TimeRangeMaxTimeIterator`.
2020-04-08 16:05:19 -07:00
Jonathan A. Sternberg 6e4cf7ffef
refactor: fix imports from go template files (#17615) 2020-04-03 17:40:36 -05:00
Jonathan A. Sternberg 0ae8bebd75
refactor: rewrite imports to include the /v2 suffix for version 2 2020-04-03 12:39:20 -05:00
Stuart Carnie 069820ba4b
fix(models): Added error return value; use iota; fix spelling 2020-04-02 08:34:22 -07:00
Stuart Carnie d424d7d1f5
feat(tsdb): Add new measurement based schema APIs
These APIs require a measurement, permitting an additional optimization
to reduce the search space against the TSM index. Specifically, the
search key prefix is extended from `org+bucket` to
`org+bucket,\x00=<measurement>`

* MeasurementNames
* MeasurementTagKeys
* MeasurementTagValues
* Adds an api to the models package for efficiently parsing the
  measurement tag (\x00) from a normalized series key
2020-04-02 08:33:58 -07:00
Stuart Carnie 37a97437e7
fix: Invariant violated: mixed block types for a single series
The root cause is that the Unsigned data type has no representation
in the valueType function in the cache and falls back to the default
case of 0.

0 is also a sentinel value in the entry#add function that will
result in skipping the value type check.

It is therefore possible that unsigned values followed by some other
data type are stored in the cache.

It is suspected that the write may be rejected before reaching the
cache, and therefore may not occur in practice. Specifically, the
series file stores the data types on a per-series basis and would
reject the write.

This commit turns the value types into explicit constants and
ensures all existing block types are represented. In addition,
it adds a mapping function to convert these to a known Block type,
which will be used by the `MeasurementFields` schema request to
determine the type of a series in the cache.
2020-04-01 18:42:22 -07:00
Ben Johnson 7d72b4e511 feat(tsdb): Bulk delete series performance improvement 2020-03-18 15:47:35 -06:00
Edd Robinson d96cbd4f74
Merge pull request #17016 from influxdata/er-bulk-import
feat(storage): prototype 1.x–2.x migration tooling
2020-03-18 17:57:26 +00:00
Jacob Marble 679215de97
chore: Revert "refactor(tsdb): remove read from unexported field (#17279)" (#17305)
This reverts commit 0ec2b453b9.

Fixes panic.
2020-03-16 17:48:01 -07:00
Jacob Marble 0ec2b453b9
refactor(tsdb): remove read from unexported field (#17279)
* refactor(tsdb): remove read from unexported field

* fix(tsdb): add regression test to check for panic

* fix(tsdb): detect nil without panic
2020-03-16 14:26:14 -07:00
Jacob Marble 386098da36
refactor(storage): move and remove to help cleanup tsdb package (#17275)
* refactor(tsdb): move series file config to seriesfile package

* refactor(tsdb): removed unchecked const EOF

* refactor(tsdb): unexport errors

* refactor(tsdb): remove unused TagValueIterators

* refactor(tsdb): remove SeriesIDIterator usage in tsdb/seriesfile

* refactor(tsdb): remove one-use MeasurementIterators

* refactor(tsdb): remove unused type measurementSliceIterator

* refactor(tsdb): remove unused types TagKeyIterators and tagKeySliceIterator

* refactor(storage): remove unused method Engine.ApplyFnToSeriesIDSet

* refactor(tsdb): rename AllSeriesIDs() -> SeriesIDs()
2020-03-16 12:23:15 -07:00
Jacob Marble 7dbc07beda
chore: Revert "refactor(storage): move and remove to help cleanup tsdb package (#17241)" (#17272)
This reverts commit 4b8a71b97f.

Fixes incident #inc-aws-error-rate-spi-5e6c1423
2020-03-13 17:14:51 -07:00
Jacob Marble 4b8a71b97f
refactor(storage): move and remove to help cleanup tsdb package (#17241)
* refactor(tsdb): move series file config to seriesfile package

* refactor(tsdb): removed unchecked const EOF

* refactor(tsdb): unexport errors

* refactor(tsdb): remove unused TagValueIterators

* refactor(tsdb): remove SeriesIDIterator usage in tsdb/seriesfile

* refactor(tsdb): remove one-use MeasurementIterators

* refactor(tsdb): remove unused type measurementSliceIterator

* refactor(tsdb): remove unused types TagKeyIterators and tagKeySliceIterator

* refactor(storage): remove unused method Engine.ApplyFnToSeriesIDSet

* refactor(tsdb): remove read from unexported field
2020-03-13 13:04:58 -07:00
Edd Robinson 5b437a2966 refactor: fix build 2020-03-13 15:24:53 +00:00
Edd Robinson 08add490e0 fix: ensure buckets are created properly 2020-03-13 11:00:28 +00:00
Edd Robinson bbe40aeb82 feat: prototype 1.x - 2.x migration tool 2020-03-13 11:00:28 +00:00
Jacob Marble 26ca766459
refactor(tsdb): move series file to its own package (#17224)
* refactor(storage): move type ByTagKey to the only package that uses it

* refactor(tsdb): use types in tsdb/cursors

* refactor(tsdb): remove unused type SeriesIDElems

* refactor(tsdb): inline only use of tsdb.ReadAllSeriesIDIterator

* refactor(tsdb): move series file to its own package

* refactor(storage): remove platform->influxdb aliases
2020-03-12 11:32:52 -07:00
Jacob Marble cdbf532f57
refactor(storage): remove dead code and rename a few things (#17217)
* refactor(storage): remove CursorIterators type

* refactor(storage): remove unused tsdb.MarshalTags()

* refactor(storage): remove unused package tsdb/internal

* refactor(storage): rename tsdb/metrics.go to tsdb/series_file_metrics.go

* refactor(storage): remove unused type tagValueSliceIterator

* refactor(storage): rename field row to seriesRow

* refactor(storage): rename tsdb/index.go to tsdb/series_iterators.go
2020-03-12 10:45:48 -07:00
Jacob Marble b91e3f36ab
refactor(hll): remove unused Sketch interface (#17218) 2020-03-12 08:59:05 -07:00
Ben Johnson 627b6f86bb feat(storage): Series file compaction 2020-03-11 19:31:58 -06:00
Ben Johnson ce47e57089 fix(tsdb): Fix predicate clone 2020-02-04 10:12:26 -07:00
Jacob Marble b836ab9c17
feat(storage): implement backup and restore (#16504)
* feat(backup): `influx backup` creates data backup

* feat(backup): initial restore work

* feat(restore): initial restore impl

Adds a restore tool which does offline restore of data and metadata.

* fix(restore): pr cleanup

* fix(restore): fix data dir creation

* fix(restore): pr cleanup

* chore: amend CHANGELOG

* fix: restore to empty dir fails differently

* feat(backup): backup and restore credentials

Saves the credentials file to backups and restores it from backups.

Additionally adds some logging for errors when fetching backup files.

* fix(restore): add missed commit

* fix(restore): pr cleanup

* fix(restore): fix default credentials restore path

* fix(backup): actually copy the credentials file for the backup

* fix: dirs get 0777, files get 0666

* fix: small review feedback

Co-authored-by: tmgordeeva <tanya@influxdata.com>
2020-01-21 14:22:45 -08:00
Stuart Carnie 13a248a4fb
fix(tsm1): Add multiple unit tests to verify correctness
This commit adds numerous tests for ascending and descending cursors
that generate merged blocks across multiple files, which exceed the
default fixed buffer size used by the array cursors (MaxPointsPerBlock).

Tests cover two scenarios:

1. Each file has one block and the block from the second file is
   entirely contained within the first block of the first file.
   When merging, the new block is 1200 values, which exceeds the
   MaxPointsPerBlock.

2. Each file has multiple blocks, and the blocks have a mixture of
   values which interleave and overwrite.
2020-01-19 22:53:58 -07:00
Edd Robinson 91551302f9 fix(storage): ensure all block data returned
This commit prevents multiple blocks for the same series key having
values truncated when they are being read into an empty buffer.

The current cursor reader code has an optimisation that incorrectly
assumes the incoming array will be limited to 1,000 values (the maximum
block size), but arrays can contain values from multiple matching
blocks.
2020-01-19 22:03:20 +00:00
Edd Robinson f11504b987 fix(storage): prevent infinite loop in matcher
Fixes #15817

This commit addresses a potential infinite loop, caused
by series keys that contain a certain pattern of escaped
characters.
2020-01-14 15:05:07 +00:00
Edd Robinson a06dc0fd7f fix(storage): prevent data-races on predicate
Fixes #15817

This commit addresses several data-races on the `tsm1.Predicate` type
that were causing a live-lock or similar in rare cases during a delete.

Because `tsm1/FileStore.Apply` executes concurrently across TSM files
the state of the delete's predicate was being unsafely mutated.

This commit adds a `Clone` method to the `influxdb.Predicate` type,
which should be used whenever an `influxdb.Predicate` implementation
needs to be used concurrently.
2020-01-09 10:00:25 +00:00
Jacob Marble 5f19c6cace
chore: Remove several instances of WithLogger (#15996)
* chore: Remove several instances of WithLogger

* chore: unexport Logger fields

* chore: unexport some more Logger fields

* chore: go fmt

chore: fix test

chore: s/logger/log

chore: fix test

chore: revert http.Handler.Handler constructor initialization

* refactor: integrate review feedback, fix all test nop loggers

* refactor: capitalize all log messages

* refactor: rename two logger to log
2019-12-04 15:10:23 -08:00
Edd Robinson 2f86815f83 fix(storage): ensure field is 64-bit aligned 2019-11-22 13:44:58 +00:00
Edd Robinson 7146af61b0 fix(storage): enable package to build on 32-bit arch 2019-11-22 12:55:20 +00:00
Edd Robinson 2471c2468c fix(storage): fixes panic when building predicates
Fixes #15916.

If a predicate was passed in with multiple key/value matches for the
same tag key, then the value index would be incorrect. This ensures that
each tag key can only be added to the location map once.
2019-11-15 15:07:36 +00:00
Edd Robinson 0dd2d38eac fix(tsi1): index defect with negated equality filters
Fixes #15859

This commit fixes a defect in the TSI index where a filter using the
negated equality operator would result in no matching series being
returned for series stored within the `IndexFile` portions of the index.

The root cause was missing legacy-handling code in the index for this
particular iterator.
2019-11-12 13:26:23 +00:00
George 3804d50fbd
fix(storage): array cursor iterator should return stats of all observed cursors (#15731)
* fix(storage): add failing test for array cursor iterator stats

* fix(storage): make arrayCursorIterator.Stats() return stats of in-focus cursor

* fix(storage): add failing test to assert arrayCursorIterator.Stats() returns accumulated result

* fix(storage): accumulate stats in arrayCursorIterator.Stats() call across all observed cursors
2019-11-05 10:41:06 +01:00
Christopher Wolff 04bc7bf76b test(tsdb): skip flaky test
https://github.com/influxdata/influxdb/issues/15220
2019-10-30 10:40:03 -07:00
Edd Robinson dc78d7c0eb
Merge pull request #14373 from zhulongcheng/add-missing-err
fix(tsdb): add missing err in SeriesPartition.Open
2019-10-24 13:13:32 +01:00
Edd Robinson 2727ae3c25 refactor: simplify Semaphore interface 2019-10-23 19:49:48 +01:00
Edd Robinson b6e911d72c refactor: move goroutine out to function 2019-10-23 19:49:46 +01:00
Edd Robinson 8f6701d4b1 feat(storage): add full compaction semaphore
By default this feature is disabled; the full compaction behaviour does
not change. When this feature is enabled compactions can be limited
across multiple storage engines running in multiple processes.

The mechanism by which this happens is not part of the abstraction added
here.
2019-10-23 19:45:01 +01:00
Edd Robinson ef1e15a0ad
Merge pull request #15318 from influxdata/er-mv-comp-limiter
feat(storage): allow compaction limiter to be injected into engine
2019-10-09 13:11:44 +01:00
Ilya Sevostyanov 596414a3ff
fix(storage): added missing string values for CacheStatus type.
Closes: #15284.
2019-10-04 23:50:21 +03:00
Edd Robinson 179c57ab2e feat(storage): allow compaction limiter to be injected 2019-10-04 12:35:21 -07:00
elbehery 663d4bb901 test(tasks): skip flaky test 2019-09-25 18:17:59 +02:00
elbehery c0b87c657c fix(storage): remove level=0 from TSM disk bytes metrics. 2019-09-25 15:57:25 +02:00
Lorenzo Affetti 053836e5a5
Merge pull request #15203 from influxdata/flux-staging-v0.48.x
build(flux): update to Flux v0.48.0
2019-09-20 18:24:02 +02:00
Edd Robinson d714be45a4
Merge pull request #15200 from influxdata/er-retention-service
refactor(storage): add more context to traces and logs
2019-09-20 09:00:00 +01:00
Lorenzo Affetti ab835c8e0e
refactor(dependencies): use new dependency injection framework (#15174)
refactor(dependencies): use new dependency injection framework
2019-09-19 17:01:17 +02:00
Edd Robinson e2f5b2bd9d refactor(storage): add more context to traces and logs 2019-09-19 13:48:06 +01:00
Stuart Carnie 9a89900785
fix(tsm1): Fix duplicate points
All seeks must be added to the c.current slice so the
min and max read values can be updated on each read pass.
2019-09-18 17:44:27 -07:00
Ben Johnson ee3cf79ae7
fix(tsdb): Fix pull request feedback. 2019-09-13 10:00:54 -06:00
Ben Johnson d08403b658
feat(tsdb): Add SQL export for TSI indexes 2019-09-13 10:00:54 -06:00
Mark Rushakoff c2f847299c ci: use latest staticcheck
We were still referring to megacheck in tools.go; this confused
dependent projects also using staticcheck.
2019-09-04 16:34:45 -07:00
Ben Johnson 9237ee6a40
fix(tsi1): Remove TSI cardinality stats cache 2019-09-04 14:48:22 -06:00
Edd Robinson 030083e1a3 perf(storage): optimistically check compactions 2019-09-04 17:38:13 +01:00
Ben Johnson 729558d64b
fix(tsdb): Replace TSI compaction wait group with counter.
Previously the TSI partition would panic if a compaction was
started while `Wait()` was waiting. This commit removes the previous
wait group and replaces it with a simple counter. The `Wait()`
function now polls the counter until it reaches zero.
2019-09-02 09:37:35 -06:00
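The counter-plus-polling approach described in the commit above could be sketched like this. The type `compactionCounter`, its methods, and the 1 ms polling interval are all assumptions made for illustration; the commit message does not give these details.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// compactionCounter is a hypothetical counter of in-flight compactions
// used in place of a sync.WaitGroup.
type compactionCounter struct{ n int64 }

func (c *compactionCounter) begin() { atomic.AddInt64(&c.n, 1) }
func (c *compactionCounter) end()   { atomic.AddInt64(&c.n, -1) }

// wait polls the counter until it reaches zero. Unlike a wait group, a new
// compaction may begin while wait is polling; it only delays the return.
func (c *compactionCounter) wait() {
	for atomic.LoadInt64(&c.n) > 0 {
		time.Sleep(time.Millisecond) // assumed polling interval
	}
}

// runCompactions starts n simulated compactions, waits for the counter to
// drain, and returns how many of them finished.
func runCompactions(n int) int64 {
	var c compactionCounter
	var finished int64
	for i := 0; i < n; i++ {
		c.begin() // count the compaction before its goroutine starts
		go func() {
			defer c.end()
			time.Sleep(2 * time.Millisecond) // simulated work
			atomic.AddInt64(&finished, 1)
		}()
	}
	c.wait()
	return atomic.LoadInt64(&finished)
}

func main() {
	fmt.Println(runCompactions(4))
}
```

Polling costs a little latency and CPU compared with a blocking wait, which is the price of tolerating new work that starts during the wait.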
Edd Robinson 7efb73930b refactor: address PR feedback 2019-08-30 21:07:32 +01:00
Edd Robinson 2e5ebbe251 perf(storage): reduce allocations when deleting from cache
When deleting from the cache, each cache key must be checked to
determine if it matches the prefix we're deleting. Since the keys are
stored as strings in the cache (map keys) there were a lot of allocations
happening because `applySerial` expects `[]byte` keys.

It's beneficial to reduce allocations by refactoring `applySerial` to work
on strings. Whilst some allocations now have to happen the other way
(string -> []byte), they only happen if we actually need to delete the
key from the cache. Most of the keys don't get deleted so it's better
doing it this way.

Performance on the benchmark from the previous commit improved by ~40-50%.

name                                          old time/op    new time/op    delta
Engine_DeletePrefixRange_Cache/exists-24         102ms ±11%      59ms ± 3%  -41.95%  (p=0.000 n=10+8)
Engine_DeletePrefixRange_Cache/not_exists-24    97.1ms ± 4%    45.0ms ± 1%  -53.66%  (p=0.000 n=10+10)

name                                          old alloc/op   new alloc/op   delta
Engine_DeletePrefixRange_Cache/exists-24        25.5MB ± 1%     3.1MB ± 2%  -87.83%  (p=0.000 n=10+10)
Engine_DeletePrefixRange_Cache/not_exists-24    23.9MB ± 1%     0.1MB ±86%  -99.65%  (p=0.000 n=10+10)

name                                          old allocs/op  new allocs/op  delta
Engine_DeletePrefixRange_Cache/exists-24          305k ± 1%       28k ± 1%  -90.77%  (p=0.000 n=10+10)
Engine_DeletePrefixRange_Cache/not_exists-24      299k ± 1%        1k ±63%  -99.74%  (p=0.000 n=9+10)

Raw benchmarks on a 24T/32GB/NVME machine are as follows:

goos: linux
goarch: amd64
pkg: github.com/influxdata/influxdb/tsdb/tsm1
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     300	  50379720 ns/op	 3054106 B/op	   27859 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     300	  57326032 ns/op	 3124764 B/op	   28217 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     300	  58943855 ns/op	 3162146 B/op	   28527 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     300	  60565115 ns/op	 3138811 B/op	   28176 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     200	  59775969 ns/op	 3087910 B/op	   27921 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     300	  59530451 ns/op	 3120986 B/op	   28207 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     300	  59185532 ns/op	 3113066 B/op	   28302 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     300	  59295867 ns/op	 3100832 B/op	   28108 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     300	  59599776 ns/op	 3100686 B/op	   28113 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     200	  62065907 ns/op	 3048527 B/op	   27879 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     300	  44979062 ns/op	  123026 B/op	    1244 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     300	  44733344 ns/op	   52650 B/op	     479 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     300	  44534180 ns/op	   35119 B/op	     398 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     300	  45179881 ns/op	  105256 B/op	     706 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     300	  44918964 ns/op	   47426 B/op	     621 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     300	  45000465 ns/op	   63164 B/op	     564 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     300	  45332999 ns/op	  117008 B/op	    1146 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     300	  45652342 ns/op	   66221 B/op	     616 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     300	  45083957 ns/op	  154354 B/op	    1143 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     300	  44560228 ns/op	   65024 B/op	     724 allocs/op
PASS
ok  	github.com/influxdata/influxdb/tsdb/tsm1	1690.583s
2019-08-30 20:35:05 +01:00
Edd Robinson eba4dec7e6 perf(storage): reduce lock contention on Cache entries
The cache is essentially a set of maps, where a key in each map is a
series key, and the value is a slice of values associated with that key.
The cache is sharded and series keys are hashed to determine which shard
(map) they live in.

When deleting from the cache we have to check each key to see if it
matches the delete command (predicate and timestamp). If it does then
the entries for that range are removed. As part of this work we check if
the entries are already empty (already removed) and if so we don't check
if the key is valid.

This involved a lot of mutex grabbing, which has now been replaced with
atomic operations.

Benchmarking this commit against the previous commit in this branch
shows a 9% improvement:

name                                          old time/op    new time/op    delta
Engine_DeletePrefixRange_Cache/exists-24         113ms ± 8%     102ms ±11%   -9.40%  (p=0.000 n=10+10)
Engine_DeletePrefixRange_Cache/not_exists-24    95.6ms ± 2%    97.1ms ± 4%     ~     (p=0.089 n=10+10)

name                                          old alloc/op   new alloc/op   delta
Engine_DeletePrefixRange_Cache/exists-24        29.6MB ± 1%    25.5MB ± 1%  -13.71%  (p=0.000 n=10+10)
Engine_DeletePrefixRange_Cache/not_exists-24    24.3MB ± 2%    23.9MB ± 1%   -1.48%  (p=0.000 n=10+10)

name                                          old allocs/op  new allocs/op  delta
Engine_DeletePrefixRange_Cache/exists-24          334k ± 0%      305k ± 1%   -8.67%  (p=0.000 n=8+10)
Engine_DeletePrefixRange_Cache/not_exists-24      302k ± 1%      299k ± 1%   -1.25%  (p=0.000 n=10+9)

Raw benchmarks on a 24T / 32GB / NVME machine:

goos: linux
goarch: amd64
pkg: github.com/influxdata/influxdb/tsdb/tsm1
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     200	  91035525 ns/op	25557809 B/op	  305258 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     200	  99416796 ns/op	25385052 B/op	  303584 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 100149484 ns/op	25570062 B/op	  305761 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 100222516 ns/op	25474372 B/op	  303089 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     200	 101868258 ns/op	25531572 B/op	  304736 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 106268683 ns/op	25648213 B/op	  306768 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 102905477 ns/op	25572314 B/op	  305798 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 108742857 ns/op	25483068 B/op	  304788 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 103292149 ns/op	25401388 B/op	  303401 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 107178026 ns/op	25573602 B/op	  305821 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     200	  95082692 ns/op	23942491 B/op	  299116 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     200	  96088487 ns/op	23957028 B/op	  298545 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     200	  94279165 ns/op	23620981 B/op	  294536 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     200	  94509000 ns/op	23989593 B/op	  299453 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     200	  98530062 ns/op	23935846 B/op	  299237 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     200	  98008093 ns/op	23821683 B/op	  297875 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     200	  97603172 ns/op	23878336 B/op	  298350 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     200	  96867920 ns/op	23782588 B/op	  296236 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     200	  99148908 ns/op	23997702 B/op	  299277 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	 100866840 ns/op	24019916 B/op	  300339 allocs/op
PASS
ok  	github.com/influxdata/influxdb/tsdb/tsm1	1144.213s
2019-08-30 20:35:05 +01:00
Edd Robinson da2fb27cb9 perf(storage): reduce amount of tracing
In a previous PR I added some tracing to help investigate delete
performance within the cache. Ironically this makes performance
significantly worse when you have a very high cardinality cache.

This keeps the main benefits of the tracing, but reduces the number of
spans created. The remaining spans are smarter with context, and include
useful information about the size of the operation being traced.
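
To illustrate the idea in the paragraph above, here is a minimal sketch of trading many small spans for one span that carries the size of the operation. It does not reproduce the actual change: the `tracer` type, `startSpan`, and both delete functions are hypothetical stand-ins, and it assumes the original tracing created roughly one span per key.

```go
package main

import "fmt"

// tracer is a hypothetical stand-in for a real tracing library.
// It only counts how many spans were started.
type tracer struct{ spans int }

// startSpan records one span; size stands for an annotation such as
// the number of keys covered by the span.
func (t *tracer) startSpan(name string, size int) { t.spans++ }

// deletePerKey starts one span per key, so the span count grows with
// the cardinality of the cache.
func deletePerKey(t *tracer, cache map[string]struct{}, keys []string) {
	for _, k := range keys {
		t.startSpan("cache.delete.key", 1)
		delete(cache, k)
	}
}

// deleteBatched starts a single span for the whole operation and
// records the batch size on it instead.
func deleteBatched(t *tracer, cache map[string]struct{}, keys []string) {
	t.startSpan("cache.delete.batch", len(keys))
	for _, k := range keys {
		delete(cache, k)
	}
}

// fill builds a cache containing n keys and returns the keys as well.
func fill(n int) (map[string]struct{}, []string) {
	cache := make(map[string]struct{}, n)
	keys := make([]string, 0, n)
	for i := 0; i < n; i++ {
		k := fmt.Sprintf("cpu,series=%d", i)
		cache[k] = struct{}{}
		keys = append(keys, k)
	}
	return cache, keys
}

func main() {
	perKey, batched := &tracer{}, &tracer{}

	c1, k1 := fill(1000)
	deletePerKey(perKey, c1, k1)

	c2, k2 := fill(1000)
	deleteBatched(batched, c2, k2)

	fmt.Println(perKey.spans, batched.spans) // prints "1000 1"
}
```

Both variants delete the same keys; only the number of spans differs, which is where the tracing overhead comes from in this sketch.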

Performance on a benchmark shows a significant improvement:

name                                          old time/op    new time/op    delta
Engine_DeletePrefixRange_Cache/exists-24         262ms ± 6%     113ms ± 8%  -57.06%  (p=0.000 n=10+10)
Engine_DeletePrefixRange_Cache/not_exists-24     266ms ± 4%      96ms ± 2%  -64.09%  (p=0.000 n=8+10)

name                                          old alloc/op   new alloc/op   delta
Engine_DeletePrefixRange_Cache/exists-24        62.7MB ± 0%    29.6MB ± 1%  -52.82%  (p=0.000 n=9+10)
Engine_DeletePrefixRange_Cache/not_exists-24    59.2MB ± 0%    24.3MB ± 2%  -59.03%  (p=0.000 n=8+10)

name                                          old allocs/op  new allocs/op  delta
Engine_DeletePrefixRange_Cache/exists-24          711k ± 0%      334k ± 0%  -53.07%  (p=0.000 n=9+8)
Engine_DeletePrefixRange_Cache/not_exists-24      700k ± 0%      302k ± 1%  -56.79%  (p=0.000 n=8+10)
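
As a quick check on how the delta column relates to the old and new columns, the sketch below recomputes the relative change from the rounded time/op values in the table. The helper `percentDelta` is a hypothetical name. The results differ slightly from the reported deltas, presumably because the table's deltas were computed from unrounded samples.

```go
package main

import "fmt"

// percentDelta returns the relative change from oldV to newV in percent.
// Negative values mean the new measurement is smaller (faster or leaner).
func percentDelta(oldV, newV float64) float64 {
	return (newV - oldV) / oldV * 100
}

func main() {
	// Rounded time/op values (ms) taken from the table above.
	fmt.Printf("exists:     %.2f%%\n", percentDelta(262, 113)) // exists:     -56.87%
	fmt.Printf("not_exists: %.2f%%\n", percentDelta(266, 96))  // not_exists: -63.91%
}
```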

Raw benchmarks on a 24T / 32GB / NVMe machine:

goos: linux
goarch: amd64
pkg: github.com/influxdata/influxdb/tsdb/tsm1
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 104851012 ns/op	29442514 B/op	  333599 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 107838824 ns/op	29485649 B/op	  334369 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 108020671 ns/op	29443324 B/op	  333610 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 106507506 ns/op	29977931 B/op	  338597 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 116393032 ns/op	29443516 B/op	  333614 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 112581877 ns/op	29691455 B/op	  334699 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      50	 119833106 ns/op	29444712 B/op	  333625 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	     100	 113851895 ns/op	29921119 B/op	  337419 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      50	 121735395 ns/op	29445551 B/op	  333634 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      50	 115387319 ns/op	29444513 B/op	  333627 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	  94474658 ns/op	24696698 B/op	  306702 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	  94767020 ns/op	24004763 B/op	  300066 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	  97869523 ns/op	24556560 B/op	  305827 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	  93916119 ns/op	24172163 B/op	  301244 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	  96591891 ns/op	24006021 B/op	  300081 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	  93521244 ns/op	24266467 B/op	  303190 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	  95419569 ns/op	24006501 B/op	  300087 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	  96694570 ns/op	24521126 B/op	  306041 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	  95075965 ns/op	24299409 B/op	  301649 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	     100	  97182864 ns/op	24007644 B/op	  300101 allocs/op
PASS
ok  	github.com/influxdata/influxdb/tsdb/tsm1	490.287s
2019-08-30 20:35:05 +01:00
Edd Robinson 15ade8c162 perf(storage): remove erroneous variable
This commit removes an unused slice that was being built up. Compared
to the baseline, this yields a slight reduction in allocated memory when
deleting existing keys from the cache; the change in time/op is not
statistically significant.

name                                          old time/op    new time/op    delta
Engine_DeletePrefixRange_Cache/exists-24         268ms ± 5%     262ms ± 6%    ~     (p=0.218 n=10+10)
Engine_DeletePrefixRange_Cache/not_exists-24     265ms ± 5%     266ms ± 4%    ~     (p=0.965 n=10+8)

name                                          old alloc/op   new alloc/op   delta
Engine_DeletePrefixRange_Cache/exists-24        64.1MB ± 0%    62.7MB ± 0%  -2.16%  (p=0.000 n=9+9)
Engine_DeletePrefixRange_Cache/not_exists-24    59.2MB ± 0%    59.2MB ± 0%    ~     (p=0.505 n=8+8)

name                                          old allocs/op  new allocs/op  delta
Engine_DeletePrefixRange_Cache/exists-24          711k ± 0%      711k ± 0%  -0.00%  (p=0.000 n=9+9)
Engine_DeletePrefixRange_Cache/not_exists-24      700k ± 0%      700k ± 0%    ~     (p=0.687 n=8+8)

Raw benchmarks using a 24T / 32GB / NVMe machine:

goos: linux
goarch: amd64
pkg: github.com/influxdata/influxdb/tsdb/tsm1
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 267664312 ns/op	62689106 B/op	  711400 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 255017152 ns/op	62688809 B/op	  711398 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 258136039 ns/op	62689626 B/op	  711404 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 273982453 ns/op	62688325 B/op	  711395 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 252670795 ns/op	62688704 B/op	  711397 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 277700985 ns/op	61801204 B/op	  702520 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 272353886 ns/op	62688767 B/op	  711403 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 258717468 ns/op	62689461 B/op	  711408 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 252909070 ns/op	62688949 B/op	  711404 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 255436837 ns/op	62689712 B/op	  711409 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 241173429 ns/op	59202122 B/op	  700036 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 247961098 ns/op	60507541 B/op	  714102 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      20	 263380230 ns/op	59202750 B/op	  700044 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 266035285 ns/op	59202758 B/op	  700043 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 266512878 ns/op	59202759 B/op	  700044 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 262065769 ns/op	59202726 B/op	  700043 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 270485538 ns/op	59202733 B/op	  700043 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 263355678 ns/op	62562757 B/op	  727794 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 260440337 ns/op	59203324 B/op	  700050 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 276122362 ns/op	59203316 B/op	  700050 allocs/op
PASS
ok  	github.com/influxdata/influxdb/tsdb/tsm1	259.435s
2019-08-30 20:35:05 +01:00
Edd Robinson f2d6c93e65 test: add benchmark to track cache deletion perf
Benchmarks using a 24T / 32GB / NVMe disk machine:

goos: linux
goarch: amd64
pkg: github.com/influxdata/influxdb/tsdb/tsm1
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      20	 280039668 ns/op	64073374 B/op	  711421 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 271810284 ns/op	64073207 B/op	  711420 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 263464797 ns/op	64072589 B/op	  711415 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 269460489 ns/op	64073344 B/op	  711420 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 268319443 ns/op	64073947 B/op	  711425 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 254945449 ns/op	64073463 B/op	  711421 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 270202990 ns/op	65616337 B/op	  724440 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 274113444 ns/op	64074764 B/op	  711435 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 264234897 ns/op	64073748 B/op	  711428 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/exists-24         	      30	 264406196 ns/op	64073797 B/op	  711429 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 250130623 ns/op	59202124 B/op	  700036 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 255092042 ns/op	59552365 B/op	  706287 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 274121068 ns/op	59202753 B/op	  700043 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 273088065 ns/op	59202702 B/op	  700043 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 264184087 ns/op	59202724 B/op	  700043 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 268075364 ns/op	59202718 B/op	  700043 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 265067057 ns/op	59202709 B/op	  700043 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 254749976 ns/op	60118957 B/op	  701435 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 266953837 ns/op	59203376 B/op	  700051 allocs/op
BenchmarkEngine_DeletePrefixRange_Cache/not_exists-24     	      30	 275083559 ns/op	59203329 B/op	  700050 allocs/op
PASS
ok  	github.com/influxdata/influxdb/tsdb/tsm1	261.273s
2019-08-30 20:35:05 +01:00
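
For readers unfamiliar with how the `exists` / `not_exists` sub-benchmark names in the output of f2d6c93e65 arise, here is a minimal sketch of a Go benchmark with two sub-benchmarks. It is not the benchmark from the commit: the map-based cache, `newCache`, `deletePrefix`, and the `cpu` / `mem` prefixes are all hypothetical, and cache construction is included in the timed region to keep the sketch short.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// newCache builds a hypothetical in-memory cache with n keys under prefix.
func newCache(prefix string, n int) map[string][]int64 {
	c := make(map[string][]int64, n)
	for i := 0; i < n; i++ {
		c[fmt.Sprintf("%s,series=%d", prefix, i)] = []int64{int64(i)}
	}
	return c
}

// deletePrefix removes every key that starts with prefix and returns
// the number of keys removed.
func deletePrefix(c map[string][]int64, prefix string) int {
	removed := 0
	for k := range c {
		if strings.HasPrefix(k, prefix) {
			delete(c, k)
			removed++
		}
	}
	return removed
}

// benchmarkDeletePrefix runs one sub-benchmark where the prefix matches
// cached keys and one where it matches nothing.
func benchmarkDeletePrefix(b *testing.B) {
	cases := []struct{ name, prefix string }{
		{"exists", "cpu"},
		{"not_exists", "mem"},
	}
	for _, tc := range cases {
		b.Run(tc.name, func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				// Setup is timed too; a real benchmark would likely exclude it.
				c := newCache("cpu", 1000)
				deletePrefix(c, tc.prefix)
			}
		})
	}
}

func main() {
	// testing.Benchmark allows running a benchmark outside of "go test".
	res := testing.Benchmark(benchmarkDeletePrefix)
	fmt.Println(res.String(), res.MemString())
}
```

Placed in a `_test.go` file as `func BenchmarkDeletePrefix(b *testing.B)`, the same body could be run with `go test -bench . -benchmem -count 10` to produce multiple samples per sub-benchmark, similar in shape to the raw output above.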