Commit Graph

332 Commits (db/wait-timeout-utility)

Author SHA1 Message Date
Edd Robinson 55ffeb563a Tidy up logging of compaction settings 2018-07-18 17:26:34 +01:00
Jeff Wendling 7bdbe26534 Make store include context in logs
If some error or message is in the context of some shard or database
be sure to include it in the message.
2018-07-18 10:22:53 -06:00
David Norton 6016a80997 allow tag keys to contain underscores 2018-07-17 09:39:08 -04:00
Stuart Carnie 88cd9f3fcf pr(influx-tools): Improvements per PR review 2018-06-13 10:29:59 -07:00
Stuart Carnie 7e998779e6 feat(tsdb/store): Option to disable compactions for offline tools
Allows an offline tool to open the tsdb.Store with compactions disabled.
2018-06-13 10:29:59 -07:00
Stuart Carnie 7abf3ec048 fix(tsdb/store): Fix hang when closing Store if monitor is disabled 2018-06-13 10:29:59 -07:00
Ben Johnson d3e3b05a49
Add tsm1 open limiter
This commit restricts the number of TSM1 files that can be opened
concurrently across the entire `tsdb.Store`. There is currently
a limit for the number of shards that can be opened concurrently,
however, this limit does not help when the number of CPU cores
is higher than the number of shards. Because TSM1 files have a 2GB
limit and there is no limit on the number of files per shard,
extremely large shards (1TB+) can load 1,000s of files simultaneously.
2018-05-29 10:21:53 -06:00
Jacob Marble 735aa2d7dc Add SeriesIDSet() to Index interface 2018-05-18 09:22:43 -07:00
Jacob Marble 7f8b7af61e
Cleanup index memory footprint counting code (#9828)
* Fix IndexSet.DedupeInmemIndexes

* Cleanup index memory footprint code
2018-05-15 11:25:19 -07:00
Jacob Marble 0763d1789e Get inmem index bytes without double-counting 2018-05-10 11:33:52 -07:00
Jacob Marble 7de2dcd3d9 TSM: TSMReader.Close blocks until reads complete 2018-04-30 13:46:03 -07:00
Edd Robinson 0b4a403679 Provide warning when mixed index used on db 2018-04-25 13:57:08 +01:00
Edd Robinson 32e195860b Log index type when opening shard 2018-04-25 13:02:09 +01:00
Stuart Carnie 14dcc5d6e7 PR feedback 2018-04-19 18:05:55 -07:00
Stuart Carnie e7389b18c0 tsdb: add additional engine options
* filters allow specific combinations of database, retention policy and
  shard groups to be opened. This was added to reduce the start-up time
  of the export tool and limit the memory usage.
2018-04-19 18:05:55 -07:00
Ben Johnson d0688201ba
Fix missing Store.Close() unlock. 2018-03-06 10:36:44 -07:00
Stuart Carnie a74d296200 use underscore vs period, fix doc comment, add database name to CQ 2018-02-26 10:08:43 -07:00
Stuart Carnie d135aecf02 Generate trace logs for a number of significant influx operations
* tsdb Store.Open traces all events related to opening files
    * op.name : tsdb.open
* retention policy shard deletions
    * op.name : retention.delete_check
* all TSM compaction strategies
    * op.name : tsm1.compact_group
* series file compactions
    * op.name : series_partition.compaction
* continuous query execution (if logging enabled)
    * op.name : continuous_querier.execute
* TSI log file compaction
    * op_name: index.tsi.compact_log_file
* TSI level compaction
    * op.name: index.tsi.compact_to_level
2018-02-21 15:08:49 -07:00
Jonathan A. Sternberg d38413a849
Merge pull request #9454 from influxdata/js-structured-logging
Update logging calls to take advantage of structured logging
2018-02-21 09:14:40 -06:00
Jonathan A. Sternberg 0727ffbf4e
Mark a shard as in process of being deleted
Without this, deleting a shard could trigger things so that a write
would attempt to create the shard again before it was actually deleted.
2018-02-20 12:17:30 -07:00
Jonathan A. Sternberg 2bbd96768d Update logging calls to take advantage of structured logging
Includes a style guide that details the basics of how to log.
2018-02-20 10:04:19 -06:00
Edd Robinson 433e643364 Fix data race when collecting sketches 2018-02-15 11:16:32 +00:00
Edd Robinson e5c8fd9dc5 Ensure nil sketches never returned 2018-02-09 15:29:42 +00:00
Edd Robinson 544329380f
Add empty series sketches back to tsi1 index
This commit adds initial empty sketches back to the tsi1 index, as well
as ensuring that ephemeral sketches in the index `LogFile` are updated
accordingly.

The commit also adds a test that verifies that the merged sketches at
the store level produce the correct results under writes, deletions and
re-opening of the store.

This commit does not provide working sketches for post-compaction on the
tsi1 index.
2018-02-07 14:52:13 -07:00
Edd Robinson 42c3adeffc simplify packages under tsdb 2018-01-21 09:41:27 -08:00
Edd Robinson 4ccb6ada69 Remove unused code/cleanup tsdb package 2018-01-20 14:06:15 +00:00
Jason Wilder 8f52e442e6 Fix deadlock in DeleteSeries
The Store.Delete series held an RLock while deleting from each shard.
While deleting, the Engine uses shardSet to see if a series is fully
deleted.  The shardSet.ForEach also takes and RLock.  If a Lock is
requested between these two calls, a deadlock occurs.

To fix, we don't need to hold an RLock for the duration of the delete
in the store as each Shard handles concurrency itself and we have a
snapshot of the shards we need to access.
2018-01-17 10:28:21 -07:00
Edd Robinson bd762380b0 Use bitsets to calculate series cardinality 2018-01-16 23:22:52 +00:00
Edd Robinson ceb3abd118 Remove series when shard rolls over
Series should only be removed from the series file when they're no
longer present in any shard. This commit ensures that during a shard
rollover, the series local to the shard are checked against all other
series in the database.

Series that are no longer present in any other shards' bitsets, are then
marked as deleted in the series file.
2018-01-16 15:58:20 +00:00
Edd Robinson e902998f4e All closes are now fast 2018-01-16 14:56:54 +00:00
Edd Robinson 8039165ab4 Ensure no double r-locking occurs in IndexSet
use. However, because the reference counting was implemented via
mutexes, it was possible to double `RLock` the series file mutex. This
allows a `Lock` to arrive in-between the two `RLock`s, (such as when
deleting the database), causing deadlock.

This commit addresses this by ensuring that from within `IndexSet`
methods, when calling other `IndexSet` methods, that they're all
unexported, and that those unexported methods never take a lock on the
series file.

Keeping series file locking in exported `IndexSet` methods only, allows
one to see any future races more easily.
2018-01-16 14:56:34 +00:00
Jason Wilder ba9a5af7eb Mark series deleted in series file
This commit adds the ability to correctly mark a series as deleted in
the global series file. Whenever a shard engine determines that a series
should be deleted, it checks with each shard's bitset for series that
are to be deleted and are no longer contained in any shard-local
bitsets.

These series are then removed from the series file.
2018-01-15 12:00:30 +00:00
Edd Robinson 286c8f4c09 Return to original DELETE/DROP SERIES semantics
This reverts commit 59afd8cc90.
2018-01-15 12:00:30 +00:00
Jason Wilder 874d5839da Don't return error for non-existent series file
When dropping series, if the series file does not exists we returned
and error.  This breaks compatibility with prior versions that would
not return an error if the series do not exists.
2018-01-14 12:53:26 -07:00
Jason Wilder 5d1f76192a Ensure series file is not closed while in use 2018-01-12 16:58:33 -07:00
Ben Johnson d610a79487
Merge pull request #9295 from influxdata/partition-series-file
Partition series file
2018-01-11 08:45:18 -07:00
Ben Johnson ac4dc91c64
Partition series file. 2018-01-10 08:33:25 -07:00
Edd Robinson 6eeecb477e Fix race in DeleteDatabase 2018-01-10 14:33:14 +00:00
Ben Johnson 3108eea330
Merge pull request #9291 from influxdata/bj-fix-series-file-delete
Fix series file removal after DROP DATABASE.
2018-01-08 13:20:02 -07:00
Ben Johnson fe2116a4fc
Fix series file removal after DROP DATABASE. 2018-01-08 11:40:06 -07:00
David Norton 1ea41b0dd6
Merge pull request #9287 from influxdata/dn-return-digest-size
fix #9286: return digest size
2018-01-08 13:30:56 -05:00
David Norton 1c452d83cb fix #9286: return digest size 2018-01-08 13:15:14 -05:00
Ben Johnson 88ce43a639
Merge pull request #9285 from influxdata/bj-series-file-windows
WIP: Close series file on database deletion.
2018-01-08 09:51:58 -07:00
Ben Johnson 370d363d38
Close series file on database deletion. 2018-01-05 13:33:35 -07:00
Edd Robinson 86c443cb02 Change series dir location 2018-01-05 16:40:23 +00:00
Edd Robinson 83d0ec8359 Optimise TagKeys and fix duplication bug 2018-01-05 12:51:21 +00:00
Edd Robinson c13910a51f Don't try to load .series directory 2018-01-04 16:23:50 +00:00
Edd Robinson f9ea54198f rename series directory 2018-01-03 15:44:58 +00:00
Ben Johnson 52630e69d7
Integrate SeriesFileCompactor 2018-01-02 12:20:03 -07:00
Ben Johnson 56980b0d24
Segment series file 2017-12-29 11:57:45 -07:00
Ben Johnson 8b2dbf4d83
Merge branch 'er-tsi-index-part' of https://github.com/influxdata/influxdb into er-tsi-index-part 2017-12-19 10:33:02 -07:00
Ben Johnson 107291c6b0
series file refactor 2017-12-19 10:31:33 -07:00
Edd Robinson c476a0b4a1 Merge branch 'master' into er-tsi-index-part 2017-12-15 18:31:24 +00:00
Edd Robinson 73fcf894b6 Fix shard races when accessing index 2017-12-15 18:19:55 +00:00
Edd Robinson 3bfe525705 Add 32-bit support to series file
This commit ensures that the series file should work appropriately on
32-bit architecturs. It does this by reducing the maximum size of a
series file to 512MB on 32-bit systems, which should be fully
addressable.

It further updates tests so that the series file size can be reduced
further when running many tests in parallel on 32-bit architectures.
2017-12-15 15:47:26 +00:00
Jason Wilder 749c9d2483 Rate limit disk IO when writing TSM files
This limits the disk IO for writing TSM files during compactions
and snapshots.  This helps reduce the spiky IO patterns on SSDs and
when compactions run very quickly.
2017-12-14 22:02:32 -07:00
Edd Robinson 59afd8cc90 Return to original DELETE/DROP SERIES semantics
Since possibly v0.9 DELETE SERIES has had the unwanted side effect of
removing series from the index when the last traces of series data are
removed from TSM. This occurred because the inmem index was rebuilt on
startup, and if there was no TSM data for a series then there could be
not series to add to the index.

This commit returns to the original (documented) DROP/DETETE SERIES
behaviour. As such, when issuing DROP SERIES all instances of matching
series will be removed from both the TSM engine and the index. When
issuing DELETE SERIES only TSM data will be removed.

It is up to the operator to remove series from the index.

NB, this commit does not address how to remove series data from the
series file when a shard rolls over.
2017-12-15 00:02:06 +00:00
Edd Robinson 9e3b17fd09 Ensure deleted series are not returned via iterators 2017-12-14 21:29:35 +00:00
David Norton 4e13248d85 feat #9212: add ability to generate shard digests 2017-12-13 09:28:34 -05:00
Edd Robinson f1bcc97e89 Fix auth tests 2017-12-12 21:25:35 +00:00
Edd Robinson 7d13bf3262 merge master 2017-12-08 17:21:58 +00:00
Edd Robinson f6835632e7 Merge master into branch 2017-12-08 17:11:07 +00:00
Adam a0b2195d6b
Pulled in backup-relevant code for review (#9193)
for issue #8879
2017-12-07 11:35:20 -05:00
Jason Wilder 56d8f05f12 Cap concurrent compactions when large number of cores exists
The default max-concurrent-compactions settings allows up to 50%
of cores to be used for compactions.  When the number of cores is
high (>8), this can lead to high disk utilization.  Capping at
4 and combined with high snapshot sizes seems to keep the compaction
backlog reasonable and not tax the disks as much.  Systems with lots
of IOPS, RAM and CPU cores may want to increase these.
2017-12-06 13:45:08 -07:00
Ben Johnson 493c1ed0d1
inmem tests passing. 2017-12-05 10:49:58 -07:00
Ben Johnson f5f85d65f9
Fixing more tests. 2017-12-04 10:29:04 -07:00
Ben Johnson f1cf55ca99
Merge branch 'er-tsi-index-part' of https://github.com/influxdata/influxdb into er-tsi-index-part 2017-11-30 05:45:40 -07:00
Ben Johnson ca09f18e65
intermediate: tsdb compile 2017-11-29 11:20:18 -07:00
Edd Robinson 6dbb070ce9 Fix race on sfiles in Store 2017-11-27 15:41:16 +00:00
Ben Johnson fc966a1b67
Add series file backup/restore. 2017-11-22 08:55:54 -07:00
Edd Robinson 68dd5e27c8 Improve performance of TagKeys 2017-11-21 17:16:47 +00:00
Edd Robinson 6851db3fc9 Add FGA support to SHOW MEASUREMENTS 2017-11-17 11:06:43 +00:00
Ben Johnson ede3fcf98e
intermediate 2017-11-15 16:09:25 -07:00
Ben Johnson ba4c9e0317
Merge remote-tracking branch 'upstream/master' into er-tsi-index-part 2017-11-14 16:14:13 -07:00
Jason Wilder aee395d3bd Make DeleteSeriesRange take SeriesIterator 2017-11-13 09:02:10 -07:00
Jason Wilder f893beb6d8 Use MeasurementSeriesKeysByExprIterator for deletes 2017-11-13 09:02:10 -07:00
Jonathan A. Sternberg 0b7c56bcd8 Update the zap logger dependency
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.

This updates the usage of the logger and adds a simple package for
constructing the base logger.

The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
2017-11-10 16:27:16 -06:00
Ben Johnson 9ad2b53881
intermediate 2017-11-09 09:18:33 -07:00
Edd Robinson 59c4e4b1bc Skip shards we don't have 2017-11-08 13:33:52 +00:00
Ben Johnson 156f25ac23
Improve SHOW TAG KEYS performance. 2017-11-07 10:59:19 -07:00
Edd Robinson e762da9aca Fix race on store close
There was a very small window where it was possible to deadlock during
the close of the Store. When closing, the Store waited on its Waitgroup
under a `Lock`. Naturally, all other goroutines must have been in a
position to call `Done` on the `Waitgroup` before the `Wait` call in
`Close` would return.

For the goroutine running the `monitorShards` method it was possible
that it would be unable to do this. Specifically, if the `monitorShards`
goroutine was jumping into the `t.C` case as the `Close()` goroutine was
acquiring the `Lock` then then `monitorShards` goroutine would be unable
to acquire the `RLock`. Since it would also be unable to progress around
its loop to jump into the `s.closing` case, it would be unable to call
`Done` on the `WaitGroup` and we would have a deadlock.

This was identified during an AppVeyor CI run, though I was unable to
reproduce this locally.
2017-11-07 15:26:46 +00:00
Edd Robinson 88e2ea822d Add inmem shard optimisation to SHOW MEASUREMENTS 2017-11-06 19:15:01 +00:00
Edd Robinson f8353bf300 Check shard index type correctly
Previously we used the EngineOptions to determine which shard index
type we were using. However, these options are set once at runtime
initialisation. Therefore if you're running with TSI enabled but then
accessing a legacy database with the inmem index, TagValues would not
have taken advantage of the inmem index.

This change ensures we always check the actual index of the shard(s).
2017-11-06 19:15:01 +00:00
Edd Robinson fbcb299b8a Support WHERE time clause in SHOW TAG VALUES
This commit adds time support to SHOW TAG VALUES. Time can be used as
both a lower and upper boundary. However, there are some caveats.

For the `inmem` index, filtering by time will still return all results
because the index data is shared across shards.

For the `tsi1` index, filtering by time will only work down to the shard
lever. Specifically, when querying by time all shards within that time
range will be used to generate the results.
2017-11-06 19:15:01 +00:00
Stuart Carnie f3d45ba301 influxdata/influxdb/influxql -> influxdata/influxql 2017-10-30 14:40:26 -07:00
Jason Wilder 71071ed67a Add compaction backlog stat
This gives an indication as to whether compactions are backed up
or not.
2017-10-03 10:48:14 -06:00
Jason Wilder ae821f4e2d Rework compaction scheduling
This changes the compaction scheduling to better utilize the available
cores that are free.  Previously, a level was planned in its own goroutine
and would kick off a number of compactions groups.  The problem with this
model was that if there were 4 groups, and 3 completed quickly, the planning
would be blocked for that level until the last group finished.  If the compactions
at the prior level are running more quickly, a large backlog could accumlate.

This now moves the planning to a single goroutine that plans each level in
succession and starts as many groups as it can.  When one group finishes,
the planning will start the next group for the level.
2017-10-03 10:48:13 -06:00
Joe LeGasse 1443b22379 auth: add series auth to 'show tag values' 2017-09-27 20:01:18 -04:00
Edd Robinson 4a67f92acc Prevent store from directly accessing Shard's engine 2017-09-25 17:43:01 +01:00
Edd Robinson 8e9cabbb9c Fix race in TagValues when reaching into engine 2017-09-25 17:43:01 +01:00
Jason Wilder db204f3eb7 Default concurrent compactions to 50% of available cores 2017-09-21 12:48:11 -06:00
Jason Wilder 31646aae3a Release mmap pages when shard is cold
This instructs the kernel that it can release memory used by mmap'd
TSM files when they are not actively being used.  It the mappings are
use, the kernel will fault the pages back in.  On linux, this causes
RES memory to drop immediately when run.
2017-09-18 11:51:51 -06:00
Jason Wilder 38460ec37e Re-enable compactions during writes
A cold shard that suddenly receives a lot of writes could get a very
big cache that takes a long time to snapshot or causes the cache
max memory limit to be hit more quickly.  This re-enables the compactions
if necessary during writes so we don't have to wait for the shard monitor
goroutine to re-enable them.
2017-09-11 15:29:26 -06:00
Jonathan A. Sternberg 697759613c Remove time comparisons from the inner sections of the storage engine 2017-08-16 16:51:13 -05:00
Jonathan A. Sternberg 8bd04ebe39 Remove TimeRange function and replace with a more accurate ConditionExpr function
The ConditionExpr function is more accurate because it parses the
condition and ensures that time conditions are actually used correctly.
That means that attempting to combine conditions with OR will not result
in the query silently pretending it's an AND and nested conditions work
correctly so there is only one way to read the query.

It also extracts the non-time conditions into a separate condition so we
can stop attempting to parse around the time conditions in lower layers
of the storage engine. This change does not remove those hacks, but a
following commit should be able to sanitize the condition and remove
them.
2017-08-16 16:45:35 -05:00
Jason Wilder c74932de94 Limit shard cardinality checks to 1 per database
The tag cardinality checks were run for all inmem shards.  Since inmem
shards share the same index, a lot of the work is redundant.  Inmem shards
also need to sort their measurmenet and tag keys which can be CPU intensive
with many shards or higher cardinality.

This changes the monitoring to just check one shard in each database which
should lower CPU usage due to excessive sorting.  The longer term solution
is to use TSI which would not have this check or required sorting.
2017-08-15 12:17:18 -06:00
Edd Robinson aa7095be5a Use a merge-based approach for TagValues 2017-08-02 14:10:52 +01:00
Jason Wilder 94a48774b7 Pull in new index filter 2017-08-02 14:10:52 +01:00
Edd Robinson 1e9ce8e0a7 Add test for TagValues 2017-08-02 14:10:52 +01:00
Jason Wilder c75ac3076f Limit delete to run one shard at a time
There was a change to speed up deleting and dropping measurements
that executed the deletes in parallel for all shards at once. #7015

When TSI was merged in #7618, the series keys passed into Shard.DeleteMeasurement
were removed and were expanded lower down.  This causes memory to blow up
when a delete across many shards occurs as we now expand the set of series
keys N times instead of just once as before.

While running the deletes in parallel would be ideal, there have been a number
of optimizations in the delete path that make running deletes serially pretty
good.  This change just limits the concurrency of the deletes which keeps memory
more stable.
2017-07-27 16:01:47 -06:00