Commit Graph

105 Commits (db/6263/compaction-debug-logging)

Author SHA1 Message Date
Geoffrey Wossum 8497fbf0af
chore: remove unnecessary fmt.Sprintf calls (#25536)
Remove unnecessary fmt.Sprintf calls for static code checks in main-2.x.
2024-11-12 11:06:39 -06:00
Geoffrey Wossum 0bc167bbd7
chore: loadShards changes to more cleanly support 2.x feature (#25513)
* chore: move shardID parsing and shard filtering into walkShardsAndProcess

* chore: make it impossible to miss sending shardResponse or marking shard as complete

* chore: always count number of shards (preparation for 2.x related feature)

* chore: explicitly load series files and create indices serially

Explicitly load series files and create indices serially. Also
avoid passing them to work functions that don't need them.

* chore: rework loadShards for changes necessary to cancel loading process

* chore: comment improvements

* fix: fix race conditions in TestStore_StartupShardProgress and TestStore_BadShardLoading

* chore: avoid logging nil error

* chore: refactor shard loading and shard walking

Refactor loadShards and CreateShard to use a common shardLoader class that
makes thread-safety easier. Refactor walkShardsAndProcess into findShards.

* chore: improve comment

* chore: rename OpenShard to ReopenShard and implement with shardLoader

Rename Store.OpenShard to Store.ReopenShard and implement using a
shardLoader object. Changes to tests as necessary.

* chore: avoid resetting shard options and locking on Reopen

Avoid resetting shard options when reopening a shard.
Use proper mutex locking in Shard.ReopenShard.

* chore: fix formatting issue

* chore: warn on mixed index types in Store.CreateShard

* chore: change from info to warn when invalid shard IDs found in path

* chore: use coarser locking in Store.ReopenShard

* chore: fix typo in comment

* chore: code simplification
2024-11-08 15:49:48 -06:00
WeblWabl 2cab9a2a1f
feat: Adds functionality to clear out bad shard list (#25398)
* feat(tsdb): Adds functionality to clear bad shards list

This PR adds a test and a new method to clear out the bad shards list.
The method returns the shards that it cleared out along with their
errors. This is the first part of the feature
for adding a load-shards command to influxd-ctl.

Closes influxdata/feature-requests#591
2024-10-18 13:22:32 -05:00
WeblWabl 3c87f524ed
feat(logging): Add startup logging for shard counts (#25378)
* feat(tsdb): Adds shard opening progress checks to startup
This PR adds a check to see how many shards are remaining
vs how many shards are opened. This change displays the percent
completed too.

closes influxdata/feature-requests#476
2024-10-16 10:09:15 -05:00
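The progress check described in 3c87f524ed can be pictured with a small counter. This is a minimal sketch only: `startupProgress`, `markOpened`, and the message format are hypothetical names chosen for illustration, not the types or log output used in the actual change.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// startupProgress is a hypothetical tracker, not the actual tsdb type.
type startupProgress struct {
	total  int64 // shards found on disk
	opened int64 // shards opened so far, updated atomically
}

// markOpened records one more opened shard and returns a log-ready summary.
func (p *startupProgress) markOpened() string {
	n := atomic.AddInt64(&p.opened, 1)
	pct := 100.0
	if p.total > 0 {
		pct = float64(n) / float64(p.total) * 100
	}
	return fmt.Sprintf("%d/%d shards opened (%.1f%% complete)", n, p.total, pct)
}

func main() {
	p := &startupProgress{total: 4}
	for i := 0; i < 4; i++ {
		fmt.Println(p.markOpened())
	}
}
```

The atomic increment is used here on the assumption that shards may be opened from several goroutines at once.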
WeblWabl 8eaa24d813
feat(tsm): Allow for deletion of series outside default rp (#25312)
* feat(tsm): Allow for deletion of series outside default RP
9d116f6
This PR adds the ability for deletion of series that are outside
of the default retention policy. This updates InfluxQL to include changes
from: influxdata/influxql#71

closes: influxdata/feature-requests#175

2024-09-17 16:34:14 -05:00
Geoffrey Wossum 23008e5286
chore: improve error messages and logging during shard opening (#25314)
* chore: improve error messages and logging during shard opening
2024-09-12 15:11:56 -05:00
Geoffrey Wossum b4bd607eef
fix: prevent retention service from hanging (#25055)
* fix: prevent retention service from hanging

Fix issue that can cause the retention service to hang waiting on a
`Shard.Close` call. When this occurs, no other shards will be deleted
by the retention service. This is usually noticed as an increase in
disk usage because old shards are not cleaned up.

The fix adds two new methods to `Store`, `SetShardNewReadersBlocked`
and `InUse`. `InUse` can be used to poll if a shard has active readers,
which the retention service uses to skip over in-use shards to prevent
the service from hanging. `SetShardNewReadersBlocked` determines if
new read access may be granted to a shard. This is required to prevent
race conditions around the use of `InUse` and the deletion of shards.

If the retention service skips over a shard because it is in-use, the
shard will be checked again the next time the retention service is run.
It can be deleted on subsequent checks if it is no longer in-use. If
a shard is stuck in-use, the retention service will not be able to
delete it; this can be observed in the logs so that an operator can
intervene manually. Other shards can still be deleted by the retention
service even if a shard is stuck with readers.

closes: #25054
2024-06-13 11:07:17 -05:00
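The skip-if-in-use behaviour described in b4bd607eef can be sketched roughly as follows. The `shard` type and its methods are a hypothetical model written for this example; they are not the real `Store` API, and the order of operations (block new readers first, then poll for active readers) is an assumption drawn from the commit message rather than from the code.

```go
package main

import (
	"fmt"
	"sync"
)

// shard is a hypothetical model of a shard that tracks its readers.
type shard struct {
	mu      sync.Mutex
	readers int
	blocked bool
}

// acquire grants read access unless new readers are currently blocked.
func (s *shard) acquire() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.blocked {
		return false
	}
	s.readers++
	return true
}

// release gives back read access obtained through acquire.
func (s *shard) release() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.readers--
}

// setNewReadersBlocked controls whether acquire may succeed.
func (s *shard) setNewReadersBlocked(blocked bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.blocked = blocked
}

// inUse reports whether any reader still holds the shard.
func (s *shard) inUse() bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.readers > 0
}

// tryDelete blocks new readers, then skips the shard if it is still in use.
// It returns true when the shard could safely be closed and deleted.
func tryDelete(s *shard) bool {
	s.setNewReadersBlocked(true)
	if s.inUse() {
		s.setNewReadersBlocked(false) // skip now, retry on the next run
		return false
	}
	return true
}

func main() {
	s := &shard{}
	s.acquire()
	fmt.Println(tryDelete(s)) // false: a reader is active, so the shard is skipped
	s.release()
	fmt.Println(tryDelete(s)) // true: no readers remain
}
```

Blocking new readers before polling closes the window in which a reader could arrive between the in-use check and the deletion.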
davidby-influx 54ac7e54ed
fix: remember shards that fail Open(), avoid repeated attempts (#23437)
If a shard cannot be opened, store its ID and last error.
Prevent future attempts to open during this invocation of
InfluxDB. This information is not persisted.

closes https://github.com/influxdata/influxdb/issues/23428
closes https://github.com/influxdata/influxdb/issues/23426
2022-06-13 10:32:47 -07:00
Geoffrey Wossum 160cf678d5
fix: MeasurementsCardinality should not be less than 0 (#23286)
Clamp the value of Store.MeasurementsCardinality so that it can not be less
than 0. This primarily shows up as a negative `numMeasurements` value in
/debug/vars under some circumstances.

refs #23285
2022-04-21 13:32:12 -05:00
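A minimal sketch of the clamp described in 160cf678d5, assuming the cardinality is derived by subtracting an estimated count of deleted measurements from an estimated count of created ones (that derivation is an assumption made for this example, as are the function names):

```go
package main

import "fmt"

// estimatedCardinality subtracts the deleted estimate from the created
// estimate and clamps the result so that it is never negative.
func estimatedCardinality(created, deleted int64) int64 {
	n := created - deleted
	if n < 0 {
		return 0
	}
	return n
}

func main() {
	fmt.Println(estimatedCardinality(10, 12)) // 0
	fmt.Println(estimatedCardinality(12, 10)) // 2
}
```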
Dane Strandboge 0574163566
build: upgrade to go1.18 (#23250) 2022-03-31 16:17:57 -05:00
davidby-influx 7d182158f4
fix: add database to MaxSeriesPerDatabase error message (#23113)
To simplify debugging, print the database name when the
max-series-per-database limit is exceeded in InMem indices.

closes https://github.com/influxdata/influxdb/issues/23112
2022-02-08 11:52:14 -08:00
davidby-influx 0c3dca883e
fix: correctly handle MaxSeriesPerDatabaseExceeded (#23091)
Check for the correctly returned PartialWriteError
in (*shard).validateSeriesAndFields, allow partial
writes.

closes https://github.com/influxdata/influxdb/issues/23090
2022-02-01 19:08:51 -08:00
Sam Arnold 611a4370a2
feat: show measurements database and retention policy wildcards (#22388)
* feat: show measurements database and retention policy wildcards

Closes #3318

* chore: run formatter
2021-10-05 09:07:25 -04:00
davidby-influx 092c7a9976 feat: Make meta queries respect QueryTimeout values
Meta queries (SHOW TAG VALUES, SHOW TAG KEYS, SHOW SERIES CARDINALITY, etc.) do not respect
the QueryTimeout config parameter. Meta queries should check the query context when possible
to allow cancellation and timeout. This will not be as frequent as regular queries, which
use iterators, because meta queries return data in batches.

Add a context.Context to
(*Store).MeasurementNames()
(*Store).MeasurementsCardinality()
(*Store).SeriesCardinality()
(*Store).TagValues()
(*Store).TagKeys()
(*Store).SeriesSketches()
(*Store).MeasurementsSketches()
which is tested for timeout or cancellation
to allow limitation of time spent in meta queries

https://github.com/influxdata/influxdb/issues/20736
2021-02-23 12:52:44 -08:00
Sam Arnold dd3baf6d4a
feat: measurement metrics by login (#20687)
After turning on authentication and both forms of ingress metrics:

"ingress": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"cq","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":76}},
"ingress:1": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"database","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":152}},
"ingress:2": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"httpd","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":874}},
"ingress:3": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"ingress","rp":"monitor"},"values":{"pointsWritten":534,"valuesWritten":1068}},
"ingress:4": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"localStore","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":76}},
"ingress:5": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"queryExecutor","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":190}},
"ingress:6": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"runtime","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":570}},
"ingress:7": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"shard","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":836}},
"ingress:8": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"subscriber","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":114}},
"ingress:9": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_cache","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":684}},
"ingress:10": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_engine","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":2204}},
"ingress:11": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_filestore","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":152}},
"ingress:12": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_wal","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":304}},
"ingress:13": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"write","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":342}},
"ingress:14": {"name":"ingress","tags":{"db":"telegraf","login":"admin","measurement":"cpu","rp":"autogen"},"values":{"pointsWritten":1,"valuesWritten":1}},
"ingress:15": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"cpu","rp":"autogen"},"values":{"pointsWritten":1316,"valuesWritten":13160}},
"ingress:16": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"disk","rp":"autogen"},"values":{"pointsWritten":642,"valuesWritten":4494}},
"ingress:17": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"diskio","rp":"autogen"},"values":{"pointsWritten":214,"valuesWritten":2354}},
"ingress:18": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"mem","rp":"autogen"},"values":{"pointsWritten":107,"valuesWritten":963}},
"ingress:19": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"processes","rp":"autogen"},"values":{"pointsWritten":107,"valuesWritten":856}},
"ingress:20": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"swap","rp":"autogen"},"values":{"pointsWritten":214,"valuesWritten":642}},
"ingress:21": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"system","rp":"autogen"},"values":{"pointsWritten":321,"valuesWritten":749}},

Only by login:

"ingress": {"name":"ingress","tags":{"login":"_systemuser_monitor"},"values":{"pointsWritten":42,"valuesWritten":354}},
"ingress:1": {"name":"ingress","tags":{"login":"admin"},"values":{"pointsWritten":1,"valuesWritten":1}},
"ingress:2": {"name":"ingress","tags":{"login":"telegraf"},"values":{"pointsWritten":3547,"valuesWritten":28246}},

Notice writes by users 'telegraf', '_systemuser_monitor', and 'admin'.
2021-02-04 11:52:53 -05:00
davidby-influx 6ec446f422 fix(tsm1): "snapshot in progress" error during backup
This fix adds a skipCacheOk flag to
tsdb.Store.CreateShardSnapshot() and tsdb.Shard.CreateSnapshot()
to pass to tsdb.Engine.CreateSnapshot()
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
This flag is set to false in tsm1.Engine.Export()

https://github.com/influxdata/plutonium/issues/3227
2020-11-05 11:08:08 -08:00
Ayan George 42873d4424
chore: Quiet static analysis tools (#19509)
* Remove redundant type in slice/array declarations.
* Call t.Fatal() from test-functions, not non-test go-routines.
* Remove unnecessary empty value operator from ranges.
* Call defer .Close() methods only after checking for error on Open().
2020-09-05 12:43:29 -04:00
Ayan George 4cbc6a4269
fix(tsdb): Fix variables masked by a declaration (#18129)
Before this commit, the to and from variables were being re-declared in
a block in such a way that the values were not being used.

This patch uses regular assignment so that the values are visible
outside of the block where they're set.

Closes: 18128
2020-05-17 21:40:12 -04:00
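The class of bug fixed in 4cbc6a4269 is easy to reproduce in isolation. The functions below are illustrative only and are not taken from the tsdb code: `shadowed` uses `:=` inside a block, which declares new variables, while `assigned` uses `=` and updates the outer ones.

```go
package main

import "fmt"

// shadowed mimics the bug: := inside the block declares new from/to
// variables, so the outer ones keep their zero values.
func shadowed(set bool) (int, int) {
	var from, to int
	if set {
		from, to := 10, 20 // new variables scoped to this block
		_, _ = from, to    // use them so the compiler accepts the block
	}
	return from, to
}

// assigned uses plain assignment, so the outer variables are updated.
func assigned(set bool) (int, int) {
	var from, to int
	if set {
		from, to = 10, 20
	}
	return from, to
}

func main() {
	fmt.Println(shadowed(true)) // 0 0
	fmt.Println(assigned(true)) // 10 20
}
```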
David Norton 903d2c2d28 fix(tsm1): improve series cardinality limit
Prior to this change, new series would be added to the series file
before checking the series cardinality limit. If the limit was exceeded,
the write was rejected even though the series had already been added to
the series file.
2020-01-21 16:45:13 -05:00
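The ordering fix in 903d2c2d28 amounts to checking the limit before recording a new series. A minimal sketch, with a map standing in for the series file and with `seriesIndex`, `add`, and `errMaxSeries` as hypothetical names:

```go
package main

import (
	"errors"
	"fmt"
)

var errMaxSeries = errors.New("max-series-per-database limit exceeded")

// seriesIndex is a hypothetical stand-in for the series file plus its limit.
type seriesIndex struct {
	limit  int // 0 means unlimited
	series map[string]struct{}
}

// add checks the cardinality limit before recording a new series, so a
// rejected write leaves no trace behind.
func (ix *seriesIndex) add(key string) error {
	if _, ok := ix.series[key]; ok {
		return nil // existing series never count against the limit again
	}
	if ix.limit > 0 && len(ix.series) >= ix.limit {
		return errMaxSeries
	}
	ix.series[key] = struct{}{}
	return nil
}

func main() {
	ix := &seriesIndex{limit: 1, series: make(map[string]struct{})}
	fmt.Println(ix.add("cpu,host=a")) // <nil>
	fmt.Println(ix.add("cpu,host=b")) // max-series-per-database limit exceeded
	fmt.Println(len(ix.series))       // 1
}
```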
Edd Robinson 3a055a6107 Fix cardinality estimation error
This commit fixes an error in the TSI index with estimating the
cardinality of series recently added and then removed.
2019-01-10 17:46:30 +00:00
Tanya Gordeeva 0a39786ea7 tsdb: mixed shard tests
Specifically tests around the global index for fields with mixed shard types.
2018-12-13 08:31:49 -08:00
Edd Robinson cade59e253 Fix panic in IndexSet
This commit fixes a panic where a concurrent removal of a shard and meta
query could cause a `nil` index to be added to the `IndexSet`.
2018-10-26 18:23:54 +01:00
Ben Johnson 88d006a18c
Remove TSI1 HLL sketches from heap.
This commit removes the HLL sketches on each `tsi1.LogFile` and
`tsi1.IndexFile` and instead caches the data at the `tsi1.Index`
level. This reduces the heap size significantly for servers with
many TSI-enabled shards.
2018-09-12 08:48:40 -06:00
Edd Robinson dece5b847f Refactor index names 2018-08-21 14:32:30 +01:00
Tanya Gordeeva cff3a1120e Fix flaky test TestStore_BackupRestoreShard
Iterator could be left open.

Fixes #9965
2018-06-18 09:45:26 -07:00
Edd Robinson c1e1412dae Don't panic when checking for field 2018-03-12 15:25:20 +00:00
Edd Robinson ac0c0756bf Alter test to trigger panic 2018-03-12 13:07:08 +00:00
Edd Robinson 544329380f
Add empty series sketches back to tsi1 index
This commit adds initial empty sketches back to the tsi1 index, as well
as ensuring that ephemeral sketches in the index `LogFile` are updated
accordingly.

The commit also adds a test that verifies that the merged sketches at
the store level produce the correct results under writes, deletions and
re-opening of the store.

This commit does not provide working sketches for post-compaction on the
tsi1 index.
2018-02-07 14:52:13 -07:00
Edd Robinson c8f30da88a
Tidy up tests 2018-02-07 14:52:13 -07:00
Edd Robinson b19edd55ac Ensure shard-level cardinality is correct 2018-01-29 16:22:42 +00:00
Edd Robinson bd762380b0 Use bitsets to calculate series cardinality 2018-01-16 23:22:52 +00:00
Edd Robinson ceb3abd118 Remove series when shard rolls over
Series should only be removed from the series file when they're no
longer present in any shard. This commit ensures that during a shard
rollover, the series local to the shard are checked against all other
series in the database.

Series that are no longer present in any other shards' bitsets, are then
marked as deleted in the series file.
2018-01-16 15:58:20 +00:00
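The rollover check described in ceb3abd118 can be sketched with a toy bitset. The `bitset` type and the `orphaned` helper are written for this example and are assumptions; the actual change may use a different bitset implementation.

```go
package main

import "fmt"

// bitset is a toy bitset keyed by series ID, used only for this sketch.
type bitset []uint64

// newBitset returns a bitset with the given IDs set.
func newBitset(ids ...uint64) bitset {
	var b bitset
	for _, id := range ids {
		w := id / 64
		for uint64(len(b)) <= w {
			b = append(b, 0)
		}
		b[w] |= 1 << (id % 64)
	}
	return b
}

// has reports whether id is set.
func (b bitset) has(id uint64) bool {
	w := id / 64
	return w < uint64(len(b)) && b[w]&(1<<(id%64)) != 0
}

// orphaned returns the IDs from the expiring shard that appear in no other
// shard's bitset; only these would be marked deleted in the series file.
func orphaned(expiring []uint64, others []bitset) []uint64 {
	var out []uint64
	for _, id := range expiring {
		found := false
		for _, o := range others {
			if o.has(id) {
				found = true
				break
			}
		}
		if !found {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	others := []bitset{newBitset(1, 2), newBitset(2, 70)}
	fmt.Println(orphaned([]uint64{1, 3, 70, 71}, others)) // [3 71]
}
```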
Edd Robinson ee8d9e41f0 Update test with new DELETE method format 2018-01-15 13:27:05 +00:00
Ben Johnson 47851f4b7d Fix tag value auth check iterator. 2018-01-15 12:00:31 +00:00
Edd Robinson 286c8f4c09 Return to original DELETE/DROP SERIES semantics
This reverts commit 59afd8cc90.
2018-01-15 12:00:30 +00:00
Jason Wilder 874d5839da Don't return error for non-existent series file
When dropping series, if the series file does not exist we returned
an error.  This breaks compatibility with prior versions that would
not return an error if the series do not exist.
2018-01-14 12:53:26 -07:00
Ben Johnson 98486a284a
Merge pull request #9265 from benbjohnson/series-file-compaction
Sequential series file id & series file segmentation
2018-01-03 10:05:59 -07:00
Ben Johnson 52630e69d7
Integrate SeriesFileCompactor 2018-01-02 12:20:03 -07:00
Stuart Carnie 5dfe3b2645 inmem startup improvements
* only call ParseTags when necessary
* remove dependency on inmem.Series in tsdb test package
* Measurement and Series are no longer exported. Their use is restricted
  to the inmem package
* improve Measurement and Series types by exporting immutable
  fields and removing unnecessary APIs and locks

Reduced startup time from 28s to 17s. Overall improvement including
#9162 reduces startup from 46s to 17s for 1MM series across 14 shards.
2017-12-29 07:58:52 -07:00
Edd Robinson 72c0ec89fd Fix race on in-memory index 2017-12-18 16:22:19 +00:00
Edd Robinson 3bfe525705 Add 32-bit support to series file
This commit ensures that the series file should work appropriately on
32-bit architectures. It does this by reducing the maximum size of a
series file to 512MB on 32-bit systems, which should be fully
addressable.

It further updates tests so that the series file size can be reduced
further when running many tests in parallel on 32-bit architectures.
2017-12-15 15:47:26 +00:00
Edd Robinson 59afd8cc90 Return to original DELETE/DROP SERIES semantics
Since possibly v0.9 DELETE SERIES has had the unwanted side effect of
removing series from the index when the last traces of series data are
removed from TSM. This occurred because the inmem index was rebuilt on
startup, and if there was no TSM data for a series then there would be
no series to add to the index.

This commit returns to the original (documented) DROP/DELETE SERIES
behaviour. As such, when issuing DROP SERIES all instances of matching
series will be removed from both the TSM engine and the index. When
issuing DELETE SERIES only TSM data will be removed.

It is up to the operator to remove series from the index.

NB, this commit does not address how to remove series data from the
series file when a shard rolls over.
2017-12-15 00:02:06 +00:00
Edd Robinson 9e3b17fd09 Ensure deleted series are not returned via iterators 2017-12-14 21:29:35 +00:00
Edd Robinson f6835632e7 Merge master into branch 2017-12-08 17:11:07 +00:00
Edd Robinson a5af19fc06 Address PR feedback 2017-11-17 12:43:48 +00:00
Edd Robinson 6851db3fc9 Add FGA support to SHOW MEASUREMENTS 2017-11-17 11:06:43 +00:00
Edd Robinson 5298339f21 Add test coverage for FGA on Tag Keys/Values 2017-11-17 11:06:43 +00:00
Ben Johnson ba4c9e0317
Merge remote-tracking branch 'upstream/master' into er-tsi-index-part 2017-11-14 16:14:13 -07:00
Jonathan A. Sternberg 0b7c56bcd8 Update the zap logger dependency
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.

This updates the usage of the logger and adds a simple package for
constructing the base logger.

The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
2017-11-10 16:27:16 -06:00
Ben Johnson d3cd750509
Refactor series file tombstoning. 2017-11-09 09:30:19 -07:00