Commit Graph

55 Commits (master-1.x)

Author SHA1 Message Date
Geoffrey Wossum b4bd607eef
fix: prevent retention service from hanging (#25055)
* fix: prevent retention service from hanging

Fix issue that can cause the retention service to hang waiting on a
`Shard.Close` call. When this occurs, no other shards will be deleted
by the retention service. This is usually noticed as an increase in
disk usage because old shards are not cleaned up.

The fix adds to new methods to `Store`, `SetShardNewReadersBlocked`
and `InUse`. `InUse` can be used to poll if a shard has active readers,
which the retention service uses to skip over in-use shards to prevent
the service from hanging. `SetShardNewReadersBlocked` determines if
new read access may be granted to a shard. This is required to prevent
race conditions around the use of `InUse` and the deletion of shards.

If the retention service skips over a shard because it is in-use, the
shard will be checked again the next time the retention service is run.
It can be deleted on subsequent checks if it is no longer in-use. If
the shards is stuck in-use, the retention service will not be able to
delete the shards, which can be observed in the logs for manual
intervention. Other shards can still be deleted by the retention service
even if a shard is stuck with readers.

closes: #25054
2024-06-13 11:07:17 -05:00
Geoffrey Wossum 7bd3f89d18
fix: prevent retention service creating orphaned shard files (#24530)
* fix: prevent retention service creating orphaned shard files

Under certain circumstances, the retention service can fail to delete shards from
the store in a timely manner. When the shard groups are pruned based on age, this
leaves orphaned shard files on the disk. The retention service will then not attempt
to remove the obsolete shard files because the meta store does not know about them.
This can cause excessive disk space usage for some users.

This corrects that by requiring shards files be deleted before they can be removed
from the meta store.

fixes: #24529
2024-01-04 11:59:28 -06:00
Edd Robinson 663566e3e0 Ensure go fmt passes on 1.10/11 2018-08-21 17:39:42 +01:00
Ben Johnson 1fe9abd66f
Delete deleted shards in retention service. 2018-03-28 10:44:14 -06:00
Stuart Carnie a74d296200 use underscore vs period, fix doc comment, add database name to CQ 2018-02-26 10:08:43 -07:00
Stuart Carnie d135aecf02 Generate trace logs for a number of significant influx operations
* tsdb Store.Open traces all events related to opening files
    * op.name : tsdb.open
* retention policy shard deletions
    * op.name : retention.delete_check
* all TSM compaction strategies
    * op.name : tsm1.compact_group
* series file compactions
    * op.name : series_partition.compaction
* continuous query execution (if logging enabled)
    * op.name : continuous_querier.execute
* TSI log file compaction
    * op_name: index.tsi.compact_log_file
* TSI level compaction
    * op.name: index.tsi.compact_to_level
2018-02-21 15:08:49 -07:00
Jonathan A. Sternberg 2bbd96768d Update logging calls to take advantage of structured logging
Includes a style guide that details the basics of how to log.
2018-02-20 10:04:19 -06:00
Edd Robinson 6a66b5faf0 Cleanup services package 2018-01-21 10:52:37 -08:00
Jonathan A. Sternberg b775ad3d5d Expand unit test code coverage in services that were undercovered
This expands code coverage for the following packages:
* monitor (3.5% -> 86.9%)
* services/precreator (31.6% -> 83.8%)
* services/retention (83.0% -> 84.9%)
* services/snapshotter (0.0% -> 82.1%)
* tcp (48.7% -> 60.0%)
2017-11-28 15:44:35 -06:00
Jonathan A. Sternberg ca5a773c34 Initial jenkinsfile 2017-11-13 14:02:23 -06:00
Jonathan A. Sternberg 0b7c56bcd8 Update the zap logger dependency
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.

This updates the usage of the logger and adds a simple package for
constructing the base logger.

The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
2017-11-10 16:27:16 -06:00
Edd Robinson 2ea2abb001 Remove possibility of race when dropping shards
Fixes #8819.

Previously, the process of dropping expired shards according to the
retention policy duration, was managed by two independent goroutines in
the retention policy service. This behaviour was introduced in #2776,
at a time when there were both data and meta nodes in the OSS codebase.
The idea was that only the leader meta node would run the meta data
deletions in the first goroutine, and all other nodes would run the
local deletions in the second goroutine.

InfluxDB no longer operates in that way and so we ended up with two
independent goroutines that were carrying out an action that was really
dependent on each other.

If the second goroutine runs before the first then it may not see the
meta data changes indicating shards should be deleted and it won't
delete any shards locally. Shortly after this the first goroutine will
run and remove the meta data for the shard groups.

This results in a situation where it looks like the shards have gone,
but in fact they remain on disk (and importantly, their series within
the index) until the next time the second goroutine runs. By default
that's 30 minutes.

In the case where the shards to be removed would have removed the last
occurences of some series, then it's possible that if the database was already at its
maximum series limit (or tag limit for that matter), no further new series
can be inserted.
2017-10-26 16:15:13 +01:00
Edd Robinson 77977af685 Add repro test for #8819 2017-10-26 14:47:30 +01:00
Edd Robinson 1629ec7f5f Add tests to Retention service 2017-10-26 14:47:30 +01:00
Ben Johnson 358b1e0b05
Merge remote-tracking branch 'upstream/master' into tsi 2017-03-15 10:13:32 -06:00
Mark Rushakoff 535cf597f1 Report subset of config values in SHOW DIAGNOSTICS
This includes hand-selected config settings that are safe to expose and
not expected to include any kind of secrets.

Fixes #7821
2017-03-14 11:34:19 -07:00
Mark Rushakoff 601cbcd084 Merge branch '1.2' into mr-merge-12 2017-02-17 16:14:22 -08:00
Jonathan A. Sternberg 2fe48d6781 Rename zap import back to github.com/uber-go/zap
They rebased a revision we were previously relying upon that allowed us
to use the vanity name so we are reverting back to an older version with
the old import path.
2017-02-17 17:17:22 -06:00
Ben Johnson 047c21f4d9
Merge remote-tracking branch 'upstream/master' into tsi 2017-01-24 09:28:58 -07:00
Edd Robinson 9d30ee0a6b Fix subtle bugs and remove dead code from services 2017-01-17 09:47:34 -08:00
Mark Rushakoff bbb43faad2 Add more config validation 2017-01-10 10:28:49 -08:00
Ben Johnson 2b3cd415e2
Fixing rebase. 2017-01-06 09:52:16 -07:00
Edd Robinson 33623c1fa9
Revert back to original approach 2017-01-05 09:58:39 -07:00
Edd Robinson 05bc4dec00
Refactor 2017-01-05 09:50:23 -07:00
Mark Rushakoff 218fc3890d Update godoc for services
The admin service was deliberately skipped due to it being deprecated.
2016-12-30 18:03:01 -08:00
Jonathan A. Sternberg ec57108520 Use proper uber-go/zap import path
It looks like the real import path to the project is go.uber.org/zap
instead of github.com/uber-go/zap since the example in the project
references that path.
2016-12-15 08:54:14 -06:00
Jonathan A. Sternberg 21502a39e8 Switch logging to use structured logging everywhere
The logging library has been switched to use uber-go/zap. While the
logging has been changed to use structured logging, this commit does not
change any of the logging statements to take advantage of the new
structured log or new log levels. Those changes will come in future
commits.
2016-12-14 10:45:15 -06:00
Cory LaNou d54d32b9d4
prune shards in meta data 2016-12-01 11:22:16 -06:00
Seif Lotfy ced79acac5 Remove redundant error return from (Client) Database(name string) and (Client) Databases() 2016-04-30 17:04:38 -05:00
Stephen Gutekanst 9dc09c5257 Make logging output location more programmatically configurable (#6213)
This has various benefits:

- Users embedding InfluxDB within other Go programs can specify a different logger / prefix easily.
- More consistent with code used elsewhere in InfluxDB (e.g. services, other `run.Server.*` fields, etc).
- This is also more efficient, because it means `executeQuery` no longer allocates a single `*log.Logger` each time it is called.
2016-04-20 21:07:08 +01:00
Ben Johnson d9a6a7340f add canonical paths 2016-02-10 11:30:52 -07:00
Ben Johnson 5a0d1ab7c1 rename influxdb/influxdb to influxdata/influxdb
This commit changes all the import and URL references from:

    github.com/influxdb/influxdb

to:

    github.com/influxdata/influxdb
2016-02-10 10:26:18 -07:00
Cory LaNou 9fd651277b use local logger 2016-01-21 15:28:34 -05:00
Cory LaNou e36eaa0378 fix vet warnings 2016-01-21 15:28:34 -05:00
Paul Dix c99b214e87 Fix retention policy meta client interface 2016-01-21 15:28:34 -05:00
Paul Dix 0341bc3532 Update meta client and retention service.
* Remove VisitRetentionPolicies from meta client.
* Update retention enforcer to run on every data node.
2016-01-21 15:28:33 -05:00
Cory LaNou 8d878fff91 buildable meta -> services/meta 2016-01-21 15:28:32 -05:00
Paweł Kowalak 8c2f6eb7e0 Comment additions to services to satisfy golint 2015-11-19 13:25:07 +01:00
Philip O'Toole bece8fed2a Better retention enforcement logging
Fixes issue #4727.
2015-11-09 17:22:24 -08:00
Philip O'Toole ef72c3c64d Fix typo in retention service comment
[ci skip]
2015-10-19 14:24:25 -07:00
Philip O'Toole d771612718 Set default retention check interval to 30 minutes
Since the minimum retention period is 1 hour, checking every 10 minutes
seems excessive and generates noise in the logs.
2015-08-27 16:08:03 -07:00
Philip O'Toole ae825fdf3d Correct typo in retention service logs 2015-08-27 16:08:03 -07:00
Philip O'Toole 6415944d01 Don't repeat retention policy log message 2015-08-17 16:15:51 -07:00
Jason Wilder 668181d275 Make log statements more consistent
* Capitalize first letter of message
* Log all services staring consistently
* Remove some extraneous log statements in meta.Store
* Log data dirs for meta, data and hinted handoff
2015-08-13 10:01:42 -06:00
Philip O'Toole a94cf8a4ca Consolidate imports for retention service
Minor cosmetic change
2015-06-10 11:25:17 -07:00
Cory LaNou 8a5cf394d8 add ability to silence logging for testing 2015-06-10 10:27:57 -05:00
Philip O'Toole c59a234e7b Deleted shard groups are not counted as "expired" 2015-06-05 13:08:36 -07:00
Philip O'Toole a4a05241ba Set retention enforcement default config 2015-06-05 11:27:30 -07:00
Philip O'Toole 967ebe0568 Stop retention timers on service shutdown 2015-06-05 08:17:27 -07:00
Philip O'Toole 5e5f2cd37d Move expired and deleted shard logic to MetaStore 2015-06-04 22:18:52 -07:00