PlanOptimize is being checked far too frequently. This PR is the simplest change that ensures PlanOptimize is not run too often. To reduce the frequency, I've added a lastWrite parameter to PlanOptimize and added a test that mocks the edge case seen in the wild that led to this PR.
Previously, the test cases for PlanOptimize did not check whether certain cases would also be picked up by Plan. I've adjusted a few of the existing test cases after modifying Plan and PlanOptimize to use the same lastWrite time.
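A rough sketch of how a lastWrite parameter can gate optimize planning; the planner type, idleThreshold field, and return value are illustrative, not the real tsm1 API:

```go
package main

import (
	"fmt"
	"time"
)

// planner is a stand-in for the real TSM planner; idleThreshold is a
// made-up field for how long a shard must be idle before optimizing.
type planner struct {
	idleThreshold time.Duration
}

// PlanOptimize sketches how a lastWrite timestamp can gate how often
// optimize planning runs: if the shard was written to recently, return
// no groups instead of re-planning on every check.
func (p *planner) PlanOptimize(lastWrite time.Time) [][]string {
	if time.Since(lastWrite) < p.idleThreshold {
		return nil // shard is still hot; skip optimize planning for now
	}
	// ... the real planning of TSM file groups would happen here ...
	return [][]string{{"000000001-000000002.tsm"}}
}

func main() {
	p := &planner{idleThreshold: 30 * time.Minute}
	fmt.Println(p.PlanOptimize(time.Now()))                 // empty: written too recently
	fmt.Println(p.PlanOptimize(time.Now().Add(-time.Hour))) // planned groups
}
```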
* feat: Add CompactPointsPerBlock config opt
This PR adds an additional parameter for influxd,
CompactPointsPerBlock, and adjusts DefaultAggressiveMaxPointsPerBlock
to 10,000. We had discovered that with the points per block set to
100,000, compacted TSM files were growing in size. After lowering the
points per block to 10,000 we saw the file sizes decrease.
The value is exposed as a parameter that administrators can adjust,
which allows some tuning if compression problems are encountered.
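A minimal sketch of how the new knob might sit in the engine configuration, assuming a pared-down Config struct rather than the real tsdb configuration type:

```go
package main

import "fmt"

// DefaultAggressiveMaxPointsPerBlock mirrors the new 10,000-point default
// described above; the real constant lives in the storage engine package.
const DefaultAggressiveMaxPointsPerBlock = 10000

// Config is a simplified stand-in for the engine configuration; only the
// new knob is shown.
type Config struct {
	// CompactPointsPerBlock caps the points written per block during
	// aggressive (optimize) compactions.
	CompactPointsPerBlock int
}

// NewConfig returns a configuration with the new default applied.
func NewConfig() Config {
	return Config{CompactPointsPerBlock: DefaultAggressiveMaxPointsPerBlock}
}

func main() {
	c := NewConfig()
	// An administrator could raise or lower this if compacted TSM files
	// grow unexpectedly large.
	fmt.Println("compact-points-per-block =", c.CompactPointsPerBlock)
}
```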
* feat: Modify optimized compaction to cover edge cases
This PR changes the compaction algorithm to account for the following
cases that were not previously handled (see the sketch below):
- Many generations with a group size over 2 GB
- A single generation with many files and a group size under 2 GB
- Shards that may have a group size over 2 GB but many fragmented files
  (under 2 GB and under the aggressive points-per-block count)

Here, group size means the total size of the TSM files in the shard directory.
closes https://github.com/influxdata/influxdb/issues/25666
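The sketch referenced above: a rough illustration of the kind of classification the planner now makes, with made-up types and simplified thresholds rather than the actual planner code:

```go
package main

import "fmt"

const (
	maxGroupSize       = 2 * 1024 * 1024 * 1024 // 2 GB
	aggressiveBlockPts = 10000
)

// tsmFile is a hypothetical summary of one TSM file in a shard.
type tsmFile struct {
	size      int64 // bytes
	maxPoints int   // largest points-per-block seen in the file
}

// needsOptimize sketches the kind of decision the planner now makes: a
// single generation split across many files under 2 GB, or a group over
// 2 GB that is still fragmented into small, low-density files.
func needsOptimize(files []tsmFile) bool {
	var total int64
	fragmented := 0
	for _, f := range files {
		total += f.size
		if f.size < maxGroupSize && f.maxPoints < aggressiveBlockPts {
			fragmented++
		}
	}
	switch {
	case total < maxGroupSize && len(files) > 1:
		return true // single generation, many files, group under 2 GB
	case total >= maxGroupSize && fragmented > 1:
		return true // large group, but still fragmented
	default:
		return false
	}
}

func main() {
	files := []tsmFile{{size: 1 << 20, maxPoints: 1000}, {size: 2 << 20, maxPoints: 1000}}
	fmt.Println(needsOptimize(files)) // true
}
```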
* chore: upgrade Go to 1.19.3
This re-runs ./generate.sh and ./checkfmt.sh to format and update
source code (this is primarily responsible for the huge diff.)
* fix: update tests to reflect sorting algorithm change
On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag, which
allows files (and links) to be deleted while open. Without
copies, temporary directories are left behind after
backups.
closes https://github.com/influxdata/influxdb/issues/16289
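A small sketch of the workaround, using a hypothetical snapshotFile helper: hard-link where possible, copy on Windows:

```go
package main

import (
	"io"
	"os"
	"runtime"
)

// snapshotFile sketches the workaround: elsewhere a hard link is cheap and
// sufficient, but on Windows a linked file cannot be deleted while the
// original is still open, so copy the bytes instead.
func snapshotFile(src, dst string) error {
	if runtime.GOOS != "windows" {
		return os.Link(src, dst)
	}

	in, err := os.Open(src)
	if err != nil {
		return err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return err
	}
	defer out.Close()

	_, err = io.Copy(out, in)
	return err
}

func main() {
	_ = snapshotFile("shard/000000001-000000001.tsm", "backup/000000001-000000001.tsm")
}
```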
When the compaction planner runs, if it cannot acquire
a lock on the files it plans to compact, it returns a
nil list of compaction groups. This, in turn, sets the
engine statistics for the compaction queues to zero,
which is incorrect. Instead, use the length of the pending
files that would have been returned.
closes https://github.com/influxdata/influxdb/issues/22138
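A sketch of the idea with a made-up planner type: when the file locks cannot be acquired, report the pending file count rather than zero:

```go
package main

import "fmt"

// planner is a made-up stand-in; pendingFiles are files already picked for
// compaction but not yet released.
type planner struct {
	pendingFiles []string
}

// Plan sketches the fix: when the per-file locks cannot be acquired,
// report the number of pending files for the queue statistic instead of
// letting it drop to zero.
func (p *planner) Plan(locksAcquired bool) (groups [][]string, queued int64) {
	if !locksAcquired {
		// Nothing new can be planned, but work is still outstanding.
		return nil, int64(len(p.pendingFiles))
	}
	groups = [][]string{p.pendingFiles}
	return groups, int64(len(groups))
}

func main() {
	p := &planner{pendingFiles: []string{"a.tsm", "b.tsm"}}
	_, queued := p.Plan(false)
	fmt.Println("compaction queue depth:", queued) // 2, not 0
}
```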
Compaction logging will generate intermediate information on
volume of data written and output files created, as well as
improve some of the anti-entropy messages related to compaction.
This will also apply to `influx_tools compact`.
Closes https://github.com/influxdata/influxdb/issues/21704
* fix: backport tsdb fix for window pushdowns
From https://github.com/influxdata/influxdb/pull/19855
* fix(storage): cursor requests are [start, stop] instead of [start, stop)
The cursors were previously [start, stop) to be consistent with how Flux
requests data, but the underlying storage file store was [start, stop]
because that is how InfluxQL reads data. This reverts the cursor
behavior so that it is now [start, stop] everywhere, and the conversion
from [start, stop) to [start, stop] is performed when making the cursor
request to get the next cursor.
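A minimal sketch of the bound conversion, assuming nanosecond timestamps so the closed end is simply stop-1:

```go
package main

import "fmt"

// toClosed converts a half-open [start, stop) request, as issued by Flux,
// into the closed [start, stop] interval the underlying file store expects.
// Timestamps are nanoseconds, so the closed end is simply stop-1.
func toClosed(start, stop int64) (int64, int64) {
	return start, stop - 1
}

func main() {
	start, stop := toClosed(0, 1000)
	fmt.Printf("cursor range: [%d, %d]\n", start, stop) // [0, 999]
}
```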
cherry-pick from #21318
Co-authored-by: Sam Arnold <sarnold@influxdata.com>
(cherry picked from commit 7766672797)
* chore: fix formatting
Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
This is a backport of #14262 to the 1.x storage engine.
This also ports the table tests that existed with the pre-beta version of the
storage engine to the one that is now used in the production version.
A few of the tests are skipped. These are portions of the storage engine
that have not been ported over. They should be unskipped when that
functionality is ported over.
Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
Extending the context instead of fixing the API breaks type safety.
For tracking the number of points / values written, it is much clearer
to pass an explicit tracker.
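A sketch of what an explicit tracker could look like; WriteTracker and writePoints are illustrative names, not the actual API:

```go
package main

import "fmt"

// WriteTracker is a hypothetical explicit tracker: callers that care about
// how many points and values were written pass one in, instead of the
// write path smuggling counters through context.Value.
type WriteTracker struct {
	PointsWritten int64
	ValuesWritten int64
}

// writePoints sketches a write path that records its work on the tracker.
func writePoints(points, fieldsPerPoint int, t *WriteTracker) {
	// ... actual write omitted ...
	if t != nil {
		t.PointsWritten += int64(points)
		t.ValuesWritten += int64(points * fieldsPerPoint)
	}
}

func main() {
	var t WriteTracker
	writePoints(100, 3, &t)
	fmt.Println(t.PointsWritten, t.ValuesWritten) // 100 300
}
```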
When using queries like `select count(_seriesKey) from bigmeasurement`, we
should iterate over the TSI structures to serve the query instead of loading
all the series into memory up front.
Closes #20543
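A rough sketch of the streaming-count idea, using a hypothetical iterator interface in place of the real TSI index types:

```go
package main

import "fmt"

// SeriesKeyIterator is a stand-in for the TSI index iterator: it yields one
// series key at a time instead of a fully materialized slice.
type SeriesKeyIterator interface {
	Next() (key []byte, ok bool)
}

// countSeries walks the iterator and counts keys, so memory stays constant
// no matter how many series the measurement has.
func countSeries(it SeriesKeyIterator) int64 {
	var n int64
	for {
		if _, ok := it.Next(); !ok {
			return n
		}
		n++
	}
}

// sliceIterator exists only for the demo below.
type sliceIterator struct {
	keys [][]byte
	i    int
}

func (s *sliceIterator) Next() ([]byte, bool) {
	if s.i >= len(s.keys) {
		return nil, false
	}
	k := s.keys[s.i]
	s.i++
	return k, true
}

func main() {
	it := &sliceIterator{keys: [][]byte{[]byte("m,host=a"), []byte("m,host=b")}}
	fmt.Println(countSeries(it)) // 2
}
```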
* feat(engine/tsm1): Add WritePointsWithContext()
Add WritePointsWithContext() and make WritePoints() a thin wrapper for
it.
The purpose is to add statistics context values that we'll use to
propagate the number of fields and points written up the call
chain.
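A minimal sketch of the wrapper pattern, with placeholder Engine and Point types standing in for the real ones:

```go
package main

import (
	"context"
	"fmt"
)

// Point stands in for models.Point; the real engine works on parsed
// line-protocol points.
type Point struct{ fields int }

// Engine stands in for the tsm1 engine.
type Engine struct{}

// WritePointsWithContext is the context-aware entry point; the context can
// carry statistics values that callers want populated.
func (e *Engine) WritePointsWithContext(ctx context.Context, points []Point) error {
	// ... actual write path omitted ...
	return nil
}

// WritePoints stays as a thin wrapper so existing callers keep compiling.
func (e *Engine) WritePoints(points []Point) error {
	return e.WritePointsWithContext(context.Background(), points)
}

func main() {
	var e Engine
	fmt.Println(e.WritePoints([]Point{{fields: 2}}))
}
```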
* feat(tsdb): Add WriteToShardWithContext()
When applied, this patch adds WriteToShardWithContext() and wraps it
with WriteToShard() to preserve the API.
The purpose of this addition is to propagate a context.Context value
to Shard.WritePointsWithContext().
* feat(tsdb/shard): Add WritePointsWithContext()
The purpose of adding WritePointsWithContext() is to propagate context
values down to engine code and propagate statistics via context.Value
up to callers.
This patch also adds values written statistics to the shard.
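A sketch of the statistics propagation, assuming the caller stashes a counter in the context under an unexported key type (the key name and function are illustrative); it also shows how a caller such as the HTTP write handler reads the result back:

```go
package main

import (
	"context"
	"fmt"
	"sync/atomic"
)

// contextKey is an unexported key type so these values cannot collide with
// other packages' context keys.
type contextKey string

const pointsWrittenKey contextKey = "points-written"

// writePointsWithContext sketches how the shard/engine reports statistics
// upward: if the caller stashed a counter in the context, add to it.
func writePointsWithContext(ctx context.Context, npoints int64) {
	// ... actual write omitted ...
	if c, ok := ctx.Value(pointsWrittenKey).(*int64); ok {
		atomic.AddInt64(c, npoints)
	}
}

func main() {
	// The caller (e.g. the HTTP write handler) sets up the counter and
	// reads it back after the write returns.
	var written int64
	ctx := context.WithValue(context.Background(), pointsWrittenKey, &written)
	writePointsWithContext(ctx, 250)
	fmt.Println("points written:", written) // 250
}
```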
* feat(http): Gather values written stats
WritePointsWithContext() was added to propagate context values down to
the engine and communicate stats to the caller.
* refactor: Change MetricKey to ContextKey
This patch gives the type we're using for context keys a better name.
This change adds a lock around digest creation so that it is safe for
concurrent calls. Prior to this change, calls from multiple goroutines
resulted in "Digest aborted, problem renaming tmp digest" errors.
This is the start of per-series validation that occurs in the
Engine write path. It uses an in-memory radix tree to reduce
memory usage and is re-built on demand the first time a series
is written.
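A sketch of the per-series validation cache; a plain map stands in here for the in-memory radix tree, and the type names are illustrative:

```go
package main

import "fmt"

// seriesValidator remembers which series keys have already passed the
// per-series checks so they run only on the first write of each series.
// A map stands in for the in-memory radix tree used to keep this compact.
type seriesValidator struct {
	seen map[string]struct{}
}

func newSeriesValidator() *seriesValidator {
	return &seriesValidator{seen: make(map[string]struct{})}
}

// validate runs the per-series checks once per series key.
func (v *seriesValidator) validate(key string) error {
	if _, ok := v.seen[key]; ok {
		return nil // already validated; skip the work
	}
	// ... per-series checks (type conflicts, limits, ...) would run here ...
	v.seen[key] = struct{}{}
	return nil
}

func main() {
	v := newSeriesValidator()
	fmt.Println(v.validate("cpu,host=a")) // validated
	fmt.Println(v.validate("cpu,host=a")) // cached, no re-validation
}
```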
This moves the time range to delete to be returned by the predicate
func in DeleteSeriesRangeWithPredicate. It allows for a single delete
to delete different ranges of times per series instead of a single
range of time for all series.
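A sketch of the predicate shape, with simplified types in place of the engine's own:

```go
package main

import (
	"fmt"
	"math"
)

// deletePredicate is a simplified version of the predicate: for each series
// it reports whether to delete and which [min, max] time range to delete.
type deletePredicate func(name []byte, tags map[string]string) (min, max int64, del bool)

// deleteSeriesRangeWithPredicate sketches how one call can now delete a
// different time range per series.
func deleteSeriesRangeWithPredicate(series []string, fn deletePredicate) {
	for _, s := range series {
		if min, max, del := fn([]byte(s), nil); del {
			fmt.Printf("delete %s in [%d, %d]\n", s, min, max)
		}
	}
}

func main() {
	deleteSeriesRangeWithPredicate([]string{"cpu,host=a", "mem,host=a"},
		func(name []byte, _ map[string]string) (int64, int64, bool) {
			if string(name) == "cpu,host=a" {
				return 0, 1000, true // narrow range for this series
			}
			return math.MinInt64, math.MaxInt64, true // everything for the rest
		})
}
```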
Re-open the last wal segment instead of creating a new one. This fixes
an issue where the last modified time of the WAL would change on
restart. It also avoids a lot of IO file churn on restart.
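A sketch of the reopen-instead-of-create behavior for the last segment file; path handling and the real WAL bookkeeping are omitted:

```go
package main

import (
	"fmt"
	"os"
)

// openLastSegment sketches the change: append to the existing last WAL
// segment on startup instead of always creating a fresh one, so the file's
// modification time and the directory contents stay stable across restarts.
// O_CREATE still covers the very first start, when no segment exists yet.
func openLastSegment(path string) (*os.File, error) {
	return os.OpenFile(path, os.O_RDWR|os.O_APPEND|os.O_CREATE, 0o666)
}

func main() {
	f, err := openLastSegment("_00001.wal")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer f.Close()
	fmt.Println("appending to", f.Name())
}
```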
This commit adds the ability to correctly mark a series as deleted in
the global series file. Whenever a shard engine determines that a series
should be deleted, it checks with each shard's bitset for series that
are to be deleted and are no longer contained in any shard-local
bitsets.
These series are then removed from the series file.
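A rough sketch of the cross-shard check, with a plain map standing in for each shard's deleted-series bitset:

```go
package main

import "fmt"

// shardDeleted maps series ID -> deleted for one shard; a stand-in for the
// per-shard bitsets described above.
type shardDeleted map[uint64]bool

// seriesToRemove returns the candidate series IDs that no shard still
// contains, i.e. the ones that are safe to drop from the global series file.
func seriesToRemove(candidates []uint64, shards []shardDeleted) []uint64 {
	var out []uint64
	for _, id := range candidates {
		stillReferenced := false
		for _, s := range shards {
			if deleted, ok := s[id]; ok && !deleted {
				stillReferenced = true
				break
			}
		}
		if !stillReferenced {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	shards := []shardDeleted{
		{1: true, 2: false}, // series 2 is still live in this shard
		{1: true},
	}
	fmt.Println(seriesToRemove([]uint64{1, 2}, shards)) // [1]
}
```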