Commit Graph

94 Commits (1.10)

Author SHA1 Message Date
Jeffrey Smith II fce0d1c863
chore: update to go 1.19 (#24119)
* chore: update to go 1.19.6

* chore: gofmt

* test: fix tests for sort order change

* chore: generate pb

* feat: upgrade flux to v0.188.0 (#23911)

* feat: upgrade flux to 0.171.0

Tests failing, safety commit

First step in https://github.com/influxdata/influxdb/issues/23815

* fix: remove "org" parameter from writeOptSource

I attempted to implement the "orgOpt" argument in a similar fashion
to f6669f7512. However, it looks like Flux doesn't accept "org" as
a parameter to "load". It responds with:

Error calling function "load" @113:16-113:30: error calling function "to" @6:19-6:47: unused arguments [org]

This brings us from 194 passing to 570 passing.

* fix: temporarily disable broken flux tests

These tests expect rows to be stored in a certain order. However,
nothing specifies the sort order. This has been fixed in a
later update to flux (see 3d6f47ded).

Temporarily disable these tests until we include a fixed
version of the flux tests.

* chore: add tests from a492993012

This fixes "test-flux.sh" so it runs tests within the "flux/"
directory. This uncovered some other issues with the tests
located within "flux/". These also needed to be updated
to match the newer flux API.

* feat: upgrade flux to 0.172.0

This includes changes made in "cbbf4b27da". Since "test.go" in 2.x
diverged from 1.x, some modifications were required to make this
compatible.

* feat: upgrade flux to 0.173.0

* feat: upgrade flux to v0.174.0

* fix: Update the condition when resetting cursor (#23522)

Filters that contain `or` may change between cursor resets so we must remember to update the condition in the read cursor.

```flux
|> filter(fn: (r) => ((r["_field"] == "field1" and r["_value"]==true) or (r["_field"] == "field2" and r["_value"] == false)))
```

Closes https://github.com/influxdata/flux/issues/4804

* feat: upgrade flux to 0.174.1

* feat: upgrade flux to 0.175.0

* chore: remove end-to-end tests

These were removed in a492993 for 2.x. These tests prevent "go test ./..."
from completing. As stated in the original commit, these tests should now be
handled by the "fluxtest" harness.

* feat: upgrade flux to 0.176.0

Some tests needed to be disabled within the flux harness. This is a
result of enabling "Optimize Aggregate Window" in flux@05a1065f.
These tests are not present in 2.x. Therefore, I am unsure if
the breakage is resolved in a later commit.

* feat: upgrade flux to 0.177.0

* feat: upgrade flux to 0.178.0

* feat: upgrade flux to v0.179.0

This removes all invocations of "flux.RegisterOpSpec". According
to flux@e39096d5, "flux.RegisterOpSpec" does nothing in the
current version of flux and was removed.

* chore: update fluxtest skip list (#23633)

* chore: manually backport 785a465e9a

This removes the reference to "flux.Spec".

* build(flux): update flux to v0.181.0 (#23682)

* build(flux): update flux to v0.184.2

* chore: skip more Flux acceptance tests

There are issues for each skip detailed in test-flux.sh.

* feat: upgrade flux to v0.185.0

This adds "FluxTesting" to the "HTTPD" configuration. This option is
hidden and disabled by default. When "FluxTesting" is set, it
enables the default testing flags for "Flux".

These flags enable the "vectorized float tests" and the tests that
require the "removeRedundantSortNodes" and "labelPolymorphism" flags.
These changes are based on d8553c002e.

flux@3d6f47ded is included within this version of Flux. Therefore
we can now include the "group_*" tests.

* feat: upgrade flux to 0.186.0

* feat: upgrade flux to 0.187.0

* feat: upgrade flux to 0.188.0

* fix: re-run ./generate.sh with updated protoc

* fix: restrict cores to match CircleCI documentation

Co-authored-by: davidby-influx <dbyrne@influxdata.com>
Co-authored-by: Markus Westerlind <marwes91@gmail.com>
Co-authored-by: Sean Brickley <sean@wabr.io>
Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>

---------

Co-authored-by: Brandon Pfeifer <bpfeifer@influxdata.com>
Co-authored-by: davidby-influx <dbyrne@influxdata.com>
Co-authored-by: Markus Westerlind <marwes91@gmail.com>
Co-authored-by: Sean Brickley <sean@wabr.io>
Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>
2023-03-03 10:05:05 -05:00
davidby-influx 7d3efe1e9e
fix: avoid compaction queue stats flutter. (#22195)
When the compaction planner runs, if it cannot acquire
a lock on the files it plans to compact, it returns a
nil list of compaction groups. This, in turn, sets the
engine statistics for compaction queues to zero,
which is incorrect. Instead, use the number of pending
files that would have been returned.

closes https://github.com/influxdata/influxdb/issues/22138
2021-08-16 09:21:07 -07:00
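As a rough illustration of the fix above, here is a minimal Go sketch (with hypothetical names such as `compactionPlanner` and `queueDepth`, not the real engine types): when no compaction groups can be returned because the file locks are unavailable, report the count of pending files instead of zero.

```go
package main

import "fmt"

// compactionPlanner is a simplified, hypothetical stand-in for the real planner.
type compactionPlanner struct {
	pendingFiles []string // files that would be compacted once a lock is available
}

// queueDepth illustrates the fix: when no compaction groups can be returned
// because the file locks are held elsewhere, report the pending file count
// rather than zero, so the queue statistic does not flutter between real
// values and 0.
func (p *compactionPlanner) queueDepth(groups [][]string) int {
	if len(groups) == 0 {
		return len(p.pendingFiles)
	}
	n := 0
	for _, g := range groups {
		n += len(g)
	}
	return n
}

func main() {
	p := &compactionPlanner{pendingFiles: []string{"000001-02.tsm", "000002-02.tsm"}}
	fmt.Println(p.queueDepth(nil)) // lock not acquired: reports 2, not 0
}
```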
davidby-influx 73bdb2860e
chore: add logging to compaction (#21707)
Compaction logging will generate intermediate information on 
volume of data written and output files created, as well as 
improve some of the anti-entropy messages related to compaction.

This will also apply to `influx_tools compact`

Closes https://github.com/influxdata/influxdb/issues/21704
2021-06-16 15:28:44 -07:00
Ayan George c0e391935d
fix(tsdb): address staticcheck warning SA4006 (#18127)
This commit addresses staticcheck warning SA4006 "this value of VAR is
never used" by removing unused variables
2020-05-18 13:11:44 -04:00
Ben Johnson 3f59d4be8e fix(tsdb): Fix error lists 2020-04-01 14:54:39 -06:00
tmgordeeva f1d26652e9
fix(storage): skip TSM files with block read errors (#15885)
* fix(storage): skip TSM files with block read errors

When we find a bad TSM file during compaction, propagate the error up and move
the bad file aside. The engine will disregard the file so the next compaction
will not hit the same error.
2019-12-13 15:05:39 -08:00
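A minimal sketch of the behavior described above, assuming a hypothetical helper and a `.bad` suffix for sidelined files (the real code may differ):

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// sidelineBadTSM illustrates the described behavior: when a block read error
// is hit during compaction, move the offending file aside so the engine
// disregards it and the next compaction does not hit the same error.
// The ".bad" suffix and the function itself are assumptions for illustration.
func sidelineBadTSM(path string, readErr error) error {
	bad := path + ".bad"
	if err := os.Rename(path, bad); err != nil {
		return fmt.Errorf("compaction: could not sideline %s: %w", path, err)
	}
	return fmt.Errorf("compaction: block read error in %s (moved to %s): %w", path, bad, readErr)
}

func main() {
	// Demonstration only: the file does not exist here, so Rename itself fails.
	fmt.Println(sidelineBadTSM("000001-01.tsm", errors.New("block checksum mismatch")))
}
```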
Stuart Carnie 86734e7fcd
fix(storage): Don't panic when length of source slice is too large
StringArrayEncodeAll will panic if the total length of strings
contained in the src slice is > 0xffffffff. This change adds a unit
test to replicate the issue and an associated fix to return an error.

This also raises an issue that compactions will be unable to make
progress under the following condition:

* multiple string blocks are to be merged to a single block and
* the total length of all strings exceeds the maximum block size that
  snappy will encode (0xffffffff)

The observable effect of this is errors in the logs indicating a
compaction failure.

Fixes #13687
2019-06-04 17:05:01 -07:00
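A hedged sketch of the guard the commit describes; `maxStringBlockSize` and `checkStringBlockSize` are illustrative names, not the actual tsm1 API:

```go
package main

import (
	"errors"
	"fmt"
)

// maxStringBlockSize mirrors the limit mentioned in the commit (0xffffffff);
// the constant name here is hypothetical.
const maxStringBlockSize = 0xffffffff

var errBlockTooLarge = errors.New("tsm1: total string length exceeds maximum block size")

// checkStringBlockSize sketches the guard: instead of panicking when the
// combined length of the source strings overflows the encodable size,
// return an error the compactor can surface in its logs.
func checkStringBlockSize(src []string) error {
	var total uint64
	for _, s := range src {
		total += uint64(len(s))
		if total > maxStringBlockSize {
			return errBlockTooLarge
		}
	}
	return nil
}

func main() {
	fmt.Println(checkStringBlockSize([]string{"a", "b"})) // <nil>
}
```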
Ben Johnson 2dd913d71b
Revert "Limit force-full and cold compaction size."
This reverts commit 40db64d0b9.
2019-02-11 11:07:44 -07:00
Ben Johnson 40db64d0b9
Limit force-full and cold compaction size.
This commit limits the number of files that can be compacted in
a single group when forcing a full compaction or when a shard
becomes cold. This is to prevent too many files being compacted
at the same time.
2018-12-05 10:18:56 -07:00
Stuart Carnie 5d083887a5 chore: Compactor test which replicates issue #10465
Due to an encoding bug with simple8b, it is possible that the
MaxTime for a TSM index entry does not match the last encoded timestamp.
2018-11-14 09:13:13 -07:00
Jacob Marble 0dc5393441 tsm/cache: Remove unused function parameter 2018-06-13 15:22:37 -07:00
Jacob Marble bb313765e4 tsdb/tsm1: Clean up TSM filename format/parse 2018-05-29 09:57:48 -07:00
Jeff Wendling ce565965a4 tsdb: avoid nil checks on the observer
this avoids nil panics in the case that someone eventually forgets.
2018-05-23 13:15:41 -06:00
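The usual Go pattern for this is to substitute a no-op implementation rather than checking for nil at each call site. A minimal sketch, with an interface shape that is only assumed here:

```go
package main

// FileStoreObserver is an illustrative interface; the real tsdb interface
// may differ.
type FileStoreObserver interface {
	FileFinishing(path string) error
	FileUnlinking(path string) error
}

// noFileStoreObserver is a no-op implementation. Substituting it for a nil
// observer means callers never need nil checks and cannot panic if someone
// forgets one.
type noFileStoreObserver struct{}

func (noFileStoreObserver) FileFinishing(string) error { return nil }
func (noFileStoreObserver) FileUnlinking(string) error { return nil }

// withObserver normalizes a possibly-nil observer to a usable value.
func withObserver(obs FileStoreObserver) FileStoreObserver {
	if obs == nil {
		return noFileStoreObserver{}
	}
	return obs
}

func main() {
	obs := withObserver(nil)
	_ = obs.FileFinishing("000001-01.tsm") // safe: no nil check required
}
```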
Jeff Wendling 1a8931af42
Merge pull request #9841 from influxdata/jmw-ensure-no-race-conditions
tsm1: ensure some race conditions are impossible
2018-05-16 11:56:10 -06:00
Jeff Wendling 7d2bb19b74 tsm1: ensure some race conditions are impossible
The InUse call on TSMFiles is inherently racy in the presence of
Ref calls outside of the file store mutex. In addition, we return
some TSMFiles to callers without them being Ref'd, which might allow
them to be closed from underneath. In practice this is probably
impossible: the only thing that gets a handle externally is compaction,
compaction enforces that only one handle exists at a time, and the file
is only deleted once the compaction is done with it. But none of that
is obvious or enforced.

Instead, always return a TSMFile with a Ref call under the read
lock, and require that no one else calls Ref. That way, it cannot
transition to referenced if the InUse call returns false under the
write lock.

The CreateSnapshot method was racy in a number of ways in the presence
of multiple calls or compactions: it did not take references to the
TSMFiles, and the temporary directory it creates could have been
shared with concurrent CreateSnapshot calls. In addition, the
files slice could have been concurrently mutated during a compaction
as well.

Instead, under the write lock, make a local copy of the state for
the compaction, including Ref calls (write locks are implicitly
read locks). Then, there is no need for a lock at all afterward.

Add some comments to explain these issues at the call sites of InUse,
and document that the Files method that returns the slice unprotected
is only for tests.
2018-05-14 19:45:42 -06:00
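A simplified sketch of the locking discipline the commit describes: copy the file list and take references while the store lock is held, so a file cannot transition to unreferenced and be closed underneath a caller. Types and method names below are stand-ins, not the real tsm1 API:

```go
package main

import "sync"

// tsmFile is a simplified stand-in for the real TSMFile; Ref/Unref track
// outstanding users so the file cannot be removed while referenced.
type tsmFile struct {
	mu   sync.Mutex
	refs int
}

func (f *tsmFile) Ref()   { f.mu.Lock(); f.refs++; f.mu.Unlock() }
func (f *tsmFile) Unref() { f.mu.Lock(); f.refs--; f.mu.Unlock() }

type fileStore struct {
	mu    sync.RWMutex
	files []*tsmFile
}

// snapshotFiles copies the file list and takes references while the read
// lock is held, per the rule that Ref is only called under the store lock.
// Callers must Unref each file when done.
func (s *fileStore) snapshotFiles() []*tsmFile {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]*tsmFile, len(s.files))
	copy(out, s.files)
	for _, f := range out {
		f.Ref()
	}
	return out
}

func main() {
	s := &fileStore{files: []*tsmFile{{}, {}}}
	files := s.snapshotFiles()
	for _, f := range files {
		f.Unref()
	}
}
```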
Ben Johnson 35a64dee99
Inject tsm file naming. 2018-05-14 10:46:38 -06:00
Stuart Carnie 0f6e6fb9ef
Merge pull request #9192 from influxdata/sgc-writer
ensure tsmWriter#Write returns ErrMaxBlocksExceeded
2018-02-01 15:39:01 -07:00
Hans Petter Bieker 7a273ccdb5 Fixed issue where compacting did not sort when blocks are unsorted and overlapping. 2018-01-04 15:25:26 +01:00
hpbieker ee185e18b7 Added unit test TestCompactor_Compact_UnsortedBlocks. 2018-01-03 09:42:36 +01:00
Jason Wilder 2d85ff1d09 Adjust compaction planning
Increase level 1 min criteria, fix only fast compactions getting run,
and fix very large generations getting included in optimize plans.
2017-12-14 22:41:34 -07:00
Jason Wilder 7dc5327a0a Adjust snapshot concurrency by latency
This changes the approach to adjusting the amount of concurrency
used for snapshotting to be based on the snapshot latency vs
cardinality.  The cardinality approach could use too much concurrency
and increase the number of level 1 TSM files too quickly which incurs
more disk IO.

The latency model seems to adjust better to different workloads.
2017-12-13 13:17:56 -07:00
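A rough sketch of a latency-driven adjustment of this kind; the thresholds and bounds below are illustrative assumptions, not the values used by the engine:

```go
package main

import (
	"fmt"
	"time"
)

// adjustSnapshotConcurrency sketches a latency-driven model: raise snapshot
// concurrency when snapshots are slow, lower it when they are fast to avoid
// creating level 1 TSM files too quickly. Thresholds are assumptions.
func adjustSnapshotConcurrency(current int, lastSnapshot time.Duration) int {
	switch {
	case lastSnapshot > 30*time.Second && current < 4:
		return current + 1 // snapshots are lagging: allow more writers
	case lastSnapshot < 5*time.Second && current > 1:
		return current - 1 // snapshots are fast: reduce level 1 file churn
	default:
		return current
	}
}

func main() {
	fmt.Println(adjustSnapshotConcurrency(1, 45*time.Second)) // 2
}
```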
Jason Wilder 0a85ce2b73 Schedule compactions less aggressively
This runs the scheduler every 5s instead of every 1s as well as reduces
the scope of a level 1 plan.
2017-12-06 13:45:43 -07:00
Stuart Carnie fffd123646 update unit test 2017-12-01 12:19:28 -07:00
Jason Wilder b6096414c2 Fix compactions aborting early
If there were many individual deletes to a series that ended up
deleting every value in the block and the tombstone timestamps
were not contiguous, it was possible for the TSMKeyIterator to
return false for Next incorrectly.  This causes the compaction to
drop any remaining data in the file.

Normally, if all the data is deleted via tombstones, we remove the
whole key from the TSM index.  In this case, we're not able to determine
that the key is fully deleted until the block is decoded and tombstones
are applied.

This changes the TSMKeyIterator to detect this condition and continue
to the next key instead of aborting.
2017-11-30 14:38:09 -07:00
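A minimal sketch of the control-flow change: when decoding plus tombstones leaves a key with no values, continue to the next key instead of ending iteration. The structures below are simplified stand-ins for the real TSMKeyIterator:

```go
package main

import "fmt"

// nextKey sketches the iterator fix: if applying tombstones to the decoded
// block leaves no values, advance to the next key rather than reporting that
// iteration is finished, which previously dropped the rest of the file.
func nextKey(blocks map[string][]int64, keys []string, pos *int) (string, []int64, bool) {
	for *pos < len(keys) {
		k := keys[*pos]
		*pos++
		vals := blocks[k] // values remaining after tombstones were applied
		if len(vals) == 0 {
			continue // fully deleted: skip the key, do not abort the compaction
		}
		return k, vals, true
	}
	return "", nil, false
}

func main() {
	blocks := map[string][]int64{"cpu#a": {}, "cpu#b": {1, 2}}
	keys := []string{"cpu#a", "cpu#b"}
	pos := 0
	for {
		k, vals, ok := nextKey(blocks, keys, &pos)
		if !ok {
			break
		}
		fmt.Println(k, vals) // only cpu#b is emitted
	}
}
```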
Jason Wilder 97e0d496a6 Add capability to force a full compaction
This adds the capability to the engine to force a full compaction
to be scheduled.  When called, it snapshots any data in the cache,
aborts running compactions, and prevents the level planners from
returning level plans.
2017-11-15 07:14:27 -07:00
Jason Wilder 17bae05370 Allow buffering tombstones before writing to disk 2017-11-13 08:48:03 -07:00
Jason Wilder 06226d6fd3 Handle orphan lower level TSM files during full planning
Some files seem to get orphaned behind higher levels.  This causes
the compactions to get blocked as the lower-level files will not
get picked up by their lower level planners.  This allows the full
plan to identify them and pull them into their plans.
2017-10-04 08:13:14 -06:00
Jason Wilder 0c0505881f Remove multiple file skipping for full compaction planning
This check doesn't make sense for high cardinality data as the files
typically get big and sparse very quickly.  This causes a lot of extra
disk space to be used which is taken up by large indexes and sparse
data.
2017-10-03 10:48:14 -06:00
Jason Wilder 4ff4ba0841 Use first file in generation for level
With higher cardinality or larger series keys, the files can roll
over early which causes them to take longer to be compacted by higher
levels.  This causes larger disk usage and higher numbers of tsm files
at times.
2017-10-03 10:48:14 -06:00
Jason Wilder 1610ae5727 Don't return tsm files part of a compaction plan 2017-10-03 10:48:13 -06:00
Jason Wilder ddeba2c86b Split large snapshots and write concurrently 2017-09-19 15:27:25 -06:00
Jason Wilder 7388eb9499 Use disk when writing TSM index 2017-09-11 15:29:25 -06:00
Jason Wilder d3e832b462 Use offheap memory for indirect index offsets slice 2017-09-11 15:29:25 -06:00
Jason Wilder 91eb9de341 Use existing TSMReader from file store during compactions
Compactions would create their own TSMReaders for simplicity. With
very high cardinality compactions, creating the reader and indirectIndex
can start to use a significant amount of memory.

This changes the compactions to use a reader that is already allocated
and managed by the FileStore.
2017-09-11 15:29:25 -06:00
Jason Wilder 778000435a Convert all keys from string to []byte in TSM engine
This switches all the interfaces that take a string series key to
take a []byte.  This eliminates many small allocations where we
convert between the two repeatedly.  Eventually, this change should
propagate further up the stack.
2017-07-28 11:00:50 -06:00
Jason Wilder 18a02d50d7 Interrupt in progress TSM compactions
When snapshots and compactions are disabled, the check to see if
the compaction should be aborted occurs in between writing to the
next TSM file.  If a large compaction is running, it might take
a while for the file to finish writing, causing long delays.

This now interrupts compactions while iterating over the blocks to
write which allows them to abort immediately.
2017-07-27 15:58:56 -06:00
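A small sketch of the idea: poll an interrupt channel between blocks rather than between files, so an abort takes effect immediately. Function and variable names are assumptions:

```go
package main

import (
	"errors"
	"fmt"
)

var errCompactionAborted = errors.New("compaction aborted")

// writeBlocks sketches the change: check the interrupt channel while
// iterating over blocks, so an abort takes effect right away rather than
// after the current TSM file finishes writing.
func writeBlocks(blocks [][]byte, interrupt <-chan struct{}, write func([]byte) error) error {
	for _, b := range blocks {
		select {
		case <-interrupt:
			return errCompactionAborted
		default:
		}
		if err := write(b); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	interrupt := make(chan struct{})
	close(interrupt) // simulate snapshots/compactions being disabled mid-run
	err := writeBlocks([][]byte{{1}, {2}}, interrupt, func([]byte) error { return nil })
	fmt.Println(err) // compaction aborted
}
```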
Jason Wilder 4d002bb370 Limit concurrent compactions within a shard
This changes full compactions within a shard to run sequentially
instead of running all the compaction groups in parallel.  Normally,
there is only 1 full compaction group to run.  At times, there could
be several which causes instability if they are all running concurrently
as they tie up a cpu for long periods of time.

Level compactions are also capped to a max of 4 concurrently running for each level
in a shard.  This prevents sudden spikes in CPU and disk usage due to a large backlog
of tsm files at a given level.
2017-05-12 14:05:24 -06:00
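One common way to express such a cap in Go is a buffered channel used as a semaphore. A minimal sketch, assuming a hypothetical `runLevelCompactions` helper:

```go
package main

import "sync"

// maxConcurrentPerLevel mirrors the cap described in the commit; the name is
// an assumption for illustration.
const maxConcurrentPerLevel = 4

// runLevelCompactions sketches the limiting scheme: a buffered channel acts
// as a semaphore so at most four level compactions run at once for a level,
// preventing CPU and disk spikes when a large backlog of TSM files exists.
func runLevelCompactions(groups []func()) {
	sem := make(chan struct{}, maxConcurrentPerLevel)
	var wg sync.WaitGroup
	for _, compact := range groups {
		wg.Add(1)
		sem <- struct{}{} // blocks once four compactions are in flight
		go func(run func()) {
			defer wg.Done()
			defer func() { <-sem }()
			run()
		}(compact)
	}
	wg.Wait()
}

func main() {
	work := make([]func(), 10)
	for i := range work {
		work[i] = func() {}
	}
	runLevelCompactions(work)
}
```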
Jason Wilder 3d1c0cd981 Don't return compaction plans for files already part of a plan
The compactor prevents the same file from being compacted by different
compaction runs, but it can result in warning errors in the logs that
are confusing.

This adds compaction plan tracking to the planner so that files are
only part of one plan at a given time.
2017-05-03 16:31:57 -06:00
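A minimal sketch of plan tracking along these lines, using hypothetical names (`planTracker`, `acquire`, `release`) rather than the real planner API:

```go
package main

import (
	"fmt"
	"sync"
)

// planTracker sketches the planner change: remember which files belong to an
// outstanding plan and refuse to include them in a new one until released.
type planTracker struct {
	mu     sync.Mutex
	inPlan map[string]struct{}
}

func newPlanTracker() *planTracker {
	return &planTracker{inPlan: make(map[string]struct{})}
}

// acquire claims the files for a plan; it returns false if any file is
// already part of another plan.
func (t *planTracker) acquire(files []string) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, f := range files {
		if _, ok := t.inPlan[f]; ok {
			return false
		}
	}
	for _, f := range files {
		t.inPlan[f] = struct{}{}
	}
	return true
}

// release returns the files to the pool once the compaction finishes.
func (t *planTracker) release(files []string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	for _, f := range files {
		delete(t.inPlan, f)
	}
}

func main() {
	t := newPlanTracker()
	fmt.Println(t.acquire([]string{"000001-02.tsm"})) // true
	fmt.Println(t.acquire([]string{"000001-02.tsm"})) // false: already planned
	t.release([]string{"000001-02.tsm"})
}
```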
Jason Wilder 675d7c9d65 Merge branch '1.2' into jw-merge12 2017-03-06 11:09:05 -07:00
Jason Wilder eab012ef61 Fix points missing after compaction
If blocks containing overlapping ranges of time were partially
recombined, it was possible for some points to get dropped
during compactions.  This occurred because the window of time of
the points we need to merge did not account for the partial blocks
created from a prior merge.

Fixes #8084
2017-03-06 10:17:11 -07:00
Edd Robinson 292b30b82b Fix subtle bugs and remove dead code from tsdb 2017-01-17 09:47:34 -08:00
Cory LaNou 3c518f8927
panicking is bad -> error returns are good 2017-01-03 14:28:29 -06:00
Jason Wilder 1a35c0a3fc Fix neverending full compactions
The full compaction planner could return a plan that only included
one generation.  If this happened, a full compaction would run on that
generation producing just one generation again.  The planner would then
repeat the plan.

This could happen if there were two generations that were both over
the max TSM file size and the second one happened to be in level 3 or
lower.

When this situation occurs, one cpu is pegged running a full compaction
continuously and the disks become very busy basically rewriting the
same files over and over again.  This can eventually cause disk and CPU
saturation if it occurs with more than one shard.

Fixes #7074
2016-09-03 17:35:14 -06:00
Jason Wilder ded6e40d47 Remove lastPlanCheck var
This causes full compactions to not run if the server is running, but
after a restart they do run.
2016-07-26 12:58:25 -06:00
Jason Wilder 0264966f5c Add index optimize planning step
For larger datasets, it's possible for shards to get into a state where
many large, dense TSM files exist.  While the shard is still hot for
writes, full compactions will skip these files since they are already
fairly optimized and full compactions are expensive.  If the write volume
is large enough, the shard can accumulate lots of these files.  When
a file is in this state, its index can contain every series, which
causes startup times to increase since each file must parse the full
set of series keys for every file.  If the number of series is high,
the index can be quite large, causing a large amount of disk IO at startup.

To fix this, an optimize compaction is run when a full compaction planning
step decides there is nothing to do.  The optimize compaction combines
and spreads the data and series keys across all files resulting in each
file containing the full series data for that shard and a subset of the
total set of keys in the shard.

This allows a shard to store each series key only once, reducing
storage size, and to load each key only once at startup.
2016-07-14 11:32:36 -06:00
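A tiny sketch of the planning flow described above, with `planFull` and `planOptimize` as hypothetical hooks rather than the real planner methods:

```go
package main

import "fmt"

// plan is a simplified compaction group: the set of TSM files to rewrite.
type plan [][]string

// planCompactions sketches the added step: if full compaction planning finds
// nothing to do, run optimize planning so large, dense files still get their
// series keys consolidated.
func planCompactions(planFull, planOptimize func() plan) plan {
	if p := planFull(); len(p) > 0 {
		return p
	}
	return planOptimize()
}

func main() {
	full := func() plan { return nil } // nothing worth a full compaction
	optimize := func() plan { return plan{{"000007-04.tsm", "000009-04.tsm"}} }
	fmt.Println(planCompactions(full, optimize))
}
```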
Jason Wilder 5ee20e04a8 Fix compaction level planner
Large files created early in the leveled compactions could cause
a shard to get into a bad state.  This reworks the level planner
to handle those cases as well as splits large compactions up into
multiple groups to leverage more CPUs when possible.
2016-07-14 11:14:09 -06:00
Jason Wilder 1ff8ecf4fb Add ability to disable shards
Disabling a shard causes all writes and queries to a shard to return
an error.  This also disables compactions for the shard.
2016-05-31 10:51:54 -06:00
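A minimal sketch of the guard this implies, with simplified types and an assumed `SetEnabled` method shape, not the real tsdb.Shard implementation:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errShardDisabled = errors.New("shard is disabled")

// shard sketches the disable behavior: once disabled, writes (and similarly
// queries and compaction scheduling) return an error until re-enabled.
type shard struct {
	mu      sync.RWMutex
	enabled bool
}

func (s *shard) SetEnabled(enabled bool) {
	s.mu.Lock()
	s.enabled = enabled
	s.mu.Unlock()
}

func (s *shard) WritePoints(points []string) error {
	s.mu.RLock()
	defer s.mu.RUnlock()
	if !s.enabled {
		return errShardDisabled
	}
	// ... write points ...
	return nil
}

func main() {
	s := &shard{enabled: true}
	s.SetEnabled(false)
	fmt.Println(s.WritePoints([]string{"cpu value=1"})) // shard is disabled
}
```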
Jason Wilder 7d50970631 Fix continuous compaction edge case
The level planner would keep including the same TSM files to be
recompacted even if they were already quite compacted and split
across several TSM files.

Fixes #6683
2016-05-25 10:36:24 -06:00
Jason Wilder f2bcf9d9ab Code review fixes 2016-05-18 15:25:56 -06:00
Jason Wilder e859141b75 Speed up tests
Switched the max keys test to write int64 values of the same value so RLE
would kick in and the file size would be smaller (down from 84MB to 3.8MB).

Removed the chunking test which was skipped because the code will
not downsize a block into smaller chunks now.

Skip MaxKeys tests in various environments because it needs to
write too much data to run reliably.
2016-05-18 15:25:56 -06:00
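A small sketch of the test-data idea: generating identical int64 values so run-length encoding keeps the block, and therefore the test's TSM file, small. The helper name is hypothetical:

```go
package main

import "fmt"

// sameValueInt64s sketches the test speedup: generate many points with the
// identical int64 value so run-length encoding collapses the block and the
// resulting TSM file stays small.
func sameValueInt64s(n int, v int64) []int64 {
	vals := make([]int64, n)
	for i := range vals {
		vals[i] = v // identical values compress extremely well under RLE
	}
	return vals
}

func main() {
	vals := sameValueInt64s(1_000_000, 42)
	fmt.Println(len(vals), vals[0])
}
```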