Fix issue that can cause the retention service to hang waiting on a
`Shard.Close` call. When this occurs, no other shards will be deleted
by the retention service. This is usually noticed as an increase in
disk usage because old shards are not cleaned up.
The fix adds two new methods to `Store`, `SetShardNewReadersBlocked`
and `InUse`. `InUse` can be used to poll if a shard has active readers,
which the retention service uses to skip over in-use shards to prevent
the service from hanging. `SetShardNewReadersBlocked` determines if
new read access may be granted to a shard. This is required to prevent
race conditions around the use of `InUse` and the deletion of shards.
If the retention service skips over a shard because it is in-use, the
shard will be checked again the next time the retention service is run.
It can be deleted on subsequent checks if it is no longer in-use. If
a shard is stuck in-use, the retention service will not be able to
delete it, which can be observed in the logs for manual
intervention. Other shards can still be deleted by the retention service
even if a shard is stuck with readers.
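
The block-then-check sequence described above can be sketched roughly as
follows. This is a minimal illustration, not the actual implementation:
`shardStore`, `Acquire`, `Release`, `DeleteShard` and `deleteExpired` are
hypothetical, and the signatures given here to `SetShardNewReadersBlocked`
and `InUse` are simplified assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// shardStore is a hypothetical, simplified stand-in for Store. Only the
// method names SetShardNewReadersBlocked and InUse come from the commit
// message; the signatures and fields here are assumptions.
type shardStore struct {
	mu      sync.Mutex
	exists  map[uint64]bool // shards still present
	readers map[uint64]int  // active reader count per shard
	blocked map[uint64]bool // shards refusing new readers
}

func newShardStore(ids ...uint64) *shardStore {
	s := &shardStore{
		exists:  map[uint64]bool{},
		readers: map[uint64]int{},
		blocked: map[uint64]bool{},
	}
	for _, id := range ids {
		s.exists[id] = true
	}
	return s
}

// Acquire grants read access unless the shard is gone or blocked.
func (s *shardStore) Acquire(id uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if !s.exists[id] || s.blocked[id] {
		return false
	}
	s.readers[id]++
	return true
}

// Release gives back read access obtained with Acquire.
func (s *shardStore) Release(id uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.readers[id] > 0 {
		s.readers[id]--
	}
}

func (s *shardStore) SetShardNewReadersBlocked(id uint64, blocked bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.blocked[id] = blocked
}

func (s *shardStore) InUse(id uint64) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.readers[id] > 0
}

func (s *shardStore) DeleteShard(id uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.exists, id)
	delete(s.readers, id)
	delete(s.blocked, id)
}

// deleteExpired blocks new readers first, then checks InUse, so no reader
// can slip in between the check and the deletion. In-use shards are
// unblocked and skipped so they can be retried on the next run.
func deleteExpired(s *shardStore, expired []uint64) (deleted, skipped []uint64) {
	for _, id := range expired {
		s.SetShardNewReadersBlocked(id, true)
		if s.InUse(id) {
			s.SetShardNewReadersBlocked(id, false)
			skipped = append(skipped, id)
			continue
		}
		s.DeleteShard(id)
		deleted = append(deleted, id)
	}
	return deleted, skipped
}

func main() {
	s := newShardStore(1, 2, 3)
	s.Acquire(2) // shard 2 has an active reader
	fmt.Println(deleteExpired(s, []uint64{1, 2, 3}))
	s.Release(2)
	fmt.Println(deleteExpired(s, []uint64{2})) // retried on the next run
}
```

In this sketch a skipped shard does not stop the loop, which mirrors the
point that other shards can still be deleted while one is stuck with readers.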
closes: #25063
(cherry picked from commit b4bd607eef)
* feat: upgrade flux to 0.171.0
Tests failing, safety commit
First step in https://github.com/influxdata/influxdb/issues/23815
* fix: remove "org" parameter from writeOptSource
I attempted to implement the "orgOpt" argument in a similar fashion
to f6669f7512. However, it looks like Flux doesn't accept "org" as
a parameter to "load". It responds with:
Error calling function \"load\" @113:16-113:30: error calling function \"to\" @6:19-6:47: unused arguments [org]
This brings us from 194 passing to 570 passing.
* fix: temporarily disable broken flux tests
These tests expect rows to be stored in a certain order. However,
nothing is specifying the sort order. This has been fixed in a
later update to flux: (see 3d6f47ded).
Temporarily disable these tests until we include a fixed
version of the flux tests.
* chore: add tests from a492993012
This fixes "test-flux.sh" so it runs tests within the "flux/"
directory. This uncovered some other issues with the tests
located within "flux/". These also needed to be updated
to match the newer flux API.
* feat: upgrade flux to 0.172.0
This includes changes made in "cbbf4b27da". Since "test.go" in 2.x
diverged from 1.x, some modifications were required to make this
compatible.
* feat: upgrade flux to 0.173.0
* feat: upgrade flux to v0.174.0
* fix: Update the condition when resetting cursor (#23522)
Filters that contain `or` may change between cursor resets so we must remember to update the condition in the read cursor.
```flux
|> filter(fn: (r) => ((r["_field"] == "field1" and r["_value"]==true) or (r["_field"] == "field2" and r["_value"] == false)))
```
Closes https://github.com/influxdata/flux/issues/4804
* feat: upgrade flux to 0.174.1
* feat: upgrade flux to 0.175.0
* chore: remove end-to-end tests
These were removed in a492993 for 2.x. These tests prevent "go test ./..."
from completing. As stated in the original commit, these tests should now be
handled by the "fluxtest" harness.
* feat: upgrade flux to 0.176.0
Some tests needed to be disabled within the flux harness. This is a
result of enabling "Optimize Aggregate Window" in flux@05a1065f.
These tests are not present in 2.x. Therefore, I am unsure if
the breakage is resolved in a later commit.
* feat: upgrade flux to 0.177.0
* feat: upgrade flux to 0.178.0
* feat: upgrade flux to v0.179.0
This removes all invocations of "flux.RegisterOpSpec". According
to flux@e39096d5, "flux.RegisterOpSpec" does nothing in the
current version of flux and was removed.
* chore: update fluxtest skip list (#23633)
* chore: manually backport 785a465e9a
This removes the reference to "flux.Spec".
* build(flux): update flux to v0.181.0 (#23682)
* build(flux): update flux to v0.184.2
* chore: skip more Flux acceptance tests
There are issues for each skip detailed in test-flux.sh.
* feat: upgrade flux to v0.185.0
This adds "FluxTesting" to the "HTTPD" configuration. This option is
hidden and disabled by default. When "FluxTesting" is set, it
enables the default testing flags for "Flux".
These flags allow the "vectorized float tests" and tests requiring
the "removeRedundantSortNodes" and "labelPolymorphism" flags
enabled to work. These changes are based off of d8553c002e.
flux@3d6f47ded is included within this version of Flux. Therefore
we can now include the "group_*" tests.
* feat: upgrade flux to 0.186.0
* feat: upgrade flux to 0.187.0
* feat: upgrade flux to 0.188.0
* fix: re-run ./generate.sh with updated protoc
* fix: restrict cores to match CircleCI documentation
Co-authored-by: davidby-influx <dbyrne@influxdata.com>
Co-authored-by: Markus Westerlind <marwes91@gmail.com>
Co-authored-by: Sean Brickley <sean@wabr.io>
Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>
Partially implements the /v2/api/buckets API:
POST for creating a bucket
DELETE for deleting a bucket
GET for listing buckets
GET for retrieving one bucket
PATCH for modifying a bucket
See here for API details:
https://docs.influxdata.com/influxdb/cloud/api/#tag/Buckets
* fix(storage): Detect need for descending cursor in WindowAggregate
* test: add tests for bare aggregate pushdowns
* test: add test cases for window aggregate pushdowns
* test: add tests for aggregate-by-time (aggregateWindow pushdown)
Co-authored-by: Sean Brickley <sean@wabr.io>
This is a backport of #14262 to the 1.x storage engine.
This also ports the table tests that existed with the pre-beta version of the
storage engine to the one that is now used in the production version.
A few of the tests are skipped. These are portions of the storage engine
that have not been ported over. They should be unskipped when that
functionality is ported over.
Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
* test: add script to run flux tests
* feat(flux): enable test capabilities in Flux controller
* feat(flux): add MergeFiltersRule
* build: bump existing Dockerfiles to go 1.15
* build: add flux tests to CI
* refactor: allow for overriding tcp.Mux logger
* build: upgrade to Flux v0.111.0
* chore: Update flux to 0.67
* chore: Builds against 0.68 flux
* chore: Builds against 0.80.0
* chore: Builds against 0.90.0
* chore: Everything builds on latest flux
* chore: goimports fixed
* chore: fix tests locally
* chore: fix CI dockerfiles
* chore: clean up some unused code
* chore: remove flux repl and Spec in flux query json
* chore: port flux end to end tests from 2.x
* chore: fix up goimports
* chore: remove 32 bit build support
Meta queries (SHOW TAG VALUES, SHOW TAG KEYS, SHOW SERIES CARDINALITY, etc.) do not respect
the QueryTimeout config parameter. Meta queries should check the query context when possible
to allow cancellation and timeout. These checks will be less frequent than in regular queries, which
use iterators, because meta queries return data in batches.
Add a context.Context parameter to
(*Store).MeasurementNames()
(*Store).MeasurementsCardinality()
(*Store).SeriesCardinality()
(*Store).TagValues()
(*Store).TagKeys()
(*Store).SeriesSketches()
(*Store).MeasurementsSketches()
The context is checked for timeout or cancellation
to limit the time spent in meta queries.
https://github.com/influxdata/influxdb/issues/20736
The flux in influxdb has been upgraded to use v0.33.2. A lot of
interfaces for the storage engine were changed during this so code had
to change to accommodate the new interfaces and remove the old ones.
Included in this commit is a patch file for the changes that were made.
A patch was generated for the following packages:
* `flux/stdlib/influxdata/influxdb`
* `storage/reads`
* `tsdb/cursors`
These are the three packages that are in common with version 2 of the
database and the first of these packages contains the specific
implementations that are used for version 1.
It is very possible that the next time we upgrade this, the patch will
not apply cleanly just like it wouldn't have applied cleanly to this
update. The patch is mostly meant to document exactly what changed
during the copy over to help ensure we don't forget things when adapting
the interfaces.
Add a patch file to hopefully make this easier in the future
This integrates the influxdb 1.x series to the latest version of Flux
and updates the code to use it. It also removes the dependency on
platform and copies the necessary code from storage into the 1.x series
so the dependency is unneeded.
The flux functions specific to 1.x have been moved to match the layout
that flux adopted, with a `stdlib` directory instead of a
`functions` directory. It also adds a `databases()` function that
returns the databases from the meta client.
This commit deletes most of the code to service reads from influxdb
and pulls it in from platform instead.
Of note, the models.Tag and models.Tags types are now aliases to the
platform models.Tag and models.Tags types. Additionally, many types
in the tsdb package relating to cursors are also aliases to the same
types in the platform cursors package.
This updates the platform and flux repos to the current master in the
Gopkg.lock.
* the protocol service definition, ReadRequest and ReadResponse are
reused across projects, rather than requiring redefinition.
* the ReadRequest protocol buffer definition removes the concept of a
database and retention policy, replacing it with a field named
ReadSource of type google.protobuf.Any. OSS requests will use the
ReadSource message structure defined locally in this package, which
defines fields to represent a Database and RetentionPolicy. Other
implementations can provide their own data structure allowing the
remainder of the ReadRequest to be reused.
* The RPC service and Store are expected to be redefined to handle their
specific requirements for resolving a ReadSource
* ResultSet and GroupResultSet are interfaces representing non-grouping
and grouping read behavior respectively. Calling NewResultSet or
NewGroupResultSet will construct instances of these types
* The ResponseWriter type is exported to deal with serialization of
the ResultSet and GroupResultSet types
Since possibly v0.9 DELETE SERIES has had the unwanted side effect of
removing series from the index when the last traces of series data are
removed from TSM. This occurred because the inmem index was rebuilt on
startup, and if there was no TSM data for a series then there would be
no series to add to the index.
This commit returns to the original (documented) DROP/DELETE SERIES
behaviour. As such, when issuing DROP SERIES all instances of matching
series will be removed from both the TSM engine and the index. When
issuing DELETE SERIES only TSM data will be removed.
It is up to the operator to remove series from the index.
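
The difference between the two statements can be modelled with a toy
example. This is a sketch under assumptions: the `engine` type and its
fields are hypothetical and greatly simplified compared to the real TSM
engine and index.

```go
package main

import "fmt"

// engine is a hypothetical model with a series index and TSM data.
type engine struct {
	index map[string]bool      // series keys known to the index
	tsm   map[string][]float64 // series key -> stored values
}

// DeleteSeries removes only the TSM data; the index entry remains.
func (e *engine) DeleteSeries(key string) {
	delete(e.tsm, key)
}

// DropSeries removes the series from both the TSM data and the index.
func (e *engine) DropSeries(key string) {
	delete(e.tsm, key)
	delete(e.index, key)
}

func main() {
	e := &engine{
		index: map[string]bool{"cpu,host=a": true, "cpu,host=b": true},
		tsm:   map[string][]float64{"cpu,host=a": {1, 2}, "cpu,host=b": {3}},
	}
	e.DeleteSeries("cpu,host=a")
	e.DropSeries("cpu,host=b")

	fmt.Println(e.index["cpu,host=a"]) // still indexed after DELETE
	fmt.Println(e.index["cpu,host=b"]) // gone after DROP
	fmt.Println(len(e.tsm))            // no data left for either series
}
```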
NB, this commit does not address how to remove series data from the
series file when a shard rolls over.
The previous sha was taken from a revision on a devel branch that I
thought would continue staying in the tree after it was merged. That
revision was rebased away and the API was changed for the logger.
This updates the usage of the logger and adds a simple package for
constructing the base logger.
The 1.0 version of zap changed the format of the default console logger
so this change moves over to this new logger instead of attempting to
retain backwards compatibility with the old format.
This commit adds time support to SHOW TAG VALUES. Time can be used as
both a lower and upper boundary. However, there are some caveats.
For the `inmem` index, filtering by time will still return all results
because the index data is shared across shards.
For the `tsi1` index, filtering by time will only work down to the shard
level. Specifically, when querying by time all shards within that time
range will be used to generate the results.
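
Shard-level filtering can be pictured as an overlap test between the query
range and each shard's time range. The following is a sketch only: the
`shard` type and `shardsInRange` are hypothetical, and the bounds are
assumed to be an inclusive start and an exclusive end.

```go
package main

import "fmt"

// shard is a hypothetical shard covering the time range [min, max).
type shard struct {
	id       uint64
	min, max int64
}

// shardsInRange returns the IDs of all shards that overlap the inclusive
// query range [tmin, tmax]. Values anywhere inside a selected shard are
// returned, even if they fall outside the query range.
func shardsInRange(shards []shard, tmin, tmax int64) []uint64 {
	var ids []uint64
	for _, s := range shards {
		if s.min <= tmax && tmin < s.max {
			ids = append(ids, s.id)
		}
	}
	return ids
}

func main() {
	shards := []shard{{1, 0, 100}, {2, 100, 200}, {3, 200, 300}}
	fmt.Println(shardsInRange(shards, 150, 160)) // only shard 2
	fmt.Println(shardsInRange(shards, 90, 100))  // shards 1 and 2
}
```

Under these assumptions, a query for 150 to 160 would still report tag
values written at time 110, because they live in the same shard.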
Fixes #8819.
Previously, the process of dropping expired shards according to the
retention policy duration was managed by two independent goroutines in
the retention policy service. This behaviour was introduced in #2776,
at a time when there were both data and meta nodes in the OSS codebase.
The idea was that only the leader meta node would run the meta data
deletions in the first goroutine, and all other nodes would run the
local deletions in the second goroutine.
InfluxDB no longer operates in that way and so we ended up with two
independent goroutines that were carrying out actions that really
depended on each other.
If the second goroutine runs before the first then it may not see the
meta data changes indicating shards should be deleted and it won't
delete any shards locally. Shortly after this the first goroutine will
run and remove the meta data for the shard groups.
This results in a situation where it looks like the shards have gone,
but in fact they remain on disk (and importantly, their series within
the index) until the next time the second goroutine runs. By default
that's 30 minutes.
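
The ordering problem can be reproduced with a toy model. This is a sketch
under assumptions: `node`, `dropExpiredMeta` and `dropLocal` are
hypothetical names standing in for the two goroutines' work, run here
sequentially to make the order explicit.

```go
package main

import "fmt"

// node is a hypothetical model of one server's shard bookkeeping.
type node struct {
	meta  map[uint64]bool // shard IDs known to the meta data
	local map[uint64]bool // shard IDs present on disk
}

func newNode(ids ...uint64) *node {
	n := &node{meta: map[uint64]bool{}, local: map[uint64]bool{}}
	for _, id := range ids {
		n.meta[id] = true
		n.local[id] = true
	}
	return n
}

// dropExpiredMeta models the first goroutine: remove expired meta data.
func (n *node) dropExpiredMeta(expired []uint64) {
	for _, id := range expired {
		delete(n.meta, id)
	}
}

// dropLocal models the second goroutine: delete shards on disk that the
// meta data no longer references.
func (n *node) dropLocal() {
	for id := range n.local {
		if !n.meta[id] {
			delete(n.local, id)
		}
	}
}

func main() {
	expired := []uint64{1}

	// Local deletion runs first: it sees no meta changes yet.
	a := newNode(1, 2)
	a.dropLocal()
	a.dropExpiredMeta(expired)
	fmt.Println(a.meta[1], a.local[1]) // meta gone, shard still on disk

	// Meta deletion runs first: the local pass removes the shard.
	b := newNode(1, 2)
	b.dropExpiredMeta(expired)
	b.dropLocal()
	fmt.Println(b.meta[1], b.local[1]) // both gone
}
```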
In the case where the shards to be removed would have removed the last
occurrences of some series, then it's possible that if the database was
already at its maximum series limit (or tag limit for that matter), no
further new series can be inserted.