Stop noisy logging about phantom shards that do not belong to the
current node by checking the shard ownership before logging about the
phantom shard. Note that only the logging was inaccurate. This did not
accidentally remove shards from the metadata that weren't really phantom
shards due to checks in `DropShardMetaRef` implementations.
closes: #26525
* feat(tsdb): Adds shard opening progress checks to startup
This PR adds a check to see how many shards are remaining
vs how many shards are opened. This change displays the percent
completed too.
closesinfluxdata/feature-requests#476
* fix: prevent retention service creating orphaned shard files
Under certain circumstances, the retention service can fail to delete shards from
the store in a timely manner. When the shard groups are pruned based on age, this
leaves orphaned shard files on the disk. The retention service will then not attempt
to remove the obsolete shard files because the meta store does not know about them.
This can cause excessive disk space usage for some users.
This corrects that by requiring shards files be deleted before they can be removed
from the meta store.
fixes: #24529
* chore: upgrade Go to 1.19.3
This re-runs ./generate.sh and ./checkfmt.sh to format and update
source code (this is primarily responsible for the huge diff.)
* fix: update tests to reflect sorting algorithm change
* feat: upgrade flux to 0.171.0
Tests failing, safety commit
First step in https://github.com/influxdata/influxdb/issues/23815
* fix: remove "org" parameter" from writeOptSource
I attempted to implement the "orgOpt" argument in a similar fashion
to f6669f7512. However, it looks like Flux doesn't accept "org" as
a parameter to "load". It responds with:
Error calling function \"load\" @113:16-113:30: error calling function \"to\" @6:19-6:47: unused arguments [org]
This brings us from 194 passing to 570 passing.
* fix: temporarily disable broken flux tests
These tests expect rows to be stored in a certain order. However,
nothing is specifying the sort order. This has been fixed in a
later update to flux: (see 3d6f47ded).
Temporarily disable these tests until we include a fixed
version of the flux tests.
* chore: add tests from a492993012
This fixes "test-flux.sh" so it runs tests within the "flux/"
directory. This uncovered some other issues with the tests
located within "flux/". These also needed to be updated
to match the newer flux API.
* feat: upgrade flux to 0.172.0
This includes changes made in "cbbf4b27da". Since "test.go" in 2.x
diverged from 1.x, some modifications were required to make this
compatible.
* feat: upgrade flux to 0.173.0
* feat: upgrade flux to v0.174.0
* fix: Update the condition when reseting cursor (#23522)
Filters that contain `or` may change between cursor resets so we must remember to update the condition in the read cursor.
```flux
|> filter(fn: (r) => ((r["_field"] == "field1" and r["_value"]==true) or (r["_field"] == "field2" and r["_value"] == false)))
```
Closes https://github.com/influxdata/flux/issues/4804
* feat: upgrade flux to 0.174.1
* feat: upgrade flux to 0.175.0
* chore: remove end-to-end tests
These were removed in a492993 for 2.x. These tests prevent "go test ./..."
from completing. As stated in the original commit, these tests should now be
handled by the "fluxtest" harness.
* feat: upgrade flux to 0.176.0
Some tests needed to be disabled within the flux harness. This is a
result of enabling "Optimize Aggregate Window" in flux@05a1065f.
These tests are not present in 2.x. Therefore, I am unsure if
the breakage is resolved in a later commit.
* feat: upgrade flux to 0.177.0
* feat: upgrade flux to 0.178.0
* feat: upgrade flux to v0.179.0
This removes all invocations of "flux.RegisterOpSpec". According
to flux@e39096d5, "flux.RegisterOpSpec" does nothing in the
current version of flux and was removed.
* chore: update fluxtest skip list (#23633)
* chore: manually backport 785a465e9a
This removes the reference to "flux.Spec".
* build(flux): update flux to v0.181.0 (#23682)
* build(flux): update flux to v0.184.2
* chore: skip more Flux acceptance tests
There are issues for each skip detailed in test-flux.sh.
* feat: upgrade flux to v0.185.0
This adds "FluxTesting" to the "HTTPD" configuration. This option is
hidden and disabled by default. When "FluxTesting" is set, it
enables the default testing flags for "Flux".
These flags allow the "vectorized float tests" and tests requiring
the "removeRedundantSortNodes" and "labelPolymorphism" flag
enabled to work. These changes are based off of d8553c002e.
flux@3d6f47ded is included within this version of Flux. Therefore
we can now include the "group_*" tests.
* feat: upgrade flux to 0.186.0
* feat: upgrade flux to 0.187.0
* feat: upgrade flux to 0.188.0
* fix: re-run ./generate.sh with updated protoc
* fix: restrict cores to match CircleCI documentation
Co-authored-by: davidby-influx <dbyrne@influxdata.com>
Co-authored-by: Markus Westerlind <marwes91@gmail.com>
Co-authored-by: Sean Brickley <sean@wabr.io>
Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>
Dump all active queries to the log when a SIGTERM
is received and the termination-query-log flag is
true in the coordinator section of the config. The
default is false.
* chore: harmonize OSS and Enterprise subscriber initialization
* refactor: chanwriter close blocks until chanwriter is flushed
* refactor: remove redundant state
* test: add test for blocked subscriber during subscription update
* fix: blocked subscriber does not fail subscription update
* fix: only iterate matching subscriptions
* fix: use serialized points in subscriber queues
* feat: add total-buffer-bytes config parameter to subscriptions
* fix: put subscription serialization on the write path
* fix: subscription service only needs regular mutex, not RWMutex
* fix: review comments
* chore: update changelog
* feat: make flux controller limits configurable
A sample of the new config:
```
[flux-controller]
query-concurrency = 0
query-initial-memory-bytes = 0
query-max-memory-bytes = 0
total-max-memory-bytes = 0
query-queue-size = 0
```
Also use the prometheus metrics in debug/vars, here is a sample:
```
"query_control_all_active": {"name":"query_control_all_active","tags":null,"values":{"gauge":0}},
"query_control_all_duration_seconds": {"name":"query_control_all_duration_seconds","tags":null,"values":{"0.001":0,"0.005":0,"0.025":0,"0.125":0,"0.625":0,"15.625":2,"3.125":2,"count":2,"sum":2.9953034240000003}},
"query_control_compiling_active": {"name":"query_control_compiling_active","tags":null,"values":{"gauge":0}},
"query_control_compiling_duration_seconds": {"name":"query_control_compiling_duration_seconds","tags":null,"values":{"0.001":2,"0.005":2,"0.025":2,"0.125":2,"0.625":2,"15.625":2,"3.125":2,"count":2,"sum":0.0010411650000000001}},
"query_control_executing_active": {"name":"query_control_executing_active","tags":null,"values":{"gauge":0}},
"query_control_executing_duration_seconds": {"name":"query_control_executing_duration_seconds","tags":null,"values":{"0.001":0,"0.005":0,"0.025":0,"0.125":0,"0.625":0,"15.625":2,"3.125":2,"count":2,"sum":2.994032791}},
"query_control_memory_unused_bytes": {"name":"query_control_memory_unused_bytes","tags":null,"values":{"gauge":0}},
"query_control_queueing_active": {"name":"query_control_queueing_active","tags":null,"values":{"gauge":0}},
"query_control_queueing_duration_seconds": {"name":"query_control_queueing_duration_seconds","tags":null,"values":{"0.001":2,"0.005":2,"0.025":2,"0.125":2,"0.625":2,"15.625":2,"3.125":2,"count":2,"sum":0.000087963}},
"query_control_requests_total": {"name":"query_control_requests_total","tags":null,"values":{"counter":1}},
"query_control_requests_total:1": {"name":"query_control_requests_total","tags":null,"values":{"counter":1}}
```
* chore: update changelog
* fix: shorten metric names for query control
* fix: zaptest logger and goimports
* fix: races in the query controller
Previously some tests were failing due to logging after the end of the test.
* chore: pull in controller from 2.x
* chore: fix up 2.x controller to work with 1.x
* feat: Default query limits in flux code
Partial fix of https://github.com/influxdata/influxdb/issues/17212
* chore: update changelog
* chore: refactor to remove panic and reformat code
* test: add script to run flux tests
* feat(flux): enable test capabilities in Flux controller
* feat(flux): add MergeFiltersRule
* build: bump existing Dockerfiles to go 1.15
* build: add flux tests to CI
* refactor: allow for overriding tcp.Mux logger
* build: upgrade to Flux v0.111.0
* chore: Update flux to 0.67
* chore: Builds against 0.68 flux
* chore: Builds against 0.80.0
* chore: Builds against 0.90.0
* chore: Everything builds on latest flux
* chore: goimports fixed
* chore: fix tests locally
* chore: fix CI dockerfiles
* chore: clean up some unused code
* chore: remove flux repl and Spec in flux query json
* chore: port flux end to end tests from 2.x
* chore: fix up goimports
* chore: remove 32 bit build support
Meta queries (SHOW TAG VALUES, SHOW TAG KEYS, SHOW SERIES CARDINALITY, etc.) do not respect
the QueryTimeout config parameter. Meta queries should check the query context when possible
to allow cancellation and timeout. This will not be as frequent as regular queries, which
use iterators, because meta queries return data in batches.
Add a context.Context to
(*Store).MeasurementNames()
(*Store).MeasurementsCardinality()
(*Store).SeriesCardinality()
(*Store).TagValues()
(*Store).TagKeys()
(*Store).SeriesSketches()
(*Store).MeasurementsSketches()
which is tested for timeout or cancellation
to allow limitation of time spent in meta queries
https://github.com/influxdata/influxdb/issues/20736
When a SELECT INTO query generates an illegal value that cannot be inserted,
like +/- Inf, it should return an error, rather than failing silently.
This adds a boolean parameter to the [data] section of influxdb.conf:
* strict-error-handling
When false, the default, the old behavior is preserved. When true,
unsupported values will return an error from SELECT INTO queries
Fixes https://github.com/influxdata/influxdb/issues/20426
refactor(influxdb): Refactor trace code for clarity and reliability
* Make startProfile a method of Server. startProfile() is only used one
place. We bother to copy the Server.CPUProfile and Server.MemProfile
values out of our Server struct into it's parameters. It makes more
sense to make startProfile() a method of Server and have it access
those members directory via its receiver value.
* Have startProfile() return an error instead of log.Fatal()ing. We can
simply propagate the error up the stack and let the caller handle the
error -- we shouldn't be exiting deep in the bowels of a non-main
package.
* Capture and return errors from pprof.StartCPUProfile(). Currently
there is only one possible error it can return but if it returns an
error, we should handle it.
* add CPUProfileWriteCloser and MemProfileWriteCloser to Server struct.
* make stopProfile() a method of Server
* remove prof variable
* fix captialization of log messages.
This PR lazily initializes Flux built-in functions when Flux is used.
It significantly reduces the startup time of the `influxd` and `influx`
binaries.
Before (4.66s):
```
↳ time bin/18/influx
bin/18/influx 4.66s user 0.19s system 198% cpu 2.441 total
```
After (10ms):
```
↳ time bin/18/influx
bin/18/influx 0.01s user 0.01s system 88% cpu 0.021 total
```
This integrates the influxdb 1.x series to the latest version of Flux
and updates the code to use it. It also removes the dependency on
platform and copies the necessary code from storage into the 1.x series
so the dependency is unneeded.
The flux functions specific to 1.x have been moved to the same structure
that flux changed to with having a `stdlib` directory instead of a
`functions` directory. It also adds a `databases()` function that
returns the databases from the meta client.
* Add AuthorizeDatabase API to QueryAuthorizer to verify a user has
appropriate access to the specified database
* Update serverFluxQuery handler to require a meta.User when auth is
enabled
* update Flux createFromSource and createBucketsSource dependencies to
require Authorizer when auth is enabled in configuration
* update createFromSource to verify read permissions for each bucket
specified in a Flux query
* update BucketsDecoder, which implements the buckets() Flux function,
to return buckets that the user has read or write permissions to
* add unit tests to verify authentication is required for Flux HTTP
requests when auth is enabled in configuration
This commit deletes most of the code to service reads from influxdb
and pulls it in from platform instead.
Of note, the models.Tag and models.Tags types are now aliases to the
platform models.Tag and models.Tags types. Additionally, many types
in the tsdb package relating to cursors are also aliases to the same
types in the platform cursors package.
This updates the platform and flux repos to the current master in the
Gopkg.lock.
* the protocol service definition, ReadRequest and ReadResponse is
reused across projects, rather than requiring redefinition.
* the ReadRequest protocol buffer definition removes the concept of a
database and retention policy, replacing it with a field named
ReadSource of type google.protobuf.Any. OSS requests will use the
ReadSource message structure defined in local to this package, which
defines fields to represent a Database and RetentionPolicy. Other
implementations can provide their own data structure allowing the
remainder of the ReadRequest to be reused.
* The RPC service and Store are expected to be redefined to handle their
specific requirements for resolving a ReadSource
* ResultSet and GroupResultSet are interfaces representing non-grouping
and grouping read behavior respectively. Calling NewResultSet or
NewGroupResultSet will construct instances of these types
* The ResponseWriter type is exported to deal with serialization of
the ResultSet and GroupResultSet types
* Update Prometheus remote write to use metric name as measurement name and value as the field name.
* Update Prometheus remote read to use the storage.Read method to bypass the InfluxQL query engine.
This commit adds `debug-pprof-enabled` which will start the default
`net/http/pprof` endpoint and bind against `localhost:6060`. This
will help to debug startup performance issues.
This code has been duplicated to other projects and its implementations
have grown out of sync. Now the code can live as a package-level
function rather than a method coupled with particular structs.
Remove the `Query` prefix from some structs and interfaces. They were
there so when the query engine was in the same package as influxql,
these would be differentiated. Now that the package name is query, the
extra prefix seems redundant.