Commit Graph

2814 Commits (db/6263/compaction-debug-logging)

Author SHA1 Message Date
Jakub Bednář dbbe4611c0
build(deps): upgrade google.golang.org/protobuf to v1.33.0 (master-1.x) (#24818) 2024-03-26 14:07:28 +01:00
davidby-influx fe6c64b21e
fix: return and respect cursor errors (#24791)
ArrayCursors were ignoring errors, which led to panics when nil
cursors were operated on. This fix passes errors back up the stack
and uses them to enforce healthy cursor creation.

Closes https://github.com/influxdata/influxdb/issues/24789
---------
Co-authored-by: Stuart Carnie <stuart.carnie@gmail.com>
2024-03-25 17:22:33 -07:00
davidby-influx 8ff06d5a92
fix: improved shard deletion (#24602)
Avoid unnecessarily deleting series from the series file
Try harder to delete series from InMem indices
Log all errors on shard deletion

Closes https://github.com/influxdata/influxdb/issues/24834
2024-03-25 17:15:31 -07:00
davidby-influx bc80e881fa
fix: do not panic when empty tags are queried (#24784)
Do not panic if a cursor array is nil and the number
of timestamps is retrieved.

closes https://github.com/influxdata/influxdb/issues/24536
2024-03-18 15:28:29 -07:00
Jack 6af0be9234
fix: panic index out of range for invalid series keys (#24565)
* chore: add scaffolding for naive solution

* feat: test case scaffolding

* fix: implement check for series key before proceeding

* fix: add validation for ReadSeriesKeyMeasurement usage

* refactor: explicit use of series key len

* feat: add remaining check to index

* feat: add check to remaining files

As the Len function is used as part of the parseSeriesKey, this also needs to be accounted for on the nil return from this function as it is used in different contexts

* feat: expand test cases

* chore: go fmt

* chore: update test failure message

* chore: impl feedback on unnecessary sz checks

* feat: expand test cases

* fix: nil series key check

Both sections of index.go already contain a length check against the series key, which should catch invalid values. This may explain why the problem has not appeared in the reported panics. For extra safety, we also skip a nil key, because any subsequent call that tries to use it would panic.

* fix: remove nil tags check

A key with no tags is valid, so we should not check for BOTH nil key and tags as a key could be nil, which is invalid, yet still have tags and therefore cause the check to pass which we do not want

* feat: extend test cases from feedback

* fix: extend checks for CompareSeriesKeys

* feat: add nilKeyHandler for shared key checking logic

* fix: logical error in nilKeyHandler

Previously, execution always fell through to the else at the end of the conditional branch, which caused unexpected behaviour and a number of test failures.

* fix: return tags keep nil data

In a recent change to this, we agreed on a simple name == nil check for the actual data. As a follow on to this, I just realised that we don't actually want to nil back the tags, even if they're not checked, because having no tags is a valid input so we can simply return whatever we were passed unchanged.

* fix: use len == 0 for extra safety

* feat: extra test for blank series key
2024-01-23 09:44:29 +00:00
davidby-influx c05b340b72
chore: upgrade flux (#24504)
* chore: upgrade flux

* chore: execute "go generate" inside cross-builder (#24582)

---------

Co-authored-by: Brandon Pfeifer <bpfeifer@influxdata.com>
2024-01-19 17:40:48 -05:00
davidby-influx 969abf3da2
fix: avoid SIGBUS when reading non-std series segment files (#24509)
Some series files which are smaller than the standard
sizes cause SIGBUS in influx_inspect and influxd, because
entry iteration walks onto mapped memory not backed by
the file.  Avoid walking off the end of the file while
iterating series entries in oddly sized files.

closes https://github.com/influxdata/influxdb/issues/24508

Co-authored-by: Geoffrey Wossum <gwossum@influxdata.com>
2023-12-08 15:46:11 -08:00
davidby-influx 2dc3dcb3d1
fix: do not escape CSV output (#24311)
CSV output is incorrectly escaped.
Add a boolean flag to tag output
functions to prevent this.

closes https://github.com/influxdata/influxdb/issues/24309
2023-06-29 12:00:41 -07:00
davidby-influx 53856cdaae
fix: series file index compaction (#23916)
Series file indices monotonically grew even
when series were deleted.  Also stop 
ignoring error in series index recovery

Partially closes https://github.com/influxdata/EAR/issues/3643
2023-06-01 10:49:23 -07:00
davidby-influx aad79e471f
fix: prevent world-writable MANIFEST files (#24235)
When a new MANIFEST file is created, set
its permissions to 644, not 666

closes https://github.com/influxdata/influxdb/issues/24233
2023-05-18 12:07:34 -07:00
Brandon Pfeifer e484c4d871
chore: upgrade Go to v1.19.3 (1.x) (#23941)
* chore: upgrade Go to 1.19.3

This re-runs ./generate.sh and ./checkfmt.sh to format and update
source code (this is primarily responsible for the huge diff.)

* fix: update tests to reflect sorting algorithm change
2022-11-28 12:15:47 -05:00
davidby-influx fd7e4aa0f7
chore: fix trace message text (#23917) 2022-11-16 08:40:10 -05:00
Brandon Pfeifer 5976e41d54
feat: upgrade flux to v0.188.0 (#23911)
* feat: upgrade flux to 0.171.0

Tests failing, safety commit

First step in https://github.com/influxdata/influxdb/issues/23815

* fix: remove "org" parameter from writeOptSource

I attempted to implement the "orgOpt" argument in a similar fashion
to f6669f7512. However, it looks like Flux doesn't accept "org" as
a parameter to "load". It responds with:

Error calling function \"load\" @113:16-113:30: error calling function \"to\" @6:19-6:47: unused arguments [org]

This brings us from 194 passing to 570 passing.

* fix: temporarily disable broken flux tests

These tests expect rows to be stored in a certain order. However,
nothing is specifying the sort order. This has been fixed in a
later update to flux: (see 3d6f47ded).

Temporarily disable these tests until we include a fixed
version of the flux tests.

* chore: add tests from a492993012

This fixes "test-flux.sh" so it runs tests within the "flux/"
directory. This uncovered some other issues with the tests
located within "flux/". These also needed to be updated
to match the newer flux API.

* feat: upgrade flux to 0.172.0

This includes changes made in "cbbf4b27da". Since "test.go" in 2.x
diverged from 1.x, some modifications were required to make this
compatible.

* feat: upgrade flux to 0.173.0

* feat: upgrade flux to v0.174.0

* fix: Update the condition when resetting cursor (#23522)

Filters that contain `or` may change between cursor resets so we must remember to update the condition in the read cursor.

```flux
|> filter(fn: (r) => ((r["_field"] == "field1" and r["_value"]==true) or (r["_field"] == "field2" and r["_value"] == false)))
```

Closes https://github.com/influxdata/flux/issues/4804

* feat: upgrade flux to 0.174.1

* feat: upgrade flux to 0.175.0

* chore: remove end-to-end tests

These were removed in a492993 for 2.x. These tests prevent "go test ./..."
from completing. As stated in the original commit, these tests should now be
handled by the "fluxtest" harness.

* feat: upgrade flux to 0.176.0

Some tests needed to be disabled within the flux harness. This is a
result of enabling "Optimize Aggregate Window" in flux@05a1065f.
These tests are not present in 2.x. Therefore, I am unsure if
the breakage is resolved in a later commit.

* feat: upgrade flux to 0.177.0

* feat: upgrade flux to 0.178.0

* feat: upgrade flux to v0.179.0

This removes all invocations of "flux.RegisterOpSpec". According
to flux@e39096d5, "flux.RegisterOpSpec" does nothing in the
current version of flux and was removed.

* chore: update fluxtest skip list (#23633)

* chore: manually backport 785a465e9a

This removes the reference to "flux.Spec".

* build(flux): update flux to v0.181.0 (#23682)

* build(flux): update flux to v0.184.2

* chore: skip more Flux acceptance tests

There are issues for each skip detailed in test-flux.sh.

* feat: upgrade flux to v0.185.0

This adds "FluxTesting" to the "HTTPD" configuration. This option is
hidden and disabled by default. When "FluxTesting" is set, it
enables the default testing flags for "Flux".

These flags allow the "vectorized float tests" and tests requiring
the "removeRedundantSortNodes" and "labelPolymorphism" flag
enabled to work. These changes are based off of d8553c002e.

flux@3d6f47ded is included within this version of Flux. Therefore
we can now include the "group_*" tests.

* feat: upgrade flux to 0.186.0

* feat: upgrade flux to 0.187.0

* feat: upgrade flux to 0.188.0

* fix: re-run ./generate.sh with updated protoc

* fix: restrict cores to match CircleCI documentation

Co-authored-by: davidby-influx <dbyrne@influxdata.com>
Co-authored-by: Markus Westerlind <marwes91@gmail.com>
Co-authored-by: Sean Brickley <sean@wabr.io>
Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
Co-authored-by: Christopher M. Wolff <chris.wolff@influxdata.com>
2022-11-15 15:20:27 -05:00
Sam Arnold 9e9f1be574
fix: remove dead iterator (#23888) 2022-11-09 16:24:01 -05:00
davidby-influx cc26b7653c
fix: remove breaking argument validation for _fieldKeys iterator (#23875)
New argument validation code for _fieldKeys system iterator 
broke Enterprise tests because it is misused all over the 
place. Back out the safety check.
2022-11-09 09:04:44 -08:00
davidby-influx f5da0f50f4
fix: Optimize SHOW FIELD KEY CARDINALITY (#23871)
Use the _fieldKeys system iterator

closes https://github.com/influxdata/influxdb/issues/23840
2022-11-08 08:32:10 -08:00
davidby-influx b17f27a5d9
fix: incorrect error message concatenation (#23729) 2022-09-15 09:26:51 -07:00
davidby-influx 80c10c8c04
feat: optimize saving changes to fields.idx (#23701)
Instead of writing out the complete fields.idx
file when it changes, write out incremental
changes that will be applied to the file on
close and startup.

closes https://github.com/influxdata/influxdb/issues/23653
2022-09-14 13:14:09 -07:00
davidby-influx 84c4f676b0
feat: add type conflict checker to influx_inspect (#23616)
Adds two commands, "check-schema" and
"merge-schema", to influx_inspect.
"check-schema" tests for field type conflicts
in all fields.idx files beneath a directory.
"merge-schema" merges the derived schemas when
"check-schema" has been run multiple
times on different directories.
2022-08-10 09:36:58 -07:00
davidby-influx eb3cc88772
fix: generalize test for Windows (#23580)
Also eliminate race condition in tests

(cherry picked from commit 7e37a7ad16)
2022-07-21 13:28:10 -07:00
davidby-influx a8732dcf52
fix: restore in-memory Manifest on write error (#23552)
Do not update the `FileSet` or `activeLogFile` field in the in-memory
Partition structure if the Manifest file is not correctly saved to
the disk.

closes https://github.com/influxdata/influxdb/issues/23553
2022-07-20 12:59:15 -07:00
davidby-influx 25cea95beb
fix: add paths to tsi log and index file errors (#23557)
Add paths to various TSI errors on opening and unmarshaling files
to help pinpoint the corrupt files.

Closes https://github.com/influxdata/influxdb/issues/23556
2022-07-19 09:02:20 -07:00
davidby-influx 061cf55f2a
fix: create TSI MANIFEST files atomically (#23539)
When a MANIFEST file is created in TSI, it
should be written to a temp file, then
atomically renamed, to avoid overwriting
the existing file only to fail on the
later write.

closes https://github.com/influxdata/influxdb/issues/23536
2022-07-13 10:11:49 -07:00
davidby-influx a2dd708a26
fix: improve error messages opening index partitions (#23532)
Where possible, add the file path to any errors
on opening, reading, (un)marshaling, or validating
the various files comprising a partition

closes https://github.com/influxdata/influxdb/issues/23506
2022-07-12 14:22:36 -07:00
davidby-influx a428043f84
fix: lost TSI reference / close TagValueSeriesIDIterator in error case (#23461) (#23462)
(cherry picked from commit 8bd4fc502d)

closes https://github.com/influxdata/influxdb/issues/23460

Co-authored-by: Dane Strandboge <dstrandboge@influxdata.com>
2022-06-16 11:54:04 -07:00
davidby-influx 54ac7e54ed
fix: remember shards that fail Open(), avoid repeated attempts (#23437)
If a shard cannot be opened, store its ID and last error.
Prevent future attempts to open during this invocation of
influxDB. This information is not persisted.

closes https://github.com/influxdata/influxdb/issues/23428
closes https://github.com/influxdata/influxdb/issues/23426
2022-06-13 10:32:47 -07:00
davidby-influx d3db48e93d
fix: fully clean up partially opened TSI (#23430)
When one partition in a TSI fails to open, all previously opened
partitions should be cleaned up, and remaining partitions 
should not be opened

closes https://github.com/influxdata/influxdb/issues/23427
2022-06-10 11:31:29 -07:00
davidby-influx ec412f793b
fix: do not rename files on mmap failure (#23396)
If NewTSMReader() fails because mmap fails, do not
rename the file, because the error is probably
caused by vm.max_map_count being too low

closes https://github.com/influxdata/influxdb/issues/23172
2022-06-07 08:37:00 -07:00
davidby-influx 0ae0bd6e2e
fix: replace unprintable and invalid characters in errors (#23387)
Replace unprintable and invalid characters with '?'
in logged errors.  Truncate consecutive runs of them to
only 3 repeats of '?'

closes https://github.com/influxdata/influxdb/issues/23386
2022-06-01 13:45:24 -07:00
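The replacement rule in 0ae0bd6e2e can be sketched as follows. This is an illustration rather than the code from the commit: `sanitize` is a hypothetical name, and the sketch assumes "unprintable" means whatever `unicode.IsPrint` rejects and "invalid" means bytes that are not valid UTF-8.

```go
package main

import (
	"strings"
	"unicode"
	"unicode/utf8"
)

// sanitize is a hypothetical helper. It replaces each unprintable rune or
// invalid UTF-8 byte with '?', and emits at most three '?' for any
// consecutive run of such characters.
func sanitize(s string) string {
	var b strings.Builder
	run := 0 // length of the current run of replaced characters
	for i := 0; i < len(s); {
		r, size := utf8.DecodeRuneInString(s[i:])
		i += size
		invalid := r == utf8.RuneError && size == 1
		if !invalid && unicode.IsPrint(r) {
			run = 0
			b.WriteRune(r)
			continue
		}
		if run < 3 {
			b.WriteByte('?')
		}
		run++
	}
	return b.String()
}
```

With this sketch, `sanitize("ok\x00\x01\x02\x03\x04!")` yields `ok???!`, and a lone invalid byte such as `\xff` becomes a single `?`.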
Geoffrey Wossum 160cf678d5
fix: MeasurementsCardinality should not be less than 0 (#23286)
Clamp the value of Store.MeasurementsCardinality so that it can not be less
than 0. This primarily shows up as a negative `numMeasurements` value in
/debug/vars under some circumstances.

refs #23285
2022-04-21 13:32:12 -05:00
Dane Strandboge 0574163566
build: upgrade to go1.18 (#23250) 2022-03-31 16:17:57 -05:00
davidby-influx 7d182158f4
fix: add database to MaxSeriesPerDatabase error message (#23113)
To simplify debugging, print the database name when the
max-series-per-database limit is exceeded in InMem indices.

closes https://github.com/influxdata/influxdb/issues/23112
2022-02-08 11:52:14 -08:00
davidby-influx f27df39c03
fix: add additional testing for MaxSeriesPerDatabase (#23094)
Added test to ensure new code path taken for inmem index
2022-02-02 13:16:09 -08:00
davidby-influx 0c3dca883e
fix: correctly handle MaxSeriesPerDatabaseExceeded (#23091)
Check for the correctly returned PartialWriteError
in (*shard).validateSeriesAndFields, allow partial
writes.

closes https://github.com/influxdata/influxdb/issues/23090
2022-02-01 19:08:51 -08:00
davidby-influx eb3bc7069f
feat: configurable DELETE concurrency (#23055)
Currently, deletion of series or measurements is
serialized. This new feature will add
max-concurrent-deletes to the [data] section of the
configuration file. Legal values are any positive
number, defaulting to 1, the current behavior.

closes https://github.com/influxdata/influxdb/issues/23054
2022-01-13 11:04:57 -08:00
lifeibo 5be1c044c3
fix(tsi): sync index file before close (#21932) 2021-11-24 08:36:03 -05:00
Geoffrey Wossum 91609fdd3f
fix(restore): fix race condition which causes restore command to fail (#22796)
* fix(restore): fix race condition which causes restore command to fail

Fixes a race condition in the restore code path that causes shard data restores
to fail. When the bug occurs, `Error while freeing cold shard resources`
appears in the log files.

fixes issue #15323
2021-11-03 14:21:33 -05:00
davidby-influx af9e89a4d4
fix: detect misquoted tag values and return an error (#22754)
SHOW TAG KEYS FROM "foo" where bar="misquoted" is
erroneous, because the tag value must be enclosed
in single, not double, quotes. Although this
correctly returns no tag keys, it is very
inefficient and has caused out-of-memory failures
at a customer. This fix short-circuits the query.

closes https://github.com/influxdata/influxdb/issues/22755
2021-10-27 11:26:20 -07:00
davidby-influx d9b9e86db9
fix: extend snapshot copy to filesystems that cannot link (#22703)
If os.Link fails with syscall.ENOTSUP, then the file
system does not support links, and we must make copies
to snapshot files for backup. We also automatically make
copies instead of link on Windows, because although it
makes links, their semantics are different from Linux.

closes https://github.com/influxdata/influxdb/issues/16739
2021-10-21 12:53:26 -07:00
Dane Strandboge 06d1df22a2
chore: fix deadlock in `influx_inspect dumptsi` (#22661) 2021-10-20 12:48:59 -05:00
Dane Strandboge 8b38d0e2bf
build: upgrade protobuf library (#22606) 2021-10-15 11:42:47 -05:00
Sam Arnold 59fe8e515e
test: fix DiskSizeBytes flakiness (#22641) 2021-10-08 09:47:12 -04:00
Sam Arnold 611a4370a2
feat: show measurements database and retention policy wildcards (#22388)
* feat: show measurements database and retention policy wildcards

Closes #3318

* chore: run formatter
2021-10-05 09:07:25 -04:00
Dane Strandboge b4e781eff6
fix(tsdb): sync series segment to disk after writing (#22566) 2021-09-23 14:10:29 -05:00
davidby-influx 3702fe8e76
fix: for Windows, copy snapshot files being backed up (#22551)
On Windows, make copies of files for snapshots, because
Go does not support the FILE_SHARE_DELETE flag which
allows files (and links) to be deleted while open. This
causes temporary directories to be left behind after
backups.

closes https://github.com/influxdata/influxdb/issues/16289
2021-09-22 10:56:17 -07:00
davidby-influx e53f75e06d
fix: discard excessive errors (#22379)
The tsmBatchKeyIterator discards excessive errors to avoid
out-of-memory crashes when compacting very corrupt files.
Any error beyond DefaultMaxSavedErrors (100) will be
discarded instead of appended to the error slice.

closes https://github.com/influxdata/influxdb/issues/22328
2021-09-03 09:11:05 -07:00
Sam Arnold 38de69cc1c
fix: flux error properly read by cloud (#22348) 2021-08-31 17:43:12 -04:00
davidby-influx 926020e331
fix: correct error return shadowing (#22353) 2021-08-31 11:46:21 -07:00
Sam Arnold 1755b8f6d2
fix: TSI logfile race (#22338)
modTime should be protected by the read lock.

Fixes #22337
2021-08-30 17:43:37 -04:00
Tristan Su e5f6894037
fix(tsm): check write-ahead-log size (#18991) 2021-08-24 11:44:01 -04:00
davidby-influx 7d3efe1e9e
fix: avoid compaction queue stats flutter. (#22195)
When the compaction planner runs, if it cannot acquire
a lock on the files it plans to compact, it returns a
nil list of compaction groups. This, in turn, sets the
engine statistics for compactions queues to zero,
which is incorrect. Instead, use the length of pending
files which would have been returned.

closes https://github.com/influxdata/influxdb/issues/22138
2021-08-16 09:21:07 -07:00
Sam Arnold fd81373937
test: expose tcpaddr for enterprise tests (#22172)
* docs: update comment for series updates

* fix: expose TCP address for Enterprise test harness

* refactor: remove dead RemoteServer code
2021-08-11 17:19:26 -04:00
Sam Arnold 3ae389b359
test: add extra logging when disk size test fails (#22103) 2021-08-07 06:48:42 -04:00
Sam Arnold 444c22b67d
test: fix order of index teardown (#22038) 2021-08-04 16:34:51 -04:00
davidby-influx a989f8f8b6
fix: copy names from mmapped memory before closing iterator (#22040)
This fix ensures that memory-mapped files are not released
before pointers into them are copied into heap memory.
MeasurementNamesByExpr() and MeasurementNamesByPredicate() can
cause panics by copying memory from mmapped files that have been
released. The functions they call use iterators to files which
are closed (releasing the mmapped files) before the memory is
safely copied to the heap.

closes https://github.com/influxdata/influxdb/issues/22000
2021-08-04 13:16:00 -07:00
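The hazard fixed in a989f8f8b6 is that byte slices pointing into a memory-mapped file become invalid once the mapping is released. A minimal sketch of the remedy, copying each name to the heap first, is shown below. `copyNames` is a hypothetical name, and an ordinary byte slice stands in for the mapped region.

```go
package main

// copyNames is a hypothetical helper. It returns deep copies of names so
// the result no longer aliases the memory the originals point into. It
// must be called before the iterator or mapping that owns names is closed.
func copyNames(names [][]byte) [][]byte {
	out := make([][]byte, len(names))
	for i, n := range names {
		out[i] = append([]byte(nil), n...) // fresh heap allocation per name
	}
	return out
}
```

After the copy, overwriting or unmapping the source buffer has no effect on the returned slices.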
Sam Arnold e62efaf751
fix: old tsl files should be compacted without new writes (#22006)
* fix: old tsl files should be compacted without new writes

* chore: update changelog.md
2021-08-02 13:36:23 -04:00
Sam Arnold b64c2c3dcf
fix: tsi index should compact old or too-large log files (#21943)
* fix: tsi index should compact old log files that are too large

* chore: run automated formatter

* chore: update changelog

* fix: review comments
2021-07-26 17:40:15 -04:00
Sam Arnold 23c3d35aab
chore: update protobuf library versions and remove influx_tsm (#21882)
* chore: update protobufs

* fix: run codegen during build

* fix: fully remove influx_tsm
2021-07-20 09:42:52 -04:00
Sam Arnold 6d22e69ef1
fix: hard limit on field size while parsing line protocol (#21843)
Per https://docs.influxdata.com/enterprise_influxdb/v1.9/write_protocols/line_protocol_reference/
we only support 64KB, but 1MB is a more realistic practical limit. Before this commit there was
no enforcement of field value size.

Closes #21841
2021-07-14 17:11:09 -04:00
Tristan Su 108e2600b3
fix(tsi): clean up FileSet fields (#18961) 2021-07-12 10:42:38 -04:00
davidby-influx 73bdb2860e
chore: add logging to compaction (#21707)
Compaction logging will generate intermediate information on 
volume of data written and output files created, as well as 
improve some of the anti-entropy messages related to compaction.

This will also apply to `influx_tools compact`

Closes https://github.com/influxdata/influxdb/issues/21704
2021-06-16 15:28:44 -07:00
davidby-influx aca69e530f
fix: don't access a field in a nil struct (#21693) 2021-06-15 10:23:38 -07:00
davidby-influx bce6553459
fix: Do not close connection twice in DigestWithOptions (#21659)
tsm1.DigestWithOptions closes its network connection
twice. This may cause broken pipe errors on concurrent
invocations of the same procedure, by closing a reused
i/o descriptor. This fix also captures errors from TSM
file closures, which were previously ignored.

Closes https://github.com/influxdata/influxdb/issues/21656
2021-06-10 12:41:42 -07:00
davidby-influx f8202876ad
chore: minor refactor suggested by go lint (#21614)
(cherry picked from commit 7d10228e19)
2021-06-04 14:07:00 -07:00
davidby-influx f64be286be
fix: avoid rewriting fields.idx unnecessarily (#21592)
Under heavy write load creating new fields and measurements
the rewrite of the fields.idx file is a bottleneck. This
enhancement combines multiple writes into a single one and
shares any error return value with all of the combined
invocations. MeasurementFieldSet and the new 
MeasurementFieldSetWriter must both now be explicitly
closed.

Closes #21577
2021-06-04 09:21:33 -07:00
davidby-influx c8da9bafbf
chore(ae): add more logging (#21381) (#21452)
tsdb.Engine.IsIdle and tsdb.Engine.Digest now return a reason string for why the engine & shard are not idle.
Callers can then use this string for logging, if desired. The returned reason does not allocate memory, so the
caller may want to add the shard ID and path for more information in the log. This is intended to be used in
calls from the anti-entropy service in Enterprise.

(cherry picked from commit bf45841359)

fixes https://github.com/influxdata/influxdb/issues/21448
2021-05-11 09:46:45 -07:00
Sam Arnold 8edf7a4e2f
fix(storage): cursor requests are [start, stop] instead of [start, stop) (#21347)
* fix: backport tsdb fix for window pushdowns

From https://github.com/influxdata/influxdb/pull/19855

* fix(storage): cursor requests are [start, stop] instead of [start, stop)

The cursors were previously [start, stop) to be consistent with how flux
requests data, but the underlying storage file store was [start, stop]
because that's how influxql read data. This reverts back the cursor
behavior so that it is now [start, stop] everywhere and the conversion
from [start, stop) to [start, stop] is performed when doing the cursor
request to get the next cursor.

cherry-pick from #21318

Co-authored-by: Sam Arnold <sarnold@influxdata.com>
(cherry picked from commit 7766672797)

* chore: fix formatting

Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
2021-04-30 15:26:31 -04:00
Sam Arnold 32aa970eba
feat: mean,count aggregation for WindowAggregate pushdown in enterprise (#21291)
We support only one aggregate list [mean,count]. All other aggregates
still must be single-element lists.
2021-04-29 14:30:13 -04:00
davidby-influx 7f300dc248
fix: Anti-Entropy loops endlessly with empty shard (#21275)
The anti-entropy service will loop trying to copy an empty shard to a
data node missing that shard.  This fix is one of two changes that
correctly create an empty shard on a new node. This fix will set the
LastModified date of an empty shard directory to the modification time
of that directory, instead of to the Unix epoch.

Fixes: https://github.com/influxdata/influxdb/issues/21273
2021-04-23 09:06:03 -07:00
Daniel Moran 333cff1b15
fix(tsdb): exclude the stop time from the array cursor (#21139)
This is a backport of #14262 to the 1.x storage engine.

This also ports the table tests that existed with the pre-beta version of the
storage engine to the one that is now used in the production version.

A few of the tests are skipped. These are portions of the storage engine
that have not been ported over. They should be unskipped when that
functionality is ported over.


Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
2021-04-06 14:50:07 -04:00
Daniel Moran 31d4d742e8
refactor: rearrange flux-related storage code to match 2.x (#21114)
And fix CircleCI config
2021-04-01 14:25:48 -04:00
Daniel Moran a2154f143c
feat(storage): add support for window aggregate queries (#21107)
* feat: add cursors and readers for window aggregates
* fix: backport fix + tests for race condition in flux tag cache
* test: port 2.x test for array_cursor
2021-03-31 13:51:37 -04:00
Sam Arnold b7e7de24d6
refactor: separate coarse and fine permission interfaces (#20996) 2021-03-22 09:52:33 -04:00
Sam Arnold 04f4817aae
fix(services/storage): multi measurement queries return all applicable series (#19592) (#20934)
This fixes multi measurement queries that go through the storage service
to correctly pick up all series that apply with the filter. Previously,
negative queries such as `!=`, `!~`, and predicates attempting to match
empty tags did not work correctly with the storage service when multiple
measurements or `OR` conditions were included.

This was because these predicates would be categorized as "multiple
measurements" and then it would attempt to use the field keys iterator
to find the fields for each measurement. The meta queries for these did
not correctly account for negative equality operators or empty tags when
finding appropriate measurements and those could not be changed because
it would cause a breaking change to influxql too.

This modifies the storage service to use new methods that correctly
account for the above situations rather than the field keys iterator.

Some queries that appeared to be single measurement queries also get
considered as multiple measurement queries. Any query with an `OR`
condition will be considered a multiple measurement query.

This bug did not apply to single measurement queries where one
measurement was selected and all of the logical operators were `AND`
values. This is because it used a different code path that correctly
handled these situations.

Backport of #19566.

(cherry picked from commit ceead88bd5)

Co-authored-by: Jonathan A. Sternberg <jonathan@influxdata.com>
2021-03-12 16:34:14 -05:00
Daniel Moran 3eb4fdaf33
fix(tsm1): fix data race when accessing tombstone stats (#20903) 2021-03-09 15:20:40 -05:00
Sam Arnold de491dab97
refactor: Remove unused function add and unused variable keysHint (#20803) (#20805)
(cherry picked from commit 1068d1de6f)
2021-02-25 15:15:32 -05:00
Sam Arnold 17b9ea8723
feat: Add WITH KEY to show tag keys (#20793)
* fix: Change from RewriteExpr to PartitionExpr

Also remove some dead code

* feat: WITH KEY implementation

* feat: query rewriting for WITH KEY in SHOW TAG KEYS
2021-02-25 08:38:29 -05:00
Daniel Moran e85a10248d
fix(tsm1): fix data race and validation in cache ring (#20802)
Co-authored-by: Yun Zhao <zhaoyun2316@gmail.com>

2021-02-24 17:17:08 -05:00
davidby-influx 092c7a9976
feat: Make meta queries respect QueryTimeout values
Meta queries (SHOW TAG VALUES, SHOW TAG KEYS, SHOW SERIES CARDINALITY, etc.) do not respect
the QueryTimeout config parameter. Meta queries should check the query context when possible
to allow cancellation and timeout. This will not be as frequent as regular queries, which
use iterators, because meta queries return data in batches.

Add a context.Context to
(*Store).MeasurementNames()
(*Store).MeasurementsCardinality()
(*Store).SeriesCardinality()
(*Store).TagValues()
(*Store).TagKeys()
(*Store).SeriesSketches()
(*Store).MeasurementsSketches()
which is tested for timeout or cancellation
to allow limitation of time spent in meta queries

https://github.com/influxdata/influxdb/issues/20736
2021-02-23 12:52:44 -08:00
Sam Arnold de1a0eb2a9
feat: use count_hll for 'show series cardinality' queries (#20745)
Closes: https://github.com/influxdata/influxdb/issues/20614

Also fix nil pointer for seriesKey iterator

Fix for bug in: https://github.com/influxdata/influxdb/issues/20543

Also add a test for ingress metrics
2021-02-10 16:00:16 -05:00
Sam Arnold 903b8cd0ea
feat(query): Hyper log log operators in influxql (#20603)
* feat(query): hyper log log counting in query engine

In addition to helping with normal queries, this can improve the 'SHOW CARDINALITY'
meta-queries:

time influx -database mydb -execute 'select count_hll(sum_hll(_seriesKey)) from big'
name: big
time count_hll
---- ---------
0    200767781
influx -database mydb -execute   0.06s user 0.12s system 0% cpu 8:49.99 total
2021-02-08 08:38:14 -05:00
Sam Arnold 21823db00b
feat: series creation ingress metrics (#20700)
After turning this on and testing locally, note the 'seriesCreated' metric

"localStore": {"name":"localStore","tags":null,"values":{"pointsWritten":2987,"seriesCreated":58,"valuesWritten":23754}},
"ingress": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"cq","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":4}},
"ingress:1": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"database","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":4}},
"ingress:2": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"httpd","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":46}},
"ingress:3": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"ingress","rp":"monitor"},"values":{"pointsWritten":14,"seriesCreated":14,"valuesWritten":42}},
"ingress:4": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"localStore","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":6}},
"ingress:5": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"queryExecutor","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":10}},
"ingress:6": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"runtime","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":30}},
"ingress:7": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"shard","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":22}},
"ingress:8": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"subscriber","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":6}},
"ingress:9": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_cache","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":18}},
"ingress:10": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_engine","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":58}},
"ingress:11": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_filestore","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":4}},
"ingress:12": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_wal","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":8}},
"ingress:13": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"write","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":18}},
"ingress:14": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"cpu","rp":"autogen"},"values":{"pointsWritten":1342,"seriesCreated":13,"valuesWritten":13420}},
"ingress:15": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"disk","rp":"autogen"},"values":{"pointsWritten":642,"seriesCreated":6,"valuesWritten":4494}},
"ingress:16": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"diskio","rp":"autogen"},"values":{"pointsWritten":214,"seriesCreated":2,"valuesWritten":2354}},
"ingress:17": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"mem","rp":"autogen"},"values":{"pointsWritten":107,"seriesCreated":1,"valuesWritten":963}},
"ingress:18": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"processes","rp":"autogen"},"values":{"pointsWritten":107,"seriesCreated":1,"valuesWritten":856}},
"ingress:19": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"swap","rp":"autogen"},"values":{"pointsWritten":214,"seriesCreated":1,"valuesWritten":642}},
"ingress:20": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"system","rp":"autogen"},"values":{"pointsWritten":321,"seriesCreated":1,"valuesWritten":749}},

Closes: https://github.com/influxdata/influxdb/issues/20613
2021-02-05 14:52:43 -04:00
Sam Arnold dd3baf6d4a
feat: measurement metrics by login (#20687)
After turning on authentication and both forms of ingress metrics:

"ingress": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"cq","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":76}},
"ingress:1": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"database","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":152}},
"ingress:2": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"httpd","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":874}},
"ingress:3": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"ingress","rp":"monitor"},"values":{"pointsWritten":534,"valuesWritten":1068}},
"ingress:4": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"localStore","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":76}},
"ingress:5": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"queryExecutor","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":190}},
"ingress:6": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"runtime","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":570}},
"ingress:7": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"shard","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":836}},
"ingress:8": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"subscriber","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":114}},
"ingress:9": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_cache","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":684}},
"ingress:10": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_engine","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":2204}},
"ingress:11": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_filestore","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":152}},
"ingress:12": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_wal","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":304}},
"ingress:13": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"write","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":342}},
"ingress:14": {"name":"ingress","tags":{"db":"telegraf","login":"admin","measurement":"cpu","rp":"autogen"},"values":{"pointsWritten":1,"valuesWritten":1}},
"ingress:15": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"cpu","rp":"autogen"},"values":{"pointsWritten":1316,"valuesWritten":13160}},
"ingress:16": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"disk","rp":"autogen"},"values":{"pointsWritten":642,"valuesWritten":4494}},
"ingress:17": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"diskio","rp":"autogen"},"values":{"pointsWritten":214,"valuesWritten":2354}},
"ingress:18": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"mem","rp":"autogen"},"values":{"pointsWritten":107,"valuesWritten":963}},
"ingress:19": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"processes","rp":"autogen"},"values":{"pointsWritten":107,"valuesWritten":856}},
"ingress:20": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"swap","rp":"autogen"},"values":{"pointsWritten":214,"valuesWritten":642}},
"ingress:21": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"system","rp":"autogen"},"values":{"pointsWritten":321,"valuesWritten":749}},

Only by login:

"ingress": {"name":"ingress","tags":{"login":"_systemuser_monitor"},"values":{"pointsWritten":42,"valuesWritten":354}},
"ingress:1": {"name":"ingress","tags":{"login":"admin"},"values":{"pointsWritten":1,"valuesWritten":1}},
"ingress:2": {"name":"ingress","tags":{"login":"telegraf"},"values":{"pointsWritten":3547,"valuesWritten":28246}},

Notice writes by users 'telegraf', '_systemuser_monitor', and 'admin'.
2021-02-04 11:52:53 -05:00
Sam Arnold b3e763d96f
fix: consistent error for missing shard (#20694) 2021-02-04 09:49:14 -05:00
Sam Arnold eb92c997cd feat: Ingress metrics by measurement
Partial implementation of https://github.com/influxdata/influxdb/issues/20612

Implements per-measurement points written metric. Next step: Also support per-login.
2021-02-02 15:58:28 -05:00
Sam Arnold 117341fb0f fix: Move value metric down to tsdb store
Previously we tracked values on the http ingress, but the tsdb store is the correct
place to track total values written for the instance.
2021-02-02 10:58:47 -05:00
Sam Arnold 6795ec6c01 refactor: do not use context value anti-pattern
Extending the context instead of fixing the API breaks type safety.
For tracking the number of points / values written, it is much clearer
to pass an explicit tracker.
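
As a minimal sketch of the idea above, assuming a hypothetical WriteTracker type and writePoints function (neither name is taken from this commit), an explicit tracker parameter might look like this:

```go
package main

import "sync/atomic"

// WriteTracker is a hypothetical type used only for illustration; it is
// not the tracker type introduced by this commit.
type WriteTracker struct {
	points int64
	values int64
}

// Add records one batch of written points and values.
func (t *WriteTracker) Add(points, values int64) {
	atomic.AddInt64(&t.points, points)
	atomic.AddInt64(&t.values, values)
}

// Totals returns the running counts.
func (t *WriteTracker) Totals() (points, values int64) {
	return atomic.LoadInt64(&t.points), atomic.LoadInt64(&t.values)
}

// writePoints (hypothetical) takes the tracker as a typed parameter, so the
// compiler checks it, unlike a ctx.Value(key) lookup that needs a type
// assertion at run time. Each element of fieldsPerPoint is the number of
// field values in one point.
func writePoints(fieldsPerPoint []int, tracker *WriteTracker) {
	var values int64
	for _, n := range fieldsPerPoint {
		values += int64(n)
	}
	tracker.Add(int64(len(fieldsPerPoint)), values)
}
```

Because the tracker appears in the function signature, a caller that forgets to supply one fails at compile time instead of silently losing metrics.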
2021-02-01 14:34:11 -05:00
Sam Arnold 8a16bf0531 chore: run goimports -w ./ 2021-01-29 11:40:02 -05:00
Sam Arnold d28bcb8e27
Merge pull request #20544 from lesam/series-iteration-optimization
feat(tsi): optimize series iteration
2021-01-25 18:17:18 -04:00
Sam Arnold 98a76a11a0 feat(tsi): optimize series iteration
When using queries like 'select count(_seriesKey) from bigmeasurement', we
should iterate over the tsi structures to serve the query instead of loading
all the series into memory up front.

Closes #20543
2021-01-25 14:27:31 -05:00
davidby-influx fe3af66c54
fix(tsdb): minimize lock contention when adding new fields or measurements (#20504)
Frequent writes to fields.idx cause lock contention, because fields.idx is
recreated whenever a field or measurement is added in a WritePointsWithContext() call.
This change eliminates locking during the actual file rewrite, and limits it to
the times when the MeasurementFieldSet is actually being read or written
in memory and when the new file is being renamed.

The test verifies correct behavior by checking that the fields.idx
file matches the in-memory copy after heavily parallel measurement addition.

Fixes https://github.com/influxdata/influxdb/issues/20500
2021-01-15 08:31:45 -08:00
Sam Arnold 32612313df fix: minor test fixes for go1.15 and also flaky timeouts
Also run gofmt
2021-01-08 14:59:33 -05:00
davidby-influx 9e33be2619
fix(error): SELECT INTO doesn't return error with unsupported value (#20429)
When a SELECT INTO query generates an illegal value that cannot be inserted,
like +/- Inf, it should return an error, rather than failing silently.
This adds a boolean parameter to the [data] section of influxdb.conf:
* strict-error-handling
When false (the default), the old behavior is preserved. When true,
unsupported values will return an error from SELECT INTO queries.

Fixes https://github.com/influxdata/influxdb/issues/20426
2020-12-30 18:22:43 -08:00
Sam Arnold d1a1e4b667 chore: restore ImportShard
This reverts commit d14acea44d.
2020-12-07 11:01:00 -04:00
davidby-influx 0faac1a478 chore(tsm1): fix formatting
Failed to format code before commit.
2020-11-16 21:25:26 -08:00
davidby-influx b3724581bc fix(tsm1): "snapshot in progress" error during backup
Loop with backoff in (*Engine).CreateSnapshot() to retry
(*Engine).WriteSnapshot() up to 3 times if
ErrSnapshotInProgress is returned. Then continue
on no error, or on ErrSnapshotInProgress if skipCacheOk is
true.

https://github.com/influxdata/plutonium/issues/3227
(cherry picked from commit dfa6aa8cea)
2020-11-16 21:23:00 -08:00
davidby-influx 0dcff81f56 fix(tsm1): "snapshot in progress" error during backup
Test the skipCacheOk flag to tsdb.Shard.CreateSnapshot() and
tsdb.Engine.CreateSnapshot()
A value of true allows the backup to proceed even if a cache
snapshot cannot be taken.

https://github.com/influxdata/plutonium/issues/3227
2020-11-05 16:50:51 -08:00
davidby-influx 6ec446f422 fix(tsm1): "snapshot in progress" error during backup
This fix adds a skipCacheOk flag to
tsdb.Store.CreateShardSnapshot() and tsdb.Shard.CreateSnapshot()
to pass to tsdb.Engine.CreateSnapshot()
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
This flag is set to false in tsm1.Engine.Export()

https://github.com/influxdata/plutonium/issues/3227
2020-11-05 11:08:08 -08:00
davidby-influx 23be20bf1b fix(tsm1): "snapshot in progress" error during backup
When an InfluxDB database is very busy writing new points, the backup
process can fail because it cannot write a new snapshot.
The error is: operation timed out with error: create snapshot: snapshot in progress.
This happens because the high number of ingested points causes InfluxDB
to snapshot the cache almost continuously.
The fix for this was https://github.com/influxdata/influxdb/pull/16627
but it was for OSS only, and was not in the code path for backups
in clusters.
This fix adds a skipCacheOk flag to tsdb.Engine.CreateSnapshot().
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
and in tsdb.Shard.CreateSnapshot(), the cluster backup code path.
This flag is set to false in tsm1.Engine.Export()

https://github.com/influxdata/plutonium/issues/3227
2020-10-30 10:37:36 -07:00
David Norton 3d92eef720 feat: allow disable compaction per shard
This feature allows compaction to be disabled on a per-shard basis by
creating a file named do_not_compact in a shard's directory. When
disabled, a message is logged every 15 minutes with the reason for
compaction being disabled (existence of the file). This makes it easy to
know if compaction has been disabled for any shards by searching the log
for "compaction disabled" or running "find path/to/data -type f -name
do_not_compact".
2020-10-06 10:58:07 -04:00