Commit Graph

14558 Commits (de1a0eb2a919548b10f5a81bea427e1f268daf0b)

Author SHA1 Message Date
Sam Arnold de1a0eb2a9
feat: use count_hll for 'show series cardinality' queries (#20745)
Closes: https://github.com/influxdata/influxdb/issues/20614

Also fix nil pointer for seriesKey iterator

Fix for bug in: https://github.com/influxdata/influxdb/issues/20543

Also add a test for ingress metrics
2021-02-10 16:00:16 -05:00
Sam Arnold 903b8cd0ea
feat(query): Hyper log log operators in influxql (#20603)
* feat(query): hyper log log counting in query engine

In addition to helping with normal queries, this can improve the 'SHOW CARDINALITY'
meta-queries:

time influx -database mydb -execute 'select count_hll(sum_hll(_seriesKey)) from big'
name: big
time count_hll
---- ---------
0    200767781
influx -database mydb -execute   0.06s user 0.12s system 0% cpu 8:49.99 total
2021-02-08 08:38:14 -05:00
Sam Arnold 21823db00b
feat: series creation ingress metrics (#20700)
After turning this on and testing locally, note the 'seriesCreated' metric:

"localStore": {"name":"localStore","tags":null,"values":{"pointsWritten":2987,"seriesCreated":58,"valuesWritten":23754}},
"ingress": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"cq","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":4}},
"ingress:1": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"database","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":4}},
"ingress:2": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"httpd","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":46}},
"ingress:3": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"ingress","rp":"monitor"},"values":{"pointsWritten":14,"seriesCreated":14,"valuesWritten":42}},
"ingress:4": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"localStore","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":6}},
"ingress:5": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"queryExecutor","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":10}},
"ingress:6": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"runtime","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":30}},
"ingress:7": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"shard","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":22}},
"ingress:8": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"subscriber","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":6}},
"ingress:9": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_cache","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":18}},
"ingress:10": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_engine","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":58}},
"ingress:11": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_filestore","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":4}},
"ingress:12": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_wal","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":2,"valuesWritten":8}},
"ingress:13": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"write","rp":"monitor"},"values":{"pointsWritten":2,"seriesCreated":1,"valuesWritten":18}},
"ingress:14": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"cpu","rp":"autogen"},"values":{"pointsWritten":1342,"seriesCreated":13,"valuesWritten":13420}},
"ingress:15": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"disk","rp":"autogen"},"values":{"pointsWritten":642,"seriesCreated":6,"valuesWritten":4494}},
"ingress:16": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"diskio","rp":"autogen"},"values":{"pointsWritten":214,"seriesCreated":2,"valuesWritten":2354}},
"ingress:17": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"mem","rp":"autogen"},"values":{"pointsWritten":107,"seriesCreated":1,"valuesWritten":963}},
"ingress:18": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"processes","rp":"autogen"},"values":{"pointsWritten":107,"seriesCreated":1,"valuesWritten":856}},
"ingress:19": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"swap","rp":"autogen"},"values":{"pointsWritten":214,"seriesCreated":1,"valuesWritten":642}},
"ingress:20": {"name":"ingress","tags":{"db":"telegraf","login":"_systemuser_unknown","measurement":"system","rp":"autogen"},"values":{"pointsWritten":321,"seriesCreated":1,"valuesWritten":749}},

Closes: https://github.com/influxdata/influxdb/issues/20613
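The statistics above look like counters grouped by a (db, rp, measurement, login) tag set, each holding pointsWritten, valuesWritten and seriesCreated. A minimal Go sketch of such a registry follows. The names `ingressKey`, `ingressStats`, `Registry`, and the `ByMeasurement`/`ByLogin` toggles are hypothetical, and the sketch assumes the caller already knows how many new series a write batch created.

```go
package main

import (
	"fmt"
	"sync"
)

// ingressKey is the tag set a counter is grouped by (hypothetical type).
type ingressKey struct {
	DB, RP, Measurement, Login string
}

// ingressStats holds the three counters shown in the output above.
type ingressStats struct {
	PointsWritten, ValuesWritten, SeriesCreated int64
}

// Registry groups ingress counters. ByMeasurement and ByLogin are
// hypothetical toggles deciding which tags take part in the key.
type Registry struct {
	ByMeasurement, ByLogin bool

	mu    sync.Mutex
	stats map[ingressKey]*ingressStats
}

// Record adds one write batch to the counters for its key.
func (r *Registry) Record(db, rp, measurement, login string, points, values, newSeries int64) {
	k := ingressKey{}
	if r.ByMeasurement {
		k.DB, k.RP, k.Measurement = db, rp, measurement
	}
	if r.ByLogin {
		k.Login = login
	}
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.stats == nil {
		r.stats = make(map[ingressKey]*ingressStats)
	}
	s, ok := r.stats[k]
	if !ok {
		s = &ingressStats{}
		r.stats[k] = s
	}
	s.PointsWritten += points
	s.ValuesWritten += values
	s.SeriesCreated += newSeries
}

// Get returns a copy of the counters for a key (zero values if absent).
func (r *Registry) Get(k ingressKey) ingressStats {
	r.mu.Lock()
	defer r.mu.Unlock()
	if s, ok := r.stats[k]; ok {
		return *s
	}
	return ingressStats{}
}

func main() {
	r := &Registry{ByLogin: true} // group only by login
	r.Record("telegraf", "autogen", "cpu", "telegraf", 10, 100, 1)
	r.Record("telegraf", "autogen", "mem", "telegraf", 5, 45, 0)
	fmt.Printf("%+v\n", r.Get(ingressKey{Login: "telegraf"}))
}
```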
2021-02-05 14:52:43 -04:00
Sam Arnold dd3baf6d4a
feat: measurement metrics by login (#20687)
After turning on authentication and both forms of ingress metrics:

"ingress": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"cq","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":76}},
"ingress:1": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"database","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":152}},
"ingress:2": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"httpd","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":874}},
"ingress:3": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"ingress","rp":"monitor"},"values":{"pointsWritten":534,"valuesWritten":1068}},
"ingress:4": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"localStore","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":76}},
"ingress:5": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"queryExecutor","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":190}},
"ingress:6": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"runtime","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":570}},
"ingress:7": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"shard","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":836}},
"ingress:8": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"subscriber","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":114}},
"ingress:9": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_cache","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":684}},
"ingress:10": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_engine","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":2204}},
"ingress:11": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_filestore","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":152}},
"ingress:12": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"tsm1_wal","rp":"monitor"},"values":{"pointsWritten":76,"valuesWritten":304}},
"ingress:13": {"name":"ingress","tags":{"db":"_internal","login":"_systemuser_monitor","measurement":"write","rp":"monitor"},"values":{"pointsWritten":38,"valuesWritten":342}},
"ingress:14": {"name":"ingress","tags":{"db":"telegraf","login":"admin","measurement":"cpu","rp":"autogen"},"values":{"pointsWritten":1,"valuesWritten":1}},
"ingress:15": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"cpu","rp":"autogen"},"values":{"pointsWritten":1316,"valuesWritten":13160}},
"ingress:16": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"disk","rp":"autogen"},"values":{"pointsWritten":642,"valuesWritten":4494}},
"ingress:17": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"diskio","rp":"autogen"},"values":{"pointsWritten":214,"valuesWritten":2354}},
"ingress:18": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"mem","rp":"autogen"},"values":{"pointsWritten":107,"valuesWritten":963}},
"ingress:19": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"processes","rp":"autogen"},"values":{"pointsWritten":107,"valuesWritten":856}},
"ingress:20": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"swap","rp":"autogen"},"values":{"pointsWritten":214,"valuesWritten":642}},
"ingress:21": {"name":"ingress","tags":{"db":"telegraf","login":"telegraf","measurement":"system","rp":"autogen"},"values":{"pointsWritten":321,"valuesWritten":749}},

Only by login:

"ingress": {"name":"ingress","tags":{"login":"_systemuser_monitor"},"values":{"pointsWritten":42,"valuesWritten":354}},
"ingress:1": {"name":"ingress","tags":{"login":"admin"},"values":{"pointsWritten":1,"valuesWritten":1}},
"ingress:2": {"name":"ingress","tags":{"login":"telegraf"},"values":{"pointsWritten":3547,"valuesWritten":28246}},

Notice writes by users 'telegraf', '_systemuser_monitor', and 'admin'.
2021-02-04 11:52:53 -05:00
Sam Arnold b3e763d96f
fix: consistent error for missing shard (#20694) 2021-02-04 09:49:14 -05:00
Sam Arnold bb27966fc2
Merge pull request #20677 from lesam/ingress-metrics-measurement-points
feat: Ingress metrics by measurement
2021-02-02 16:25:10 -05:00
Sam Arnold eb92c997cd feat: Ingress metrics by measurement
Partial implementation of https://github.com/influxdata/influxdb/issues/20612

Implements a per-measurement points-written metric. Next step: also support per-login metrics.
2021-02-02 15:58:28 -05:00
Sam Arnold 7a9a0ec1bf
Merge pull request #20664 from lesam/measurement-metric-ingress
refactor: do not use context value anti-pattern
2021-02-02 11:12:43 -05:00
Sam Arnold 117341fb0f fix: Move value metric down to tsdb store
Previously we tracked values on the http ingress, but the tsdb store is the correct
place to track total values written for the instance.
2021-02-02 10:58:47 -05:00
Sam Arnold 6795ec6c01 refactor: do not use context value anti-pattern
Extending the context instead of fixing the API breaks type safety.
For tracking the number of points / values written, it is much clearer
to pass an explicit tracker.
2021-02-01 14:34:11 -05:00
Sam Arnold a483fb5068
Merge pull request #20649 from lesam/add-goimports
chore: add goimports to ci checks
2021-01-29 14:19:39 -05:00
Sam Arnold ec40d5c380 chore: Fix spaces 2021-01-29 13:20:36 -05:00
Sam Arnold 053fae914b chore: update docs with install instructions for goimports 2021-01-29 11:45:17 -05:00
Sam Arnold 8a16bf0531 chore: run goimports -w ./ 2021-01-29 11:40:02 -05:00
Sam Arnold b0d26fe412 chore: add goimports to ci checks 2021-01-29 11:39:42 -05:00
Sam Arnold b2f0d05ecc
Merge pull request #20647 from lesam/add-diskusage-inspect
feat(inspect): Add report-disk for disk usage by measurement
2021-01-29 11:36:31 -05:00
Sam Arnold 3a31e2370e fix(inspect): bad pattern matching 2021-01-29 10:55:20 -05:00
Sam Arnold a6152e8ac1 feat(inspect): Add report-disk for disk usage by measurement 2021-01-29 10:51:42 -05:00
Sam Arnold d28bcb8e27
Merge pull request #20544 from lesam/series-iteration-optimization
feat(tsi): optimize series iteration
2021-01-25 18:17:18 -04:00
Sam Arnold 98a76a11a0 feat(tsi): optimize series iteration
When using queries like 'select count(_seriesKey) from bigmeasurement', we
should iterate over the TSI structures to serve the query instead of loading
all the series into memory up front.

Closes #20543
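The idea of streaming series keys instead of materializing them can be sketched with a pull-style iterator in Go. The `SeriesIterator` interface and the generator-backed `genIterator` are hypothetical stand-ins for the TSI structures; the point is that `countSeries` only ever holds one key at a time.

```go
package main

import "fmt"

// SeriesIterator yields series keys one at a time (hypothetical interface).
// Next reports ok == false once the iterator is exhausted.
type SeriesIterator interface {
	Next() (key []byte, ok bool)
}

// genIterator stands in for an index-backed iterator: it produces keys on
// demand instead of holding them all in a slice.
type genIterator struct{ i, n int }

func (g *genIterator) Next() ([]byte, bool) {
	if g.i >= g.n {
		return nil, false
	}
	g.i++
	return []byte(fmt.Sprintf("bigmeasurement,id=%d", g.i)), true
}

// countSeries consumes the iterator and keeps only a running count, so
// memory use does not grow with the number of series.
func countSeries(it SeriesIterator) int {
	n := 0
	for _, ok := it.Next(); ok; _, ok = it.Next() {
		n++
	}
	return n
}

func main() {
	fmt.Println(countSeries(&genIterator{n: 1_000_000}))
}
```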
2021-01-25 14:27:31 -05:00
davidby-influx fe3af66c54
fix(tsdb): minimize lock contention when adding new fields or measurements (#20504)
Frequent writes to fields.idx cause lock contention, because fields.idx is
recreated whenever a field or measurement is added in WritePointsWithContext().
This change eliminates locking during the actual file rewrite and limits it to
the times when the MeasurementFieldSet is actually being read or written
in memory and when the new file is being renamed.

A test verifies correct behavior by checking that the fields.idx
file matches the in-memory copy after heavily parallel measurement addition.

Fixes https://github.com/influxdata/influxdb/issues/20500
2021-01-15 08:31:45 -08:00
Sam Arnold 415361e1eb
Merge pull request #20481 from influxdata/fix-tests
fix: minor test fixes for go1.15 and also flaky timeouts
2021-01-08 15:21:19 -05:00
Sam Arnold 32612313df fix: minor test fixes for go1.15 and also flaky timeouts
Also run gofmt
2021-01-08 14:59:33 -05:00
davidby-influx 9e33be2619
fix(error): SELECT INTO doesn't return error with unsupported value (#20429)
When a SELECT INTO query generates an illegal value that cannot be inserted,
such as +/-Inf, it should return an error rather than failing silently.
This adds a boolean parameter to the [data] section of influxdb.conf:
* strict-error-handling
When false (the default), the old behavior is preserved. When true,
unsupported values will return an error from SELECT INTO queries.

Fixes https://github.com/influxdata/influxdb/issues/20426
2020-12-30 18:22:43 -08:00
davidby-influx 2e26dc62cb
build: switch tested centos base images (#20417) 2020-12-23 21:17:55 -08:00
davidby-influx 8a8b25ec4f
fix(prometheus): regexp handling should comply with PromQL (#19832) (#20388)
(cherry picked from commit 5296fe990f)

Co-authored-by: Tristan Su <foobar@users.noreply.github.com>
2020-12-18 14:58:29 -08:00
davidby-influx 5b98166b05
fix: tcp.Mux.Serve() closes all net.Listener instances silently on error. (#20278)
A customer has seen a rash of "connection refused" errors to the meta node.
This fix ensures that when net.Listener instances are closed because of an
error in Accept(), influxdb logs the error which caused the closures, as well
as any errors in closing the Listeners.

Fixes https://github.com/influxdata/influxdb/issues/20256
2020-12-08 16:17:59 -08:00
davidby-influx 6ac0bb3fe3
fix(error): "unsupported value: +Inf" error not handled gracefully (#20250)
JSON marshalling errors should be returned properly formatted in JSON
like other errors. This fix formats marshalling errors the same way
influxdb formats other query errors.

Fixes https://github.com/influxdata/influxdb/issues/20249
2020-12-07 13:03:55 -08:00
Sam Arnold da7a4fd379
Merge pull request #20266 from lesam/fix-build-for-clustering
chore: Fix build for clustering
2020-12-07 13:23:50 -04:00
Sam Arnold d96c8fb125 chore: fix clustering build
Clustering requires taking the hash of synthetic points, so
allow this function to work on anything with a HashID.
2020-12-07 11:24:45 -04:00
Sam Arnold d1a1e4b667 chore: restore ImportShard
This reverts commit d14acea44d.
2020-12-07 11:01:00 -04:00
davidby-influx df39b1e71c
fix(query): Group By queries with offset that crosses a DST boundary can fail (#20230)
* fix(query): Group By queries with offset that crosses a DST boundary can fail

A customer reported that a GROUP BY query with an offset that caused an interval
to cross a daylight saving time change inserted an extra output row, off by one hour.
This fix ensures that the start time for the interval of a GROUP BY operator is
correctly set before calculating the time zone offset for that date and time.

Add TestGroupByIterator_DST() in query/iterator_test.go
for regression testing of this bug.

Fixes https://github.com/influxdata/influxdb/issues/20238
2020-12-04 09:40:43 -08:00
Daniel Moran 5d922e9d0e
feat: Optimize shard lookups in groups containing only one shard (#20118) (#20200)
Co-authored-by: Yun Zhao <zhaoyun2316@gmail.com>
2020-11-30 15:16:21 -05:00
Ayan George e75e83314b
fix: Reuse http server (#20191)
Once applied, this patch will use the same net/http.Server value to
handle all http requests.

This simplifies cleanly shutting down the server.
2020-11-29 21:03:19 -05:00
Ayan George 72fde1b5d9
fix: Properly shutdown multiple http servers (#20183)
* fix: Properly shutdown http server on Close()
2020-11-25 13:58:04 -05:00
Ayan George 8d90d953d7
fix: Properly shutdown http server on Close() (#20171) 2020-11-25 12:46:55 -05:00
davidby-influx f401aded6b
Merge pull request #20100 from influxdata/influxdb_1891
fix(write): Successful writes increment write error statistics incorrectly
2020-11-20 15:09:52 -08:00
davidby-influx c05c5575f6 chore: remove CHANGELOG.md changes 2020-11-20 09:13:29 -08:00
davidby-influx 6aa5495426 chore: update CHANGELOG.md 2020-11-19 08:45:41 -08:00
davidby-influx 3d9f8b5020 fix(write): Successful writes increment write error statistics incorrectly.
In v1.8.3 and earlier, the write path through (*PointsWriter).writeToShardWithContext() always increments the WriteErr count in the debug variables and does not increment the WriteOK count.
In https://github.com/influxdata/influxdb/blob/v1.8.3/coordinator/points_writer.go, line 450 should be "else if err != nil {" instead of "else".
This has been reported in a customer cloud instance and verified under a debugger.

https://github.com/influxdata/influxdb/issues/20098
2020-11-18 19:39:42 -08:00
davidby-influx af00cb7bbd
Merge pull request #20063 from influxdata/DSB_SnapshotInProgress_master-1.x
fix(tsm1): "snapshot in progress" error during backup: restore loop with backoff
2020-11-17 10:06:46 -08:00
davidby-influx 0faac1a478 chore(tsm1): fix formatting
Failed to format code before commit.
2020-11-16 21:25:26 -08:00
davidby-influx b3724581bc fix(tsm1): "snapshot in progress" error during backup
Loop with backoff in (*Engine).CreateSnapshot() to retry
(*Engine).WriteSnapshot() up to 3 times if
ErrSnapshotInProgress is returned. Then continue
on no error, or on ErrSnapshotInProgress if skipCacheOk is
true.

https://github.com/influxdata/plutonium/issues/3227
(cherry picked from commit dfa6aa8cea)
2020-11-16 21:23:00 -08:00
davidby-influx cc1e70baf4
Merge pull request #19869 from influxdata/DSB_SnapshotInProgress_3227
fix(tsm1): "snapshot in progress" error during backup
2020-11-12 15:42:22 -08:00
davidby-influx 0dcff81f56 fix(tsm1): "snapshot in progress" error during backup
Test the skipCacheOk flag passed to tsdb.Shard.CreateSnapshot() and
tsdb.Engine.CreateSnapshot().
A value of true allows the backup to proceed even if a cache
snapshot cannot be taken.

https://github.com/influxdata/plutonium/issues/3227
2020-11-05 16:50:51 -08:00
davidby-influx 6ec446f422 fix(tsm1): "snapshot in progress" error during backup
This fix adds a skipCacheOk flag to
tsdb.Store.CreateShardSnapshot() and tsdb.Shard.CreateSnapshot()
to pass to tsdb.Engine.CreateSnapshot().
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path.
This flag is set to false in tsm1.Engine.Export().

https://github.com/influxdata/plutonium/issues/3227
2020-11-05 11:08:08 -08:00
Ayan George 225bcecd73
fix: Upgrade version of jwt-go package to v4.0.0 (#19893)
* fix: Upgrade version of jwt-go package to v4.0.0

This commit updates the dependencies for influxdb to require v4.0.0-preview1 of
the jwt-go package.  This required updating the go.mod and go.sum files as well
as any source file that directly imported that package.

Prior to this commit, the TestHandler_Query_Auth() tests would fail because they
checked for specific error strings returned by the jwt-go package.

Version 4.0.0-preview1 of the package slightly changed the wording of those
errors. This patch updates the test to detect the new error string.
2020-11-05 10:55:24 -05:00
davidby-influx 23be20bf1b fix(tsm1): "snapshot in progress" error during backup
When an InfluxDB database is very busy writing new points, the backup
process can fail because it cannot write a new snapshot.
The error is: operation timed out with error: create snapshot: snapshot in progress.
This happens because the high number of points ingested causes InfluxDB
to take cache snapshots almost "continuously".
The fix for this was https://github.com/influxdata/influxdb/pull/16627,
but it was for OSS only and was not in the code path for backups
in clusters.
This fix adds a skipCacheOk flag to tsdb.Engine.CreateSnapshot().
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path,
and in tsdb.Shard.CreateSnapshot(), the cluster backup code path.
This flag is set to false in tsm1.Engine.Export().

https://github.com/influxdata/plutonium/issues/3227
2020-10-30 10:37:36 -07:00
Ayan George f7eb697dd3
refactor: Use filepath.Walk (#19514)
Prior to this commit, we had our own recursive file walker, which
required a condition based on whether s.Config.TypesDB pointed to a directory
or a regular file.

This commit replaces our own readdir() with filepath.Walk() and treats
recursing directories and loading one file as a single case.  This
simplifies the code quite a bit.
2020-10-21 10:29:48 -04:00
Ayan George b1def70670
feat: generate modern profiles (#19655)
* feat: generate modern profiles

Prior to this commit, influxd was writing legacy profiling data which
often (always?) required an accompanying executable to use.

This commit instructs influxd to write profiles in the new format which
can be examined without a binary.

While we're at it, this commit also adds the allocs and threadcreate
profiles.

Finally, this patch also changes the format of the downloaded tar in the
following ways:

* The profiles are added to the profile/ directory, so instead of being
  extracted into your current directory, they're placed in
  a "profiles" directory.

* This commit adds the .pb.gz extension to each of the files since
  they're gzipped protobuf files and not .txt.
2020-10-21 09:26:15 -04:00