Commit Graph

14591 Commits (fbfd4b46514062ce8688337a07d64d18d81658d9)

Author SHA1 Message Date
Sam Arnold a6152e8ac1 feat(inspect): Add report-disk for disk usage by measurement 2021-01-29 10:51:42 -05:00
Sam Arnold d28bcb8e27
Merge pull request #20544 from lesam/series-iteration-optimization
feat(tsi): optimize series iteration
2021-01-25 18:17:18 -04:00
Sam Arnold 98a76a11a0 feat(tsi): optimize series iteration
When using queries like 'select count(_seriesKey) from bigmeasurement', we
should iterate over the tsi structures to serve the query instead of loading
all the series into memory up front.

Closes #20543
2021-01-25 14:27:31 -05:00
davidby-influx fe3af66c54
fix(tsdb): minimize lock contention when adding new fields or measurements (#20504)
Frequent writes to fields.idx cause lock contention, and fields.idx is recreated
whenever a field or measurement is added in a WritePointsWithContext() call.
This eliminates locking during the actual file rewrite, and limits it to
the times when the MeasurementFieldSet is actually being read or written
in memory and when the new file is being renamed.

Test verification of correct behavior by checking the fields.idx
file matches the in-memory copy after heavily parallel measurement addition.

Fixes https://github.com/influxdata/influxdb/issues/20500
2021-01-15 08:31:45 -08:00
Sam Arnold 415361e1eb
Merge pull request #20481 from influxdata/fix-tests
fix: minor test fixes for go1.15 and also flaky timeouts
2021-01-08 15:21:19 -05:00
Sam Arnold 32612313df fix: minor test fixes for go1.15 and also flaky timeouts
Also run gofmt
2021-01-08 14:59:33 -05:00
davidby-influx 9e33be2619
fix(error): SELECT INTO doesn't return error with unsupported value (#20429)
When a SELECT INTO query generates an illegal value that cannot be inserted,
like +/- Inf, it should return an error, rather than failing silently.
This adds a boolean parameter to the [data] section of influxdb.conf:
* strict-error-handling
When false, the default, the old behavior is preserved. When true,
unsupported values will return an error from SELECT INTO queries.

Fixes https://github.com/influxdata/influxdb/issues/20426
2020-12-30 18:22:43 -08:00
davidby-influx 2e26dc62cb
build: switch tested centos base images (#20417) 2020-12-23 21:17:55 -08:00
davidby-influx 8a8b25ec4f
fix(prometheus): regexp handling should comply with PromQL (#19832) (#20388)
(cherry picked from commit 5296fe990f)

Co-authored-by: Tristan Su <foobar@users.noreply.github.com>
2020-12-18 14:58:29 -08:00
davidby-influx 5b98166b05
fix: cp.Mux.Serve() closes all net.Listener instances silently on error. (#20278)
A customer has seen a rash of "connection refused" errors to the meta node.
This fix ensures that when net.Listener instances are closed because of an
error in Accept(), influxdb logs the error which caused the closures, as well
as any errors in closing the Listeners.

Fixes https://github.com/influxdata/influxdb/issues/20256
2020-12-08 16:17:59 -08:00
davidby-influx 6ac0bb3fe3
fix(error): "unsupported value: +Inf" error not handled gracefully (#20250)
JSON marshalling errors should be returned properly formatted in JSON
like other errors. This fix formats marshalling errors the same way
influxdb formats other query errors.

Fixes https://github.com/influxdata/influxdb/issues/20249
2020-12-07 13:03:55 -08:00
Sam Arnold da7a4fd379
Merge pull request #20266 from lesam/fix-build-for-clustering
chore: Fix build for clustering
2020-12-07 13:23:50 -04:00
Sam Arnold d96c8fb125 chore: fix clustering build
Clustering requires taking the hash of synthetic points, so
allow this function to work on anything with a HashID.
2020-12-07 11:24:45 -04:00
Sam Arnold d1a1e4b667 chore: restore ImportShard
This reverts commit d14acea44d.
2020-12-07 11:01:00 -04:00
davidby-influx df39b1e71c
fix(query): Group By queries with offset that crosses a DST boundary can fail (#20230)
Customer reported that a GROUP BY query with an offset that caused an interval
to cross a daylight savings change inserted an extra output row off by one hour.
This fix ensures that the start time for the interval of a GROUP BY operator is
correctly set before calculating the time zone offset for that date and time.

Add TestGroupByIterator_DST() in query/iterator_test.go
for regression testing of this bug.

Fixes https://github.com/influxdata/influxdb/issues/20238
2020-12-04 09:40:43 -08:00
Daniel Moran 5d922e9d0e
feat: Optimize shard lookups in groups containing only one shard (#20118) (#20200)
Co-authored-by: Yun Zhao <zhaoyun2316@gmail.com>
2020-11-30 15:16:21 -05:00
Ayan George e75e83314b
fix: Reuse http server (#20191)
Once applied, this patch will use the same net/http.Server value to
handle all http requests.

This simplifies cleanly shutting down the server.
2020-11-29 21:03:19 -05:00
Ayan George 72fde1b5d9
fix: Properly shutdown multiple http servers (#20183)
* fix: Properly shutdown http server on Close()
2020-11-25 13:58:04 -05:00
Ayan George 8d90d953d7
fix: Properly shutdown http server on Close() (#20171) 2020-11-25 12:46:55 -05:00
davidby-influx f401aded6b
Merge pull request #20100 from influxdata/influxdb_1891
fix(write): Successful writes increment write error statistics incorrectly
2020-11-20 15:09:52 -08:00
davidby-influx c05c5575f6 chore: remove CHANGELOG.md changes 2020-11-20 09:13:29 -08:00
davidby-influx 6aa5495426 chore: update CHANGELOG.md 2020-11-19 08:45:41 -08:00
davidby-influx 3d9f8b5020 fix(write): Successful writes increment write error statistics incorrectly.
In v1.8.3 and earlier, the write path through (*PointsWriter) writeToShardWithContext() always increments the WriteErr count in the debug variables, and does not increment the WriteOK count.
https://github.com/influxdata/influxdb/blob/v1.8.3/coordinator/points_writer.go line 450 should be an else if err != nil { instead of an else
This has been reported in a customer cloud instance, and verified under a debugger.

https://github.com/influxdata/influxdb/issues/20098
2020-11-18 19:39:42 -08:00
davidby-influx af00cb7bbd
Merge pull request #20063 from influxdata/DSB_SnapshotInProgress_master-1.x
fix(tsm1): "snapshot in progress" error during backup: restore loop with backoff
2020-11-17 10:06:46 -08:00
davidby-influx 0faac1a478 chore(tsm1): fix formatting
Failed to format code before commit.
2020-11-16 21:25:26 -08:00
davidby-influx b3724581bc fix(tsm1): "snapshot in progress" error during backup
Loop with backoff in (*Engine).CreateSnapshot() to retry
(*Engine).WriteSnapshot() up to 3 times if
ErrSnapshotInProgress is returned.  Then continue
on no error or on SnapshotInProgress if skipCacheOk is
true.

https://github.com/influxdata/plutonium/issues/3227
(cherry picked from commit dfa6aa8cea)
2020-11-16 21:23:00 -08:00
davidby-influx cc1e70baf4
Merge pull request #19869 from influxdata/DSB_SnapshotInProgress_3227
fix(tsm1): "snapshot in progress" error during backup
2020-11-12 15:42:22 -08:00
davidby-influx 0dcff81f56 fix(tsm1): "snapshot in progress" error during backup
Test the skipCacheOk flag to tsdb.Shard.CreateSnapshot() and
tsdb.Engine.CreateSnapshot()
A value of true allows the backup to proceed even if a cache
snapshot cannot be taken.

https://github.com/influxdata/plutonium/issues/3227
2020-11-05 16:50:51 -08:00
davidby-influx 6ec446f422 fix(tsm1): "snapshot in progress" error during backup
This fix adds a skipCacheOk flag to
tsdb.Store.CreateShardSnapshot() and tsdb.Shard.CreateSnapshot()
to pass to tsdb.Engine.CreateSnapshot()
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
This flag is set to false in tsm1.Engine.Export()

https://github.com/influxdata/plutonium/issues/3227
2020-11-05 11:08:08 -08:00
Ayan George 225bcecd73
fix: Upgrade version of jwt-go package to v4.0.0 (#19893)
This commit updates the dependencies for influxdb to require v4.0.0-preview1 of
the jwt-go package.  This required updating the go.mod and go.sum files as well
as any source file that directly imported that package.

Prior to this commit, the TestHandler_Query_Auth() tests would fail because they
checked for specific error strings returned by the jwt-go package.

Version 4.0.0-preview1 of the package changed the verbiage of those errors a
bit.  This patch updates the test to detect the new error string.
2020-11-05 10:55:24 -05:00
davidby-influx 23be20bf1b fix(tsm1): "snapshot in progress" error during backup
When an InfluxDB database is very busy writing new points, the backup
process can fail because it cannot write a new snapshot.
The error is: operation timed out with error: create snapshot: snapshot in progress.
This happens because InfluxDB is taking snapshots of the cache almost
continuously, due to the high number of points ingested.
The fix for this was https://github.com/influxdata/influxdb/pull/16627
but it was for OSS only, and was not in the code path for backups
in clusters.
This fix adds a skipCacheOk flag to tsdb.Engine.CreateSnapshot().
A value of true allows the backup to proceed even if a cache snapshot
cannot be taken.
This flag is set to true in tsm1.Engine.Backup(), the OSS backup code path
and in tsdb.Shard.CreateSnapshot(), the cluster backup code path.
This flag is set to false in tsm1.Engine.Export()

https://github.com/influxdata/plutonium/issues/3227
2020-10-30 10:37:36 -07:00
Ayan George f7eb697dd3
refactor: Use filepath.Walk (#19514)
Prior to this commit, we had our own recursive file walker which
required a condition based on if s.Config.TypesDB pointed to a directory
or a regular file.

This commit replaces our own readdir() with filepath.Walk() and treats
recursing directories and loading one file as a single case.  This
simplifies the code quite a bit.
2020-10-21 10:29:48 -04:00
Ayan George b1def70670
feat: generate modern profiles (#19655)
Prior to this commit, influxd was writing legacy profiling data which
often (always?) required an accompanying executable to use.

This commit instructs influxd to write profiles in the new format which
can be examined without a binary.

While we're at it, this commit also adds the allocs and threadcreate
profiles.

Finally, this patch also changes the format of the downloaded tar in the
following ways:

* The profiles are added to the profiles/ directory -- so instead of
  extracting the profiles into your current directory, they're placed in
  a "profiles" directory.

* This commit adds the .pb.gz extension to each of the files since
  they're gzipped protobuf files and not .txt.
2020-10-21 09:26:15 -04:00
David Norton 8e57d701bd
Merge pull request #19691 from influxdata/dn-disable-compaction-per-shard
feat: allow disable compaction per shard
2020-10-15 09:49:13 -04:00
David Norton 3d92eef720 feat: allow disable compaction per shard
This feature allows compaction to be disabled on a per-shard basis by
creating a file named do_not_compact in a shard's directory. When
disabled, a message is logged every 15 minutes with the reason for
compaction being disabled (existence of the file). This makes it easy to
know if compaction has been disabled for any shards by searching the log
for "compaction disabled" or running "find path/to/data -type f -name
do_not_compact".
2020-10-06 10:58:07 -04:00
Pavel Závora b8ca6f9298
Merge pull request #19631 from influxdata/fix/CORS_allows_patch_v1
fix(CORS): allow PATCH
2020-09-24 13:36:40 +02:00
Pavel Zavora e8f7b78d68 fix(CORS): allow PATCH 2020-09-24 11:55:22 +02:00
David Norton fb98ce63ec
Merge pull request #19420 from influxdata/fix-unlocked-map-access
fix: lock map before writes
2020-09-22 11:28:46 -04:00
Ayan George 431f073b9e
feat: Add -lponly flag to export sub-command (#19609)
When applied, this patch will add the -lponly flag to the export command
which instructs influx_inspect to only output line protocol without
comments and other out-of-band data.
2020-09-22 10:09:09 -04:00
Ayan George 42873d4424
chore: Quiet static analysis tools (#19509)
* Remove redundant type in slice/array declarations.
* Call t.Fatal() from test-functions, not non-test go-routines.
* Remove unnecessary empty value operator from ranges.
* Call defer .Close() methods only after checking for error on Open().
2020-09-05 12:43:29 -04:00
Ayan George 4ef4fe9aef fix(tsi1): Acquire a lock when modifying measurement map
This patch protects an internal map for concurrent use.

(*LogFile).Writes() method calls
(*LogFile).createMeasurementIfNotExists() which writes to a shared map.

(*LogFile).Writes() acquires a read-lock which leaves
createMeasurementIfNotExists() open to concurrent writes to its shared
map.

This commit adds the ExecEntries method to the *LogFile type so that we
can properly lock calls to (*LogFile).appendEntry() using defer.

(*LogFile).ExecEntries() is used to mostly replace the body of
(*LogFile).Writes() and incurs another function call since ExecEntries()
can't be inlined.  Below is the output of build with "-m -m -m" gcflags:

  ./log_file.go:1076:6: cannot inline (*LogFile).ExecEntries: unhandled op DEFER

The performance impact of the additional function call should be
negligible and is outweighed by the safety and simplicity of using
defer.
2020-08-31 12:52:54 -04:00
Ayan George 1ffe13894d
chore: Use latest version of influxql package (#19460)
This commit updates our influxql dependency to hash 65d3ef77.
2020-08-28 11:31:50 -04:00
Ayan George 6297ede3d9
fix(tsdb): return error on nonexistent shard id (#17060)
Have Store.DeleteShard() return a useful error if it cannot find the
requested shard.

Fixes #17059
2020-08-24 14:34:44 +00:00
Ayan George 3436db4ebb
refactor: Use binary.Read() instead of io.ReadFull() (#19323)
The original version of verifyVersion() reads into a byte slice,
manually ensures its byte order, then converts it to a type comparable
with Version and MagicNumber.

This patch hides those details by calling binary.Read() and reading
values into properly typed variables.

This adds a bit of overhead but this code isn't in the hot-path and this
patch greatly simplifies the code.

verifyVersion() originally accepted an io.ReadSeeker.  It is only called
in one place and that function immediately calls seek after
verifyVersion(), therefore it is probably safe to call Seek() BEFORE
verifyVersion().

The benefit is that verifyVersion() is easier to test since we can pass
it a bytes.Buffer.

This patch adds a test for verifyVersion() as well as a benchmark.

benchmark                    old ns/op     new ns/op     delta
BenchmarkVerifyVersion-8     73.5          123           +67.35%

Finally, this commit moves verifyVersion() from writer.go to reader.go
which is where it is actually used.
2020-08-13 14:54:18 -04:00
Ayan George 6ce0e11738
feat: Collect values written stats (#19187)
* feat(engine/tsm1): Add WritePointsWithContext()

Add WritePointsWithContext() and make WritePoints() a thin wrapper for
it.

The purpose is to add statistics context values that we'll use to
propagate the number of fields and points written to calls up the call
chain.

* feat(tsdb): Add WriteToShardWithContext()

When applied, this patch adds WriteToShardWithContext() and wraps it
with WriteToShard() to preserve the API.

The purpose of this addition is to propagate a context.Context value
to Shard.WritePointsWithContext().

* feat(tsdb/shard): Add WritePointsWithContext()

The purpose of adding WritePointsWithContext() is to propagate context
values down to engine code and propagate statistics via the context.Value
up to callers.

This patch also adds values written statistics to the shard.

* feat(http): Gather values written stats

WritePointsWithContext() was added to propagate context values down to
the engine and communicate stats to the caller.

* refactor: Change MetricKey to ContextKey

This patch gives the type we're using for context keys a better name.
2020-08-12 11:26:12 -04:00
David Norton 8eade84355
Merge pull request #19252 from influxdata/dn-revert-disable-series-id-set-cache-size
fix(tsdb): revert disable series id set cache size by default
2020-08-07 14:44:17 -04:00
David Norton 94a4a3474d fix(tsdb): revert disable series id set cache size by default
This reverts commit 9c41e12ee4.
2020-08-07 14:06:03 -04:00
David Norton 619f0ab78e
Merge pull request #18667 from influxdata/new-http-headers
feat(service/httpd): Add user configurable HTTP headers
2020-07-08 13:59:59 -04:00
Tristan Su 6910c53440
feat(prometheus): update prometheus remote protocol (#17814)
Fetched up-to-date protocol from prometheus project
2020-07-08 07:12:52 -07:00
Jacob Marble 3f3b7b5160
chore: update some dependencies (#18786)
Helps #18528

This change bumps a couple of dependencies to prepare for something like #17814 which
updates many dependencies at once. Turns out that change is based on an
old commit, so several things have already been updated.

After this, we should do a separate commit to update prometheus per #18528
2020-07-06 14:34:55 -07:00