Commit Graph

367 Commits (main-2.x)

Author SHA1 Message Date
WeblWabl 06ab224516
fix(influxd): update xxhash, avoid stringtoslicebyte in cache (#578) (#25622)
* fix(influxd): update xxhash, avoid stringtoslicebyte in cache (#578)

* fix(influxd): update xxhash, avoid stringtoslicebyte in cache

This commit does 3 things:

* it updates xxhash from v1 to v2; v2 includes an assembly arm version of
  Sum64
* it changes the cache storer to write with a string key instead of a
  byte slice. The cache only reads the key which WriteMulti already has
  as a string so we can avoid a host of allocations when converting back
  and forth from immutable strings to mutable byte slices. This includes
  updating the cache ring and ring partition to write with a string key
* it updates the xxhash for finding the cache ring partition to use
  Sum64String which uses unsafe pointers to directly use a string as a
  byte slice since it only reads the string. Note: this now uses an
  assembly version because of the v2 xxhash update. Go 1.22 included new
  compiler ability to recognize calls of Method([]byte(myString)) and not
  make a copy but from looking at the call sites, I'm not sure the
  compiler would recognize it as the conversion to a byte slice was
  happening several calls earlier.

That's what this change set does. If we are uncomfortable with any of
these, we can do fewer of them (for example, not upgrade xxhash; and/or
not use the specialized Sum64String, etc).

For the performance issue in maz-rr, I see converting string keys to
byte slices taking between 3-5% of cpu usage on both the primary and
secondary. So while this pr doesn't address directly the increased cpu
usage on the secondary, it makes cpu usage less on both which still
feels like a win. I believe these changes are easier to review than
switching to a byte slice pool that is likely needed in other places as
the compiler provides nearly all of the correctness checks we need (we
are relying also on xxhash v2 being correct).
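
A minimal sketch of the string-key idea described above, not the code from this change. It uses the standard library's hash/maphash as a stand-in for xxhash v2 (maphash.String plays the role of Sum64String here), and the ring type, its fields, and the key value are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/maphash"
)

// ring is a hypothetical stand-in for the cache ring: it maps a key to one
// of a fixed number of partitions by hashing the key.
type ring struct {
	seed       maphash.Seed
	partitions int
}

func newRing(partitions int) *ring {
	return &ring{seed: maphash.MakeSeed(), partitions: partitions}
}

// partitionForBytes is the "before" shape: a caller that holds a string has
// to convert it to a byte slice first, which may allocate and copy.
func (r *ring) partitionForBytes(key []byte) int {
	return int(maphash.Bytes(r.seed, key) % uint64(r.partitions))
}

// partitionForString is the "after" shape: the string is hashed directly,
// so no conversion is needed at the call site.
func (r *ring) partitionForString(key string) int {
	return int(maphash.String(r.seed, key) % uint64(r.partitions))
}

func main() {
	r := newRing(16)
	key := "cpu,host=a" // assumed example key
	// Both paths hash the same bytes, so they pick the same partition.
	fmt.Println(r.partitionForString(key) == r.partitionForBytes([]byte(key)))
}
```

Keeping the key as a string end to end lets the type system enforce that the hash path never mutates it, which is the review-friendliness argument made above.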

* helps #550

* chore: fix tests/lint

* chore: don't use assembly version; should inline

This 2 line change causes xxhash to use a purego Sum64 implementation,
which allows the compiler to see that Sum64 only reads the byte slice
input. That in turn means it can skip the string to byte slice
allocation, and since it can skip that, it should inline all the calls
to getPartitionStringKey and Sum64, avoiding 1 call to Sum64String which
isn't inlined.

* chore: update ci build file

the ci build doesn't use the make file!!!

* chore: revert "chore: update ci build file"

This reverts commit 94be66fde03e0bbe18004aab25c0e19051406de2.

* chore: revert "chore: don't use assembly version; should inline"

This reverts commit 67d8d06c02e17e91ba643a2991e30a49308a5283.

(cherry picked from commit 1d334c679ca025645ed93518b7832ae676499cd2)

* feat: need to update go sum

---------

Co-authored-by: Phil Bracikowski <13472206+philjb@users.noreply.github.com>
2024-12-05 16:57:26 -06:00
WeblWabl b88e74e6bb
fix(tsi1/partition/test): fix data races in test code (#57) (#25338) (#25344)
* fix(tsi1/partition/test): fix data races in test code (#57)

* fix(tsi1/partition/test): fix data races in test code

This PR is like influxdata/influxdb#24613 but solves it with a setter
method for MaxLogFileSize which allows unexporting that value and
MaxLogFileAge. There are actually two places locks were needed in test
code. The behavior of production code is unchanged.
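
The setter approach can be sketched roughly as follows. This is an illustration under assumptions, not the tsi1 code: the partition type, its fields, and needsCompaction are hypothetical, and only the idea of an unexported value guarded by a mutex and changed through a setter comes from the message above.

```go
package main

import (
	"fmt"
	"sync"
)

// partition is a hypothetical stand-in for the real type. The size limit is
// unexported, so tests can only change it through the locking setter.
type partition struct {
	mu             sync.RWMutex
	maxLogFileSize int64
}

// SetMaxLogFileSize updates the limit under the write lock and returns the
// previous value so a test can restore it afterwards.
func (p *partition) SetMaxLogFileSize(n int64) (old int64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	old, p.maxLogFileSize = p.maxLogFileSize, n
	return old
}

// needsCompaction reads the limit under the read lock, so it cannot race
// with a concurrent SetMaxLogFileSize call.
func (p *partition) needsCompaction(size int64) bool {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return size >= p.maxLogFileSize
}

func main() {
	p := &partition{maxLogFileSize: 1 << 20}

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // a concurrent reader, as background compaction might be
		defer wg.Done()
		_ = p.needsCompaction(512)
	}()

	old := p.SetMaxLogFileSize(1)
	wg.Wait()
	fmt.Println(old, p.needsCompaction(512))
}
```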

(cherry picked from commit f0235c4daf4b97769db932f7346c1d3aecf57f8f)

* feat: modify error handling to be more idiomatic

closes https://github.com/influxdata/influxdb/issues/24042

* fix: errors.Join() filters nil errors

closes https://github.com/influxdata/influxdb/issues/25341
---------

Co-authored-by: Phil Bracikowski <13472206+philjb@users.noreply.github.com>
(cherry picked from commit 5c9e45f033)
2024-09-17 13:09:14 -05:00
WeblWabl 5a599383f1
fix(tsi1/partition/test): fix data races in test code (#57) (#25336)
* fix(tsi1/partition/test): fix data races in test code

This PR is like #24613 but solves it with a setter
method for MaxLogFileSize which allows unexporting that value and
MaxLogFileAge. There are actually two places locks were needed in test
code. The behavior of production code is unchanged.

(cherry picked from commit f0235c4daf4b97769db932f7346c1d3aecf57f8f)
2024-09-16 16:51:00 -05:00
Phil Bracikowski 713efbc164
fix(tsi1/partition/test): fix data race in test code (#24613)
* fix(tsi1/partition/test): fix data race in test code

TestPartition_Compact_Write_Fail test was not locking the partition
before changing the value of MaxLogFileSize. This PR exports the mutex
of the partition to allow the test to access it and lock. Alternatives
require more changes such as a Setter method if we need to hide the
mutex.

* fixes #24042, for #24040

* chore: complete renaming of mutex in file and fix flux test

The flux test is another failing test because it was using a relative
time range.
2024-01-30 20:01:20 -08:00
Jack 976ef20a32
fix: panic index out of range for invalid series keys [Port to main-2.x] (#24597)
* fix: cherry-pick to main-2.x
2024-01-23 17:09:10 +00:00
Jeffrey Smith II c854e53c2b
fix: chmod'ing the manifest is unnecessary (#24165) 2023-04-03 13:09:01 -04:00
Eng Zer Jun 903d30d658
test: use `T.TempDir` to create temporary test directory (#23258)
* test: use `T.TempDir` to create temporary test directory

This commit replaces `os.MkdirTemp` with `t.TempDir` in tests. The
directory created by `t.TempDir` is automatically removed when the test
and all its subtests complete.

Prior to this commit, temporary directories created using `os.MkdirTemp`
had to be removed manually by calling `os.RemoveAll`, which was omitted
in some tests. The error handling boilerplate e.g.
	defer func() {
		if err := os.RemoveAll(dir); err != nil {
			t.Fatal(err)
		}
	}()
is also tedious, but `t.TempDir` handles this for us nicely.

Reference: https://pkg.go.dev/testing#T.TempDir
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestSendWrite on Windows

=== FAIL: replications/internal TestSendWrite (0.29s)
    logger.go:130: 2022-06-23T13:00:54.290Z	DEBUG	Created new durable queue for replication stream	{"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestSendWrite1627281409\\001\\replicationq\\0000000000000001"}
    logger.go:130: 2022-06-23T13:00:54.457Z	ERROR	Error in replication stream	{"replication_id": "0000000000000001", "error": "remote timeout", "retries": 1}
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestSendWrite1627281409\001\replicationq\0000000000000001\1: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestStore_BadShard on Windows

=== FAIL: tsdb TestStore_BadShard (0.09s)
    logger.go:130: 2022-06-23T12:18:21.827Z	INFO	Using data dir	{"service": "store", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestStore_BadShard1363295568\\001"}
    logger.go:130: 2022-06-23T12:18:21.827Z	INFO	Compaction settings	{"service": "store", "max_concurrent_compactions": 2, "throughput_bytes_per_second": 50331648, "throughput_bytes_per_second_burst": 50331648}
    logger.go:130: 2022-06-23T12:18:21.828Z	INFO	Open store (start)	{"service": "store", "op_name": "tsdb_open", "op_event": "start"}
    logger.go:130: 2022-06-23T12:18:21.828Z	INFO	Open store (end)	{"service": "store", "op_name": "tsdb_open", "op_event": "end", "op_elapsed": "77.3µs"}
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestStore_BadShard1363295568\002\data\db0\rp0\1\index\0\L0-00000001.tsl: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestPartition_PrependLogFile_Write_Fail and TestPartition_Compact_Write_Fail on Windows

=== FAIL: tsdb/index/tsi1 TestPartition_PrependLogFile_Write_Fail/write_MANIFEST (0.06s)
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestPartition_PrependLogFile_Write_Failwrite_MANIFEST656030081\002\0\L0-00000003.tsl: The process cannot access the file because it is being used by another process.
    --- FAIL: TestPartition_PrependLogFile_Write_Fail/write_MANIFEST (0.06s)

=== FAIL: tsdb/index/tsi1 TestPartition_Compact_Write_Fail/write_MANIFEST (0.08s)
    testing.go:1090: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestPartition_Compact_Write_Failwrite_MANIFEST3398667527\002\0\L0-00000003.tsl: The process cannot access the file because it is being used by another process.
    --- FAIL: TestPartition_Compact_Write_Fail/write_MANIFEST (0.08s)

We must close the open file descriptor otherwise the temporary file
cannot be cleaned up on Windows.
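
A rough sketch of the ordering this fix relies on, with a hypothetical helper and file name: the file handle is closed before the directory is removed, since an open handle makes the removal fail on Windows.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeThenCleanup is a hypothetical helper: it writes a file into a fresh
// temporary directory, closes the handle, and only then removes the
// directory.
func writeThenCleanup() (err error) {
	dir, err := os.MkdirTemp("", "close-before-remove")
	if err != nil {
		return err
	}
	// Runs last, after the file below has been closed.
	defer func() {
		if rmErr := os.RemoveAll(dir); err == nil {
			err = rmErr
		}
	}()

	f, err := os.Create(filepath.Join(dir, "example.tsl")) // assumed file name
	if err != nil {
		return err
	}
	if _, err = f.WriteString("data"); err != nil {
		f.Close()
		return err
	}
	// Close explicitly so no descriptor is still open during cleanup.
	return f.Close()
}

func main() {
	fmt.Println(writeThenCleanup())
}
```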

Fixes: 619eb1cae6 ("fix: restore in-memory Manifest on write error")
Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestReplicationStartMissingQueue on Windows

=== FAIL: TestReplicationStartMissingQueue (1.60s)
    logger.go:130: 2023-03-17T10:42:07.269Z	DEBUG	Created new durable queue for replication stream	{"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestReplicationStartMissingQueue76668607\\001\\replicationq\\0000000000000001"}
    logger.go:130: 2023-03-17T10:42:07.305Z	INFO	Opened replication stream	{"id": "0000000000000001", "path": "C:\\Users\\circleci\\AppData\\Local\\Temp\\TestReplicationStartMissingQueue76668607\\001\\replicationq\\0000000000000001"}
    testing.go:1206: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestReplicationStartMissingQueue76668607\001\replicationq\0000000000000001\1: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: update TestWAL_DiskSize

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

* test: fix failing TestWAL_DiskSize on Windows

=== FAIL: tsdb/engine/tsm1 TestWAL_DiskSize (2.65s)
    testing.go:1206: TempDir RemoveAll cleanup: remove C:\Users\circleci\AppData\Local\Temp\TestWAL_DiskSize2736073801\001\_00006.wal: The process cannot access the file because it is being used by another process.

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>

---------

Signed-off-by: Eng Zer Jun <engzerjun@gmail.com>
2023-03-21 16:22:11 -04:00
Jeffrey Smith II f74c69c5e4
chore: update to go 1.20 (#24088)
* build: upgrade to go 1.19

* chore: bump go.mod

* chore: `gofmt` changes for doc comments

https://tip.golang.org/doc/comment

* test: update tests for new sort order

* chore: make generate-sources

* chore: make generate-sources

* chore: go 1.20

* chore: handle rand.Seed deprecation

* chore: handle rand.Seed deprecation in tests

---------

Co-authored-by: DStrand1 <dstrandboge@influxdata.com>
2023-02-09 14:14:35 -05:00
Abirdcfly c433342830
chore: remove duplicate word in comments (#23685)
Signed-off-by: Abirdcfly <fp544037857@gmail.com>
2022-09-13 11:00:52 -05:00
davidby-influx 619eb1cae6
fix: restore in-memory Manifest on write error (#23552) (#23578)
Do not update the `FileSet` or `activeLogFile` field in the in-memory
Partition structure if the Manifest file is not correctly saved to
the disk.

closes https://github.com/influxdata/influxdb/issues/23553

(cherry picked from commit a8732dcf52)

closes https://github.com/influxdata/influxdb/issues/23554
2022-07-25 10:53:09 -07:00
davidby-influx f762346ecc
fix: add paths to tsi log and index file errors (#23557) (#23562)
Add paths to various TSI errors on opening and unmarshaling files
to help pinpoint the corrupt files.

Closes https://github.com/influxdata/influxdb/issues/23556

(cherry picked from commit 25cea95beb)

closes https://github.com/influxdata/influxdb/issues/23558
2022-07-19 15:45:42 -07:00
davidby-influx 00edb77253
fix: create TSI MANIFEST files atomically (#23539) (#23546)
When a MANIFEST file is created in TSI, it
should be written to a temp file, then
atomically renamed, to avoid overwriting
the existing file only to fail on the
later write.

closes https://github.com/influxdata/influxdb/issues/23536

(cherry picked from commit 061cf55f2a)

closes https://github.com/influxdata/influxdb/issues/23538
2022-07-14 09:13:11 -07:00
davidby-influx 4789d5402a
fix: improve error messages opening index partitions (#23532) (#23535)
Where possible, add the file path to any errors
on opening, reading, (un)marshaling, or validating
the various files comprising a partition

closes https://github.com/influxdata/influxdb/issues/23506

(cherry picked from commit a2dd708a26)

closes https://github.com/influxdata/influxdb/issues/23534
2022-07-13 13:20:47 -07:00
Dane Strandboge 8bd4fc502d
fix: lost TSI reference / close TagValueSeriesIDIterator in error case (#23461) 2022-06-16 13:35:45 -05:00
davidby-influx a9df3f8a7c
fix: fully clean up partially opened TSI (#23430) (#23454)
When one partition in a TSI fails to open, all previously opened
partitions should be cleaned up, and remaining partitions
should not be opened
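
As an illustration of the cleanup rule above, here is a sequential sketch with hypothetical types. The actual index may open partitions differently (for example concurrently); this only shows the invariant that a failure closes what was opened and stops before the rest.

```go
package main

import "fmt"

// partition is a hypothetical stand-in that records what happened to it.
type partition struct {
	id       int
	fail     bool // simulate an open failure
	attempts int
	opened   bool
}

func (p *partition) Open() error {
	p.attempts++
	if p.fail {
		return fmt.Errorf("partition %d: open failed", p.id)
	}
	p.opened = true
	return nil
}

func (p *partition) Close() error {
	p.opened = false
	return nil
}

// openAll opens partitions in order. On the first failure it closes every
// partition opened so far and returns without touching the remaining ones.
func openAll(parts []*partition) error {
	for i, p := range parts {
		if err := p.Open(); err != nil {
			for _, prev := range parts[:i] {
				prev.Close()
			}
			return err
		}
	}
	return nil
}

func main() {
	parts := []*partition{{id: 0}, {id: 1, fail: true}, {id: 2}}
	fmt.Println(openAll(parts))
	for _, p := range parts {
		fmt.Println(p.id, p.opened, p.attempts)
	}
}
```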

closes https://github.com/influxdata/influxdb/issues/23427

(cherry picked from commit d3db48e93d)

closes https://github.com/influxdata/influxdb/issues/23432
2022-06-14 11:49:16 -07:00
Dane Strandboge 82d1123e78
build: upgrade to Go 1.18.1 (#23252) 2022-04-13 15:24:27 -05:00
Sam Arnold 799d349813
fix(tsi): sync index file before close (#22927)
(cherry picked from commit 5fd1b29d74)

Co-authored-by: lifeibo <lifeibo382005@gmail.com>
2021-11-24 15:52:45 -05:00
CasMc 2bace7767d
fix: unhandled errors returned by Sketch.Merge (#22858) 2021-11-10 09:26:24 -05:00
Sam Arnold 2ecbb68fc3
test: fix DiskSizeBytes flakiness (#22639) 2021-10-08 09:46:58 -04:00
Sam Arnold 7dfd7de81f
feat: set X-Influxdb-Version and X-Influxdb-Build headers (#22535)
Closes #20224
Also a forward port of #22038 since I saw the same test failing on 2.x
2021-09-22 07:30:45 -04:00
Sam Arnold 5015297d40
fix: more expressive errors (#22448)
* fix: more expressive errors

Closes #22446

* fix: server only logging for untyped errors

* chore: fix formatting
2021-09-13 15:12:35 -04:00
Daniel Moran 12fff64760
fix: make TSI index compact old and too-large log files (#22334)
* TSI index should compact old or too-large log files
* Old tsl files should be compacted without new writes
* Add extra logging when disk size test fails


Co-authored-by: Sam Arnold <sarnold@influxdata.com>
2021-08-30 18:27:48 -04:00
Jonathan A. Sternberg f94783e016
build(flux): update flux to master and change renamed structs (#22281) 2021-08-26 10:07:02 -05:00
Daniel Moran 5aa91f0524
refactor: delete unused FileSet methods, clean up some errors (#22309) 2021-08-26 10:48:59 -04:00
Sam Arnold 962b9d7d02
fix: simplify file set, remove series file member (#21831) 2021-07-12 10:43:20 -04:00
Dane Strandboge ba31a0e260
feat: port `influx inspect dumptsi` subcommand (#21784) 2021-07-06 11:40:21 -05:00
Yun Zhao 4f535d281a
fix(tsi1): optimize the comparison of SeriesIDSet. (#21013) 2021-03-23 13:27:38 -04:00
Tristan Su 9c63033e8d
chore: clean up unused fields in FileSet (#20770)
Co-authored-by: Tristan Su <suqing.sq@alibaba-inc.com>
2021-03-05 09:55:03 -05:00
Daniel Moran 727a7b58c1
test: replace influxlogger with zaptest logger (#20589) 2021-02-11 10:12:39 -05:00
Sam Arnold 781fa0e846 chore: add goimports 2021-01-29 14:06:52 -05:00
Daniel Moran 9aefa6f868
fix(tsdb): never use an inmem index (#20313)
And fix the logging setup for the TSDB storage engine
2020-12-23 07:46:57 -08:00
Ben Johnson 65f42deec4
Merge pull request #20008 from influxdata/flakey-test-field-conflict-concurrent
fix: Add locking during tsi iterator creation.
2020-11-12 13:42:38 -07:00
Ben Johnson edb5e56881 fix: Add locking during tsi iterator creation.
This commit fixes a locking issue that caused the `TestShard_WritePoints_FieldConflictConcurrent`
test to fail.
2020-11-12 06:57:29 -07:00
Daniel Moran 15b9531273
fix: correct various typos (#19987)
Co-authored-by: kumakichi <xyesan@gmail.com>
2020-11-11 13:54:21 -05:00
sans 7dcaf5c639
fix: typos (#19734) 2020-10-13 09:50:32 -07:00
Ayan George ca2055c16c
refactor: Replace ctx.Done() with ctx.Err() (#19546)
* refactor: Replace ctx.Done() with ctx.Err()

Prior to this commit we checked for context cancellation with a select
block and context.Context.Done() without multiplexing over any other
channel like:

  select {
    case <-ctx.Done():
      // handle cancellation
    default:
      // fallthrough
  }

This commit replaces those types of blocks with a simple check of
ctx.Err(). This has the following benefits:

* Calling ctx.Err() is much faster than entering a select block.

* ctx.Done() allocates a channel when called for the first time.

* Testing the result of ctx.Err() is a reliable way of determining if
  a context.Context value has been canceled.

* fix: Fix data race in execDeleteTagValueEntry()
2020-09-16 12:20:09 -04:00
Brett Buddin b917d8d9b0
chore(influxdb): Placate the linter. 2020-08-27 15:46:32 -04:00
Stuart Carnie dee8977d2c
chore: move v2/v1/tsdb → v2/tsdb 2020-08-26 10:46:47 -07:00
Mark Rushakoff f2898d1992 Wipe out workspace in preparation for v2 merge
"Knock knock."

"Who's there?"

"InfluxDB Veet."

...
2019-01-11 10:38:50 -08:00
Edd Robinson 3a055a6107 Fix cardinality estimation error
This commit fixes an error in the TSI index with estimating the
cardinality of series recently added and then removed.
2019-01-10 17:46:30 +00:00
Jeff Wendling 0a2f6191a6 tsdb: clean up fields index for every kind of delete
Before this, if you deleted everything with `delete where true`
for example, then you would be left with all of your measurements
in the fields index. That would cause ghost fields to reappear
if someone reinserted to the measurement.

This fixes that by making the deepest delete code check whether the
measurement was removed from the index and, if so, clean it up out of
the fields index.

Additionally, it fixes bugs in that cleanup code where if you had
a measurement like "m1" and "m10", when iterating over the cache
or file store, "m1" would match "m10" due to it only checking the
prefix. This also has it check the character right after the
measurement to be either a comma because tags started, or the first
character of the field separator.
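
The prefix bug and its fix can be sketched as below. The helper name, the example keys, and the separator constant are assumptions for illustration; the message above only says that the character after the measurement must be a comma or the first character of the field separator.

```go
package main

import (
	"bytes"
	"fmt"
)

// fieldSeparator is an assumed value for illustration; the real constant
// lives in the storage engine.
const fieldSeparator = "#!~#"

// keyHasMeasurement reports whether key belongs to measurement name. A bare
// prefix check would let "m1" match "m10", so the byte right after the name
// must start the tag set (',') or the field separator.
func keyHasMeasurement(key, name []byte) bool {
	if !bytes.HasPrefix(key, name) {
		return false
	}
	if len(key) == len(name) {
		// Nothing follows the name; treated as no match in this sketch.
		return false
	}
	next := key[len(name)]
	return next == ',' || next == fieldSeparator[0]
}

func main() {
	name := []byte("m1")
	for _, k := range []string{"m1,host=a#!~#v", "m1#!~#v", "m10,host=a#!~#v"} {
		fmt.Println(k, keyHasMeasurement([]byte(k), name))
	}
}
```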
2018-11-27 16:12:06 -07:00
Edd Robinson cade59e253 Fix panic in IndexSet
This commit fixes a panic where a concurrent removal of a shard and meta
query could cause a `nil` index to be added to the `IndexSet`.
2018-10-26 18:23:54 +01:00
Ben Johnson bdcbad3fc9
Fix append of possible nil iterator.
This commit updates an iterator list to ignore `nil` iterators.
Adding a `nil` caused the `SeriesIterators.Close()` to panic.
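
A minimal sketch of the guard described above, using hypothetical types rather than the tsdb ones: nil iterators are skipped when building the list, so closing the list never calls a method on nil.

```go
package main

import "fmt"

// iterator and fakeIter are hypothetical stand-ins for the series iterator
// types.
type iterator interface {
	Close() error
}

type fakeIter struct{ closed bool }

func (f *fakeIter) Close() error {
	f.closed = true
	return nil
}

// appendIfNotNil adds itr to the list unless it is a nil interface value.
func appendIfNotNil(itrs []iterator, itr iterator) []iterator {
	if itr == nil {
		return itrs
	}
	return append(itrs, itr)
}

// closeAll closes every iterator and returns the first error seen. A nil
// entry in itrs would panic here, which is why nils are filtered on append.
func closeAll(itrs []iterator) error {
	var first error
	for _, itr := range itrs {
		if err := itr.Close(); err != nil && first == nil {
			first = err
		}
	}
	return first
}

func main() {
	var itrs []iterator
	itrs = appendIfNotNil(itrs, &fakeIter{})
	itrs = appendIfNotNil(itrs, nil) // skipped
	itrs = appendIfNotNil(itrs, &fakeIter{})
	fmt.Println(len(itrs), closeAll(itrs))
}
```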
2018-10-02 13:19:21 -06:00
Ben Johnson 0d777ad423
Fix tsi1 sketch locking. 2018-09-26 17:01:47 -06:00
Edd Robinson 812ac6da25 PR feedback 2018-09-18 15:58:38 -07:00
Edd Robinson a15bdeef92 Fix megacheck 2018-09-18 15:58:38 -07:00
Edd Robinson 76237d80f2 Address PR feedback 2018-09-18 15:58:38 -07:00
Ben Johnson e651153f1c Add TagValueSeriesIDCache.Delete(). 2018-09-18 15:58:38 -07:00
Ben Johnson fcbc03240a Inline mutex into TagValueSeriesIDCache. 2018-09-18 15:58:38 -07:00
Edd Robinson bdc293abdd Tidy up 2018-09-18 15:58:38 -07:00