Commit Graph

14909 Commits (db/wait-timeout-utility)

Author SHA1 Message Date
devanbenz 28ffa95b82 feat: Update to use debug logger 2025-05-23 12:42:32 -05:00
devanbenz 035b66c7c1 feat: use a debug log 2025-05-23 11:41:11 -05:00
devanbenz de93bab096 feat: Merge branch 'master-1.x' into db/wait-timeout-utility 2025-05-22 16:12:01 -05:00
davidby-influx eab8a8a6e8
fix: add locking in ClearBadShardList (#26423) 2025-05-19 09:14:07 -07:00
Geoffrey Wossum 66f4dbeaad
fix: limit number of concurrent optimized compactions (#26319)
Limit number of concurrent optimized compactions so that level compactions do not get starved. Starved level compactions result in a sudden increase in disk usage.

Add [data] max-concurrent-optimized-compactions for configuring maximum number of concurrent optimized compactions. Default value is 1.

Co-authored-by: davidby-influx <dbyrne@influxdata.com>
Co-authored-by: devanbenz <devandbenz@gmail.com>
Closes: #26315
2025-05-06 15:42:39 -05:00
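The limit described in this commit can be pictured as a fixed-size semaphore that optimized compactions must acquire before they start. The following is only a minimal sketch of that idea, not the code from the commit: the `optimizeLimiter` type and its methods are hypothetical names, and the capacity of 1 simply mirrors the default stated above.

```go
package main

import "fmt"

// optimizeLimiter is a hypothetical fixed-size semaphore built on a buffered
// channel. It is an illustration only, not the type InfluxDB uses.
type optimizeLimiter chan struct{}

// newOptimizeLimiter returns a limiter that allows at most max holders.
func newOptimizeLimiter(max int) optimizeLimiter {
	return make(optimizeLimiter, max)
}

// TryTake claims a slot without blocking and reports whether it succeeded.
func (l optimizeLimiter) TryTake() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot previously claimed with TryTake.
func (l optimizeLimiter) Release() { <-l }

func main() {
	lim := newOptimizeLimiter(1) // mirrors the default of 1 from the commit message

	fmt.Println(lim.TryTake()) // first optimized compaction gets the slot
	fmt.Println(lim.TryTake()) // a second one is refused and can be retried later
	lim.Release()
	fmt.Println(lim.TryTake()) // the slot is available again
}
```

Because `TryTake` never blocks, a planner built this way could skip an optimized compaction when the limit is reached and keep scheduling level compactions.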
davidby-influx 62e803e673
feat: improve dropped point logging (#26257)
Log the reason for a point being dropped,
the type of boundary violated, and the
time that was the boundary. Prints the
maximum and minimum points (by time)
that were dropped

closes https://github.com/influxdata/influxdb/issues/26252

* fix: better time formatting and additional testing

* fix: differentiate point time boundary violations

* chore: clean up switch statement

* fix: improve error messages
2025-04-18 15:18:19 -07:00
devanbenz eb3e879ce6 fix: Use 10*millisecond for ticker 2025-04-18 16:06:12 -05:00
devanbenz 586ab5ad66 feat: rename channel and pass as arg into go lambda 2025-04-18 15:47:03 -05:00
devanbenz 3f500b05ab feat: Pass in a channel 2025-04-18 15:13:51 -05:00
WeblWabl 6c687520dd
Merge branch 'master-1.x' into db/wait-timeout-utility 2025-04-18 14:50:33 -05:00
devanbenz e59a96b5d7 fix: checkfmt 2025-04-18 14:33:03 -05:00
devanbenz d7482b02b2 feat: pass in emitter that can be used for timeouts 2025-04-18 14:21:26 -05:00
Jamie Strandboge f61a082618
chore: update to go 1.23.8 (#26293) 2025-04-18 13:53:04 -05:00
Jamie Strandboge 58475a1b36
chore: use github.com/golang-jwt/jwt/v4 and update golang.org/x/net to v0.38.0 (1.x) (#26292)
* chore: update to supported github.com/golang-jwt/jwt/v4

* chore(dep): update golang.org/x/net to v0.38.0
2025-04-18 13:52:55 -05:00
devanbenz 961c19a86b feat: add wg_timeout package and wrap Close with timeout 2025-04-18 13:42:53 -05:00
devanbenz df86dfc8f1 feat: Add WaitWithTimeout to Partition
This PR makes it easier to debug potential hanging retention service
routines during DeleteShard.
2025-04-18 13:22:19 -05:00
davidby-influx 53329a3ad3
feat: use zap.AtomicLevel for dynamic logging levels (#26182)
Use the zap.AtomicLevel struct for log levels
which allows the level to be changed dynamically.
Enterprise will use this feature.
2025-04-17 10:07:33 -07:00
WeblWabl 8358f1beb9
fix: Modify package publishing to fix slack msg & publish_packages (#26279) 2025-04-16 15:55:57 -05:00
WeblWabl 96e44cac73
fix: PlanOptimize is running too frequently (#26211)
PlanOptimize is being checked far too frequently. This PR is the simplest change that can be made to ensure that PlanOptimize is not run too often. To reduce the frequency I've added a lastWrite parameter to PlanOptimize and added an additional test that mocks the edge case out in the wild that led to this PR.

Previously, the test cases for PlanOptimize did not check whether certain cases would be picked up by Plan. I've adjusted a few of the existing test cases after modifying Plan and PlanOptimize to have the same lastWrite time.
2025-04-08 12:22:29 -05:00
Geoffrey Wossum 61f21c5adb
chore(ci): push artifacts to public bucket (#26190)
* chore(ci): push artifacts to public bucket (#25435)

Clean cherry-pick of #25435 to master-1.x.

(cherry picked from commit ca80b243ed)

* chore: port #24491 to master-1.x

Port a portion of #24491 that was not included in previous cherry-picks to master-1.x
2025-03-25 12:31:31 -05:00
WeblWabl 77d6f20894
feat: Upgrade influxql to v1.4.1 (#26181) 2025-03-21 12:24:38 -05:00
WeblWabl 6cda9c903e
fix: Remove nil dereference (#26154) 2025-03-18 08:11:22 -05:00
davidby-influx 9e00f0de98
fix: do not panic on invalid multiple subqueries (#26143)
Multiple subqueries in a FROM clause caused a
panic instead of returning an error for the
syntactically invalid query. This corrects
that problem.

closes https://github.com/influxdata/influxdb/issues/26139
2025-03-14 13:38:57 -07:00
WeblWabl d8bcbd894c
feat: Add CompactPointsPerBlock config opt (#26100)
* feat: Add CompactPointsPerBlock config opt
This PR adds an additional influxd parameter,
CompactPointsPerBlock. It adjusts the DefaultAggressiveMaxPointsPerBlock
to 10,000. We had discovered that with the points per block set to
100,000, compacted TSM files were increasing in size. After modifying the
points per block to 10,000 we noticed that the file sizes decreased.
The value has been set as a parameter that can be adjusted by administrators;
this allows for some tuning if compression problems are encountered.
2025-03-05 14:59:06 -06:00
davidby-influx 2ab5aad52e
chore: add logging to Filestore.purger (#26089)
Also fixes error type checks in
TestCompactor_CompactFull_InProgress
2025-03-05 11:46:07 -08:00
davidby-influx 1efb8dad43
fix: remove temp files on error in Compactor.writeNewFiles (#26074)
Compactor.writeNewFiles should delete
temporary files created on iterations
before an error halts the compaction.

closes https://github.com/influxdata/influxdb/issues/26073
2025-02-27 08:17:48 -08:00
davidby-influx ba95c9b0f0
fix: ensure temp files removed on failed compaction (#26070)
Add more robust temporary file removal
on a failed compaction. Don't halt on
a failed removal, and don't assume a
failed compaction won't generate
temporary files.

closes https://github.com/influxdata/influxdb/issues/26068
2025-02-26 13:17:17 -08:00
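The "don't halt on a failed removal" behaviour can be sketched as a loop that attempts every removal and reports all failures together. This is an illustration under assumptions, not the code from the commit: `removeTmpFiles` is a hypothetical helper, and treating an already-missing file as success is a choice made for this sketch.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"path/filepath"
)

// removeTmpFiles is a hypothetical helper. It tries to remove every path,
// keeps going when one removal fails, and returns all failures joined into a
// single error (nil if there were none). Files that are already gone are not
// treated as failures, which is an assumption of this sketch.
func removeTmpFiles(paths []string) error {
	var errs []error
	for _, p := range paths {
		if err := os.Remove(p); err != nil && !errors.Is(err, os.ErrNotExist) {
			errs = append(errs, fmt.Errorf("removing %s: %w", p, err))
		}
	}
	return errors.Join(errs...)
}

func main() {
	dir, err := os.MkdirTemp("", "compaction-sketch")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	tmp := filepath.Join(dir, "000000001-000000002.tsm.tmp")
	if err := os.WriteFile(tmp, []byte("partial"), 0o600); err != nil {
		panic(err)
	}

	// Removing the non-empty directory fails, but the loop still goes on to
	// remove the temporary file that follows it in the list.
	err = removeTmpFiles([]string{dir, tmp})
	fmt.Println("error reported:", err != nil)

	_, statErr := os.Stat(tmp)
	fmt.Println("temp file removed:", errors.Is(statErr, os.ErrNotExist))
}
```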
davidby-influx 083b679b56
fix: ensure fields in memory match on disk
A field could be created in memory but not
saved to disk if a later field in that
point was invalid (type conflict, too big).
Ensure that if a field is created, it is
saved.
2025-02-24 13:53:40 -08:00
WeblWabl 03b6ed2bed
feat: Upgrade flux to v0.196.1 (#26041)
* feat: update flux to 0.196.1

* feat: Update proto files
This updates from protoc-gen-go v1.33.0 -> v1.34.1
and protoc from v5.26.1 -> v5.29.2
2025-02-20 13:46:06 -06:00
davidby-influx 5f576331d3
chore: refactor field creation for maintainability
Address review comments in the port work of the
field creation. Also fixes one bug in returning the wrong
error.
2025-02-18 14:00:11 -08:00
davidby-influx b617eb24a7
fix: switch MeasurementFields from atomic.Value to sync.Map (#26022)
Simplify and speed up synchronization for
MeasurementFields structures by switching
from a mutex and atomic.Value to a sync.Map
2025-02-13 16:53:25 -08:00
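To illustrate why a sync.Map can replace a mutex plus atomic.Value for this kind of registry, here is a small sketch that uses `LoadOrStore` to create a field or detect a type conflict in one atomic step. The `fieldSet` type, its method, and the string-typed field types are hypothetical and are not the MeasurementFields API.

```go
package main

import (
	"fmt"
	"sync"
)

// fieldSet is a hypothetical stand-in for a per-measurement field registry.
type fieldSet struct {
	fields sync.Map // field name (string) -> field type (string)
}

// createIfAbsent records the type for a field the first time it is seen and
// returns an error if a different type was already registered. LoadOrStore
// performs the lookup and the insert atomically, so no extra lock is needed
// for this single operation.
func (s *fieldSet) createIfAbsent(name, typ string) error {
	actual, loaded := s.fields.LoadOrStore(name, typ)
	if loaded && actual.(string) != typ {
		return fmt.Errorf("field %q is type %s, cannot write %s", name, actual, typ)
	}
	return nil
}

func main() {
	var s fieldSet
	fmt.Println(s.createIfAbsent("value", "float"))  // new field
	fmt.Println(s.createIfAbsent("value", "float"))  // same type again
	fmt.Println(s.createIfAbsent("value", "string")) // conflicting type
}
```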
davidby-influx 5a20a835a5
fix: lock MeasurementFields while validating (#25998)
There was a window where a race between writes with
differing types for the same field were being validated.
Lock the MeasurementFields struct during field
validation to avoid this.

closes https://github.com/influxdata/influxdb/issues/23756
2025-02-13 11:33:34 -08:00
WeblWabl 4ad5e2aba7
feat: Add error join for file writing in snapshots (#26004)
This PR adds an error join to help with handling multiple errors
from snapshot file writers.
2025-02-12 15:06:43 -06:00
WeblWabl 306a184a8d
feat: Add error joins/returns (#25996)
This PR adds error handling for a branch that previously did not report os file removal errors.
This is part of EAR #5819.
2025-02-11 12:15:25 -06:00
davidby-influx f54a34ae33
fix: actually call the deferred function (#25952) 2025-01-31 15:42:38 -08:00
WeblWabl edf5ff20f6
feat: updates go to 1.23.5 (#25926)
* feat: updates go to 1.23.5 and gosnowflake to 1.9.0
2025-01-28 13:31:31 -06:00
davidby-influx 800970490a
fix: move aside TSM file on errBlockRead (#25839)
The error type check for errBlockRead was incorrect,
and bad TSM files were not being moved aside when
that error was encountered. Use errors.Join,
errors.Is, and errors.As to correctly unwrap multiple
errors.

Closes https://github.com/influxdata/influxdb/issues/25838
2025-01-22 10:46:31 -08:00
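The kind of mistake this commit describes can be reproduced in a few lines: once errors are wrapped or combined with errors.Join, a direct type assertion on the outer error no longer matches, while errors.Is and errors.As walk the whole tree. The sketch below uses hypothetical names (`blockReadError`, `errRemoveFailed`, `compactionError`, `badTSMFile`); it does not reproduce the actual errBlockRead definition.

```go
package main

import (
	"errors"
	"fmt"
)

// errRemoveFailed is a hypothetical sentinel for a secondary failure.
var errRemoveFailed = errors.New("temporary file removal failed")

// blockReadError is a hypothetical typed error that records the bad file.
type blockReadError struct {
	file string
	err  error
}

func (e *blockReadError) Error() string { return e.file + ": " + e.err.Error() }
func (e *blockReadError) Unwrap() error { return e.err }

// compactionError builds a wrapped and joined error like one a failed
// compaction might return: a block read failure plus a cleanup failure.
func compactionError(file string) error {
	readErr := &blockReadError{file: file, err: errors.New("short read")}
	return errors.Join(fmt.Errorf("compaction failed: %w", readErr), errRemoveFailed)
}

// badTSMFile uses errors.As to find a blockReadError anywhere in the tree.
func badTSMFile(err error) (string, bool) {
	var bre *blockReadError
	if errors.As(err, &bre) {
		return bre.file, true
	}
	return "", false
}

func main() {
	err := compactionError("000000006-000000002.tsm")

	_, direct := err.(*blockReadError)
	fmt.Println("direct type assertion matches:", direct)

	file, ok := badTSMFile(err)
	fmt.Println("errors.As matches:", ok, file)
	fmt.Println("errors.Is finds sentinel:", errors.Is(err, errRemoveFailed))
}
```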
WeblWabl f04105bede
feat: Modify optimized compaction to cover edge cases (#25594)
* feat: Modify optimized compaction to cover edge cases
This PR changes the algorithm for compaction to account for the following
cases that were not previously accounted for:

- Many generations with a groupsize over 2 GB
- Single generation with many files and a groupsize under 2 GB
  (where groupsize is the total size of the TSM files in said shard directory)
- Shards that may have over a 2 GB group size but
  many fragmented files (under 2 GB and under aggressive
  point per block count)

closes https://github.com/influxdata/influxdb/issues/25666
2025-01-14 14:51:09 -06:00
WeblWabl e2d76edb40
feat: expose NewEncoder from logging package (#25710)
* feat: This PR exposes NewEncoder from our internal logger package
2025-01-14 12:15:17 -06:00
mwdmwd 7999835ac3
feat: influx_inspect export from a single tsm file (#25530)
* feat: This PR adds -tsm file flag to export

Adds the ability to use influx_inspect export to export data from a single tsm file, for example influx_inspect export -out - -tsmfile 000000006-000000002.tsm.bad -database thermo -retention autogen.
2025-01-13 13:48:35 -06:00
davidby-influx e974165d25
fix: do not leak file handles from Compactor.write (#25725)
There are a number of code paths in Compactor.write which
on error can lead to leaked file handles to temporary files.
This, in turn, prevents the removal of the temporary files until
InfluxDB is rebooted, releasing the file handles.

closes https://github.com/influxdata/influxdb/issues/25724
2025-01-03 14:43:41 -08:00
davidby-influx 694607a22c
fix: avoid panic if shard group has no shards (#25717) (#25719)
Avoid panicking when mapping points to a shard group
that has no shards. This does not address the root problem,
how the shard group ended up with no shards.

helps: https://github.com/influxdata/influxdb/issues/25715
(cherry picked from commit 5b364b51c8)

closes: https://github.com/influxdata/influxdb/issues/25718
2024-12-27 14:30:01 -08:00
cpinflux db523227a2
feat: Added fluxQueryRespBytes metric to 1.x /debug/vars (#25669)
This PR adds an additional statistic "fluxQueryRespBytes" to the output of /debug/vars, in turn making it available to Telegraf and other monitoring tools.

Closes https://github.com/influxdata/influxdb/issues/25671
2024-12-17 11:35:45 -08:00
WeblWabl 45a8227ad6
fix(influxd): update xxhash, avoid stringtoslicebyte in cache (#578) (#25622) (#25624)
* fix(influxd): update xxhash, avoid stringtoslicebyte in cache (#578)

* fix(influxd): update xxhash, avoid stringtoslicebyte in cache

This commit does 3 things:

* it updates xxhash from v1 to v2; v2 includes an assembly arm version of
  Sum64
* it changes the cache storer to write with a string key instead of a
  byte slice. The cache only reads the key which WriteMulti already has
as a string so we can avoid a host of allocations when converting back
and forth from immutable strings to mutable byte slices. This includes
updating the cache ring and ring partition to write with a string key
* it updates the xxhash for finding the cache ring partition to use
Sum64String which uses unsafe pointers to directly use a string as a
byte slice since it only reads the string. Note: this now uses an
assembly version because of the v2 xxhash update. Go 1.22 included new
compiler ability to recognize calls of Method([]byte(myString)) and not
make a copy but from looking at the call sites, I'm not sure the
compiler would recognize it as the conversion to a byte slice was
happening several calls earlier.

That's what this change set does. If we are uncomfortable with any of
these, we can do fewer of them (for example, not upgrade xxhash; and/or
not use the specialized Sum64String, etc).

For the performance issue in maz-rr, I see converting string keys to
byte slices taking between 3-5% of cpu usage on both the primary and
secondary. So while this pr doesn't address directly the increased cpu
usage on the secondary, it makes cpu usage less on both which still
feels like a win. I believe these changes are easier to review than
switching to a byte slice pool that is likely needed in other places as
the compiler provides nearly all of the correctness checks we need (we
are relying also on xxhash v2 being correct).

* helps #550

* chore: fix tests/lint

* chore: don't use assembly version; should inline

This 2 line change causes xxhash to use a purego Sum64 implementation
which allows the compiler to see that Sum64 only reads the byte slice
input, which then means it can skip the string to byte slice allocation
and since it can skip that, it should inline all the calls to
getPartitionStringKey and Sum64 avoiding 1 call to Sum64String which
isn't inlined.

* chore: update ci build file

the ci build doesn't use the make file!!!

* chore: revert "chore: update ci build file"

This reverts commit 94be66fde03e0bbe18004aab25c0e19051406de2.

* chore: revert "chore: don't use assembly version; should inline"

This reverts commit 67d8d06c02e17e91ba643a2991e30a49308a5283.

(cherry picked from commit 1d334c679ca025645ed93518b7832ae676499cd2)

* feat: need to update go sum

---------

Co-authored-by: Phil Bracikowski <13472206+philjb@users.noreply.github.com>
(cherry picked from commit 06ab224516)
2024-12-06 16:05:03 -06:00
davidby-influx eea87ba94c
fix: log rejected writes to subscriptions (#25589)
Log writes to subscriptions that are rejected because
the queue is full by bytes or by length metrics.
2024-11-25 16:11:04 -08:00
WeblWabl 75eb209f72
feat(influx_inspect): Adds an additional log to rebuild TSI (#25575)
Closes https://github.com/influxdata/feature-requests/issues/612
2024-11-21 15:28:27 -06:00
davidby-influx 19f65f50b7
fix: optimise write window check (#25558)
And expose types and methods for Enterprise use.
2024-11-15 14:41:30 -08:00
davidby-influx 07c261a21a
feat: allow the specification of a write window for retention policies (#25517)
Add FutureWriteLimit and PastWriteLimit to retention
policies. Points which are outside of
now() + FutureWriteLimit
or
now() - PastWriteLimit
will be rejected on write with a PartialWriteError.

closes https://github.com/influxdata/influxdb/issues/25424
2024-11-15 13:30:14 -08:00
davidby-influx d2f874b411
feat: improve logging for subscriptions
Print the subscription name, destination,
retention policy, and database on errors in subscription writes

closes https://github.com/influxdata/influxdb/issues/25518
2024-11-14 15:47:07 -08:00
Geoffrey Wossum 8497fbf0af
chore: remove unnecessary fmt.Sprintf calls (#25536)
Remove unnecessary fmt.Sprintf calls for static code checks in main-2.x.
2024-11-12 11:06:39 -06:00