Commit Graph

49662 Commits (pbarnett/update-script-to-dockerhub)

Author SHA1 Message Date
Carol (Nichols || Goulding) d56217a8f6
refactor: Define the default limits on the types, not the catalog 2023-09-15 16:23:30 -04:00
Carol (Nichols || Goulding) 94b86e317b
refactor: Extract service protection types to their own module 2023-09-15 13:41:20 -04:00
Carol (Nichols || Goulding) c32a04388c
feat: Wrap max tables and max columns per table values in newtypes 2023-09-15 13:09:36 -04:00
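The newtype idea in this commit can be sketched as follows. This is a minimal illustration, not the project's code: the names `MaxTables`, `MaxColumnsPerTable` and `can_create_table` are assumptions made for the example. The point is that two limits with the same underlying integer type can no longer be swapped by accident.

```rust
/// Hypothetical newtype for the per-namespace table limit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MaxTables(usize);

/// Hypothetical newtype for the per-table column limit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct MaxColumnsPerTable(usize);

impl MaxTables {
    fn new(v: usize) -> Self {
        Self(v)
    }

    fn get(self) -> usize {
        self.0
    }
}

impl MaxColumnsPerTable {
    fn new(v: usize) -> Self {
        Self(v)
    }

    fn get(self) -> usize {
        self.0
    }
}

/// Illustrative check: may one more table be created under this limit?
/// Passing a `MaxColumnsPerTable` here is a compile error, which is the
/// benefit of wrapping the raw integers.
fn can_create_table(existing_tables: usize, limit: MaxTables) -> bool {
    existing_tables < limit.get()
}
```

With plain `usize` arguments, a call such as `can_create_table(n, max_columns)` would compile and silently apply the wrong limit; with the wrappers the compiler rejects it.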
Carol (Nichols || Goulding) ab7282795a
refactor: Extract functions for creating repeated NamespaceSchema values 2023-09-15 13:09:36 -04:00
Carol (Nichols || Goulding) 3614ea4e70
refactor: Share a test fn and constants between router tests 2023-09-15 13:09:36 -04:00
Carol (Nichols || Goulding) b63290daf4
fix: Remove a redundant function
This function is largely duplicated with namespace_to_proto, and the
other responses in this file don't make helper functions for
constructing the response type, so make Create more similar to the other
actions.
2023-09-15 13:09:36 -04:00
Carol (Nichols || Goulding) 307a450f51
refactor: Extract a function for comparing NamespaceSchemas
Namely, those attributes that are simple values, as opposed to the tables,
which are more complicated and are the aspect under test.
2023-09-15 13:09:35 -04:00
Carol (Nichols || Goulding) c00cd95f9d
refactor: Extract a function for constructing NamespaceCreated messages in the router
So that the same conversion can happen in the tests and one assert_eq!
can check everything rather than repeating lots of assertions for every
test for every field.
2023-09-15 13:09:35 -04:00
Carol (Nichols || Goulding) f6c7fb4403
refactor: Reuse protobuf conversion functions for tests
So that we can use PartialEq rather than comparing each field
individually.

Also take a reference to a namespace; this function doesn't need
ownership.
2023-09-15 13:09:35 -04:00
Carol (Nichols || Goulding) 0994264152
refactor: Use a test constant rather than redefining 2023-09-15 13:09:35 -04:00
Carol (Nichols || Goulding) 553e34a7f3
refactor: Share some test constants in a common parent module 2023-09-15 13:09:35 -04:00
Fraser Savage 827f4beb05
feat(ingester): Expose metric for finished replay of whole & truncated segment files 2023-09-15 16:47:41 +01:00
Fraser Savage bc3b421618
fix(ingester): Include truncated WAL file for max sequence number calculation 2023-09-15 16:24:43 +01:00
Martin Hilton 421b78e48b
feat(iox_query): support timezone in gap-filling (#8745)
When gap-filling, make the output time array have the same timezone
as the input time array.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-15 14:55:16 +00:00
Fraser Savage 32853b77ed
refactor(ingester): Only discard truncated write during replay for final segment
This moves the error handling up to the file-level replay loop, being
stricter about which files are considered "replayed" when they are
truncated. Any files other than the most recent segment file which
encounter an unexpected error are not considered to be safe to replay and
discard.
2023-09-15 15:45:58 +01:00
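The rule described in this commit (a truncated write is tolerated only in the most recent segment file) can be sketched as below. The types `SegmentOutcome` and `ReplayError` and the function `replay_segments` are hypothetical names for illustration, and the input is assumed to be ordered from oldest to newest segment.

```rust
/// Hypothetical result of reading one WAL segment file to its end.
#[derive(Debug, PartialEq)]
enum SegmentOutcome {
    /// Every entry was read successfully.
    Complete,
    /// The file ended part-way through an entry.
    Truncated,
}

#[derive(Debug, PartialEq)]
enum ReplayError {
    /// A segment other than the newest one was truncated.
    TruncatedNonFinalSegment { index: usize },
}

/// Returns how many segments count as replayed. `outcomes` is assumed to
/// be ordered oldest to newest.
fn replay_segments(outcomes: &[SegmentOutcome]) -> Result<usize, ReplayError> {
    // Index of the most recent segment, if there is one.
    let last = outcomes.len().checked_sub(1);
    let mut replayed = 0;
    for (index, outcome) in outcomes.iter().enumerate() {
        match outcome {
            SegmentOutcome::Complete => replayed += 1,
            // A partial trailing write in the newest file is discarded and
            // the file still counts as replayed.
            SegmentOutcome::Truncated if Some(index) == last => replayed += 1,
            // Truncation anywhere else is treated as an error.
            SegmentOutcome::Truncated => {
                return Err(ReplayError::TruncatedNonFinalSegment { index })
            }
        }
    }
    Ok(replayed)
}
```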
Carol (Nichols || Goulding) fb351ac3e1
test: Add a test encoding expected behavior of validate_or_insert_schema (#8738)
I was confused about whether validate_or_insert_schema should return all
columns a table has in the catalog if another process has added some.

Dom explained that no, this is by design-- the validate_or_insert_schema
function shouldn't be fetching any extra columns from the catalog, only
inserting missing columns from the diff set being processed during a
write.

The NamespaceCache/gossip system takes care of eventually converging
schemas at a higher level.

To avoid anyone having to go through the understanding path I just did,
encode this expected behavior in a test for future reference.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-15 14:36:19 +00:00
Martin Hilton ada65d7389
feat(query_functions): support timezone in selector_* functions (#8742)
Update the selector functions to output the selected time in the
same timezone as the input time array. This will not have any effect
on the rest of the system yet as timezones are not used anywhere.
This change is being done in preparation for making use of timezones.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-15 11:54:28 +00:00
dependabot[bot] 8fddbc395b
chore(deps): Bump mockito from 1.1.0 to 1.1.1 (#8741)
Bumps [mockito](https://github.com/lipanski/mockito) from 1.1.0 to 1.1.1.
- [Release notes](https://github.com/lipanski/mockito/releases)
- [Commits](https://github.com/lipanski/mockito/compare/1.1.0...1.1.1)

---
updated-dependencies:
- dependency-name: mockito
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-15 08:51:03 +00:00
kodiakhq[bot] 477e6df639
Merge pull request #8735 from influxdata/dom/persist-threads
perf: use half of logical cores for persist exec
2023-09-14 16:05:27 +00:00
Dom Dwyer 1bb4c08067
perf: use half of logical cores for persist exec
Changes the default ingester configuration to assign half the logical
cores to datafusion for persist execution. Prior to this commit,
datafusion always used 4 threads by default.

In situations where the ingesters are configured with 4 logical cores or
fewer, the periodic persist can start enough persist jobs to keep the 4
threads assigned to datafusion busy. Because there are enough threads to
saturate all CPU cores, these CPU-heavy persist threads can impact write
latency by stealing CPU time from the tokio runtime threads.

This change assigns exactly half the threads to DF by default, ensuring
there's always N/2 cores to service I/O heavy API requests.
2023-09-14 17:54:33 +02:00
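The default described above can be sketched with the standard library alone. The function names are hypothetical, and the floor of one thread on a single-core host is an assumption of this sketch; the commit message only states that half the logical cores are assigned.

```rust
use std::num::NonZeroUsize;
use std::thread;

/// Hypothetical helper: persist thread count for a given number of
/// logical cores. Uses half the cores, rounded down, with an assumed
/// minimum of one thread.
fn persist_threads_for(logical_cores: usize) -> NonZeroUsize {
    NonZeroUsize::new((logical_cores / 2).max(1)).expect("value is at least 1")
}

/// Hypothetical default derived from the host. Falls back to one core if
/// the parallelism cannot be determined.
fn default_persist_threads() -> NonZeroUsize {
    let cores = thread::available_parallelism()
        .map(|n| n.get())
        .unwrap_or(1);
    persist_threads_for(cores)
}
```

On a 4-core host this yields 2 threads, leaving the other half of the cores for the I/O-heavy request path that the commit message is concerned with.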
Dom Dwyer 7ad26e6d0e
feat: only limit non-empty partitions
This changes the per-namespace buffered partition limiter to only
consider non-empty partitions when enforcing the partition limit.

Empty partitions cost a small amount of RAM, but are not added to
the persist queue - only non-empty partitions will need persisting, so
the limiter only needs to limit non-empty partitions.

This commit also significantly improves the consistency properties of
the limiter - the limit no longer suffers from a small window of
"overrun" due to non-atomic updates w.r.t partition creation - the limit
is now exact.

As an optimisation, partitions are not created at all if the limit has
been reached, preventing an accumulation of empty partitions whilst the
limit is being enforced.
2023-09-14 16:50:48 +02:00
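One way to get an exact limit without an overrun window is to make the check and the increment a single atomic step. The sketch below illustrates that technique only; `PartitionLimiter` and its methods are hypothetical names and this is not claimed to be how the ingester implements it.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical counter of non-empty partitions in one namespace.
struct PartitionLimiter {
    limit: usize,
    non_empty: AtomicUsize,
}

impl PartitionLimiter {
    fn new(limit: usize) -> Self {
        Self {
            limit,
            non_empty: AtomicUsize::new(0),
        }
    }

    /// Reserve a slot for a partition that is about to receive data.
    /// The comparison and the increment happen in one compare-and-swap
    /// loop, so concurrent callers can never push the count past `limit`.
    fn try_acquire(&self) -> bool {
        self.non_empty
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                (n < self.limit).then_some(n + 1)
            })
            .is_ok()
    }

    /// Give a slot back, for example once a partition is empty again.
    fn release(&self) {
        self.non_empty.fetch_sub(1, Ordering::AcqRel);
    }

    fn count(&self) -> usize {
        self.non_empty.load(Ordering::Acquire)
    }
}
```

A separate "load, compare, then increment" sequence would let two threads both observe `limit - 1` and both proceed, which is the kind of small overrun the commit message says no longer occurs.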
Dom Dwyer 3978b07a43
feat(ingester): partition is_empty()
Adds an is_empty() method to the PartitionData, returning true iff a
subsequent query of the partition would return no rows.
2023-09-14 16:50:23 +02:00
Dom 11604f1f70
Merge pull request #8733 from influxdata/dom/partition-use-builder
test: accept PartitionDataBuilder in provider
2023-09-14 15:34:42 +01:00
Dom b0b93a1225
Merge branch 'main' into dom/partition-use-builder 2023-09-14 15:28:35 +01:00
Nga Tran ac426fe5e1
feat: ingester reads `sort_key_ids` instead of `sort_key` (#8588)
* feat: have ingester's SortKeyState include sort_key_ids

* fix: test failures

* chore: address review comments

* feat: first step to compare sort_key_ids

* feat: compare sort_key_ids in cas_sort_key

* fix: comment typos

* feat: ingester reads sort_key_ids instead of sort_key

* refactor: use direct assert instead of going through a function

* chore: fix typo

* test: add tests and comments

* chore: fix typos

* test: add more tests to handle empty sort key

* chore: address review comments

* fix: type

* chore: address review comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-14 14:06:21 +00:00
Fraser Savage f5bfe72c7b
refactor(wal): Remove `IncompleteEntry` reader error 2023-09-14 14:59:57 +01:00
Dom Dwyer 429d90cde9
test: accept PartitionDataBuilder in provider
Use the PartitionDataBuilder in the MockPartitionProvider, allowing the
test caller to specify any necessary parameters, but still allow the
mock provider to inject the arguments it was called with.
2023-09-14 15:41:14 +02:00
Fraser Savage a160b97977
chore(ingester): Address review comments
Lower log from error to warn, clarify. Undo rename of `replay_file`.

Co-authored-by: Dom <dom@itsallbroken.com>
2023-09-14 12:02:36 +01:00
Dom 75ceafff54
Merge pull request #8723 from influxdata/dom/ingester-partition-bound
feat(ingester): buffered partition limit
2023-09-14 11:25:48 +01:00
Dom 0ea2dfbe01
Merge branch 'main' into dom/ingester-partition-bound 2023-09-14 11:19:44 +01:00
kodiakhq[bot] a07596f05f
Merge pull request #8672 from influxdata/savage/respect-ingest-system-state-during-wal-replay
feat(ingester): Allow read of `IngestState` with exceptions
2023-09-14 09:55:59 +00:00
kodiakhq[bot] dd0ee28e02
Merge branch 'main' into savage/respect-ingest-system-state-during-wal-replay 2023-09-14 09:50:03 +00:00
Fraser Savage 04c5e89c96
test(ingester): Add cover of multi-exceptions back to `IngestState`
This adds a level of assurance that multiple error states set are
ignored when they are all present in the exceptions, while disjoint
error states and exceptions return an error. Arbitrary sets could be
covered, but would likely require taking a non-const array for
`read_with_exceptions`.
2023-09-14 10:42:36 +01:00
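The behaviour under test can be illustrated with a bitmask sketch. Everything here is an assumption made for the example: the variant names, the bit layout, and the exact signature of `read_with_exceptions` are not taken from the repository. Error states are bits in an atomic word, and the exceptions passed as a const-sized array are masked out before the state is checked.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical error states, one bit each.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IngestStateError {
    PersistSaturated = 1 << 0,
    GracefulStop = 1 << 1,
    DiskFull = 1 << 2,
}

/// All variants, in the order they are reported.
const ALL_ERRORS: [IngestStateError; 3] = [
    IngestStateError::PersistSaturated,
    IngestStateError::GracefulStop,
    IngestStateError::DiskFull,
];

#[derive(Default)]
struct IngestState {
    state: AtomicUsize,
}

impl IngestState {
    /// Mark an error state as active.
    fn set(&self, error: IngestStateError) {
        self.state.fetch_or(error as usize, Ordering::Relaxed);
    }

    /// Return the first active error that is not listed in `exceptions`,
    /// or `Ok(())` if every active error is excepted.
    fn read_with_exceptions<const N: usize>(
        &self,
        exceptions: [IngestStateError; N],
    ) -> Result<(), IngestStateError> {
        // Fold the exceptions into one mask of bits to ignore.
        let mask = exceptions.iter().fold(0usize, |m, e| m | (*e as usize));
        let current = self.state.load(Ordering::Relaxed) & !mask;
        match ALL_ERRORS
            .iter()
            .copied()
            .find(|e| (current & (*e as usize)) != 0)
        {
            Some(error) => Err(error),
            None => Ok(()),
        }
    }
}
```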
dependabot[bot] 0d51a1ca6f
chore(deps): Bump serde_json from 1.0.106 to 1.0.107 (#8731)
Bumps [serde_json](https://github.com/serde-rs/json) from 1.0.106 to 1.0.107.
- [Release notes](https://github.com/serde-rs/json/releases)
- [Commits](https://github.com/serde-rs/json/compare/v1.0.106...v1.0.107)

---
updated-dependencies:
- dependency-name: serde_json
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-14 09:28:30 +00:00
dependabot[bot] 71315d8ab6
chore(deps): Bump toml from 0.7.8 to 0.8.0 (#8730)
Bumps [toml](https://github.com/toml-rs/toml) from 0.7.8 to 0.8.0.
- [Commits](https://github.com/toml-rs/toml/compare/toml-v0.7.8...toml-v0.8.0)

---
updated-dependencies:
- dependency-name: toml
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-14 09:22:18 +00:00
Marco Neumann 4ad21e1eca
feat: decode time portion of the partition key (#8725)
* refactor: make partition key parsing more flexible

* feat: decode time portion of the partition key

Helpful for #8705 because we can prune partitions earlier during the
query planning w/o having to consider their parquet files at all.
2023-09-14 09:15:11 +00:00
Marco Neumann b5c0c9c167
feat: allow fallback to generic TS column range for chunk stats (#8724)
This will be useful for #8705.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-14 08:37:50 +00:00
dependabot[bot] 82c45a798c
chore(deps): Bump libc from 0.2.147 to 0.2.148 (#8729)
Bumps [libc](https://github.com/rust-lang/libc) from 0.2.147 to 0.2.148.
- [Release notes](https://github.com/rust-lang/libc/releases)
- [Commits](https://github.com/rust-lang/libc/compare/0.2.147...0.2.148)

---
updated-dependencies:
- dependency-name: libc
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-14 08:30:21 +00:00
Marco Neumann c49c6159ef
refactor: change "normalization" in projected schema cache (#8720)
* refactor: "projected schema" cache inputs must be normalized

Normalizing under the hood and returning normalized schemas w/o the user
knowing about it is a good source for subtle bugs.

* refactor: do not normalize projected schema by name

Normalizing makes it harder to predict the output and potentially
requires additional string lookups just to work with the schema.

* fix: typos

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Martin Hilton <mhilton@influxdata.com>

---------

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: Martin Hilton <mhilton@influxdata.com>
2023-09-13 15:25:38 +00:00
Dom Dwyer 9e3f7611bb
feat(ingester): buffered partition limit
This commit adds an optional (disabled by default) limit on the number
of partitions that may be buffered for a namespace at any one time.

The exact value is configurable by setting
INFLUXDB_IOX_MAX_PARTITIONS_PER_NAMESPACE to a non-zero value, and is
disabled unless specified.
2023-09-13 17:14:43 +02:00
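Reading such an optional limit can be sketched as below. Only the environment variable name comes from the commit message; the function names are hypothetical, and treating an explicit `0` the same as "unset" is an assumption of this sketch rather than documented behaviour.

```rust
use std::num::{NonZeroUsize, ParseIntError};

/// Hypothetical parser: `None` or "0" means the limit is disabled.
fn parse_partition_limit(raw: Option<&str>) -> Result<Option<NonZeroUsize>, ParseIntError> {
    match raw {
        // Variable not set: limit disabled.
        None => Ok(None),
        // `NonZeroUsize::new` returns `None` for 0, which this sketch
        // treats as "disabled" (an assumption).
        Some(s) => Ok(NonZeroUsize::new(s.trim().parse::<usize>()?)),
    }
}

/// Hypothetical wrapper that reads the variable named in the commit.
fn partition_limit_from_env() -> Result<Option<NonZeroUsize>, ParseIntError> {
    let raw = std::env::var("INFLUXDB_IOX_MAX_PARTITIONS_PER_NAMESPACE").ok();
    parse_partition_limit(raw.as_deref())
}
```

Representing the setting as `Option<NonZeroUsize>` keeps "disabled" and "limit of N" distinct in the type, so callers cannot mistake a zero for a real limit.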
Dom Dwyer db5ad12b9a
docs: remove misleading documentation
In an ArcMap, an init() function is called exactly once. This sentence
was supposed to suggest threads race to call init, but instead it sounds
like they race to initialise a V (via init()) and put it in the map
before the other thread, which is incorrect.
2023-09-13 17:14:42 +02:00
Dom e3e66145da
Merge pull request #8722 from influxdata/dom/table-metadata
refactor: move table metadata alongside resolver
2023-09-13 16:14:33 +01:00
Fraser Savage eaa63c6392
test(ingester): Use simpler set logic for `read_with_exception` test 2023-09-13 15:53:16 +01:00
Dom Dwyer 9379a227c4
refactor: move table metadata alongside resolver
We already have a metadata resolver, so let's stick the metadata types
next to it.
2023-09-13 15:54:02 +02:00
Fraser Savage 7174262d4b
chore(ingester): DRY `IngestState` mask
Co-authored-by: Dom <dom@itsallbroken.com>
2023-09-13 11:46:40 +01:00
dependabot[bot] 2477bdbbee
chore(deps): Bump clap from 4.4.2 to 4.4.3 (#8719)
Bumps [clap](https://github.com/clap-rs/clap) from 4.4.2 to 4.4.3.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v4.4.2...v4.4.3)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-13 09:17:52 +00:00
Nga Tran a86c77213e
feat: compare sort_key_ids in cas_sort_key (#8579)
* feat: have ingester's SortKeyState include sort_key_ids

* fix: test failures

* chore: address review comments

* feat: first step to compare sort_key_ids

* feat: compare sort_key_ids in cas_sort_key

* fix: comment typos

* refactor: use direct assert instead of going through a function

* chore: fix typo

* test: add tests and comments

* chore: fix typos

* test: add more tests to handle empty sort key

* chore: address review comments

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-12 15:10:57 +00:00
Marco Neumann c466f469ea
refactor: optimize `PartitionHashId` hashing (#8712)
There is no need to hash a hash.

Found while investigating https://github.com/influxdata/EAR/issues/4505
and the hashing code turned up in the profile. In general, hashing IDs
should be pretty cheap.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-12 08:35:13 +00:00
Marco Neumann d31f54754f
refactor: replace tokio `oneshot` w/ futures `oneshot` (#8713)
Tokio oneshots have A LOT of overhead:

61042b4d90/tokio/src/sync/oneshot.rs (L1091-L1097)

For a particular case that I've debugged (https://github.com/influxdata/EAR/issues/4505),
that change alone decreases the "cold" query time from 16s to 11s.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-09-12 08:29:18 +00:00
Joe-Blount ce34d4ffa3
fix: handle oversized files in compactor (#8700) 2023-09-12 00:18:56 +00:00