Commit Graph

513 Commits (20f1ae1c8fb5a81dc806b2f327cc14a5394c992c)

Author SHA1 Message Date
Carol (Nichols || Goulding) c27d3a22d2
fix: Remove namespace argument from test helper function 2022-11-14 16:46:04 -05:00
Carol (Nichols || Goulding) 3943faf998
fix: Remove namespace from DmlWrite and DmlDelete constructors 2022-11-14 16:46:04 -05:00
Carol (Nichols || Goulding) f78195f7c7
fix: Remove namespace name field from DmlWrite and DmlDelete
But leave the argument in their constructors for now.

Not all numbers in tests can be 42, Dom.
2022-11-14 16:46:04 -05:00
Carol (Nichols || Goulding) c203e8295f
test: Keep track of namespaces by ID in ingester TestContext 2022-11-14 16:46:04 -05:00
kodiakhq[bot] 6c1e9f04ef
Merge branch 'main' into dom/deferred-table-name 2022-11-14 18:22:46 +00:00
Carol (Nichols || Goulding) fd898cea2a
docs: Correct grammar and update outdated comment 2022-11-14 13:21:55 -05:00
dependabot[bot] a969754819
chore(deps): Bump chrono from 0.4.22 to 0.4.23 (#6129)
* chore(deps): Bump chrono from 0.4.22 to 0.4.23

Bumps [chrono](https://github.com/chronotope/chrono) from 0.4.22 to 0.4.23.
- [Release notes](https://github.com/chronotope/chrono/releases)
- [Changelog](https://github.com/chronotope/chrono/blob/main/CHANGELOG.md)
- [Commits](https://github.com/chronotope/chrono/compare/v0.4.22...v0.4.23)

---
updated-dependencies:
- dependency-name: chrono
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* refactor: chrono future compat

Integer->timstamp conversions should not silently panic.

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-14 13:34:09 +00:00
Dom Dwyer 413b7c8f4a
refactor: use table name from catalog
Changes the TableData within the ingester to utilise a TableNameResolver
to fetch the TableName via the catalog on demand / in the background,
instead of using the table name sent over the write.

This change causes the ingester to perform a catalog query in the
background (or on demand) to resolve the table name. This is a
pre-requisite for removing the table name from the write wire format.
2022-11-14 11:32:22 +01:00
Dom Dwyer 0df6c7877c
refactor: indirect DeferredLoad<TableName> init
Like the NamespaceNameProvider, this commit adds a TableNameProvider to
provide decoupled initialisation of a DeferredLoad<TableName> instead of
hard-coding in a catalog instance / query code, and plumbs it into
position to be used when initialising a TableName.
2022-11-14 11:32:21 +01:00
Dom Dwyer 8dae6d3994
perf(ingester): address tables by ID only
Changes the buffer tree to address TableData by their ID only (removing
support for addressing tables by their string names). This removes the
double reference book keeping / twin indexes and associated overhead.

As part of this change, the TableName is now wrapped in a DeferredLoad
in preparation for removal of the names in the DmlOperation wire format.

This commit also switches the map of TableData within the NamespaceData
(the parent node) to use the ArcMap for faster lookups and DRY
exactly-once initialisation.
2022-11-14 11:27:19 +01:00
Dom Dwyer d8fc9ff258
test: fix testing deadlocks
The MemCatalog suffers from deadlocks when attempting to obtain a second
ref to RepoCollection:

    https://github.com/influxdata/influxdb_iox/issues/3859
2022-11-14 10:50:10 +01:00
Dom Dwyer 9e97866b48
refactor: internalise PartitionProvider
Removes the need to leak the PartitionProvider outside of the ingester
crate.

This will allow the PartitionProvider to utilise a
DeferredLoad<TableName> without having to make the DeferredLoad and
TableName pub.
2022-11-14 10:50:05 +01:00
Marco Neumann 746032af0f fix: compatibility after hashbrown upgrade
- Some methods need explicit types
- `hashbrown::HashMap` now takes 32 bytes, not 64
2022-11-11 13:25:39 -05:00
Jake Goulding cc17e5a54b refactor: use a workspace dependency for hashbrown 2022-11-11 13:25:39 -05:00
dependabot[bot] 5024523f00 chore(deps): Bump hashbrown from 0.12.3 to 0.13.1
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.12.3 to 0.13.1.
- [Release notes](https://github.com/rust-lang/hashbrown/releases)
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.12.3...v0.13.1)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-11 13:24:56 -05:00
Dom 2e7a1391f8
Merge branch 'main' into dom/deferred-namespace-name 2022-11-11 17:39:10 +00:00
Dom Dwyer 0f6470c390
refactor: use correct description for retries
Use the correct description for namespace query retries.
2022-11-11 18:38:30 +01:00
Dom Dwyer 1e5d3f31af
docs: clearer code comments / docs
Remove redundant comments & clarify returns.
2022-11-11 18:38:29 +01:00
Dom 18c86ca44f
refactor: named unused return
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-11-11 17:32:42 +00:00
Nga Tran 9c4266c503
refactor: first step to remove unused retention_duration (#6113)
* refactor: first step to remove unused retention_duration

* refactor: remove retenion_duration from update catalog

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-11 15:21:06 +00:00
Dom Dwyer 2521aedb6a
perf(ingester): address namespaces by ID only
Removes reliance on string name identifiers for namespaces in the
ingester buffer tree, reducing the memory usage of the namespace index
and associated overhead.

The namespace name is required (though unused by IOx) in the IoxMetadata
embedded within a parquet file, and therefore the name is necessary at
persist time. For this reason, a DeferredLoad is used to query the
catalog (by ID) for the name, at some uniformly random duration of time
after initialisation of the NamespaceData, up to a maximum of 1 minute
later. This ensures the query remains off the hot ingest path, and the
jitter prevents spikes in catalog load during replay/ingester startup.

As an additional / easy optimisation, the persist code causes a
pre-fetch of the name in the background while compacting, hiding the
query latency should it not have already been resolved.

In order to keep the the ingester buffer & catalog decoupled / easily
testable, this commit uses a provider/factory trait
NamespaceNameProvider and corresponding implementation
(NamespaceNameResolver) in a similar fashion to the PartitionResolver,
allowing easy mocking for tests, and composition for prod code, allowing
future optimisations such as pre-fetching / caching the "hot" namespace
names at startup.

Internal string identifier removal is a pre-requisite for removing
string identifiers from the write wire format (#4880).
2022-11-11 14:37:21 +01:00
Dom Dwyer 611acc1ad2
refactor: plumb in DeferredLoad<NamespaceName>
Changes the ingester's buffer tree to use the deferred loading primitive
to resolve the namespace name for NamespaceData.

Note that the loader is initialised with the name in the first place -
this commit just introduces the use of the deferred loading primitive,
and doesn't change where the name is sourced from.
2022-11-11 14:37:20 +01:00
Dom Dwyer 3adc66a4b2
feat: Display impl for DeferredLoad
This lets deferred loads be used in place of a non-differed T, such as
log context fields.

If the value has not been resolved, the display impl returns
"<unresolved>".
2022-11-11 14:37:19 +01:00
Dom Dwyer 76ed1afb01
perf(ingester): support prefetch deferred loads
Allow a caller to signal to the DeferredLoad that the value it may or
may not have to materialise will be used imminently, optimistically
hiding the latency of resolving the value (typically a catalog query).
2022-11-11 14:37:18 +01:00
Dom Dwyer d1cfa9d08b
refactor: remove redundant shard data init
Removes confusingly unused shard data initialisation.
2022-11-11 13:27:15 +01:00
Dom 02be6ba7e4
refactor: generic deferred loader helper (#6095)
* refactor: generic deferred loader helper

Splits the DeferredSortKey loader introduced in #5807 into two parts - a
generic helper type that implements deferred/background loading of
values, and SortKey specific logic for use with it.

As this will be more widley used, this implementation features improved
behaviour of the deferred loader under concurrent demand requests
(multiple calls to get() do not attempt to concurrently resolve the
value), as well as complete cancellation safety (cancelling the get()
doesn't affect the liveness of the background task).

* docs: doc-link & minor comment amendments

Fixes naming, adds missing doc-links, and expands some code comments.

* test: bound wait times to avoid hangs

Adds timeouts to all .await of the code under test, ensuring tests don't
hang if something goes wrong.
2022-11-10 19:16:51 +00:00
Nga Tran 93e11d4c91
chore: Revert "feat: flag partitions for delete (#6075)" (#6111)
This reverts commit 77a2541172.
2022-11-10 17:01:39 +00:00
Carol (Nichols || Goulding) dd013c5402
fix: Update the expected size in a test
I tracked down the source of the size difference to the difference in
`mem::size_of::<mutable_batch::column::ColumnData>`. I believe this enum
is now able to take advantage of this niche-filling optimization:

<https://github.com/rust-lang/rust/pull/94075/>
2022-11-09 10:54:18 -05:00
Carol (Nichols || Goulding) fa46951524
fix: Remove needless deref done by auto deref, thanks Clippy! 2022-11-09 10:54:18 -05:00
Nga Tran 77a2541172
feat: flag partitions for delete (#6075)
* feat: flag partition for delete

* fix: compare the right date and time

* chore: Run cargo hakari tasks

* chore: cleanup

* fix: typos

* chore: rust style tidy ups in catalog

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
2022-11-09 12:06:23 +00:00
Dom d9c97795fc
feat: use IDs in ingester query API (#6093)
* refactor: NS+table ID (instead of name) in querier<>ingester

* feat(ingester): use IDs for query API

Changes the ingester to utilise the ID fields (instead of names) sent
over the query wire message wrapped within the Flight API.

BREAKING: this changes the "query-ingester" CLI command arguments which
now expects the namespace & table IDs, rather than their names.

* refactor(ingester): add more query logging context

Updates the log messages during query execution to include more context
fields.

* style: remove unused import

Co-authored-by: Marco Neumann <marco@crepererum.net>
2022-11-09 11:25:13 +00:00
Dom Dwyer 38b0459994 test: simplify tests / remove catalog
Remove the catalog from tests that only initialised an implementation in
order to call buffer_operation().
2022-11-08 17:02:01 +01:00
Dom Dwyer 226f14a97f perf(ingester): remove table lookup query
Now DML operations contain the table ID, the ingester has all necessary
data to initialise the TableData buffer node without having to query the
catalog.

This also removes the catalog from the buffer_operation() call path,
simplifying testing.
2022-11-08 17:00:44 +01:00
Dom Dwyer 225c3b97c1 perf(ingester): remove namespace lookup query
Now DML operations contain the namespace ID, the ingester has all
necessary data to initialise the NamespaceData buffer node without
having to query the catalog.
2022-11-08 16:57:53 +01:00
Dom Dwyer 8ebea0df37 feat: table/namespace IDs in write protocol
Expose the Table and Namespace IDs encoded within the serialised DML
write (added in #6036).

This makes the IDs available for use in the consumers, ending the
transition period. This commit DOES NOT remove the strings sent over the
wire.
2022-11-08 16:57:53 +01:00
Dom b7f7ee6a13
Merge branch 'main' into dom/mutex-pushdown 2022-11-08 14:57:32 +00:00
Dom Dwyer b73d07c22b perf(ingester): granular per-partition locking
This commit pushes the existing table-level mutex down to the partition.

This allows the ingester to gather data from multiple partitions within
a single table in parallel, and reduces contention between ingest/query
workloads.
2022-11-08 15:45:59 +01:00
Dom Dwyer b8181119e1 refactor: push down per-partition op skipping
This moves the logic that skips operations that do not need to be
applied to a partition during shard replay from the table level, to the
partition level.
2022-11-08 15:45:52 +01:00
Dom Dwyer 4c8882e33a docs: ref link to fix PR 2022-11-08 15:17:46 +01:00
Dom Dwyer d71f023a57 refactor: inline helpers
Inline the hash generation & key comparator.
2022-11-08 15:17:46 +01:00
Dom Dwyer 8dd7f2c603 refactor: accept owned key for insert()
Changes the bounds on the ArcMap to accept an owned key, avoiding an
extra allocation.

Cleans up the bounds on other fn to ensure the borrowed key impl Eq and
is the ref type of K.
2022-11-08 15:17:46 +01:00
Dom Dwyer bbc2afe2a1 refactor: extract key equality checking
Creates a shared fn for checking key equality to DRY the various
chaining checks.
2022-11-08 15:17:46 +01:00
Dom Dwyer 8eaccd518b fix: cross-thread map entry visibility
This commit changes the ArcMap HashBuilder to use the same instance as
the underlying HashMap hasher.

This prevents divergent hashing across threads that MAY initialise a
hasher with a different seed.
2022-11-08 15:17:46 +01:00
Dom Dwyer 66a6e8e929 test: cross-thread hashmap entry visibility
At the time of this commit, this test fails. Performing a get() on a key
previously inserted by another thread should not fail.
2022-11-08 15:17:46 +01:00
Dom Dwyer fbd25a06d0 revert: push down per-partition op skipping
This reverts commit 425fd46def.
2022-11-08 10:31:51 +01:00
Dom Dwyer 7ac0857a28 revert: granular per-partition locking
This reverts commit 79d24fa350.
2022-11-08 10:31:37 +01:00
Dom Dwyer 79d24fa350 perf(ingester): granular per-partition locking
This commit pushes the existing table-level mutex down to the partition.

This allows the ingester to gather data from multiple partitions within
a single table in parallel, and reduces contention between ingest/query
workloads.
2022-11-07 13:45:03 +01:00
Dom Dwyer 425fd46def refactor: push down per-partition op skipping
This moves the logic that skips operations that do not need to be
applied to a partition during shard replay from the table level, to the
partition level.
2022-11-07 13:45:03 +01:00
kodiakhq[bot] 5e297e259b
Merge branch 'main' into dom/arcmap-get_or_insert_with 2022-11-07 11:47:00 +00:00
Andrew Lamb 034d9b371d
chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0` (#6061)
* chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0`

* fix: Update query_functions

* fix: update for TimestampNanosecondArray API changes

* fix: update for TimestampNanosecondArray API changes

* chore: Update flatbuffers and remove rustsec warning

* chore: Update text

* fix: update more test

* fix: Lock ahash to exactly 0.8.0

* fix: Update datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 11:01:58 +00:00