703 Commits (fd8a89deea311a071e535eb192d9c7125705aeb7)

Author SHA1 Message Date
Dom Dwyer d8fc9ff258
test: fix testing deadlocks
The MemCatalog suffers from deadlocks when attempting to obtain a second
ref to RepoCollection:

    https://github.com/influxdata/influxdb_iox/issues/3859
2022-11-14 10:50:10 +01:00
Dom Dwyer 9e97866b48
refactor: internalise PartitionProvider
Removes the need to leak the PartitionProvider outside of the ingester
crate.

This will allow the PartitionProvider to utilise a
DeferredLoad<TableName> without having to make the DeferredLoad and
TableName pub.
2022-11-14 10:50:05 +01:00
Marco Neumann 746032af0f fix: compatibility after hashbrown upgrade
- Some methods need explicit types
- `hashbrown::HashMap` now takes 32 bytes, not 64
2022-11-11 13:25:39 -05:00
Jake Goulding cc17e5a54b refactor: use a workspace dependency for hashbrown 2022-11-11 13:25:39 -05:00
dependabot[bot] 5024523f00 chore(deps): Bump hashbrown from 0.12.3 to 0.13.1
Bumps [hashbrown](https://github.com/rust-lang/hashbrown) from 0.12.3 to 0.13.1.
- [Release notes](https://github.com/rust-lang/hashbrown/releases)
- [Changelog](https://github.com/rust-lang/hashbrown/blob/master/CHANGELOG.md)
- [Commits](https://github.com/rust-lang/hashbrown/compare/v0.12.3...v0.13.1)

---
updated-dependencies:
- dependency-name: hashbrown
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-11-11 13:24:56 -05:00
Dom 2e7a1391f8
Merge branch 'main' into dom/deferred-namespace-name 2022-11-11 17:39:10 +00:00
Dom Dwyer 0f6470c390
refactor: use correct description for retries
Use the correct description for namespace query retries.
2022-11-11 18:38:30 +01:00
Dom Dwyer 1e5d3f31af
docs: clearer code comments / docs
Remove redundant comments & clarify returns.
2022-11-11 18:38:29 +01:00
Dom 18c86ca44f
refactor: named unused return
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-11-11 17:32:42 +00:00
Nga Tran 9c4266c503
refactor: first step to remove unused retention_duration (#6113)
* refactor: first step to remove unused retention_duration

* refactor: remove retention_duration from update catalog

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-11 15:21:06 +00:00
Dom Dwyer 2521aedb6a
perf(ingester): address namespaces by ID only
Removes reliance on string name identifiers for namespaces in the
ingester buffer tree, reducing the memory usage of the namespace index
and associated overhead.

The namespace name is required (though unused by IOx) in the IoxMetadata
embedded within a parquet file, and therefore the name is necessary at
persist time. For this reason, a DeferredLoad is used to query the
catalog (by ID) for the name, at some uniformly random duration of time
after initialisation of the NamespaceData, up to a maximum of 1 minute
later. This ensures the query remains off the hot ingest path, and the
jitter prevents spikes in catalog load during replay/ingester startup.

As an additional / easy optimisation, the persist code causes a
pre-fetch of the name in the background while compacting, hiding the
query latency should it not have already been resolved.

In order to keep the ingester buffer & catalog decoupled / easily
testable, this commit uses a provider/factory trait
NamespaceNameProvider and corresponding implementation
(NamespaceNameResolver) in a similar fashion to the PartitionResolver,
allowing easy mocking for tests, and composition for prod code, allowing
future optimisations such as pre-fetching / caching the "hot" namespace
names at startup.

Internal string identifier removal is a pre-requisite for removing
string identifiers from the write wire format (#4880).
2022-11-11 14:37:21 +01:00
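The deferred-load behaviour described in 2521aedb6a can be sketched roughly as below. This is a simplified, synchronous illustration using only the standard library, not the ingester's actual (async) implementation: the `State`, `Inner`, `resolve` and `demo` names are hypothetical, the delay is supplied by the caller rather than chosen at random, and a panicking loader is not handled. It shows a value being loaded in the background after a delay, a `get()` that triggers the load early without running the loader twice, and a Display impl that prints "<unresolved>" until the value exists.

```rust
use std::fmt;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical, synchronous stand-in for the ingester's DeferredLoad<T>.
enum State<T> {
    Unresolved(Box<dyn FnOnce() -> T + Send>),
    Loading,
    Resolved(T),
}

struct Inner<T> {
    state: Mutex<State<T>>,
    ready: Condvar,
}

pub struct DeferredLoad<T> {
    inner: Arc<Inner<T>>,
}

impl<T: Clone + Send + 'static> DeferredLoad<T> {
    /// Schedule `loader` to run in the background once `delay` has elapsed.
    /// (Assumption: the caller passes the delay; no jitter is generated here.)
    pub fn new(delay: Duration, loader: impl FnOnce() -> T + Send + 'static) -> Self {
        let inner = Arc::new(Inner {
            state: Mutex::new(State::Unresolved(Box::new(loader))),
            ready: Condvar::new(),
        });
        let background = Arc::clone(&inner);
        thread::spawn(move || {
            thread::sleep(delay);
            let _ = resolve(&background);
        });
        Self { inner }
    }

    /// Return the value, running the loader now if nobody has started it yet.
    pub fn get(&self) -> T {
        resolve(&self.inner)
    }
}

// Exactly one caller takes the loader; everyone else waits for its result.
fn resolve<T: Clone>(inner: &Inner<T>) -> T {
    let mut guard = inner.state.lock().unwrap();
    loop {
        match std::mem::replace(&mut *guard, State::Loading) {
            State::Resolved(value) => {
                *guard = State::Resolved(value.clone());
                return value;
            }
            State::Loading => guard = inner.ready.wait(guard).unwrap(),
            State::Unresolved(loader) => {
                drop(guard); // do not hold the lock while loading
                let value = loader();
                *inner.state.lock().unwrap() = State::Resolved(value.clone());
                inner.ready.notify_all();
                return value;
            }
        }
    }
}

impl<T: fmt::Display> fmt::Display for DeferredLoad<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let state = self.inner.state.lock().unwrap();
        match &*state {
            State::Resolved(value) => write!(f, "{value}"),
            _ => f.write_str("<unresolved>"),
        }
    }
}

/// Returns (display before get, loaded value, display after get, loader calls).
pub fn demo() -> (String, String, String, usize) {
    let calls = Arc::new(AtomicUsize::new(0));
    let counter = Arc::clone(&calls);
    let name = DeferredLoad::new(Duration::from_secs(60), move || {
        counter.fetch_add(1, Ordering::SeqCst);
        String::from("bananas")
    });
    let before = name.to_string();
    let loaded = name.get();
    let after = name.to_string();
    (before, loaded, after, calls.load(Ordering::SeqCst))
}
```

Calling `demo()` resolves the name on demand well before the 60 second background delay would fire, and the loader runs once.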
Dom Dwyer 611acc1ad2
refactor: plumb in DeferredLoad<NamespaceName>
Changes the ingester's buffer tree to use the deferred loading primitive
to resolve the namespace name for NamespaceData.

Note that the loader is initialised with the name in the first place -
this commit just introduces the use of the deferred loading primitive,
and doesn't change where the name is sourced from.
2022-11-11 14:37:20 +01:00
Dom Dwyer 3adc66a4b2
feat: Display impl for DeferredLoad
This lets deferred loads be used in place of a non-deferred T, such as
log context fields.

If the value has not been resolved, the display impl returns
"<unresolved>".
2022-11-11 14:37:19 +01:00
Dom Dwyer 76ed1afb01
perf(ingester): support prefetch deferred loads
Allow a caller to signal to the DeferredLoad that the value it may or
may not have to materialise will be used imminently, optimistically
hiding the latency of resolving the value (typically a catalog query).
2022-11-11 14:37:18 +01:00
Dom Dwyer d1cfa9d08b
refactor: remove redundant shard data init
Removes confusingly unused shard data initialisation.
2022-11-11 13:27:15 +01:00
Dom 02be6ba7e4
refactor: generic deferred loader helper (#6095)
* refactor: generic deferred loader helper

Splits the DeferredSortKey loader introduced in #5807 into two parts - a
generic helper type that implements deferred/background loading of
values, and SortKey specific logic for use with it.

As this will be more widely used, this implementation features improved
behaviour of the deferred loader under concurrent demand requests
(multiple calls to get() do not attempt to concurrently resolve the
value), as well as complete cancellation safety (cancelling the get()
doesn't affect the liveness of the background task).

* docs: doc-link & minor comment amendments

Fixes naming, adds missing doc-links, and expands some code comments.

* test: bound wait times to avoid hangs

Adds timeouts to all .await of the code under test, ensuring tests don't
hang if something goes wrong.
2022-11-10 19:16:51 +00:00
Nga Tran 93e11d4c91
chore: Revert "feat: flag partitions for delete (#6075)" (#6111)
This reverts commit 77a2541172.
2022-11-10 17:01:39 +00:00
Carol (Nichols || Goulding) dd013c5402
fix: Update the expected size in a test
I tracked down the source of the size difference to the difference in
`mem::size_of::<mutable_batch::column::ColumnData>`. I believe this enum
is now able to take advantage of this niche-filling optimization:

<https://github.com/rust-lang/rust/pull/94075/>
2022-11-09 10:54:18 -05:00
Carol (Nichols || Goulding) fa46951524
fix: Remove needless deref done by auto deref, thanks Clippy! 2022-11-09 10:54:18 -05:00
Nga Tran 77a2541172
feat: flag partitions for delete (#6075)
* feat: flag partition for delete

* fix: compare the right date and time

* chore: Run cargo hakari tasks

* chore: cleanup

* fix: typos

* chore: rust style tidy ups in catalog

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
2022-11-09 12:06:23 +00:00
Dom d9c97795fc
feat: use IDs in ingester query API (#6093)
* refactor: NS+table ID (instead of name) in querier<>ingester

* feat(ingester): use IDs for query API

Changes the ingester to utilise the ID fields (instead of names) sent
over the query wire message wrapped within the Flight API.

BREAKING: this changes the "query-ingester" CLI command arguments which
now expects the namespace & table IDs, rather than their names.

* refactor(ingester): add more query logging context

Updates the log messages during query execution to include more context
fields.

* style: remove unused import

Co-authored-by: Marco Neumann <marco@crepererum.net>
2022-11-09 11:25:13 +00:00
Dom Dwyer 38b0459994 test: simplify tests / remove catalog
Remove the catalog from tests that only initialised an implementation in
order to call buffer_operation().
2022-11-08 17:02:01 +01:00
Dom Dwyer 226f14a97f perf(ingester): remove table lookup query
Now DML operations contain the table ID, the ingester has all necessary
data to initialise the TableData buffer node without having to query the
catalog.

This also removes the catalog from the buffer_operation() call path,
simplifying testing.
2022-11-08 17:00:44 +01:00
Dom Dwyer 225c3b97c1 perf(ingester): remove namespace lookup query
Now DML operations contain the namespace ID, the ingester has all
necessary data to initialise the NamespaceData buffer node without
having to query the catalog.
2022-11-08 16:57:53 +01:00
Dom Dwyer 8ebea0df37 feat: table/namespace IDs in write protocol
Expose the Table and Namespace IDs encoded within the serialised DML
write (added in #6036).

This makes the IDs available for use in the consumers, ending the
transition period. This commit DOES NOT remove the strings sent over the
wire.
2022-11-08 16:57:53 +01:00
Dom b7f7ee6a13
Merge branch 'main' into dom/mutex-pushdown 2022-11-08 14:57:32 +00:00
Dom Dwyer b73d07c22b perf(ingester): granular per-partition locking
This commit pushes the existing table-level mutex down to the partition.

This allows the ingester to gather data from multiple partitions within
a single table in parallel, and reduces contention between ingest/query
workloads.
2022-11-08 15:45:59 +01:00
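As an illustration of the locking change in b73d07c22b, the sketch below places one mutex on each partition instead of one on the whole table, so writers to different partitions do not block each other. It is a minimal sketch under stated assumptions: `TableData` here is a hypothetical type with a fixed set of partitions and `i64` rows, whereas the real ingester creates partitions dynamically and buffers far richer data.

```rust
use std::collections::HashMap;
use std::sync::Mutex;
use std::thread;

/// Hypothetical table node: the lock lives on each partition, not the table.
pub struct TableData {
    partitions: HashMap<String, Mutex<Vec<i64>>>,
}

impl TableData {
    /// Assumption: the partition set is fixed up front to keep the sketch small.
    pub fn new(keys: &[&str]) -> Self {
        let partitions = keys
            .iter()
            .map(|k| (k.to_string(), Mutex::new(Vec::new())))
            .collect();
        Self { partitions }
    }

    /// Buffer a value, locking only the target partition.
    pub fn buffer(&self, key: &str, value: i64) -> bool {
        match self.partitions.get(key) {
            Some(partition) => {
                partition.lock().unwrap().push(value);
                true
            }
            None => false,
        }
    }

    pub fn len(&self, key: &str) -> usize {
        self.partitions
            .get(key)
            .map_or(0, |partition| partition.lock().unwrap().len())
    }
}

/// Two writers target different partitions and never contend on a shared lock.
pub fn demo() -> (usize, usize) {
    let table = TableData::new(&["2022-11-07", "2022-11-08"]);
    thread::scope(|s| {
        s.spawn(|| {
            for v in 0..100 {
                table.buffer("2022-11-07", v);
            }
        });
        s.spawn(|| {
            for v in 0..50 {
                table.buffer("2022-11-08", v);
            }
        });
    });
    (table.len("2022-11-07"), table.len("2022-11-08"))
}
```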
Dom Dwyer b8181119e1 refactor: push down per-partition op skipping
This moves the logic that skips operations that do not need to be
applied to a partition during shard replay from the table level, to the
partition level.
2022-11-08 15:45:52 +01:00
Dom Dwyer 4c8882e33a docs: ref link to fix PR 2022-11-08 15:17:46 +01:00
Dom Dwyer d71f023a57 refactor: inline helpers
Inline the hash generation & key comparator.
2022-11-08 15:17:46 +01:00
Dom Dwyer 8dd7f2c603 refactor: accept owned key for insert()
Changes the bounds on the ArcMap to accept an owned key, avoiding an
extra allocation.

Cleans up the bounds on other fn to ensure the borrowed key impl Eq and
is the ref type of K.
2022-11-08 15:17:46 +01:00
Dom Dwyer bbc2afe2a1 refactor: extract key equality checking
Creates a shared fn for checking key equality to DRY the various
chaining checks.
2022-11-08 15:17:46 +01:00
Dom Dwyer 8eaccd518b fix: cross-thread map entry visibility
This commit changes the ArcMap HashBuilder to use the same instance as
the underlying HashMap hasher.

This prevents divergent hashing across threads that MAY initialise a
hasher with a different seed.
2022-11-08 15:17:46 +01:00
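The hazard fixed in 8eaccd518b can be illustrated with the standard library's `RandomState`, used here as a stand-in for whatever hash builder the ArcMap actually uses (an assumption). Two separately constructed `RandomState` values are seeded independently and are unlikely to agree on a key's hash, whereas a single shared instance gives the same hash on every thread. The helper names below are hypothetical.

```rust
use std::collections::hash_map::RandomState;
use std::hash::BuildHasher;
use std::thread;

/// Hash `key` with the given builder (hypothetical helper).
pub fn hash_key(builder: &RandomState, key: &str) -> u64 {
    builder.hash_one(key)
}

/// A single builder instance, shared by reference, hashes identically on
/// the current thread and on a second thread.
pub fn shared_builder_agrees_across_threads() -> bool {
    let shared = RandomState::new();
    let local = hash_key(&shared, "bananas");
    let remote = thread::scope(|s| {
        s.spawn(|| hash_key(&shared, "bananas")).join().unwrap()
    });
    local == remote
}
```

Had each thread constructed its own `RandomState::new()`, a lookup could probe the wrong bucket for a key another thread inserted, which matches the symptom the preceding test commit describes.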
Dom Dwyer 66a6e8e929 test: cross-thread hashmap entry visibility
At the time of this commit, this test fails. Performing a get() on a key
previously inserted by another thread should not fail.
2022-11-08 15:17:46 +01:00
Dom Dwyer fbd25a06d0 revert: push down per-partition op skipping
This reverts commit 425fd46def.
2022-11-08 10:31:51 +01:00
Dom Dwyer 7ac0857a28 revert: granular per-partition locking
This reverts commit 79d24fa350.
2022-11-08 10:31:37 +01:00
Dom Dwyer 79d24fa350 perf(ingester): granular per-partition locking
This commit pushes the existing table-level mutex down to the partition.

This allows the ingester to gather data from multiple partitions within
a single table in parallel, and reduces contention between ingest/query
workloads.
2022-11-07 13:45:03 +01:00
Dom Dwyer 425fd46def refactor: push down per-partition op skipping
This moves the logic that skips operations that do not need to be
applied to a partition during shard replay from the table level, to the
partition level.
2022-11-07 13:45:03 +01:00
kodiakhq[bot] 5e297e259b
Merge branch 'main' into dom/arcmap-get_or_insert_with 2022-11-07 11:47:00 +00:00
Andrew Lamb 034d9b371d
chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0` (#6061)
* chore: Update datafusion and arrow/arrow-flight/parquet to `26.0.0`

* fix: Update query_functions

* fix: update for TimestampNanosecondArray API changes

* fix: update for TimestampNanosecondArray API changes

* chore: Update flatbuffers and remove rustsec warning

* chore: Update text

* fix: update more test

* fix: Lock ahash to exactly 0.8.0

* fix: Update datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 11:01:58 +00:00
Dom Dwyer 2b9e0e173f refactor: rename ArcMap::get_or_insert_with()
Renames ArcMap::get_or_else() to ArcMap::get_or_insert_with() for
consistency with the stdlib HashMap Entry.
2022-11-07 11:56:55 +01:00
Marco Neumann f511db380c
refactor: remove table name from chunks (#6063)
It should be always clear from the context to which table a chunk
belongs.

I think having a table name bound to a chunk goes back to a time where
chunks had multiple tables.

Helps with #6049.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 10:42:57 +00:00
YIXIAO SHI 586035b34d
chore: delete metric duplicate character (#6057)
* chore: delete metric duplicate character

* fix: failure ci test case

* fix: failure ci test case

* fix: failure ci test case

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 10:04:31 +00:00
Dom Dwyer 6fa48731aa feat: NamespaceId in DmlDelete
Changes the DmlDelete to contain the NamespaceId for which it should be
applied, propagating this value over the wire.

Like the existing IDs within the DmlWrite, these values are marked
unsafe to use to avoid the consumers utilising them accidentally
during deployment. Unlike DmlWrite, the DmlDelete is completely unused,
so this is less of an issue.
2022-11-03 13:57:40 +01:00
Dom Dwyer 30f69ce4f6 feat: ArcMap values() snapshot
Returns a snapshot of the values within an ArcMap.
2022-11-03 11:49:01 +01:00
Dom Dwyer 17890a9906 feat: add ArcMap map type
Implements a map of K -> Arc<V> with exactly-once initialisation
semantics.

This map can be used to ensure a given key maps to singleton instances
of V; exactly what all the nodes in the ingester "buffer tree" of shard
-> namespace -> table -> partition require.

This impl contains unused funcs (silenced with an allow(dead_code)) due
to it being picked from a future branch.
2022-11-03 11:29:09 +01:00
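The exactly-once initialisation semantics described in 17890a9906 can be sketched with a mutex-guarded `HashMap`. This is a deliberately simple illustration, not the real ArcMap: the actual implementation may use different locking and hashing, and only the `get_or_insert_with()` and `values()` names are taken from the commit messages in this log.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Simplified stand-in for ArcMap: K -> Arc<V>, initialised at most once per key.
pub struct ArcMap<K, V> {
    inner: Mutex<HashMap<K, Arc<V>>>,
}

impl<K: Eq + Hash, V> ArcMap<K, V> {
    pub fn new() -> Self {
        Self { inner: Mutex::new(HashMap::new()) }
    }

    /// Return the existing value for `key`, or run `init` once and store it.
    pub fn get_or_insert_with<F: FnOnce() -> V>(&self, key: K, init: F) -> Arc<V> {
        let mut map = self.inner.lock().unwrap();
        Arc::clone(map.entry(key).or_insert_with(|| Arc::new(init())))
    }

    /// Snapshot of the values currently in the map.
    pub fn values(&self) -> Vec<Arc<V>> {
        self.inner.lock().unwrap().values().cloned().collect()
    }
}

/// Eight threads race on one key; returns (initialiser runs, all Arcs identical).
pub fn demo() -> (usize, bool) {
    let map: ArcMap<&str, String> = ArcMap::new();
    let inits = AtomicUsize::new(0);
    let handles: Vec<Arc<String>> = thread::scope(|s| {
        let mut workers = Vec::new();
        for _ in 0..8 {
            workers.push(s.spawn(|| {
                map.get_or_insert_with("table", || {
                    inits.fetch_add(1, Ordering::SeqCst);
                    String::from("buffer")
                })
            }));
        }
        workers.into_iter().map(|w| w.join().unwrap()).collect()
    });
    let all_same = handles.windows(2).all(|pair| Arc::ptr_eq(&pair[0], &pair[1]));
    (inits.load(Ordering::SeqCst), all_same)
}
```

Holding the lock across `init` is what guarantees a single initialisation in this sketch; it trades some concurrency for simplicity.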
Andrew Lamb 4fb2843d05
refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037)
* chore: Rename `schema::selection::Selection` to `schema::projection::Projection`

* fix: docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 18:15:04 +00:00
Dom Dwyer ddd6ab0ba4 refactor(write_buffer): pass IDs in wire format
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.

In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
2022-11-02 13:28:56 +01:00
Marco Neumann 45b3984aa3
refactor: simplify `QueryChunk` data access (#6015)
* refactor: simplify `QueryChunk` data access

We have only two types for chunks (now that the RUB is gone):

1. In-memory RecordBatches
2. Parquet files

Loads of logic is duplicated in the different `read_filter`
implementations. Also `read_filter` hides a solid amount of logic from
DataFusion, which will prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return the data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy-lifting.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 08:18:33 +00:00
Marco Neumann 072439e428
refactor: mandatory `QueryChunkMeta::summary` (#5997)
With #5963 merged, all chunks now provide a summary (even though it may
not contain data for all columns). So let's make it mandatory, which
also removes a few 🙈-style `.expect(...)` calls.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:38:02 +00:00
Carol (Nichols || Goulding) dad1ad1318
feat: Add the catalog service to ingester, querier, and compactor
So that `remote get` that uses the catalog service can work no matter
what kind of server you contact.
2022-10-28 10:49:26 -04:00
Carol (Nichols || Goulding) 53445af25d
chore: Alphabetize some dependencies
I can't handle not knowing where to look for a dependency or knowing
where to add a new dependency.
2022-10-28 10:34:25 -04:00
Andrew Lamb e9d04ffcb5
feat: Log how long each persist plan takes to complete (#5989)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-28 13:52:39 +00:00
kodiakhq[bot] 1567227b49
Merge branch 'main' into dom/require-partition-key 2022-10-28 10:31:22 +00:00
Marco Neumann 8447d46093
refactor: remove `QueryChunkMeta::timestamp_min_max` (#5963)
Use the table summary instead. This allows us to have a single mechanism
that both IOx and DataFusion understand. This basically lifts the "basic
table summary" mechanism that the querier uses to `iox_query` and let
the compactor and ingester use the same mechanism.

While not strictly necessary, simplifying the `QueryChunk[Meta]`
interface helps with #5897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-28 10:29:16 +00:00
Dom Dwyer 72a358e52f refactor(dml): PartitionKey required for writes
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.

This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (was not
"None") or it panicked at runtime when attempting to enqueue the write.

It is now possible to encode this invariant in the type system, which is
what this change does.
2022-10-28 10:57:30 +02:00
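The idea in 72a358e52f of moving a runtime invariant into the type system can be sketched as follows. The types are hypothetical simplifications (the real DmlWrite carries much more than a key), and `OptionalKeyWrite` exists only to show the earlier shape for contrast.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PartitionKey(String);

impl From<&str> for PartitionKey {
    fn from(key: &str) -> Self {
        Self(key.to_string())
    }
}

/// Before (illustrative): the key is optional, so the invariant can only be
/// checked when the write is enqueued, and a violation panics at runtime.
pub struct OptionalKeyWrite {
    partition_key: Option<PartitionKey>,
}

impl OptionalKeyWrite {
    pub fn key_for_enqueue(&self) -> &PartitionKey {
        self.partition_key
            .as_ref()
            .expect("write must have a partition key")
    }
}

/// After (illustrative): a write cannot be constructed without a key, so the
/// compiler rejects the invalid state and no runtime check is needed.
pub struct DmlWrite {
    partition_key: PartitionKey,
}

impl DmlWrite {
    pub fn new(partition_key: PartitionKey) -> Self {
        Self { partition_key }
    }

    pub fn partition_key(&self) -> &PartitionKey {
        &self.partition_key
    }
}

pub fn demo() -> PartitionKey {
    DmlWrite::new(PartitionKey::from("2022-10-28"))
        .partition_key()
        .clone()
}
```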
Dom Dwyer 5d2f4a0ad1 docs: fix issue URL for memory tracking bug 2022-10-27 10:15:15 +02:00
Dom Dwyer f6416675c2 docs: mark hyperlink in rustdoc comments 2022-10-27 10:15:15 +02:00
Dom Dwyer 678fb81892 refactor(ingester): use partition buffer FSM
This commit makes use of the partition buffer state machine introduced
in https://github.com/influxdata/influxdb_iox/pull/5943.

This commit significantly changes the buffering, and querying, of data
from a partition, swapping out the existing "DataBuffer" for the new
state machine implementation (itself simplified due to temporary lack of
incremental snapshot generation, see #5944).

This commit simplifies the query path, removing multiple types that
wrapped one-another to pass around various state necessary to perform a
query, with various query functions needing different types or
combinations of types. The query path now operates using a single type
(named "QueryAdaptor") that provides a queryable interface over the set
of RecordBatch returned from a partition.

There is significantly increased testing of the PartitionData itself,
covering data in various states and the ordering of returned RecordBatch
(to ensure correct materialisation of updates). There are also
invariants upheld by the type system / compiler to minimise the
complexities of working with empty batches & states, and many asserts
that ensure (mostly existing!) invariants are upheld.
2022-10-27 10:15:15 +02:00
Carol (Nichols || Goulding) 88c3a1f5e7
feat: Use workspace dep inheritance for the arrow-flight crate 2022-10-26 10:34:54 -04:00
Carol (Nichols || Goulding) 3145e2c05b
feat: Use workspace dep inheritance for the arrow crate 2022-10-26 10:34:29 -04:00
Carol (Nichols || Goulding) 44936f661a
feat: Use workspace dep inheritance for datafusion instead of shim crate 2022-10-26 10:33:56 -04:00
Carol (Nichols || Goulding) 2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition 2022-10-24 13:04:09 -04:00
Dom Dwyer 39f826518b revert: use histogram to record TTBR
This reverts commit c63312ce12.

This change fixed a low-priority alert when there was no traffic flowing
through the system. The loss in TTBR value fidelity due to bucketing is
a greater concern as it affects live, high-volume clusters and hinders
operational insight.
2022-10-24 10:27:22 +02:00
Dom Dwyer 7b3fa43209 refactor: disable incremental snapshot generation
This commit removes the on-demand, incremental snapshot generation
driven by queries.

This functionality is "on hold" due to concerns documented in:

    https://github.com/influxdata/influxdb_iox/issues/5805

Incremental snapshots will be introduced alongside incremental
compactions of those same snapshots.
2022-10-21 17:41:43 +02:00
Dom db83053be7
Merge branch 'main' into dom/buffer-fsm 2022-10-21 16:32:54 +01:00
Dom Dwyer 8ca72ceff1 docs: fix state mod comments 2022-10-21 17:32:19 +02:00
Dom Dwyer c8fdd76033 feat(ingester): partition buffer state machine
This commit introduces code that is intended to replace the current
implicit state machine used by PartitionData. The existing code is still
in use, the new code is NOT used in this commit. A follow-up commit will
switch over to minimise the diff.

This change has two main goals:
    * encapsulation & simplification for callers
    * robust implementation so developing correct additions is easier

This is a significant refactor of the partition buffering logic to
encapsulate the various states of data (buffering, snapshot, persisting
and the mixed states between them) within the Partition. This relieves
the rest of the system from having to be concerned with the differences
between "buffering" data, and "unpersisted data", "snapshot data",
"persisting data", "persisting with snapshots" etc - callers now invoke
a method called get_query_data() and they are provided with all the
relevant data for a partition. This abstraction change alone
significantly reduces code and test complexity in the rest of the
ingester.

For the second goal, the new implementation leverages an explicit state
machine, encoded using typestates. Typestate ensures compile-time
correctness of transitions and method calls, and the explicit FSM itself
helps ensure the system progresses in the desired manner - this fixes
and helps prevent bugs caused by implicit states such as:

    https://github.com/influxdata/influxdb_iox/issues/5805

This state machine makes the system states explicit and
self-descriptive, helping to reduce the cost of developer on-boarding
(no prior knowledge of "how this bit works") and reduces ongoing
developer burden. This explicit nature also de-risks adding new
functionality - it should be relatively easy to add concurrent snapshot
generation or incremental compaction without introducing bugs. The state
transition logic is abstracted away from callers, minimising the
overhead of this strategy.
2022-10-21 14:25:51 +02:00
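The typestate approach described in c8fdd76033 can be illustrated with a small sketch. The state names follow the commit message (buffering, snapshot, persisting), but the `BufferFsm` type, its `i64` rows and every method except `get_query_data()` are hypothetical; the real state machine has more states and holds RecordBatches. The point is that each transition consumes the old state, so invalid calls fail to compile rather than fail at runtime.

```rust
// Hypothetical state types; each one only carries what that state needs.
pub struct Buffering {
    rows: Vec<i64>,
}
pub struct Snapshot {
    batches: Vec<Vec<i64>>,
}
pub struct Persisting {
    batches: Vec<Vec<i64>>,
}

/// The buffer is generic over its current state; methods exist per state.
pub struct BufferFsm<S> {
    state: S,
}

impl BufferFsm<Buffering> {
    pub fn new() -> Self {
        Self { state: Buffering { rows: Vec::new() } }
    }

    /// Writes are only possible while buffering.
    pub fn write(&mut self, row: i64) {
        self.state.rows.push(row);
    }

    /// Consumes the buffering state, so no further writes can reach it.
    pub fn snapshot(self) -> BufferFsm<Snapshot> {
        BufferFsm { state: Snapshot { batches: vec![self.state.rows] } }
    }
}

impl BufferFsm<Snapshot> {
    pub fn get_query_data(&self) -> Vec<i64> {
        self.state.batches.concat()
    }

    pub fn into_persisting(self) -> BufferFsm<Persisting> {
        BufferFsm { state: Persisting { batches: self.state.batches } }
    }
}

impl BufferFsm<Persisting> {
    /// Data remains queryable while it is being persisted.
    pub fn get_query_data(&self) -> Vec<i64> {
        self.state.batches.concat()
    }
}

pub fn demo() -> Vec<i64> {
    let mut buffer = BufferFsm::new();
    buffer.write(1);
    buffer.write(2);
    let snapshot = buffer.snapshot();
    // buffer.write(3); // would not compile: `buffer` was moved by snapshot()
    let persisting = snapshot.into_persisting();
    persisting.get_query_data()
}
```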
Carol (Nichols || Goulding) 59e1c1d5b9
feat: Pass trace id through Flight requests from querier to ingester
Fixes #5723.
2022-10-20 08:55:30 -04:00
Andrew Lamb 83e3a96c19
fix: improve ttbr histogram metric description (#5909)
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-20 09:03:58 +00:00
Dom ea7b4a0de6
Merge branch 'main' into dom/ingester-integration-tests 2022-10-19 13:36:54 +01:00
Marco Neumann eb5a661ab3
refactor: prep work for #5897 (#5907)
* refactor: add ID to `ParquetStorage`

* refactor: remove duplicate code

* refactor: use dedicated `StorageId`
2022-10-19 11:54:42 +00:00
Dom Dwyer 0c0a38c484 refactor: more verbose shard reset logs
Adds a little more context to the "shard reset" logs.
2022-10-19 12:28:02 +02:00
Dom Dwyer 40f1937e63 test: write buffer seeking tests
Asserts write buffer seeking behaviour, including:

    * Seeking past already persisted data correctly
    * Skipping to next available op in non-contiguous offset stream
    * Skipping to next available op for dropped ops due to retention
    * Panics when seeking beyond available data (into the future)

Removes a pair of tests that covered some of the above due to their
tight coupling with ingester internals.
2022-10-19 12:28:02 +02:00
Dom Dwyer 7729494f61 test: write, query & progress API coverage
This commit adds a new test that exercises all major external APIs of
the ingester:

    * Writing data via the write buffer
    * Waiting for data to be readable via the progress API
    * Querying data and asserting the contents

This should provide basic integration coverage for the Ingester
internals. This commit also removes a similar test (though with less
coverage) that was tightly coupled to the existing buffering structures.
2022-10-19 11:51:15 +02:00
Dom Dwyer b12d472a17 test(ingester): add integration TestContext
Adds a test helper type that maintains the in-memory state for a single
ingester integration test, and provides easy-to-use methods to
manipulate and inspect the ingester instance.
2022-10-19 11:51:15 +02:00
Dom Dwyer d0b546109f refactor: impl converting IngesterQueryResponse
An existing function to map the complex IngesterQueryResponse type to a
simple set of RecordBatch existed in test code - this has been lifted
onto an inherent method on the response type itself for reuse.
2022-10-19 11:51:15 +02:00
dependabot[bot] b5574c07b7
chore(deps): Bump async-trait from 0.1.57 to 0.1.58 (#5904)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.57 to 0.1.58.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.57...0.1.58)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-19 09:40:26 +00:00
Andrew Lamb d706f8221d
chore: Update datafusion and arrow / parquet / arrow-flight 25.0.0 (#5900)
* chore: Update datafusion and  `arrow` / `parquet` / `arrow-flight` 25.0.0

* chore: Update for structure changes

* chore: Update for new projection pushdown

* chore: Run cargo hakari tasks

* fix: fmt

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-18 20:58:47 +00:00
Dom Dwyer c63312ce12 refactor: use histogram to record TTBR
Changes the TTBR metric from a gauge to a histogram so that observations
maintain a time dimension.
2022-10-18 16:29:09 +02:00
Andrew Lamb 8021b8be0b
fix: Use Display rather than Debug when logging errors (#5859)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-14 14:43:11 +00:00
Luke Bond 475c8a0704
fix: only emit ttbr metric for applied ops (#5854)
* fix: only emit ttbr metric for applied ops

* fix: move DmlApplyAction to s/w accessible

* chore: test for skipped ingest; comments and log improvements

* fix: fixed ingester test re skipping write

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-14 12:06:49 +00:00
Carol (Nichols || Goulding) efb964c390
feat: Enforce table column limits from the schema cache (#5819)
* fix: Avoid some allocations by collecting instead of inserting into a vec

* refactor: Encode that adding columns is for one table at a time

* test: Add another test of column limits

* test: Add below/above limit tests for create_or_get_many

* fix: Explicitly DO NOT check column limits when inserting many columns

* feat: Cache the max_columns_per_table on the NamespaceSchema

* feat: Add a function to validate column limits in-memory

* fix: Provide more useful information when over column limits

* fix: Swap types to remove intermediate allocation

* docs: Explain the interactions of the cache and the column limits

* test: Actually set up test that showcases column limit race condition

* fix: Allow writing to existing columns even if table is over column limit

Co-authored-by: Dom <dom@itsallbroken.com>
2022-10-14 11:34:17 +00:00
Andrew Lamb 9134ccd6c3
chore: Update datafusion again (#5855)
* chore: Update datafusion

* chore: Updates for changes in datafusion

* chore: more updates

* fix: update doc example

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-13 19:18:57 +00:00
kodiakhq[bot] 3039b5877b
Merge branch 'main' into dom/no-persist-lookups 2022-10-13 15:13:36 +00:00
Dom Dwyer 86d28d3359 fix: update cached sort key
Once persist() has successfully updated the sort key in the catalog, set
the partition sort key cache to reflect the new value.
2022-10-13 17:12:07 +02:00
Dom Dwyer 9c40d80032 refactor(ingester): log shard_id in op result
Include the shard ID in the op apply result to correlate it with other
log messages.
2022-10-13 15:41:48 +02:00
Dom Dwyer 3e70dc44a0 refactor(catalog): remove partition_info_by_id()
This method used to return a subset of partition metadata, and was used
exclusively for persistence in the ingester. It is now no longer
necessary.
2022-10-13 15:26:36 +02:00
Dom Dwyer 3fbeaa1314 refactor: assert monotonic partition persistence
Copies the existing monotonic partition persistence check into the
partition too - this ensures that even if the partitions are persisted
in order, they are never marked as persisted OUT of order.
2022-10-13 15:26:36 +02:00
Dom Dwyer 920f7edf75 refactor: defer querying for table schema
Do not query for the table schema until it is needed.
2022-10-13 15:26:36 +02:00
Dom Dwyer e556677192 perf(ingester): remove persist lookup queries
Removes the catalog queries previously used to look up various
information about the partition/table/namespace that was already in
memory.

As part of this change, the compaction helper function is changed to
accept the inputs it needs, rather than a struct of data from the
catalog - this significantly simplifies testing.

This commit also adds additional context to all log messages in the
persist() fn.
2022-10-13 15:26:36 +02:00
Dom Dwyer 10d77b0ef7 refactor: use deferred sort key loading
Changes the persist() implementation in the ingester to load the sort
key using the deferred loading mechanism, instead of on-demand.
2022-10-13 15:26:36 +02:00
Dom Dwyer dbcbb5b824 refactor: include sequence numbers in apply() logs
Include the op sequence number in the error/success apply() log
messages.
2022-10-13 14:19:02 +02:00
Dom Dwyer 15e153a74c perf(ingester): cheaper table discovery
This commit changes the table ID lookup query from an expensive,
JOIN multi-query to a simple, single table, indexed lookup.

As this is on the hot path, this should help with the recovery rate of
the ingesters.
2022-10-13 13:44:50 +02:00
Andrew Lamb d57c99638c
chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0 (#5792)
* chore: Update datafusion + `arrow`, `arrow-flight`, and `parquet` to 24.0.0.0

* fix: Update for coercion, fix explain plans for change in column name display

* chore: Update datafusion lock

* fix: Update for other API changes

* chore: Update to latest datafusion pin

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 16:19:14 +00:00
dependabot[bot] 7202dddab6
chore(deps): Bump tokio-stream from 0.1.10 to 0.1.11 (#5838)
Bumps [tokio-stream](https://github.com/tokio-rs/tokio) from 0.1.10 to 0.1.11.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-stream-0.1.10...tokio-stream-0.1.11)

---
updated-dependencies:
- dependency-name: tokio-stream
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-12 12:37:24 +00:00
Luke Bond 11900cea4d
chore: add some tracing logs to the ingester (#5839) 2022-10-12 12:10:20 +00:00
Dom Dwyer b294bb98aa refactor: move query types to query_handler
Moves types that are only used for handling queries to the query_handler
module.
2022-10-11 17:58:55 +02:00
Dom Dwyer c4f542bbe2 refactor(ingester): remove tombstone support
This commit removes tombstone support from the ingester, and deletes
associated code/helpers/tests. This commit does NOT remove tombstone
support from any other service, but MAY include removing overlapping
test coverage.

This also removes the tombstone support from the Ingester -> Querier RPC
response message.

This has the nice side effect of removing a whole lot of thread spawning
in the ingester tests for the Executor, speeding everything up!
2022-10-11 13:10:04 +02:00
Luke Bond fda1479db0
chore: add trace log to ingester to aid debugging (#5829) 2022-10-11 10:33:42 +00:00
Dom d2467d0b63
Merge branch 'main' into dependabot/cargo/object_store-0.5.1 2022-10-11 09:56:27 +01:00
dependabot[bot] 933493fab3
chore(deps): Bump object_store from 0.5.0 to 0.5.1
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.0...object_store_0.5.1)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-11 01:19:10 +00:00
Dom Dwyer 97c6e0f8ce refactor: use TableName, not Arc<str>
Adds a type wrapper TableName, internally an Arc<str> to leverage the
type system instead of passing around untyped strings.
2022-10-10 19:09:43 +02:00
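The newtype idea in 97c6e0f8ce can be sketched roughly as follows. This is an illustration only, not the ingester's code: the trait impls chosen here and the `describe` helper are assumptions.

```rust
use std::fmt;
use std::ops::Deref;
use std::sync::Arc;

/// Illustrative newtype: a table name backed by a shared, immutable string.
/// Cloning only bumps the reference count of the inner `Arc<str>`.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct TableName(Arc<str>);

impl From<&str> for TableName {
    fn from(name: &str) -> Self {
        TableName(Arc::from(name))
    }
}

impl Deref for TableName {
    type Target = str;

    fn deref(&self) -> &str {
        &self.0
    }
}

impl fmt::Display for TableName {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}

/// Hypothetical helper: it accepts only a `TableName`, so some other
/// string (a namespace name, say) cannot be passed here by mistake.
fn describe(table: &TableName) -> String {
    format!("table={}", table)
}
```

Because the wrapper derefs to `str`, existing string methods such as `len()` keep working on a `TableName`, while function signatures gain a distinct type.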
Dom Dwyer 4518bd49d1 test: constify duration seconds 2022-10-10 14:39:35 +02:00
Dom Dwyer ab78f99ab2 refactor: eager background task abort
Changes the get() code path to abort the background load task when the
caller will resolve the sort key.

Note that an aborted future will leave the DeferredSortKey without a
background task to fetch the key, and the next caller will have to query
the catalog. Given the rarity of aborted futures, and desire to minimise
catalog load, this seems like a decent trade-off.

This commit also documents the many-readers eager loading problem.
2022-10-10 14:39:35 +02:00
Dom Dwyer afcb96ae47 perf(ingester): deferred sort key lookup queries
This commit carries the SortKey in the PartitionData, and configures the
ingester to use deferred sort key lookups, smearing the lookups across a
fixed period of time after initialising the PartitionData, instead of
querying for the sort key at persist time.

This allows large numbers of PartitionData to be initialised without
causing an equally large spike in catalog load to resolve the sort key -
instead this load is spread out randomly to reduce peak query rps.
2022-10-06 16:39:54 +02:00
Dom Dwyer c022ab6786 feat: deferred partition sort key fetcher
Adds a new DeferredSortKey type that fetches a partition's sort key from
the catalog in the background, or on-demand if not yet pre-fetched.

From the caller's perspective, little has changed compared to reading it
from the catalog directly - the sort key is always returned when calling
get(), regardless of the mechanism, and retries are handled
transparently. Internally the sort key MAY have been pre-fetched in the
background between the DeferredSortKey being initialised, and the call
to get().

The background task waits a (uniformly) random duration of time before
issuing the catalog query to pre-fetch the sort key. This allows large
numbers of DeferredSortKey to (randomly) smear the lookup queries over a
large duration of time. This allows a large number of DeferredSortKey to
be initialised in a short period of time, without creating an equally
large spike in queries against the catalog in the same time period.
2022-10-06 16:37:04 +02:00
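A rough, std-only sketch of the deferred loading pattern described in c022ab6786. It assumes a generic `Deferred<T>` type, a plain thread in place of an async background task, and `RandomState` as a cheap jitter source; all of these names and choices are illustrative and not the actual DeferredSortKey implementation. Retries are omitted.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};
use std::sync::{Arc, OnceLock};
use std::thread;
use std::time::Duration;

/// Picks a pseudo-random delay below `max` (assumption: `RandomState` is
/// randomly seeded, which is enough jitter for a sketch; real code would
/// more likely use a proper RNG).
fn jittered_delay(max: Duration) -> Duration {
    let r = RandomState::new().build_hasher().finish();
    let max_ms = max.as_millis().max(1) as u64;
    Duration::from_millis(r % max_ms)
}

type Loader<T> = Arc<dyn Fn() -> T + Send + Sync>;

/// A value that is fetched in the background after a random delay, or
/// on demand if a caller asks for it first.
struct Deferred<T> {
    value: Arc<OnceLock<T>>,
    loader: Loader<T>,
}

impl<T: Send + Sync + 'static> Deferred<T> {
    fn new(max_delay: Duration, loader: impl Fn() -> T + Send + Sync + 'static) -> Self {
        let value = Arc::new(OnceLock::new());
        let loader: Loader<T> = Arc::new(loader);
        let (bg_value, bg_loader) = (Arc::clone(&value), Arc::clone(&loader));
        // Background pre-fetch: sleep for a jittered duration, then load
        // the value unless a caller has already resolved it.
        thread::spawn(move || {
            thread::sleep(jittered_delay(max_delay));
            bg_value.get_or_init(|| (*bg_loader)());
        });
        Deferred { value, loader }
    }

    /// Always returns the value: pre-fetched if available, otherwise
    /// loaded now. `OnceLock` ensures the loader runs at most once.
    fn get(&self) -> &T {
        self.value.get_or_init(|| (*self.loader)())
    }
}
```

With many `Deferred` values created at once, each background load fires at a different point within `max_delay`, so the lookups are spread out instead of arriving as a single burst.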
kodiakhq[bot] ffa1704d96
Merge branch 'main' into dom/namespace-name 2022-10-06 13:58:47 +00:00
Marco Neumann c4c83e0840
fix: query error propagation (#5801)
- treat OOM protection as "resource exhausted"
- use `DataFusionError` in more places instead of opaque `Box<dyn Error>`
- improve conversion from/into `DataFusionError` to preserve more
  semantics

Overall, this improves our error handling. DF can now return errors like
"resource exhausted" and gRPC should now automatically generate a
sensible status code for it.

Fixes #5799.
2022-10-06 08:54:01 +00:00
Dom Dwyer abb9122e2c refactor: carry namespace name in NamespaceData
Changes the ingester's NamespaceData to carry a ref-counted string
identifier as well as the ID.

The backing storage for the name in NamespaceData is shared with the
index map in ShardData, so it is effectively free!
2022-10-05 13:03:16 +02:00
Dom Dwyer 1a7eb47b81 refactor: persist() passes all necessary IDs
This commit changes the persist() call so that it passes through all
relevant IDs so that the impl can locate the partition in the buffer
tree - this will enable elimination of many queries against the catalog
in the future.

This commit also cleans up the persist() impl, deferring queries until
the result will be used to avoid unnecessary load, improves logging &
error handling, and documents a TOCTOU bug in code:

    https://github.com/influxdata/influxdb_iox/issues/5777
2022-10-04 14:28:01 +02:00
Dom Dwyer f9bf86927d refactor: ref PartitionData by key & ID
Changes the TableData to hold a map of partition key -> PartitionData,
and partition ID -> PartitionData simultaneously. This allows for cheap
lookups when the caller holds an ID.

This commit also manages to internalise the partition map within the
TableData - one less pub / peeking!

This commit also switches from a BTreeMap to a HashMap as the backing
collection, as maintaining key ordering doesn't appear to be necessary.
2022-10-04 14:28:01 +02:00
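The "reference by key and by ID" layout from f9bf86927d might look something like the sketch below. The `DoubleRef` name, the field types and the simplified `PartitionData` are assumptions made for illustration.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Simplified stand-in for the per-partition buffer.
#[derive(Debug)]
struct PartitionData {
    id: i64,
    key: String,
}

/// Two indexes over the same shared values: one by partition key, one
/// by partition ID. Both maps hold an `Arc` to the same allocation.
#[derive(Default)]
struct DoubleRef {
    by_key: HashMap<String, Arc<PartitionData>>,
    by_id: HashMap<i64, Arc<PartitionData>>,
}

impl DoubleRef {
    fn insert(&mut self, data: PartitionData) -> Arc<PartitionData> {
        let data = Arc::new(data);
        self.by_key.insert(data.key.clone(), Arc::clone(&data));
        self.by_id.insert(data.id, Arc::clone(&data));
        data
    }

    fn by_key(&self, key: &str) -> Option<Arc<PartitionData>> {
        self.by_key.get(key).cloned()
    }

    fn by_id(&self, id: i64) -> Option<Arc<PartitionData>> {
        self.by_id.get(&id).cloned()
    }
}
```

A `HashMap` gives no iteration order, which matches the commit's observation that key ordering does not appear to be needed.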
Dom Dwyer 0847cc5458 refactor: PartitionData::id() -> partition_id()
Consistent naming is consistent - all the others are thing_id().
2022-10-04 14:28:01 +02:00
Dom Dwyer 66e05b5ea7 refactor: ref NamespaceData by name & ID
Changes the ShardData to hold a map of namespace name -> NamespaceData,
and namespace ID -> NamespaceData simultaneously.

This allows for cheap lookups when the caller holds an ID, and is part
of preparatory work to transition away from using string names in the
ingester for tables.

This commit also switches from a BTreeMap to a HashMap as the backing
collection, as maintaining key ordering doesn't appear to be necessary.
2022-10-04 14:28:01 +02:00
Dom Dwyer 9c0e4e98c4 refactor: ref TableData by name & ID
Changes the NamespaceData to hold a map of table name -> TableData, and
table ID -> TableData simultaneously.

This allows for cheap lookups when the caller holds an ID, and is part
of preparatory work to transition away from using string names in the
ingester for tables.

This commit also switches from a BTreeMap to a HashMap as the backing
collection, as maintaining key ordering doesn't appear to be necessary.
2022-10-04 14:28:01 +02:00
Dom Dwyer 7efd81a63a docs: comment write record ordering 2022-10-03 12:23:30 +02:00
Dom Dwyer b23ad31711 fix: spurious memory accounting for failed write
Fixes a case where the ingester may incorrectly record a write as having
been buffered in memory, when in fact the buffering failed.

This could cause the effective buffer size to be reduced over time as
more and more data is spuriously "added" to the buffer, but never
released back to the memory tracker as it is never persisted.
2022-10-03 12:13:43 +02:00
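The class of bug fixed in b23ad31711 can be illustrated with a small sketch: memory is recorded only after the buffering call has succeeded. `MemoryTracker`, `Buffer` and `apply` are hypothetical stand-ins, not the ingester's types.

```rust
/// Hypothetical memory tracker: total bytes believed to be buffered.
#[derive(Debug, Default)]
struct MemoryTracker {
    buffered_bytes: usize,
}

/// Hypothetical buffer that rejects empty writes, to provide a failure case.
#[derive(Debug, Default)]
struct Buffer {
    rows: Vec<String>,
}

impl Buffer {
    fn write(&mut self, row: &str) -> Result<usize, String> {
        if row.is_empty() {
            return Err("empty write".to_string());
        }
        self.rows.push(row.to_string());
        Ok(row.len())
    }
}

fn apply(buffer: &mut Buffer, tracker: &mut MemoryTracker, row: &str) -> Result<(), String> {
    // `?` returns early on failure, so nothing is recorded for a write
    // that never made it into the buffer.
    let bytes = buffer.write(row)?;
    tracker.buffered_bytes += bytes;
    Ok(())
}
```

Recording before the write (or regardless of its result) would slowly inflate `buffered_bytes` with data that can never be persisted and released.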
Dom Dwyer 20451921d0 test: MockLifecycleHandle captures calls
Changes the NoopLifecycleHandle to MockLifecycleHandle, and adds code
causing it to log all calls made to the log_write() method.

This will allow tests to assert calls and their values in DML buffering
tests.
2022-10-03 12:13:43 +02:00
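A call-capturing mock in the spirit of 20451921d0 could be sketched as below. The trait and its two-argument `log_write()` signature are simplified assumptions; the real lifecycle handle takes different parameters.

```rust
use std::sync::Mutex;

/// One recorded call to `log_write()` (simplified, hypothetical fields).
#[derive(Debug, Clone, PartialEq, Eq)]
struct LoggedWrite {
    partition_id: i64,
    bytes: usize,
}

/// Simplified stand-in for the lifecycle handle trait.
trait LifecycleHandle {
    fn log_write(&self, partition_id: i64, bytes: usize);
}

/// A mock that does no lifecycle work but remembers every call.
#[derive(Debug, Default)]
struct MockHandle {
    calls: Mutex<Vec<LoggedWrite>>,
}

impl LifecycleHandle for MockHandle {
    fn log_write(&self, partition_id: i64, bytes: usize) {
        self.calls
            .lock()
            .unwrap()
            .push(LoggedWrite { partition_id, bytes });
    }
}

impl MockHandle {
    /// Returns a copy of the calls observed so far, in call order.
    fn calls(&self) -> Vec<LoggedWrite> {
        self.calls.lock().unwrap().clone()
    }
}
```

A test can then drive the code under test with the mock and compare `calls()` against the expected sequence of writes.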
Dom Dwyer 7dd28f4230 test: simplify PartitionProvider mock
The PartitionKey is now part of the PartitionData, so there is no need
to specify the redundant ID when configuring the mock.
2022-09-30 16:32:39 +02:00
Dom Dwyer c33499764d test: share populate_catalog() across tests
Parametrises test_util::populate_catalog() and exports for re-use in
ingester tests.
2022-09-30 16:32:37 +02:00
Dom Dwyer fc47f6ab8f test: re-use test_utils::make_op
Share the make_op helper across all tests in the Ingester.
2022-09-30 16:32:36 +02:00
Dom Dwyer f0885612e9 test: shared mock LifecycleHandle impl
Moves the NoopLifecycleHandle to the Ingester's test_utils to share it
across multiple components.
2022-09-30 16:32:34 +02:00
Dom Dwyer e84186763f refactor: LifecycleStats tracks Namespace/TableId
Changes the lifecycle handle to also track the namespace + table ID in
addition to the existing shard ID.

Adds asserts to ensure the values never vary for a given partition.
2022-09-30 15:29:39 +02:00
Dom Dwyer 726b1d1d3b refactor: PartitionData carries parent IDs
This commit changes the PartitionData buffer structure to carry the IDs
of all its parents - the table, namespace, and shard. Previously only
the table & shard were carried.
2022-09-29 15:07:03 +02:00
Dom e9bd03b77c
Merge branch 'main' into dom/partition-contains-key 2022-09-29 12:32:35 +01:00
Dom Dwyer f5a7fbf8e2 refactor: PartitionData carries PartitionKey
Changes the PartitionData to carry the derived PartitionKey for which it
is buffering ops. This is used at persist time.
2022-09-29 13:22:50 +02:00
Dom Dwyer cd4087e00d style: add no todo!() or dbg!() lints
Some crates had them, some not - let's be consistent and have the
compiler spot dbg!() and todo!() macro calls - they should never be in
prod code!
2022-09-29 13:10:07 +02:00
kodiakhq[bot] 54e68637dc
Merge branch 'main' into dom/partition-cache 2022-09-28 15:22:40 +00:00
Dom Dwyer 82b7479f97 refactor(write_buffer): seek error at seek time
Moves the "you've tried to seek into the future!" error to the point at
which the seek attempt was made.

This makes more sense than deferring the seek error until read time, and
the condition is easier to detect at seek time than at read time (where
the read response error contains an invalid high_watermark value of -1,
making it impossible to conclusively determine what has happened).
2022-09-28 16:44:59 +02:00
Dom Dwyer 5f2f735c7e fix: spurious watermark < read offset panic
In staging we observed an ingester panic due to the write buffer stream
yielding a WriteBufferErrorKind::SequenceNumberAfterWatermark,
suggesting the ingester was attempting to read from an offset that
exceeds the current max write offset in Kafka (high watermark offset).

This turned out not to be the case - the partition had a single write at
offset 2, and the ingester was attempting to seek to offset 1. The first
read would fail (offset 1 does not exist) and the error handling did not
account for the high watermark not being correctly set (-1 in the
response).

I have no idea why rskafka returns this watermark / doesn't retry / etc
but this change will allow the ingesters to recover.
2022-09-28 15:22:34 +02:00
Dom Dwyer 8cf81f457a perf(ingester): amortise Partition cache memory
Remove each cache hit from the partition cache, as each partition should
be looked up at most once.

This amortises the memory usage of the cache, as it should be "drained"
of hot partitions.
2022-09-27 17:16:18 +02:00
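The "remove each cache hit" behaviour in 8cf81f457a can be sketched with a map that hands out entries by value. `DrainingCache` and its method names are hypothetical.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// A pre-warmed cache where each entry can be taken at most once.
struct DrainingCache<V> {
    entries: Mutex<HashMap<String, V>>,
}

impl<V> DrainingCache<V> {
    fn new(entries: HashMap<String, V>) -> Self {
        DrainingCache {
            entries: Mutex::new(entries),
        }
    }

    /// A hit removes the entry, so the cache shrinks as it is used.
    fn take(&self, key: &str) -> Option<V> {
        self.entries.lock().unwrap().remove(key)
    }

    fn len(&self) -> usize {
        self.entries.lock().unwrap().len()
    }
}
```

Since each partition is expected to be looked up at most once, nothing is lost by removing the entry, and the memory held by the cache falls towards zero as the hot partitions are resolved.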
Dom Dwyer 1311a8746d refactor(ingester): use Partition cache
Cache the 10,000 most recent partitions at startup, and share them
across all shards.

At commit time, there are approximately 8,000 partitions per day, per
ingester, so this should cache all of the partitions for a given day so
far at startup.
2022-09-27 17:15:59 +02:00
Dom Dwyer 2068ff394b perf(ingester): cache Partition
This commit implements a PartitionCache decorator over the
PartitionProvider abstraction.

When an ingester starts up, the internal data structures are empty and
are lazily initialised for each namespace / table / partition as they
are observed in the stream of DML ops.

This lazy initialisation includes resolving the Partition ID and last
persisted sequence number offset value from the catalog for each
partition in each table in each namespace for which an op is observed -
this occurs in the hot path, while blocking ingest for a shard. Because
resolving each partition causes a catalog query, this can cause a
spike in queries against the catalog, also resulting in unnecessarily
slow ingester recovery - we're effectively lazily warming a cache of
PartitionData in the hot path!

Instead this cache can be used to pre-warm the N most recently created
partitions (which are likely to have ongoing writes) at startup to
eliminate the hot-path overhead and associated catalog queries.

NOTE: unlike most of the other hot-path queries, partition persist
offset resolution cannot be eliminated by changes to the Kafka wire
format.
2022-09-27 17:15:57 +02:00
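The decorator arrangement described in 2068ff394b might be sketched like this. The one-method trait, the `i64` return value and the fake `CatalogProvider` are simplifying assumptions; the real PartitionProvider is asynchronous and returns a full PartitionData.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Simplified stand-in for the provider abstraction: resolves a
/// partition key to a partition ID.
trait PartitionProvider {
    fn get_partition(&self, key: &str) -> i64;
}

/// Pretend catalog-backed provider that counts the queries it issues.
struct CatalogProvider {
    queries: AtomicUsize,
}

impl PartitionProvider for CatalogProvider {
    fn get_partition(&self, key: &str) -> i64 {
        self.queries.fetch_add(1, Ordering::Relaxed);
        key.len() as i64 // fake ID, for illustration only
    }
}

/// Decorator: answers from the pre-warmed map when possible, otherwise
/// delegates to the wrapped provider.
struct CachedProvider<P> {
    warm: HashMap<String, i64>,
    inner: P,
}

impl<P: PartitionProvider> PartitionProvider for CachedProvider<P> {
    fn get_partition(&self, key: &str) -> i64 {
        match self.warm.get(key) {
            Some(id) => *id,
            None => self.inner.get_partition(key),
        }
    }
}
```

Because the decorator implements the same trait as the provider it wraps, callers do not need to know whether a cache is present.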
Dom Dwyer a3d6e7a45a refactor(ingester): server-wide PartitionProvider
Lifts the PartitionProvider initialisation higher in the stack to a
point where a single instance can be used across all shards an ingester
manages.

This is a pre-requisite for sharing a cache of Partitions across all
shards.
2022-09-27 17:15:31 +02:00
Dom Dwyer 38ebd5fb20 test: simplify partition provider mock
Removes redundant fields from the MockPartitionProvider.
2022-09-27 17:11:13 +02:00
Dom 2ef04e99da
Merge branch 'main' into dom/non-pub-shard 2022-09-27 14:21:23 +01:00
Andrew Lamb 66dbb9541f
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 (#5694)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0

* chore: Update thrift / remove parquet_format

* fix: Update APIs

* chore: Update lock + Run cargo hakari tasks

* fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-27 12:50:54 +00:00
Dom Dwyer b873297fad refactor(ingester): limit visibility
Marks many internal data structures as non-pub.

Many remain as they're used across tests / from multiple callers
"peeking", but this limits the scope of false sharing in the future.
2022-09-27 14:27:32 +02:00
Dom Dwyer 11be746dc0 refactor: internalise ShardData init
Move the initialisation of ShardData (an internal ingester data
structure) into the ingester itself.

Previously callers would initialise the ingester state, and pass it into
the IngesterData constructor.
2022-09-27 14:26:17 +02:00
Dom Dwyer 61aecc3044 refactor: decouple partition init from table
Removes the "how" of initialising a per-partition buffer structure
(PartitionData) from the per-table buffer (TableData).

This is a cleaner separation of concerns - a table buffer is responsible
for addressing and initialising per-table partitions as necessary, and
buffering of ops for them. It does not have to be concerned with the
series of steps necessary to look up the various bits of data in order
to construct a PartitionData.

This abstract provider can be layered up to provide more complex
behaviours - I intend to add a read-through cache impl that decorates
the catalog impl in this commit, which should eliminate most partition
queries at ingester startup utilising the indirection added here.
2022-09-26 14:35:15 +02:00
Carol (Nichols || Goulding) c8108f01e7
chore: Upgrade to Rust 1.64 (#5727)
* chore: Upgrade to Rust 1.64

* fix: Use iter find instead of a for loop, thanks clippy

* fix: Remove some needless borrows, thanks clippy

* fix: Use then_some rather than then with a closure, thanks clippy

* fix: Use iter retain rather than filter collect, thanks clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 18:04:00 +00:00
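The kinds of rewrites listed in c8108f01e7 look roughly like the following. These are generic standard library idioms, not the actual changed code; the function names are made up for illustration.

```rust
/// `Iterator::find` in place of a hand-written `for` loop with an early return.
fn first_even(values: &[i32]) -> Option<i32> {
    values.iter().copied().find(|v| v % 2 == 0)
}

/// `bool::then_some` in place of `then(|| ...)` when the value is cheap
/// to construct and needs no lazy evaluation.
fn label_if(enabled: bool) -> Option<&'static str> {
    enabled.then_some("enabled")
}

/// `Vec::retain` in place of filtering into a new collection and
/// assigning it back.
fn drop_negatives(values: &mut Vec<i32>) {
    values.retain(|v| *v >= 0);
}
```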
Marco Neumann 55ef272920
refactor: acquire table locks concurrently (#5722)
Waiting for one after the other (one per shard) in serial fashion
likely increases latency too much.
2022-09-22 10:56:22 +00:00
Marco Neumann 365a246f8d
refactor: do not run de-dup in ingester for querier requests (#5626)
* refactor: do not run de-dup in ingester for querier requests

This removes the entire de-dup logic from the ingester for querier
requests. Furthermore, it even removes the entire datafusion execution
from the querier and just dumps the in-memory record batches as quickly
as possible. No filters are applied. Note that even prior to this PR,
we've never applied projections (tracked by #5624).

**Pros:**

- speed up query planning within the querier (since we need the ingester
  response for state reconciling)
- lowered ingester CPU load

**Cons:**

- more querier<>ingester network traffic

Closes #5602.

* test: extend query test case

* fix: ingester tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 07:33:54 +00:00
Marco Neumann c66f16e4af
fix: ingester retries (#5708)
* fix: retry ingester requests faster

The retries introduced in #5695 are too slow and block the entire
querier for minutes (until the very long gRPC timeout kicks in).

* fix: add error details on why the query planning failed

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-21 09:27:47 +00:00
Dom Dwyer c6fe0dab3e refactor(ingester): reduced internal visibility
Changes many pub fields / methods to be pub(super), or if necessary,
pub(crate).

This helps maintain an internal API boundary for code hygiene, and helps
identify functions that are unused / only used in tests (which I've
annotated with cfg(test) and intend to remove - we should be driving
code under test via the public API rather than using test-only state
mutation, otherwise we're just testing our tests!)
2022-09-20 16:24:27 +01:00
Dom Dwyer 6d00d6b683 test(ingester): refactor querier API tests
This commit changes the prepare_data_to_querier() tests to drive the
ingester state by applying DML ops, therefore driving the prod code
paths (and testing them!) rather than having the tests set up what the
tests believe is the correct internal ingester state, and then asserting
on that state.

This gives us much better coverage of prod code paths, decouples the
tests from the internal state/representation of ingesters (making the
tests less fragile), and removes a bunch of special-cased, test-only
functions that are functionally similar, but not the same as, the prod
functions.

Unblocks #5658, further clean-up to come.
2022-09-20 16:24:27 +01:00
dependabot[bot] 4fbb32eed6
chore(deps): Bump tokio-stream from 0.1.9 to 0.1.10 (#5667)
Bumps [tokio-stream](https://github.com/tokio-rs/tokio) from 0.1.9 to 0.1.10.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-stream-0.1.9...tokio-stream-0.1.10)

---
updated-dependencies:
- dependency-name: tokio-stream
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-19 07:36:11 +00:00
Dom Dwyer b0eb85ddd5 refactor: store ShardId in child nodes
Instead of passing the ShardId into each function for child nodes of the
Shard, store it. This avoids the possibility of mistakenly passing the
wrong value.
2022-09-16 18:00:11 +02:00
Dom Dwyer 07b08fa9cb refactor: add table name in PartitionData
A partition belongs to a table - this commit stores the table name in
the PartitionData (which was readily available at construction time)
instead of redundantly passing it into various functions at the risk of
getting it wrong.
2022-09-16 17:59:22 +02:00
Dom Dwyer c7ba0bea91 refactor: ShardId & TableId in PartitionData
When we construct a PartitionData we have the ShardId and TableId. This
commit stores them in the PartitionData for later use, rather than
repeatedly passing them in again when constructing snapshots, at the
risk of passing the wrong IDs.
2022-09-16 17:17:16 +02:00
Dom Dwyer 85d6efafe1 refactor: snapshot_to_persisting redundant ID
Partition::snapshot_to_persisting() passes the ID of the partition it is
calling `snapshot_to_persisting()` on. The partition already knows what
its ID is, so at best it's redundant, and at worst, inconsistent with
the actual ID.
2022-09-16 17:08:08 +02:00
Dom Dwyer ce0d189260 perf: O(1) partition persist mark discovery
Changes the ingest code path to eliminate scanning the parquet_files
table to discover the last persisted offset per partition, instead
utilising the new persisted_sequence_number field on the Partition
itself to read the same value.

This lookup blocks ingest for the shard, so removing the expensive query
from the ingest hot path should improve catch-up time after a
restart/deployment.
2022-09-16 14:06:42 +02:00
Dom Dwyer 66bf0ff272 refactor(db): NULLable persisted_sequence_number
Makes the partition.persisted_sequence_number column in the catalog DB
NULLable. 0 is a valid persisted sequence number.
2022-09-15 18:19:39 +02:00
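Why a NULLable column matters when 0 is a valid sequence number can be shown with an `Option`. The `already_persisted` helper below is hypothetical and only illustrates the distinction between "nothing persisted yet" and "persisted up to 0".

```rust
/// Returns true if an op with this sequence number is already covered
/// by the persisted watermark. `None` models a NULL column value.
fn already_persisted(op_sequence_number: i64, persisted: Option<i64>) -> bool {
    match persisted {
        Some(max) => op_sequence_number <= max,
        None => false,
    }
}
```

Had 0 been used as the "nothing persisted" marker, the op with sequence number 0 could not be told apart from one that had already been persisted.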
Dom Dwyer 234d460fcb chore: rename update_persisted_sequence_number fn 2022-09-15 16:10:35 +02:00
Dom Dwyer f91d802107 feat: store per-partition persist markers
Changes the ingester to record the per-partition, maximum persisted
sequencer offsets to the catalog. This will enable quick O(1) lookup in
the future, but the currently persisted value is only used to assert the
per-partition monotonic persist ordering invariant.
2022-09-15 16:10:35 +02:00
Dom Dwyer 300938f858 refactor: assert partition persistence ordering
Assert the per-shard / per-partition persistence watermarks
monotonically increase, and document the invariant.

NOTE: this is not a new invariant, just a new assertion to validate it.
2022-09-15 16:10:35 +02:00
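An assertion of the kind described in 300938f858 might be sketched as follows. The `Partition` struct and `mark_persisted` method are illustrative, and the sketch assumes that a repeated (equal) watermark is tolerated while a lower one is a bug.

```rust
/// Illustrative per-partition state: the highest persisted sequence number.
#[derive(Debug, Default)]
struct Partition {
    max_persisted: Option<i64>,
}

impl Partition {
    /// Records a new persistence watermark, panicking if it would move
    /// backwards (assumption: equal values are allowed).
    fn mark_persisted(&mut self, sequence_number: i64) {
        if let Some(prev) = self.max_persisted {
            assert!(
                sequence_number >= prev,
                "persist watermark moved backwards: {} -> {}",
                prev,
                sequence_number
            );
        }
        self.max_persisted = Some(sequence_number);
    }
}
```

The assertion does not introduce the invariant; it only makes a violation fail loudly at the point where it happens.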
Dom Dwyer d199a83355 feat(catalog): per-partition persist mark API
Adds the "persisted_sequence_number" field to the Partition model, and
updates the catalog API to read & update it.
2022-09-15 16:10:35 +02:00
Dom Dwyer fc17f2ec2d refactor: hoist persistence watermark from buffer
The maximum persisted sequence number is tracked to answer "up to where
has this partition been persisted", used for querying and skipping
writes that have already been applied (though I suspect this is
redundant).

This is a property of the partition, not the actual data buffer, so this
commit hoists it up out of the data buffer and onto the per-partition
data structure, internalising the field in the process (not pub).
2022-09-14 18:07:45 +02:00
Dom Dwyer ee8cdb48af style(ingester): fmt imports & long strings
Rewrite the imports to be a consistent order; std, external, crate and
merge all crate-level imports into one use statement.
2022-09-14 14:20:19 +02:00
Dom Dwyer 074722eb3e refactor(ingester): split data.rs into modules
Breaks the gigantic data.rs file into sub-modules for Shard, Namespace,
Table, Partition, and finally the actual data buffer used to store
writes.
2022-09-14 14:20:19 +02:00
Andrew Lamb f86d3e31da
chore: Update datafusion + object_store (#5619)
* chore: Update datafusion pin

* chore: update object_store to 0.5.0

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-13 12:34:54 +00:00
Marco Neumann e4a66f69c7
refactor: dedicated query path for querier tombstone materialization (#5618)
I would like to remove de-dup from the querier<>ingester query path in #5602,
but the tombstone application and parquet write path can still do the full work.
2022-09-13 07:16:38 +00:00
Andrew Lamb 1fd31ee3bf
chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 (#5591)
* chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0

* fix: enable dynamic comparison flag

* chore: derive Eq for clippy

* chore: update explain plans

* chore: Update sizes for ReadBuffer encoding

* chore: update more tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-12 17:45:03 +00:00
Dom Dwyer 6342be2fa9 refactor: consistent logging
Changes the ingester to log ALL reasons for a partition being marked for
persistence, rather than just one of the 4 previously. Fields are
consistent, but verbose / repetitive.

Also cleans up some misleading messages like "updating ..." to be logged
only when the update actually takes place.
2022-09-12 15:57:27 +02:00
Marco Neumann 8933f47ec1
refactor: make `QueryChunk::partition_id` non-optional (#5614)
In our data model, a chunk always belongs to a partition[^1], so let's
not make this attribute optional. The optional value only leads to
-- mostly surprising -- conditional behavior, ranging from "do not equalize
the partition sort key" (querier) to "always consider the chunk overlapping"
(iox_query when dealing with ingester chunks).

[^1]: This is even true when the chunk belongs to a parquet file that is not
      yet added to the catalog, contrary to what a comment in the ingester
      stated. The catalog and data model used by the querier are two totally
      different things.
2022-09-12 13:52:51 +00:00
Marco Neumann caa0dfd1e0
refactor: query code clean ups (#5612)
* refactor: remove dead code

* refactor: `Deduplicator::build_scan_plan` consumes `self`

There is no good reason to use the same `Deduplicator` twice. In
contrast I'm quite sure that this would lead to nasty bugs, because
`split_overlapped_chunks` exits early in some cases so the 2nd plan
would have old and new chunks mixed together.
2022-09-12 13:00:56 +00:00
dependabot[bot] 786ce75e26
chore(deps): Bump tokio-util from 0.7.3 to 0.7.4 (#5596)
Bumps [tokio-util](https://github.com/tokio-rs/tokio) from 0.7.3 to 0.7.4.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-util-0.7.3...tokio-util-0.7.4)

---
updated-dependencies:
- dependency-name: tokio-util
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-09 07:40:16 +00:00
YIXIAO SHI 52ae60bf2e
chore: fix comment typo (#5551)
Co-authored-by: Dom <dom@itsallbroken.com>
2022-09-07 08:49:29 +00:00
Marco Neumann adeacf416c
ci: fix (#5569)
* ci: use same feature set in `build_dev` and `build_release`

* ci: also enable unstable tokio for `build_dev`

* chore: update tokio to 1.21 (to fix console-subscriber 0.1.8)

* fix: "must use"
2022-09-06 14:13:28 +00:00
Marco Neumann 87772a6aec
refactor: debug log improvements (#5553)
* feat: extend log output for ingester responses

* feat: add debug log for parquet `read_filter` calls

* feat: add debug log to `get_write_info`

* feat: add debug log parquet cache invalidation
2022-09-05 13:54:13 +00:00
dependabot[bot] 7c61bdcf35
chore(deps): Bump paste from 1.0.8 to 1.0.9 (#5526)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.8 to 1.0.9.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.8...1.0.9)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-01 12:07:53 +00:00
Dom ed2490deb2
Merge branch 'main' into dom/ingester-row-limit 2022-08-31 14:56:42 +01:00
Dom Dwyer 2a19606456 feat(ingester): restrict partition row count
This limit restricts a single partition to containing at most N rows
before it is marked for persistence (note: being marked for persistence
does not currently prevent further ingest for that partition.)
2022-08-31 15:48:18 +02:00
Andrew Lamb 6669d85fb4
chore: Update datafusion + arrow/parquet to `21.0.0` (#5519)
* chore: Update arrow/arrow-flight/parquet to 21.0.0

* chore: Update datafusion pin

* chore: Fix arrow update script

* chore: Update Cargo.lock

* chore: Update for new API
2022-08-31 13:30:47 +00:00
Nga Tran cb10a7c6d8
feat: More accurate memory estimate for compaction (#5471)
* feat: initial implementation of memory estimation for a compaction

* feat: estimate size of files and have the right actions for the needed budget

* feat: run candidates in parallel

* fix: have the right name for the column field of the output struct

* feat: add metrics for estimated budgets

* chore: cleanup

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fix syntax after applying review's suggestions

* refactor: Convert a Vec to VecDeque to go well with pop and push

* chore: remove max_concurrent_size_bytes and input_size_threshold_bytes

* chore: remove input_file_count_threshold

* test: tests for estimate_arrow_bytes_for_file

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-30 13:44:44 +00:00
Carol (Nichols || Goulding) dbd27f648f
refactor: Rename more mentions of Kafka to their other name where appropriate 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 58f0b63cdc
refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 74c9529062
fix: Rename KafkaPartition to ShardIndex 2022-08-29 14:07:18 -04:00
Carol (Nichols || Goulding) c9567cad7d
fix: Rename some more sequencer to shard 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) fe9c474620
fix: rustfmt 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) fbae4282df
fix: Rename another sequencer to shard to be hopefully clearer 2022-08-29 14:06:45 -04:00
Jake Goulding 4abf21c724
refactor: Rename Sequencer (and its entourage) to Shard 2022-08-29 14:06:43 -04:00
Andrew Lamb 35f99fe940
fix: fix intermittent failures in `data::tests::persist` (#5437)
* fix: fix intermittent failures in data::tests::persist

* fix: tweak comments and message

* fix: space
2022-08-19 21:16:00 +00:00
kodiakhq[bot] 2b3ca54168
Merge branch 'main' into cn/upgrade-l0-metrics 2022-08-17 16:01:42 +00:00
Andrew Lamb 7f0ae53d6f
chore: Update to (almost) released object_store 0.4.0 (#5419)
* chore: update object_store

* chore: update hakari config

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-17 13:44:48 +00:00
Carol (Nichols || Goulding) ed44817ed1
feat: Add a histogram of ingested (new L0) Parquet file sizes
Connects to #5348.
2022-08-15 10:13:54 -04:00
Marco Neumann 6b8b922fe7
fix: do not lose data when Kafka reports that offset is above watermark (#5322)
* fix: do not lose data when Kafka reports that offset is above watermark

This can happen in certain cluster rebalance settings.

This is also linked to https://github.com/influxdata/rskafka/issues/147
but for the upstream issue I currently have no idea how to fix it, so
let's at least harden IOx against it.

Fixes #5128.

* refactor: panic for `SequenceNumberAfterWatermark`
2022-08-11 07:32:04 +00:00
Andrew Lamb 3a945dbcb2
chore: return a struct with named and documented fields from `compact_persisting_batch` (#5346)
* chore: return a struct with named and documented fields from `compact_persisting_batch`

* docs: Remove extra 'the' and fix a typo

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
2022-08-10 20:22:29 +00:00
Andrew Lamb 16ddc5efc6
chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360)
* chore: Update datafusion and arrow

* chore: Update Cargo.lock

* chore: update to Decimal128

* chore: Update tonic/prost/pbjson/etc

* chore: Run cargo hakari tasks

* fix: doctest in generated types

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-09 17:30:44 +00:00
Andrew Lamb 7219f512c3
fix: update sort key in catalog before adding parquet file to catalog (#5333)
* fix: update sort key before parquet file

* fix: Remove left over debugging

* fix: fix bug, improve logging

* chore: move debug log after catalog update, improve args and docs
2022-08-09 10:27:51 +00:00
Marco Neumann 9fbc95c3ad
feat: add sequencer reset count metric and log to ingester (#5286)
Split out from #5253.

Helps with #5128.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 13:00:36 +00:00
dependabot[bot] 94fe5b4c10
chore(deps): Bump paste from 1.0.7 to 1.0.8 (#5280)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.7 to 1.0.8.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.7...1.0.8)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 09:03:25 +00:00
dependabot[bot] fbd39844d8
chore(deps): Bump async-trait from 0.1.56 to 0.1.57 (#5247)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.56 to 0.1.57.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.56...0.1.57)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-01 08:30:33 +00:00
Andrew Lamb 9215a534d0
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0`

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 08:10:47 +00:00
Marko Mikulicic 9da8062a16
fix: Fix typo in log message (#5222)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-27 15:34:37 +00:00
Marco Neumann 9a9a1a4777
feat: limit per-table chunk data for every query (#5223)
* feat: `QueryChunk::as_any`

* feat: allow `ChunkPruner::prune_chunks` to fail

* feat: limit per-table chunk data for every query

Closes #5211.

* fix: address review comments

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-07-27 13:20:05 +00:00
Andrew Lamb fbf672015e
refactor: Reduce ceremony required to create a `Span` from `SpanContext` (#5181)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-22 11:19:38 +00:00
Nga Tran 69cb3f2b19
refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151)
* refactor: remove min_sequence_number

* fix: typos

* fix: remove min_sequencer_number from new files from merging main

* fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier

* test: add test_compactor_collision back but modify the input to make it work with new changes

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 13:51:54 +00:00
Marco Neumann 0561423475
refactor: enforce proper `IOxSessionContext` (#5158)
- remove `IOxSessionContext::default()` because untracked contexts
  should only be created by tests
- remove `Option<IOxSessionContext>` because it is a typed workaround
  for `IOxSessionContext::default`

Tests should use `IOxSessionContext::testing` and all _normal_ users
should create proper contexts.

I suspect this will help tracing or at least prevent silent regressions.
See #5129.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-20 16:25:43 +00:00
dependabot[bot] 278a7f91af
chore(deps): Bump bytes from 1.1.0 to 1.2.0 (#5156)
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-20 10:00:08 +00:00
Andrew Lamb e2d871b00b
chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079)
* chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18

* chore: Run cargo hakari tasks

* fix: update cargo pin

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-18 15:01:03 +00:00
Andrew Lamb 5bebff0b06
Revert "feat: skip ingester buffering if INFLUXDB_IOX_INGESTER_SKIP_BUFFER is set" (#5116)
This reverts commit ca6875f60bec935eb6079b684d6eaa0cbc8a5306.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-15 13:22:45 +00:00
kodiakhq[bot] 18ffe581b5
Merge branch 'main' into dependabot/cargo/tokio-1.20.0 2022-07-14 14:18:51 +00:00
Marco Neumann 512f9850ee
refactor: ingester seek log debug => info (#5127)
This message will be printed once per partition on ingester startup and
shouldn't be too noisy, but is very helpful to judge "replay" /
"catch-up".
2022-07-14 10:28:16 +00:00
dependabot[bot] 9b67de2f43
chore(deps): Bump tokio from 1.19.2 to 1.20.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-14 01:21:43 +00:00
Carol (Nichols || Goulding) 61c023139b
refactor: Switch compaction levels to an enum with values rather than separate consts
Bonuses:

- Type checking
- Validation
- Less casting
- Exhaustiveness checking
- Less use of the numerical value
2022-07-13 11:30:36 -04:00
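The bonuses listed for commit 61c023139b can be illustrated with a minimal sketch. The variant names `Initial` and `Compacted`, the `i16` representation, and the `describe` helper are assumptions made for illustration only; they are not taken from the commit itself.

```rust
use std::convert::TryFrom;

/// Hypothetical compaction level enum carrying explicit numeric values,
/// in place of separate integer constants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(i16)]
pub enum CompactionLevel {
    /// Assumed name for files freshly written by the ingester.
    Initial = 0,
    /// Assumed name for files that have been compacted.
    Compacted = 1,
}

impl TryFrom<i16> for CompactionLevel {
    type Error = String;

    /// Validation: integers outside the known levels are rejected.
    fn try_from(value: i16) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Self::Initial),
            1 => Ok(Self::Compacted),
            other => Err(format!("invalid compaction level: {}", other)),
        }
    }
}

/// Exhaustiveness checking: adding a variant makes this match fail to
/// compile until the new case is handled.
pub fn describe(level: CompactionLevel) -> &'static str {
    match level {
        CompactionLevel::Initial => "initial",
        CompactionLevel::Compacted => "compacted",
    }
}
```

With this shape, the numeric value is only needed at storage boundaries (`level as i16`), and every other call site works with the typed variant.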
Andrew Lamb 64b6b4fd6f
feat: skip ingester buffering if INFLUXDB_IOX_INGESTER_SKIP_BUFFER is set (#5115)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-13 14:21:06 +00:00
Andrew Lamb c46e1c6347
chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021)
* fix: correct nullability declaration of system tables

* chore: Update datafusion and arrow/parquet/arrow-flight

* chore: Run cargo hakari tasks

* fix: Update tests

* fix: Update tests

* fix: predicate pruning

* fix: add some tests

* fix: query_functions

* fix: fix read_buffer test

* fix: fix clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 19:22:15 +00:00
Marco Neumann aacdeaca52
refactor: prep work for #5032 (#5060)
* refactor: remove parquet chunk ID to `ChunkMeta`

* refactor: return `Arc` from `QueryChunk::summary`

This is similar to how we handle other chunk data like schemas. This
allows a chunk to change/refine its "belief" over its own payload while
it is passed around in the query stack.

Helps w/ #5032.
2022-07-07 13:21:48 +00:00
Andrew Lamb 8f5210ea3e
test: add test for "duration since production" in kafka `write_buffer` implementation (#5043)
* test: add test for timestamps in kafka write buffer

* refactor: move timestamp batching test to generic tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 10:27:27 +00:00
Marco Neumann 16bd3e67c0
refactor: unify `apply_predicate_to_metadata` (#5030)
Instead of using some hand-rolled timestamp-based logic (or just
"unknown") all over the place, just use logic introduced in #5017.

This requires slightly improved table summaries within the querier that
at least has min/max for the timestamp column. For that, the former
`IngesterChunk`-specific `calculate_summary` method was extended to
`create_basic_summary` to include that data and is now also used by
`QuerierParquetChunk`.

Note: `QuerierRBChunk` already has detailed metrics that are provided
by the read buffer implementation.

Should we ever need even better pruning for `QuerierParquetChunk` (or
`IngesterChunk`) then we _only_ need to add extra data to the table
summaries.

Closes #4976.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-05 12:51:59 +00:00
Marco Neumann be53716e4d
refactor: use IDs for `parquet_file.column_set` (#4965)
* feat: `ColumnRepo::list_by_table_id`

* refactor: use IDs for `parquet_file.column_set`

Closes #4959.

* refactor: introduce `TableSchema::column_id_map`
2022-06-30 15:08:41 +00:00
Raphael Taylor-Davies 835e1c91c7
chore: update object_store to 0.3.0 (#4707)
* chore: update object_store to 0.3.0

* chore: review feedback

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-29 21:44:03 +00:00
Markus Westerlind edf3f08e81 refactor: Replace all uses of lazy_static with once_cell
Went through and replaced all lazy_static uses with once_cell (while waiting for the project to compile). There are still dependencies using lazy_static so it is still in the crate graph but at least there isn't an explicit dependency on it (and it is easier to update to `std::lazy::Lazy` once that is stable).
2022-06-29 16:22:02 +02:00
Nga Tran cfcc4b8426
refactor: change level 1 to level 2 preparing for next design changes (#4954)
* refactor: change level 1 to level 2 preparing for next design changes

* fix: make level-2 consistent everywhere

* chore: remove unused comments

* refactor: change all the names level_1 to level_2 to completely replace 1 with 2 to make everything consistent

* chore: add corresponding constants for the compaction levels in the comments

Co-authored-by: Dom <dom@itsallbroken.com>
2022-06-29 14:08:58 +00:00
Andrew Lamb bfddb032ce
docs: improve docs for `persist_partition_size_threshold_bytes` / `INFLUXDB_IOX_PERSIST_PARTITION_SIZE_THRESHOLD_BYTES` (#4877)
* docs: improve docs for `persist_partition_size_threshold_bytes` / `INFLUXDB_IOX_PERSIST_PARTITION_SIZE_THRESHOLD_BYTES`

* docs: improve comments about LifecycleConfig::partition_size_threshold

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-27 21:52:40 +00:00
Marco Neumann 215f297162
refactor: parquet file metadata from catalog (#4949)
* refactor: remove `ParquetFileWithMetadata`

* refactor: remove `ParquetFileRepo::parquet_metadata`

* refactor: parquet file metadata from catalog

Closes #4124.
2022-06-27 15:38:39 +00:00
Nga Tran 3c0fb6e8ef
fix: avoid using min_time, which can be negative, for ChunkId. Using object store id which is uuid instead (#4942)
* fix: avoid using min_time, which can be negative, for ChunkId. Using object store id which is uuid instead

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* chore: run fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-23 19:00:13 +00:00
Andrew Lamb 49b34e1135 test: add appropriate tests 2022-06-23 11:50:55 -04:00
Andrew Lamb fb4c3ed294 fix: revert test change 2022-06-23 11:34:59 -04:00
Dom Dwyer 9a79d16585 fix: account for partition memory until persisted
The ingester maintains a rough "total memory in use" counter it uses to
try and limit the amount of memory the ingester is using overall.

When a partition is persisted, this total memory usage value is adjusted
to account for releasing the partition memory. Prior to this commit, the
ordering was:

* Writes increase the memory counter
* maybe_persist() is called to trigger persistence
* A partition is identified for persistence
* Partition memory usage is released back to the total memory counter
* Persistence starts

This meant that the partitions in the process of being persisted were
not accounted for in the ingester's total memory counter, and therefore
we could significantly overrun the configured memory limit.

After this commit, the ordering is:

* Writes increase the memory counter
* maybe_persist() is called to trigger persistence
* A partition is identified for persistence
* Persistence starts
* Persistence completes
* Partition memory usage is released back to the total memory counter

This ensures persisting partitions are still tracked in the total memory
counter, causing pauses to correctly fire.
2022-06-23 15:40:51 +01:00
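The corrected ordering described in commit 9a79d16585 can be sketched roughly as follows. `MemoryTracker`, its fields, and its method names are hypothetical and invented for this illustration; the real ingester lifecycle code is not reproduced here.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical stand-in for the ingester's rough "total memory in use" counter.
pub struct MemoryTracker {
    total_bytes: AtomicUsize,
    pause_limit_bytes: usize,
}

impl MemoryTracker {
    pub fn new(pause_limit_bytes: usize) -> Self {
        Self {
            total_bytes: AtomicUsize::new(0),
            pause_limit_bytes,
        }
    }

    /// Writes increase the memory counter.
    pub fn record_write(&self, bytes: usize) {
        self.total_bytes.fetch_add(bytes, Ordering::SeqCst);
    }

    /// True when ingest should pause because the limit has been reached.
    pub fn should_pause(&self) -> bool {
        self.total_bytes.load(Ordering::SeqCst) >= self.pause_limit_bytes
    }

    pub fn total(&self) -> usize {
        self.total_bytes.load(Ordering::SeqCst)
    }

    /// Runs the persist job first and releases the partition's bytes only
    /// after it completes, so in-flight partitions still count towards the
    /// total while they are being persisted.
    pub fn persist_partition<F: FnOnce()>(&self, partition_bytes: usize, persist: F) {
        persist(); // persistence starts and completes
        self.total_bytes.fetch_sub(partition_bytes, Ordering::SeqCst); // then release
    }
}
```

Swapping the two statements in `persist_partition` reproduces the earlier behaviour, in which a partition being persisted no longer counted against the limit.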
Dom Dwyer 87af3848d1 refactor: remove unused errors
These errors are not referenced, but are hidden from the "unused" lint
because of the macro magic code generation.
2022-06-23 11:24:30 +01:00
Andrew Lamb 16c558e11e
refactor: Make some structures in `LifecycleManager` non pub (#4929)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-23 09:55:39 +00:00
Dom Dwyer 75a3fd5e1e refactor: use propagated partition key in ingester
Changes the ingester to use the partition key derived in the router, and
transmitted over through the kafka API boundary.

This should have no observable behavioural change, but be more resilient
as we're no longer assuming the partitioning algorithm produces the same
value in both the router (where data is partitioned) and the ingester
(where data is persisted, segregated by partition key).

This is a pre-requisite to allowing the user to specify partitioning
schemes.
2022-06-21 15:57:30 +01:00
Marco Neumann c3912e34e9
refactor: store per-file column set in catalog (#4908)
* refactor: store per-file column set in catalog

Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.

The querier will use this to directly load parquet files into the read
buffer.

**WARNING: This requires a catalog wipe!**

Ref #4124.

* refactor: use proper `ColumnSet` type
2022-06-21 10:26:12 +00:00
Andrew Lamb f151b1e89f
fix: categorize `NamespaceNotFound` as ingester not found errors as well (#4899)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-20 08:40:31 +00:00
Marco Neumann 0fbff981ec
chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894)
Closes #4889.
Closes #4890.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-17 10:28:28 +00:00
Marco Neumann 743c1692ea
refactor: stream query results from ingester to querier (#4875)
* refactor: stream partitions from ingester

Ref #4849.

* refactor: do not collect record batches on the ingester side

Ref #4849.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:58:50 +00:00
Andrew Lamb d67336fd69
fix(ingester): ensure all ingester metrics are prefixed with `ingester_` (#4871)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:52:35 +00:00
Andrew Lamb 74f4006580
fix(ingester): make ingester metrics start with `ingester` (#4870)
* fix(ingester): make ingester metrics start with `ingester`

* fix: Update ingester/src/stream_handler/handler.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:46:37 +00:00
Andrew Lamb 8c56909218
fix(ingester): Distinguish between "not found" and other flight errors (#4874)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 12:39:37 +00:00
Marco Neumann 4b945493be
test: test gRPC and stream flattening (#4873)
Ref #4849.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-16 11:44:59 +00:00
Marco Neumann 66c7d95312
refactor: use new ingester<>querier wire protocol (#4867)
* refactor: use new ingester<>querier wire protocol

Use and document the new and more flexible ingester<>querier wire
protocol.

Note that the ingester does NOT stream the response data yet, but the
internal data structures would allow that. A follow-up change will
adjust the ingester code to stream the data.

Ref #4849.

* fix: typos

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: clarify naming and public interface

* test: add schema assertion to `ingester_response_to_record_batches`

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-06-16 08:02:28 +00:00
Andrew Lamb 6b771375bf
feat: log when partitions are written due to going over size (#4868) 2022-06-15 20:12:43 +00:00
Dom Dwyer 4df2964566 refactor: store PartitionKey in DmlWrite
Carry the PartitionKey in the DmlWrite, allowing the batch to be
associated with a specific partition key.
2022-06-15 15:48:54 +01:00
Marco Neumann 7c60edd38c
refactor: prepare new ingester<>querier protocol on the querier side (#4863)
* refactor: prepare new ingester<>querier protocol on the querier side

This changes the querier internals to work with the new protocol. The
wire protocol stays the same (for now). There's a (somewhat hackish)
adapter in place on the querier side that converts the old to the new
protocol on-the-fly. This is an intermediate step before we actually
change the wire protocol (and in a step after that also take advantage
of the new possibilities on the ingester side).

Ref #4849.

* docs: explain adapter
2022-06-15 14:32:24 +00:00
Andrew Lamb 005610b172
refactor: remove some `&` use in iox_catalog (#4862)
* refactor: remove some `&` use in iox_catalog

* fix: Update data_types/src/lib.rs
2022-06-15 11:31:49 +00:00
Nga Tran b682dbbc2e
chore: Add debug info of sort_key for ingester (#4859)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 20:39:17 +00:00
Andrew Lamb c8f70b8933
feat: log query from querier to ingester at `info` level (#4856)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 18:35:50 +00:00
Andrew Lamb eca3b6b9a1
fix: reduce memory usage in ingester with less buffering prior to query engine (#4830)
* refactor: remove another buffer copy in ingester

* docs: Update arrow_util/src/util.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 18:22:55 +00:00
Andrew Lamb 7d2a5c299f
refactor: remove one buffer copy in the ingester (#4855)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 17:15:36 +00:00
Andrew Lamb e91d00b10c
chore: Update datafusion + `arrow`/`parquet`/`arrow-flight` to `16.0.0` (#4851)
* chore: TEMP Update DataFusion to pre-release

* chore: update arrow et al to 16.0.0

* chore: Run cargo hakari tasks

* fix: update reader read_dictionary API

* chore: Update to real Datafusion release

* fix: Update parquet API

* fix: update test

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-06-14 16:31:40 +00:00
Andrew Lamb 34e8659876
refactor: consolidate plan creation from `QueryChunk`s in `iox_query` (#4837)
* refactor: consolidate plan creation from Chunks

* docs: update docstrings

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 14:36:07 +00:00
Dom Dwyer b41ea1d718 refactor: PartitionKey type
This commit changes the code base to use a new reference-counted
PartitionKey type wrapper, instead of passing a bare String around.

This allows the compiler to type check & verify usage of the partition
key, instead of passing a bare string around. By reference counting the
underlying string, we reduce memory usage for some use cases.
2022-06-14 14:47:56 +01:00
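A reference-counted wrapper of the kind described in commit b41ea1d718 might look like the sketch below. The use of `Arc<str>` as the inner type and the helper methods `as_str` and `shares_storage_with` are assumptions for illustration; the commit message only states that the type is reference counted and wraps a string.

```rust
use std::fmt;
use std::sync::Arc;

/// Hypothetical newtype over a shared string. Cloning bumps a reference
/// count instead of copying the underlying bytes.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PartitionKey(Arc<str>);

impl PartitionKey {
    /// Borrow the key as a plain string slice.
    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// True when both keys point at the same allocation (illustrative helper).
    pub fn shares_storage_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}

impl From<&str> for PartitionKey {
    fn from(s: &str) -> Self {
        Self(Arc::from(s))
    }
}

impl fmt::Display for PartitionKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

Because functions take `PartitionKey` rather than `String`, the compiler rejects an arbitrary string passed where a partition key is expected.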
Andrew Lamb 9fdbfb05e7
refactor: Use scan_and_filter in ReorgPlanner (#4822)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-10 17:31:25 +00:00
kodiakhq[bot] dd8d44e24f
Merge branch 'main' into cn/duration 2022-06-10 14:23:09 +00:00
Nga Tran 13c57d524a
feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801)
* feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string

* test: add column with comma

* fix: use new protobuf field to avoid incompatibility

* fix: ensure sort_key is an empty array rather than NULL

* refactor: address review comments

* refactor: address more comments

* chore: clearer comments

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* fix: Rename migration so it will be applied after

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-06-10 13:31:31 +00:00
Andrew Lamb dc992209be
test: account for active writes when reporting readable status (#4782)
* test: account for active writes when reporting readable status

* fix: logical merge conflict
2022-06-10 12:59:09 +00:00
Andrew Lamb 11cec18edc
refactor: Move `scan_and_filter` into a `common` module for reuse (#4823)
* refactor: remove unused error variants

* refactor: move scan_and_filter into a module so it can be reused

* docs: update comments about pruning
2022-06-10 11:15:47 +00:00
Andrew Lamb 50697906b1
refactor: Make `DMLWrite::sequence_number` a `SequenceNumber` (#4817) 2022-06-09 19:36:37 +00:00