Commit Graph

505 Commits (50d9d4032206c374064747791fa64e1c17409e83)

Author SHA1 Message Date
Marco Neumann f511db380c
refactor: remove table name from chunks (#6063)
It should be always clear from the context to which table a chunk
belongs.

I think having a table name bound to a chunk goes back to a time where
chunks had multiple tables.

Helps with #6049.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 10:42:57 +00:00
YIXIAO SHI 586035b34d
chore: delete metric duplicate character (#6057)
* chore: delete metric duplicate character

* fix: failure ci test case

* fix: failure ci test case

* fix: failure ci test case

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-07 10:04:31 +00:00
Dom Dwyer 6fa48731aa feat: NamespaceId in DmlDelete
Changes the DmlDelete to contain the NamespaceId for which it should be
applied, propagating this value over the wire.

Like the existing IDs within the DmlWrite, these values are marked
unsafe to use due to avoid the consumers utilising them accidentally
during deployment. Unlike DmlWrite, the DmlDelete is completely unused,
so this is less of an issue.
2022-11-03 13:57:40 +01:00
Dom Dwyer 30f69ce4f6 feat: ArcMap values() snapshot
Returns a snapshot of the values within an ArcMap.
2022-11-03 11:49:01 +01:00
Dom Dwyer 17890a9906 feat: add ArcMap map type
Implements a map of K -> Arc<V> with exactly-once initialisation
semantics.

This map can be used to ensure a given key maps to singleton instances
of V; exactly what all the nodes in the ingester "buffer tree" of shard
-> namespace -> table -> partition require.

This impl contains unused funcs (silenced with an allow(dead_code)) due
to it being picked from a future branch.
2022-11-03 11:29:09 +01:00
Andrew Lamb 4fb2843d05
refactor: Rename `schema::selection::Selection` to `schema::projection::Projection` (#6037)
* chore: Rename `schema::selection::Selection` to `schema::projection::Projection`

* fix: docs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 18:15:04 +00:00
Dom Dwyer ddd6ab0ba4 refactor(write_buffer): pass IDs in wire format
This commit is part of a two-part change in order to add the table &
namespace IDs to the write buffer wire format. This commit forms the
first half; changing the producer to send the IDs.

In this commit the new ID values are never read on the consumer side,
ensuring there is no consumer dependency on them. This ensures they
remain operational during a rollout, where the consumer may be updated
to the latest code dependent on the IDs before the producer is updated
to send them. This also ensures we have a window of time where where the
consumers can be rolled back after being updated, and still handle
replaying messages in Kafka.
2022-11-02 13:28:56 +01:00
Marco Neumann 45b3984aa3
refactor: simplify `QueryChunk` data access (#6015)
* refactor: simplify `QueryChunk` data access

We have only two types for chunks (now that the RUB is gone):

1. In-memory RecordBatches
2. Parquet files

Loads of logic is duplicated in the different `read_filter`
implementations. Also `read_filter` hides a solid amount of logic from
DataFusion, which will prevent certain (future) optimizations. To enable #5897
and to simplify the interface, let the chunks return the data (batches
or metadata for parquet files) directly and let `iox_query` perform the
actual heavy-lifting.

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

* docs: improve

Co-authored-by: Andrew Lamb <alamb@influxdata.com>

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-02 08:18:33 +00:00
Marco Neumann 072439e428
refactor: mandatory `QueryChunkMeta::summary` (#5997)
With #5963 merged, all chunks now provide a summary (even though it may
not contain data for all columns). So let's make it mandatory, which
also removes a few 🙈-style `.except(...)` calls.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-31 16:38:02 +00:00
Carol (Nichols || Goulding) dad1ad1318
feat: Add the catalog service to ingester, querier, and compactor
So that `remote get` that uses the catalog service can work no matter
what kind of server you contact.
2022-10-28 10:49:26 -04:00
Andrew Lamb e9d04ffcb5
feat: Log how long each persist plan takes to complete (#5989)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-28 13:52:39 +00:00
kodiakhq[bot] 1567227b49
Merge branch 'main' into dom/require-partition-key 2022-10-28 10:31:22 +00:00
Marco Neumann 8447d46093
refactor: remove `QueryChunkMeta::timestamp_min_max` (#5963)
Use the table summary instead. This allows us to have a single mechanism
that both IOx and DataFusion understand. This basically lifts the "basic
table summary" mechanism that the querier uses to `iox_query` and let
the compactor and ingester use the same mechanism.

While not strictly necessary, simplifying the `QueryChunk[Meta]`
interface helps with #5897.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-28 10:29:16 +00:00
Dom Dwyer 72a358e52f refactor(dml): PartitionKey required for writes
Changes the DmlWrite type to require a PartitionKey be specified,
instead of accepting an Option.

This requirement was already in place - the write buffer upheld an
invariant that all writes contained a partition key value (was not
"None") or it panicked at runtime when attempting to enqueue the write.

It is now possible to encode this invariant in the type system, which is
what this change does.
2022-10-28 10:57:30 +02:00
Dom Dwyer 5d2f4a0ad1 docs: fix issue URL for memory tracking bug 2022-10-27 10:15:15 +02:00
Dom Dwyer f6416675c2 docs: mark hyperlink in rustdoc comments 2022-10-27 10:15:15 +02:00
Dom Dwyer 678fb81892 refactor(ingester): use partition buffer FSM
This commit makes use of the partition buffer state machine introduced
in https://github.com/influxdata/influxdb_iox/pull/5943.

This commit significantly changes the buffering, and querying, of data
from a partition, swapping out the existing "DataBuffer" for the new
state machine implementation (itself simplified due to temporary lack of
incremental snapshot generation, see #5944).

This commit simplifies the query path, removing multiple types that
wrapped one-another to pass around various state necessary to perform a
query, with various query functions needing different types or
combinations of types. The query path now operates using a single type
(named "QueryAdaptor") that provides a queryable interface over the set
of RecordBatch returned from a partition.

There is significantly increased testing of the PartitionData itself,
covering data in various states and the ordering of returned RecordBatch
(to ensure correct materialisation of updates). There are also
invariants upheld by the type system / compiler to minimise the
complexities of working with empty batches & states, and many asserts
that ensure (mostly existing!) invariants are upheld.
2022-10-27 10:15:15 +02:00
Dom Dwyer 39f826518b revert: use histogram to record TTBR
This reverts commit c63312ce12.

This change fixed a low-priority alert when there was no traffic flowing
through the system. The loss in TTBR value fidelity due to bucketing is
a greater concern as it affects live, high-volume clusters and hinders
operational insight.
2022-10-24 10:27:22 +02:00
Dom Dwyer 7b3fa43209 refactor: disable incremental snapshot generation
This commit removes the on-demand, incremental snapshot generation
driven by queries.

This functionality is "on hold" due to concerns documented in:

    https://github.com/influxdata/influxdb_iox/issues/5805

Incremental snapshots will be introduced alongside incremental
compactions of those same snapshots.
2022-10-21 17:41:43 +02:00
Dom db83053be7
Merge branch 'main' into dom/buffer-fsm 2022-10-21 16:32:54 +01:00
Dom Dwyer 8ca72ceff1 docs: fix state mod comments 2022-10-21 17:32:19 +02:00
Dom Dwyer c8fdd76033 feat(ingester): partition buffer state machine
This commit introduces code that is intended to replace the current
implicit state machine used by PartitionData. The existing code is still
in use, the new code is NOT used in this commit. A follow-up commit will
switch over to minimise the diff.

This change has two main goals;
    * encapsulation & simplification for callers
    * robust implementation so developing correct additions is easier

This is a significant refactor of the partition buffering logic to
encapsulate the various states of data (buffering, snapshot, persisting
and the mixed states between them) within the Partition. This alleviates
the rest of the system from having to be concerned with the differences
between "buffering" data, and "unpersisted data", "snapshot data",
"persisting data", "persisting with snapshots" etc - callers now invoke
a method called get_query_data() and they are provided with all the
relevant data for a partition. This abstraction change alone
significantly reduces code and test complexity in the rest of the
ingester.

For the second goal, the new implementation leverages an explicit state
machine, encoded using typestates. Typestate ensures compile-time
correctness of transitions and method calls, and the explicit FSM itself
helps ensure the system progresses in the desired manner - this fixes
and helps prevent bugs caused by implicit states such as:

    https://github.com/influxdata/influxdb_iox/issues/5805

This state machine makes the system states explicit and
self-descriptive, helping to reduce the cost of developer on-boarding
(no prior knowledge of "how this bit works") and reduces ongoing
developer burden. This explicit nature also de-risks adding new
functionality - it should be relatively easy to add concurrent snapshot
generation or incremental compaction without introducing bugs. The state
transition logic is abstracted away from callers, minimising the
overhead of this strategy.
2022-10-21 14:25:51 +02:00
Carol (Nichols || Goulding) 59e1c1d5b9
feat: Pass trace id through Flight requests from querier to ingester
Fixes #5723.
2022-10-20 08:55:30 -04:00
Andrew Lamb 83e3a96c19
fix: improve ttbr histogram metric description (#5909)
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-20 09:03:58 +00:00
Dom ea7b4a0de6
Merge branch 'main' into dom/ingester-integration-tests 2022-10-19 13:36:54 +01:00
Marco Neumann eb5a661ab3
refactor: prep work for #5897 (#5907)
* refactor: add ID to `ParquetStorage`

* refactor: remove duplicate code

* refactor: use dedicated `StorageId`
2022-10-19 11:54:42 +00:00
Dom Dwyer 0c0a38c484 refactor: more verbose shard reset logs
Adds a little more context to the "shard reset" logs.
2022-10-19 12:28:02 +02:00
Dom Dwyer 40f1937e63 test: write buffer seeking tests
Asserts write buffer seeking behaviour, including:

    * Seeking past already persisted data correctly
    * Skipping to next available op in non-contiguous offset stream
    * Skipping to next available op for dropped ops due to retention
    * Panics when seeking beyond available data (into the future)

Removes a pair of tests that covered some of the above due to their
tight coupling with ingester internals.
2022-10-19 12:28:02 +02:00
Dom Dwyer 7729494f61 test: write, query & progress API coverage
This commit adds a new test that exercises all major external APIs of
the ingester:

    * Writing data via the write buffer
    * Waiting for data to be readable via the progress API
    * Querying data and and asserting the contents

This should provide basic integration coverage for the Ingester
internals. This commit also removes a similar test (though with less
coverage) that was tightly coupled to the existing buffering structures.
2022-10-19 11:51:15 +02:00
Dom Dwyer b12d472a17 test(ingester): add integration TestContext
Adds a test helper type that maintains the in-memory state for a single
ingester integration test, and provides easy-to-use methods to
manipulate and inspect the ingester instance.
2022-10-19 11:51:15 +02:00
Dom Dwyer d0b546109f refactor: impl converting IngesterQueryResponse
An existing function to map the complex IngesterQueryResponse type to a
simple set of RecordBatch existed in test code - this has been lifted
onto an inherent method on the response type itself for reuse.
2022-10-19 11:51:15 +02:00
Dom Dwyer c63312ce12 refactor: use histogram to record TTBR
Changes the TTBR metric from a gauge to a histogram so that observations
maintain a time dimension.
2022-10-18 16:29:09 +02:00
Andrew Lamb 8021b8be0b
fix: Use Display rather than Debug when logging errors (#5859)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-14 14:43:11 +00:00
Luke Bond 475c8a0704
fix: only emit ttbr metric for applied ops (#5854)
* fix: only emit ttbr metric for applied ops

* fix: move DmlApplyAction to s/w accessible

* chore: test for skipped ingest; comments and log improvements

* fix: fixed ingester test re skipping write

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-14 12:06:49 +00:00
Carol (Nichols || Goulding) efb964c390
feat: Enforce table column limits from the schema cache (#5819)
* fix: Avoid some allocations by collecting instead of inserting into a vec

* refactor: Encode that adding columns is for one table at a time

* test: Add another test of column limits

* test: Add below/above limit tests for create_or_get_many

* fix: Explicitly DO NOT check column limits when inserting many columns

* feat: Cache the max_columns_per_table on the NamespaceSchema

* feat: Add a function to validate column limits in-memory

* fix: Provide more useful information when over column limits

* fix: Swap types to remove intermediate allocation

* docs: Explain the interactions of the cache and the column limits

* test: Actually set up test that showcases column limit race condition

* fix: Allow writing to existing columns even if table is over column limit

Co-authored-by: Dom <dom@itsallbroken.com>
2022-10-14 11:34:17 +00:00
Andrew Lamb 9134ccd6c3
chore: Update datafusion again (#5855)
* chore: Update datafusion

* chore: Updates for changes in datafusion

* chore: more updates

* fix: update doc example

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-10-13 19:18:57 +00:00
kodiakhq[bot] 3039b5877b
Merge branch 'main' into dom/no-persist-lookups 2022-10-13 15:13:36 +00:00
Dom Dwyer 86d28d3359 fix: update cached sort key
Once persist() has successfully updated the sort key in the catalog, set
the partition sort key cache to reflect the new value.
2022-10-13 17:12:07 +02:00
Dom Dwyer 9c40d80032 refactor(ingester): log shard_id in op result
Include the shard ID in the op apply result to correlate it with other
log messages.
2022-10-13 15:41:48 +02:00
Dom Dwyer 3e70dc44a0 refactor(catalog): remove partition_info_by_id()
This method used to return a subset of partition metadata, and was used
exclusively for persistence in the ingester. It is now no longer
necessary.
2022-10-13 15:26:36 +02:00
Dom Dwyer 3fbeaa1314 refactor: assert monotonic partition persistence
Copies the existing monotonic partition persistence check into the
partition too - this ensures that even if the partitions are persisted
in order, they are never marked as persisted OUT of order.
2022-10-13 15:26:36 +02:00
Dom Dwyer 920f7edf75 refactor: defer querying for table schema
Do not query for the table schema until it is needed.
2022-10-13 15:26:36 +02:00
Dom Dwyer e556677192 perf(ingester): remove persist lookup queries
Removes the catalog queries previously used to look up various
information about the partition/table/namespace that was already in
memory.

As part of this change, the compaction helper function is changed to
accept the inputs it needs, rather than a struct of data from the
catalog - this significantly simplifies testing.

This commit also adds additional context to all log messages in the
persist() fn.
2022-10-13 15:26:36 +02:00
Dom Dwyer 10d77b0ef7 refactor: use deferred sort key loading
Changes the persist() implementation in the ingester to load the sort
key using the deferred loading mechanism, instead of on-demand.
2022-10-13 15:26:36 +02:00
Dom Dwyer dbcbb5b824 refactor: include sequence numbers in apply() logs
Include the op sequence number in the error/success apply() log
messages.
2022-10-13 14:19:02 +02:00
Dom Dwyer 15e153a74c perf(ingester): cheaper table discovery
This commit changes the table ID lookup query from an expensive,
JOIN multi-query to a simple, single table, indexed lookup.

As this is on the hot path, this should help with the recovery rate of
the ingesters.
2022-10-13 13:44:50 +02:00
Luke Bond 11900cea4d
chore: add some tracing logs to the ingester (#5839) 2022-10-12 12:10:20 +00:00
Dom Dwyer b294bb98aa refactor: move query types to query_handler
Moves types that are only used for handling queries to the query_handler
module.
2022-10-11 17:58:55 +02:00
Dom Dwyer c4f542bbe2 refactor(ingester): remove tombstone support
This commit removes tombstone support from the ingester, and deletes
associated code/helpers/tests. This commit does NOT remove tombstone
support from any other service, but MAY include removing overlapping
test coverage.

This also removes the tombstone support from the Ingester -> Querier RPC
response message.

This has the nice side effect of removing a whole lot of thread spawning
in the ingester tests for the Executor, speeding everything up!
2022-10-11 13:10:04 +02:00
Luke Bond fda1479db0
chore: add trace log to ingester to aid debugging (#5829) 2022-10-11 10:33:42 +00:00
Dom Dwyer 97c6e0f8ce refactor: use TableName, not Arc<str>
Adds a type wrapper TableName, internally an Arc<str> to leverage the
type system instead of passing around untyped strings.
2022-10-10 19:09:43 +02:00
Dom Dwyer 4518bd49d1 test: constify duration seconds 2022-10-10 14:39:35 +02:00
Dom Dwyer ab78f99ab2 refactor: eager background task abort
Changes the get() code path to abort the background load task when the
caller will resolve the sort key.

Note that an aborted future will leave the DeferredSortKey without a
background task to fetch the key, and the next caller will have to query
the catalog. Given the rarity of aborted futures, and desire to minimise
catalog load, this seems like a decent trade-off.

This commit also documents the many-readers eager loading problem.
2022-10-10 14:39:35 +02:00
Dom Dwyer afcb96ae47 perf(ingester): deferred sort key lookup queries
This commit carries the SortKey in the PartitionData, and configures the
ingester to use deferred sort key lookups, smearing the lookups across a
fixed period of time after initialising the PartitionData, instead of
querying for the sort key at persist time.

This allows large numbers of PartitionData to be initialised without
causing a equally large spike in catalog load to resolve the sort key -
instead this load is spread out randomly to reduce peak query rps.
2022-10-06 16:39:54 +02:00
Dom Dwyer c022ab6786 feat: deferred partition sort key fetcher
Adds a new DeferredSortKey type that fetches a partition's sort key from
the catalog in the background, or on-demand if not yet pre-fetched.

From the caller's perspective, little has changed compared to reading it
from the catalog directly - the sort key is always returned when calling
get(), regardless of the mechanism, and retries are handled
transparently. Internally the sort key MAY have been pre-fetched in the
background between the DeferredSortKey being initialised, and the call
to get().

The background task waits a (uniformly) random duration of time before
issuing the catalog query to pre-fetch the sort key. This allows large
numbers of DeferredSortKey to (randomly) smear the lookup queries over a
large duration of time. This allows a large number of DeferredSortKey to
be initialised in a short period of time, without creating an equally
large spike in queries against the catalog in the same time period.
2022-10-06 16:37:04 +02:00
kodiakhq[bot] ffa1704d96
Merge branch 'main' into dom/namespace-name 2022-10-06 13:58:47 +00:00
Marco Neumann c4c83e0840
fix: query error propagation (#5801)
- treat OOM protection as "resource exhausted"
- use `DataFusionError` in more places instead of opaque `Box<dyn Error>`
- improve conversion from/into `DataFusionError` to preserve more
  semantics

Overall, this improves our error handling. DF can now return errors like
"resource exhausted" and gRPC should now automatically generate a
sensible status code for it.

Fixes #5799.
2022-10-06 08:54:01 +00:00
Dom Dwyer abb9122e2c refactor: carry namespace name in NamespaceData
Changes the ingester's NamespaceData to carry a ref-counted string
identifier as well as the ID.

The backing storage for the name in NamespaceData is shared with the
index map in ShardData, so it is effectively free!
2022-10-05 13:03:16 +02:00
Dom Dwyer 1a7eb47b81 refactor: persist() passes all necessary IDs
This commit changes the persist() call so that it passes through all
relevant IDs so that the impl can locate the partition in the buffer
tree - this will enable elimination of many queries against the catalog
in the future.

This commit also cleans up the persist() impl, deferring queries until
the result will be used to avoid unnecessary load, improves logging &
error handling, and documents a TOCTOU bug in code:

    https://github.com/influxdata/influxdb_iox/issues/5777
2022-10-04 14:28:01 +02:00
Dom Dwyer f9bf86927d refactor: ref PartitionData by key & ID
Changes the TableData to hold a map of partition key -> PartitionData,
and partition ID -> PartitionData simultaneously. This allows for cheap
lookups when the caller holds an ID.

This commit also manages to internalise the partition map within the
TableData - one less pub / peeking!

This commit also switches from a BTreeMap to a HashMap as the backing
collection, as maintaining key ordering doesn't appear to be necessary.
2022-10-04 14:28:01 +02:00
Dom Dwyer 0847cc5458 refactor: PartitionData::id() -> partition_id()
Consistent naming is consistent - all the others are thing_id().
2022-10-04 14:28:01 +02:00
Dom Dwyer 66e05b5ea7 refactor: ref NamespaceData by name & ID
Changes the ShardData to hold a map of namespace name -> NamespaceData,
and namespace ID -> NamespaceData simultaneously.

This allows for cheap lookups when the caller holds an ID, and is part
of preparatory work to transition away from using string names in the
ingester for tables.

This commit also switches from a BTreeMap to a HashMap as the backing
collection, as maintaining key ordering doesn't appear to be necessary.
2022-10-04 14:28:01 +02:00
Dom Dwyer 9c0e4e98c4 refactor: ref TableData by name & ID
Changes the NamespaceData to hold a map of table name -> TableData, and
table ID -> TableData simultaneously.

This allows for cheap lookups when the caller holds an ID, and is part
of preparatory work to transition away from using string names in the
ingester for tables.

This commit also switches from a BTreeMap to a HashMap as the backing
collection, as maintaining key ordering doesn't appear to be necessary.
2022-10-04 14:28:01 +02:00
Dom Dwyer 7efd81a63a docs: comment write record ordering 2022-10-03 12:23:30 +02:00
Dom Dwyer b23ad31711 fix: spurious memory accounting for failed write
Fixes a case where the ingester may incorrectly record a write as having
been buffered in memory, when in fact the buffering failed.

This could cause the effective buffer size to be reduced over time as
more and more data is spuriously "added" to the buffer, but never
released back to the memory tracker as it is never persisted.
2022-10-03 12:13:43 +02:00
Dom Dwyer 20451921d0 test: MockLifecycleHandle captures calls
Changes the NoopLifecycleHandle to MockLifecycleCall, and adds code
causing it to log all calls made to the log_write() method.

This will allow tests to assert calls and their values in DML buffering
tests.
2022-10-03 12:13:43 +02:00
Dom Dwyer 7dd28f4230 test: simplify PartitionProvider mock
The PartitionKey is now part of the PartitionData, so there is no need
to specify the redundant ID when configuring the mock.
2022-09-30 16:32:39 +02:00
Dom Dwyer c33499764d test: share populate_catalog() across tests
Parametrises test_util::populate_catalog() and exports for re-use in
ingester tests.
2022-09-30 16:32:37 +02:00
Dom Dwyer fc47f6ab8f test: re-use test_utils::make_op
Share the make_op helper across all tests in the Ingester.
2022-09-30 16:32:36 +02:00
Dom Dwyer f0885612e9 test: shared mock LifecycleHandle impl
Moves the NoopLifecycleHandle to the Ingester's test_utils to share it
across multiple components.
2022-09-30 16:32:34 +02:00
Dom Dwyer e84186763f refactor: LifecycleStats tracks Namespace/TableId
Changes the lifecycle handle to also track the namespace + table ID in
addition to the existing shard ID.

Adds asserts to ensure the values never vary for a given partition.
2022-09-30 15:29:39 +02:00
Dom Dwyer 726b1d1d3b refactor: PartitionData carries parent IDs
This commit changes the PartitionData buffer structure to carry the IDs
of all its parents - the table, namespace, and shard. Previously only
the table & shard were carried.
2022-09-29 15:07:03 +02:00
Dom e9bd03b77c
Merge branch 'main' into dom/partition-contains-key 2022-09-29 12:32:35 +01:00
Dom Dwyer f5a7fbf8e2 refactor: PartitionData carries PartitionKey
Changes the PartitionData to carry the derived PartitionKey for which it
is buffering ops for. This is used at persist time.
2022-09-29 13:22:50 +02:00
Dom Dwyer cd4087e00d style: add no todo!() or dbg!() lints
Some crates had theme, some not - lets be consistent and have the
compiler spot dbg!() and todo!() macro calls - they should never be in
prod code!
2022-09-29 13:10:07 +02:00
kodiakhq[bot] 54e68637dc
Merge branch 'main' into dom/partition-cache 2022-09-28 15:22:40 +00:00
Dom Dwyer 82b7479f97 refactor(write_buffer): seek error at seek time
Moves the "you've tried to seek into the future!" error to the point at
which the seek attempt was made.

This makes more sense than deferring the seek error until read time, and
is easier to determine this is the case rather than at read time (where
the read response error contains an invalid high_watermark value of -1,
making it impossible to conclusively determine what has happened).
2022-09-28 16:44:59 +02:00
Dom Dwyer 5f2f735c7e fix: spurious watermark < read offset panic
In staging we observed an ingester panic due to the write buffer stream
yielding an WriteBufferErrorKind::SequenceNumberAfterWatermark,
suggesting the ingester was attempting to read from an offset that
exceeds the current max write offset in Kafka (high watermark offset).

This turned out not to be the case - the partition had a single write at
offset 2, and the ingester was attempting to seek to offset 1. The first
read would fail (offset 1 does not exist) and the error handling did not
account for the high watermark not being correctly set (-1 in the
response).

I have no idea why rskafka returns this watermark / doesn't retry / etc
but this change will allow the ingesters to recover.
2022-09-28 15:22:34 +02:00
Dom Dwyer 8cf81f457a perf(ingester): amortise Partition cache memory
Remove each cache hit from the partition cache, as each partition should
be looked up at most once.

This amortises the memory usage of the cache, as it should be "drained"
of hot partitions.
2022-09-27 17:16:18 +02:00
Dom Dwyer 1311a8746d refactor(ingester): use Partition cache
Cache the 10,000 most recent partitions at startup, and share them
across all shards.

At commit time, there are approx ~8,000 partitions per day, per
ingester, so this should cache all of the partitions for a given day so
far at startup.
2022-09-27 17:15:59 +02:00
Dom Dwyer 2068ff394b perf(ingester): cache Partition
This commit implements a PartitionCache decorator over the
PartitionProvider abstraction.

When an ingester starts up, the internal data structures are empty and
are lazily initialised for each namespace / table / partition as they
are observed in the stream of DML ops.

This lazy initialisation includes resolving the Partition ID and last
persisted sequence number offset value from the catalog for each
partition in each table in each namespace for which an op is observed -
this occurs in the hot path, while blocking ingest for a shard.
resolving each partition will cause a catalog query, this can cause a
spike in queries against the catalog, also resulting in unnecessarily
slow ingester recovery - we're effectively lazily warming a cache of
PartitionData in the hot path!

Instead this cache can be used to pre-warm the N most recently created
partitions (which are likely to have ongoing writes) at startup to
eliminate the hot-path overhead and associated catalog queries.

NOTE: unlike most of the other hot-path queries, partition persist
offset resolution cannot be eliminated by changes to the Kafka wire
format.
2022-09-27 17:15:57 +02:00
Dom Dwyer a3d6e7a45a refactor(ingester): server-wide PartitionProvider
Lifts the PartitionProvider initialisation higher in the stack to a
point where a single instance can be used across all shards an ingester
manages.

This is a pre-requisite for sharing a cache of Partitions across all
shards.
2022-09-27 17:15:31 +02:00
Dom Dwyer 38ebd5fb20 test: simplify partition provider mock
Removes redundant fields from the MockPartitionProvider.
2022-09-27 17:11:13 +02:00
Dom 2ef04e99da
Merge branch 'main' into dom/non-pub-shard 2022-09-27 14:21:23 +01:00
Andrew Lamb 66dbb9541f
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 (#5694)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight`  to 23.0.0

* chore: Update thrift / remove parquet_format

* fix: Update APIs

* chore: Update lock + Run cargo hakari tasks

* fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-27 12:50:54 +00:00
Dom Dwyer b873297fad refactor(ingester): limit visibility
Marks many internal data structures as non-pub.

Many remain as they're used across tests / from multiple callers
"peeking", but this limits the scope of false sharing in the future.
2022-09-27 14:27:32 +02:00
Dom Dwyer 11be746dc0 refactor: internalise ShardData init
Move the initialisation of ShardData (an internal ingester data
structure) into the ingester itself.

Previously callers would initialise the ingester state, and pass it into
the IngesterData constructor.
2022-09-27 14:26:17 +02:00
Dom Dwyer 61aecc3044 refactor: decouple partition init from table
Removes the "how" of initialising a per-partition buffer structure
(PartitionData) from the per-table buffer (TableData).

This is a cleaner separation of concerns - a table buffer is responsible
for addressing and initialising per-table partitions as necessary, and
buffering of ops for them. It does not have to be concerned with the
series of steps necessary to look up the various bits of data in order
to construct a PartitionData.

This abstract provider can be layered up to provide more complex
behaviours - I intend to add a read-through cache impl that decorates
the catalog impl in this commit, which should eliminate most partition
queries at ingester startup utilising the indirection added here.
2022-09-26 14:35:15 +02:00
Carol (Nichols || Goulding) c8108f01e7
chore: Upgrade to Rust 1.64 (#5727)
* chore: Upgrade to Rust 1.64

* fix: Use iter find instead of a for loop, thanks clippy

* fix: Remove some needless borrows, thanks clippy

* fix: Use then_some rather than then with a closure, thanks clippy

* fix: Use iter retain rather than filter collect, thanks clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 18:04:00 +00:00
Marco Neumann 55ef272920
refactor: acquire table locks concurrently (#5722)
Waiting for one after the other (one per shard) in serial fashion
likely increases latency too much.
2022-09-22 10:56:22 +00:00
Marco Neumann 365a246f8d
refactor: do not run de-dup in ingester for querier requests (#5626)
* refactor: do not run de-dup in ingester for querier requests

This removes the entire de-dup logic from the inegster for querier
requests. Furthermore, it even removes the entire datafusion execution
from the querier and just dumps the in-memory record batches as quickly
as possible. No filters are applied. Note that even prior to this PR,
we've never applied projections (tracked by #5624).

**Pros:**

- speed up query planning within the querier (since we need the ingester
  response for state reconciling)
- lowered ingester CPU load

**Cons:**

- more querier<>ingester network traffic

Closes #5602.

* test: extend query test case

* fix: ingester tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 07:33:54 +00:00
Marco Neumann c66f16e4af
fix: ingester retries (#5708)
* fix: retry ingester requests faster

The retries introduced in #5695 are too slow and block the entire
querier for minutes (until the very long gRPC timeout kicks in).

* fix: add error details on why the query planning failed

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-21 09:27:47 +00:00
Dom Dwyer c6fe0dab3e refactor(ingester): reduced internal visibility
Changes many pub fields / methods to be pub(super), or if necessary,
pub(crate).

This helps maintain an internal API boundary for code hygiene, and helps
identify functions that are unused / only used in tests (which I've
annotated with cfg(test) and intend to remove - we should be driving
code under test via the public API rather than using test-only state
mutation, otherwise we're just testing our tests!)
2022-09-20 16:24:27 +01:00
Dom Dwyer 6d00d6b683 test(ingester): refactor querier API tests
This commit changes the prepare_data_to_querier() tests to drive the
ingester state by applying DML ops, therefore driving the prod code
paths (and testing them!) rather than having the tests set up what the
tests believe is the correct internal ingester state, and then asserting
on that state.

This gives us much better coverage of prod code paths, decouples the
tests from the internal state/representation of ingesters (making the
tests less fragile), and removes a bunch of special-cased, test-only
functions that are functionally similar, but not the same as, the prod
functions.

Unblocks #5658, further clean-up to come.
2022-09-20 16:24:27 +01:00
Dom Dwyer b0eb85ddd5 refactor: store ShardId in child nodes
Instead of passing the ShardId into each function for child nodes of the
Shard, store it. This avoids the possibility of mistakenly passing the
wrong value.
2022-09-16 18:00:11 +02:00
Dom Dwyer 07b08fa9cb refactor: add table name in PartitionData
A partition belongs to a table - this commit stores the table name in
the PartitionData (which was readily available at construction time)
instead of redundantly passing it into various functions at the risk of
getting it wrong.
2022-09-16 17:59:22 +02:00
Dom Dwyer c7ba0bea91 refactor: ShardId & TableId in PartitionData
When we construct a PartitionData we have the ShardId and TableId. This
commit stores them in the PartitionData for later use, rather than
repeatedly passing them in again when constructing snapshots, at the
risk of passing the wrong IDs.
2022-09-16 17:17:16 +02:00
Dom Dwyer 85d6efafe1 refactor: snapshot_to_persisting redundant ID
Partition::snapshot_to_persisting() passes the ID of the partition it is
calling `snapshot_to_persisting()` on. The partition already knows what
its ID is, so at best it's redundant, and at worst, inconsistent with
the actual ID.
2022-09-16 17:08:08 +02:00
Dom Dwyer ce0d189260 perf: O(1) partition persist mark discovery
Changes the ingest code path to eliminate scanning the parquet_files
table to discover the last persisted offset per partition, instead
utilising the new persisted_sequence_number field on the Partition
itself to read the same value.

This lookup blocks ingest for the shard, so removing the expensive query
from the ingest hot path should improve catch-up time after a
restart/deployment.
2022-09-16 14:06:42 +02:00
Dom Dwyer 66bf0ff272 refactor(db): NULLable persisted_sequence_number
Makes the partition.persisted_sequence_number column in the catalog DB
NULLable. 0 is a valid persisted sequence number.
2022-09-15 18:19:39 +02:00
Dom Dwyer 234d460fcb chore: rename update_persisted_sequence_number fn 2022-09-15 16:10:35 +02:00
Dom Dwyer f91d802107 feat: store per-partition persist markers
Changes the ingester to record the per-partition, maximum persisted
sequencer offsets to the catalog. This will enable quick O(1) lookup in
the future, but the currently persisted value is only used to assert the
per-partition monotonic persist ordering invariant.
2022-09-15 16:10:35 +02:00
Dom Dwyer 300938f858 refactor: assert partition persistence ordering
Assert the per-shard / per-partition persistence watermarks
monotonically increase, and document the invariant.

NOTE: this is not a new invariant, just a new assertion to validate it.
2022-09-15 16:10:35 +02:00
Dom Dwyer d199a83355 feat(catalog): per-partition persist mark API
Adds the "persisted_sequence_number" field to the Partition model, and
updates the catalog API to read & update it.
2022-09-15 16:10:35 +02:00
Dom Dwyer fc17f2ec2d refactor: hoist persistence watermark from buffer
The maximum persisted sequence number is tracked to answer "up to where
has this partition been persisted", used for querying and skipping
writes that have already been applied (though I suspect this is
redundant).

This is a property of the partition, not the actual data buffer, so this
commit hoists it up out of the data buffer and onto the per-partition
data structure, internalising the field in the process (not pub).
2022-09-14 18:07:45 +02:00
Dom Dwyer ee8cdb48af style(ingester): fmt imports & long strings
Rewrite the imports to be a consistent order; std, external, crate and
merge all crate-level imports into one use statement.
2022-09-14 14:20:19 +02:00
Dom Dwyer 074722eb3e refactor(ingester): split data.rs into modules
Breaks the gigantic data.rs file into sub-modules for Shard, Namespace,
Table, Partition, and finally the actual data buffer used to store
writes.
2022-09-14 14:20:19 +02:00
Marco Neumann e4a66f69c7
refactor: dedicated query path for querier tombstone materialization (#5618)
I would like to remove de-dup from the querier<>ingester query path in #5602,
but the tombstone application and parquet write path can still do the full work.
2022-09-13 07:16:38 +00:00
Dom Dwyer 6342be2fa9 refactor: consistent logging
Changes the ingester to log ALL reasons for a partition being marked for
persistence, rather than just one of the 4 previously. Fields are
consistent, but verbose / repetitive.

Also cleans up some misleading messages like "updating ..." to be logged
only when the update actually takes place.
2022-09-12 15:57:27 +02:00
Marco Neumann 8933f47ec1
refactor: make `QueryChunk::partition_id` non-optional (#5614)
In our data model, a chunk always belongs to a partition[^1], so let's
not make this attribute optional. The optional value only leads to
-- mostly surprising -- conditional behavior, ranging from "do not equalize
the partition sort key" (querier) to "always consider the chunk overlapping"
(iox_query when dealing with ingester chunks).

[^1]: This is even true when the chunk belongs to a parquet file that is not
      yet added to the catalog, contrary to what a comment in the ingester
      stated. The catalog and data model used by the querier are two totally
      different things.
2022-09-12 13:52:51 +00:00
Marco Neumann caa0dfd1e0
refactor: query code clean ups (#5612)
* refactor: remove dead code

* refactor: `Deduplicator::build_scan_plan` consumes `self`

There is no good reason to use the same `Deduplicator` twice. In
contrast I'm quite sure that this would lead to nasty bugs, because
`split_overlapped_chunks` exists early in some cases so the 2nd plan
would have old and new chunks mixed together.
2022-09-12 13:00:56 +00:00
YIXIAO SHI 52ae60bf2e
chore: fix comment typo (#5551)
Co-authored-by: Dom <dom@itsallbroken.com>
2022-09-07 08:49:29 +00:00
Marco Neumann 87772a6aec
refactor: debug log improvements (#5553)
* feat: extend log output for ingester responses

* feat: add debug log for parquet `read_filter` calls

* feat: add debug log to `get_write_info`

* feat: add debug log parquet cache invalidation
2022-09-05 13:54:13 +00:00
Dom Dwyer 2a19606456 feat(ingester): restrict partition row count
This limit restricts a single partition to containing at most N rows
before it is marked for persistence (note: being marked for persistence
does not currently prevent further ingest for that partition.)
2022-08-31 15:48:18 +02:00
Nga Tran cb10a7c6d8
feat: More accurate memory estimate for compaction (#5471)
* feat: initial implementation of memory estimation for a compaction

* feat: estimate size of files and have the right actions for the needed budget

* feat: run candidates in parallel

* fix: have the right name for the column field of the output struct

* feat: add metrics for estimated budgets

* chore: cleanup

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fix syntax after applying review's suggestions

* refactor: Convert a Vec to VecDeque to go well with pop and push

* chore: remove max_concurrent_size_bytes and input_size_threshold_bytes

* chore: remove input_file_count_threshold

* test: tests for estimate_arrow_bytes_for_file

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-30 13:44:44 +00:00
Carol (Nichols || Goulding) dbd27f648f
refactor: Rename more mentions of Kafka to their other name where appropriate 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 58f0b63cdc
refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 74c9529062
fix: Rename KafkaPartition to ShardIndex 2022-08-29 14:07:18 -04:00
Carol (Nichols || Goulding) c9567cad7d
fix: Rename some more sequencer to shard 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) fe9c474620
fix: rustfmt 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) fbae4282df
fix: Rename another sequencer to shard to be hopefully clearer 2022-08-29 14:06:45 -04:00
Jake Goulding 4abf21c724
refactor: Rename Sequencer (and its entourage) to Shard 2022-08-29 14:06:43 -04:00
Andrew Lamb 35f99fe940
fix: fix intermittent failures in `data::tests::persist` (#5437)
* fix: fix intermittent failures in data::tests::persist

* fix: tweak comments and message

* fix: space
2022-08-19 21:16:00 +00:00
Carol (Nichols || Goulding) ed44817ed1
feat: Add a histogram of ingested (new L0) Parquet file sizes
Connects to #5348.
2022-08-15 10:13:54 -04:00
Marco Neumann 6b8b922fe7
fix: do not loose data when Kafka reports that offset is above watermark (#5322)
* fix: do not loose data when Kafka reports that offset is above watermark

This can happen in certain cluster rebalance settings.

This is also linked to https://github.com/influxdata/rskafka/issues/147
but for the upstream issue I currently have no idea how to fix it, so
let's at least harden IOx against it.

Fixes #5128.

* refactor: panic for `SequenceNumberAfterWatermark`
2022-08-11 07:32:04 +00:00
Andrew Lamb 3a945dbcb2
chore: return a struct with named and documented fields from `compact_persisting_batch` (#5346)
* chore: return a struct with named and documented fields from `compact_persisting_batch`

* docs: Remove extra 'the' and fix a typo

Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
2022-08-10 20:22:29 +00:00
Andrew Lamb 7219f512c3
fix: update sort key in catalog before adding parquet file to catalog (#5333)
* fix: update sort key before parquet file

* fix: Remove left over debugging

* fix: fix bug, improve logging

* chore: move debug log after catalog update, improve args and docs
2022-08-09 10:27:51 +00:00
Marco Neumann 9fbc95c3ad
feat: add sequencer reset count metric and log to ingester (#5286)
Split out from #5253.

Helps with #5128.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 13:00:36 +00:00
Marko Mikulicic 9da8062a16
fix: Fix typo in log message (#5222)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-27 15:34:37 +00:00
Marco Neumann 9a9a1a4777
feat: limit per-table chunk data for every query (#5223)
* feat: `QueryChunk::as_any`

* feat: allo `ChunkPruner::prune_chunks` to fail

* feat: limit per-table chunk data for every query

Closes #5211.

* fix: address review comments

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-07-27 13:20:05 +00:00
Andrew Lamb fbf672015e
refactor: Reduce ceremony requried to create a `Span` from `SpanContext` (#5181)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-22 11:19:38 +00:00
Nga Tran 69cb3f2b19
refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151)
* refactor: remove min_sequnce_number

* fix: typos

* fix: remove min_sequencer_number from new files from merging main

* fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier

* test: add test_compactor_collision back but modify the input to make it work woth new changes

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 13:51:54 +00:00
Marco Neumann 0561423475
refactor: enforce proper `IOxSessionContext` (#5158)
- remove `IOxSessionContext::default()` because untracked contexts
  should only be created by tests
- remove `Option<IOxSessionContext>` because it is a typed workaround
  for `IOxSessionContext::default`

Tests should use `IOxSessionContext::testing` and all _normal_ users
should create proper contexts.

I suspect this will help tracing or at least prevent silent regressions.
See #5129.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-20 16:25:43 +00:00
Andrew Lamb 5bebff0b06
Revert "feat: skip ingester buffering if INFLUXDB_IOX_INGESTER_SKIP_BUFFER is set" (#5116)
This reverts commit ca6875f60bec935eb6079b684d6eaa0cbc8a5306.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-15 13:22:45 +00:00
Marco Neumann 512f9850ee
refactor: ingester seek log debug => info (#5127)
This message will be printed once per partition on ingester startup and
shouldn't be too noisy, but is very helpful to judge "replay" /
"catch-up".
2022-07-14 10:28:16 +00:00
Carol (Nichols || Goulding) 61c023139b
refactor: Switch compaction levels to an enum with values rather than separate consts
Bonuses:

- Type checking
- Validation
- Less casting
- Exhaustiveness checking
- Less use of the numerical value
2022-07-13 11:30:36 -04:00
Andrew Lamb 64b6b4fd6f
feat: skip ingester buffering if INFLUXDB_IOX_INGESTER_SKIP_BUFFER is set (#5115)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-13 14:21:06 +00:00
Marco Neumann aacdeaca52
refactor: prep work for #5032 (#5060)
* refactor: remove parquet chunk ID to `ChunkMeta`

* refactor: return `Arc` from `QueryChunk::summary`

This is similar to how we handle other chunk data like schemas. This
allows a chunk to change/refine its "believe" over its own payload while
it is passed around in the query stack.

Helps w/ #5032.
2022-07-07 13:21:48 +00:00
Andrew Lamb 8f5210ea3e
test: add test for "duration since production" in kafka `write_buffer` implementation (#5043)
* test: add test for timestamps in kafka write buffer

* refactor: move timestamp batching test to generic tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 10:27:27 +00:00
Marco Neumann 16bd3e67c0
refactor: unify `apply_predicate_to_metadata` (#5030)
Instead of using some hand-rolled timestamp-based logic (or just
"unknown") all over the place, just use logic introduced in #5017.

This requires slightly improved table summaries within the querier that
at least has min/max for the timestamp column. For that, the former
`IngesterChunk`-specific `calculate_summary` method was extended to
`create_basic_summary` to include that data and is now also used by
`QuerierParquetChunk`.

Note: `QuerierRBChunk` already has detailled metrics that are provided
by the read buffer implementation.

Should we ever need even better pruning for `QuerierParquetChunk` (or
`IngesterChunk`) then we _only_ need add extra data to the table
summaries.

Closes #4976.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-05 12:51:59 +00:00
Marco Neumann be53716e4d
refactor: use IDs for `parquet_file.column_set` (#4965)
* feat: `ColumnRepo::list_by_table_id`

* refactor: use IDs for `parquet_file.column_set`

Closes #4959.

* refactor: introduce `TableSchema::column_id_map`
2022-06-30 15:08:41 +00:00
Markus Westerlind edf3f08e81 refactor: Replace all uses of lazy_static with once_cell
Went through and remove all lazy_static uses with once_cell (while waiting for the project to compile). There are still dependencies using lazy_static so it is still in the crate graph but at least there isn't an explicit dependency on it (and it is easier to update to `std::lazy::Lazy` once that is stable).
2022-06-29 16:22:02 +02:00
Nga Tran cfcc4b8426
refactor: change level 1 to level 2 preparing for next design changes (#4954)
* refactor: change level 1 to level 2 preparing for next design changes

* fix: make level-2 consistent everywhere

* chore: remove unused comments

* refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent

* chore: add correspinding constants for the comapction levels in the comments

Co-authored-by: Dom <dom@itsallbroken.com>
2022-06-29 14:08:58 +00:00
Andrew Lamb bfddb032ce
docs: improve docs for `persist_partition_size_threshold_bytes` / `INFLUXDB_IOX_PERSIST_PARTITION_SIZE_THRESHOLD_BYTES` (#4877)
* docs: improve docs for `persist_partition_size_threshold_bytes` / `INFLUXDB_IOX_PERSIST_PARTITION_SIZE_THRESHOLD_BYTES`

* docs: improve comments about LifecycleConfig::partition_size_threshold

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-27 21:52:40 +00:00
Marco Neumann 215f297162
refactor: parquet file metadata from catalog (#4949)
* refactor: remove `ParquetFileWithMetadata`

* refactor: remove `ParquetFileRepo::parquet_metadata`

* refactor: parquet file metadata from catalog

Closes #4124.
2022-06-27 15:38:39 +00:00
Nga Tran 3c0fb6e8ef
fix: avoid using min_time, which can be negative, for ChunkId. Using object store id which is uuid instead (#4942)
* fix: avoid using min_time, which can be negative, for ChunkId. Using object store id which is uuid instead

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* chore: run fmt

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-23 19:00:13 +00:00
Andrew Lamb 49b34e1135 test: add appropriate tests 2022-06-23 11:50:55 -04:00
Andrew Lamb fb4c3ed294 fix: revert test change 2022-06-23 11:34:59 -04:00
Dom Dwyer 9a79d16585 fix: account for partition memory until persisted
The ingester maintains a rough "total memory in use" counter it uses to
try and limit the amount of memory the ingester is using overall.

When a partition is persisted, this total memory usage value is adjusted
to account for releasing the partition memory. Prior to this commit, the
ordering was:

* Writes increase the memory counter
* maybe_persist() is called to trigger persistence
* A partition is identified for persistence
* Partition memory usage is released back to the total memory counter
* Persistence starts

This meant that the partitions in the process of being persisted were
not accounted for in the ingester's total memory counter, and therefore
we could significantly overrun the configured memory limit.

After this commit, the ordering is:

* Writes increase the memory counter
* maybe_persist() is called to trigger persistence
* A partition is identified for persistence
* Persistence starts
* Persistence completes
* Partition memory usage is released back to the total memory counter

This ensures persisting partitions are sill tracked in the total memory
counter, causing pauses to correctly fire.
2022-06-23 15:40:51 +01:00
Dom Dwyer 87af3848d1 refactor: remove unused errors
These errors are not referenced, but are hidden from the "unused" lint
because of the macro magic code generation.
2022-06-23 11:24:30 +01:00