* refactor: do not override parquet file size in querier
This is going to be an issue when we actually rely on the size for
reading, see #5531.
* refactor: use selected file size mocking in compactor
Do not blindly override parquet file sizes for all subsystems.
This is going to be an issue when we actually rely on the size for
reading, see #5531.
* refactor: remove ability to override file sizes in catalog
Blindly overriding data for all subsystems is dangerous, because some
parts of our stack actually rely on the actual file size. See #5531.
* docs: explain `size_overrides`
Changed `io_shared` to `iox-shared` in the following files: `update_catalog.rs`, `partition.rs`, `lib.rs` (in the `service_grpc_catalog` folder) and `lib.rs` (in the `service_grpc_object_store` folder).
There's no technical reason to dyn-dispatch the loader. The user is
free to do so, however.
Note that there's no change within `querier` because the type
inference will now automatically treat the passed `Arc` as `Arc<L>`
instead of `Arc<dyn Loader<...>>`.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
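A minimal sketch of the idea, not the actual IOx code: `Loader`, `CacheDriver` and `Doubler` below are hypothetical, synchronous stand-ins. The driver is generic over `L`, so the passed `Arc` is inferred as `Arc<L>`, while a caller who wants dynamic dispatch can still hand in an `Arc<dyn Loader<...>>`.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for the loader trait (the real trait may differ).
trait Loader {
    type K;
    type V;
    fn load(&self, k: Self::K) -> Self::V;
}

/// Hypothetical cache driver, generic over its loader instead of
/// hard-coding `Arc<dyn Loader<...>>`. `?Sized` keeps trait objects usable.
struct CacheDriver<L: Loader + ?Sized> {
    loader: Arc<L>,
}

impl<L: Loader + ?Sized> CacheDriver<L> {
    fn new(loader: Arc<L>) -> Self {
        Self { loader }
    }

    fn get(&self, k: L::K) -> L::V {
        self.loader.load(k)
    }
}

/// Toy loader used only for illustration.
struct Doubler;

impl Loader for Doubler {
    type K = u64;
    type V = u64;

    fn load(&self, k: u64) -> u64 {
        k * 2
    }
}

/// Static dispatch: `L` is inferred as `Doubler` from the passed `Arc`.
fn static_dispatch(k: u64) -> u64 {
    CacheDriver::new(Arc::new(Doubler)).get(k)
}

/// Dynamic dispatch is still possible when the caller asks for it.
fn dynamic_dispatch(k: u64) -> u64 {
    let loader: Arc<dyn Loader<K = u64, V = u64>> = Arc::new(Doubler);
    CacheDriver::new(loader).get(k)
}
```

Both functions return the same value; only the dispatch mechanism differs.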
* fix: Separate errors to make debugging easier
* feat: Turn on reqwest verbose connection logging for debugging
https://docs.rs/reqwest/latest/reqwest/struct.ClientBuilder.html#method.connection_verbose
> Enabling this option will emit log messages at the TRACE level for read and write operations on connections.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: make compactors select candidates based on the last n minutes to reduce the workload of the postgres catalog query
* refactor: remove 1-minute case per review comment
* refactor: do not always box `FunctionLoader`
While constructing our caches, we rarely spell out types. Instead we
just "box" the end result (`Box<dyn Cache<...>>`) before storing it
within the struct (e.g. `NamespaceCache`).
This means that `FunctionLoader` does NOT need to erase the type of the
function it contains. This saves us a `Box` which means less pointer
chasing and more room for the compiler to optimize.
* docs: typos
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
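A rough sketch of the `FunctionLoader` change, assuming a simplified synchronous `Loader` trait (the trait, its signature and `boxed_len_loader` are hypothetical, not the real API). The function is stored by value as `F`; type erasure happens once, at the outermost layer.

```rust
/// Hypothetical, simplified loader trait (the real one may be async).
trait Loader<K, V> {
    fn load(&self, k: K) -> V;
}

/// Stores the function by value: there is no `Box<dyn Fn(...)>` inside,
/// so a call goes straight to `F` without extra pointer chasing.
struct FunctionLoader<F> {
    f: F,
}

impl<F> FunctionLoader<F> {
    fn new(f: F) -> Self {
        Self { f }
    }
}

impl<K, V, F> Loader<K, V> for FunctionLoader<F>
where
    F: Fn(K) -> V,
{
    fn load(&self, k: K) -> V {
        (self.f)(k)
    }
}

/// The end result is boxed once before it is stored, which is the only
/// place where the concrete closure type gets erased.
fn boxed_len_loader() -> Box<dyn Loader<String, usize>> {
    Box::new(FunctionLoader::new(|s: String| s.len()))
}
```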
The API user may still use a `Box<dyn ...>` if they want, but they
technically don't have to.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
The API user still CAN use dynamic dispatch but doesn't have to. This
also simplifies the generics a bit.
This is similar to #5520.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Parse various InfluxQL literals
* feat: Parse regex, refactor single and double quoted string parsing
* chore: Literals do not include sign; those are unary expressions
* chore: Add docs
* chore: Integer literals are unsigned
Add more tests for max values
* chore: Impl Display for Literal; add macro to write escaped strings
Also added Duration type for InfluxQL durations, so they can be properly
formatted when displayed.
The macro uses match to efficiently map a small number of characters
to their escaped equivalent. It also removes a bit of boilerplate.
* chore: Don't tie lifetime of AST elements to source `str`
* feat: Impl From trait for Literal, Regex and Duration
* chore: Derive Copy for Duration
* chore: PR Feedback, use unwrap_err for better output when API fails
* chore: Drive-by cleanup using unwrap_err
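For illustration, the `Display` impl and the escaping macro described above could look roughly like the sketch below. The macro name `write_escaped`, the reduced `Literal` enum and the chosen escape sequences are assumptions, not the actual parser code. Note that the integer variant is unsigned; a sign would be modelled as a unary expression around the literal.

```rust
use std::fmt::{self, Display, Formatter, Write};

/// Writes `$s` to `$f`, mapping each listed character to its escaped
/// equivalent with a single `match`. Hypothetical name and shape.
macro_rules! write_escaped {
    ($f:expr, $s:expr, $($ch:pat => $esc:expr),+ $(,)?) => {
        for c in $s.chars() {
            match c {
                $($ch => $f.write_str($esc)?,)+
                _ => $f.write_char(c)?,
            }
        }
    };
}

/// Reduced, hypothetical literal type owning its data, so it is not
/// tied to the lifetime of the source `str`.
enum Literal {
    Unsigned(u64),
    String(String),
}

impl Display for Literal {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        match self {
            Self::Unsigned(v) => write!(f, "{}", v),
            Self::String(s) => {
                f.write_char('\'')?;
                // Escape set is an assumption made for this sketch.
                write_escaped!(f, s, '\n' => "\\n", '\\' => "\\\\", '\'' => "\\'");
                f.write_char('\'')
            }
        }
    }
}
```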
And only allow setting this when no record batch or line protocol is
specified so that there isn't a way to create a parquet file with data
that has a mismatched row count.
* fix: loop forever in compact_hot_partition_candidates
* chore: cleanup
* fix: avoid using `continue` statements that will cause bugs in corner cases
* fix: Pass compaction fn as a closure instead to allow collection of groups in test
* fix: Add Send bound as suggested by clippy
* fix: fix the test to return data of round 3 instead of round 2
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This removes some `Box<dyn ...>` indirection when the user doesn't want
it (you still can, but don't have to) and makes the whole type handling
easier to understand.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This limit restricts a single partition to containing at most N rows
before it is marked for persistence (note: being marked for persistence
does not currently prevent further ingest for that partition).
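A minimal sketch of the behaviour, assuming the mark is set once the buffered row count reaches the limit. `PartitionBuffer` and its fields are hypothetical, and whether the real check is `>=` or `>` is not stated here.

```rust
/// Hypothetical per-partition buffer state.
struct PartitionBuffer {
    rows: usize,
    marked_for_persist: bool,
}

impl PartitionBuffer {
    fn new() -> Self {
        Self {
            rows: 0,
            marked_for_persist: false,
        }
    }

    /// Buffers `n` more rows and marks the partition for persistence once
    /// the total reaches `max_rows` (assumed comparison). Writes are still
    /// accepted after the partition has been marked.
    fn write(&mut self, n: usize, max_rows: usize) {
        self.rows += n;
        if self.rows >= max_rows {
            self.marked_for_persist = true;
        }
    }
}
```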
* feat: initial implementation of memory estimation for a compaction
* feat: estimate size of files and have the right actions for the needed budget
* feat: run candidates in parallel
* fix: have the right name for the column field of the output struct
* feat: add metrics for estimated budgets
* chore: cleanup
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: fix syntax after applying review's suggestions
* refactor: Convert a Vec to VecDeque to go well with pop and push
* chore: remove max_concurrent_size_bytes and input_size_threshold_bytes
* chore: remove input_file_count_threshold
* test: tests for estimate_arrow_bytes_for_file
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
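One way to picture budget-driven selection with a `VecDeque` is sketched below. This is an assumption-laden illustration: `Candidate`, `select_for_round` and the three outcomes (admit, defer, too large) are hypothetical and may not match the actions the compactor actually takes.

```rust
use std::collections::VecDeque;

/// Hypothetical compaction candidate with a precomputed memory estimate.
struct Candidate {
    partition_id: i64,
    estimated_bytes: u64,
}

/// Walks the queue once. A candidate that fits into the remaining budget is
/// admitted for this round, one that could never fit is reported as too
/// large, and anything else is pushed to the back for a later round.
fn select_for_round(
    queue: &mut VecDeque<Candidate>,
    budget_bytes: u64,
) -> (Vec<Candidate>, Vec<Candidate>) {
    let mut remaining = budget_bytes;
    let mut admitted = Vec::new();
    let mut too_large = Vec::new();

    // The range is computed once, so deferred candidates are not revisited.
    for _ in 0..queue.len() {
        let c = match queue.pop_front() {
            Some(c) => c,
            None => break,
        };
        if c.estimated_bytes > budget_bytes {
            too_large.push(c);
        } else if c.estimated_bytes <= remaining {
            remaining -= c.estimated_bytes;
            admitted.push(c);
        } else {
            queue.push_back(c);
        }
    }
    (admitted, too_large)
}
```

`VecDeque` fits here because popping from the front and pushing to the back are both cheap, which a plain `Vec` does not offer for the front.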
Changes the compactor code to tolerate a SplitExec yielding an empty
partition (with no rows).
This logs a WARN because situations in which an empty partition is
acceptable are very rare; it more likely indicates an opportunity to improve
the SplitExec usage (i.e. pruning out unnecessary split points).
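A simplified sketch of the tolerant handling, with each split output modelled as the row counts of its record batches. `non_empty_outputs` is a hypothetical helper, and `eprintln!` stands in for whatever WARN-level logging the compactor uses.

```rust
/// Drops split outputs that contain no rows instead of failing, emitting a
/// warning for each one. Each inner `Vec` holds the row counts of the
/// record batches produced for one split output (hypothetical model).
fn non_empty_outputs(outputs: Vec<Vec<usize>>) -> Vec<Vec<usize>> {
    outputs
        .into_iter()
        .enumerate()
        .filter(|(idx, batches)| {
            let rows: usize = batches.iter().sum();
            if rows == 0 {
                // Stand-in for a WARN log line.
                eprintln!("WARN: split output {} is empty, skipping", idx);
            }
            rows > 0
        })
        .map(|(_, batches)| batches)
        .collect()
}
```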