We currently only use the human-readable version string for the CLI
help, but for #5464 I want to use the GIT hash and a process-time UUID.
This is the prep work for that.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: upsert partition & update sort key for each day in bulk ingest
feat: import schema now supports earliest/latest time merging
chore: tests & tidying up for bulk ingest catalog update
* fix: always sort time last in PK in import schema update catalog
* chore: additional test for computing sort key in bulk ingest
* chore: bulk import catalog update gets sequencer from sharder service
chore: import update schema tests refactor using sharder svc mock
* chore: dead code fix
* chore: import schema sequencer lookup test
* chore: clarifying comment in import schema catalog update
* feat: file file_count_threshold for comapcting cold partitions to make it consistent with the hot case and help set up to avoid oom easier
* chore: remove unecessary commments
* chore: struct for overrides of import schema conflicts
* chore: import schema override shouldn't support tags
* feat: import schema merge can take an override schema
* fix: schema override in test had superfluous tag
* chore: test for batch schema merge with override in import schema
* feat: import schema merge now takes override schema
This was part of
"feat: Different branch to hook up new compaction algorithm (#5194)"
and will be added back in a new test for compaction specifically in the
next commit.
This reverts part of commit 69640c0ba5.
* feat: initial commit of schema merge bulk import tool
* chore: use observability depds instead of tracing-*
* chore: removed debug printlns
* chore: fix feature decls for cloud providers for import crate
* chore: use println instead of info in import- no need for a simple CLI
* chore: tidy whitespace
* chore: remove unused dep in import
* chore: Run cargo hakari tasks
* chore: removed unimpld import job subcommand
* chore: clarifying comment about custom serialisation code
* chore: clarifying comment about schema merge code in import
* chore: fix wrong comment in import command
* chore: bump object store dep to get bugfix
* chore: rename import schema struct for clarity
* chore: run `cargo hakari generate`
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: make querier RAM pool split a proper feature
- use propre pool names
- expose sizing via CLI/env
Closes https://github.com/influxdata/conductor/issues/1102.
* refactor: improve naming and docs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: `QueryChunk::as_any`
* feat: allo `ChunkPruner::prune_chunks` to fail
* feat: limit per-table chunk data for every query
Closes#5211.
* fix: address review comments
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* chore: cherry pick the first 3 commits of branch cn/connect-new-compaction
* fix: modify the test to work correctly with compactor running
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: remove min_sequnce_number
* fix: typos
* fix: remove min_sequencer_number from new files from merging main
* fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier
* test: add test_compactor_collision back but modify the input to make it work woth new changes
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: Fix SeriesKey sort order for special _measurement and _field
* fix: Update expected test output
* fix: Update more tests
* fix: Re-sort tag key when using binary encoding
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: remove unused logging config
* chore: remove the object store garbage collector CLI tool
* refactor: accept an object store and catalog
* refactor: make Result type alias public like the error
* refactor: remove public modifier from modules
* refactor: allow shutting down the object store garbage collector
* feat: Introduce the object-store garbage collection server
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: initial implementation for selecting compaction candidates
* feat: 2 catalog functions to choose the most thorughput partitions to compact and the selecting candidate function itself
* test: tests for the new 2 queries
* feat: more tests and metrics for chooing compaction candidates
* chore: Apply self suggestions from self review
* chore: cleanup
* chore: fix doc comment
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: address review comments
* fix: get the right time provider for the tests
* refactor: remove the left over compaction_
* fix: typos
* fix: make the param name and env name consistent
* refactor: make relevant iSomething to uSomething
* fix: typo
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* test: document how to run tests
Also fix a few issues for local runs.
* docs: add back one-liner for running end to end tests
* docs: add comment for clap_blocks test
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: add comment in influxdb_iox/tests/end_to_end_cases/cli.rs
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: change level 1 to level 2 preparing for next design changes
* fix: make level-2 consistent everywhere
* chore: remove unused comments
* refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent
* chore: add correspinding constants for the comapction levels in the comments
Co-authored-by: Dom <dom@itsallbroken.com>
Previously the column data type was exposed using an internal i32 value.
This commit changes the Schema API to use a self-descriptive proto enum
for the column data type.
* refactor: store per-file column set in catalog
Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.
The querier will use this to directly load parquet files into the read
buffer.
**WARNING: This requires a catalog wipe!**
Ref #4124.
* refactor: use proper `ColumnSet` type
* refactor: use new ingester<>querier wire protocol
Use and document the new and more flexible ingester<>querier wire
protocol.
Note that the ingester does NOT stream the response data yet, but the
internal data structures would allow that. A follow-up change will
adjust the ingester code to stream the data.
Ref #4849.
* fix: typos
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: clarify naming and public interface
* test: add schema assertion to `ingester_response_to_record_batches`
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* feat: extend flight client to accept multiple (changing) schemas
See #4849.
Originally I intended not to use Flight at all for the new
ingester<>querier protocol. However since flight also deals with
dictionary batches and multiple batches and the gRPC protocol that I
would write would look very similar, I will use Flight with a bit more
flexible message types.
The rough idea for the protocol is the following stream:
- for each partition:
1. "none" message with partition metadata
2. for each chunk (can have different schemas under certain
circumstances):
1. "schema" message (resets dictionary state)
2. (optional) dictionary batch messages
3. one or more "record batch" message
The nice thing about it is that the same arrow client works also for the
existing client<>querier protocol since there we just send:
1. "schema" message (no app metadata)
2. (optional) dictionary batch messages
3. zero, one or more "record batch" message (no app metadata)
* refactor: separate high- and low-level flight client
It is very unlikely that a user will use the high-level batch-producing
functionality and the low-level stuff within the same session. So let's
split this into to clients (high-level uses the low-level one
internally) to avoid confusion.
Also add documentation on our protocol handling.
* refactor: enumerate all variants in match statement to better catch errors in the future
* chore: TEMP Update DataFusion to pre-release
* chore: update arrow et al to 16.0.0
* chore: Run cargo hakari tasks
* fix: update reader read_dictionary API
* chore: Update to real Datafusion release
* fix: Update parquet API
* fix: update test
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
This commit changes the code base to use a new reference-counted
PartitionKey type wrapper, instead of passing a bare String around.
This allows the compiler to type check & verify usage of the partition
key, instead of passing a bare string around. By reference counting the
underlying string, we reduce memory usage for some use cases.
Unset the all env vars for the following CLI e2e tests:
* default_mode_is_run_all_in_one
* default_run_mode_is_all_in_one
This prevents them from executing against the "prod" catalog, running
migrations and inserting values to the prod database specified in the
prod DSN env (INFLUXDB_IOX_CATALOG_DSN).
Warn when downloading files to an in-memory object store.
The "remote partition pull" command downloads parquet files from an
object store via a router, and saves them locally. It's pretty unlikely
the user intends to download those files to memory of the CLI process
which then exits when the pull is complete, throwing away the downloaded
files, but this is the default.
This is a rather quick fix for prod. On the mid-term we probably wanna
rethink our deployment strategy, e.g. by using "one query per pod" and
by deploying queryd w/ IOx into the same pod.
* refactor: split compact_partition into two functions to handle concurrency better
* feat: limit number of files to compact
* test: add test for limit num files
* chore: fix cipply
* feat: split group if over max size
* fix: split the overlapped group to limit size or file num
* chore: reduce config values
* test: add tests and clearer comments for the split_overlapped_groups and test_limit_size_and_num_files
* chore: more comments
* chore: cleanup
* feat: enable debugging of failed querier->ingester requests
- extend `query-ingester` CLI to allow usage of predicates
- on failed requests: log all information that required for the CLI
- test the "ingester fails" scenario
* test: explain
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: move b64 pred. serde into a single crate
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
The default behavior of the ingester is to panic if the min unpersisted
sequence number in the catalog is unknown to the write buffer due to the
retention policies having evicted that sequence number.
Specifying `--skip-to-oldest-available` changes this behavior to skip to
the oldest sequence number the write buffer does have available and go
from there.
Fixes#4624.
* ci: fix cargo deny
* chore: downgrade `socket2`, version 0.4.5 was yanked
* chore: rename `query` to `iox_query`
`query` is already taken on crates.io and yanked and I am getting tired
of working around that.
* feat: `SortKey::size`
* feat: `FunctionEstimator`
* feat: querier RAM pool
Let's put all the caches into a single RAM pool, so we can at least
somewhat control RAM usage. Note that this does NOT limit the peak
memory during query execution though, but should at least stop unlimited
cache growth. A follow-up PR will add metrics.
* refactor: improve some size calculations
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Increase logging to investigate multi ingester flaky test
* feat: Temporarily disable a test while logging is increased in CI
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Tool for automating arrow version update
* chore: Update datafusion and arrow/parquet/arrow-flight
* fix: update for changes in Arrow API
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This is useful for local instances that run against a prod system,
because port forwarding can lead to long connection delays.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Add lookup of partitions by table id to catalog.
Add API to catalog to return partitions by table id.
Add to client to return partitions by table id.
Add CLI to pull remote schema, partition, and parquet files into a local catalog and object store.
* feat: share server processes in end to end test
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Connects to #4399.
Only file-based write buffer is supported. If `--data-dir` is specified,
store it there, otherwise store it in a temp directory to be ephemeral
Connects to #4399.
Manually flatten the arguments to set different defaults, not allow
setting the partition min/max, but still allow customization of the
other arguments.
Prior to this commit, adding any metric to the catalog (and only the
catalog) would cause the end_to_end_ng_cases::metrics::test_metrics test
to fail due to asserting an exact number of metrics observed.
This commit changes the check condition to a more permisive >= rather
than ==.
* refactor: Expose data generation tool for wider use
* feat: Add a step for retrieving the server metrics
* refactor: Copy the OG end-to-end metrics test to NG
* feat: Rewrite the NG end-to-end metrics test
This is still broken because the the row timestamp metrics don't exist
in NG.
* fix: Test metrics relevant to NG
* refactor: Move the data generator to the test helper crate
* refactor: Extract a ReadFilter request builder into the test helper crate
* refactor: Make test helper request builder able to build other gRPC requests
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@gmail.com>
* feat: Copy end-to-end logging test to NG
This was created with:
cp influxdb_iox/tests/end_to_end_cases/influxdb_ioxd.rs influxdb_iox/tests/end_to_end_ng_cases/logging.rs
* feat: Port logging test to NG end-to-end tests
And re-enable it, it was ignored.
* fix: Specify that an in-memory catalog should be used for the logging test
* fix: Check for gRPC instead of HTTP
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: use `info!` rather `println!` in end to end tests
* chore: change from println to info in end_to_end_ng_cases too
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* test: add a test with multiple ingesters
* docs: improve doc test
* refactor: Print out step number in StepTest
* fix: make timeout more explicit
* fix: add tests for merge_info
* feat: add per kafka partition durability reporting to write info response
* fix: buf lint + test cleanup
* fix: clean up protobuf
* refactor: pull out conversion of KafkaPartitionStatus into a function
* fix: fmt
* fix: typo
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Attaching the "batch => partition" mapping via per-batch schema KV
metadata does NOT work because flight will transmit the schema once for
all batches (even though on the Rust side we have a schema ref attached
to every batch, probably for convenience). Instead we now use the same
global protobuf metadata that we also use for the "partition => max
sequence number" information. This somewhat limits our ability to create
record batches lazily on the ingester side (since the global metadata is
sent before any actual payload) but I think we should not modify the
usage of the flight protocol too much right now (e.g. by sending more
schema messages). If this becomes an issue, we can always find a more
complex solution in the future.
Add method to catalog to get parquet file by object store id.
Add gRPC service for object store to get a file from by its uuid.
Add the object store service to router2 with object store config.
* feat: add catalog client and remote command
Adds the catalog gRPC service to influxdb_iox_client.
Adds a new remote command to execute commands against a remote IOx host.
Adds partition subcommand to remote to get the details of a partition by id.
* test: add end to end test for `remote partition` CLI (#4336)
* chore: cleanup partition CLI PR feedback
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* fix: return "not found" gRPC error instead of "internal" when ingester does not know table
* fix: properly handle "namespace not found" in ingester queries
* fix: make `initialize_db` work with async code
* test: add custom step for NG tests
* fix: handle "unknown table/namespace" resp. in querier
* docs: explain test setup
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: Use declarative steps to reduce duplication in end to end testing,
* fix: improve whitespace formatting
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: querier<>ingester flight protocol adjustments
This makes a few adjustments to the querier<>ingester flight protocol.
Query Scope
===========
The querier will request data for ALL sequencer IDs for now. There is
no reason to have a request per sequencer ID. We can add a range/set
filter later if we want, but this is not required for now.
Partition-level
===============
The only time when the querier cares about sequencer IDs (i.e. sharding)
at all is when it selects which ingesters to ask for unpersisted data
(this is currently not implemented, it just asks all ingesters).
Afterwards the querier only cares about partitions (which are bound to
specific sequencers anyways) because this is the level where parquet
file persistence and compaction as well as deduplication happen. So we
make partitions a first-class citizen in the ingester response.
Metadata VS RecordBatches
=========================
The global app-metadata will list all partitions and their max
persisted parquet files and tombstones (theoretically tombstones are at
table-level, but the ingester could in the future break them down to the
partition-level). Then it receives a stream of record batches. Each
record batch is tagged (via key-value metadata in its schema) so it can
be assigned to a partition. At the moment the ingester returns 0 or 1
batches per unpersisted partition (0 in case we've filtered out all the
data via the predicate), but in the future it is free to return multiple
batches. This setup gives the ingester more freedom over memory
management and (potentially parallel) query processing, while at the
same time keeps the set of duplicated information minimal and allows
easy extensions (since the global metadata is a full-blown protobuf
message).
Querier
=======
At the moment the querier ignores all the metdata. Follow-up PRs will
change that.
* docs: improve
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* refactor: make code clearer
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
This commit adds an optional feature (disabled by default) to the IOx
server binary named "tokio_console". When enabled, it adds support for
the tokio-console to IOx:
https://github.com/tokio-rs/console
Enabling this feature drops the compile-time log filter to TRACE, and
enables the necessary dependencies to support the instrumentation needed
for the console to function.
Unfortunately, this feature uses tonic 0.7 (latest) while we use tonic
0.6, so we wind up with two tonic versions being compiled in when this
feature is enabled.
"end-user -> querier" and "querier -> ingester" should use a single
Flight client implementation. The difference is just the request and
response metadata.
This changes our default Flight client to use protobuf instead of JSON
for the ticket format.
* feat: Add basic Querier <--> Ingester "Service Configuration"
* docs: update comments in test
* refactor: cleanup tests a little
* refactor: make trait more consistent
* docs: improve comments in IngesterPartition
* feat: return write_token from HTTP writes to router2
* fix: Update router2/src/dml_handlers/instrumentation.rs
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: Use WriteSummary::default more vigorously
* fix: fix typo and add links to follow on issues
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Support `SHOW NAMESPACES` in sql repl
* feat: add basic support to clients
* fix: add get_namespaces service test
* fix: proper error handling
* test: end to end test for namespace client
* refactor: Use QuerierDatabase rather than Catalog
* refactor: remove unused function
Add configuration options for compactor for the max size of level 0 files and split percentage.
Add metrics for compaction to track the number of candidates, compactions, and durations.
Add functions to separate identifying partitions to compact from running compaction.
Make compaction run in smaller chunks, specifically per partition.
Update compaction to automatically promote level 0 files that are non-overlapping without waiting some period of time.
Closes#4120
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: split up ingester schema test, add mini cluster
* feat: add schema cli test
* feat: add end to end test for querier
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Min/max values and distinct counts are already optional, so let's make
the null counts optional as well. This will be helpful for NG to deal w/
partial statistics (e.g. we only populate stats for the time column).
Note that the total count is still mandatory, but we normally have the
chunk/file-level row count at hand.
* refactor: Extract common, OG database and router out of influxdb_ioxd
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* refactor: move environment variable mapping in end to end tests into TestConfig
* fix: clippy
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>