* chore: Update Datafusion and arrow/arrow-flight/parquet to `28.0.0`
* chore: Update thrift to 0.17
* fix: use workspace arrow-flight in ingester2
* chore: Update for API changes
* fix: test
* chore: Update hakari
* chore: Update hakari again
* chore: Update trace_exporters to latest thrift
* fix: update test
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds an ingester2 crate to hold the MVP of the Kafkaless project.
This was necessary due to the tight coupling of the ingester internals
with tests in external crates, and eases the parallel development of two
versions of the ingester.
This commit contains various changes from the "ingester" crate, mostly
removing the concept/references to a "shard" or "ShardId" where
possible.
This commit does not copy over all of the "ingester" crate - only those
components that are definitely needed. I will drag across more as
functionality is implemented.
* chore: move ns api from querier to router
* chore: add explanatory comment in querier about moved namespace API
* fix: add namespace service to router
* fix: querier returns unimplemented error for ns retention, not panic
* chore: reuse namespace -> proto in router ns api
* chore: grpc namespace - consume ns to avoid clone
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
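The last two bullets (reusing the namespace-to-proto conversion and consuming the namespace to avoid a clone) rely on a common Rust idiom. A `From` impl that takes its input by value can move owned fields such as `String`s into the output instead of cloning them. A minimal sketch follows. `Namespace`, `ProtoNamespace`, and their fields are hypothetical stand-ins, not the actual catalog or generated protobuf types.

```rust
/// Hypothetical catalog-side namespace record (illustrative fields only).
#[derive(Debug, PartialEq)]
pub struct Namespace {
    pub id: i64,
    pub name: String,
    pub retention_period_ns: Option<i64>,
}

/// Hypothetical stand-in for the generated gRPC message type.
#[derive(Debug, PartialEq)]
pub struct ProtoNamespace {
    pub id: i64,
    pub name: String,
    pub retention_period_ns: Option<i64>,
}

// Taking `ns` by value lets the conversion move `name` into the message
// rather than allocating a copy with `clone()`.
impl From<Namespace> for ProtoNamespace {
    fn from(ns: Namespace) -> Self {
        Self {
            id: ns.id,
            name: ns.name,
            retention_period_ns: ns.retention_period_ns,
        }
    }
}

fn main() {
    let namespaces = vec![Namespace {
        id: 1,
        name: "bananas".to_string(),
        retention_period_ns: None,
    }];
    // `into_iter` yields owned `Namespace` values, so each one is consumed.
    let protos: Vec<ProtoNamespace> = namespaces.into_iter().map(ProtoNamespace::from).collect();
    println!("{protos:?}");
}
```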
* chore: Update datafusion pin + api code
* chore: Run cargo hakari tasks
* refactor: combine_sort_key is more idiomatic and add rationale comments
* refactor: satisfy borrow checker and update comments
* fix: Add test case for combine_sort_key
* fix: Apply suggestions from code review
Co-authored-by: Marco Neumann <marco@crepererum.net>
* fix: Add back test for deeply nested expression
* fix: Update output ordering
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Partial implementation of Visitable for InfluxQL AST
* feat: Added consistent structures for each clause to simplify visitor
Continued to expand `accept` and `pre` / `post` visit implementations.
* feat: Added insta and tests using snapshots (thanks @crepererum)
The insta crate simplifies validating that the visitor and accept
implementations are called, and in the correct order.
* chore: Run cargo hakari tasks
* feat: Added remaining snapshot tests
Some tests are failing because minor type changes must be made along
with the addition of the related visitor functions.
* feat: Add types to represent each clause in numerous statements
These clauses permit distinct visit functions on the `Visitor` type.
* chore: Reformat `SELECT`
* chore: Explicitly specify access to export selected types only
This required completing all the missing documentation for the exported
types.
* chore: Update Cargo.lock
* chore: macro to implement common traits and hide 0th tuple element
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
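The accept/visitor split described above can be sketched as follows, assuming a heavily simplified AST. Each node's `accept` calls the visitor's `pre_*` hook, visits its children, then calls the `post_*` hook, and a recording visitor captures the call order. `SelectStatement`, `Field`, `WhereClause`, `Visitable`, `Visitor`, and `RecordingVisitor` here are hypothetical and much smaller than the real InfluxQL AST types.

```rust
/// Hypothetical, heavily simplified AST nodes; not the real InfluxQL types.
pub struct Field(pub String);
pub struct WhereClause(pub String);
pub struct SelectStatement {
    pub fields: Vec<Field>,
    pub condition: Option<WhereClause>,
}

/// `pre_*` / `post_*` hooks; the defaults do nothing, so a visitor only
/// overrides the hooks it cares about.
pub trait Visitor {
    fn pre_visit_select(&mut self, _n: &SelectStatement) {}
    fn post_visit_select(&mut self, _n: &SelectStatement) {}
    fn pre_visit_field(&mut self, _n: &Field) {}
    fn post_visit_field(&mut self, _n: &Field) {}
    fn pre_visit_where(&mut self, _n: &WhereClause) {}
    fn post_visit_where(&mut self, _n: &WhereClause) {}
}

/// Each node calls its `pre` hook, visits its children, then its `post` hook.
pub trait Visitable {
    fn accept<V: Visitor>(&self, v: &mut V);
}

impl Visitable for Field {
    fn accept<V: Visitor>(&self, v: &mut V) {
        v.pre_visit_field(self);
        v.post_visit_field(self);
    }
}

impl Visitable for WhereClause {
    fn accept<V: Visitor>(&self, v: &mut V) {
        v.pre_visit_where(self);
        v.post_visit_where(self);
    }
}

impl Visitable for SelectStatement {
    fn accept<V: Visitor>(&self, v: &mut V) {
        v.pre_visit_select(self);
        for field in &self.fields {
            field.accept(v);
        }
        if let Some(cond) = &self.condition {
            cond.accept(v);
        }
        v.post_visit_select(self);
    }
}

/// Records every hook call in order: the kind of trace a snapshot test pins down.
#[derive(Default)]
pub struct RecordingVisitor(pub Vec<String>);

impl Visitor for RecordingVisitor {
    fn pre_visit_select(&mut self, _n: &SelectStatement) {
        self.0.push("pre_visit_select".to_string());
    }
    fn post_visit_select(&mut self, _n: &SelectStatement) {
        self.0.push("post_visit_select".to_string());
    }
    fn pre_visit_field(&mut self, n: &Field) {
        self.0.push(format!("pre_visit_field: {}", n.0));
    }
    fn post_visit_field(&mut self, n: &Field) {
        self.0.push(format!("post_visit_field: {}", n.0));
    }
    fn pre_visit_where(&mut self, n: &WhereClause) {
        self.0.push(format!("pre_visit_where: {}", n.0));
    }
    fn post_visit_where(&mut self, n: &WhereClause) {
        self.0.push(format!("post_visit_where: {}", n.0));
    }
}

fn main() {
    let stmt = SelectStatement {
        fields: vec![Field("usage".to_string())],
        condition: Some(WhereClause("host = 'a'".to_string())),
    };
    let mut visitor = RecordingVisitor::default();
    stmt.accept(&mut visitor);
    for call in &visitor.0 {
        println!("{call}");
    }
}
```

In a snapshot test, the recorded trace would be handed to insta (for example `insta::assert_debug_snapshot!(visitor.0)`). Any change in call order then shows up as a snapshot diff rather than requiring a hand-maintained expected list.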
* feat: initial commit of schema merge bulk import tool
* chore: use observability deps instead of tracing-*
* chore: removed debug printlns
* chore: fix feature decls for cloud providers for import crate
* chore: use println instead of info in import; logging is unnecessary for a simple CLI
* chore: tidy whitespace
* chore: remove unused dep in import
* chore: Run cargo hakari tasks
* chore: removed unimplemented import job subcommand
* chore: clarifying comment about custom serialisation code
* chore: clarifying comment about schema merge code in import
* chore: fix wrong comment in import command
* chore: bump object store dep to get bugfix
* chore: rename import schema struct for clarity
* chore: run `cargo hakari generate`
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Adds a decorator over the underlying kafka client to capture the latency
distribution of the low-level kafka writes, independent of the
aggregation/DML batching framework that sits "above" this client.
The latency measurements include the serialisation overhead, protocol
overhead, and actual network I/O.
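A decorator of this kind can be sketched as a wrapper type that implements the same trait as the client it wraps, times each call, and records the elapsed time into a histogram. This is a minimal, synchronous sketch. The `Producer` trait, `LatencyRecorder`, the bucket bounds, and the sleeping fake client are all hypothetical. The real Kafka client is async and would record into a proper metrics histogram rather than a `Mutex<Vec<u64>>`.

```rust
use std::sync::Mutex;
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical, synchronous stand-in for the Kafka client interface.
pub trait Producer {
    fn produce(&self, payload: &[u8]) -> Result<(), String>;
}

/// Decorator: wraps any `Producer`, forwards every call, and records how long
/// the inner call took in a fixed-bucket latency histogram.
pub struct LatencyRecorder<P> {
    inner: P,
    /// Inclusive upper bound of each bucket; one extra bucket counts slower calls.
    bounds: Vec<Duration>,
    counts: Mutex<Vec<u64>>,
}

impl<P: Producer> LatencyRecorder<P> {
    pub fn new(inner: P, bounds: Vec<Duration>) -> Self {
        let buckets = bounds.len() + 1;
        Self { inner, bounds, counts: Mutex::new(vec![0; buckets]) }
    }

    /// Snapshot of the per-bucket call counts.
    pub fn counts(&self) -> Vec<u64> {
        self.counts.lock().unwrap().clone()
    }
}

impl<P: Producer> Producer for LatencyRecorder<P> {
    fn produce(&self, payload: &[u8]) -> Result<(), String> {
        let start = Instant::now();
        // Everything the inner client does (serialisation, protocol handling,
        // network I/O) falls inside the timed region, for successes and errors alike.
        let result = self.inner.produce(payload);
        let elapsed = start.elapsed();
        let bucket = self
            .bounds
            .iter()
            .position(|bound| elapsed <= *bound)
            .unwrap_or(self.bounds.len());
        self.counts.lock().unwrap()[bucket] += 1;
        result
    }
}

/// Fake inner client that simulates a fixed write latency.
pub struct SleepyProducer(pub Duration);

impl Producer for SleepyProducer {
    fn produce(&self, _payload: &[u8]) -> Result<(), String> {
        thread::sleep(self.0);
        Ok(())
    }
}

fn main() {
    let bounds = vec![Duration::from_millis(1), Duration::from_secs(1)];
    let client = LatencyRecorder::new(SleepyProducer(Duration::from_millis(5)), bounds);
    client.produce(b"cpu,host=a usage=1").unwrap();
    println!("bucket counts: {:?}", client.counts());
}
```

Because the decorator implements the same trait it wraps, callers above it need no changes, and the measurement stays independent of any batching layer sitting on top.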
* refactor: remove unused logging config
* chore: remove the object store garbage collector CLI tool
* refactor: accept an object store and catalog
* refactor: make Result type alias public like the error
* refactor: remove public modifier from modules
* refactor: allow shutting down the object store garbage collector
* feat: Introduce the object-store garbage collection server
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
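One common way to make a periodic background task stoppable is sketched here with only the standard library. The loop waits on a shutdown channel with a timeout instead of sleeping, so a shutdown request interrupts the wait immediately. `GarbageCollector`, `start`, and `shutdown` are hypothetical names. The actual garbage collector server may be built on async tasks and cancellation tokens instead.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::{self, RecvTimeoutError};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical handle to a background garbage-collector loop.
pub struct GarbageCollector {
    shutdown_tx: mpsc::Sender<()>,
    handle: JoinHandle<()>,
    passes: Arc<AtomicUsize>,
}

impl GarbageCollector {
    pub fn start(interval: Duration) -> Self {
        let (shutdown_tx, shutdown_rx) = mpsc::channel::<()>();
        let passes = Arc::new(AtomicUsize::new(0));
        let counter = Arc::clone(&passes);
        let handle = thread::spawn(move || loop {
            // Placeholder for one pass (list files, check catalog, delete orphans).
            counter.fetch_add(1, Ordering::SeqCst);
            // Wait for the next pass, but stop at once if shutdown is requested
            // or the sender has been dropped.
            match shutdown_rx.recv_timeout(interval) {
                Err(RecvTimeoutError::Timeout) => continue,
                Ok(()) | Err(RecvTimeoutError::Disconnected) => break,
            }
        });
        Self { shutdown_tx, handle, passes }
    }

    /// Signals the loop to stop, waits for it to exit, and returns the number of passes run.
    pub fn shutdown(self) -> usize {
        // Ignore the send error: it only means the loop has already exited.
        let _ = self.shutdown_tx.send(());
        self.handle.join().expect("gc thread panicked");
        self.passes.load(Ordering::SeqCst)
    }
}

fn main() {
    let gc = GarbageCollector::start(Duration::from_secs(3600));
    let passes = gc.shutdown();
    println!("gc stopped after {passes} pass(es)");
}
```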
* chore: reduce proptest features
* chore: remove `grpc-router`
This crate is currently unused and we don't have immediate plans to use
it. It remains in the Git history, so it can always be restored.
* chore: `cargo update`
* ci: fix cargo deny
* chore: downgrade `socket2`, version 0.4.5 was yanked
* chore: rename `query` to `iox_query`
`query` is already taken (and yanked) on crates.io, and I am getting tired
of working around that.
Add method to catalog to get parquet file by object store id.
Add a gRPC service for the object store to get a file by its UUID.
Add the object store service to router2 with object store config.
Create new crate for iox_catalog_service.
Add rpc to return parquet_file records by partition id.
Add CatalogService to router2.
The catalog service will be added to over time to provide access to the catalog over gRPC.
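The two catalog lookups added here (a parquet file by its object store ID, and all parquet files for a partition) can be sketched with an in-memory catalog. Everything in this sketch is hypothetical. `ParquetFile` is trimmed to three fields, a `u128` stands in for the object store UUID so the example needs no external crates, and the real lookups go through the catalog and are exposed by the gRPC services described above.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down parquet file record.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFile {
    pub id: i64,
    pub partition_id: i64,
    /// Stand-in for the UUID naming the file in object storage.
    pub object_store_id: u128,
}

/// In-memory catalog with an index from object store ID to file record.
#[derive(Default)]
pub struct MemCatalog {
    files: Vec<ParquetFile>,
    by_object_store_id: HashMap<u128, usize>,
}

impl MemCatalog {
    pub fn insert(&mut self, file: ParquetFile) {
        // The index points at the position the file is about to occupy in `files`.
        self.by_object_store_id.insert(file.object_store_id, self.files.len());
        self.files.push(file);
    }

    /// Backs a "get a file by its UUID" style endpoint.
    pub fn parquet_file_by_object_store_id(&self, id: u128) -> Option<&ParquetFile> {
        self.by_object_store_id.get(&id).map(|&idx| &self.files[idx])
    }

    /// Backs a "list parquet files for a partition" style endpoint.
    pub fn parquet_files_by_partition_id(&self, partition_id: i64) -> Vec<&ParquetFile> {
        self.files.iter().filter(|f| f.partition_id == partition_id).collect()
    }
}

fn main() {
    let mut catalog = MemCatalog::default();
    catalog.insert(ParquetFile { id: 1, partition_id: 7, object_store_id: 0xa1 });
    catalog.insert(ParquetFile { id: 2, partition_id: 7, object_store_id: 0xb2 });
    catalog.insert(ParquetFile { id: 3, partition_id: 8, object_store_id: 0xc3 });
    println!("{:?}", catalog.parquet_file_by_object_store_id(0xb2));
    println!("{:?}", catalog.parquet_files_by_partition_id(7));
}
```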
* feat: return write_token from HTTP writes to router2
* fix: Update router2/src/dml_handlers/instrumentation.rs
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: Use WriteSummary::default more vigorously
* fix: fix typo and add links to follow on issues
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: Extract common, OG database and router out of influxdb_ioxd
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* refactor: split influxdb_ioxd, clap_blocks, and serving_readiness out of influxdb_iox
split out serving readiness, get compiling
* fix: hakari
* fix: hakari again
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: port sqlx-hotswap-pool over from conductor
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: workspace hack fixes
* fix: unique schema per test db connection
* fix: adjust search path in catalog pg tests to see if it fixes test schema issue
* fix: actually fixed sqlx hotswap pool test
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: remove references to perf_image in CI
* chore: adding gitops adapter image build in CI
* chore: gitops adapter bin now same as dir & package so docker build works
* fix: circle config package change after renaming gitops adapter package
* feat: Add a way to run ingester with an in-memory catalog from the CLI
If the --catalog-dsn string is set to "mem", create an in-memory catalog
rather than treating the string as a Postgres connection URL.
Planning on using this in tests, so not documenting.
* fix: Set default topic to the same value as SHARED_KAFKA_TOPIC
Namely, both should use an underscore. I don't think there's a way to
directly share these values between a constant and an annotation.
* feat: Add a flight API (handshake only) to ingester
* fix: Create partitions if using file-based write buffer
* fix: Change the server fixture to handle ingester server type
For now, the ingester doesn't implement the deployment API. Not sure if
it should or not.
* feat: Start implementing ingester do_get, namely decoding the query
Skip serialization of the predicate for the moment.
* refactor: Rename ingest protos to ingester to match crate name
* refactor: Rename QueryResults to QueryData
* feat: Move ingester flight client to new querier crate
* fix: Off-by-one error caused by different starting indexes in sequencers
* fix: Create new CLI argument to pick the catalog type
* fix: Create a CLI option to set the number of topics to auto-create in the write buffer
* fix: Check the arrow flight service's health to tell that the ingester gRPC is up
* fix: Set postgres as the default catalog type
* fix: Return an error rather than panicking if CLI args aren't right
* refactor: Extract JobRegistry from the server crate
Both the server crate and a db crate that I'm about to extract depend on
JobRegistry, so to avoid making circular dependencies, extract the
JobRegistry to its own crate.
* refactor: Move db out of server into its own crate
Fixes #2821.
* fix: Add tokio rt-multi-thread feature so cargo test -p client_util compiles
* fix: Alphabetize dependencies
* fix: Add the data_types_conversions feature to get tests passing
* fix: Remove dev dependencies already listed under normal dependencies
* fix: Make sure the workspace is using the new resolver
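The ingester CLI changes above (a "mem" value for `--catalog-dsn` selecting an in-memory catalog, and returning an error rather than panicking on bad CLI arguments) can be sketched as a small parsing function. `CatalogConfig` and `catalog_from_dsn` are hypothetical names. The actual argument handling presumably lives in the clap-based CLI configuration and differs in detail.

```rust
/// Hypothetical catalog selection result.
#[derive(Debug, PartialEq)]
pub enum CatalogConfig {
    /// In-memory catalog, intended for tests.
    Memory,
    /// Anything else is treated as a Postgres connection URL.
    Postgres { dsn: String },
}

/// Maps the `--catalog-dsn` value to a catalog config, returning an error
/// (instead of panicking) when the argument is missing or empty.
pub fn catalog_from_dsn(dsn: Option<&str>) -> Result<CatalogConfig, String> {
    match dsn {
        Some("mem") => Ok(CatalogConfig::Memory),
        Some("") | None => Err("--catalog-dsn must be set".to_string()),
        Some(url) => Ok(CatalogConfig::Postgres { dsn: url.to_string() }),
    }
}

fn main() {
    for dsn in [Some("mem"), Some("postgres://localhost/iox_shared"), None] {
        match catalog_from_dsn(dsn) {
            Ok(config) => println!("{dsn:?} -> {config:?}"),
            Err(e) => eprintln!("{dsn:?} -> error: {e}"),
        }
    }
}
```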
Use `codegen-units = 1`, thin-LTO and debug section compression to make our binary smaller (which is good for deploy and
test times) and faster.
# Summary
The binary size of `influxdb_iox` after building with:
```console
$ cargo build --release --no-default-features --features="aws,gcp,azure,jemalloc_replacing_malloc"
```
The profile was:
```toml
[profile.release]
debug = true
```
The commit was:
```text
89ece8b493
```
The size results are:
| Method | Size |
| ------------------------------------------ | ----- |
| baseline | 833MB |
| baseline + dbg compression | 222MB |
| baseline + strip | 49MB |
| codegen-units | 520MB |
| codegen-units + strip | 40MB |
| codegen-units + dbg compression | 143MB |
| thin LTO | 715MB |
| thin LTO + strip | 49MB |
| thin LTO + dbg compression | 199MB |
| codegen-units + thin LTO | 449MB |
| codegen-units + thin LTO + strip | 40MB |
| codegen-units + thin LTO + dbg compression | 130MB |
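To read the table in relative terms, the reduction for each method is `(1 - size / baseline) * 100` against the 833MB baseline. For example, codegen-units + thin LTO + dbg compression comes out roughly 84% smaller, and codegen-units + thin LTO + strip roughly 95% smaller. The small program below only does that arithmetic over a few rows copied from the table.

```rust
/// Percent reduction of `size_mb` relative to `baseline_mb`.
fn reduction_pct(baseline_mb: f64, size_mb: f64) -> f64 {
    (1.0 - size_mb / baseline_mb) * 100.0
}

fn main() {
    let baseline = 833.0;
    // Sizes (MB) copied from the table above.
    let rows = [
        ("baseline + dbg compression", 222.0),
        ("codegen-units + thin LTO", 449.0),
        ("codegen-units + thin LTO + dbg compression", 130.0),
        ("codegen-units + thin LTO + strip", 40.0),
    ];
    for (method, size) in rows {
        println!("{method:<45} {size:>4}MB  -{:.1}%", reduction_pct(baseline, size));
    }
}
```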
For the methods that were successfully measured, I couldn't see any real compile time differences on my laptop.
# Methods
## Strip
Remove debug symbols. We don't actually want this; it is only here to give an idea of the size.
```console
$ strip baseline
```
## Debug Sections compression
Debug sections make up a large part of our binary size (a stripped executable is 49MB instead of 833MB). Since we like to
have debug symbols, we cannot just strip them. However, these symbols are only used for:
- backtrace generation (something went wrong, not BAU)
- profiling
- debugging
So in normal operation and most test scenarios, they just waste memory. Instead, we can compress them:
```console
$ objcopy --compress-debug-sections baseline baseline-dbg_compressed
```
There is also elfutils:
```console
$ eu-elfcompress test
```
Elfutils ends up with nearly the same size (220MB vs. the 222MB that objcopy achieves), but takes more time, so it is
probably not worth it.
Note that compressed debug sections have existed for many years. The Rust ecosystem has supported reading them for over a
year; see:
- <https://github.com/gimli-rs/gimli/issues/195>
- <https://github.com/rust-lang/backtrace-rs/issues/342>
## Codegen Units
The Rust compiler parallelizes codegen work. However, splitting the work into units limits some optimizations. This
can be changed by:
```toml
[profile.release]
...
codegen-units = 1
```
As a nice side effect, this should also make our code faster.
## Thin LTO
Get LLVM to run "thin" Link Time Optimization:
```toml
[profile.release]
...
lto = "thin"
```
As a nice side effect, this should also make our code faster.
## Fat LTO
Get LLVM to run "fat" Link Time Optimization:
```toml
[profile.release]
...
lto = "fat"
```
There are no results for this because the build took a massive amount of memory and CPU time and did not finish on my system.
Kafka is now sufficiently tested via the `write_buffer` crate. The
end2end tests can now use the in-memory mock implementation or -- if
servers can only be controlled via CLI -- the file-based implementation.
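The in-memory mock idea can be sketched as a trait with a shared-state implementation. A writer handle and a reader handle created in the same test then see the same entries without any Kafka broker. `WriteBuffer`, `MockBuffer`, and their methods are hypothetical and much simpler than the actual `write_buffer` crate interfaces.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical, much-simplified write buffer interface.
pub trait WriteBuffer {
    /// Appends an entry and returns its sequence number.
    fn store(&self, line_protocol: &str) -> u64;
    /// Returns every stored entry, in sequence order.
    fn read_all(&self) -> Vec<(u64, String)>;
}

/// In-memory mock: clones share one `Vec`, so a writer handle and a reader
/// handle created in the same test observe the same entries.
#[derive(Clone, Default)]
pub struct MockBuffer {
    entries: Arc<Mutex<Vec<(u64, String)>>>,
}

impl WriteBuffer for MockBuffer {
    fn store(&self, line_protocol: &str) -> u64 {
        let mut entries = self.entries.lock().unwrap();
        // Sequence numbers start at 0 and increase by one per write.
        let sequence = entries.len() as u64;
        entries.push((sequence, line_protocol.to_string()));
        sequence
    }

    fn read_all(&self) -> Vec<(u64, String)> {
        self.entries.lock().unwrap().clone()
    }
}

fn main() {
    let writer = MockBuffer::default();
    // A clone shares the same underlying Vec, like two clients of one topic.
    let reader = writer.clone();
    writer.store("cpu,host=a usage=1");
    writer.store("cpu,host=a usage=2");
    for (seq, lp) in reader.read_all() {
        println!("{seq}: {lp}");
    }
}
```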