Add a method to the catalog to get a parquet file by object store id.
Add a gRPC service for the object store to get a file by its uuid.
Add the object store service to router2 with the object store config.
Create a new crate for iox_catalog_service.
Add an RPC to return parquet_file records by partition id.
Add CatalogService to router2.
The catalog service will be extended over time to provide access to the catalog over gRPC.
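As a rough sketch of the shape of this API (names and signatures here are illustrative, not the actual iox_catalog definitions), the new catalog methods might look like:

```rust
use uuid::Uuid;

// Illustrative stand-ins for the real iox_catalog types.
struct ParquetFile { /* columns elided */ }
struct PartitionId(i64);

#[derive(Debug)]
struct Error(String);

#[async_trait::async_trait]
trait ParquetFileRepo {
    /// Fetch a parquet file record by its object store uuid, if any.
    async fn get_by_object_store_id(&mut self, id: Uuid) -> Result<Option<ParquetFile>, Error>;

    /// Return all parquet file records belonging to a partition.
    async fn list_by_partition_id(&mut self, id: PartitionId) -> Result<Vec<ParquetFile>, Error>;
}
```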
* feat: return write_token from HTTP writes to router2
* fix: Update router2/src/dml_handlers/instrumentation.rs
Co-authored-by: Dom <dom@itsallbroken.com>
* refactor: Use WriteSummary::default more vigorously
* fix: fix typo and add links to follow-on issues
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: Extract common, OG database and router out of influxdb_ioxd
* chore: Run cargo hakari tasks
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
* refactor: split influxdb_ioxd, clap_blocks, and serving_readiness out of influxdb_iox
Split out serving readiness and get it compiling.
* fix: hakari
* fix: hakari again
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: port sqlx-hotswap-pool over from conductor (see the sketch below)
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
* chore: workspace hack fixes
* fix: unique schema per test db connection
* fix: adjust search path in catalog pg tests to see if it fixes test schema issue
* fix: actually fixed sqlx hotswap pool test
Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
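The core idea of a hot-swappable pool, as a minimal sketch (names here are assumptions, not the actual sqlx-hotswap-pool API): hand out clones of the current pool from behind a lock and let the owner atomically replace it, e.g. when credentials rotate.

```rust
use std::sync::{Arc, RwLock};

use sqlx::{Pool, Postgres};

#[derive(Clone)]
struct HotSwapPool {
    inner: Arc<RwLock<Pool<Postgres>>>,
}

impl HotSwapPool {
    fn new(pool: Pool<Postgres>) -> Self {
        Self {
            inner: Arc::new(RwLock::new(pool)),
        }
    }

    /// Get a handle to the currently-active pool.
    fn current(&self) -> Pool<Postgres> {
        self.inner.read().expect("lock poisoned").clone()
    }

    /// Swap in a new pool; existing handles keep using the old one
    /// until they are dropped.
    fn replace(&self, new_pool: Pool<Postgres>) {
        *self.inner.write().expect("lock poisoned") = new_pool;
    }
}
```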
* chore: remove references to perf_image in CI
* chore: adding gitops adapter image build in CI
* chore: gitops adapter bin now same as dir & package so docker build works
* fix: circle config package change after renaming gitops adapter package
* feat: Add a way to run ingester with an in-memory catalog from the CLI
If you set the --catalog-dsn string to "mem", the CLI creates an in-memory
catalog rather than using the value as a Postgres connection URL (see the
sketch below). Planning on using this in tests, so not documenting.
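A minimal sketch of that dispatch (the actual CLI wiring differs; the names below are illustrative):

```rust
use std::sync::Arc;

// Illustrative catalog stand-ins; the real types live in iox_catalog.
trait Catalog: Send + Sync {}
struct MemCatalog;
struct PostgresCatalog {
    dsn: String,
}
impl Catalog for MemCatalog {}
impl Catalog for PostgresCatalog {}

/// Interpret the `--catalog-dsn` value: the literal "mem" selects the
/// in-memory catalog; anything else is treated as a Postgres URL.
fn catalog_from_dsn(dsn: &str) -> Arc<dyn Catalog> {
    match dsn {
        "mem" => Arc::new(MemCatalog),
        url => Arc::new(PostgresCatalog { dsn: url.to_string() }),
    }
}
```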
* fix: Set default topic to the same value as SHARED_KAFKA_TOPIC
Namely, both should use an underscore. I don't think there's a way to
directly share these values between a constant and an annotation.
* feat: Add a flight API (handshake only) to ingester
* fix: Create partitions if using file-based write buffer
* fix: Change the server fixture to handle ingester server type
For now, the ingester doesn't implement the deployment API. Not sure if
it should or not.
* feat: Start implementing ingester do_get, namely decoding the query
Skip serialization of the predicate for the moment.
* refactor: Rename ingest protos to ingester to match crate name
* refactor: Rename QueryResults to QueryData
* feat: Move ingester flight client to new querier crate
* fix: Off by one error, different starting indexes in sequencers
* fix: Create new CLI argument to pick the catalog type
* fix: Create a CLI option to set the number of topics to auto-create in the write buffer
* fix: Check the arrow flight service's health to tell that the ingester gRPC is up
* fix: Set postgres as the default catalog type
* fix: Return an error rather than panicking if CLI args aren't right
* refactor: Extract JobRegistry from the server crate
Both the server crate and a db crate that I'm about to extract depend on
JobRegistry, so to avoid making circular dependencies, extract the
JobRegistry to its own crate.
* refactor: Move db out of server into its own crate
Fixes #2821.
* fix: Add tokio rt-multi-thread feature so cargo test -p client_util compiles
* fix: Alphabetize dependencies
* fix: Add the data_types_conversions feature to get tests passing
* fix: Remove dev dependencies already listed under normal dependencies
* fix: Make sure the workspace is using the new resolver
Use `codegen-units = 1`, thin-LTO and debug section compression to make our binary smaller (which is good for deploy and
test times) and faster.
# Summary
The binary sizes below were measured for `influxdb_iox` built with:
```console
$ cargo build --release --no-default-features --features="aws,gcp,azure,jemalloc_replacing_malloc"
```
The profile was:
```toml
[profile.release]
debug = true
```
The commit was:
```text
89ece8b493
```
The size results are:
| Method | Size |
| ------------------------------------------ | ----- |
| baseline | 833MB |
| baseline + dbg compression | 222MB |
| baseline + strip | 49MB |
| codegen-units | 520MB |
| codegen-units + strip | 40MB |
| codegen-units + dbg compression | 143MB |
| thin LTO | 715MB |
| thin LTO + strip | 49MB |
| thin LTO + dbg compression | 199MB |
| codegen-units + thin LTO | 449MB |
| codegen-units + thin LTO + strip | 40MB |
| codegen-units + thin LTO + dbg compression | 130MB |
For the methods that were successfully measured, I couldn't see any compile-time differences on my laptop.
# Methods
## Strip
Remove debug symbols. We don't really want this; it's just to get an idea of the size:
```console
$ strip baseline
```
## Debug Sections compression
Debug sections make up a large part of our binary size (a stripped executable is 49MB instead of 833MB). Since we like to
have debug symbols, we cannot just strip them. However, these symbols are only used for:
- backtrace generation (something went wrong, not BAU)
- profiling
- debugging
So in normal operation and most test scenarios they're just wasting memory, and we can compress them:
```console
$ objcopy --compress-debug-sections baseline baseline-dbg_compressed
```
There is also elfutils:
```console
$ eu-elfcompress test
```
Elfutils ends up with nearly the same size (220MB instead of the 222MB that objcopy achieves), but takes more time and is
probably not worth it.
Note that compressed debug sections have existed for many years, and the Rust ecosystem has supported reading them for over a year;
see:
- <https://github.com/gimli-rs/gimli/issues/195>
- <https://github.com/rust-lang/backtrace-rs/issues/342>
## Codegen Units
The Rust compiler parallelizes codegen work. This split into units, however, means that optimizations are somewhat
limited. This can be changed via:
```toml
[profile.release]
...
codegen-units = 1
```
As a nice side effect this should also make our code faster.
## Thin LTO
Get LLVM to run "thin" Link Time Optimization:
```toml
[profile.release]
...
lto = "thin"
```
As a nice side effect this should also make our code faster.
## Fat LTO
Get LLVM to run "fat" Link Time Optimization:
```toml
[profile.release]
...
lto = "fat"
```
There are no results for this because this took a massive amount of memory and CPU time and did not finish on my system.
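Putting the winning pieces together (codegen-units = 1 plus thin LTO, keeping debug symbols for backtraces and compressing them out-of-band with `objcopy`), the resulting profile would be:
```toml
[profile.release]
debug = true
codegen-units = 1
lto = "thin"
```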
Kafka is now sufficiently tested via the `write_buffer` crate. The
end2end tests can now use the in-memory mock implementation or -- if
servers can only be controlled via CLI -- the file-based implementation.
Two reasons:
1. I want to decouple `parquet_file` from `query` (nearly done, needs a
small follow-up PR).
2. `predicate` will have more and more features (like serialization),
which justifies a new home.
Currently heappy is reporting "bytes allocated" and "inuse_bytes" ("bytes allocated - freed bytes").
Since the allocation site and free site can be wildly different, it's very hard to correlate these events
and have a meaningful "inuse_bytes" graph.
I have a plan for how to fix this for good, but in the meantime
I'd like to report raw "freed bytes".
In aed37ab50a I fixed the main
problem that rendered free tracking useless: the allocator was actually allocating more bytes than requested, but we were tracking the amount of memory requested by the user rather than the size of the block that was returned to the user. However, when tracking a call to free we used the size of the memory block, which was bigger than the amount that had been tracked; this led to negative "in use memory" numbers.
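A toy sketch of the accounting rule the fix restores (this is not heappy's implementation): whatever size is recorded at allocation time must also be the size recorded at free time, otherwise "in use" (allocated minus freed) can go negative.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

static ALLOCATED: AtomicUsize = AtomicUsize::new(0);
static FREED: AtomicUsize = AtomicUsize::new(0);

struct TrackingAllocator;

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        // Record the size handed out...
        ALLOCATED.fetch_add(layout.size(), Ordering::Relaxed);
        System.alloc(layout)
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // ...and record the *same* size on free, so that
        // ALLOCATED - FREED never goes negative.
        FREED.fetch_add(layout.size(), Ordering::Relaxed);
        System.dealloc(ptr, layout)
    }
}
```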
* feat: add jaeger and otlp flags
* chore: add jaeger and otlp features to CI test and deploy image
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore: Update deps (including arrow 5.0.0 --> arrow 5.1.0)
* chore: update all the things
* refactor: Update serving readiness check due to change in Tonic API
* chore: update more deps
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Attempt to dump a stacktrace to stdout prior to process abort
* refactor: rewrite signal handling in terms of libc, remove sig dep
* refactor: print to stderr
* fix: Update src/main.rs
* docs: note provenance
This refactors the write buffer a bit for:
- **Testing:** Add generic tests for the Kafka and the mock
implementations. The same interface can be used to easily add new
implementations (e.g. via Redis, filesystem, ...).
- **Partition on Write:** The caller of the write operation must now
specify the partition/sequencer ID (see the sketch after this list). The
implicit partitioning of the Kafka writer would have led to broken data,
since we must never spill entries w/ the same primary key over multiple
partitions. At the moment we will only use partition 0, but we can easily
implement better logic in the future.
- **Improved Mocking:** The mocked implementation now simulates a system
that feels more real. Especially the handling around multiple streams
and "write while read" has been improved. This will be helpful for
testing and for new features like seeking (during replay). A solid,
realistic mock also helps us ensure that tests using the mock do not
rely too much on unrealistic behavior.
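A hypothetical shape for the writer interface after this change (the real IOx trait differs in detail): the caller picks the sequencer, making partitioning explicit rather than something decided inside the Kafka client.

```rust
use async_trait::async_trait;

#[derive(Debug)]
struct WriteError(String);

#[async_trait]
trait WriteBufferWriting: Send + Sync {
    /// Write `payload` to the given sequencer/partition and return the
    /// sequence number assigned by the write buffer.
    async fn store(&self, sequencer_id: u32, payload: &[u8]) -> Result<u64, WriteError>;
}
```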
* feat: make aws, gcp, azure dependencies optional
* fix: only run object store tests if the features are enabled
* fix: clean up testing
* fix: rename step
* fix: add to list of jobs
* fix: remove test with object store
* fix: review comments
The entire persistence windows data structures (including the
checkpoints) have nothing to do with the mutable buffer per se, so let's
move them into their own crate. This also means `parquet_file` no
longer depends on `mutable_buffer`.
Instead of waiting for the server ID to be set and then mark the server
as errored, directly check the object store on startup. This is
important so that we fail fast when Istio isn't up and running yet.
This PR factors out the tracing/logging CLI options into the `trogging` utility crate,
so that multiple binaries from the IOx suite (such as conductor) can use the same (and quite complex)
logging/tracing configuration options (flags and env vars).
Closes influxdata/conductor#343
__Rationale__
We currently use the `tracing` framework to output to both log outputs (e.g. stdout for k8s) and distributed tracing collectors (e.g. opentelemetry jaeger).
However, due to a limitation in the `tracing` SDK, we can only have one "filter" level that applies
to both logs and tracing outputs. This is impractical, because tracing collectors are designed
to receive high-verbosity data (which will then be sampled within the opentelemetry library),
while logs are generally limited to the DEBUG level in production.
This PR adds a `FilteredLayer` tracing subscriber layer that wraps a subscriber layer with an independent
filter, which can filter events going to the wrapped subscriber layer more aggressively than the global layer.
This will allow us to emit logs at INFO or DEBUG level while passing all events to opentelemetry at TRACE
level (the opentelemetry SDK will then sample the events so that only a small part is sent to the
collector).
__Note__
This PR just implements the `FilteredLayer` and a test. Another PR will integrate this with
our log/tracing setup code.
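A minimal sketch of the idea (assuming tracing-subscriber's `Layer` and `EnvFilter`; this is not the actual implementation in this PR): wrap an inner layer and consult an independent filter before forwarding events, so the wrapped output can be filtered more aggressively than the globally-registered one.

```rust
use tracing::{Event, Subscriber};
use tracing_subscriber::filter::EnvFilter;
use tracing_subscriber::layer::{Context, Layer};

struct FilteredLayer<L> {
    filter: EnvFilter,
    inner: L,
}

impl<S, L> Layer<S> for FilteredLayer<L>
where
    S: Subscriber,
    L: Layer<S>,
{
    fn on_event(&self, event: &Event<'_>, ctx: Context<'_, S>) {
        // Only forward the event to the wrapped layer if this layer's
        // own filter enables its metadata; the global filter has
        // already been applied upstream.
        if self.filter.enabled(event.metadata(), ctx.clone()) {
            self.inner.on_event(event, ctx);
        }
    }
}
```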
* refactor: Separate query_tests into its own crate
* fix: references
* refactor: break out server benchmarks
* fix: Update query_tests/src/lib.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
There may be many reasons for the discrepancy between the total allocation sizes reported by jemalloc and RSS.
One of them is that our binary doesn't use jemalloc for all allocations.
Turns out that jemallocator only sets the global Rust allocator. Any call to `malloc` will still
go through the system allocator. Presumably those calls come from linked C code,
but it's also not impossible that not all Rust code honours the global allocator (I have no idea, but let's see).
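For reference, registering jemalloc via the `jemallocator` crate looks like this, and it only hooks Rust's global allocator:

```rust
// This routes allocations made by *Rust* code through jemalloc;
// direct `malloc` calls from linked C code still hit the system
// allocator, which is the discrepancy described above.
use jemallocator::Jemalloc;

#[global_allocator]
static GLOBAL: Jemalloc = Jemalloc;
```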
Logs and traces are emitted via one pipeline. For now, it is not
possible to emit both, but it should be possible in a few weeks, as
tokio/tracing/tracing-subscriber has been going through some refactoring recently.
All affected flags are well-documented, and I have tested all but the
OTLP output flags.
* chore: clippy happy
* chore: revert instrumentation changes
* feat: add log format logfmt, log destinations stderr, stdout
* chore: clippy happy
Rationale
---------
Our CLI needs to be able to accept configuration as JSON and render configuration as JSON.
Protobufs technically have an official JSON encoding rule called `jsonpb`, but prost doesn't
offer native support for it.
`prost` allows us to specify arbitrary derive metadata to be added to generated
code. We emit the `serde` derive directives in the two packages that generate prost code
(`generated_types` and `google_types`).
We use `serde(rename_all = "camelCase")` to approximate `jsonpb`.
We instruct `prost` to use `bytes::Bytes` for some types, hence we must turn on the `serde` feature
on the `bytes` dependency.
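A sketch of the corresponding build-script configuration (file names and paths here are illustrative):

```rust
// build.rs: emit serde derives on all generated types, approximate
// jsonpb's camelCase field naming, and use bytes::Bytes where configured.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut config = prost_build::Config::new();
    config.type_attribute(".", "#[derive(serde::Serialize, serde::Deserialize)]");
    config.type_attribute(".", "#[serde(rename_all = \"camelCase\")]");
    config.bytes(&["."]);
    config.compile_protos(&["proto/service.proto"], &["proto"])?;
    Ok(())
}
```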
We also use JSON to serialize the output of the `database get` command, to showcase the feature
and get rid of a TODO. In a subsequent PR I'll teach `database create` (and the yet-to-be-done `database update`) to accept an optional JSON configuration body so we can configure partitioning, lifecycle, sharding rules, etc.
Caveats
-------
This is not technically `jsonpb`. Main issues:
1. default values not omitted
2. no special rendering of special types like `google.protobuf.Any`
Future work
-----------
Figure out if we can get fully compliant `jsonpb`, or at least a decent approximation.
Effect
------
```console
$ cargo run -- database get foobar_weather
{
  "name": "foobar_weather",
  "partitionTemplate": {
    "parts": [
      {
        "part": {
          "time": "%Y-%m-%d %H:00:00"
        }
      }
    ]
  },
  "lifecycleRules": {
    "mutableLingerSeconds": 0,
    "mutableMinimumAgeSeconds": 0,
    "mutableSizeThreshold": 0,
    "bufferSizeSoft": 0,
    "bufferSizeHard": 0,
    "sortOrder": {
      "order": 2,
      "sort": {
        "createdAtTime": {}
      }
    },
    "dropNonPersisted": false,
    "immutable": false
  },
  "walBufferConfig": null,
  "shardConfig": {
    "specificTargets": null,
    "hashRing": null,
    "ignoreErrors": false
  }
}
```
* refactor: inline catalog crate to server
* refactor: Add fine grained (object level) catalog locking
* fix: Move mod definition and use to top of file
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Rework Db to use Catalog for chunk state
* docs: Update server/src/db.rs
* fix: fmt
* fix: fmt
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>