This prevents users of `parquet_file::metadata` from also depending on
`parquet` directly. Furthermore, they don't need to import dozens of
functions and can instead just use `IoxParquetMetaData` directly.
__Rationale__
We currently use the `tracing` framework to output to both log outputs (e.g. stdout for k8s) and distributed tracing collectors (e.g. OpenTelemetry Jaeger).
However, due to a limitation in the `tracing` SDK, we can only have one "filter" level that applies
to both logs and tracing outputs. This is impractical because tracing collectors are designed
to receive high-verbosity data (which will then be sampled within the OpenTelemetry library),
while logs are generally limited to the DEBUG level in production.
This PR adds a `FilteredLayer` tracing subscriber layer that wraps a subscriber layer with an independent
filter, which can filter events going to the wrapped subscriber layer more aggressively than the global layer.
This will allow us to emit logs at INFO or DEBUG level while passing all events to OpenTelemetry at TRACE
level (the OpenTelemetry SDK will then sample the events so that only a small fraction is sent to the
OT collector).
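As a rough illustration, such a layer can be built on the `tracing_subscriber::Layer` trait; this is a minimal sketch of the idea, not the exact code in this PR:

```rust
use tracing::{Event, Subscriber};
use tracing_subscriber::layer::{Context, Layer};
use tracing_subscriber::EnvFilter;

/// Wraps an inner layer with its own filter, so events can be dropped
/// for this layer only, independently of the global filter.
struct FilteredLayer<L> {
    inner: L,
    filter: EnvFilter,
}

impl<S, L> Layer<S> for FilteredLayer<L>
where
    S: Subscriber,
    L: Layer<S>,
{
    fn on_event(&self, event: &Event<'_>, ctx: Context<'_, S>) {
        // Forward the event only if this layer's own filter accepts it.
        if self.filter.enabled(event.metadata(), ctx.clone()) {
            self.inner.on_event(event, ctx);
        }
    }
}
```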
__Note__
This PR just implements the `FilteredLayer` and a test. Another PR will integrate this with
our log/tracing setup code.
* refactor: Separate query_tests into its own crate
* fix: references
* refactor: break out server benchmarks
* fix: Update query_tests/src/lib.rs
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* feat: push metrics into catalog
* chore: minor cleanup
* fix: include db labels in chunk metric domains
* chore: fmt
* fix: don't allow dropping moving chunks
* chore: further tweaks
* chore: review feedback
* feat: use new_unregistered() for metric instruments instead of default
* chore: use &[KeyValue] instead of &Vec<KeyValue>
* refactor: make GaugeValue non-default-constructible
This commit adds a new custom UDF to IOx that provides a regex operator for DataFusion plans.
Effectively it allows predicates to contain regex operators that are applied as filters, only allowing rows that satisfy the regex to be returned.
I did not use the Arrow regex kernel for this work because it does not return a boolean array indicating which rows matched a regex, but instead returns a new string array of results. This doesn't work well with DataFusion's approach to filtering.
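The core of the approach can be sketched as follows (a hypothetical helper, not the exact UDF code): build a boolean mask over a string column, which DataFusion can then use as a filter.

```rust
use arrow::array::{BooleanArray, StringArray};
use regex::Regex;

/// Evaluate a regex against each row of a string column, producing a
/// boolean mask (true where the row matches, null where the row is
/// null) — the piece the Arrow regex kernel does not provide.
fn regex_match_mask(col: &StringArray, re: &Regex) -> BooleanArray {
    col.iter().map(|v| v.map(|s| re.is_match(s))).collect()
}
```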
Logs and traces are emitted via one pipeline. For now, it is not
possible to emit both, but it should be possible in a few weeks, as
the tokio tracing/tracing-subscriber stack is currently being refactored.
All affected flags are well-documented, and I have tested all but the
OTLP output flags.
chore: clippy happy
chore: revert instrumentation changes
feat: add log format logfmt, log destinations stderr, stdout
chore: clippy happy
* feat: use a dedicated tokio threadpool for running queries (see the sketch below)
* feat: plumb the number of executor threads through to the command line
* fix: Logical merge conflict
* fix: another logical conflict
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
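A sketch of the dedicated query runtime, assuming tokio's multi-threaded runtime builder (the function and thread names here are illustrative):

```rust
use tokio::runtime::Runtime;

// Queries run on their own runtime so that heavy query work cannot
// starve the main server runtime.
fn query_runtime(num_threads: usize) -> std::io::Result<Runtime> {
    tokio::runtime::Builder::new_multi_thread()
        .worker_threads(num_threads) // value plumbed in from the CLI
        .thread_name("iox-query")
        .enable_all()
        .build()
}
```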
The initial benchmarks look like this on my i9 MBP:
```
Data in one open chunk and one closed chunk of mutable buffer/tag0/no_pred 1.00 91.0±2.55ms ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag0/with_pred 1.00 11.5±0.72ms ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag1/no_pred 1.00 120.3±5.10ms ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag1/with_pred 1.00 11.2±0.22ms ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag2/no_pred 1.00 203.2±8.45ms ? ?/sec
Data in one open chunk and one closed chunk of mutable buffer/tag2/with_pred 1.00 11.2±0.21ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag0/no_pred 1.00 100.3±3.73ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag0/with_pred 1.00 31.2±1.80ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag1/no_pred 1.00 126.7±2.29ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag1/with_pred 1.00 33.0±1.70ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag2/no_pred 1.00 212.0±6.86ms ? ?/sec
Data in open chunk of mutable buffer, and one chunk of read buffer/tag2/with_pred 1.00 18.1±0.99ms ? ?/sec
Data in single open chunk of mutable buffer/tag0/no_pred 1.00 98.7±6.08ms ? ?/sec
Data in single open chunk of mutable buffer/tag0/with_pred 1.00 11.2±0.37ms ? ?/sec
Data in single open chunk of mutable buffer/tag1/no_pred 1.00 118.9±3.97ms ? ?/sec
Data in single open chunk of mutable buffer/tag1/with_pred 1.00 11.7±0.64ms ? ?/sec
Data in single open chunk of mutable buffer/tag2/no_pred 1.00 202.1±8.49ms ? ?/sec
Data in single open chunk of mutable buffer/tag2/with_pred 1.00 11.1±0.27ms ? ?/sec
Data in two read buffer chunks/tag0/no_pred 1.00 109.2±5.20ms ? ?/sec
Data in two read buffer chunks/tag0/with_pred 1.00 44.2±1.83ms ? ?/sec
Data in two read buffer chunks/tag1/no_pred 1.00 132.9±3.79ms ? ?/sec
Data in two read buffer chunks/tag1/with_pred 1.00 41.7±2.43ms ? ?/sec
Data in two read buffer chunks/tag2/no_pred 1.00 222.4±7.00ms ? ?/sec
Data in two read buffer chunks/tag2/with_pred 1.00 27.9±0.92ms ? ?/sec
```
Rationale
---------
Our CLI needs to be able to accept configuration as JSON and render configuration as JSON.
Protobufs technically have an official JSON encoding rule called `jsonpb`, but prost doesn't
offer native support for it.
`prost` allows us to specify arbitrary derive metadata to be added to generated
code. We emit the `serde` derive directives in the two packages that generate prost code
(`generated_types` and `google_types`).
We use `serde(rename_all = "camelCase")` to approximate `jsonpb`.
We instruct `prost` to use `bytes::Bytes` for some types, hence we must turn on the `serde` feature
on the `bytes` dependency.
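In `build.rs` terms, the derive injection looks roughly like this (the proto paths are illustrative, not the actual IOx file layout):

```rust
// build.rs sketch: attach serde derives to all prost-generated types.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut config = prost_build::Config::new();
    config.type_attribute(
        ".", // apply to every message in every package
        "#[derive(serde::Serialize, serde::Deserialize)] \
         #[serde(rename_all = \"camelCase\")]",
    );
    config.compile_protos(&["proto/service.proto"], &["proto"])?;
    Ok(())
}
```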
We also use JSON to serialize the output of the `database get` command, to showcase the feature
and get rid of a TODO. In a subsequent PR I'll teach `database create` (and the yet-to-be-done `database update`) to accept an optional JSON configuration body so we can configure partitioning, lifecycle, sharding rules, etc.
Caveats
-------
This is not technically `jsonpb`. Main issues:
1. default values are not omitted
2. no special rendering of special types like `google.protobuf.Any`
Future work
-----------
Figure out if we can get fully compliant `jsonpb`, or at least a decent approximation.
Effect
------
```console
$ cargo run -- database get foobar_weather
{
  "name": "foobar_weather",
  "partitionTemplate": {
    "parts": [
      {
        "part": {
          "time": "%Y-%m-%d %H:00:00"
        }
      }
    ]
  },
  "lifecycleRules": {
    "mutableLingerSeconds": 0,
    "mutableMinimumAgeSeconds": 0,
    "mutableSizeThreshold": 0,
    "bufferSizeSoft": 0,
    "bufferSizeHard": 0,
    "sortOrder": {
      "order": 2,
      "sort": {
        "createdAtTime": {}
      }
    },
    "dropNonPersisted": false,
    "immutable": false
  },
  "walBufferConfig": null,
  "shardConfig": {
    "specificTargets": null,
    "hashRing": null,
    "ignoreErrors": false
  }
}
```
* refactor: inline catalog crate to server
* refactor: Add fine grained (object level) catalog locking
* fix: Move mod definition and use to top of file
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Rework Db to use Catalog for chunk state
* docs: Update server/src/db.rs
* fix: fmt
* fix: fmt
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This is a strawman for what routing rules might look like in DatabaseRules. Once there's a chance for discussion, I'd move next to looking at how the Server would split up an incoming write into separate FB blobs to be sent to remote IOx servers. That might change what the API/configuration looks like as that's how it would be used (at least for writes).
After that it would make sense to move to adding the proto definitions with conversions and gRPC and CLI CRUD to configure routing rules.
* refactor: Move test server fixture into its own module
* fix: Update tests/end-to-end.rs
* fix: better error handling and display
* fix: tweak startup message
* feat: add basic gRPC health service
* feat: update README.md add /health to HTTP API
* feat: add health client to influxdb_iox_client
feat: end-to-end test health check service
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Addresses the API aspect of #818
Adds a utility module that helps compute the length of a stream while buffering it
for later replay (in memory, or by spilling to a temporary file).
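A simplified synchronous sketch of the idea (the real module works on streams and can spill to a temporary file; these names are illustrative):

```rust
use std::io::{self, Read};

/// Reads a source to the end, recording the bytes so the total length
/// is known and the data can be replayed later.
struct ReplayBuffer {
    bytes: Vec<u8>,
}

impl ReplayBuffer {
    /// Drain `src`, returning the total length alongside the buffer.
    fn buffer<R: Read>(src: &mut R) -> io::Result<(usize, Self)> {
        let mut bytes = Vec::new();
        let len = src.read_to_end(&mut bytes)?;
        Ok((len, ReplayBuffer { bytes }))
    }

    /// Replay the buffered bytes as a fresh reader.
    fn replay(&self) -> impl Read + '_ {
        self.bytes.as_slice()
    }
}
```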
* feat: Enable/Disable logging in tests via RUST_LOG environment variable
* docs: Add section to contributing
* docs: tweak readme
* fix: Use same logging system in tests as in influxdb_ioxd
This is the promised cleanup. The new structure gets rid of a lot of
intermediate types and encodes, through associated types, how the
object stores and path types are related.
The enums are still necessary to avoid having generics leak all over
the place, but the object store variants and path variants should always
match because they'll always come from the object store trait
implementations that use the associated types.
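A minimal sketch of the pattern, with illustrative names rather than the exact IOx traits:

```rust
// Each store implementation declares its matching path type through an
// associated type, so store and path can never be mismatched.
trait ObjectStoreApi {
    type Path;

    /// Produce a new, empty path of the type this store understands.
    fn new_path(&self) -> Self::Path;
}

struct InMemory;
struct MemoryPath(String);

impl ObjectStoreApi for InMemory {
    type Path = MemoryPath;

    fn new_path(&self) -> MemoryPath {
        MemoryPath(String::new())
    }
}

// The enum keeps generics from leaking all over the place; each variant
// pairs a store with data produced by that store's associated types.
enum ObjectStore {
    InMemory(InMemory),
    // ... cloud store variants elided
}
```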
* chore: Update arrow + tokio deps
* chore: Use bleeding edge azure
* chore: Update aws + other deps
* fix: fmt
* fix: Switch to in-house version of routerify
* fix: Upgrade to hyper 0.14
The hyper::error module is now private; hyper::Error is the public
re-export
* fix: Upgrade cloud storage to get tokio upgrade
* fix: Upgrade open_telemetry
* fix: Do not call `panic::set_hook` during another panic
Doing so leads to a double panic which aborts the process.
* fix: new h2 error who dis
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
* feat: Chunk Migration APIs and query data in the read buffer via SQL
* fix: Make code more consistent
* fix: fmt / clippy
* chore: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: Remove unnecessary Result and make chunks() infallible
* chore: Apply more suggestions from code review
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: Edd Robinson <me@edd.io>
* refactor: consolidate line protocol schema creation into data_types, and port code to use it
refactor: Port mutable buffer to use SchemaBuilder
* fix: doctest
* refactor: remove unnecessary clippyisms
* docs: Improve comments via suggestions from code review
Co-authored-by: Edd Robinson <me@edd.io>
* refactor: use more idiomatic try_ naming and TryInto trait
* docs: Change from line protocol data model to InfluxDB data model
* refactor: rename LP --> Influx in code
* feat: add support for UInteger type
Co-authored-by: Edd Robinson <me@edd.io>
Adds serialization with compression and checksum for WAL buffer segments.
This required a weird structure where the flatbuffer bytes of ReplicatedWrite were kept as a raw payload. I did this because otherwise each of the replicated writes would have been rebuilt in the segment.
The other thing that isn't ideal is that deserializing a segment actually marshals it into a Rust struct as opposed to keeping the entire thing as raw flatbuffers. We could update this later to have a concept of an open segment (regular Rust struct) and closed segments that are just the flatbuffers.
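The encode path can be sketched as follows; the codec and layout here (snappy + CRC32, little-endian header) are assumptions for illustration, not necessarily the real segment format:

```rust
use byteorder::{LittleEndian, WriteBytesExt};

/// Compress the raw flatbuffer payload and prepend a checksum and
/// length so corruption is detectable on replay.
fn encode_segment(payload: &[u8]) -> std::io::Result<Vec<u8>> {
    let compressed = snap::raw::Encoder::new()
        .compress_vec(payload)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;

    // Checksum covers the compressed bytes.
    let mut hasher = crc32fast::Hasher::new();
    hasher.update(&compressed);
    let checksum = hasher.finalize();

    let mut out = Vec::with_capacity(compressed.len() + 8);
    out.write_u32::<LittleEndian>(checksum)?;
    out.write_u32::<LittleEndian>(compressed.len() as u32)?;
    out.extend_from_slice(&compressed);
    Ok(out)
}
```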
Refactors the API method errors.
The user of the API client needs to be able to distinguish between various error
states when an API request fails. The most ergonomic way of exposing this
information is by returning an error enum that is specific to each API method
(or at least the important ones with well-defined failure modes); currently
only the `create_database()` method has significant error states, so this is
the only one with a specific error type in this impl.
This change defines a bunch of API error codes in the API client, adds them to
the IOx API error response body, and maps them in the API client. Due to error
wrapping the error code mapping in the IOx server is less exhaustive than I had
hoped however.
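An illustrative shape for such a per-method error enum (variant names are hypothetical, not the exact client API), using `thiserror` for brevity:

```rust
use thiserror::Error;

/// Error states a caller of create_database() might need to
/// distinguish, mapped from the server's API error codes.
#[derive(Debug, Error)]
pub enum CreateDatabaseError {
    #[error("database already exists")]
    AlreadyExists,

    #[error("invalid database name: {0}")]
    InvalidName(String),

    #[error("unexpected server error: {0}")]
    ServerError(String),
}
```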
* feat: implement chunk listing and snapshotting in mutable buffer
* fix: update to use latest version of string interner and remove custom clone
* docs: fix comment
Initialises a new library crate and implements a basic IOx API client.
The API client supports:
- ping
- create database
Care has been taken to abstract away the underlying HTTP client used
(reqwest) and avoid leaking it into the public API (error types are a
common leak!). This makes updating the HTTP client and/or swapping it for
something else a backwards-compatible change for end users of the crate.
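For example, the boundary conversion might look like this (an illustrative sketch, not the crate's actual types):

```rust
/// Public error type: callers only ever see this, never a reqwest
/// type, so the HTTP client can be swapped without breaking the API.
#[derive(Debug)]
pub enum ClientError {
    /// The server could not be reached.
    Connection(String),
    /// The server returned an error response.
    Server { status: u16, message: String },
}

impl From<reqwest::Error> for ClientError {
    fn from(e: reqwest::Error) -> Self {
        // Convert at the boundary; the internal type never escapes.
        ClientError::Connection(e.to_string())
    }
}
```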
Outstanding items:
- move shared API types into a sensible location
- discriminate between various IOx error responses
The former doesn't need doing until we publish the crate and will likely
be rather invasive / conflict-prone, so I'm aiming to merge this PR and then
move things around in a follow-up.
The latter would allow us to expose error conditions to the user such
that they can take action to remedy the situation / know if the request
can or should be retried / etc. Currently we expose a string error
message when requests fail, requiring string matching and/or passing the
string higher up the stack (and thus punting the problem to the caller).
It would be very nice to have typed errors, but that is a detail I have left
for later.