Logs and traces are emitted via one pipeline. For now, it is not
possible to emit both, but it should be possible in a few weeks, as
tokio/tracing/tracing-subscriber has been undergoing some refactoring recently.
All affected flags are well-documented, and I have tested all but the
OTLP output flags.
chore: clippy happy
chore: revert instrumentation changes
feat: add log format logfmt, log destinations stderr, stdout
chore: clippy happy
Rationale
---------
Our CLI needs to be able to accept configuration as JSON and render configuration as JSON.
Protobufs technically have an official JSON encoding called `jsonpb`, but prost doesn't
offer native support for it.
`prost` allows us to specify arbitrary derive metadata to be added to generated
code. We emit the `serde` derive directives in the two packages that generate prost code
(`generated_types` and `google_types`).
We use the `serde(rename_all = "camelCase")` to approximate `jsonpb`.
We instruct `prost` to use `bytes::Bytes` for some types, hence we must turn on the `serde` feature
on the `bytes` dependency.
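For illustration, a minimal sketch of what that build configuration might look like (the proto paths here are placeholders, not the project's actual ones):

```rust
// build.rs -- a minimal sketch; proto paths are placeholders.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut config = prost_build::Config::new();
    // Emit serde derives on every generated type.
    config.type_attribute(".", "#[derive(serde::Serialize, serde::Deserialize)]");
    // Approximate jsonpb field naming.
    config.type_attribute(".", "#[serde(rename_all = \"camelCase\")]");
    // Use bytes::Bytes for proto `bytes` fields (needs the `serde`
    // feature enabled on the `bytes` dependency to derive cleanly).
    config.bytes(&["."]);
    config.compile_protos(&["proto/service.proto"], &["proto"])?;
    Ok(())
}
```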
We also use JSON to serialize the output of the `database get` command, to showcase the feature
and get rid of a TODO. In a subsequent PR I'll teach `database create` (and the yet-to-be-done `database update`) to accept an optional JSON configuration body so we can configure partitioning, lifecycle, sharding rules, etc.
Caveats
-------
This is not technically `jsonpb`. Main issues:
1. default values are not omitted
2. no special rendering of well-known types like `google.protobuf.Any`
Future work
-----------
Figure out if we can get fully compliant `jsonpb`, or at least a decent approximation.
Effect
------
```console
$ cargo run -- database get foobar_weather
{
  "name": "foobar_weather",
  "partitionTemplate": {
    "parts": [
      {
        "part": {
          "time": "%Y-%m-%d %H:00:00"
        }
      }
    ]
  },
  "lifecycleRules": {
    "mutableLingerSeconds": 0,
    "mutableMinimumAgeSeconds": 0,
    "mutableSizeThreshold": 0,
    "bufferSizeSoft": 0,
    "bufferSizeHard": 0,
    "sortOrder": {
      "order": 2,
      "sort": {
        "createdAtTime": {}
      }
    },
    "dropNonPersisted": false,
    "immutable": false
  },
  "walBufferConfig": null,
  "shardConfig": {
    "specificTargets": null,
    "hashRing": null,
    "ignoreErrors": false
  }
}
```
* refactor: inline catalog crate to server
* refactor: Add fine grained (object level) catalog locking
* fix: Move mod definition and use to top of file
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: Rework Db to use Catalog for chunk state
* docs: Update server/src/db.rs
* fix: fmt
* fix: fmt
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* refactor: Move test server fixture into its own module
* fix: Update tests/end-to-end.rs
* fix: better error handling and display
* fix: tweak startup message
* feat: add basic gRPC health service
* feat: update README.md add /health to HTTP API
* feat: add health client to influxdb_iox_client
feat: end-to-end test health check service
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* fix: test_helpers crate should only be a dev-dep
* fix: object_store no longer has a build script, so no longer needs a build dep
* chore: Alphabetize all Cargo.tomls
* chore: Update arrow + tokio deps
* chore: Use bleeding edge azure
* chore: Update aws + other deps
* fix: fmt
* fix: Switch to in-house version of routerify
* fix: Upgrade to hyper 0.14
The hyper::error module is now private; hyper::Error is the public
re-export
* fix: Upgrade cloud storage to get tokio upgrade
* fix: Upgrade open_telemetry
* fix: Do not call `panic::set_hook` during another panic
Doing so leads to a double panic which aborts the process.
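A minimal sketch of the guard, assuming the hook installation can be reached while a panic is already unwinding:

```rust
fn install_panic_hook() {
    // `set_hook` itself panics if called from a panicking thread, and a
    // panic during unwinding aborts the process -- so bail out first.
    if std::thread::panicking() {
        return;
    }
    std::panic::set_hook(Box::new(|info| {
        eprintln!("panic: {}", info);
    }));
}
```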
* fix: new h2 error who dis
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
Co-authored-by: Jake Goulding <jake.goulding@integer32.com>
Refactors the API method errors.
The user of the API client needs to be able to distinguish between various error
states when an API request fails. The most ergonomic way of exposing this
information is by returning an error enum that is specific to each API method
(or at least the important ones with well defined failure modes) - currently
only the `create_database()` method has significant error states, so this is
the only one with a specific error type in this impl.
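As a hypothetical sketch of the shape (variant names are illustrative, not the crate's actual ones), using `snafu`, which the codebase already uses elsewhere:

```rust
use snafu::Snafu;

#[derive(Debug, Snafu)]
pub enum CreateDatabaseError {
    /// The server reports that the database already exists.
    #[snafu(display("database already exists"))]
    DatabaseAlreadyExists,

    /// The request was rejected (e.g. an invalid database name).
    #[snafu(display("invalid request: {}", reason))]
    InvalidRequest { reason: String },

    /// Catch-all that keeps the underlying HTTP error opaque.
    #[snafu(display("unexpected server error: {}", message))]
    ServerError { message: String },
}
```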
This change defines a bunch of API error codes in the API client, adds them to
the IOx API error response body, and maps them in the API client. Due to error
wrapping, however, the error code mapping in the IOx server is less exhaustive
than I had hoped.
Initialises a new library crate and implements a basic IOx API client.
The API client supports:
- ping
- create database
Care has been taken to abstract away the underlying HTTP client used
(reqwest) and avoid leaking it into the public API (error types are a
common leak!). This makes updating the HTTP client and/or swapping it for
something else a backwards compatible change for end users of the crate.
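One hedged sketch of the idea: convert the client's error at the boundary, so `reqwest` never appears in a public signature (type names here are illustrative):

```rust
/// Public, client-agnostic error type.
#[derive(Debug)]
pub struct Error {
    message: String,
}

impl std::fmt::Display for Error {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "{}", self.message)
    }
}

impl std::error::Error for Error {}

// The only place reqwest appears: a private conversion at the boundary.
impl From<reqwest::Error> for Error {
    fn from(e: reqwest::Error) -> Self {
        Self { message: e.to_string() }
    }
}
```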
Outstanding items:
- move shared API types into a sensible location
- discriminate between various IOx error responses
The former doesn't need doing until we publish the crate and will likely
be rather invasive / conflict prone, so I'm aiming to merge this PR and then
move things around in a follow-up.
The latter would allow us to expose error conditions to the user such
that they can take actions to remedy the situation / know if the request
can or should be retried / etc. Currently we expose a string error
message when requests fail, requiring string matching and/or passing the
string higher in the stack (and thus punting the problem to the caller).
It would be very nice to have typed errors, but that's a detail I have
left for later.
Replaces the hand-rolled config system with a StructOpt-managed config struct.
I've got most of it ported across, but the interaction between all the logging
config bits is complex! I've left what is there and hooked in the value from
the config struct (which directly replaces the env var in usage, as it also
sources from the env).
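A minimal sketch of the StructOpt approach (the flag and env-var names here are illustrative, not IOx's actual ones):

```rust
use structopt::StructOpt;

#[derive(Debug, StructOpt)]
#[structopt(name = "server", about = "Runs the IOx server")]
struct Config {
    /// Log filter; sourced from the flag or, failing that, the env var.
    #[structopt(long = "log-filter", env = "LOG_FILTER", default_value = "info")]
    log_filter: String,

    /// HTTP port to bind.
    #[structopt(long = "port", env = "PORT", default_value = "8080")]
    port: u16,
}

fn main() {
    let config = Config::from_args();
    println!("{:?}", config);
}
```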
* feat: Create configuration system, port IOx to use it
* docs: Apply suggestions from code review
Co-authored-by: Paul Dix <paul@influxdata.com>
* fix: fix test for setting values
Co-authored-by: Paul Dix <paul@influxdata.com>
This moves the HTTP API over to Routerify, which has the basic route parsing logic that will enable the API design for IOx.
I had a little trouble with the error handling in Routerify, so I ended up creating a macro for constructing error responses in the HTTP API. I'm not sure what I think of this pattern, so I'm interested in what others think.
Another option would be to have two functions for each API endpoint: one named x_handler with a Routerify function signature, and another named just x with the Result<Response<Body>, ApplicationError> return type, which would make the ? operator work in those functions. That would eliminate the need for the return_err macro.
I'm happy to refactor to that if people prefer it.
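For reference, the two-function alternative would look roughly like this (ApplicationError is a hypothetical stand-in):

```rust
use hyper::{Body, Request, Response, StatusCode};

/// Hypothetical application-level error that knows its HTTP status.
#[derive(Debug)]
struct ApplicationError {
    status: StatusCode,
    message: String,
}

/// Outer function: the signature the router wants; renders errors.
async fn write_handler(req: Request<Body>) -> Result<Response<Body>, hyper::Error> {
    match write(req).await {
        Ok(resp) => Ok(resp),
        Err(e) => Ok(Response::builder()
            .status(e.status)
            .body(Body::from(e.message))
            .unwrap()),
    }
}

/// Inner function: returns ApplicationError, so `?` works throughout.
async fn write(_req: Request<Body>) -> Result<Response<Body>, ApplicationError> {
    // ... parse the body, perform the write, using `?` freely ...
    Ok(Response::new(Body::empty()))
}
```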
Configures the IOx tracing to compile out trace!() level events in the release
binary. This effectively gives contributors three levels of output:
* Important to the user (info & friends)
* Not important for regular running, but needed to debug
* Only useful to devs in a specific part of the system, never seen by user
Documents this behaviour (and general usage guidelines) for contributors.
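As a sketch of those three tiers (assuming the release build enables tracing's `release_max_level_debug` feature, which statically strips `trace!` events; the messages are made up for illustration):

```rust
fn main() {
    tracing_subscriber::fmt::init();

    // Important to the user: always visible.
    tracing::info!("server listening on 127.0.0.1:8080");
    // Not needed in regular running, but available when debugging.
    tracing::debug!("loaded 3 partitions from the catalog");
    // Dev-only detail; compiled out of release binaries entirely.
    tracing::trace!("entering hot loop");
}
```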
Adds telemetry / tracing with support for a Jaeger backend, and changes the
logger from env_logger to a tracing subscriber to collect the log entries.
Events are batched and then emitted asynchronously via UDP to the Jaeger
collector using the tokio runtime. There's a bunch of settings (env
vars) related to batch sizes and flush frequency etc - they're all using
their default values at the moment (if it ain't broke...). See the docs
for more info:
https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/sdk-environment-variables.md#opentelemetry-environment-variable-specification
This is only part 1 of telemetry - it does NOT propagate traces across RPC
boundaries as we're still defining how all this should work. I've created #541
to track this.
Closes #202 and closes #203.
* feat: Implement write buffer to Parquet snapshotting
This introduces a snapshot module to the server package to manage snapshotting. It also introduces a new trait for representing a Partition. There is a very crude API wired up in http_routes for testing purposes. Follow-on work will bring the server package into http_routes and rework the snapshot API.
This commit refactors the flatbuffers data types from the wal to a new crate where they can be used by storage, write buffer, and cluster. It also refactors cluster to move the configuration types out to the data types crate so they can be used across storage and elsewhere.
Finally, it adds a new method to store replicated writes on a database in the database trait and implements it.
* test: traits for database and tests for http handler
* refactor: Use generics and trait bounds instead of trait objects
* refactor: Replace trait objects with an associated type
* refactor: Extract an associated Error type on the Database traits
* refactor: Remove some explicit conversions to_string that Snafu takes care of
* docs: add comments
* refactor: move traits into storage module
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
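Taken together, these refactors suggest a trait shape along these lines (a hypothetical sketch, not the actual definition):

```rust
#[async_trait::async_trait]
trait Database {
    /// Each implementation brings its own error type.
    type Error: std::error::Error + Send + Sync + 'static;

    /// Store a replicated write (e.g. a flatbuffers-encoded WriteBatch).
    async fn store_replicated_write(&self, write: &[u8]) -> Result<(), Self::Error>;
}
```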
This is the initial prototype of the WriteBuffer and WAL. This does the following:
* accepts a slice of ParsedLine into the DB
* writes those into an in-memory structure with tags represented as u32 dictionaries (see the sketch below) and all field types supported
* persists those writes into the WAL as Flatbuffer blobs (one WAL entry per slice of lines written, or WriteBatch)
* has a method to return a table from the buffer as an Arrow RecordBatch
* recovers the WAL after the database is closed and opened back up again
* has a single test that covers the end-to-end from the DB side
* It doesn't include partitioning yet, although the write_lines method does actually try to partition on time. That'll get changed to something more general defined by a per-database configuration.
* hooked up to the v2 HTTP write API
* hooked up to a read API which will execute a SQL query against the data in the buffer
This includes a refactor of the WAL:
Refactors the WAL to remove async and threading so that it can be moved higher up. This simplifies the API while keeping just about the same amount of code in PartitionStore to handle the asynchronous writes.
This also modifies the WAL to remove the SideFile implementation, which was causing significant performance problems and write amplification. The downside is that WAL writes are no longer guaranteed atomic.
Further, this modifies the WAL to keep the active segment file handle open. Appends no longer have to list the directory contents, find the latest file, and open its handle, which should also improve performance and reduce IOPS.
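As a sketch of the u32 tag dictionary idea mentioned above (illustrative, not the actual in-memory structure):

```rust
use std::collections::HashMap;

/// Interns tag values so each row stores a compact u32 id.
#[derive(Default)]
struct TagDictionary {
    ids: HashMap<String, u32>,
    values: Vec<String>,
}

impl TagDictionary {
    /// Returns the id for `value`, inserting it on first sight.
    fn intern(&mut self, value: &str) -> u32 {
        if let Some(&id) = self.ids.get(value) {
            return id;
        }
        let id = self.values.len() as u32;
        self.values.push(value.to_string());
        self.ids.insert(value.to_string(), id);
        id
    }

    /// Maps an id back to the original tag value.
    fn lookup(&self, id: u32) -> Option<&str> {
        self.values.get(id as usize).map(String::as_str)
    }
}
```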
Jake dug into why the end-to-end tests fail with delorean running in the
Docker image I built, and it appears to be a crash with an illegal
instruction from CRoaring.
We think it's this issue: https://github.com/saulius/croaring-rs/pull/62
which was merged and released, so let's try updating CRoaring.
* feat: benchmark for lp->parquet performance
* feat: improve parser performance by storing contiguous EscapedStr
* fix: remove all string copies during LP-Parquet conversion
* refactor: Implement from_str as From<&str> only
* refactor: implement Deref instead of as_str (see the sketch below)
* refactor: Remove ends_with because Deref now makes it work
* refactor: Eq can be derived
* refactor: Remove unused From implementation
* refactor: Replace single-character strings with chars as requested by clippy
Co-authored-by: Carol (Nichols || Goulding) <carol.nichols@integer32.com>
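A hypothetical sketch of the `Deref`-plus-`From<&str>` shape from the list above (the real `EscapedStr` differs in detail):

```rust
/// Borrowed wrapper over a (possibly escaped) line-protocol string.
struct EscapedStr<'a>(&'a str);

impl<'a> std::ops::Deref for EscapedStr<'a> {
    type Target = str;

    fn deref(&self) -> &str {
        self.0
    }
}

impl<'a> From<&'a str> for EscapedStr<'a> {
    fn from(s: &'a str) -> Self {
        Self(s)
    }
}

fn demo() {
    let s = EscapedStr::from("cpu,host=a usage=0.5");
    // Deref gives us str methods for free -- no ends_with impl needed.
    assert!(s.starts_with("cpu"));
}
```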
* refactor: move all dstool code into delorean binary
* fix: Move code/mods to make it compile and run
* fix: warn if db dir does not exist
* refactor: Match argument subcommands w/ more idiomatic rust
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
fix: restore hyper logging
fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: update expected code
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* feat: Add parquet writer, hook up conversion in dstool
* fix: use bigger executor for test
* fix: less cloning
* fix: make unsupported messages less pejorative
* fix: fmt
* fix: Rename writer and do not require std::File, add example
* fix: clippy and fmt
* fix: remove unnecessary module in end to end tests
* fix: remove strange use of tempfile
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: Apply suggestions from code review
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* fix: cleanup use
* fix: Use more specific error messages
* fix: comment tweak
* fix: touchup temp path creation
* fix: clippy!
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
* refactor: rename the module containing generated types
The nested `delorean` was confusing anyway, and this will make more
sense when we extract a new crate.
* refactor: Move the generated types to their own crate
This allows us to have more lax warnings in that crate alone, keeping
the main crate more strict.
* style: Re-enable elided lifetimes lint in the main crate
This commit adds benchmarks for the float encoder and decoder. The
following scenarios are benchmarked:
- sequential values;
- random values;
- real CPU values (from Telegraf output).
Each scenario is benchmarked with a variety of block sizes.
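A minimal criterion-style sketch of the sequential scenario (the `encode` function and the block sizes here are stand-ins, not the crate's actual API):

```rust
use criterion::{criterion_group, criterion_main, Criterion};

// Stand-in for the real float encoder.
fn encode(values: &[f64]) -> Vec<u8> {
    values.iter().flat_map(|v| v.to_le_bytes().to_vec()).collect()
}

fn float_encode_sequential(c: &mut Criterion) {
    for &block_size in &[10usize, 100, 1000] {
        let values: Vec<f64> = (0..block_size).map(|i| i as f64).collect();
        c.bench_function(&format!("encode_sequential/{}", block_size), |b| {
            b.iter(|| encode(&values))
        });
    }
}

criterion_group!(benches, float_encode_sequential);
criterion_main!(benches);
```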