influxdb/docs/tracing.md

154 lines
6.1 KiB
Markdown
Raw Normal View History

# Tracing in IOx
IOx makes use of Rust's tracing ecosystem to output both application logs and
tracing events to Jaeger for distributed request correlation.
## The Components
The `tracing` framework aims to provide individual, pluggable crates that are
composed together to build a feature set custom to the application's needs.
### The `tracing` framework
Rust has the excellent [`tracing` crate] to facilitate collection of application
events with associated contextual information.
The most user visible components provided are the `debug!()`, `info!()`, etc
function-like macros, and the `#[tracing::instrument]` attribute macro.
The `info!()` macro and friends emit an "event" - a point in time event tagged
with relevant data.
If you add the `instrument` macro to a function, the tracing framework records
the function entry as a "span" that covers the duration of the call, and records
the timestamp when it returns. Any events or sub-spans emitted further down the
call stack inside this function's span are recorded as children of this span to
track causality.
The `tracing` crate focuses on providing a good API for emitting events & spans.
Consumers of the tracing data are called `Subscribers` and they are implemented
in other crates - there are subscribers that consume events and emit logs, some
that use the function entry spans for timing/metric data, and more every day!
### OpenTelemetry
[OpenTelemetry] is a relatively new standard to encourage interoperability within
the telemetry/tracing ecosystem. The `tracing-opentelemetry` crate subscribes to
the data produced by the `tracing` crate.
It's essentially an adaptor, consuming from `tracing` and converting them to
OpenTelemetry compatible events.
There are many [configuration options] supported at runtime.
### Jaeger (`opentelemetry_jaeger`)
[Jaeger] is a widely used telemetry collector with its own wire protocol,
separate from OpenTelemetry. While they're working on adding support for the
OpenTelemetry standard, it is not yet fully integrated (though I believe there
are builds available.)
The [`opentelemetry_jaeger` crate] sits at the end of the chain - the `tracing`
events are converted to `tracing-opentelemetry` events, which dispatches them to
this crate to be emitted to a listening Jaeger service.
## Using it
### Running IOx
First off, when running IOx you need to choose how much information you want.
You do this by setting the `RUST_LOG="debug"` environment variable (or `info`,
`error`, etc) or passing `-v` or `-vv` on the command line. The `RUST_LOG`
environment variable provides [granular control] over what log verbosity is
configured for individual internal components (tip: try `RUST_LOG="iox=debug"`!)
You can also choose to collect traces in Jaeger - this helps visualise
request-scoped events and provides timing information, and is generally a pretty
helpful tool when debugging. To enable Jaeger tracing output, set the following
environment variables:
```shell
OTEL_SERVICE_NAME="iox" # Defaults to iox if not specified
# No default, must be set
OTEL_EXPORTER_JAEGER_AGENT_HOST="jaeger.influxdata.net"
OTEL_EXPORTER_JAEGER_AGENT_PORT="6831"
```
### Working on IOx
When you're writing code, you should liberally use `debug` level tracing, as
well as `info` and above for information that is important to the user. This
helps greatly when tracking down any problems or to simply understand where
requests wind up in the codebase - this is particularly important when requests
span multiple servers!
Instrument your important functions at `info` - these provide a logical request
call stack rather than a _function_ call stack, so you can construct meaningful
traces without lots of noise.
Emit logs as you normally would, but [include contextual information] in them as
**fields** rather than interpolated strings:
```rust
#[tracing::instrument]
fn say_hello(name: &str) {
// This is good - fields come first!
info!(name, "hello there");
// This is alright, but less structured / useful
info!("hello there {}", name);
}
```
Instrumented functions wind up as distinct spans in the Jaeger trace, and all
the events emitted within them are output to stdout as log lines (with the
contextual information) as well as in the Jaeger span.
Instrumented functions also record the call arguments automatically (though you
can [control it]) with args printed using `Display` at `level=info` and above,
or `Debug` being used for `level=debug`. For example:
If your function returns a `Result<T, E>` and `E: Display` then you can record
the error too:
```rust
#[tracing::instrument(err)]
fn my_function(arg: usize) -> Result<(), std::io::Error> {
Ok(())
}
```
You can also instrument a function only at the "debug" level (`info` is the
default):
```rust
#[tracing::instrument(level = "debug")]
fn say_hello() {}
```
## Good to Know
* If you have an outer func that is instrumented with `level=debug`, and an
inner that is instrumented for `level=info`, the inner shows up when
`level=info` even if the outer does not (children are independent of their
parents).
* You can control the log level of the Jaeger events with the OpenTelemetry
config envs, but ultimately the `RUST_LOG` env sets the log level filter - you
cannot configure telemetry to emit at a log level lower than `RUST_LOG`.
* Be careful passing around sensitive arguments - instrumented functions will
record them! It's best to wrap them in some wrapper type that implements
neither `Display` nor `Debug` - then they can never be printed anywhere,
including in tracing!
[configuration options]: https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/sdk-environment-variables.md#opentelemetry-environment-variable-specification
[control it]: https://docs.rs/tracing/0.1.22/tracing/attr.instrument.html
[granular control]: https://docs.rs/tracing-subscriber/0.2.15/tracing_subscriber/filter/struct.EnvFilter.html
[`tracing` crate]: https://docs.rs/tracing/0.1.22/tracing/
[OpenTelemetry]: https://opentelemetry.io/
[Jaeger]: https://www.jaegertracing.io/
[`opentelemetry_jaeger` crate]: https://docs.rs/opentelemetry/0.10.0/opentelemetry/
[include contextual information]: https://docs.rs/tracing/0.1.22/tracing/#recording-fields