influxdb/docs/tracing.md

6.6 KiB

Tracing in IOx

IOx makes use of Rust's tracing ecosystem to output both application logs and tracing events to Jaeger for distributed request correlation.

The Components

The tracing framework aims to provide individual, pluggable crates that are composed together to build a feature set custom to the application's needs.

The tracing framework

Rust has the excellent tracing crate to facilitate collection of application events with associated contextual information.

The most user visible components provided are the debug!(), info!(), etc function-like macros, and the #[tracing::instrument] attribute macro.

The info!() macro and friends emit an "event" - a point in time event tagged with relevant data.

If you add the instrument macro to a function, the tracing framework records the function entry as a "span" that covers the duration of the call, and records the timestamp when it returns. Any events or sub-spans emitted further down the call stack inside this function's span are recorded as children of this span to track causality.

The tracing crate focuses on providing a good API for emitting events & spans. Consumers of the tracing data are called Subscribers and they are implemented in other crates - there are subscribers that consume events and emit logs, some that use the function entry spans for timing/metric data, and more every day!

OpenTelemetry

OpenTelemetry is a relatively new standard to encourage interoperability within the telemetry/tracing ecosystem. The tracing-opentelemetry crate subscribes to the data produced by the tracing crate.

It's essentially an adaptor, consuming from tracing and converting them to OpenTelemetry compatible events.

There are many configuration options supported at runtime.

Jaeger (opentelemetry_jaeger)

Jaeger is a widely used telemetry collector with its own wire protocol, separate from OpenTelemetry. While they're working on adding support for the OpenTelemetry standard, it is not yet fully integrated (though I believe there are builds available.)

The opentelemetry_jaeger crate sits at the end of the chain - the tracing events are converted to tracing-opentelemetry events, which dispatches them to this crate to be emitted to a listening Jaeger service.

Using it

Running IOx

First off, when running IOx you need to choose how much information you want. You do this by setting the RUST_LOG="debug" environment variable (or info, error, etc) or passing -v or -vv on the command line. The RUST_LOG environment variable provides granular control over what log verbosity is configured for individual internal components (tip: try RUST_LOG="iox=debug"!)

You can also choose to collect traces in Jaeger - this helps visualise request-scoped events and provides timing information, and is generally a pretty helpful tool when debugging. To enable Jaeger tracing output, set the following environment variables:

OTEL_SERVICE_NAME="iox" # Defaults to iox if not specified

# No default, must be set
OTEL_EXPORTER_JAEGER_AGENT_HOST="jaeger.influxdata.net"
OTEL_EXPORTER_JAEGER_AGENT_PORT="6831"

Working on IOx

When you're writing code, you should liberally use debug level tracing, as well as info and above for information that is important to the user. This helps greatly when tracking down any problems or to simply understand where requests wind up in the codebase - this is particularly important when requests span multiple servers!

Instrument your important functions at info - these provide a logical request call stack rather than a function call stack, so you can construct meaningful traces without lots of noise.

The trace!() level is useful to instrument areas of code for development or testing that should not wind up in the release binary - as such, the trace level is compiled at when building a "release" binary. As a guideline:

  • trace!() for debug builds and verbose messaging, used liberally as required. Compiled out in release builds.
  • debug!() for events that would help tracking down bugs, but are not useful to the user
  • info!() & above for info useful to the user

Emit logs as you normally would, but include contextual information in them as fields rather than interpolated strings:

#[tracing::instrument]
fn say_hello(name: &str) {
	// This is good - fields come first!
    info!(name, "hello there");

	// This is alright, but less structured / useful
    info!("hello there {}", name);
}

Instrumented functions wind up as distinct spans in the Jaeger trace, and all the events emitted within them are output to stdout as log lines (with the contextual information) as well as in the Jaeger span.

Instrumented functions also record the call arguments automatically (though you can control it) with args printed using Display at level=info and above, or Debug being used for level=debug. For example:

If your function returns a Result<T, E> and E: Display then you can record the error too:

#[tracing::instrument(err)]
fn my_function(arg: usize) -> Result<(), std::io::Error> {
    Ok(())
}

You can also instrument a function only at the "debug" level (info is the default):

#[tracing::instrument(level = "debug")]
fn say_hello() {}

Good to Know

  • If you have an outer func that is instrumented with level=debug, and an inner that is instrumented for level=info, the inner shows up when level=info even if the outer does not (children are independent of their parents).

  • You can control the log level of the Jaeger events with the OpenTelemetry config envs, but ultimately the RUST_LOG env sets the log level filter - you cannot configure telemetry to emit at a log level lower than RUST_LOG.

  • Be careful passing around sensitive arguments - instrumented functions will record them! It's best to wrap them in some wrapper type that implements neither Display nor Debug - then they can never be printed anywhere, including in tracing!