# IOx — Profiling

This document explains several strategies for profiling IOx.

## Preparation
If you want to profile IOx, make sure to build and run it using an appropriate profile:

- **release:** This is a production-quality binary. Use `cargo run --release` or `cargo build --release` (binary is
  `./target/release/influxdb_iox`).
- **quick-release:** This comes close to a production binary, but the build step is not as heavy, allowing you to
  iterate faster (especially after you have found an issue and want to try out improvements). Use
  `cargo run --profile quick-release` or `cargo build --profile quick-release` (binary is
  `./target/quick-release/influxdb_iox`).
- **debug:** This is an unoptimized debug binary. It can still be helpful for figuring out why tests (which use the
  same profile) take so long. Use `cargo run` / `cargo run --profile dev` or `cargo build` / `cargo build --profile dev`
  (binary is `./target/debug/influxdb_iox`).

Note that your concrete test hardware (esp. if it is not an x64 CPU or if it is a battery-driven laptop), your operating
system (esp. if it is not Linux), and other factors can play a role and may lead to results that differ from production.

If you want to trace memory allocations, you need to disable [jemalloc] by passing `--no-default-features` to cargo.
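
Several tools below need the path of the built binary rather than a `cargo` invocation. Cargo writes the built-in `dev` and `test` profiles to `target/debug`, the built-in `release` and `bench` profiles to `target/release`, and custom profiles to a directory named after the profile. A minimal sketch of that mapping follows; the helper name `iox_binary` is hypothetical and not part of the IOx tooling, and it assumes the default `target` directory:

```shell
# Print the path of the influxdb_iox binary for a given Cargo profile.
# Assumes the default `target` directory (no CARGO_TARGET_DIR override).
iox_binary() {
    profile="${1:-dev}"
    case "$profile" in
        dev|test) dir=debug ;;        # built-in profiles that share target/debug
        release|bench) dir=release ;; # built-in profiles that share target/release
        *) dir="$profile" ;;          # custom profiles, e.g. quick-release
    esac
    printf './target/%s/influxdb_iox\n' "$dir"
}

iox_binary quick-release
```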


## Out-of-memory (OOM)
When profiling a process that may use too much memory and thereby affect your whole system, you may want to limit its
resources.

### ulimit
Set a [ulimit] before running the process:

```console
$ # set ulimit to 1 GiB (value is in KiB)
$ ulimit -v 1048576
$ cargo run --release ...
```

The advantage of [ulimit] is that out-of-memory situations are clearly signaled to the process and you get backtraces
when running under a debugger.
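
A limit set with `ulimit` applies to the current shell and to every process started from it afterwards. One way to keep the limit confined to a single command, sketched here, is to set it inside a subshell. The helper name `run_limited` is hypothetical:

```shell
# Run a command under a virtual-memory limit (in KiB) without
# changing the limit of the calling shell.
run_limited() {
    limit_kib="$1"
    shift
    # The parentheses start a subshell, so the limit ends with the command.
    ( ulimit -v "$limit_kib" && "$@" )
}

# Show the limit as seen by a child process started under a 1 GiB cap.
run_limited 1048576 sh -c 'ulimit -v'
```

Since child processes inherit the limit, it is usually better to apply it to an already-built binary, e.g. `run_limited 1048576 ./target/release/influxdb_iox ...`, so that the compiler is not constrained as well.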

### system OOM killer
Your system likely has an OOM killer configured. The issue with this is that it will use `SIGKILL` to terminate the
process, which you cannot investigate using a debugger (so no backtrace!).

The OOM killer is also used by all cgroup-based containers on Linux, e.g. [Docker], [Podman], [systemd-run].


## Embedded CPU Profiler
IOx includes an embedded `pprof` exporter compatible with the [go pprof] tool.

To use it, point your favorite tool at the HTTP `/debug/pprof/profile` endpoint of your IOx host.

### Use the Go `pprof` tool

Example:

```shell
go tool pprof 'http://localhost:8080/debug/pprof/profile?seconds=5'
```

And you get output like:

```text
Fetching profile over HTTP from http://localhost:8080/debug/pprof/profile?seconds=5
Saved profile in /Users/mkm/pprof/pprof.cpu.006.pb.gz
Type: cpu
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 93, 100% of 93 total
Showing top 10 nodes out of 185
      flat  flat%   sum%        cum   cum%
        93   100%   100%         93   100%  backtrace::backtrace::libunwind::trace
         0     0%   100%          1  1.08%  <&str as nom::traits::InputTakeAtPosition>::split_at_position1_complete
         0     0%   100%          1  1.08%  <(FnA,FnB) as nom::sequence::Tuple<Input,(A,B),Error>>::parse
         0     0%   100%          1  1.08%  <(FnA,FnB,FnC) as nom::sequence::Tuple<Input,(A,B,C),Error>>::parse
         0     0%   100%          5  5.38%  <F as futures_core::future::TryFuture>::try_poll
         0     0%   100%          1  1.08%  <T as alloc::slice::hack::ConvertVec>::to_vec
         0     0%   100%          1  1.08%  <alloc::alloc::Global as core::alloc::Allocator>::allocate
         0     0%   100%          1  1.08%  <alloc::borrow::Cow<B> as core::clone::Clone>::clone
         0     0%   100%          3  3.23%  <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
         0     0%   100%          1  1.08%  <alloc::vec::Vec<T,A> as core::iter::traits::collect::Extend<T>>::extend
```

### Interactive visualizations

The `go tool pprof` command can also open an interactive visualization in a web browser, which lets you render a call
graph, a flamegraph, or other visualizations, and search for symbols:

```shell
go tool pprof -http=localhost:6060 'http://localhost:8080/debug/pprof/profile?seconds=30'
```

### Use the built-in flame graph renderer

You may not always have the `go` toolchain on your machine.
IOx can also render a flamegraph SVG itself when the endpoint is opened directly in a browser.

For example, aim your browser at an IOx server with a URL such as
http://localhost:8080/debug/pprof/profile?seconds=5 and you will see a flame graph such as:

![Flame Graph](images/flame_graph.png)

### Capture to a file and view afterwards

You can also capture the profile to a file and view it with `pprof` afterwards:

```console
$ # write data to profile.proto
$ curl 'http://localhost:8080/debug/pprof/profile?seconds=30' -o profile.proto

$ # view with pprof tool
$ go tool pprof -http=localhost:6060 profile.proto
```
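
When capturing repeatedly, or from hosts other than `localhost`, it can help to build the endpoint URL in one place. The following is a minimal sketch; the helper name `pprof_url` is hypothetical, and its defaults simply mirror the host and duration used in the examples above:

```shell
# Build the CPU profile URL for an IOx host.
# Defaults mirror the examples above: localhost:8080 and 30 seconds.
pprof_url() {
    host="${1:-localhost:8080}"
    seconds="${2:-30}"
    printf 'http://%s/debug/pprof/profile?seconds=%s\n' "$host" "$seconds"
}

pprof_url localhost:8080 5
```

A capture that does not overwrite earlier ones could then look like `curl -fsS "$(pprof_url)" -o "profile-$(date +%Y%m%dT%H%M%S).proto"`.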

### Pros & Cons
While the built-in CPU profiler is convenient and can easily be used on a deployed production binary, it is less
flexible and offers fewer features than the external tools described below. Also note that this is a sampling profiler,
so you may miss certain events.


## Embedded Heap Profiler

In addition to the CPU profiler, IOx includes a heap profiler. It is based on [heappy].

Support is not compiled in by default; it must be enabled via the `heappy` feature:

```shell
# Compile and run IOx with heap profiling enabled
cargo run --no-default-features --features=heappy -- run all-in-one
```

Now aim your browser at an IOx server with a URL such as
http://localhost:8080/debug/pprof/allocs?seconds=5 and you will see a green flamegraph such as:

![Heappy Graph](images/heappy_graph.png)


### Pros & Cons
[Heappy] is probably the easiest way to profile memory, but due to its simple nature its output is limited (e.g. it is
hard to track "wandering" allocations that are allocated in one place and deallocated in another).


## cargo-flamegraph
You can use [cargo-flamegraph], an all-in-one solution for creating flamegraphs for production binaries, tests, and
benchmarks.


## `perf` + X (Linux only)
While [cargo-flamegraph] is nice and simple, sometimes you need more control over the profiling or want to use a
different viewer. For that, install [cargo-with] and make sure you have [perf] installed. To profile a specific test,
e.g. `test_cases_delete_three_delete_three_chunks_sql` in `query_tests`:

```console
$ # cargo-with requires you to change the CWD first:
$ cd query_tests
$ cargo with 'perf record -F99 --call-graph dwarf -- {bin}' -- test -- test_cases_delete_three_delete_three_chunks_sql
```

Now you have a `perf.data` file that you can use with various tools.

### Speedscope
First prepare the `perf` output:

```console
$ perf script > perf.txt
```

Now go to [speedscope.app] and upload `perf.txt` to view the profile.

### Hotspot
[Hotspot] can analyze `perf.data` directly:

![Hotspot Screenshot](images/hotspot.png)


## Advanced `perf` (Linux only)
[perf] has loads of other tricks, e.g. syscall counting. Imagine that after some profiling we figured out that our
test makes too many [`getrandom`] calls. We can first use [perf] to generate a profile for that specific
syscall. Make sure you have [cargo-with] installed and run:

```console
$ # cargo-with requires you to change the CWD first:
$ cd query_tests
$ cargo with 'perf record -e syscalls:sys_enter_getrandom --call-graph dwarf -- {bin}' -- test -- test_cases_delete_three_delete_three_chunks_sql
```

Then we can generate a call graph using [gprof2dot]:

```console
$ perf script | gprof2dot --format=perf | dot -Tsvg > perf.svg
```

![Getrandom callgraph](images/getrandom_callgraph.svg)


## heaptrack (Linux only)
[heaptrack] is a fast and accurate way to profile heap allocations under Linux. It works with Rust applications
too. Install [heaptrack] and [cargo-with] and run:

```console
$ cargo with 'heaptrack' -- run --profile=quick-release --no-default-features -- run -v
```

Note the `--no-default-features` flag, which disables [jemalloc] so that [heaptrack] can inspect memory allocations.

After the program exits, the [heaptrack] GUI will spawn automatically:

![heaptrack GUI](images/heaptrack.png)

### Pros & Cons
[heaptrack] is relatively fast, esp. compared to [Valgrind](#valgrind). It also works when the process OOMs, even
when it gets killed by `SIGKILL`.

Be aware that [heaptrack] does NOT work with tests (e.g. via
`cargo with 'heaptrack' -- test -p compactor -- my_test --nocapture`).[^heaptrack_tests] You have to isolate the code
into an ordinary binary, so create a file `my_crate/src/bin/foo.rs` and replace `#[tokio::test]` with `#[tokio::main]`.
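
The isolation step can be sketched as follows. The names `my_crate`, `foo`, and `my_test` are the placeholders used above, the test body itself is left out, and the sketch assumes that the crate's `tokio` dependency enables the features `#[tokio::main]` needs:

```shell
# Create a standalone binary target that heaptrack can run.
# `my_crate` and `foo` are placeholder names, not real IOx paths.
mkdir -p my_crate/src/bin
cat > my_crate/src/bin/foo.rs <<'EOF'
// Was: #[tokio::test] async fn my_test() { ... }
#[tokio::main]
async fn main() {
    // Paste the body of the former test here.
}
EOF
```

From inside `my_crate`, the binary can then be profiled like any other target, for instance with `cargo with 'heaptrack' -- run --bin foo`; which further flags (such as `--no-default-features`) apply depends on the crate.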


## bpftrace (Linux only)
You may use even more advanced tools like [bpftrace] to trace just about any aspect of the operating system. Install
[bpftrace] and [cargo-with], then create a tracing script in `program.txt`:

```text
tracepoint:syscalls:sys_exit_read /pid == cpid/ { @bytes = hist(args->ret); }
```

This example will produce a histogram of all sizes returned by the [`read`] syscall:

```console
$ cargo with "sudo bpftrace -c {bin} program.txt" -- run --profile=quick-release
    Finished quick-release [optimized + debuginfo] target(s) in 0.20s
Attaching 1 probe...
^C[influxdb_iox/src/main.rs:327]


@bytes:
[0]                    1 |@@@@@@@@@@@@@                                       |
[1]                    1 |@@@@@@@@@@@@@                                       |
[2, 4)                 0 |                                                    |
[4, 8)                 1 |@@@@@@@@@@@@@                                       |
[8, 16)                1 |@@@@@@@@@@@@@                                       |
[16, 32)               1 |@@@@@@@@@@@@@                                       |
[32, 64)               0 |                                                    |
[64, 128)              1 |@@@@@@@@@@@@@                                       |
[128, 256)             0 |                                                    |
[256, 512)             0 |                                                    |
[512, 1K)              4 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
[1K, 2K)               2 |@@@@@@@@@@@@@@@@@@@@@@@@@@                          |
```

**WARNING: Due to the `sudo` hack, only use this for trusted programs!**


## Instruments: CPU / performance profiling (macOS Only)

Instruments may be used to profile binaries on macOS. There are several instruments available, but perhaps the most
useful for IOx development are:

* Sampling CPU profiler
* Cycle-based CPU profiler
* System Trace (system calls, CPU scheduling)
* File Activity (file system and disk I/O activity)


## Instruments: Allocations (macOS Only)

The Allocations instrument is a powerful tool for tracking heap allocations on macOS and recording call stacks.

![Allocation call stacks](images/instruments_heap_1.png)

![Allocation statistics](images/instruments_heap_stats.png)

It can be used with Rust and `influxdb_iox`, but requires some additional steps on aarch64 and later versions of macOS
due to increased security.

### Preparing binary

As with heaptrack, you must compile `influxdb_iox` with `--no-default-features` to ensure the default system allocator
is used. After compiling,
[you must codesign the binary](https://developer.apple.com/forums/thread/685964?answerId=683365022#683365022)
with the `get-task-allow` entitlement set to `true`. Without the codesign step, the Allocations instrument will fail to
start with an error similar to the following:

> Required Kernel Recording Resources Are in Use

First, generate a temporary entitlements plist file named `tmp.entitlements`:

```sh
/usr/libexec/PlistBuddy -c "Add :com.apple.security.get-task-allow bool true" tmp.entitlements
```

Then codesign the binary with the `tmp.entitlements` file:

```sh
codesign -s - --entitlements tmp.entitlements -f target/release/influxdb_iox
```

You can verify that the binary is correctly code-signed as follows:

```sh
codesign --display --entitlements - target/release/influxdb_iox
```
```
Executable=/Users/stuartcarnie/projects/rust/influxdb_iox/target/release/influxdb_iox
[Dict]
    [Key] com.apple.security.get-task-allow
    [Value]
        [Bool] true
```

You can also verify the running `influxdb_iox` process using its PID:

```sh
codesign --display --entitlements - +<PID>
```


## Tracing
See [Tracing: Running Jaeger / tracing locally](tracing.md#running-jaeger--tracing-locally).


## Valgrind
See [Valgrind](valgrind.md).


[bpftrace]: https://github.com/iovisor/bpftrace
[cargo-flamegraph]: https://github.com/flamegraph-rs/flamegraph
[cargo-with]: https://github.com/cbourjau/cargo-with
[Docker]: https://www.docker.com/
[`getrandom`]: https://www.man7.org/linux/man-pages/man2/getrandom.2.html
[go pprof]: https://golang.org/pkg/net/http/pprof/
[gprof2dot]: https://github.com/jrfonseca/gprof2dot
[heappy]: https://github.com/mkmik/heappy
[heaptrack]: https://github.com/KDE/heaptrack
[Hotspot]: https://github.com/KDAB/hotspot
[jemalloc]: https://jemalloc.net/
[perf]: https://perf.wiki.kernel.org/index.php/Main_Page
[Podman]: https://podman.io/
[`read`]: https://www.man7.org/linux/man-pages/man2/read.2.html
[speedscope.app]: https://www.speedscope.app/
[systemd-run]: https://www.freedesktop.org/software/systemd/man/systemd-run.html
[ulimit]: https://ss64.com/bash/ulimit.html


[^heaptrack_tests]: I have no idea why.