diff --git a/docs/multi_core_tasks.md b/docs/multi_core_tasks.md
new file mode 100644
index 0000000000..882aca2a54
--- /dev/null
+++ b/docs/multi_core_tasks.md
@@ -0,0 +1,63 @@
+# Patterns for CPU usage in the Runtime
+
+As discussed on https://github.com/influxdata/delorean/pull/221 and https://github.com/influxdata/delorean/pull/226, we need a strategy for handling I/O as well as CPU heavy requests in a flexible manner that allows making appropriate tradeoffs between resource utilization and availability under load.
+
+## TL;DR
+
+1. Use only async I/O via `tokio` -- do not use blocking I/O calls (in new code).
+
+2. All CPU bound tasks should be scheduled on the separate application level `thread_pool`, not with `tokio::task::spawn` nor `tokio::task::spawn_blocking` nor a new threadpool.
+
+We will work, over time, to migrate the rest of the codebase to use these patterns.
+
+## Background
+
+At a high level there are two different kinds of work the server does:
+
+1. Responding to external (I/O) requests (e.g. `GET /ping`, `POST /api/v2/write`, or the various gRPC APIs).
+
+2. CPU heavy tasks such as data parsing, format conversion, or query processing.
+
+## Desired Properties / Rationale
+
+We wish to have the following properties in the runtime:
+
+1. Incoming requests are always processed and responded to in a "timely" manner, even while the server is loaded / overloaded. For example, even if the server is currently consuming 100% of the CPU doing useful work (converting data, running queries, etc.), it must still be able to answer a `/ping` request to tell observers that it is alive, so that it is not taken out of a load balancer rotation or killed by Kubernetes.
+
+2. The ability to (eventually) control the priority of CPU heavy tasks, so we can prioritize certain tasks over others (e.g. query requests over background data rearrangement).
+
+## Guidelines
+
+The above desires lead us to the following guidelines:
+
+### Do not use blocking I/O
+
+**What**: All I/O should be done with `async` functions / tokio extensions (e.g. use `tokio::fs::File` rather than `std::fs::File`, etc.)
+
+**Rationale**: This ensures that we do not tie up the threads which service I/O requests with blocking calls.
+
+This cannot always be done (e.g. with a library such as the parquet writer, which is not `async`).
+
+### All CPU heavy work should be done on the single app level worker pool, separate from the tokio runtime
+
+**What**: All CPU heavy work should be done on the single app level worker pool. We provide a `thread_pool` interface that interacts nicely with async tasks (e.g. it allows an async task to `await` the completion of a CPU heavy task).
+
+**Rationale**: A single app level worker pool gives us a single place to (eventually) control work priority, so that tasks such as compaction of large data files can have lower precedence than incoming queries. By using a pool separate from the tokio runtime, with a limited number of threads, we avoid over-saturating the CPU with OS threads and thereby starving the limited number of tokio I/O threads. A separate, single app level pool also limits the number of underlying OS threads which are spawned, even under heavy load, keeping thread context switching overhead low.
+
+There will, of course, always be a judgment call about where "CPU bound work" starts and "work acceptable for I/O processing" ends. A reasonable rule of thumb is that if a job will *always* complete in less than 100ms, it is probably fine to run on an I/O thread. This number may be revised as we tune the system.
+
+## Examples
+
+TODO: picture of how async + CPU heavy threads interact
+
+The standard pattern to run CPU heavy tasks from async code is:
+* Launch the CPU heavy tasks on the thread pool using `spawn_with_handle`
+* `await` their completion using `join_all` or some other similar `Future` combinator.
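+
+The exact `thread_pool` interface is still evolving, so the sketch below uses `futures::executor::ThreadPool` (from the `futures` crate with its `thread-pool` feature enabled) as a stand-in for the application level pool; `convert_chunk` and its inputs are hypothetical. `spawn_with_handle` comes from the `SpawnExt` trait and returns a `RemoteHandle` future that an async task can `await`:
+
+```rust
+use futures::executor::ThreadPool;
+use futures::future::join_all;
+use futures::task::SpawnExt;
+
+/// Hypothetical CPU heavy job: "convert" one chunk of data.
+fn convert_chunk(chunk: Vec<u8>) -> usize {
+    // ... expensive, CPU bound work happens here ...
+    chunk.len()
+}
+
+/// Run one CPU heavy task per chunk on the worker pool, then `await`
+/// the completion of all of them without blocking any I/O thread.
+async fn convert_all(pool: &ThreadPool, chunks: Vec<Vec<u8>>) -> Vec<usize> {
+    // Launch the CPU heavy tasks on the thread pool using `spawn_with_handle`.
+    let handles: Vec<_> = chunks
+        .into_iter()
+        .map(|chunk| {
+            pool.spawn_with_handle(async move { convert_chunk(chunk) })
+                .expect("failed to spawn task on worker pool")
+        })
+        .collect();
+
+    // `await` their completion using the `join_all` combinator.
+    join_all(handles).await
+}
+```
+
+A pool like this would typically be created once at startup (e.g. via `ThreadPool::new()`) and shared by all such calls.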
+
+## Alternatives Considered
+
+### Use tokio::task::spawn_blocking for all CPU heavy work
+
+While this definitely avoids tying up the I/O handling threads, it can also result in a large number of threads when the system is under load and has more work coming in than it can complete.
+
+### Use the main tokio::task::spawn for all work
+
+While this likely results in the best possible efficiency, it can mean that I/O requests (such as responding to `/ping`) are not handled in a timely manner.
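+
+For contrast, here is a minimal, purely illustrative sketch of what the two rejected alternatives look like in code. `tokio::task::spawn` and `tokio::task::spawn_blocking` are real tokio APIs; `expensive_computation` is a hypothetical stand-in for any CPU heavy job:
+
+```rust
+use tokio::task;
+
+/// Hypothetical CPU heavy job, used only for illustration.
+fn expensive_computation(input: Vec<u8>) -> usize {
+    input.len()
+}
+
+async fn rejected_alternatives(input: Vec<u8>) -> usize {
+    // Rejected alternative 1: `spawn_blocking` does move the work off the
+    // I/O handling threads, but tokio's blocking pool may grow to a large
+    // number of OS threads under sustained load.
+    let on_blocking_pool = task::spawn_blocking({
+        let input = input.clone();
+        move || expensive_computation(input)
+    });
+
+    // Rejected alternative 2: plain `spawn` runs the work on the tokio
+    // worker threads themselves, so a burst of heavy tasks can delay
+    // responses to requests such as `/ping`.
+    let on_io_threads = task::spawn(async move { expensive_computation(input) });
+
+    // Both handles resolve to a `Result` because a spawned task may panic.
+    on_blocking_pool.await.unwrap() + on_io_threads.await.unwrap()
+}
+```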