Merge pull request #641 from influxdata/query/optimize

Query optimization guide
Scott Anderson 2019-12-02 10:23:27 -07:00 committed by GitHub
commit 60cba2512b
1 changed files with 74 additions and 0 deletions

@ -0,0 +1,74 @@
---
title: Optimize Flux queries
description: >
  Optimize your Flux queries to reduce their memory and compute (CPU) requirements.
weight: 104
menu:
  v2_0:
    name: Optimize queries
    parent: Query data
v2.0/tags: [query]
---
Optimize your Flux queries to reduce their memory and compute (CPU) requirements.
- [Start queries with pushdown functions](#start-queries-with-pushdown-functions)
- [Avoid short window durations](#avoid-short-window-durations)
- [Use "heavy" functions sparingly](#use-heavy-functions-sparingly)
- [Balance time range and data precision](#balance-time-range-and-data-precision)
## Start queries with pushdown functions
Some Flux functions can push their data manipulation down to the underlying
data source rather than storing and manipulating data in memory.
These are known as "pushdown" functions. Using them correctly can greatly
reduce the amount of memory needed to run a query.
#### Pushdown functions
- [range()](/v2.0/reference/flux/stdlib/built-in/transformations/range/)
- [filter()](/v2.0/reference/flux/stdlib/built-in/transformations/filter/)
- [group()](/v2.0/reference/flux/stdlib/built-in/transformations/group/)

Use pushdown functions at the beginning of your query.
Once a non-pushdown function runs, Flux pulls data into memory and runs all
subsequent operations there.
##### Pushdown functions in use
```js
from(bucket: "example-bucket")
  |> range(start: -1h)                       // Pushed to the data source
  |> filter(fn: (r) => r.sensor == "abc123") // Pushed to the data source
  |> group(columns: ["_field", "host"])      // Pushed to the data source
  |> aggregateWindow(every: 5m, fn: max)     // Run in memory
  |> filter(fn: (r) => r._value >= 90.0)     // Run in memory
  |> top(n: 10)                              // Run in memory
```
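
As a contrast, the following sketch runs the same tag filter in two positions. The bucket name, the `sensor` tag, and the `yield()` result names are hypothetical, and whether a given call is actually pushed down depends on the Flux version in use. Both queries should return the same rows, because `aggregateWindow()` processes each series independently.

```js
// Less efficient: aggregateWindow() is not a pushdown function, so every
// series in the time range is pulled into memory and aggregated before
// the sensor filter runs.
from(bucket: "example-bucket")
  |> range(start: -1h)
  |> aggregateWindow(every: 5m, fn: max)
  |> filter(fn: (r) => r.sensor == "abc123")
  |> yield(name: "late_filter")

// More efficient: range() and filter() come first, so only the matching
// series should leave the data source before aggregation starts.
from(bucket: "example-bucket")
  |> range(start: -1h)
  |> filter(fn: (r) => r.sensor == "abc123")
  |> aggregateWindow(every: 5m, fn: max)
  |> yield(name: "early_filter")
```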
## Avoid short window durations
Windowing (grouping data based on time intervals) is commonly used to aggregate and downsample data.
Increase performance by avoiding short window durations.
More windows require more compute power to evaluate which window each row should be assigned to.
Reasonable window durations depend on the total time range queried.
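
As a rough guide, the number of windows per series is the queried time range divided by the window duration. The sketch below assumes a hypothetical measurement name, and the `1h` duration is an illustration rather than a recommended value.

```js
// 30 days in 10s windows = 259,200 windows per series.
// 30 days in 1h windows  = 720 windows per series.
from(bucket: "example-bucket")
  |> range(start: -30d)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  // A longer window duration keeps the number of windows manageable
  // for a 30-day range.
  |> aggregateWindow(every: 1h, fn: mean)
```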
## Use "heavy" functions sparingly
The following functions use more memory or CPU than others.
Use them only when your data processing requires them:
- [map()](/v2.0/reference/flux/stdlib/built-in/transformations/map/)
- [reduce()](/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/reduce/)
- [window()](/v2.0/reference/flux/stdlib/built-in/transformations/window/)
- [join()](/v2.0/reference/flux/stdlib/built-in/transformations/join/)
- [union()](/v2.0/reference/flux/stdlib/built-in/transformations/union/)
- [pivot()](/v2.0/reference/flux/stdlib/built-in/transformations/pivot/)

{{% note %}}
We're continually optimizing Flux and this list may not represent its current state.
{{% /note %}}
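
When a heavy function is necessary, one possible way to limit its cost is to run it on as few rows as possible. This is a general assumption, not a rule stated in the list above. The sketch below uses hypothetical measurement and field names and converts Celsius to Fahrenheit with `map()` only after the data has been aggregated. For a linear conversion like this one, converting the hourly means gives the same result as averaging converted values, apart from floating-point rounding.

```js
from(bucket: "example-bucket")
  |> range(start: -1d)
  |> filter(fn: (r) => r._measurement == "example-measurement" and r._field == "temp_c")
  |> aggregateWindow(every: 1h, fn: mean)
  // map() now runs on one row per hour per series
  // instead of on every raw point.
  |> map(fn: (r) => ({r with _value: r._value * 1.8 + 32.0}))
```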
## Balance time range and data precision
To keep queries performant, balance the time range against the precision of your data.
For example, if you query data stored every second and request six months' worth of data,
results include a minimum of ≈15.5 million points (roughly 180 days × 86,400 seconds per day).
Flux must store these points in memory to generate a response.
To query data over large periods of time, create a task to [downsample data](/v2.0/process-data/common-tasks/downsample-data/), and then query the downsampled data instead.
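
A minimal sketch of such a downsampling task follows. The task name, schedule, bucket names, measurement, organization, and the `5m` window are all hypothetical; adjust them to your own data.

```js
// Task options: run once an hour (illustrative name and schedule).
option task = {name: "downsample-example", every: 1h}

from(bucket: "example-bucket")
  // Process only the data written since the previous run.
  |> range(start: -task.every)
  |> filter(fn: (r) => r._measurement == "example-measurement")
  // Reduce per-second points to one mean value every 5 minutes.
  |> aggregateWindow(every: 5m, fn: mean)
  // Write the aggregated points to a separate bucket.
  |> to(bucket: "example-bucket-downsampled", org: "example-org")
```

With 5-minute windows, 180 days of data comes to about 51,840 points per series (180 days × 288 windows per day) instead of roughly 15.5 million, so long-range queries against the downsampled bucket need far less memory.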