updated query optimizations guide to address PR feedback

pull/641/head
Scott Anderson 2019-11-27 14:16:56 -07:00
parent d9c0d409ed
commit bc756e1f61
1 changed files with 27 additions and 35 deletions


---
title: Optimize Flux queries
description: >
  Optimize your Flux queries to reduce their memory and compute (CPU) requirements.
weight: 104
menu:
v2_0:
v2.0/tags: [query]
---
Optimize your Flux queries to reduce their memory and compute (CPU) requirements.
- [Start queries with pushdown functions](#start-queries-with-pushdown-functions)
- [Avoid short window durations](#avoid-short-window-durations)
- [Use "heavy" functions sparingly](#use-heavy-functions-sparingly)
- [Balance time range and data precision](#balance-time-range-and-data-precision)
## Start queries with pushdown functions
Some Flux functions can push their data manipulation down to the underlying
data source rather than storing and manipulating data in memory.
These are known as "pushdown" functions and using them correctly can greatly
reduce the amount of memory necessary to run a query.
#### Pushdown functions
- [range()](/v2.0/reference/flux/stdlib/built-in/transformations/range/)
- [filter()](/v2.0/reference/flux/stdlib/built-in/transformations/filter/)
- [group()](/v2.0/reference/flux/stdlib/built-in/transformations/group/)
Use pushdown functions at the beginning of your query.
Once a non-pushdown function runs, Flux pulls data into memory and runs all
subsequent operations there.
##### Pushdown functions in use
```js
from(bucket: "example-bucket")
  |> range(start: -1h)                          //
  |> filter(fn: (r) => r.sensor_id == "abc123") // Pushed down to the data source
  |> group(columns: ["_field", "host"])         //
  |> aggregateWindow(every: 1m, fn: mean)       //
  |> filter(fn: (r) => r._value >= 90.0)        // Run in memory
  |> top(n: 10)                                 //
```
## Avoid short window durations
Windowing (grouping data based on time intervals) is commonly used to aggregate and downsample data.
Increase performance by avoiding short window durations.
More windows require more compute power to evaluate which window each row should be assigned to.
Reasonable window durations depend on the total time range queried.
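For example, over a 24-hour query, one-minute windows produce roughly 1,440 windows, while one-second windows produce roughly 86,400. The sketch below assumes a hypothetical `example-bucket` with a `mem` measurement:

```js
from(bucket: "example-bucket")
  |> range(start: -24h)
  |> filter(fn: (r) => r._measurement == "mem" and r._field == "used_percent")
  // 24h / 1m ≈ 1,440 windows — reasonable for a 24-hour range
  |> aggregateWindow(every: 1m, fn: mean)
  // every: 1s over the same range would create ~86,400 windows
```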
## Use "heavy" functions sparingly
The following functions use more memory or CPU than others.
Consider their necessity in your data processing before using them:
- [map()](/v2.0/reference/flux/stdlib/built-in/transformations/map/)
- [reduce()](/v2.0/reference/flux/stdlib/built-in/transformations/aggregates/reduce/)
- [pivot()](/v2.0/reference/flux/stdlib/built-in/transformations/pivot/)
{{% note %}}
We're continually optimizing Flux and this list may not represent its current state.
{{% /note %}}
## Balance time range and data precision
To ensure queries are performant, balance the time range and the precision of your data.
For example, if you query data stored every second and request six months' worth of data,
results will include a minimum of ≈15.5 million points
(6 × 30 days × 86,400 points per day ≈ 15.5 million).
Flux must store these points in memory to generate a response.
To query data over large periods of time, create a task to [downsample data](/v2.0/process-data/common-tasks/downsample-data/), and then query the downsampled data instead.
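A downsampling task might look like the following sketch, assuming hypothetical bucket names (`example-bucket`, `example-bucket-downsampled`) and a `mem` measurement:

```js
// Run the task once per hour
option task = {name: "downsample-mem", every: 1h}

from(bucket: "example-bucket")
  |> range(start: -task.every)
  |> filter(fn: (r) => r._measurement == "mem")
  // Aggregate second-precision data into one-minute averages
  |> aggregateWindow(every: 1m, fn: mean)
  // Write the lower-resolution data to a separate bucket
  |> to(bucket: "example-bucket-downsampled")
```

Queries over long time ranges can then read from `example-bucket-downsampled`, processing roughly 1/60th as many points.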