Strategies to improve query performance: operations, number and size … (#5215)
* Strategies to improve query performance: operations, number and size of parquet files Fixes #5108 - Add query performance strategies to optimize-queries
* Apply suggestions from code review
* Apply suggestions from code review
* Update content/influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries.md
  Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
* chore(v3): WIP query perf Fixes #5108
* chore(v3): WIP: Query performance
* chore(v3): WIP: Explain the EXPLAIN report and indicators of query expense, performance problems.
* WIP: optimize queries - how to read a query plan, operators
* WIP: Read a query plan example 2
* WIP: moved how to read a query plan to its own page.
* WIP(v3): operators
* chore(v3): WIP add query plan info from DataFusion slides and @NGA-TRAN
* chore(v3): WIP read a query plan - explain tree format and reorganize
* WIP: query plan - Adds Query Plan reference - Completes Analyze a Query Plan, pending cleanup, continue at :471 - Added image from Nga's blog - Updates EXPLAIN doc - TODO: Create public docs for https://github.com/influxdata/docs.influxdata.io/blob/main/content/operations/specifications/iox_runbooks/slow-queries.md
* chore(spelling): Vale config changes - Add vale to package.json and use Yarn to manage the binary. You can use `npx vale` to run manually. - Move InfluxData spelling ignore list into the style. - Reorganize custom (product) spelling lists to comply with Vale 3.x - Add InfluxDB v3 terms
* chore(spelling): Vale config changes - Add vale to package.json and use Yarn to manage the binary. You can use `npx vale` to run manually. - Move InfluxData spelling ignore list into the style. - Reorganize custom (product) spelling lists to comply with Vale 3.x - Add InfluxDB v3 terms
* chore(v3): Reorg of query troubleshooting and optimizing docs - Adds query-data/troubleshoot-and-optimize - Splits optimize docs into troubleshoot and optimize docs - Moves Flight response doc to flight-responses.md
* chore: Fixes broken links, typos, missing content, etc. - Fixes various errors and style violations reported by Vale. - Fixes broken links and missing content in glossaries. - Fixes missing and extraneous whitespace.
* Apply suggestions from code review
* Update content/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan.md
* chore(clustered): Query plan: Apply review suggestions
  Co-Authored-By: Nga Tran <20850014+NGA-TRAN@users.noreply.github.com>
* feature(v3): Analyze a query plan: - Apply code formatting to plan implementor names - Simplify some points - Add links
* add query plan html diagram (#5365)
* Update content/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan.md
  Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
* Update content/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/optimize-queries.md
  Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
* Update content/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/trace.md
  Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
* Update content/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/trace.md
  Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
* Update content/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/trace.md
  Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
* Update content/influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/analyze-query-plan.md
  Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
* fix(v3): finish the EXPLAIN descriptions and examples
* chore(tests): Setup a python venv in test containers
* fix(ci): Vale vocab
* fix(v3): Punctuation typo
* chore(ci): Update README
* fix(v3): Apply review suggestions and capitalization
  Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
* fix(v3): Add note to optimize page and revise troubleshoot
* fix(v3): optimize-queries link

---------

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Nga Tran <20850014+NGA-TRAN@users.noreply.github.com>
Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>

pull/5367/head
parent
259e8e67b4
commit
db059f99b0
|
@ -6,7 +6,40 @@ ignorecase: true
|
|||
swap:
|
||||
# NOTE: The left-hand (bad) side can match the right-hand (good) side; Vale
|
||||
# will ignore any alerts that match the intended form.
|
||||
"FlightSQL": Flight SQL
|
||||
"anaconda": Anaconda
|
||||
"(?i)api": API
|
||||
"arrow": Arrow
|
||||
"authtoken": authToken
|
||||
"Authtoken": AuthToken
|
||||
"chronograf": Chronograf
|
||||
"cli": CLI
|
||||
"(?i)clockface": Clockface
|
||||
"the\b[?compactor": the Compactor
|
||||
"data explorer": Data Explorer
|
||||
"datetime": dateTime
|
||||
"dedupe": deduplicate
|
||||
"(?i)executionplan": ExecutionPlan
|
||||
"fieldkey": fieldKey
|
||||
"fieldtype": fieldType
|
||||
"flight": Flight
|
||||
"(?i)flightquery": FlightQuery
|
||||
"(?i)FlightSQL": Flight SQL
|
||||
"/b(?i)influxdata/b": InfluxData
|
||||
"/w*/b(?i)influxdb": InfluxDB
|
||||
"(?i)influxql": InfluxQL
|
||||
"influxer": Influxer
|
||||
"the\b[?ingester": the Ingester
|
||||
"(?i)iox": v3
|
||||
"java[ -]?scripts?": JavaScript
|
||||
"SQL Alchemy": SQLAlchemy
|
||||
"kapa": Kapacitor
|
||||
"logicalplan": LogicalPlan
|
||||
"the\b[?object store": the Object store
|
||||
"a {{% product-name %}}": an {{% product-name %}}
|
||||
"Pandas": pandas
|
||||
" parquet": Parquet
|
||||
"the\b[?querier": the Querier
|
||||
"SQL Alchemy": SQLAlchemy
|
||||
"superset": Superset
|
||||
"tagkey": tagKey
|
||||
"telegraf": Telegraf
|
||||
"telegraph": Telegraf
|
||||
|
|
|
@ -6,6 +6,31 @@ scope:
|
|||
- ~table.cell
|
||||
ignore:
|
||||
# Located at StylesPath/ignore1.txt
|
||||
- InfluxDataDocs/Terms/influxdb.txt
|
||||
- InfluxDataDocs/Terms/configuration-terms.txt
|
||||
- InfluxDataDocs/Terms/query-functions.txt
|
||||
|
||||
- InfluxDataDocs/Terms/telegraf.txt
|
||||
filters:
|
||||
# Ignore Hugo, layout, and design words.
|
||||
- 'Flexbox'
|
||||
- '(?i)frontmatter'
|
||||
- '(?i)shortcode(s?)'
|
||||
- '(?i)tooltip(s?)'
|
||||
# Ignore all words starting with 'py'.
|
||||
# e.g., 'PyYAML'.
|
||||
- '[pP]y.*\b'
|
||||
# Ignore underscore-delimited words.
|
||||
# e.g., avg_temp
|
||||
- '\b\w+_\w+\b'
|
||||
- '\b_\w+\b'
|
||||
# Ignore SQL variables.
|
||||
- '(?i)AS \w+'
|
||||
# Ignore custom words
|
||||
- '(?i)deduplicat(ion|e|ed|es|ing)'
|
||||
- '(?i)downsampl(e|ing|ed|es)'
|
||||
- 'InfluxDB-specific'
|
||||
- '(?i)repartition(ed|s|ing)'
|
||||
- '(?i)subcommand(s?)'
|
||||
- '(?i)union(ing|ed|s)?'
|
||||
- 'unsignedLong'
|
||||
- 'US (East|West|Central|North|South|Northeast|Northwest|Southeast|Southwest)'
|
||||
|
|
|
@ -0,0 +1,126 @@
|
|||
api
|
||||
apis
|
||||
args
|
||||
authtoken
|
||||
authz
|
||||
boolean
|
||||
booleans
|
||||
bundler
|
||||
bundlers
|
||||
chronograf
|
||||
cli
|
||||
clockface
|
||||
cloud
|
||||
codeblock
|
||||
compactor
|
||||
conda
|
||||
csv
|
||||
dashboarding
|
||||
datagram
|
||||
datasource
|
||||
datetime
|
||||
deserialize
|
||||
downsample
|
||||
dotenv
|
||||
enum
|
||||
executionplan
|
||||
fieldkey
|
||||
fieldtype
|
||||
file_groups
|
||||
flightquery
|
||||
Grafana
|
||||
groupId
|
||||
gzip
|
||||
gzipped
|
||||
homogenous
|
||||
hostname
|
||||
hostUrl
|
||||
hostURL
|
||||
HostURL
|
||||
implementor
|
||||
implementors
|
||||
influxctl
|
||||
influxd
|
||||
influxdata.com
|
||||
influx3
|
||||
ingester
|
||||
ingesters
|
||||
iox
|
||||
kapacitor
|
||||
lat
|
||||
locf
|
||||
logicalplan
|
||||
logstash
|
||||
lon
|
||||
lookahead
|
||||
lookbehind
|
||||
metaquery
|
||||
metaqueries
|
||||
middleware
|
||||
namespace
|
||||
noaa
|
||||
npm
|
||||
oauth
|
||||
output_ordering
|
||||
pandas
|
||||
param
|
||||
performant
|
||||
projection
|
||||
protofiles
|
||||
pushdown
|
||||
querier
|
||||
rearchitect
|
||||
rearchitected
|
||||
redoc
|
||||
remediations
|
||||
repartition
|
||||
retention_policy
|
||||
retryable
|
||||
rp
|
||||
serializable
|
||||
serializer
|
||||
serverless
|
||||
shortcode
|
||||
signout
|
||||
Splunk
|
||||
SQLAlchemy
|
||||
stderr
|
||||
stdout
|
||||
subcommand
|
||||
subcommands
|
||||
subnet
|
||||
subnets
|
||||
subprocessor
|
||||
subprocessors
|
||||
subquery
|
||||
subqueries
|
||||
substring
|
||||
substrings
|
||||
superset
|
||||
svg
|
||||
syntaxes
|
||||
tagkey
|
||||
tagset
|
||||
telegraf
|
||||
telegraf's
|
||||
tombstoned
|
||||
tsm
|
||||
uint
|
||||
uinteger
|
||||
unescaped
|
||||
ungroup
|
||||
ungrouped
|
||||
unprocessable
|
||||
unix
|
||||
unmarshal
|
||||
unmarshalled
|
||||
unpackage
|
||||
upsample
|
||||
upsert
|
||||
urls
|
||||
venv
|
||||
VSCode
|
||||
WALs
|
||||
Webpack
|
||||
xpath
|
||||
XPath
|
|
@ -21,6 +21,7 @@ begin
|
|||
between
|
||||
bit
|
||||
bit_length
|
||||
bitwise
|
||||
both
|
||||
by
|
||||
cascade
|
||||
|
@ -314,6 +315,10 @@ to_timestamp_micros
|
|||
to_timestamp_millis
|
||||
to_timestamp_seconds
|
||||
|
||||
# SQL_DATE_TIME_KEYWORDS
|
||||
dow
|
||||
doy
|
||||
|
||||
# SQL_INFO_SYSTEM_FUNCTIONS
|
||||
# Source: https://github.com/influxdata/influxdb_iox/blob/4f9c901dcfece5fcc4d17cfecb6ec45a0dccda5a/flightsql/src/sql_info
|
||||
array
|
||||
|
@ -328,21 +333,38 @@ cos
|
|||
tan
|
||||
asin
|
||||
acos
|
||||
acosh
|
||||
asinh
|
||||
atan
|
||||
atanh
|
||||
atan2
|
||||
cbrt
|
||||
exp
|
||||
gcd
|
||||
isnan
|
||||
iszero
|
||||
lcm
|
||||
log
|
||||
ln
|
||||
log2
|
||||
log10
|
||||
nanvl
|
||||
sqrt
|
||||
pow
|
||||
floor
|
||||
ceil
|
||||
round
|
||||
|
||||
# InfluxQL operators
|
||||
bitfield
|
||||
|
||||
# is_aggregate_function
|
||||
# Source: https://github.com/influxdata/influxdb_iox/blob/4f9c901dcfece5fcc4d17cfecb6ec45a0dccda5a/influxdb_influxql_parser/src/functions.rs
|
||||
approx_distinct
|
||||
approx_median
|
||||
approx_percentile_cont
|
||||
approx_percentile_cont_with_weight
|
||||
covar
|
||||
cumulative_sum
|
||||
derivative
|
||||
difference
|
||||
|
|
|
@ -0,0 +1 @@
|
|||
[Tt]elegraf
|
|
@ -41,7 +41,7 @@ swap:
|
|||
cellular network: mobile network
|
||||
chapter: documents|pages|sections
|
||||
check box: checkbox
|
||||
check: select
|
||||
# check: select
|
||||
# CLI: command-line tool
|
||||
click on: click|click in
|
||||
# Cloud: Google Cloud Platform|GCP
|
||||
|
|
|
@ -1,96 +0,0 @@
|
|||
\b.*_.*\b
|
||||
Anaconda
|
||||
APIs?
|
||||
Arrow
|
||||
authToken
|
||||
AuthToken
|
||||
[Bb]oolean
|
||||
bundlers?
|
||||
[Cc]hronograf
|
||||
CLI
|
||||
Clockface
|
||||
[Cc]loud
|
||||
codeblock
|
||||
conda
|
||||
csv
|
||||
CSV
|
||||
Data Explorer
|
||||
dashboarding
|
||||
datasource
|
||||
dateTime
|
||||
deserialize
|
||||
[Dd]ownsampl.*\b
|
||||
dotenv
|
||||
enum
|
||||
Flight
|
||||
FlightQuery
|
||||
Grafana
|
||||
groupId
|
||||
gzip(ped)?
|
||||
homogenous
|
||||
hostname
|
||||
hostUrl
|
||||
hostURL
|
||||
HostURL
|
||||
influxctl
|
||||
[Ii]nflux[Dd]ata
|
||||
influxdb?
|
||||
InfluxDB
|
||||
influxql
|
||||
InfluxQL
|
||||
influx3
|
||||
iox
|
||||
IOx
|
||||
Kapacitor
|
||||
lat
|
||||
locf
|
||||
[Ll]ogstash
|
||||
lon
|
||||
lookahead
|
||||
lookbehind
|
||||
middleware
|
||||
namespace
|
||||
noaa
|
||||
NOAA
|
||||
npm
|
||||
OAuth
|
||||
pandas
|
||||
performant
|
||||
pushdown
|
||||
pyarrow
|
||||
Py.*\b
|
||||
pyinflux
|
||||
rearchitect(ed)?
|
||||
Redoc
|
||||
retention_policy
|
||||
rp
|
||||
serializable
|
||||
serializer
|
||||
[Ss]erverless
|
||||
shortcode
|
||||
Splunk
|
||||
SQLAlchemy
|
||||
stdout
|
||||
subnet
|
||||
subquer(y|ies)
|
||||
substring
|
||||
Superset
|
||||
svg
|
||||
tagset
|
||||
[Tt]elegraf
|
||||
[Tt]ombstoned
|
||||
tsm|TSM
|
||||
uint|UINT
|
||||
uinteger
|
||||
unescaped
|
||||
unprocessable
|
||||
unix
|
||||
upsample
|
||||
upsert
|
||||
urls
|
||||
venv
|
||||
VSCode
|
||||
WALs?
|
||||
Webpack
|
||||
xpath
|
||||
XPath
|
|
@ -1 +0,0 @@
|
|||
Pandas
|
|
@ -0,0 +1 @@
|
|||
clustered
|
|
@ -0,0 +1,6 @@
|
|||
API token
|
||||
bucket name
|
||||
Cloud Dedicated
|
||||
cloud-dedicated
|
||||
Cloud Serverless
|
||||
cloud-serverless
|
|
@ -1,10 +1,13 @@
|
|||
# Lint cloud-dedicated
|
||||
docspath=.
|
||||
contentpath=$docspath/content
|
||||
vale --config=$contentpath/influxdb/cloud-dedicated/.vale.ini --output=line --relative --minAlertLevel=error $contentpath/influxdb/cloud-dedicated
|
||||
npx vale --config=$contentpath/influxdb/cloud-dedicated/.vale.ini --output=line --relative --minAlertLevel=error $contentpath/influxdb/cloud-dedicated
|
||||
|
||||
# Lint cloud-serverless
|
||||
vale --config=$contentpath/influxdb/cloud-serverless/.vale.ini --output=line --relative --minAlertLevel=error $contentpath/influxdb/cloud-serverless
|
||||
npx vale --config=$contentpath/influxdb/cloud-serverless/.vale.ini --output=line --relative --minAlertLevel=error $contentpath/influxdb/cloud-serverless
|
||||
|
||||
# Lint clustered
|
||||
npx vale --config=$contentpath/influxdb/clustered/.vale.ini --output=line --relative --minAlertLevel=error $contentpath/influxdb/clustered
|
||||
|
||||
# Lint telegraf
|
||||
# vale --config=$docspath/.vale.ini --output=line --relative --minAlertLevel=error $contentpath/telegraf
|
|
@ -1,7 +1,5 @@
|
|||
StylesPath = ".ci/vale/styles"
|
||||
|
||||
Vocab = InfluxData
|
||||
|
||||
MinAlertLevel = warning
|
||||
|
||||
Packages = Google, Hugo
|
||||
|
|
|
@ -10,7 +10,7 @@ What constitutes a "substantial" change is at the discretion of InfluxData docum
|
|||
|
||||
_**Note:** Typo and broken link fixes are greatly appreciated and do not require signing the CLA._
|
||||
|
||||
*If you're new to contributing or you're looking for an easy update, check out our [good-first-issues](https://github.com/influxdata/docs-v2/issues?q=is%3Aissue+is%3Aopen+label%3Agood-first-issue).*
|
||||
*If you're new to contributing or you're looking for an easy update, see [`docs-v2` good-first-issues](https://github.com/influxdata/docs-v2/issues?q=is%3Aissue+is%3Aopen+label%3Agood-first-issue).*
|
||||
|
||||
## Make suggested updates
|
||||
|
||||
|
@ -22,16 +22,39 @@ _**Note:** Typo and broken link fixes are greatly appreciated and do not require
|
|||
To run the documentation locally, follow the instructions provided in the README.
|
||||
|
||||
### Install and run Vale
|
||||
Use the [Vale](https://vale.sh/) style linter to check spelling and enforce style guidelines.
|
||||
To install Vale, follow the instructions to install the [Vale CLI](https://vale.sh/docs/vale-cli/installation/) for your system and the [integration](https://vale.sh/docs/integrations/guide/) for your editor.
|
||||
|
||||
The `docs-v2` repository contains `.vale.ini` files that configure InfluxData spelling and style rules used by the [Vale CLI](https://vale.sh/docs/vale-cli/installation/) and editor extensions, such as [Vale VSCode](https://marketplace.visualstudio.com/items?itemName=ChrisChinchilla.vale-vscode).
|
||||
When run (with the CLI or an editor extension) Vale searches for a `.vale.ini` file in the directory of the file being linted.
|
||||
Use the [Vale](https://vale.sh/) style linter for spellchecking and enforcing style guidelines.
|
||||
The docs-v2 `package.json` includes a Vale dependency that installs the Vale binary when you run `yarn`.
|
||||
After you use `yarn` to install Vale, you can run `npx vale` to execute Vale commands.
|
||||
|
||||
To lint multiple directories with specified configuration files and generate a report, run the `.ci/vale/vale.sh` script.
|
||||
_To install Vale globally or use a different package manager, follow the [Vale CLI installation](https://vale.sh/docs/vale-cli/installation/) for your system._
|
||||
|
||||
#### Integrate with your editor
|
||||
|
||||
To integrate Vale with VSCode:
|
||||
|
||||
1. Install the [Vale VSCode](https://marketplace.visualstudio.com/items?itemName=ChrisChinchilla.vale-vscode) extension.
|
||||
2. In the extension settings, set the `Vale:Vale CLI:Path` value to the path of your Vale binary.
|
||||
Use the path `${workspaceFolder}/node_modules/.bin/vale` for the Vale binary that you installed with Yarn.
|
||||
|
||||
To use with an editor other than VSCode, see the [Vale integration guide](https://vale.sh/docs/integrations/guide/).
|
||||
|
||||
#### Lint product directories
|
||||
|
||||
The `docs-v2` repository includes a shell script that lints product directories using the `InfluxDataDocs` style rules and product-specific vocabularies, and then generates a report.
|
||||
To run the script, enter the following command in your terminal:
|
||||
|
||||
```sh
|
||||
sh .ci/vale/vale.sh
|
||||
```
|
||||
|
||||
#### Configure style rules
|
||||
|
||||
The `docs-v2` repository contains `.vale.ini` files that configure a custom `InfluxDataDocs` style with spelling and style rules.
|
||||
When you run `vale <file path>` (from the CLI or an editor extension), it searches for a `.vale.ini` file in the directory of the file being linted.
|
||||
|
||||
`docs-v2` style rules are located at `.ci/vale/styles/`.
|
||||
The easiest way to add accepted or rejected spellings is to enter your terms (or regular expression patterns) into the Vocabulary files at `.ci/vale/styles/Vocab`.
|
||||
The easiest way to add accepted or rejected spellings is to enter your terms (or regular expression patterns) into the Vocabulary files at `.ci/vale/styles/config/vocabularies`.
|
||||
|
||||
To learn more about configuration and rules, see [Vale configuration](https://vale.sh/docs/topics/config).
|
||||
|
||||
|
@ -46,17 +69,17 @@ Push your changes up to your forked repository, then [create a new pull request]
|
|||
### Markdown
|
||||
All of our documentation is written in [Markdown](https://en.wikipedia.org/wiki/Markdown).
|
||||
|
||||
### Semantic Linefeeds
|
||||
Use [semantic linefeeds](http://rhodesmill.org/brandon/2012/one-sentence-per-line/).
|
||||
### Semantic line feeds
|
||||
Use [semantic line feeds](http://rhodesmill.org/brandon/2012/one-sentence-per-line/).
|
||||
Separating each sentence with a new line makes it easy to parse diffs with the human eye.
|
||||
|
||||
**Diff without semantic linefeeds:**
|
||||
**Diff without semantic line feeds:**
|
||||
``` diff
|
||||
-Data is taking off. This data is time series. You need a database that specializes in time series. You should check out InfluxDB.
|
||||
+Data is taking off. This data is time series. You need a database that specializes in time series. You need InfluxDB.
|
||||
```
|
||||
|
||||
**Diff with semantic linefeeds:**
|
||||
**Diff with semantic line feeds:**
|
||||
``` diff
|
||||
Data is taking off.
|
||||
This data is time series.
|
||||
|
@ -397,8 +420,7 @@ Provide the following arguments:
|
|||
```
|
||||
|
||||
### Tabbed Content
|
||||
Shortcodes are available for creating "tabbed" content (content that is changed by a users' selection).
|
||||
Ther following three must be used:
|
||||
To create "tabbed" content (content that is changed by a users' selection), use the following three shortcodes in combination:
|
||||
|
||||
`{{< tabs-wrapper >}}`
|
||||
This shortcode creates a wrapper or container for the tabbed content.
|
||||
|
@ -774,8 +796,8 @@ This is useful for maintaining and referencing sample code variants in their
|
|||
/shared/text/example1/example.py
|
||||
```
|
||||
|
||||
2. Include the files, e.g. in code tabs
|
||||
````md
|
||||
2. Include the files--for example, in code tabs:
|
||||
````md
|
||||
{{% code-tabs-wrapper %}}
|
||||
{{% code-tabs %}}
|
||||
[Javascript](#js)
|
||||
|
@ -792,7 +814,7 @@ This is useful for maintaining and referencing sample code variants in their
|
|||
```
|
||||
{{% /code-tab-content %}}
|
||||
{{% /code-tabs-wrapper %}}
|
||||
````
|
||||
````
|
||||
|
||||
#### Include specific files from the same directory
|
||||
To include the text from one file in another file in the same
|
||||
|
@ -861,8 +883,8 @@ The following table shows which children types use which frontmatter properties:
|
|||
### Inline icons
|
||||
The `icon` shortcode allows you to inject icons in paragraph text.
|
||||
It's meant to clarify references to specific elements in the InfluxDB user interface.
|
||||
This shortcode supports clockface (the UI) v2 and v3.
|
||||
Specify the version to use as the 2nd argument. The default version is `v3`.
|
||||
This shortcode supports Clockface (the UI) v2 and v3.
|
||||
Specify the version to use as the second argument. The default version is `v3`.
|
||||
|
||||
```
|
||||
{{< icon "icon-name" "v2" >}}
|
||||
|
@ -935,8 +957,8 @@ Below is a list of available icons (some are aliases):
|
|||
### InfluxDB UI left navigation icons
|
||||
In many cases, documentation references an item in the left nav of the InfluxDB UI.
|
||||
Provide a visual example of the navigation item using the `nav-icon` shortcode.
|
||||
This shortcode supports clockface (the UI) v2 and v3.
|
||||
Specify the version to use as the 2nd argument. The default version is `v3`.
|
||||
This shortcode supports Clockface (the UI) v2 and v3.
|
||||
Specify the version to use as the second argument. The default version is `v3`.
|
||||
|
||||
```
|
||||
{{< nav-icon "tasks" "v2" >}}
|
||||
|
@ -988,15 +1010,15 @@ The following options are available:
|
|||
- quarter
|
||||
|
||||
### Tooltips
|
||||
Use the `{{< tooltips >}}` shortcode to add tooltips to text.
|
||||
The **1st** argument is the text shown in the tooltip.
|
||||
The **2nd** argument is the highlighted text that triggers the tooltip.
|
||||
Use the `{{< tooltip >}}` shortcode to add tooltips to text.
|
||||
The **first** argument is the text shown in the tooltip.
|
||||
The **second** argument is the highlighted text that triggers the tooltip.
|
||||
|
||||
```md
|
||||
I like {{< tooltip "Butterflies are awesome!" "butterflies" >}}.
|
||||
```
|
||||
|
||||
The example above renders as "I like butterflies" with "butterflies" highlighted.
|
||||
The rendered output is "I like butterflies" with "butterflies" highlighted.
|
||||
When you hover over "butterflies," a tooltip appears with the text: "Butterflies are awesome!"
|
||||
|
||||
### Flux sample data tables
|
||||
|
|
|
@ -26,3 +26,7 @@ h2,h3,h4,h5,h6 {
|
|||
opacity: 1;
|
||||
}
|
||||
}
|
||||
|
||||
#query-plan-diagram + .caption {
|
||||
margin-top: 0;
|
||||
}
|
||||
|
|
|
@ -457,6 +457,82 @@ table tr.point{
|
|||
}
|
||||
}
|
||||
|
||||
////////////////////////////// QUERY PLAN DIAGRAM //////////////////////////////
|
||||
|
||||
#query-plan-diagram {
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
font-size: 1rem;
|
||||
margin: 3rem 0 3.5rem;
|
||||
max-width: 800px;
|
||||
|
||||
.plan-column {
|
||||
padding: 0 .5rem;
|
||||
}
|
||||
|
||||
.plan-block {
|
||||
background: $article-code-bg;
|
||||
color: $article-code;
|
||||
text-align: center;
|
||||
padding: 1rem 1.5rem;
|
||||
border-radius: $radius * 2;
|
||||
}
|
||||
.plan-arrow {
|
||||
margin: .5rem auto;
|
||||
height: 1.5rem;
|
||||
width: 1px;
|
||||
border-left: 1px solid $article-code;
|
||||
position: relative;
|
||||
|
||||
&:before {
|
||||
content: "\25B2";
|
||||
position: absolute;
|
||||
top: .25rem;
|
||||
left: -.32rem;
|
||||
color: $article-code;
|
||||
line-height: 0;
|
||||
}
|
||||
&.split {
|
||||
width: 50%;
|
||||
margin-top: 2rem;
|
||||
height: 1rem;
|
||||
display: flex;
|
||||
justify-content: center;
|
||||
border-width: 1px 1px 0 1px;
|
||||
border-style: solid;
|
||||
border-color: $article-code;
|
||||
|
||||
&:before {
|
||||
position: relative;
|
||||
top: -1.25rem;
|
||||
left: -0.26rem;
|
||||
width: 0;
|
||||
margin-left: .2rem;
|
||||
}
|
||||
&:after {
|
||||
content: "";
|
||||
display: block;
|
||||
height: 1rem;
|
||||
width: 0;
|
||||
border-left: 1px solid $article-code;
|
||||
margin: -1rem 0;
|
||||
}
|
||||
}
|
||||
}
|
||||
.plan-single-column {
|
||||
display: flex;
|
||||
justify-content: center;
|
||||
}
|
||||
.plan-double-column {
|
||||
display: flex;
|
||||
justify-content: space-around;
|
||||
|
||||
.plan-column {
|
||||
// width: 50%;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
///////////////////////////////// MEDIA QUERIES ////////////////////////////////
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
StylesPath = "../../../.ci/vale/styles"
|
||||
|
||||
Vocab = InfluxData, Cloud-Dedicated
|
||||
Vocab = Cloud-Dedicated
|
||||
|
||||
MinAlertLevel = warning
|
||||
|
||||
|
|
|
@ -1,442 +0,0 @@
|
|||
---
|
||||
title: Optimize queries
|
||||
description: >
|
||||
Optimize your SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
|
||||
weight: 401
|
||||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
name: Optimize queries
|
||||
parent: Execute queries
|
||||
influxdb/cloud-dedicated/tags: [query, sql, influxql]
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/query-data/sql/
|
||||
- /influxdb/cloud-dedicated/query-data/influxql/
|
||||
- /influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot/
|
||||
- /influxdb/cloud-dedicated/reference/client-libraries/v3/
|
||||
---
|
||||
|
||||
Use the following tools to help you identify performance bottlenecks and troubleshoot problems in queries:
|
||||
|
||||
<!-- TOC -->
|
||||
|
||||
- [EXPLAIN and ANALYZE](#explain-and-analyze)
|
||||
- [Enable trace logging](#enable-trace-logging)
|
||||
- [Avoid unnecessary tracing](#avoid-unnecessary-tracing)
|
||||
- [Syntax](#syntax)
|
||||
- [Example](#example)
|
||||
- [Tracing response header](#tracing-response-header)
|
||||
- [Trace response header syntax](#trace-response-header-syntax)
|
||||
- [Inspect Flight response headers](#inspect-flight-response-headers)
|
||||
- [Retrieve query information](#retrieve-query-information)
|
||||
|
||||
<!-- /TOC -->
|
||||
|
||||
## EXPLAIN and ANALYZE
|
||||
|
||||
To view the query engine's execution plan and metrics for an SQL or InfluxQL query, prepend [`EXPLAIN`](/influxdb/cloud-dedicated/reference/sql/explain/) or [`EXPLAIN ANALYZE`](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze) to the query.
|
||||
The report can reveal query bottlenecks such as a large number of table scans or parquet files, and can help triage the question, "Is the query slow due to the amount of work required or due to a problem with the schema, compactor, etc.?"
|
||||
|
||||
The following example shows how to use the InfluxDB v3 Python client library and pandas to view `EXPLAIN` and `EXPLAIN ANALYZE` results for a query:
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
{{% code-placeholders "DATABASE_(NAME|TOKEN)" %}}
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
```python
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import pandas as pd
|
||||
import tabulate # Required for pandas.to_markdown()
|
||||
|
||||
# Instantiate an InfluxDB client.
|
||||
client = InfluxDBClient3(token = f"DATABASE_TOKEN",
|
||||
host = f"{{< influxdb/host >}}",
|
||||
database = f"DATABASE_NAME")
|
||||
|
||||
sql_explain = '''EXPLAIN
|
||||
SELECT temp
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '90 days'
|
||||
AND room = 'Kitchen'
|
||||
ORDER BY time'''
|
||||
|
||||
table = client.query(sql_explain)
|
||||
df = table.to_pandas()
|
||||
print(df.to_markdown(index=False))
|
||||
|
||||
assert df.shape == (2, 2), f'Expect {df.shape} to have 2 columns, 2 rows'
|
||||
assert 'physical_plan' in df.plan_type.values, "Expect physical_plan"
|
||||
assert 'logical_plan' in df.plan_type.values, "Expect logical_plan"
|
||||
```
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View EXPLAIN example results" %}}
|
||||
| plan_type | plan |
|
||||
|:--------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| logical_plan | Projection: home.temp |
|
||||
| | Sort: home.time ASC NULLS LAST |
|
||||
| | Projection: home.temp, home.time |
|
||||
| | TableScan: home projection=[room, temp, time], full_filters=[home.time >= TimestampNanosecond(1688676582918581320, None), home.room = Dictionary(Int32, Utf8("Kitchen"))] |
|
||||
| physical_plan | ProjectionExec: expr=[temp@0 as temp] |
|
||||
| | SortExec: expr=[time@1 ASC NULLS LAST] |
|
||||
| | EmptyExec: produce_one_row=false |
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
```python
|
||||
sql_explain_analyze = '''EXPLAIN ANALYZE
|
||||
SELECT *
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '90 days'
|
||||
ORDER BY time'''
|
||||
|
||||
table = client.query(sql_explain_analyze)
|
||||
df = table.to_pandas()
|
||||
print(df.to_markdown(index=False))
|
||||
|
||||
assert df.shape == (1,2)
|
||||
assert 'Plan with Metrics' in df.plan_type.values, "Expect plan metrics"
|
||||
|
||||
client.close()
|
||||
```
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
|
||||
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/cloud-dedicated/admin/tokens/) with sufficient permissions to the specified database
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View EXPLAIN ANALYZE example results" %}}
|
||||
| plan_type | plan |
|
||||
|:------------------|:-----------------------------------------------------------------------------------------------------------------------|
|
||||
| Plan with Metrics | ProjectionExec: expr=[temp@0 as temp], metrics=[output_rows=0, elapsed_compute=1ns] |
|
||||
| | SortExec: expr=[time@1 ASC NULLS LAST], metrics=[output_rows=0, elapsed_compute=1ns, spill_count=0, spilled_bytes=0] |
|
||||
| | EmptyExec: produce_one_row=false, metrics=[]
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## Enable trace logging
|
||||
|
||||
When you enable trace logging for a query, InfluxDB propagates your _trace ID_ through system processes and collects additional log information.
|
||||
|
||||
InfluxDB Support can then use the trace ID that you provide to filter, collate, and analyze log information for the query run.
|
||||
The tracing system follows the [OpenTelemetry traces](https://opentelemetry.io/docs/concepts/signals/traces/) model for providing observability into a request.
|
||||
|
||||
{{% warn %}}
|
||||
#### Avoid unnecessary tracing
|
||||
|
||||
Only enable tracing for a query when you need to request troubleshooting help from InfluxDB Support.
|
||||
To manage resources, InfluxDB has an upper limit for the number of trace requests.
|
||||
Too many traces can cause InfluxDB to evict log information.
|
||||
{{% /warn %}}
|
||||
|
||||
To enable tracing for a query, include the `influx-trace-id` header in your query request.
|
||||
|
||||
### Syntax
|
||||
|
||||
Use the following syntax for the `influx-trace-id` header:
|
||||
|
||||
```http
|
||||
influx-trace-id: TRACE_ID:1112223334445:0:1
|
||||
```
|
||||
|
||||
In the header value, replace the following:
|
||||
|
||||
- `TRACE_ID`: a unique string, 8-16 bytes long, encoded as hexadecimal (32 maximum hex characters).
|
||||
The trace ID should uniquely identify the query run.
|
||||
- `:1112223334445:0:1`: InfluxDB constant values (required, but ignored)
|
||||
|
||||
### Example
|
||||
|
||||
The following examples show how to create and pass a trace ID to enable query tracing in InfluxDB:
|
||||
|
||||
{{< tabs-wrapper >}}
|
||||
{{% tabs %}}
|
||||
[Python with FlightCallOptions](#)
|
||||
[Python with FlightClientMiddleware](#python-with-flightclientmiddleware)
|
||||
{{% /tabs %}}
|
||||
{{% tab-content %}}
|
||||
<!---- BEGIN PYTHON WITH FLIGHTCALLOPTIONS ---->
|
||||
Use the `InfluxDBClient3` InfluxDB Python client and pass the `headers` argument in the
|
||||
`query()` method.
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
```python
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import secrets
|
||||
|
||||
def use_flightcalloptions_trace_header():
|
||||
print('# Use FlightCallOptions to enable tracing.')
|
||||
client = InfluxDBClient3(token=f"DATABASE_TOKEN",
|
||||
host=f"{{< influxdb/host >}}",
|
||||
database=f"DATABASE_NAME")
|
||||
|
||||
# Generate a trace ID for the query:
|
||||
# 1. Generate a random 8-byte value as bytes.
|
||||
# 2. Encode the value as hexadecimal.
|
||||
random_bytes = secrets.token_bytes(8)
|
||||
trace_id = random_bytes.hex()
|
||||
|
||||
# Append required constants to the trace ID.
|
||||
trace_value = f"{trace_id}:1112223334445:0:1"
|
||||
|
||||
# Encode the header key and value as bytes.
|
||||
# Create a list of header tuples.
|
||||
headers = [((b"influx-trace-id", trace_value.encode('utf-8')))]
|
||||
|
||||
sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'"
|
||||
influxql = "SELECT * FROM home WHERE time >= -90d"
|
||||
|
||||
# Use the query() headers argument to pass the list as FlightCallOptions.
|
||||
client.query(sql, headers=headers)
|
||||
|
||||
client.close()
|
||||
|
||||
use_flightcalloptions_trace_header()
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
<!---- END PYTHON WITH FLIGHTCALLOPTIONS ---->
|
||||
{{% /tab-content %}}
|
||||
{{% tab-content %}}
|
||||
<!---- BEGIN PYTHON WITH MIDDLEWARE ---->
|
||||
Use the `InfluxDBClient3` InfluxDB Python client and `flight.ClientMiddleware` to pass and inspect headers.
|
||||
|
||||
### Tracing response header
|
||||
|
||||
With tracing enabled and a valid trace ID in the request, InfluxDB's `DoGet` action response contains a header with the trace ID that you sent.
|
||||
|
||||
#### Trace response header syntax
|
||||
|
||||
```http
|
||||
trace-id: TRACE_ID
|
||||
```
|
||||
|
||||
### Inspect Flight response headers
|
||||
|
||||
To inspect Flight response headers when using a client library, pass a `FlightClientMiddleware` instance.
|
||||
that defines a middleware callback function for the `onHeadersReceived` event (the particular function name you use depends on the client library language).
|
||||
|
||||
The following example uses Python client middleware that adds request headers and extracts the trace ID from the `DoGet` response headers:
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
```python
|
||||
import pyarrow.flight as flight
|
||||
|
||||
class TracingClientMiddleWareFactory(flight.ClientMiddleware):
|
||||
# Defines a custom middleware factory that returns a middleware instance.
|
||||
def __init__(self):
|
||||
self.request_headers = []
|
||||
self.response_headers = []
|
||||
self.traces = []
|
||||
|
||||
def addRequestHeader(self, header):
|
||||
self.request_headers.append(header)
|
||||
|
||||
def addResponseHeader(self, header):
|
||||
self.response_headers.append(header)
|
||||
|
||||
def addTrace(self, traceid):
|
||||
self.traces.append(traceid)
|
||||
|
||||
def createTrace(self, traceid):
|
||||
# Append InfluxDB constants to the trace ID.
|
||||
trace = f"{traceid}:1112223334445:0:1"
|
||||
|
||||
# To the list of request headers,
|
||||
# add a tuple with the header key and value as bytes.
|
||||
self.addRequestHeader((b"influx-trace-id", trace.encode('utf-8')))
|
||||
|
||||
def start_call(self, info):
|
||||
return TracingClientMiddleware(info.method, self)
|
||||
|
||||
class TracingClientMiddleware(flight.ClientMiddleware):
|
||||
# Defines middleware with client event callback methods.
|
||||
def __init__(self, method, callback_obj):
|
||||
self._method = method
|
||||
self.callback = callback_obj
|
||||
|
||||
def call_completed(self, exception):
|
||||
print('callback: call_completed')
|
||||
if(exception):
|
||||
print(f" ...with exception: {exception}")
|
||||
|
||||
def sending_headers(self):
|
||||
print('callback: sending_headers: ', self.callback.request_headers)
|
||||
if len(self.callback.request_headers) > 0:
|
||||
return dict(self.callback.request_headers)
|
||||
|
||||
def received_headers(self, headers):
|
||||
self.callback.addResponseHeader(headers)
|
||||
# For the DO_GET action, extract the trace ID from the response headers.
|
||||
if str(self._method) == "FlightMethod.DO_GET" and "trace-id" in headers:
|
||||
trace_id = headers["trace-id"][0]
|
||||
self.callback.addTrace(trace_id)
|
||||
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import secrets
|
||||
|
||||
def use_middleware_trace_header():
|
||||
print('# Use Flight client middleware to enable tracing.')
|
||||
|
||||
# Instantiate the middleware.
|
||||
res = TracingClientMiddleWareFactory()
|
||||
|
||||
# Instantiate the client, passing in the middleware instance that provides
|
||||
# event callbacks for the request.
|
||||
client = InfluxDBClient3(token=f"DATABASE_TOKEN",
|
||||
host=f"{{< influxdb/host >}}",
|
||||
database=f"DATABASE_NAME",
|
||||
flight_client_options={"middleware": (res,)})
|
||||
|
||||
# Generate a trace ID for the query:
|
||||
# 1. Generate a random 8-byte value as bytes.
|
||||
# 2. Encode the value as hexadecimal.
|
||||
random_bytes = secrets.token_bytes(8)
|
||||
trace_id = random_bytes.hex()
|
||||
|
||||
res.createTrace(trace_id)
|
||||
|
||||
sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'"
|
||||
|
||||
client.query(sql)
|
||||
client.close()
|
||||
assert trace_id in res.traces[0], "Expect trace ID in DoGet response."
|
||||
|
||||
use_middleware_trace_header()
|
||||
```
|
||||
{{% /code-placeholders %}}
|
||||
<!---- END PYTHON WITH MIDDLEWARE ---->
|
||||
{{% /tab-content %}}
|
||||
{{< /tabs-wrapper >}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
|
||||
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/cloud-dedicated/admin/tokens/) with sufficient permissions to the specified database
|
||||
|
||||
{{% note %}}
|
||||
Store or log your query trace ID to ensure you can provide it to InfluxDB Support for troubleshooting.
|
||||
{{% /note %}}
|
||||
|
||||
After you run your query with tracing enabled, do the following:
|
||||
|
||||
- Remove the tracing header from subsequent runs of the query (to [avoid unnecessary tracing](#avoid-unnecessary-tracing)).
|
||||
- Provide the trace ID in a request to InfluxDB Support.
|
||||
|
||||
## Retrieve query information
|
||||
|
||||
In addition to the SQL standard `information_schema`, {{% product-name %}} contains _system_ tables that provide access to
|
||||
InfluxDB-specific information.
|
||||
The information in each system table is scoped to the namespace you're querying;
|
||||
you can only retrieve system information for that particular instance.
|
||||
|
||||
To get information about queries you've run on the current instance, use SQL to query the [`system.queries` table](/influxdb/cloud-dedicated/reference/internals/system-tables/#systemqueries-measurement), which contains information from the querier instance currently handling queries.
|
||||
If you [enabled trace logging for the query](#enable-trace-logging-for-a-query), the `trace-id` appears in the `system.queries.trace_id` column for the query.
|
||||
|
||||
The `system.queries` table is an InfluxDB v3 **debug feature**.
|
||||
To enable the feature and query `system.queries`, include an `"iox-debug"` header set to `"true"` and use SQL to query the table.
|
||||
|
||||
The following sample code shows how to use the Python client library to do the following:
|
||||
|
||||
1. Enable tracing for a query.
|
||||
2. Retrieve the trace ID record from `system.queries`.
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
```python
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import secrets
|
||||
import pandas
|
||||
|
||||
def get_query_information():
|
||||
print('# Get query information')
|
||||
|
||||
client = InfluxDBClient3(token = f"DATABASE_TOKEN",
|
||||
host = f"{{< influxdb/host >}}",
|
||||
database = f"DATABASE_NAME")
|
||||
|
||||
random_bytes = secrets.token_bytes(16)
|
||||
trace_id = random_bytes.hex()
|
||||
trace_value = (f"{trace_id}:1112223334445:0:1").encode('utf-8')
|
||||
sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'"
|
||||
|
||||
try:
|
||||
client.query(sql, headers=[(b'influx-trace-id', trace_value)])
|
||||
client.close()
|
||||
except Exception as e:
|
||||
print("Query error: ", e)
|
||||
|
||||
client = InfluxDBClient3(token = f"DATABASE_TOKEN",
|
||||
host = f"{{< influxdb/host >}}",
|
||||
database = f"DATABASE_NAME")
|
||||
|
||||
import time
|
||||
df = pandas.DataFrame()
|
||||
|
||||
for i in range(0, 5):
|
||||
time.sleep(1)
|
||||
# Use SQL
|
||||
# To query the system.queries table for your trace ID, pass the following:
|
||||
# - the iox-debug: true request header
|
||||
# - an SQL query for the trace_id column
|
||||
reader = client.query(f'''SELECT compute_duration, query_type, query_text,
|
||||
success, trace_id
|
||||
FROM system.queries
|
||||
WHERE issue_time >= now() - INTERVAL '1 day'
|
||||
AND trace_id = '{trace_id}'
|
||||
ORDER BY issue_time DESC
|
||||
''',
|
||||
headers=[(b"iox-debug", b"true")],
|
||||
mode="reader")
|
||||
|
||||
df = reader.read_all().to_pandas()
|
||||
if df.shape[0]:
|
||||
break
|
||||
|
||||
assert df.shape == (1, 5), f"Expect a row for the query trace ID."
|
||||
print(df)
|
||||
|
||||
get_query_information()
|
||||
```
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
The output is similar to the following:
|
||||
|
||||
```text
|
||||
compute_duration query_type query_text success trace_id
|
||||
0 days sql SELECT compute_duration, quer... True 67338...
|
||||
```
|
|
@ -0,0 +1,23 @@
|
|||
---
|
||||
title: Troubleshoot and optimize queries
|
||||
description: >
|
||||
Troubleshoot errors and optimize performance for SQL and InfluxQL queries in InfluxDB.
|
||||
Use observability tools to view query execution and metrics.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
name: Troubleshoot and optimize queries
|
||||
parent: Query data
|
||||
influxdb/cloud-dedicated/tags: [query, performance, observability, errors, sql, influxql]
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/query-data/sql/
|
||||
- /influxdb/cloud-dedicated/query-data/influxql/
|
||||
aliases:
|
||||
- /influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot/
|
||||
|
||||
---
|
||||
|
||||
Troubleshoot errors and optimize performance for SQL and InfluxQL queries in {{% product-name %}}.
|
||||
Use observability tools to view query execution and metrics.
|
||||
|
||||
{{< children >}}
|
|
@ -0,0 +1,771 @@
|
|||
---
|
||||
title: Analyze a query plan
|
||||
description: >
|
||||
Learn how to read and analyze a query plan to
|
||||
understand how a query is executed and find performance bottlenecks.
|
||||
weight: 401
|
||||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
name: Analyze a query plan
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/cloud-dedicated/tags: [query, sql, influxql, observability, query plan]
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/query-data/sql/
|
||||
- /influxdb/cloud-dedicated/query-data/influxql/
|
||||
- /influxdb/cloud-dedicated/reference/internals/query-plans/
|
||||
- /influxdb/cloud-dedicated/reference/internals/storage-engine
|
||||
---
|
||||
|
||||
Learn how to read and analyze a [query plan](/influxdb/cloud-dedicated/reference/glossary/#query-plan) to
|
||||
understand query execution steps and data organization, and find performance bottlenecks.
|
||||
|
||||
When you query InfluxDB v3, the Querier devises a query plan for executing the query.
|
||||
The engine tries to determine the optimal plan for the query structure and data.
|
||||
By learning how to generate and interpret reports for the query plan,
|
||||
you can better understand how the query is executed and identify bottlenecks that affect the performance of your query.
|
||||
|
||||
For example, if the query plan reveals that your query reads a large number of Parquet files,
|
||||
you can then take steps to [optimize your query](/influxdb/cloud-dedicated/query-data/optimize-queries/), such as adding filters to read less data or
|
||||
configuring your cluster to store fewer and larger files.
|
||||
|
||||
- [Use EXPLAIN keywords to view a query plan](#use-explain-keywords-to-view-a-query-plan)
|
||||
- [Read an EXPLAIN report](#read-an-explain-report)
|
||||
- [Read a query plan](#read-a-query-plan)
|
||||
- [Example physical plan for a SELECT - ORDER BY query](#example-physical-plan-for-a-select---order-by-query)
|
||||
- [Example `EXPLAIN` report for an empty result set](#example-explain-report-for-an-empty-result-set)
|
||||
- [Analyze a query plan for leading edge data](#analyze-a-query-plan-for-leading-edge-data)
|
||||
- [Sample data](#sample-data)
|
||||
- [Sample query](#sample-query)
|
||||
- [EXPLAIN report for the leading edge data query](#explain-report-for-the-leading-edge-data-query)
|
||||
- [Locate the physical plan](#locate-the-physical-plan)
|
||||
- [Read the physical plan](#read-the-physical-plan)
|
||||
- [Data scanning nodes (ParquetExec and RecordBatchesExec)](#data-scanning-nodes-parquetexec-and-recordbatchesexec)
|
||||
- [Analyze branch structures](#analyze-branch-structures)
|
||||
|
||||
## Use EXPLAIN keywords to view a query plan
|
||||
|
||||
Use the `EXPLAIN` keyword (and the optional [`ANALYZE`](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze) and [`VERBOSE`](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze-verbose) keywords) to view the query plans for a query.
|
||||
|
||||
{{% expand-wrapper %}}
|
||||
{{% expand "Use Python and pandas to view an EXPLAIN report" %}}
|
||||
|
||||
The following example shows how to use the InfluxDB v3 Python client library and pandas to view the `EXPLAIN` report for a query:
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
{{% code-placeholders "DATABASE_(NAME|TOKEN)" %}}
|
||||
|
||||
```python
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import pandas as pd
|
||||
import tabulate # Required for pandas.to_markdown()
|
||||
|
||||
# Instantiate an InfluxDB client.
|
||||
client = InfluxDBClient3(token = f"TOKEN",
|
||||
host = f"{{< influxdb/host >}}",
|
||||
database = f"DATABASE_NAME")
|
||||
|
||||
sql_explain = '''EXPLAIN
|
||||
SELECT temp
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '7 days'
|
||||
AND room = 'Kitchen'
|
||||
ORDER BY time'''
|
||||
|
||||
table = client.query(sql_explain)
|
||||
df = table.to_pandas()
|
||||
print(df.to_markdown(index=False))
|
||||
|
||||
assert df.shape == (2, 2), f'Expect {df.shape} to have 2 columns, 2 rows'
|
||||
assert 'physical_plan' in df.plan_type.values, "Expect physical_plan"
|
||||
assert 'logical_plan' in df.plan_type.values, "Expect logical_plan"
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
|
||||
- {{% code-placeholder-key %}}`TOKEN`{{% /code-placeholder-key %}}: a [token](/influxdb/cloud-dedicated/admin/tokens/) with sufficient permissions to the specified database
|
||||
|
||||
{{% /expand %}}
|
||||
{{% /expand-wrapper %}}
|
||||
|
||||
## Read an EXPLAIN report
|
||||
|
||||
When you [use `EXPLAIN` keywords to view a query plan](#use-explain-keywords-to-view-a-query-plan), the report contains the following:
|
||||
|
||||
- two columns: `plan_type` and `plan`
|
||||
- one row for the [logical plan](/influxdb/cloud-dedicated/reference/internals/query-plans/#logical-plan) (`logical_plan`)
|
||||
- one row for the [physical plan](/influxdb/cloud-dedicated/reference/internals/query-plans/#physical-plan) (`physical_plan`)
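
If you load the report into a pandas DataFrame (as in the Python example above), you can pull each plan out of the report for closer inspection. The following is a minimal sketch that assumes the `df` DataFrame returned by the earlier `EXPLAIN` example:

```python
# Minimal sketch: split the EXPLAIN report into its logical and physical plans.
# Assumes `df` is the pandas DataFrame from the earlier EXPLAIN example,
# with a `plan_type` column and a `plan` column.
logical_plan = df.loc[df.plan_type == 'logical_plan', 'plan'].iloc[0]
physical_plan = df.loc[df.plan_type == 'physical_plan', 'plan'].iloc[0]

print("Logical plan:\n", logical_plan)
print("Physical plan:\n", physical_plan)
```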
|
||||
|
||||
## Read a query plan
|
||||
|
||||
Plans are in _tree format_--each plan is an upside-down tree in which
|
||||
execution and data flow from _leaf nodes_, the innermost steps in the plan, to outer _branch nodes_.
|
||||
Whether reading a logical or physical plan, keep the following in mind:
|
||||
|
||||
- Start at the _leaf nodes_ and read upward.
|
||||
- At the top of the plan, the _root node_ represents the final, encompassing step.
|
||||
|
||||
In a [physical plan](/influxdb/cloud-dedicated/reference/internals/query-plan/#physical-plan), each step is an [`ExecutionPlan` node](/influxdb/cloud-dedicated/reference/internals/query-plan/#execution-plan-nodes) that receives expressions for input data and output requirements, and computes a partition of data.
|
||||
|
||||
Use the following steps to analyze a query plan and estimate how much work is required to complete the query.
|
||||
The same steps apply regardless of how large or complex the plan might seem.
|
||||
|
||||
1. Start from the furthest indented steps (the _leaf nodes_), and read upward.
|
||||
2. Understand the job of each [`ExecutionPlan` node](/influxdb/cloud-dedicated/reference/internals/query-plan/#executionplan-nodes)--for example, a [`UnionExec`](/influxdb/cloud-dedicated/reference/internals/query-plan/#unionexec) node encompassing the leaf nodes means that the `UnionExec` concatenates the output of all the leaves.
|
||||
3. For each expression, answer the following questions:
|
||||
- What is the shape and size of data input to the plan?
|
||||
- What is the shape and size of data output from the plan?
|
||||
|
||||
The remainder of this guide walks you through analyzing a physical plan.
|
||||
Understanding the sequence, role, input, and output of nodes in your query plan can help you estimate the overall workload and find potential bottlenecks in the query.
|
||||
|
||||
### Example physical plan for a SELECT - ORDER BY query
|
||||
|
||||
The following example shows how to read an `EXPLAIN` report and a physical query plan.
|
||||
|
||||
Given `h2o` measurement data and the following query:
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;
|
||||
```
|
||||
|
||||
The output is similar to the following:
|
||||
|
||||
#### EXPLAIN report
|
||||
|
||||
```sql
|
||||
| plan_type | plan |
|
||||
+---------------+--------------------------------------------------------------------------+
|
||||
| logical_plan | Sort: h2o.city ASC NULLS LAST, h2o.time DESC NULLS FIRST |
|
||||
| | TableScan: h2o projection=[city, min_temp, time] |
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | UnionExec |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
Output from `EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;`
|
||||
{{% /caption %}}
|
||||
|
||||
Each step, or _node_, in the physical plan is an `ExecutionPlan` name and the key-value _expressions_ that contain relevant parts of the query--for example, the first node to execute (a _leaf node_) in the [`EXPLAIN` report](#explain-report) physical plan is a `ParquetExec` execution plan:
|
||||
|
||||
```text
|
||||
ParquetExec: file_groups={...}, projection=[city, min_temp, time]
|
||||
```
|
||||
|
||||
Because `ParquetExec` and `RecordBatchesExec` nodes retrieve and scan data in InfluxDB queries, every query plan starts with one or more of these nodes.
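
Because the number of Parquet files a query reads is a common source of query expense, a quick check is to count the `ParquetExec` nodes and the file paths listed in their `file_groups` expressions. The following is a rough sketch, assuming `physical_plan` holds the physical plan text extracted from the `EXPLAIN` report and that each file path in `file_groups` ends in `.parquet`:

```python
# Rough sketch: gauge how much data-scanning work the physical plan contains.
# Assumes `physical_plan` is the physical plan text from the EXPLAIN report.
num_parquet_nodes = physical_plan.count('ParquetExec')
num_record_batch_nodes = physical_plan.count('RecordBatchesExec')
# Assumes each Parquet file path in file_groups ends with ".parquet".
num_parquet_files = physical_plan.count('.parquet')

print(f"ParquetExec nodes: {num_parquet_nodes}")
print(f"RecordBatchesExec nodes: {num_record_batch_nodes}")
print(f"Parquet file paths referenced: {num_parquet_files}")
```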
|
||||
|
||||
#### Physical plan data flow
|
||||
|
||||
Data flows _up_ in a query plan.
|
||||
|
||||
The following diagram shows the data flow and sequence of nodes in the [`EXPLAIN` report](#explain-report) physical plan:
|
||||
|
||||
<!-- BEGIN Query plan diagram -->
|
||||
{{< html-diagram/query-plan >}}
|
||||
<!-- END Query plan diagram -->
|
||||
|
||||
{{% caption %}}
|
||||
Execution and data flow in the [`EXPLAIN` report](#explain-report) physical plan.
|
||||
`ParquetExec` nodes execute in parallel and `UnionExec` combines their output.
|
||||
{{% /caption %}}
|
||||
|
||||
The following steps summarize the [physical plan execution and data flow](#physical-plan-data-flow):
|
||||
|
||||
1. Two `ParquetExec` plans, in parallel, read data from Parquet files:
|
||||
- Each `ParquetExec` node processes one or more _file groups_.
|
||||
- Each file group contains one or more Parquet file paths.
|
||||
- A `ParquetExec` node processes its groups in parallel, reading each group's files sequentially.
|
||||
- The output is a stream of data to the corresponding `SortExec` node.
|
||||
2. The `SortExec` nodes, in parallel, sort the data by `city` (ascending) and `time` (descending). Sorting is required by the `SortPreservingMergeExec` plan.
|
||||
3. The `UnionExec` node concatenates the streams to union the output of the parallel `SortExec` nodes.
|
||||
4. The `SortPreservingMergeExec` node merges the previously sorted and unioned data from `UnionExec`.
|
||||
|
||||
### Example `EXPLAIN` report for an empty result set
|
||||
|
||||
If your table doesn't contain data for the time range in your query, the physical plan starts with an `EmptyExec` leaf node--for example:
|
||||
|
||||
{{% code-callout "EmptyExec"%}}
|
||||
|
||||
```sql
|
||||
ProjectionExec: expr=[temp@0 as temp]
|
||||
SortExec: expr=[time@1 ASC NULLS LAST]
|
||||
EmptyExec: produce_one_row=false
|
||||
```
|
||||
|
||||
{{% /code-callout %}}
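
A quick way to check for this case programmatically is to look for `EmptyExec` in the physical plan text--for example, the following minimal sketch assumes `physical_plan` holds the physical plan string from the `EXPLAIN` report:

```python
# Minimal sketch: detect an empty result set before analyzing the plan further.
# Assumes `physical_plan` is the physical plan text from the EXPLAIN report.
if 'EmptyExec' in physical_plan:
    print("No data matches the queried time range; there is nothing to scan.")
```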
|
||||
|
||||
## Analyze a query plan for leading edge data
|
||||
|
||||
The following sections guide you through analyzing a physical query plan for a typical time series use case--aggregating recently written (_leading edge_) data.
|
||||
Although the query and plan are more complex than in the [preceding example](#example-physical-plan-for-a-select---order-by-query), you'll follow the same [steps to read the query plan](#read-a-query-plan).
|
||||
After learning how to read the query plan, you'll have an understanding of `ExecutionPlans`, data flow, and potential query bottlenecks.
|
||||
|
||||
### Sample data
|
||||
|
||||
Consider the following `h2o` data, represented as "chunks" of line protocol, written to InfluxDB:
|
||||
|
||||
```text
|
||||
// h2o data
|
||||
// The following data represents 5 batches, or "chunks", of line protocol
|
||||
// written to InfluxDB.
|
||||
// - Chunks 1-4 are ingested and each is persisted to a separate partition file in storage.
|
||||
// - Chunk 5 is ingested and not yet persisted to storage.
|
||||
// - Chunks 1 and 2 cover short windows of time that don't overlap times in other chunks.
|
||||
// - Chunks 3 and 4 cover larger windows of time and the time ranges overlap each other.
|
||||
// - Chunk 5 contains the largest time range and overlaps with chunk 4, the Parquet file with the largest time-range.
|
||||
// - In InfluxDB, a chunk never duplicates its own data.
|
||||
//
|
||||
// Chunk 1: stored Parquet file
|
||||
// - time range: 50-249
|
||||
// - no duplicates in its own chunk
|
||||
// - no overlap with any other chunks
|
||||
[
|
||||
"h2o,state=MA,city=Bedford min_temp=71.59 150",
|
||||
"h2o,state=MA,city=Boston min_temp=70.4, 50",
|
||||
"h2o,state=MA,city=Andover max_temp=69.2, 249",
|
||||
],
|
||||
|
||||
// Chunk 2: stored Parquet file
|
||||
// - time range: 250-349
|
||||
// - no duplicates in its own chunk
|
||||
// - no overlap with any other chunks
|
||||
// - adds a new field (area)
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=79.0,max_temp=87.2,area=500u 300",
|
||||
"h2o,state=CA,city=SJ min_temp=75.5,max_temp=84.08 349",
|
||||
"h2o,state=MA,city=Bedford max_temp=78.75,area=742u 300",
|
||||
"h2o,state=MA,city=Boston min_temp=65.4 250",
|
||||
],
|
||||
|
||||
// Chunk 3: stored Parquet file
|
||||
// - time range: 350-500
|
||||
// - no duplicates in its own chunk
|
||||
// - overlaps chunk 4
|
||||
[
|
||||
"h2o,state=CA,city=SJ min_temp=77.0,max_temp=90.7 450",
|
||||
"h2o,state=CA,city=SJ min_temp=69.5,max_temp=88.2 500",
|
||||
"h2o,state=MA,city=Boston min_temp=68.4 350",
|
||||
],
|
||||
|
||||
// Chunk 4: stored Parquet file
|
||||
// - time range: 400-600
|
||||
// - no duplicates in its own chunk
|
||||
// - overlaps chunk 3
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600",
|
||||
"h2o,state=CA,city=SJ min_temp=69.5,max_temp=89.2 600", // duplicates row 3 in chunk 5
|
||||
"h2o,state=MA,city=Bedford max_temp=80.75,area=742u 400", // overlaps chunk 3
|
||||
"h2o,state=MA,city=Boston min_temp=65.40,max_temp=82.67 400", // overlaps chunk 3
|
||||
],
|
||||
|
||||
// Chunk 5: Ingester data
|
||||
// - time range: 550-700
|
||||
// - overlaps and duplicates data in chunk 4
|
||||
[
|
||||
"h2o,state=MA,city=Bedford max_temp=88.75,area=742u 600", // overlaps chunk 4
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 650",
|
||||
"h2o,state=CA,city=SJ min_temp=68.5,max_temp=90.0 600", // duplicates row 2 in chunk 4
|
||||
"h2o,state=CA,city=SJ min_temp=75.5,max_temp=84.08 700",
|
||||
"h2o,state=MA,city=Boston min_temp=67.4 550", // overlaps chunk 4
|
||||
]
|
||||
```
|
||||
|
||||
The following query selects all the data:
|
||||
|
||||
```sql
|
||||
SELECT state, city, min_temp, max_temp, area, time
|
||||
FROM h2o
|
||||
ORDER BY state asc, city asc, time desc;
|
||||
```
|
||||
|
||||
The output is the following:
|
||||
|
||||
```sql
|
||||
+-------+---------+----------+----------+------+--------------------------------+
|
||||
| state | city | min_temp | max_temp | area | time |
|
||||
+-------+---------+----------+----------+------+--------------------------------+
|
||||
| CA | SF | 68.4 | 85.7 | 500 | 1970-01-01T00:00:00.000000650Z |
|
||||
| CA | SF | 68.4 | 85.7 | 500 | 1970-01-01T00:00:00.000000600Z |
|
||||
| CA | SF | 79.0 | 87.2 | 500 | 1970-01-01T00:00:00.000000300Z |
|
||||
| CA | SJ | 75.5 | 84.08 | | 1970-01-01T00:00:00.000000700Z |
|
||||
| CA | SJ | 68.5 | 90.0 | | 1970-01-01T00:00:00.000000600Z |
|
||||
| CA | SJ | 69.5 | 88.2 | | 1970-01-01T00:00:00.000000500Z |
|
||||
| CA | SJ | 77.0 | 90.7 | | 1970-01-01T00:00:00.000000450Z |
|
||||
| CA | SJ | 75.5 | 84.08 | | 1970-01-01T00:00:00.000000349Z |
|
||||
| MA | Andover | | 69.2 | | 1970-01-01T00:00:00.000000249Z |
|
||||
| MA | Bedford | | 88.75 | 742 | 1970-01-01T00:00:00.000000600Z |
|
||||
| MA | Bedford | | 80.75 | 742 | 1970-01-01T00:00:00.000000400Z |
|
||||
| MA | Bedford | | 78.75 | 742 | 1970-01-01T00:00:00.000000300Z |
|
||||
| MA | Bedford | 71.59 | | | 1970-01-01T00:00:00.000000150Z |
|
||||
| MA | Boston | 67.4 | | | 1970-01-01T00:00:00.000000550Z |
|
||||
| MA | Boston | 65.4 | 82.67 | | 1970-01-01T00:00:00.000000400Z |
|
||||
| MA | Boston | 68.4 | | | 1970-01-01T00:00:00.000000350Z |
|
||||
| MA | Boston | 65.4 | | | 1970-01-01T00:00:00.000000250Z |
|
||||
| MA | Boston | 70.4 | | | 1970-01-01T00:00:00.000000050Z |
|
||||
+-------+---------+----------+----------+------+--------------------------------+
|
||||
```
|
||||
|
||||
### Sample query
|
||||
|
||||
The following query selects leading edge data from the [sample data](#sample-data):
|
||||
|
||||
```sql
|
||||
SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
The output is the following:
|
||||
|
||||
```sql
|
||||
+---------+-----------------+
|
||||
| city | COUNT(Int64(1)) |
|
||||
+---------+-----------------+
|
||||
| Andover | 1 |
|
||||
| Bedford | 3 |
|
||||
| Boston | 4 |
|
||||
+---------+-----------------+
|
||||
```
|
||||
|
||||
### `EXPLAIN` report for the leading edge data query
|
||||
|
||||
The following query generates the `EXPLAIN` report for the preceding [sample query](#sample-query):
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "EXPLAIN report for a leading edge data query" %}}
|
||||
|
||||
```sql
|
||||
| plan_type | plan |
|
||||
| logical_plan | Sort: h2o.city ASC NULLS LAST |
|
||||
| | Aggregate: groupBy=[[h2o.city]], aggr=[[COUNT(Int64(1))]] |
|
||||
| | TableScan: h2o projection=[city], full_filters=[h2o.time >= TimestampNanosecond(200, None), h2o.time < TimestampNanosecond(700, None), h2o.state = Dictionary(Int32, Utf8("MA"))] |
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST] |
|
||||
| | AggregateExec: mode=FinalPartitioned, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | RepartitionExec: partitioning=Hash([city@0], 4), input_partitions=4 |
|
||||
| | AggregateExec: mode=Partial, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=3 |
|
||||
| | UnionExec |
|
||||
| | ProjectionExec: expr=[city@0 as city] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | FilterExec: time@2 >= 200 AND time@2 < 700 AND state@1 = MA |
|
||||
| | ParquetExec: file_groups={2 groups: [[1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/243db601-f3f1-401b-afda-82160d8cc1a8.Parquet], [1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/f5fb7c7d-16ac-49ba-a811-69578d05843f.Parquet]]}, projection=[city, state, time], output_ordering=[state@1 ASC, city@0 ASC, time@2 ASC], predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA, pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3 |
|
||||
| | ProjectionExec: expr=[city@1 as city] |
|
||||
| | DeduplicateExec: [state@2 ASC,city@1 ASC,time@3 ASC] |
|
||||
| | SortPreservingMergeExec: [state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC] |
|
||||
| | UnionExec |
|
||||
| | SortExec: expr=[state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA |
|
||||
| | RecordBatchesExec: chunks=1, projection=[__chunk_order, city, state, time] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA |
|
||||
| | ParquetExec: file_groups={2 groups: [[1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/2cbb3992-4607-494d-82e4-66c480123189.Parquet], [1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/9255eb7f-2b51-427b-9c9b-926199c85bdf.Parquet]]}, projection=[__chunk_order, city, state, time], output_ordering=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC], predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA, pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3 |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
`EXPLAIN` report for a typical leading edge data query
|
||||
{{% /caption %}}
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
The comments in the [sample data](#sample-data) tell you which data chunks _overlap_ or duplicate data in other chunks.
|
||||
Two chunks of data overlap if there are portions of time for which data exists in both chunks.
|
||||
_You'll learn how to [recognize overlapping and duplicate data](#recognize-overlapping-and-duplicate-data) in a query plan later in this guide._
|
||||
|
||||
Unlike the sample data, your data likely doesn't tell you where overlaps or duplicates exist.
|
||||
A physical plan can reveal overlaps and duplicates in your data and how they affect your queries--for example, after learning how to read a physical plan, you might summarize the data scanning steps as follows:
|
||||
|
||||
- Query execution starts with two `ParquetExec` and one `RecordBatchesExec` execution plans that run in parallel.
|
||||
- The first `ParquetExec` node reads two files that don't overlap any other files and don't duplicate data; the files don't require deduplication.
|
||||
- The second `ParquetExec` node reads two files that overlap each other and overlap the ingested data scanned in the `RecordBatchesExec` node; the query plan must include the deduplication process for these nodes before completing the query.
|
||||
|
||||
The remaining sections analyze `ExecutionPlan` node structure and arguments in the example physical plan.
|
||||
The example includes DataFusion and InfluxDB-specific [`ExecutionPlan` nodes](/influxdb/cloud-dedicated/reference/internals/query-plans/#executionplan-nodes).
|
||||
|
||||
### Locate the physical plan
|
||||
|
||||
To begin analyzing the physical plan for the query, find the row in the [`EXPLAIN` report](#explain-report-for-the-leading-edge-data-query) where the `plan_type` column has the value `physical_plan`.
|
||||
The `plan` column for the row contains the physical plan.
|
||||
|
||||
### Read the physical plan
|
||||
|
||||
The following sections follow the steps to [read a query plan](#read-a-query-plan) and examine the physical plan nodes and their input and output.
|
||||
|
||||
{{% note %}}
|
||||
To [read the execution flow of a query plan](#read-a-query-plan), always start from the innermost (leaf) nodes and read up toward the top outermost root node.
|
||||
{{% /note %}}
|
||||
|
||||
#### Physical plan leaf nodes
|
||||
|
||||
<img src="/img/influxdb/3-0-query-plan-tree.png" alt="Query physical plan leaf node structures" />
|
||||
|
||||
{{% caption %}}
|
||||
Leaf node structures in the physical plan
|
||||
{{% /caption %}}
|
||||
|
||||
### Data scanning nodes (ParquetExec and RecordBatchesExec)
|
||||
|
||||
The [example physical plan](#physical-plan-leaf-nodes) contains three [leaf nodes](#physical-plan-leaf-nodes)--the innermost nodes where the execution flow begins:
|
||||
|
||||
- [`ParquetExec`](/influxdb/cloud-dedicated/reference/internals/query-plan/#parquetexec) nodes retrieve and scan data from Parquet files in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store)
|
||||
- A [`RecordBatchesExec`](/influxdb/cloud-dedicated/reference/internals/query-plan/#recordbatchesexec) node retrieves recently written, yet-to-be-persisted data from the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester)
|
||||
|
||||
Because `ParquetExec` and `RecordBatchesExec` retrieve and scan data for a query, every query plan starts with one or more of these nodes.
|
||||
|
||||
The number of `ParquetExec` and `RecordBatchesExec` nodes and their parameter values can tell you which data (and how much) is retrieved for your query, and how efficiently the plan handles the organization (for example, partitioning and deduplication) of your data.
|
||||
|
||||
For convenience, this guide uses the names _ParquetExec_A_ and _ParquetExec_B_ for the `ParquetExec` nodes in the [example physical plan](#physical-plan-leaf-nodes).
|
||||
Reading from the top of the physical plan, **ParquetExec_A** is the first leaf node in the physical plan and **ParquetExec_B** is the last (bottom) leaf node.
|
||||
|
||||
_The names indicate the nodes' locations in the report, not their order of execution._
|
||||
|
||||
- [ParquetExec_A](#parquetexec_a)
|
||||
- [RecordBatchesExec](#recordbatchesexec)
|
||||
- [ParquetExec_B](#parquetexec_b)
|
||||
|
||||
#### ParquetExec_A
|
||||
|
||||
```sql
|
||||
ParquetExec: file_groups={2 groups: [[1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/243db601-f3f1-401b-afda-82160d8cc1a8.Parquet], [1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/f5fb7c7d-16ac-49ba-a811-69578d05843f.Parquet]]}, projection=[city, state, time], output_ordering=[state@1 ASC, city@0 ASC, time@2 ASC], predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA, pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3 |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
ParquetExec_A, the first ParquetExec node
|
||||
{{% /caption %}}
|
||||
|
||||
ParquetExec_A has the following traits:
|
||||
|
||||
##### `file_groups`
|
||||
|
||||
A _file group_ is a list of files for the operator to read.
|
||||
Files are referenced by path:
|
||||
|
||||
- `1/1/b862a7e9b.../243db601-....parquet`
|
||||
- `1/1/b862a7e9b.../f5fb7c7d-....parquet`
|
||||
|
||||
The path structure represents how your data is organized.
|
||||
You can use the file paths to gather more information about the query--for example:
|
||||
|
||||
- to find file information (for example: size and number of rows) in the catalog
|
||||
- to download the Parquet file from the Object store for debugging
|
||||
- to find how many partitions the query reads
|
||||
|
||||
A path has the following structure:
|
||||
|
||||
```text
|
||||
<namespace_id>/<table_id>/<partition_hash_id>/<uuid_of_the_file>.parquet
|
||||
1 / 1 /b862a7e9b329ee6a4.../243db601-f3f1-4....parquet
|
||||
```
|
||||
|
||||
- `namespace_id`: the namespace (database) being queried
|
||||
- `table_id`: the table (measurement) being queried
|
||||
- `partition_hash_id`: the partition this file belongs to.
|
||||
You can count partition IDs to find how many partitions the query reads.
|
||||
- `uuid_of_the_file`: the file identifier.
|
||||
|
||||
`ParquetExec` processes groups in parallel and reads the files in each group sequentially.
|
||||
|
||||
```text
|
||||
file_groups={2 groups: [[1/1/b862a7e9b329ee6a4/243db601....parquet], [1/1/b862a7e9b329ee6a4/f5fb7c7d....parquet]]}
|
||||
```
|
||||
|
||||
- `{2 groups: [[file], [file]]}`: ParquetExec_A receives two groups with one file per group.
|
||||
Therefore, ParquetExec_A reads two files in parallel.
|
||||
|
||||
##### `projection`
|
||||
|
||||
`projection` lists the table columns for the `ExecutionPlan` to read and output.
|
||||
|
||||
```text
|
||||
projection=[city, state, time]
|
||||
```
|
||||
|
||||
- `[city, state, time]`: the [sample data](#sample-data) contains many columns, but the [sample query](#sample-query) requires the Querier to read only three
|
||||
|
||||
##### `output_ordering`
|
||||
|
||||
`output_ordering` specifies the sort order for the `ExecutionPlan` output.
|
||||
The Query planner passes the parameter if the output should be ordered and if the planner knows the order.
|
||||
|
||||
```text
|
||||
output_ordering=[state@1 ASC, city@0 ASC, time@2 ASC]
|
||||
```
|
||||
|
||||
When storing data to Parquet files, InfluxDB sorts the data to improve storage compression and query efficiency and the planner tries to preserve that order for as long as possible.
|
||||
Generally, the `output_ordering` value that `ParquetExec` receives is the ordering (or a subset of the ordering) of stored data.
|
||||
|
||||
_By design, [`RecordBatchesExec`](#recordbatchesexec) data isn't sorted._
|
||||
|
||||
In the example, the planner specifies that ParquetExec_A use the existing sort order `state ASC, city ASC, time ASC` for output.
|
||||
|
||||
{{% note %}}
|
||||
To view the sort order of your stored data, generate an `EXPLAIN` report for a query that selects all columns (`SELECT *`)--for example:
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT * FROM TABLE_NAME WHERE time > now() - interval '1 hour'
|
||||
```
|
||||
|
||||
Reduce the time range if the query returns too much data.
|
||||
{{% /note %}}
|
||||
|
||||
##### `predicate`
|
||||
|
||||
`predicate` is the data filter specified in the query.
|
||||
|
||||
```text
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA
|
||||
```
|
||||
|
||||
##### `pruning_predicate`
|
||||
|
||||
`pruning_predicate` is created from the [`predicate`](#predicate) value and is the predicate actually used for pruning data and files from the chosen partitions.
|
||||
By default, the pruning predicate filters files by `time`.
|
||||
|
||||
```text
|
||||
pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3
|
||||
```
|
||||
|
||||
_Before the physical plan is generated, an additional `partition pruning` step uses predicates on partitioning columns to prune partitions._
|
||||
|
||||
#### `RecordBatchesExec`
|
||||
|
||||
```sql
|
||||
RecordBatchesExec: chunks=1, projection=[__chunk_order, city, state, time]
|
||||
```
|
||||
|
||||
{{% caption %}}RecordBatchesExec{{% /caption %}}
|
||||
|
||||
[`RecordBatchesExec`](/influxdb/cloud-dedicated/reference/internals/query-plans/#recordbatchesexec) is an InfluxDB-specific `ExecutionPlan` implementation that retrieves recently written, yet-to-be-persisted data from the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester).
|
||||
|
||||
In the example, `RecordBatchesExec` contains the following expressions:
|
||||
|
||||
##### `chunks`
|
||||
|
||||
`chunks` is the number of data chunks received from the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester).
|
||||
|
||||
```text
|
||||
chunks=1
|
||||
```
|
||||
|
||||
- `chunks=1`: `RecordBatchesExec` receives one data chunk.
|
||||
|
||||
##### `projection`
|
||||
|
||||
The `projection` list specifies the columns or expressions for the node to read and output.
|
||||
|
||||
```text
|
||||
[__chunk_order, city, state, time]
|
||||
```
|
||||
|
||||
- `__chunk_order`: orders chunks and files for deduplication
|
||||
- `city, state, time`: the same columns specified in [`ParquetExec_A projection`](#projection-1)
|
||||
|
||||
{{% note %}}
|
||||
The presence of `__chunk_order` in data scanning nodes indicates that data overlaps, and is possibly duplicated, among the nodes.
|
||||
{{% /note %}}
|
||||
|
||||
#### ParquetExec_B
|
||||
|
||||
The bottom leaf node in the [example physical plan](#physical-plan-leaf-nodes) is another `ParquetExec` operator, _ParquetExec_B_.
|
||||
|
||||
##### ParquetExec_B expressions
|
||||
|
||||
```sql
|
||||
ParquetExec:
|
||||
file_groups={2 groups: [[1/1/b862a7e9b.../2cbb3992-....Parquet],
|
||||
[1/1/b862a7e9b.../9255eb7f-....Parquet]]},
|
||||
projection=[__chunk_order, city, state, time],
|
||||
output_ordering=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC],
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA,
|
||||
pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3
|
||||
```
|
||||
|
||||
{{% caption %}}ParquetExec_B, the second ParquetExec{{% /caption %}}
|
||||
|
||||
Because ParquetExec_B has overlaps, its `projection` and `output_ordering` expressions include the `__chunk_order` column, the same column used in the [`RecordBatchesExec` `projection`](#projection-1).
|
||||
|
||||
|
||||
|
||||
The remaining ParquetExec_B expressions are similar to those in [ParquetExec_A](#parquetexec_a).
|
||||
|
||||
##### How a query plan distributes data for scanning
|
||||
|
||||
If you compare [`file_group`](#file_groups) paths in [ParquetExec_A](#parquetexec_a) to those in [ParquetExec_B](#parquetexec_b), you'll notice that both contain files from the same partition:
|
||||
|
||||
{{% code-callout "b862a7e9b329ee6a4..." %}}
|
||||
|
||||
```text
|
||||
1/1/b862a7e9b329ee6a4.../...
|
||||
```
|
||||
|
||||
{{% /code-callout %}}
|
||||
|
||||
The planner may distribute files from the same partition to different scan nodes for several reasons, including optimizations for handling [overlapping data](/influxdb/cloud-dedicated/reference/internals/query-plan/#overlapping-data-and-deduplication)--for example:
|
||||
|
||||
- to separate non-overlapped files from overlapped files to minimize work required for deduplication (which is the case in this example)
|
||||
- to distribute non-overlapped files to increase parallel execution
|
||||
|
||||
### Analyze branch structures
|
||||
|
||||
After data is output from a data scanning node, it flows up to the next parent (outer) node.
|
||||
|
||||
In the example plan:
|
||||
|
||||
- Each leaf node is the first step in a branch of nodes planned for processing the scanned data.
|
||||
- The three branches execute in parallel.
|
||||
- After the leaf node, each branch contains the following similar node structure:
|
||||
|
||||
```sql
|
||||
...
|
||||
CoalesceBatchesExec: target_batch_size=8192
|
||||
FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA
|
||||
...
|
||||
```
|
||||
|
||||
- `FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA`: filters rows using the condition `time@3 >= 200 AND time@3 < 700 AND state@2 = MA` and guarantees that all output rows satisfy the condition--pruning in the scan node is an optimization and doesn't remove every non-matching row.
|
||||
- `CoalesceBatchesExec: target_batch_size=8192`: combines small batches into larger batches. See the DataFusion [`CoalesceBatchesExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/coalesce_batches/struct.CoalesceBatchesExec.html) documentation.
|
||||
|
||||
#### Sorting yet-to-be-persisted data
|
||||
|
||||
In the `RecordBatchesExec` branch, the node that follows `CoalesceBatchesExec` is a `SortExec` node:
|
||||
|
||||
```sql
|
||||
SortExec: expr=[state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC]
|
||||
```
|
||||
|
||||
The node uses the specified expression `state ASC, city ASC, time ASC, __chunk_order ASC` to sort the yet-to-be-persisted data.
|
||||
Neither ParquetExec_A nor ParquetExec_B contains a similar node because data in the Object store is already sorted (by the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester) or the [Compactor](/influxdb/cloud-dedicated/reference/internals/storage-engine/#compactor)) in the given order; the query plan only needs to sort data that arrives from the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester).
|
||||
|
||||
#### Recognize overlapping and duplicate data
|
||||
|
||||
In the example physical plan, the ParquetExec_B and `RecordBatchesExec` nodes share the following parent nodes:
|
||||
|
||||
```sql
|
||||
...
|
||||
DeduplicateExec: [state@2 ASC,city@1 ASC,time@3 ASC]
|
||||
SortPreservingMergeExec: [state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC]
|
||||
UnionExec
|
||||
...
|
||||
```
|
||||
|
||||
{{% caption %}}Overlapped data node structure{{% /caption %}}
|
||||
|
||||
1. `UnionExec`: unions multiple streams of input data by concatenating the partitions. `UnionExec` doesn't do any merging and is fast to execute.
|
||||
2. `SortPreservingMergeExec: [state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC]`: merges already sorted data; indicates that preceding data (from nodes below it) is already sorted. The output data is a single sorted stream.
|
||||
3. `DeduplicateExec: [state@2 ASC,city@1 ASC,time@3 ASC]`: deduplicates an input stream of sorted data.
|
||||
Because `SortPreservingMergeExec` ensures a single sorted stream, it often, but not always, precedes `DeduplicateExec`.
|
||||
|
||||
A `DeduplicateExec` node indicates that encompassed nodes have [_overlapped data_](/influxdb/cloud-dedicated/reference/internals/query-plan/#overlapping-data-and-deduplication)--data in a file or batch has timestamps in the same range as data in another file or batch.
|
||||
Due to how InfluxDB organizes data, data is never duplicated _within_ a file.
|
||||
|
||||
In the example, the `DeduplicateExec` node encompasses ParquetExec_B and the `RecordBatchesExec` node, which indicates that ParquetExec_B [file group](#file_groups) files overlap the yet-to-be-persisted data.
|
||||
|
||||
The following [sample data](#sample-data) excerpt shows overlapping data between a file and [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester) data:
|
||||
|
||||
```text
|
||||
// Chunk 4: stored Parquet file
|
||||
// - time range: 400-600
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600",
|
||||
],
|
||||
|
||||
// Chunk 5: Ingester data
|
||||
// - time range: 550-700
|
||||
// - overlaps and duplicates data in chunk 4
|
||||
[
|
||||
"h2o,state=MA,city=Bedford max_temp=88.75,area=742u 600", // overlaps chunk 4
|
||||
...
|
||||
"h2o,state=MA,city=Boston min_temp=67.4 550", // overlaps chunk 4
|
||||
]
|
||||
```
|
||||
|
||||
If files or ingested data overlap, the Querier must include the `DeduplicateExec` in the query plan to remove any duplicates.
|
||||
`DeduplicateExec` doesn't necessarily indicate that data is duplicated.
|
||||
If a plan reads many files and performs deduplication on all of them, it might be for the following reasons:
|
||||
|
||||
- the files contain duplicate data
|
||||
- the Object store has many small overlapped files that the Compactor hasn't compacted yet. After compaction, your query may perform better because it has fewer files to read
|
||||
- the Compactor isn't keeping up. If the data isn't duplicated and you still have many small overlapping files after compaction, then you might want to review the Compactor's workload and add more resources as needed
|
||||
|
||||
A leaf node that doesn't have a `DeduplicateExec` node in its branch doesn't require deduplication and doesn't overlap other files or [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester) data--for example, ParquetExec_A has no overlaps:
|
||||
|
||||
```sql
|
||||
ProjectionExec:...
|
||||
CoalesceBatchesExec:...
|
||||
FilterExec:...
|
||||
ParquetExec:...
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
The absence of a `DeduplicateExec` node means that files don't overlap.
|
||||
{{% /caption %}}
|
||||
|
||||
##### Data scan output
|
||||
|
||||
`ProjectionExec` nodes filter columns so that only the `city` column remains in the output:
|
||||
|
||||
```sql
|
||||
ProjectionExec: expr=[city@0 as city]
|
||||
```
|
||||
|
||||
##### Final processing
|
||||
|
||||
After deduplicating and filtering data in each leaf node, the plan combines the output and then applies aggregation and sorting operators for the final result:
|
||||
|
||||
```sql
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST] |
|
||||
| | AggregateExec: mode=FinalPartitioned, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | RepartitionExec: partitioning=Hash([city@0], 4), input_partitions=4 |
|
||||
| | AggregateExec: mode=Partial, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=3 |
|
||||
| | UnionExec
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
Operator structure for aggregating, sorting, and final output.
|
||||
{{% /caption %}}
|
||||
|
||||
- `UnionExec`: unions data streams. Note that the number of output streams is the same as the number of input streams--the `UnionExec` node is an intermediate step to downstream operators that actually merge or split data streams.
|
||||
- `RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=3`: Splits three input streams into four output streams in round-robin fashion. The plan splits streams to increase parallel execution.
|
||||
- `AggregateExec: mode=Partial, gby=[city@0 as city], aggr=[COUNT(Int64(1))]`: Groups data as specified in the [query](#sample-query): `city, count(1)`.
|
||||
This node aggregates each of the four streams separately, and then outputs four streams, indicated by `mode=Partial`--the data isn't fully aggregated.
|
||||
- `RepartitionExec: partitioning=Hash([city@0], 4), input_partitions=4`: Repartitions data by hashing on `city` into four streams--rows with the same `city` value are routed to the same stream.
|
||||
- `AggregateExec: mode=FinalPartitioned, gby=[city@0 as city], aggr=[COUNT(Int64(1))]`: Applies the final aggregation (`aggr=[COUNT(Int64(1))]`) to the data. `mode=FinalPartitioned` indicates that the data has already been partitioned (by city) and doesn't need further grouping by `AggregateExec`.
|
||||
- `SortExec: expr=[city@0 ASC NULLS LAST]`: Sorts the four streams of data, each on `city`, as specified in the query.
|
||||
- `SortPreservingMergeExec: [city@0 ASC NULLS LAST]`: Merges and sorts the four sorted streams for the final output.
|
||||
|
||||
In the preceding examples, the `EXPLAIN` report shows the query plan without executing the query.
|
||||
To view runtime metrics, such as execution time for a plan and its operators, use [`EXPLAIN ANALYZE`](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze) to generate the report, and use [tracing](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/trace/) for further debugging, if necessary.
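
For example, the following statement runs the [sample query](#sample-query) and reports runtime metrics for each operator in the plan:

```sql
EXPLAIN ANALYZE SELECT city, count(1)
FROM h2o
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
  AND state = 'MA'
GROUP BY city
ORDER BY city ASC;
```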
|
|
@ -6,28 +6,22 @@ weight: 401
|
|||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
name: Understand Flight responses
|
||||
parent: Execute queries
|
||||
influxdb/cloud-dedicated/tags: [query, sql, influxql]
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/cloud-dedicated/tags: [query, errors, flight]
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/query-data/sql/
|
||||
- /influxdb/cloud-dedicated/query-data/influxql/
|
||||
- /influxdb/cloud-dedicated/reference/client-libraries/v3/
|
||||
---
|
||||
|
||||
Learn how to handle responses and troubleshoot errors encountered when querying {{% product-name %}} with Flight+gRPC and Arrow Flight clients.
|
||||
|
||||
<!-- TOC -->
|
||||
|
||||
- [InfluxDB Flight responses](#influxdb-flight-responses)
|
||||
- [Stream](#stream)
|
||||
- [Schema](#schema)
|
||||
- [Example](#example)
|
||||
- [RecordBatch](#recordbatch)
|
||||
- [InfluxDB status and error codes](#influxdb-status-and-error-codes)
|
||||
- [Troubleshoot errors](#troubleshoot-errors)
|
||||
- [Internal Error: Received RST_STREAM](#internal-error-received-rst_stream)
|
||||
- [Internal Error: stream terminated by RST_STREAM with NO_ERROR](#internal-error-stream-terminated-by-rst_stream-with-no_error)
|
||||
- [Invalid Argument: Invalid ticket](#invalid-argument-invalid-ticket)
|
||||
- [Timeout: Deadline exceeded](#timeout-deadline-exceeded)
|
||||
- [Unauthenticated: Unauthenticated](#unauthenticated-unauthenticated)
|
||||
- [Unauthorized: Permission denied](#unauthorized-permission-denied)
|
||||
- [FlightUnavailableError: Could not get default pem root certs](#flightunavailableerror-could-not-get-default-pem-root-certs)
|
||||
|
||||
## InfluxDB Flight responses
|
||||
|
||||
|
@ -42,7 +36,7 @@ For example, if you use the [`influxdb3-python` Python client library](/influxdb
|
|||
InfluxDB responds with one of the following:
|
||||
|
||||
- A [stream](#stream) in Arrow IPC streaming format
|
||||
- An [error status code](#influxdb-error-codes) and an optional `details` field that contains the status and a message that describes the error
|
||||
- An [error status code](#influxdb-status-and-error-codes) and an optional `details` field that contains the status and a message that describes the error
|
||||
|
||||
### Stream
|
||||
|
||||
|
@ -129,7 +123,7 @@ In gRPC, every call returns a status object that contains an integer code and a
|
|||
During a request, the gRPC client and server may each return a status--for example:
|
||||
|
||||
- The server fails to process the query; responds with status `internal error` and gRPC status `13`.
|
||||
- The request is missing a database token; the server responds with status `unauthenticated` and gRPC status `16`.
|
||||
- The request is missing a [token](/influxdb/cloud-dedicated/admin/tokens/); the server responds with status `unauthenticated` and gRPC status `16`.
|
||||
- The server responds with a stream, but the client loses the connection due to a network failure and returns status `unavailable`.
|
||||
|
||||
gRPC defines the integer [status codes](https://grpc.github.io/grpc/core/status_8h.html) and definitions for servers and clients and
|
||||
|
@ -170,7 +164,6 @@ _For a list of gRPC codes that servers and clients may return, see [Status codes
|
|||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
|
||||
### Troubleshoot errors
|
||||
|
||||
#### Internal Error: Received RST_STREAM
|
||||
|
@ -188,8 +181,6 @@ Flight returned internal error, with message: Received RST_STREAM with error cod
|
|||
- Server might have closed the connection due to an internal error.
|
||||
- The client exceeded the server's maximum number of concurrent streams.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
#### Internal Error: stream terminated by RST_STREAM with NO_ERROR
|
||||
|
||||
**Example**:
|
||||
|
@ -205,8 +196,6 @@ pyarrow._flight.FlightInternalError: Flight returned internal error, with messag
|
|||
- Possible network disruption, even if it's temporary.
|
||||
- The server might have reached its maximum capacity or other internal limits.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
#### Invalid Argument: Invalid ticket
|
||||
|
||||
**Example**:
|
||||
|
@ -221,8 +210,6 @@ pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message:
|
|||
- The request is missing the database name or some other required metadata value.
|
||||
- The request contains bad query syntax.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
#### Timeout: Deadline exceeded
|
||||
|
||||
<!--pytest.mark.skip-->
|
||||
|
@ -249,8 +236,6 @@ Flight returned unauthenticated error, with message: unauthenticated. gRPC clien
|
|||
- Token is missing from the request.
|
||||
- The specified token doesn't exist for the specified organization.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
#### Unauthorized: Permission denied
|
||||
|
||||
**Example**:
|
||||
|
@ -264,8 +249,6 @@ pyarrow._flight.FlightUnauthorizedError: Flight returned unauthorized error, wit
|
|||
|
||||
- The specified token doesn't have read permission for the specified database.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
#### FlightUnavailableError: Could not get default pem root certs
|
||||
|
||||
**Example**:
|
|
@ -0,0 +1,64 @@
|
|||
---
|
||||
title: Optimize queries
|
||||
description: >
|
||||
Optimize queries to improve performance and reduce their memory and compute (CPU) requirements in InfluxDB.
|
||||
Learn how to use observability tools to analyze query execution and view metrics.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
name: Optimize queries
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/cloud-dedicated/tags: [query, performance, observability, errors, sql, influxql]
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/query-data/sql/
|
||||
- /influxdb/cloud-dedicated/query-data/influxql/
|
||||
- /influxdb/cloud-dedicated/query-data/execute-queries/analyze-query-plan/
|
||||
aliases:
|
||||
- /influxdb/cloud-dedicated/query-data/execute-queries/optimize-queries/
|
||||
---
|
||||
|
||||
Optimize SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
|
||||
Learn how to use observability tools to analyze query execution and view metrics.
|
||||
|
||||
- [Why is my query slow?](#why-is-my-query-slow)
|
||||
- [Strategies for improving query performance](#strategies-for-improving-query-performance)
|
||||
- [Analyze and troubleshoot queries](#analyze-and-troubleshoot-queries)
|
||||
|
||||
## Why is my query slow?
|
||||
|
||||
Query performance depends on time range and complexity.
|
||||
If a query is slower than you expect, it might be due to the following reasons:
|
||||
|
||||
- It queries data from a large time range.
|
||||
- It includes intensive operations, such as querying many string values, or sorting or re-sorting large amounts of data with `ORDER BY`.
|
||||
|
||||
## Strategies for improving query performance
|
||||
|
||||
The following design strategies generally improve query performance and resource use:
|
||||
|
||||
- Follow [schema design best practices](/influxdb/cloud-dedicated/write-data/best-practices/schema-design/) to make querying easier and more performant.
|
||||
- Query only the data you need--for example, include a [`WHERE` clause](/influxdb/cloud-dedicated/reference/sql/where/) that filters data by a time range (see the example after this list).
|
||||
InfluxDB v3 stores data in a Parquet file for each measurement and day, and retrieves files from the Object store to answer a query.
|
||||
The smaller the time range in your query, the fewer files InfluxDB needs to retrieve from the Object store.
|
||||
- [Downsample data](/influxdb/cloud-dedicated/process-data/downsample/) to reduce the amount of data you need to query.
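
For example, the following query applies the time-range and column-selection strategies--the `home` table and its `room` and `temp` columns are placeholders; substitute your own table and columns:

```sql
-- Select only the columns you need and filter by a short time range
-- so InfluxDB retrieves fewer Parquet files from the Object store.
SELECT room, temp, time
FROM home
WHERE time >= now() - INTERVAL '1 hour';
```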
|
||||
|
||||
Some bottlenecks may be out of your control and are the result of a suboptimal execution plan, such as:
|
||||
|
||||
- Applying the same sort (`ORDER BY`) to already sorted data.
|
||||
- Retrieving many Parquet files from the Object store--the same query performs better if it retrieves fewer (but larger) files.
|
||||
- Querying many overlapped Parquet files.
|
||||
- Performing a large number of table scans.
|
||||
|
||||
{{% note %}}
|
||||
#### Analyze query plans to view metrics and recognize bottlenecks
|
||||
|
||||
To view runtime metrics for a query, such as the number of files scanned, use the [`EXPLAIN ANALYZE` keywords](/influxdb/cloud-dedicated/reference/sql/explain/#explain-analyze) and learn how to [analyze a query plan](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan/).
|
||||
{{% /note %}}
|
||||
|
||||
## Analyze and troubleshoot queries
|
||||
|
||||
Use the following tools to analyze and troubleshoot queries and find performance bottlenecks:
|
||||
|
||||
- [Analyze a query plan](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/analyze-query-plan/)
|
||||
- [Enable trace logging for a query](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/trace/)
|
||||
- [Retrieve `system.queries` information for a query](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/system-information/)
|
|
@ -0,0 +1,108 @@
|
|||
---
|
||||
title: Retrieve system information for a query
|
||||
description: >
|
||||
Learn how to use the system.queries debug feature to retrieve system information for a query in InfluxDB Cloud Dedicated.
|
||||
weight: 401
|
||||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
name: Retrieve system information
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/cloud-dedicated/tags: [query, observability]
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/query-data/sql/
|
||||
- /influxdb/cloud-dedicated/query-data/influxql/
|
||||
- /influxdb/cloud-dedicated/reference/client-libraries/v3/
|
||||
---
|
||||
|
||||
Learn how to retrieve system information for a query in {{% product-name %}}.
|
||||
|
||||
In addition to the SQL standard `information_schema`, {{% product-name %}} contains _system_ tables that provide access to
|
||||
InfluxDB-specific information.
|
||||
The information in each system table is scoped to the namespace you're querying;
|
||||
you can only retrieve system information for that particular instance.
|
||||
|
||||
To get information about queries you've run on the current instance, use SQL to query the [`system.queries` table](/influxdb/cloud-dedicated/reference/internals/system-tables/#systemqueries-measurement), which contains information from the Querier instance currently handling queries.
|
||||
If you [enabled trace logging](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/trace/) for the query, the `trace-id` appears in the `system.queries.trace_id` column for the query.
|
||||
|
||||
The `system.queries` table is an InfluxDB v3 **debug feature**.
|
||||
To enable the feature and query `system.queries`, include an `"iox-debug"` header set to `"true"` and use SQL to query the table.
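
For example, the following SQL statement--the same query embedded in the sample code below--returns recent records from `system.queries` when you send it with the `iox-debug: true` header:

```sql
SELECT compute_duration, query_type, query_text, success, trace_id
FROM system.queries
WHERE issue_time >= now() - INTERVAL '1 day'
ORDER BY issue_time DESC;
```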
|
||||
|
||||
The following sample code shows how to use the Python client library to do the following:
|
||||
|
||||
1. Enable tracing for a query.
|
||||
2. Retrieve the trace ID record from `system.queries`.
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
```python
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import secrets
|
||||
import pandas
|
||||
|
||||
def get_query_information():
|
||||
print('# Get query information')
|
||||
|
||||
client = InfluxDBClient3(token = f"DATABASE_TOKEN",
|
||||
host = f"{{< influxdb/host >}}",
|
||||
database = f"DATABASE_NAME")
|
||||
|
||||
random_bytes = secrets.token_bytes(16)
|
||||
trace_id = random_bytes.hex()
|
||||
trace_value = (f"{trace_id}:1112223334445:0:1").encode('utf-8')
|
||||
sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'"
|
||||
|
||||
try:
|
||||
client.query(sql, headers=[(b'influx-trace-id', trace_value)])
|
||||
client.close()
|
||||
except Exception as e:
|
||||
print("Query error: ", e)
|
||||
|
||||
client = InfluxDBClient3(token = f"DATABASE_TOKEN",
|
||||
host = f"{{< influxdb/host >}}",
|
||||
database = f"DATABASE_NAME")
|
||||
|
||||
import time
|
||||
df = pandas.DataFrame()
|
||||
|
||||
for i in range(0, 5):
|
||||
time.sleep(1)
|
||||
# Use SQL
|
||||
# To query the system.queries table for your trace ID, pass the following:
|
||||
# - the iox-debug: true request header
|
||||
# - an SQL query for the trace_id column
|
||||
reader = client.query(f'''SELECT compute_duration, query_type, query_text,
|
||||
success, trace_id
|
||||
FROM system.queries
|
||||
WHERE issue_time >= now() - INTERVAL '1 day'
|
||||
AND trace_id = '{trace_id}'
|
||||
ORDER BY issue_time DESC
|
||||
''',
|
||||
headers=[(b"iox-debug", b"true")],
|
||||
mode="reader")
|
||||
|
||||
df = reader.read_all().to_pandas()
|
||||
if df.shape[0]:
|
||||
break
|
||||
|
||||
assert df.shape == (1, 5), f"Expect a row for the query trace ID."
|
||||
print(df)
|
||||
|
||||
get_query_information()
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
The output is similar to the following:
|
||||
|
||||
```text
|
||||
compute_duration query_type query_text success trace_id
|
||||
0 days sql SELECT compute_duration, quer... True 67338...
|
||||
```
|
|
@ -0,0 +1,247 @@
|
|||
---
|
||||
title: Enable trace logging
|
||||
description: >
|
||||
Enable trace logging for a query in InfluxDB Cloud Dedicated.
|
||||
weight: 401
|
||||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
name: Enable trace logging
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/cloud-dedicated/tags: [query, observability]
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/query-data/sql/
|
||||
- /influxdb/cloud-dedicated/query-data/influxql/
|
||||
- /influxdb/cloud-dedicated/reference/client-libraries/v3/
|
||||
---
|
||||
|
||||
Learn how to enable trace logging to help you identify performance bottlenecks and troubleshoot problems in queries.
|
||||
|
||||
When you enable trace logging for a query, InfluxDB propagates your _trace ID_ through system processes and collects additional log information.
|
||||
InfluxData Support can then use the trace ID that you provide to filter, collate, and analyze log information for the query run.
|
||||
The tracing system follows the [OpenTelemetry traces](https://opentelemetry.io/docs/concepts/signals/traces/) model for providing observability into a request.
|
||||
|
||||
{{% warn %}}
|
||||
|
||||
#### Avoid unnecessary tracing
|
||||
|
||||
Only enable tracing for a query when you need to request troubleshooting help from InfluxDB Support.
|
||||
To manage resources, InfluxDB has an upper limit for the number of trace requests.
|
||||
Too many traces can cause InfluxDB to evict log information.
|
||||
{{% /warn %}}
|
||||
|
||||
To enable tracing for a query, include the `influx-trace-id` header in your query request.
|
||||
|
||||
#### Syntax
|
||||
|
||||
Use the following syntax for the `influx-trace-id` header:
|
||||
|
||||
```http
|
||||
influx-trace-id: TRACE_ID:1112223334445:0:1
|
||||
```
|
||||
|
||||
In the header value, replace the following:
|
||||
|
||||
- `TRACE_ID`: a unique string, 8-16 bytes long, encoded as hexadecimal (32 maximum hex characters).
|
||||
The trace ID should uniquely identify the query run.
|
||||
- `:1112223334445:0:1`: InfluxDB constant values (required, but ignored)
|
||||
|
||||
#### Example
|
||||
|
||||
The following examples show how to create and pass a trace ID to enable query tracing in InfluxDB:
|
||||
|
||||
{{< tabs-wrapper >}}
|
||||
{{% tabs %}}
|
||||
[Python with FlightCallOptions](#)
|
||||
[Python with FlightClientMiddleware](#python-with-flightclientmiddleware)
|
||||
{{% /tabs %}}
|
||||
{{% tab-content %}}
|
||||
<!---- BEGIN PYTHON WITH FLIGHTCALLOPTIONS ---->
|
||||
Use the `InfluxDBClient3` InfluxDB Python client and pass the `headers` argument in the
|
||||
`query()` method.
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
```python
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import secrets
|
||||
|
||||
def use_flightcalloptions_trace_header():
|
||||
print('# Use FlightCallOptions to enable tracing.')
|
||||
client = InfluxDBClient3(token=f"DATABASE_TOKEN",
|
||||
host=f"{{< influxdb/host >}}",
|
||||
database=f"DATABASE_NAME")
|
||||
|
||||
# Generate a trace ID for the query:
|
||||
# 1. Generate a random 8-byte value as bytes.
|
||||
# 2. Encode the value as hexadecimal.
|
||||
random_bytes = secrets.token_bytes(8)
|
||||
trace_id = random_bytes.hex()
|
||||
|
||||
# Append required constants to the trace ID.
|
||||
trace_value = f"{trace_id}:1112223334445:0:1"
|
||||
|
||||
# Encode the header key and value as bytes.
|
||||
# Create a list of header tuples.
|
||||
headers = [((b"influx-trace-id", trace_value.encode('utf-8')))]
|
||||
|
||||
sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'"
|
||||
influxql = "SELECT * FROM home WHERE time >= -90d"
|
||||
|
||||
# Use the query() headers argument to pass the list as FlightCallOptions.
|
||||
client.query(sql, headers=headers)
|
||||
|
||||
client.close()
|
||||
|
||||
use_flightcalloptions_trace_header()
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
<!---- END PYTHON WITH FLIGHTCALLOPTIONS ---->
|
||||
{{% /tab-content %}}
|
||||
{{% tab-content %}}
|
||||
<!---- BEGIN PYTHON WITH MIDDLEWARE ---->
|
||||
Use the `InfluxDBClient3` InfluxDB Python client and `flight.ClientMiddleware` to pass and inspect headers.
|
||||
|
||||
#### Tracing response header
|
||||
|
||||
With tracing enabled and a valid trace ID in the request, InfluxDB's `DoGet` action response contains a header with the trace ID that you sent.
|
||||
|
||||
##### Trace response header syntax
|
||||
|
||||
```http
|
||||
trace-id: TRACE_ID
|
||||
```
|
||||
|
||||
#### Inspect Flight response headers
|
||||
|
||||
To inspect Flight response headers when using a client library, pass a `FlightClientMiddleware` instance
|
||||
that defines a middleware callback function for the `onHeadersReceived` event (the particular function name you use depends on the client library language).
|
||||
|
||||
The following example uses Python client middleware that adds request headers and extracts the trace ID from the `DoGet` response headers:
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
{{% code-placeholders "DATABASE_(NAME|TOKEN)|APP_REQUEST_ID" %}}
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
```python
|
||||
import pyarrow.flight as flight
|
||||
|
||||
class TracingClientMiddleWareFactory(flight.ClientMiddlewareFactory):
|
||||
# Defines a custom middleware factory that returns a middleware instance.
|
||||
def __init__(self):
|
||||
self.request_headers = []
|
||||
self.response_headers = []
|
||||
self.traces = []
|
||||
|
||||
def addRequestHeader(self, header):
|
||||
self.request_headers.append(header)
|
||||
|
||||
def addResponseHeader(self, header):
|
||||
self.response_headers.append(header)
|
||||
|
||||
def addTrace(self, traceid):
|
||||
self.traces.append(traceid)
|
||||
|
||||
def createTrace(self, traceid):
|
||||
# Append InfluxDB constants to the trace ID.
|
||||
trace = f"{traceid}:1112223334445:0:1"
|
||||
|
||||
# To the list of request headers,
|
||||
# add a tuple with the header key and value as bytes.
|
||||
self.addRequestHeader((b"influx-trace-id", trace.encode('utf-8')))
|
||||
|
||||
def start_call(self, info):
|
||||
return TracingClientMiddleware(info.method, self)
|
||||
|
||||
class TracingClientMiddleware(flight.ClientMiddleware):
|
||||
# Defines middleware with client event callback methods.
|
||||
def __init__(self, method, callback_obj):
|
||||
self._method = method
|
||||
self.callback = callback_obj
|
||||
|
||||
def call_completed(self, exception):
|
||||
print('callback: call_completed')
|
||||
if(exception):
|
||||
print(f" ...with exception: {exception}")
|
||||
|
||||
def sending_headers(self):
|
||||
print('callback: sending_headers: ', self.callback.request_headers)
|
||||
if len(self.callback.request_headers) > 0:
|
||||
return dict(self.callback.request_headers)
|
||||
|
||||
def received_headers(self, headers):
|
||||
self.callback.addResponseHeader(headers)
|
||||
# For the DO_GET action, extract the trace ID from the response headers.
|
||||
if str(self._method) == "FlightMethod.DO_GET" and "trace-id" in headers:
|
||||
trace_id = headers["trace-id"][0]
|
||||
self.callback.addTrace(trace_id)
|
||||
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import secrets
|
||||
|
||||
def use_middleware_trace_header():
|
||||
print('# Use Flight client middleware to enable tracing.')
|
||||
|
||||
# Instantiate the middleware.
|
||||
res = TracingClientMiddleWareFactory()
|
||||
|
||||
# Instantiate the client, passing in the middleware instance that provides
|
||||
# event callbacks for the request.
|
||||
client = InfluxDBClient3(token=f"DATABASE_TOKEN",
|
||||
host=f"{{< influxdb/host >}}",
|
||||
database=f"DATABASE_NAME",
|
||||
flight_client_options={"middleware": (res,)})
|
||||
|
||||
# Generate a trace ID for the query:
|
||||
# 1. Generate a random 8-byte value as bytes.
|
||||
# 2. Encode the value as hexadecimal.
|
||||
random_bytes = secrets.token_bytes(8)
|
||||
trace_id = random_bytes.hex()
|
||||
|
||||
res.createTrace(trace_id)
|
||||
|
||||
sql = "SELECT * FROM home WHERE time >= now() - INTERVAL '30 days'"
|
||||
|
||||
client.query(sql)
|
||||
client.close()
|
||||
assert trace_id in res.traces[0], "Expect trace ID in DoGet response."
|
||||
|
||||
use_middleware_trace_header()
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
<!---- END PYTHON WITH MIDDLEWARE ---->
|
||||
{{% /tab-content %}}
|
||||
{{< /tabs-wrapper >}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
|
||||
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/cloud-dedicated/admin/tokens/) with sufficient permissions to the specified database
|
||||
|
||||
{{% note %}}
|
||||
Store or log your query trace ID to ensure you can provide it to InfluxData Support for troubleshooting.
|
||||
{{% /note %}}
|
||||
|
||||
After you run your query with tracing enabled, do the following:
|
||||
|
||||
- Remove the tracing header from subsequent runs of the query (to [avoid unnecessary tracing](#avoid-unnecessary-tracing)).
|
||||
- Provide the trace ID in a request to InfluxData Support.
|
||||
|
||||
### Retrieve system information for a query
|
||||
|
||||
If you enable trace logging for a query, the `trace-id` appears in the [`system.queries` table](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/system-information).
|
|
@ -0,0 +1,44 @@
|
|||
---
|
||||
title: Troubleshoot queries
|
||||
description: >
|
||||
Troubleshoot SQL and InfluxQL queries in InfluxDB.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
name: Troubleshoot queries
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/cloud-dedicated/tags: [query, performance, observability, errors, sql, influxql]
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/query-data/sql/
|
||||
- /influxdb/cloud-dedicated/query-data/influxql/
|
||||
- /influxdb/cloud-dedicated/reference/client-libraries/v3/
|
||||
aliases:
|
||||
- /influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot/
|
||||
---
|
||||
|
||||
Troubleshoot SQL and InfluxQL queries that return unexpected results.
|
||||
|
||||
- [Why doesn't my query return data?](#why-doesnt-my-query-return-data)
|
||||
- [Optimize slow or expensive queries](#optimize-slow-or-expensive-queries)
|
||||
|
||||
## Why doesn't my query return data?
|
||||
|
||||
If a query doesn't return any data, it might be due to the following:
|
||||
|
||||
- Your data falls outside the time range (or other conditions) in the query--for example, the InfluxQL `SHOW TAG VALUES` command uses a default time range of 1 day.
|
||||
- The query (InfluxDB server) timed out.
|
||||
- The query client timed out.
|
||||
|
||||
If a query times out or returns an error, it might be due to the following:
|
||||
|
||||
- a bad request
|
||||
- a server or network problem
|
||||
- a query that reads too much data
|
||||
|
||||
[Understand Arrow Flight responses](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/flight-responses/) and error messages for queries.
|
||||
|
||||
## Optimize slow or expensive queries
|
||||
|
||||
If a query is slow or uses too many compute resources, limit the amount of data that it queries.
|
||||
|
||||
See how to [optimize queries](/influxdb/cloud-dedicated/query-data/troubleshoot-and-optimize/optimize-queries/) and use tools to view runtime metrics, identify bottlenecks, and debug queries.
|
|
@ -23,7 +23,7 @@ prepend:
|
|||
[**Compare tools you can use**](/influxdb/cloud-dedicated/get-started/#tools-to-use) to interact with {{% product-name %}}.
|
||||
---
|
||||
|
||||
Arduino is an open-source hardware and software platform used for building electronics projects.
|
||||
Arduino is an open source hardware and software platform used for building electronics projects.
|
||||
|
||||
The documentation for this client library is available on GitHub.
|
||||
|
||||
|
|
|
@ -22,7 +22,7 @@ prepend:
|
|||
[**Compare tools you can use**](/influxdb/cloud-dedicated/get-started/#tools-to-use) to interact with {{% product-name %}}.
|
||||
---
|
||||
|
||||
Kotlin is an open-source programming language that runs on the Java Virtual Machine (JVM).
|
||||
Kotlin is an open source programming language that runs on the Java Virtual Machine (JVM).
|
||||
|
||||
The documentation for this client library is available on GitHub.
|
||||
|
||||
|
|
|
@ -77,7 +77,7 @@ InfluxData typically recommends batch sizes of 5,000-10,000 points.
|
|||
In some use cases, performance may improve with significantly smaller or larger batches.
|
||||
|
||||
Related entries:
|
||||
[line protocol](#line-protocol),
|
||||
[line protocol](#line-protocol-lp),
|
||||
[point](#point)
|
||||
|
||||
### batch size
|
||||
|
@ -260,7 +260,7 @@ Aggregating high resolution data into lower resolution data to preserve disk spa
|
|||
|
||||
### duration
|
||||
|
||||
A data type that represents a duration of time (1s, 1m, 1h, 1d).
|
||||
A data type that represents a duration of time--for example, `1s`, `1m`, `1h`, `1d`.
|
||||
Retention periods are set using durations.
|
||||
|
||||
Related entries:
|
||||
|
@ -337,9 +337,6 @@ Related entries:
|
|||
|
||||
A file block is a fixed-length chunk of data read into memory when requested by an application.
|
||||
|
||||
Related entries:
|
||||
[block](#block)
|
||||
|
||||
### float
|
||||
|
||||
A real number written with a decimal point dividing the integer and fractional parts (`1.0`, `3.14`, `-20.1`).
|
||||
|
@ -408,7 +405,6 @@ Related entries:
|
|||
[measurement](#measurement),
|
||||
[tag key](#tag-key),
|
||||
|
||||
|
||||
### influx
|
||||
|
||||
`influx` is a command line interface (CLI) that interacts with the InfluxDB v1.x and v2.x server.
|
||||
|
@ -426,7 +422,7 @@ and other required processes.
|
|||
|
||||
### InfluxDB
|
||||
|
||||
An open-source time series database (TSDB) developed by InfluxData.
|
||||
An open source time series database (TSDB) developed by InfluxData.
|
||||
Written in Go and optimized for fast, high-availability storage and retrieval of
|
||||
time series data in fields such as operations monitoring, application metrics,
|
||||
Internet of Things sensor data, and real-time analytics.
|
||||
|
@ -467,7 +463,7 @@ The IOx storage engine (InfluxDB v3 storage engine) is a real-time, columnar
|
|||
database optimized for time series data built in Rust on top of
|
||||
[Apache Arrow](https://arrow.apache.org/) and
|
||||
[DataFusion](https://arrow.apache.org/datafusion/user-guide/introduction.html).
|
||||
IOx replaces the [TSM](#tsm) storage engine.
|
||||
IOx replaces the [TSM (Time Structured Merge tree)](#tsm-time-structured-merge-tree) storage engine.
|
||||
|
||||
## J
|
||||
|
||||
|
@ -497,11 +493,13 @@ and array data types.
|
|||
### keyword
|
||||
|
||||
A keyword is reserved by a program because it has special meaning.
|
||||
Every programming language has a set of keywords (reserved names) that cannot be used as an identifier.
|
||||
Every programming language has a set of keywords (reserved names) that cannot be used as identifiers--for example,
|
||||
you can't use `SELECT` (an SQL keyword) as a variable name in an SQL query.
|
||||
|
||||
See a list of [SQL keywords](/influxdb/cloud-dedicated/reference/sql/#keywords).
|
||||
See keyword lists:
|
||||
|
||||
<!-- TODO: Add a link to InfluxQL keywords -->
|
||||
- [SQL keywords](/influxdb/cloud-dedicated/reference/sql/#keywords)
|
||||
- [InfluxQL keywords](/influxdb/cloud-dedicated/reference/influxql/#keywords)
|
||||
|
||||
## L
|
||||
|
||||
|
@ -571,7 +569,6 @@ Related entries:
|
|||
[cluster](#cluster),
|
||||
[server](#server)
|
||||
|
||||
|
||||
### now
|
||||
|
||||
The local server's nanosecond timestamp.
|
||||
|
@ -612,7 +609,7 @@ Owners have read/write permissions.
|
|||
Users can have owner roles for databases and other resources.
|
||||
|
||||
Role permissions are separate from API token permissions. For additional
|
||||
information on API tokens, see [token](#tokens).
|
||||
information on API tokens, see [token](#token).
|
||||
|
||||
### output plugin
|
||||
|
||||
|
@ -719,6 +716,15 @@ An InfluxDB query returns time series data.
|
|||
|
||||
See [Query data in InfluxDB](/influxdb/cloud-dedicated/query-data/).
|
||||
|
||||
### query plan
|
||||
|
||||
A sequence of steps (_nodes_) that the InfluxDB Querier devises and executes to calculate the result of the query in the least amount of time.
|
||||
A _logical plan_ is a high-level representation of a query and doesn't consider cluster configuration or data organization.
|
||||
A _physical plan_ represents the query execution plan and data flow through plan nodes that read (_scan_), deduplicate, merge, filter, and sort data.
|
||||
A physical plan is optimized for the cluster configuration and data organization.
|
||||
|
||||
See [Query plans](/influxdb/cloud-dedicated/reference/internals/query-plans/).
|
||||
|
||||
## R
|
||||
|
||||
### REPL
|
||||
|
@ -744,8 +750,7 @@ relative to [now](#now).
|
|||
The minimum retention period is **one hour**.
|
||||
|
||||
Related entries:
|
||||
[bucket](#bucket),
|
||||
[shard group duration](#shard-group-duration)
|
||||
[bucket](#bucket)
|
||||
|
||||
### retention policy (RP)
|
||||
|
||||
|
@ -786,6 +791,18 @@ Related entries:
|
|||
[timestamp](#timestamp),
|
||||
[unix timestamp](#unix-timestamp)
|
||||
|
||||
### row
|
||||
|
||||
A row in a [table](#table) represents a specific record or instance of data.
|
||||
[Column](#column) values in a row represent specific attributes or properties of the instance.
|
||||
Each row has a [primary key](#primary-key) that makes the row unique from other rows in the table.
|
||||
|
||||
Related entries:
|
||||
[column](#column),
|
||||
[primary key](#primary-key),
|
||||
[series](#series),
|
||||
[table](#table)
|
||||
|
||||
## S
|
||||
|
||||
### schema
|
||||
|
@ -805,7 +822,7 @@ Related entries:
|
|||
### secret
|
||||
|
||||
Secrets are key-value pairs that contain information you want to control access
|
||||
o, such as API keys, passwords, or certificates.
|
||||
to, such as API keys, passwords, or certificates.
|
||||
|
||||
### selector
|
||||
|
||||
|
@ -942,7 +959,6 @@ Related entries:
|
|||
The key of a tag key-value pair.
|
||||
Tag keys are strings and store metadata.
|
||||
|
||||
|
||||
Related entries:
|
||||
[field key](#field-key),
|
||||
[tag](#tag),
|
||||
|
@ -1018,6 +1034,14 @@ There are different types of API tokens:
|
|||
Related entries:
|
||||
[Manage token](/influxdb/cloud-dedicated/admin/tokens/)
|
||||
|
||||
### transformation
|
||||
|
||||
Data transformation refers to the process of converting or modifying input data from one format, value, or structure to another.
|
||||
|
||||
InfluxQL [transformation functions](/influxdb/cloud-dedicated/reference/influxql/functions/transformations/) modify and return values in each row of queried data, but do not return an aggregated value across those rows.
|
||||
|
||||
Related entries: [aggregate](#aggregate), [function](#function), [selector](#selector)
|
||||
|
||||
### TSM (Time Structured Merge tree)
|
||||
|
||||
The InfluxDB v1 and v2 data storage format that allows greater compaction and
|
||||
|
@ -1078,7 +1102,7 @@ InfluxDB users are granted permission to access to InfluxDB.
|
|||
|
||||
### values per second
|
||||
|
||||
The preferred measurement of the rate at which data are persisted to InfluxDB.
|
||||
The preferred measurement of the rate at which data is persisted to InfluxDB.
|
||||
Write speeds are generally quoted in values per second.
|
||||
|
||||
To calculate the values per second rate, multiply the number of points written
|
||||
|
|
|
@ -0,0 +1,392 @@
|
|||
---
|
||||
title: Query plans
|
||||
description: >
|
||||
A query plan is a sequence of steps that the InfluxDB Querier devises and executes to calculate the result of a query in the least amount of time.
|
||||
InfluxDB query plans include DataFusion and InfluxDB logical plan and execution plan nodes for scanning, deduplicating, filtering, merging, and sorting data.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
name: Query plans
|
||||
parent: InfluxDB internals
|
||||
influxdb/cloud-dedicated/tags: [query, sql, influxql]
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/query-data/sql/
|
||||
- /influxdb/cloud-dedicated/query-data/influxql/
|
||||
- /influxdb/cloud-dedicated/query-data/execute-queries/analyze-query-plan/
|
||||
- /influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot/
|
||||
- /influxdb/cloud-dedicated/reference/internals/storage-engine/
|
||||
---
|
||||
|
||||
A query plan is a sequence of steps that the InfluxDB v3 [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) devises and executes to calculate the result of a query.
|
||||
The Querier uses DataFusion and Arrow to build and execute query plans
|
||||
that call DataFusion and InfluxDB-specific operators to read data from the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store) and the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester), and apply query transformations--such as deduplicating, filtering, aggregating, merging, projecting, and sorting--to calculate the final result.
|
||||
|
||||
Like many other databases, the [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) contains a Query Optimizer.
|
||||
After it parses an incoming query, the [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) builds a _logical plan_--the sequence of high-level steps (such as scanning, filtering, and sorting) required for the query.
|
||||
Following the logical plan, the [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) then builds the optimal _physical plan_ to calculate the correct result in the least amount of time.
|
||||
The plan takes advantage of data partitioning by the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester) to parallelize plan operations and prune unnecessary data before executing the plan.
|
||||
The [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) also applies common techniques of predicate and projection pushdown to further prune data as early as possible.
|
||||
|
||||
- [Display syntax](#display-syntax)
|
||||
- [Example logical and physical plan](#example-logical-and-physical-plan)
|
||||
- [Data flow](#data-flow)
|
||||
- [Logical plan](#logical-plan)
|
||||
- [`LogicalPlan` nodes](#logicalplan-nodes)
|
||||
- [`TableScan`](#tablescan)
|
||||
- [`Projection`](#projection)
|
||||
- [`Filter`](#filter)
|
||||
- [`Sort`](#sort)
|
||||
- [Physical plan](#physical-plan)
|
||||
- [`ExecutionPlan` nodes](#executionplan-nodes)
|
||||
- [`DeduplicateExec`](#deduplicateexec)
|
||||
- [`EmptyExec`](#emptyexec)
|
||||
- [`FilterExec`](#filterexec)
|
||||
- [`ParquetExec`](#parquetexec)
|
||||
- [`ProjectionExec`](#projectionexec)
|
||||
- [`RecordBatchesExec`](#recordbatchesexec)
|
||||
- [`SortExec`](#sortexec)
|
||||
- [`SortPreservingMergeExec`](#sortpreservingmergeexec)
|
||||
- [Overlapping data and deduplication](#overlapping-data-and-deduplication)
|
||||
- [Example of overlapping data](#example-of-overlapping-data)
|
||||
- [DataFusion query plans](#datafusion-query-plans)
|
||||
|
||||
## Display syntax
|
||||
|
||||
[Logical](#logical-plan) and [physical query plans](#physical-plan) are represented (for example, in an `EXPLAIN` report) in _tree syntax_.
|
||||
|
||||
- Each plan is represented as an upside-down tree composed of _nodes_.
|
||||
- A parent node awaits the output of its child nodes.
|
||||
- Data flows up from the bottom innermost nodes of the tree to the outermost _root node_ at the top.
|
||||
|
||||
### Example logical and physical plan
|
||||
|
||||
The following query generates an `EXPLAIN` report that includes a logical and a physical plan:
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;
|
||||
```
|
||||
|
||||
The output is the following:
|
||||
|
||||
#### Figure 1. EXPLAIN report
|
||||
|
||||
```sql
|
||||
| plan_type | plan |
|
||||
+---------------+--------------------------------------------------------------------------+
|
||||
| logical_plan | Sort: h2o.city ASC NULLS LAST, h2o.time DESC NULLS FIRST |
|
||||
| | TableScan: h2o projection=[city, min_temp, time] |
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | UnionExec |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
Output from `EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;`
|
||||
{{% /caption %}}
|
||||
|
||||
The leaf nodes in the [Figure 1](#figure-1-explain-report) physical plan are parallel `ParquetExec` nodes:
|
||||
|
||||
```text
|
||||
ParquetExec: file_groups={...}, projection=[city, min_temp, time]
|
||||
...
|
||||
ParquetExec: file_groups={...}, projection=[city, min_temp, time]
|
||||
```
|
||||
|
||||
## Data flow
|
||||
|
||||
A [physical plan](#physical-plan) node represents a specific implementation of `ExecutionPlan` that receives an input stream, applies expressions for filtering and sorting, and then yields an output stream to its parent node.
|
||||
|
||||
The following diagram shows the data flow and sequence of `ExecutionPlan` nodes in the [Figure 1](#figure-1-explain-report) physical plan:
|
||||
|
||||
<!-- BEGIN Query plan diagram -->
|
||||
{{< html-diagram/query-plan >}}
|
||||
<!-- END Query plan diagram -->
|
||||
|
||||
{{% product-name %}} query plans include the following types of plan nodes and expressions:
|
||||
|
||||
## Logical plan
|
||||
|
||||
A logical plan for a query:
|
||||
|
||||
- is a high-level plan that expresses the "intent" of a query and the steps required for calculating the result.
|
||||
- requires information about the data schema
|
||||
- is independent of the [physical execution](#physical-plan), cluster configuration, data source (Ingester or Object store), or how data is organized or partitioned
|
||||
- is displayed as a tree of [DataFusion `LogicalPlan` nodes](#logicalplan-nodes)
|
||||
|
||||
## `LogicalPlan` nodes
|
||||
|
||||
Each node in an {{% product-name %}} logical plan tree represents a [`LogicalPlan` implementation](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html#variants) that receives criteria extracted from the query and applies relational operators and optimizations for transforming input data to an output table.
|
||||
|
||||
The following are some `LogicalPlan` nodes used in InfluxDB logical plans.
|
||||
|
||||
### `TableScan`
|
||||
|
||||
[`TableScan`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.TableScan.html) retrieves rows from a table provider by reference or from the context.
|
||||
|
||||
### `Projection`
|
||||
|
||||
[`Projection`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Projection.html) evaluates an arbitrary list of expressions on the input; equivalent to an SQL `SELECT` statement with an expression list.
|
||||
|
||||
### `Filter`
|
||||
|
||||
[`Filter`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Filter.html) filters rows from the input that do not satisfy the specified expression; equivalent to an SQL `WHERE` clause with a predicate expression.
|
||||
|
||||
### `Sort`
|
||||
|
||||
[`Sort`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Sort.html) sorts the input according to a list of sort expressions; used to implement SQL `ORDER BY`.
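As an illustrative sketch (the table and column names are placeholders), the clauses of the following query map to the `LogicalPlan` nodes described above:

```sql
EXPLAIN
SELECT room, temp   -- Projection
FROM home           -- TableScan
WHERE temp > 70.0   -- Filter
ORDER BY time;      -- Sort
```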
|
||||
|
||||
For details and a list of `LogicalPlan` implementations, see [`Enum datafusion::logical_expr::LogicalPlan` Variants](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html#variants) in the DataFusion documentation.
|
||||
|
||||
## Physical plan
|
||||
|
||||
A physical plan, or _execution plan_, for a query:
|
||||
|
||||
- is an optimized plan that derives from the [logical plan](#logical-plan) and contains the low-level steps for query execution.
|
||||
- considers the cluster configuration (for example, CPU and memory allocation) and data organization (for example: partitions, the number of files, and whether files overlap)--for example:
|
||||
- If you run the same query with the same data on different clusters with different configurations, each cluster may generate a different physical plan for the query.
|
||||
- If you run the same query on the same cluster at different times, the physical plan may differ each time, depending on the data at query time.
|
||||
- if generated using `ANALYZE`, includes runtime metrics sampled during query execution
|
||||
- is displayed as a tree of [`ExecutionPlan` nodes](#executionplan-nodes)
|
||||
|
||||
## `ExecutionPlan` nodes
|
||||
|
||||
Each node in an {{% product-name %}} physical plan represents a call to a specific implementation of the [DataFusion `ExecutionPlan`](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html)
|
||||
that receives input data, query criteria expressions, and an output schema.
|
||||
|
||||
The following are some `ExecutionPlan` nodes used in InfluxDB physical plans.
|
||||
|
||||
### `DeduplicateExec`
|
||||
|
||||
InfluxDB `DeduplicateExec` takes an input stream of `RecordBatch` sorted on `sort_key` and applies InfluxDB-specific deduplication logic.
|
||||
The output is dependent on the order of the input rows that have the same key.
|
||||
|
||||
### `EmptyExec`
|
||||
|
||||
DataFusion [`EmptyExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/empty/struct.EmptyExec.html) is an execution plan for an empty relation and indicates that the table doesn't contain data for the time range of the query.
|
||||
|
||||
### `FilterExec`
|
||||
|
||||
The execution plan for the [`Filter`](#filter) `LogicalPlan`.
|
||||
|
||||
DataFusion [`FilterExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/filter/struct.FilterExec.html) evaluates a boolean predicate against all input batches to determine which rows to include in the output batches.
|
||||
|
||||
### `ParquetExec`
|
||||
|
||||
DataFusion [`ParquetExec`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/parquet/struct.ParquetExec.html) scans one or more Parquet partitions.
|
||||
|
||||
#### `ParquetExec` expressions
|
||||
|
||||
##### `file_groups`
|
||||
|
||||
A _file group_ is a list of files to scan.
|
||||
Files are referenced by path:
|
||||
|
||||
- `1/1/b862a7e9b.../243db601-....parquet`
|
||||
- `1/1/b862a7e9b.../f5fb7c7d-....parquet`
|
||||
|
||||
In InfluxDB v3, the path structure represents how data is organized.
|
||||
|
||||
A path has the following structure:
|
||||
|
||||
```text
|
||||
<namespace_id>/<table_id>/<partition_hash_id>/<uuid_of_the_file>.parquet
|
||||
1 / 1 /b862a7e9b329ee6a4.../243db601-f3f1-4....parquet
|
||||
```
|
||||
|
||||
- `namespace_id`: the namespace (database) being queried
|
||||
- `table_id`: the table (measurement) being queried
|
||||
- `partition_hash_id`: the partition this file belongs to.
|
||||
You can count partition IDs to find how many partitions the query reads.
|
||||
- `uuid_of_the_file`: the file identifier.
|
||||
|
||||
`ParquetExec` processes groups in parallel and reads the files in each group sequentially.
|
||||
|
||||
##### `projection`
|
||||
|
||||
`projection` lists the table columns that the query plan needs to read to execute the query.
|
||||
The parameter name `projection` refers to _projection pushdown_, the action of filtering columns.
|
||||
|
||||
Consider the following sample data that contains many columns:
|
||||
|
||||
```text
|
||||
h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600
|
||||
```
|
||||
|
||||
| table | state | city | min_temp | max_temp | area | time |
|
||||
|:-----:|:-----:|:----:|:--------:|:--------:|:----:|:----:|
|
||||
| h2o | CA | SF | 68.4 | 85.7 | 500u | 600 |
|
||||
|
||||
However, the following SQL query specifies only three columns (`city`, `state`, and `time`):
|
||||
|
||||
```sql
|
||||
SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
When processing the query, the [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) specifies the three required columns in the projection and the projection is "pushed down" to leaf nodes--columns not specified are pruned as early as possible during query execution.
|
||||
|
||||
```text
|
||||
projection=[city, state, time]
|
||||
```
|
||||
|
||||
##### `output_ordering`
|
||||
|
||||
`output_ordering` specifies the sort order for the output.
|
||||
The Querier specifies `output_ordering` if the output should be ordered and if the [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) knows the order.
|
||||
|
||||
When storing data to Parquet files, InfluxDB sorts the data to improve storage compression and query efficiency and the planner tries to preserve that order for as long as possible.
|
||||
Generally, the `output_ordering` value that `ParquetExec` receives is the ordering (or a subset of the ordering) of stored data.
|
||||
|
||||
_By design, [`RecordBatchesExec`](#recordbatchesexec) data isn't sorted._
|
||||
|
||||
In the following example, the query planner specifies the output sort order `state ASC, city ASC, time ASC`:
|
||||
|
||||
```text
|
||||
output_ordering=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC]
|
||||
```
|
||||
|
||||
##### `predicate`
|
||||
|
||||
`predicate` is the data filter specified in the query and used for row filtering when scanning Parquet files.
|
||||
|
||||
For example, given the following SQL query:
|
||||
|
||||
```sql
|
||||
SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
The `predicate` value is the boolean expression in the `WHERE` statement:
|
||||
|
||||
```text
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA
|
||||
```
|
||||
|
||||
##### `pruning_predicate`
|
||||
|
||||
`pruning_predicate` is created from the [`predicate`](#predicate) value and is used for pruning data and files from the chosen partitions.
|
||||
|
||||
For example, given the following `predicate` parsed from the SQL:
|
||||
|
||||
```text
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA,
|
||||
```
|
||||
|
||||
The Querier creates the following `pruning_predicate`:
|
||||
|
||||
```text
|
||||
pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3
|
||||
```
|
||||
|
||||
By default, the pruning predicate filters files by `time`.
|
||||
|
||||
_Before the physical plan is generated, an additional `partition pruning` step uses predicates on partitioning columns to prune partitions._
|
||||
|
||||
### `ProjectionExec`
|
||||
|
||||
DataFusion [`ProjectionExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/projection/struct.ProjectionExec.html) evaluates an arbitrary list of expressions on the input; the execution plan for the [`Projection`](#projection) `LogicalPlan`.
|
||||
|
||||
### `RecordBatchesExec`
|
||||
|
||||
The InfluxDB `RecordBatchesExec` implementation retrieves and scans recently written, yet-to-be-persisted, data from the InfluxDB v3 [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester).
|
||||
|
||||
When generating the plan, the [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) sends the query criteria, such as database, table, and columns, to the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester) to retrieve data not yet persisted to Parquet files.
|
||||
If the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester) has data that meets the criteria (the chunk size is non-zero), then the plan includes `RecordBatchesExec`.
|
||||
|
||||
#### `RecordBatchesExec` attributes
|
||||
|
||||
##### `chunks`
|
||||
|
||||
`chunks` is the number of data chunks from the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester).
|
||||
Often one (`1`), but it can be many.
|
||||
|
||||
##### `projection`
|
||||
|
||||
`projection` specifies a list of columns to read and output.
|
||||
|
||||
`__chunk_order` in a list of columns is an InfluxDB-generated column used to keep the chunks and files ordered for deduplication--for example:
|
||||
|
||||
```text
|
||||
projection=[__chunk_order, city, state, time]
|
||||
```
|
||||
|
||||
For details and other DataFusion `ExecutionPlan` implementations, see [`ExecutionPlan` implementors](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html) in the DataFusion documentation.
|
||||
|
||||
### `SortExec`
|
||||
|
||||
The execution plan for the [`Sort`](#sort) `LogicalPlan`.
|
||||
|
||||
DataFusion [`SortExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/sorts/sort/struct.SortExec.html) supports sorting datasets that are larger than the memory allotted by the memory manager, by spilling to disk.
|
||||
|
||||
### `SortPreservingMergeExec`
|
||||
|
||||
DataFusion [`SortPreservingMergeExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/sorts/sort_preserving_merge/struct.SortPreservingMergeExec.html) takes an input execution plan and a list of sort expressions and, provided each partition of the input plan is sorted with respect to these sort expressions, yields a single partition sorted with respect to them.
|
||||
|
||||
### `UnionExec`
|
||||
|
||||
DataFusion [`UnionExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/union/struct.UnionExec.html) is the `UNION ALL` execution plan for combining multiple inputs that have the same schema.
|
||||
`UnionExec` concatenates the partitions and does not mix or copy data within or across partitions.
|
||||
|
||||
## Overlapping data and deduplication
|
||||
|
||||
_Overlapping data_ refers to files or batches in which the time ranges (represented by timestamps) intersect.
|
||||
Two _chunks_ of data overlap if both chunks contain data for the same portion of time.
|
||||
|
||||
### Example of overlapping data
|
||||
|
||||
For example, the following chunks represent line protocol written to InfluxDB:
|
||||
|
||||
```text
|
||||
// Chunk 4: stored parquet file
|
||||
// - time range: 400-600
|
||||
// - no duplicates in its own chunk
|
||||
// - overlaps chunk 3
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600",
|
||||
"h2o,state=CA,city=SJ min_temp=69.5,max_temp=89.2 600", // duplicates row 3 in chunk 5
|
||||
"h2o,state=MA,city=Bedford max_temp=80.75,area=742u 400", // overlaps chunk 3
|
||||
"h2o,state=MA,city=Boston min_temp=65.40,max_temp=82.67 400", // overlaps chunk 3
|
||||
],
|
||||
|
||||
// Chunk 5: Ingester data
|
||||
// - time range: 550-700
|
||||
// - overlaps & duplicates data in chunk 4
|
||||
[
|
||||
"h2o,state=MA,city=Bedford max_temp=88.75,area=742u 600", // overlaps chunk 4
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 650",
|
||||
"h2o,state=CA,city=SJ min_temp=68.5,max_temp=90.0 600", // duplicates row 2 in chunk 4
|
||||
"h2o,state=CA,city=SJ min_temp=75.5,max_temp=84.08 700",
|
||||
"h2o,state=MA,city=Boston min_temp=67.4 550", // overlaps chunk 4
|
||||
]
|
||||
```
|
||||
|
||||
- `Chunk 4` spans the time range `400-600` and represents data persisted to a Parquet file in the [Object store](/influxdb/cloud-dedicated/reference/internals/storage-engine/#object-store).
|
||||
- `Chunk 5` spans the time range `550-700` and represents yet-to-be persisted data from the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester).
|
||||
- The chunks overlap the range `550-600`.
|
||||
|
||||
If data overlaps at query time, the [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) must include the _deduplication_ process in the query plan, which uses the same multi-column sort-merge operators used by the [Ingester](/influxdb/cloud-dedicated/reference/internals/storage-engine/#ingester).
|
||||
Compared to an ingestion plan that uses sort-merge operators, a query plan is more complex and ensures that data streams through the plan after deduplication.
|
||||
|
||||
Because sort-merge operations used in deduplication have a non-trivial execution cost, InfluxDB v3 tries to avoid the need for deduplication.
|
||||
Due to how InfluxDB organizes data, a Parquet file never contains duplicates of the data it stores; only overlapped data can contain duplicates.
|
||||
During compaction, the [Compactor](/influxdb/cloud-dedicated/reference/internals/storage-engine/#compactor) sorts stored data to reduce overlaps and optimize query performance.
|
||||
For data that doesn't have overlaps, the [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) doesn't need to include the deduplication process and the query plan can further distribute non-overlapping data for parallel processing.
|
||||
|
||||
## DataFusion query plans
|
||||
|
||||
For more information about DataFusion query plans and the DataFusion API used in InfluxDB v3, see the following:
|
||||
|
||||
- [Query Planning and Execution Overview](https://docs.rs/datafusion/latest/datafusion/index.html#query-planning-and-execution-overview) in the DataFusion documentation.
|
||||
- [Plan representations](https://docs.rs/datafusion/latest/datafusion/#plan-representations) in the DataFusion documentation.
|
|
@ -1,31 +1,41 @@
|
|||
---
|
||||
title: EXPLAIN command
|
||||
description: >
|
||||
The `EXPLAIN` command shows the logical and physical execution plan for the
|
||||
specified SQL statement.
|
||||
The `EXPLAIN` command returns the logical and physical execution plans for the specified SQL statement.
|
||||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
name: EXPLAIN command
|
||||
parent: SQL reference
|
||||
weight: 207
|
||||
related:
|
||||
- /influxdb/cloud-dedicated/reference/internals/query-plan/
|
||||
- /influxdb/cloud-dedicated/query-data/execute-queries/analyze-query-plan/
|
||||
- /influxdb/cloud-dedicated/query-data/execute-queries/troubleshoot/
|
||||
---
|
||||
|
||||
The `EXPLAIN` command returns the logical and physical execution plan for the
|
||||
The `EXPLAIN` command returns the [logical plan](/influxdb/cloud-dedicated/reference/internals/query-plan/#logical-plan) and the [physical plan](/influxdb/cloud-dedicated/reference/internals/query-plan/#physical-plan) for the
|
||||
specified SQL statement.
|
||||
|
||||
```sql
|
||||
EXPLAIN [ANALYZE] [VERBOSE] statement
|
||||
```
|
||||
|
||||
- [EXPLAIN](#explain)
|
||||
- [EXPLAIN ANALYZE](#explain-analyze)
|
||||
- [`EXPLAIN`](#explain)
|
||||
- [Example `EXPLAIN`](#example-explain)
|
||||
- [`EXPLAIN ANALYZE`](#explain-analyze)
|
||||
- [Example `EXPLAIN ANALYZE`](#example-explain-analyze)
|
||||
- [`EXPLAIN ANALYZE VERBOSE`](#explain-analyze-verbose)
|
||||
- [Example `EXPLAIN ANALYZE VERBOSE`](#example-explain-analyze-verbose)
|
||||
|
||||
## EXPLAIN
|
||||
## `EXPLAIN`
|
||||
|
||||
Returns the execution plan of a statement.
|
||||
Returns the logical plan and physical (execution) plan of a statement.
|
||||
To output more details, use `EXPLAIN VERBOSE`.
|
||||
|
||||
##### Example EXPLAIN ANALYZE
|
||||
`EXPLAIN` doesn't execute the statement.
|
||||
To execute the statement and view runtime metrics, use [`EXPLAIN ANALYZE`](#explain-analyze).
|
||||
|
||||
### Example `EXPLAIN`
|
||||
|
||||
```sql
|
||||
EXPLAIN
|
||||
|
@ -39,20 +49,30 @@ GROUP BY room
|
|||
{{< expand-wrapper >}}
|
||||
{{% expand "View `EXPLAIN` example output" %}}
|
||||
|
||||
| plan_type | plan |
|
||||
| :------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| logical_plan | Projection: home.room, AVG(home.temp) AS temp Aggregate: groupBy=[[home.room]], aggr=[[AVG(home.temp)]] TableScan: home projection=[room, temp] |
|
||||
| physical_plan | ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp] AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)] CoalesceBatchesExec: target_batch_size=8192 RepartitionExec: partitioning=Hash([Column { name: "room", index: 0 }], 4), input_partitions=4 RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)] ParquetExec: limit=None, partitions={1 group: [[136/316/1120/1ede0031-e86e-06e5-12ba-b8e6fd76a202.parquet]]}, projection=[room, temp] |
|
||||
| | plan_type | plan |
|
||||
|---:|:--------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| 0 | logical_plan |<span style="white-space:pre-wrap;"> Projection: home.room, AVG(home.temp) AS temp </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> Aggregate: groupBy=[[home.room]], aggr=[[AVG(home.temp)]] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> TableScan: home projection=[room, temp] </span>|
|
||||
| 1 | physical_plan |<span style="white-space:pre-wrap;"> ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> CoalesceBatchesExec: target_batch_size=8192 </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> RepartitionExec: partitioning=Hash([room@0], 8), input_partitions=8 </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> ParquetExec: file_groups={8 groups: [[70434/116281/404d73cea0236530ea94f5470701eb814a8f0565c0e4bef5a2d2e33dfbfc3567/1be334e8-0af8-00da-2615-f67cd4be90f7.parquet, 70434/116281/b7a9e7c57fbfc3bba9427e4b3e35c89e001e2e618b0c7eb9feb4d50a3932f4db/d29370d4-262f-0d32-2459-fe7b099f682f.parquet], [70434/116281/c14418ba28a22a3abb693a1cb326a63b62dc611aec58c9bed438fdafd3bc5882/8b29ae98-761f-0550-2fe4-ee77503658e9.parquet], [70434/116281/fa677477eed622ae8123da1251aa7c351f801e2ee2f0bc28c0fe3002a30b3563/65bb4dc3-04e1-0e02-107a-90cee83c51b0.parquet], [70434/116281/db162bdd30261019960dd70da182e6ebd270284569ecfb5deffea7e65baa0df9/2505e079-67c5-06d9-3ede-89aca542dd18.parquet], [70434/116281/0c025dcccae8691f5fd70b0f131eea4ca6fafb95a02f90a3dc7bb015efd3ab4f/3f3e44c3-b71e-0ca4-3dc7-8b2f75b9ff86.parquet], ...]}, projection=[room, temp] </span>|
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## EXPLAIN ANALYZE
|
||||
## `EXPLAIN ANALYZE`
|
||||
|
||||
Returns the execution plan and metrics of a statement.
|
||||
To output more information, use `EXPLAIN ANALYZE VERBOSE`.
|
||||
Executes a statement and returns the execution plan and runtime metrics of the statement.
|
||||
The report includes the [logical plan](/influxdb/cloud-dedicated/reference/internals/query-plan/#logical-plan) and the [physical plan](/influxdb/cloud-dedicated/reference/internals/query-plan/#physical-plan) annotated with execution counters, number of rows produced, and runtime metrics sampled during the query execution.
|
||||
|
||||
##### Example EXPLAIN ANALYZE
|
||||
If the plan requires reading lots of data files, `EXPLAIN` and `EXPLAIN ANALYZE` may truncate the list of files in the report.
|
||||
To output more information, including intermediate plans and paths for all scanned Parquet files, use [`EXPLAIN ANALYZE VERBOSE`](#explain-analyze-verbose).
|
||||
|
||||
### Example `EXPLAIN ANALYZE`
|
||||
|
||||
```sql
|
||||
EXPLAIN ANALYZE
|
||||
|
@ -60,15 +80,44 @@ SELECT
|
|||
room,
|
||||
avg(temp) AS temp
|
||||
FROM home
|
||||
WHERE time >= '2023-01-01' AND time <= '2023-12-31'
|
||||
GROUP BY room
|
||||
```
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View `EXPLAIN ANALYZE` example output" %}}
|
||||
|
||||
| plan_type | plan |
|
||||
| :---------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| Plan with Metrics | CoalescePartitionsExec, metrics=[output_rows=2, elapsed_compute=8.892µs, spill_count=0, spilled_bytes=0, mem_used=0] ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp], metrics=[output_rows=2, elapsed_compute=3.608µs, spill_count=0, spilled_bytes=0, mem_used=0] AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)], metrics=[output_rows=2, elapsed_compute=121.771µs, spill_count=0, spilled_bytes=0, mem_used=0] CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=2, elapsed_compute=23.711µs, spill_count=0, spilled_bytes=0, mem_used=0] RepartitionExec: partitioning=Hash([Column { name: "room", index: 0 }], 4), input_partitions=4, metrics=[repart_time=25.117µs, fetch_time=1.614597ms, send_time=6.705µs] RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1, metrics=[repart_time=1ns, fetch_time=319.754µs, send_time=2.067µs] AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)], metrics=[output_rows=2, elapsed_compute=75.615µs, spill_count=0, spilled_bytes=0, mem_used=0] ParquetExec: limit=None, partitions={1 group: [[136/316/1120/1ede0031-e86e-06e5-12ba-b8e6fd76a202.parquet]]}, projection=[room, temp], metrics=[output_rows=26, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, pushdown_rows_filtered=0, bytes_scanned=290, row_groups_pruned=0, num_predicate_creation_errors=0, predicate_evaluation_errors=0, page_index_rows_filtered=0, time_elapsed_opening=100.37µs, page_index_eval_time=2ns, time_elapsed_scanning_total=157.086µs, time_elapsed_processing=226.644µs, pushdown_eval_time=2ns, time_elapsed_scanning_until_data=116.875µs] |
|
||||
| | plan_type | plan |
|
||||
|---:|:------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| 0 | Plan with Metrics |<span style="white-space:pre-wrap;"> ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp], metrics=[output_rows=2, elapsed_compute=4.768µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)], ordering_mode=Sorted, metrics=[output_rows=2, elapsed_compute=140.405µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=2, elapsed_compute=6.821µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> RepartitionExec: partitioning=Hash([room@0], 8), input_partitions=8, preserve_order=true, sort_exprs=room@0 ASC, metrics=[output_rows=2, elapsed_compute=18.408µs, repart_time=59.698µs, fetch_time=1.057882762s, send_time=5.83µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)], ordering_mode=Sorted, metrics=[output_rows=2, elapsed_compute=137.577µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=6, preserve_order=true, sort_exprs=room@0 ASC, metrics=[output_rows=46, elapsed_compute=26.637µs, repart_time=6ns, fetch_time=399.971411ms, send_time=6.658µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> ProjectionExec: expr=[room@0 as room, temp@2 as temp], metrics=[output_rows=46, elapsed_compute=3.102µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=46, elapsed_compute=25.585µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> FilterExec: time@1 >= 1672531200000000000 AND time@1 <= 1703980800000000000, metrics=[output_rows=46, elapsed_compute=26.51µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> ParquetExec: file_groups={6 groups: [[70434/116281/404d73cea0236530ea94f5470701eb814a8f0565c0e4bef5a2d2e33dfbfc3567/1be334e8-0af8-00da-2615-f67cd4be90f7.parquet], [70434/116281/c14418ba28a22a3abb693a1cb326a63b62dc611aec58c9bed438fdafd3bc5882/8b29ae98-761f-0550-2fe4-ee77503658e9.parquet], [70434/116281/fa677477eed622ae8123da1251aa7c351f801e2ee2f0bc28c0fe3002a30b3563/65bb4dc3-04e1-0e02-107a-90cee83c51b0.parquet], [70434/116281/db162bdd30261019960dd70da182e6ebd270284569ecfb5deffea7e65baa0df9/2505e079-67c5-06d9-3ede-89aca542dd18.parquet], [70434/116281/0c025dcccae8691f5fd70b0f131eea4ca6fafb95a02f90a3dc7bb015efd3ab4f/3f3e44c3-b71e-0ca4-3dc7-8b2f75b9ff86.parquet], ...]}, projection=[room, time, temp], output_ordering=[room@0 ASC, time@1 ASC], predicate=time@6 >= 1672531200000000000 AND time@6 <= 1703980800000000000, pruning_predicate=time_max@0 >= 1672531200000000000 AND time_min@1 <= 1703980800000000000, required_guarantees=[], metrics=[output_rows=46, elapsed_compute=6ns, predicate_evaluation_errors=0, bytes_scanned=3279, row_groups_pruned_statistics=0, file_open_errors=0, file_scan_errors=0, pushdown_rows_filtered=0, num_predicate_creation_errors=0, row_groups_pruned_bloom_filter=0, page_index_rows_filtered=0, time_elapsed_opening=398.462968ms, time_elapsed_processing=1.626106ms, time_elapsed_scanning_total=1.36822ms, page_index_eval_time=33.474µs, pushdown_eval_time=14.267µs, time_elapsed_scanning_until_data=1.27694ms] </span>|
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## `EXPLAIN ANALYZE VERBOSE`
|
||||
|
||||
Executes a statement and returns the execution plan, runtime metrics, and additional details helpful for debugging the statement.
|
||||
|
||||
The report includes the following:
|
||||
|
||||
- the [logical plan](/influxdb/cloud-dedicated/reference/internals/query-plan/#logical-plan)
|
||||
- the [physical plan](/influxdb/cloud-dedicated/reference/internals/query-plan/#physical-plan) annotated with execution counters, number of rows produced, and runtime metrics sampled during the query execution
|
||||
- Information truncated in the `EXPLAIN` report--for example, the paths for all [Parquet files retrieved for the query](/influxdb/cloud-dedicated/reference/internals/query-plan/#file_groups).
|
||||
- All intermediate physical plans that DataFusion and the [Querier](/influxdb/cloud-dedicated/reference/internals/storage-engine/#querier) generate before generating the final physical plan--helpful in debugging to see when an [`ExecutionPlan` node](/influxdb/cloud-dedicated/reference/internals/query-plan/#executionplan-nodes) is added or removed, and how InfluxDB optimizes the query.
|
||||
|
||||
### Example `EXPLAIN ANALYZE VERBOSE`
|
||||
|
||||
```SQL
|
||||
EXPLAIN ANALYZE VERBOSE SELECT temp FROM home
|
||||
WHERE time >= now() - INTERVAL '7 days' AND room = 'Kitchen'
|
||||
ORDER BY time
|
||||
```
|
||||
|
|
|
@ -13,7 +13,6 @@ menu:
|
|||
Use the following guidelines to design your [schema](/influxdb/cloud-dedicated/reference/glossary/#schema)
|
||||
for simpler and more performant queries.
|
||||
|
||||
<!-- TOC -->
|
||||
|
||||
- [InfluxDB data structure](#influxdb-data-structure)
|
||||
- [Primary keys](#primary-keys)
|
||||
|
@ -23,16 +22,13 @@ for simpler and more performant queries.
|
|||
- [Measurements can contain up to 250 columns](#measurements-can-contain-up-to-250-columns)
|
||||
- [Design for performance](#design-for-performance)
|
||||
- [Avoid wide schemas](#avoid-wide-schemas)
|
||||
- [Avoid too many tags](#avoid-too-many-tags)
|
||||
- [Avoid sparse schemas](#avoid-sparse-schemas)
|
||||
- [Writing individual fields with different timestamps](#writing-individual-fields-with-different-timestamps)
|
||||
- [Measurement schemas should be homogenous](#measurement-schemas-should-be-homogenous)
|
||||
- [Use the best data type for your data](#use-the-best-data-type-for-your-data)
|
||||
- [Design for query simplicity](#design-for-query-simplicity)
|
||||
- [Keep measurement names, tags, and fields simple](#keep-measurement-names-tags-and-fields-simple)
|
||||
- [Avoid keywords and special characters](#avoid-keywords-and-special-characters)
|
||||
|
||||
<!-- TOC -->
|
||||
|
||||
## InfluxDB data structure
|
||||
|
||||
The InfluxDB data model organizes time series data into buckets and measurements.
|
||||
|
@ -120,6 +116,7 @@ The following guidelines help to optimize query performance:
|
|||
- [Avoid wide schemas](#avoid-wide-schemas)
|
||||
- [Avoid sparse schemas](#avoid-sparse-schemas)
|
||||
- [Measurement schemas should be homogenous](#measurement-schemas-should-be-homogenous)
|
||||
- [Use the best data type for your data](#use-the-best-data-type-for-your-data)
|
||||
|
||||
|
||||
### Avoid wide schemas
|
||||
|
@ -208,7 +205,7 @@ different sources and each source returns data with different tag and field sets
|
|||
{{% /flex-content %}}
|
||||
{{< /flex >}}
|
||||
|
||||
These sets of data written to the same measurement will result in a measurement
|
||||
These sets of data written to the same measurement result in a measurement
|
||||
full of null values (also known as a _sparse schema_):
|
||||
|
||||
| time | source | src | code | currency | crypto | price | cost | volume |
|
||||
|
@ -225,6 +222,12 @@ full of null values (also known as a _sparse schema_):
|
|||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
### Use the best data type for your data
|
||||
|
||||
When writing data to a field, use the most appropriate [data type](/influxdb/cloud-dedicated/reference/glossary/#data-type) for your data--write integers as integers, decimals as floats, and booleans as booleans.
|
||||
A query against a field that stores integers outperforms a query against string data;
|
||||
querying over many long string values can negatively affect performance.
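For example, in line protocol the field-value syntax determines the data type--the following hypothetical point writes a float, an integer, a boolean, and a string field:

```text
home,room=Kitchen temp=22.3,co=5i,door_open=false,status="ok" 1672531200000000000
```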
|
||||
|
||||
## Design for query simplicity
|
||||
|
||||
Naming conventions for measurements, tag keys, and field keys can simplify or
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
StylesPath = "../../../.ci/vale/styles"
|
||||
|
||||
Vocab = InfluxData, Cloud-Serverless
|
||||
Vocab = Cloud-Serverless
|
||||
|
||||
MinAlertLevel = warning
|
||||
|
||||
|
|
|
@ -1,105 +0,0 @@
|
|||
---
|
||||
title: Optimize queries
|
||||
description: >
|
||||
Optimize your SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
|
||||
weight: 401
|
||||
menu:
|
||||
influxdb_cloud_serverless:
|
||||
name: Optimize queries
|
||||
parent: Query data
|
||||
influxdb/cloud-serverless/tags: [query, sql, influxql]
|
||||
related:
|
||||
- /influxdb/cloud-serverless/query-data/sql/
|
||||
- /influxdb/cloud-serverless/query-data/influxql/
|
||||
- /influxdb/cloud-serverless/query-data/execute-queries/troubleshoot/
|
||||
- /influxdb/cloud-serverless/reference/client-libraries/v3/
|
||||
aliases:
|
||||
- /influxdb/cloud-serverless/query-data/execute-queries/optimize-queries/
|
||||
---
|
||||
|
||||
## Troubleshoot query performance
|
||||
|
||||
Use the following tools to help you identify performance bottlenecks and troubleshoot problems in queries:
|
||||
|
||||
- [Troubleshoot query performance](#troubleshoot-query-performance)
|
||||
- [EXPLAIN and ANALYZE](#explain-and-analyze)
|
||||
- [Enable trace logging](#enable-trace-logging)
|
||||
|
||||
### EXPLAIN and ANALYZE
|
||||
|
||||
To view the query engine's execution plan and metrics for an SQL query, prepend [`EXPLAIN`](/influxdb/cloud-serverless/reference/sql/explain/) or [`EXPLAIN ANALYZE`](/influxdb/cloud-serverless/reference/sql/explain/#explain-analyze) to the query.
|
||||
The report can reveal query bottlenecks such as a large number of table scans or Parquet files, and can help triage the question, "Is the query slow due to the amount of work required or due to a problem with the schema, compactor, etc.?"
|
||||
|
||||
The following example shows how to use the InfluxDB v3 Python client library and pandas to view `EXPLAIN` and `EXPLAIN ANALYZE` results for a query:
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
{{% code-placeholders "BUCKET_NAME|API_TOKEN|APP_REQUEST_ID" %}}
|
||||
```python
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import pandas as pd
|
||||
import tabulate # Required for pandas.to_markdown()
|
||||
|
||||
def explain_and_analyze():
|
||||
print('Use SQL EXPLAIN and ANALYZE to view query plan information.')
|
||||
|
||||
# Instantiate an InfluxDB client.
|
||||
client = InfluxDBClient3(token = f"API_TOKEN",
|
||||
host = f"{{< influxdb/host >}}",
|
||||
database = f"BUCKET_NAME")
|
||||
|
||||
sql_explain = '''EXPLAIN SELECT *
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '90 days'
|
||||
ORDER BY time'''
|
||||
|
||||
table = client.query(sql_explain)
|
||||
df = table.to_pandas()
|
||||
|
||||
sql_explain_analyze = '''EXPLAIN ANALYZE SELECT *
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '90 days'
|
||||
ORDER BY time'''
|
||||
|
||||
table = client.query(sql_explain_analyze)
|
||||
|
||||
# Combine the DataFrames and output the plan information.
|
||||
df = pd.concat([df, table.to_pandas()])
|
||||
|
||||
assert df.shape == (3, 2) and df.columns.to_list() == ['plan_type', 'plan']
|
||||
print(df[['plan_type', 'plan']].to_markdown(index=False))
|
||||
|
||||
client.close()
|
||||
|
||||
explain_and_analyze()
|
||||
```
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`BUCKET_NAME`{{% /code-placeholder-key %}}: the [bucket](/influxdb/cloud-serverless/admin/buckets/) to query
|
||||
- {{% code-placeholder-key %}}`API_TOKEN`{{% /code-placeholder-key %}}: a [token](/influxdb/cloud-serverless/admin/tokens/) with sufficient permissions to the specified database
|
||||
|
||||
The output is similar to the following:
|
||||
|
||||
```markdown
|
||||
| plan_type | plan |
|
||||
|:------------------|:---------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| logical_plan | Sort: home.time ASC NULLS LAST |
|
||||
| | TableScan: home projection=[co, hum, room, sensor, temp, time], full_filters=[home.time >= TimestampNanosecond(1688491380936276013, None)] |
|
||||
| physical_plan | SortExec: expr=[time@5 ASC NULLS LAST] |
|
||||
| | EmptyExec: produce_one_row=false |
|
||||
| Plan with Metrics | SortExec: expr=[time@5 ASC NULLS LAST], metrics=[output_rows=0, elapsed_compute=1ns, spill_count=0, spilled_bytes=0] |
|
||||
| | EmptyExec: produce_one_row=false, metrics=[] |
|
||||
```
|
||||
|
||||
### Enable trace logging
|
||||
|
||||
Customers with an {{% product-name %}} [annual or support contract](https://www.influxdata.com/influxdb-cloud-pricing/) can [contact InfluxData Support](https://support.influxdata.com/) to enable tracing and request help troubleshooting your query.
|
||||
With tracing enabled, InfluxDB Support can trace system processes and analyze log information for a query instance.
|
||||
The tracing system follows the [OpenTelemetry traces](https://opentelemetry.io/docs/concepts/signals/traces/) model for providing observability into a request.
|
|
@ -0,0 +1,23 @@
|
|||
---
|
||||
title: Troubleshoot and optimize queries
|
||||
description: >
|
||||
Troubleshoot errors and optimize performance for SQL and InfluxQL queries in InfluxDB.
|
||||
Use observability tools to view query execution and metrics.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_cloud_serverless:
|
||||
name: Troubleshoot and optimize queries
|
||||
parent: Query data
|
||||
influxdb/cloud-serverless/tags: [query, performance, observability, errors, sql, influxql]
|
||||
related:
|
||||
- /influxdb/cloud-serverless/query-data/sql/
|
||||
- /influxdb/cloud-serverless/query-data/influxql/
|
||||
aliases:
|
||||
- /influxdb/cloud-serverless/query-data/execute-queries/troubleshoot/
|
||||
|
||||
---
|
||||
|
||||
Troubleshoot errors and optimize performance for SQL and InfluxQL queries in {{% product-name %}}.
|
||||
Use observability tools to view query execution and metrics.
|
||||
|
||||
{{< children >}}
|
|
@ -0,0 +1,769 @@
|
|||
---
|
||||
title: Analyze a query plan
|
||||
description: >
|
||||
Learn how to read and analyze a query plan to
|
||||
understand how a query is executed and find performance bottlenecks.
|
||||
weight: 401
|
||||
menu:
|
||||
influxdb_cloud_serverless:
|
||||
name: Analyze a query plan
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/cloud-serverless/tags: [query, sql, influxql, observability, query plan]
|
||||
related:
|
||||
- /influxdb/cloud-serverless/query-data/sql/
|
||||
- /influxdb/cloud-serverless/query-data/influxql/
|
||||
- /influxdb/cloud-serverless/reference/internals/query-plans/
|
||||
---
|
||||
|
||||
Learn how to read and analyze a [query plan](/influxdb/cloud-serverless/reference/glossary/#query-plan) to
|
||||
understand query execution steps and data organization, and find performance bottlenecks.
|
||||
|
||||
When you query InfluxDB v3, the Querier devises a query plan for executing the query.
|
||||
The engine tries to determine the optimal plan for the query structure and data.
|
||||
By learning how to generate and interpret reports for the query plan,
|
||||
you can better understand how the query is executed and identify bottlenecks that affect the performance of your query.
|
||||
|
||||
For example, if the query plan reveals that your query reads a large number of Parquet files,
|
||||
you can then take steps to [optimize your query](/influxdb/cloud-serverless/query-data/optimize-queries/), such as adding filters to read less data.
|
||||
|
||||
- [Use EXPLAIN keywords to view a query plan](#use-explain-keywords-to-view-a-query-plan)
|
||||
- [Read an EXPLAIN report](#read-an-explain-report)
|
||||
- [Read a query plan](#read-a-query-plan)
|
||||
- [Example physical plan for a SELECT - ORDER BY query](#example-physical-plan-for-a-select---order-by-query)
|
||||
- [Example `EXPLAIN` report for an empty result set](#example-explain-report-for-an-empty-result-set)
|
||||
- [Analyze a query plan for leading edge data](#analyze-a-query-plan-for-leading-edge-data)
|
||||
- [Sample data](#sample-data)
|
||||
- [Sample query](#sample-query)
|
||||
- [EXPLAIN report for the leading edge data query](#explain-report-for-the-leading-edge-data-query)
|
||||
- [Locate the physical plan](#locate-the-physical-plan)
|
||||
- [Read the physical plan](#read-the-physical-plan)
|
||||
- [Data scanning nodes (ParquetExec and RecordBatchesExec)](#data-scanning-nodes-parquetexec-and-recordbatchesexec)
|
||||
- [Analyze branch structures](#analyze-branch-structures)
|
||||
|
||||
## Use EXPLAIN keywords to view a query plan
|
||||
|
||||
Use the `EXPLAIN` keyword (and the optional [`ANALYZE`](/influxdb/cloud-serverless/reference/sql/explain/#explain-analyze) and [`VERBOSE`](/influxdb/cloud-serverless/reference/sql/explain/#explain-analyze-verbose) keywords) to view the query plans for a query.
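For example, the following statements show each form of the report. This is a sketch that assumes the `home` table used in the example below; adjust the query to match your own schema:

```sql
-- View the logical and physical plans without executing the query
EXPLAIN SELECT temp FROM home WHERE room = 'Kitchen';

-- Execute the query and include runtime metrics for each plan node
EXPLAIN ANALYZE SELECT temp FROM home WHERE room = 'Kitchen';

-- Include more detail in the report
EXPLAIN ANALYZE VERBOSE SELECT temp FROM home WHERE room = 'Kitchen';
```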
|
||||
|
||||
{{% expand-wrapper %}}
|
||||
{{% expand "Use Python and pandas to view an EXPLAIN report" %}}
|
||||
|
||||
The following example shows how to use the InfluxDB v3 Python client library and pandas to view the `EXPLAIN` report for a query:
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
{{% code-placeholders "(BUCKET_NAME|TOKEN)" %}}
|
||||
|
||||
```python
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import pandas as pd
|
||||
import tabulate # Required for pandas.to_markdown()
|
||||
|
||||
# Instantiate an InfluxDB client.
|
||||
client = InfluxDBClient3(token="TOKEN",
                         host="{{< influxdb/host >}}",
                         database="BUCKET_NAME")
|
||||
|
||||
sql_explain = '''EXPLAIN
|
||||
SELECT temp
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '7 days'
|
||||
AND room = 'Kitchen'
|
||||
ORDER BY time'''
|
||||
|
||||
table = client.query(sql_explain)
|
||||
df = table.to_pandas()
|
||||
print(df.to_markdown(index=False))
|
||||
|
||||
assert df.shape == (2, 2), f'Expect {df.shape} to have 2 columns, 2 rows'
|
||||
assert 'physical_plan' in df.plan_type.values, "Expect physical_plan"
|
||||
assert 'logical_plan' in df.plan_type.values, "Expect logical_plan"
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`BUCKET_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} bucket
|
||||
- {{% code-placeholder-key %}}`TOKEN`{{% /code-placeholder-key %}}: a [token](/influxdb/cloud-serverless/admin/tokens/) with sufficient permissions to the specified bucket
|
||||
|
||||
{{% /expand %}}
|
||||
{{% /expand-wrapper %}}
|
||||
|
||||
## Read an EXPLAIN report
|
||||
|
||||
When you [use `EXPLAIN` keywords to view a query plan](#use-explain-keywords-to-view-a-query-plan), the report contains the following:
|
||||
|
||||
- two columns: `plan_type` and `plan`
|
||||
- one row for the [logical plan](/influxdb/cloud-serverless/reference/internals/query-plans/#logical-plan) (`logical_plan`)
|
||||
- one row for the [physical plan](/influxdb/cloud-serverless/reference/internals/query-plans/#physical-plan) (`physical_plan`)
|
||||
|
||||
## Read a query plan
|
||||
|
||||
Plans are in _tree format_--each plan is an upside-down tree in which
|
||||
execution and data flow from _leaf nodes_, the innermost steps in the plan, to outer _branch nodes_.
|
||||
Whether reading a logical or physical plan, keep the following in mind:
|
||||
|
||||
- Start at the _leaf nodes_ and read upward.
|
||||
- At the top of the plan, the _root node_ represents the final, encompassing execution step.
|
||||
|
||||
In a [physical plan](/influxdb/cloud-serverless/reference/internals/query-plans/#physical-plan), each step is an [`ExecutionPlan` node](/influxdb/cloud-serverless/reference/internals/query-plans/#executionplan-nodes) that receives expressions for input data and output requirements, and computes a partition of data.
|
||||
|
||||
Use the following steps to analyze a query plan and estimate how much work is required to complete the query.
|
||||
The same steps apply regardless of how large or complex the plan might seem.
|
||||
|
||||
1. Start from the furthest indented steps (the _leaf nodes_), and read upward.
|
||||
2. Understand the job of each [`ExecutionPlan` node](/influxdb/cloud-serverless/reference/internals/query-plans/#executionplan-nodes)--for example, a [`UnionExec`](/influxdb/cloud-serverless/reference/internals/query-plans/#unionexec) node encompassing the leaf nodes means that the `UnionExec` concatenates the output of all the leaves.
|
||||
3. For each expression, answer the following questions:
|
||||
- What is the shape and size of data input to the plan?
|
||||
- What is the shape and size of data output from the plan?
|
||||
|
||||
The remainder of this guide walks you through analyzing a physical plan.
|
||||
Understanding the sequence, role, input, and output of nodes in your query plan can help you estimate the overall workload and find potential bottlenecks in the query.
|
||||
|
||||
### Example physical plan for a SELECT - ORDER BY query
|
||||
|
||||
The following example shows how to read an `EXPLAIN` report and a physical query plan.
|
||||
|
||||
Given `h2o` measurement data and the following query:
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;
|
||||
```
|
||||
|
||||
The output is similar to the following:
|
||||
|
||||
#### EXPLAIN report
|
||||
|
||||
```sql
|
||||
| plan_type | plan |
|
||||
+---------------+--------------------------------------------------------------------------+
|
||||
| logical_plan | Sort: h2o.city ASC NULLS LAST, h2o.time DESC NULLS FIRST |
|
||||
| | TableScan: h2o projection=[city, min_temp, time] |
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | UnionExec |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
Output from `EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;`
|
||||
{{% /caption %}}
|
||||
|
||||
Each step, or _node_, in the physical plan is an `ExecutionPlan` name and the key-value _expressions_ that contain relevant parts of the query--for example, the following leaf node in the [`EXPLAIN` report](#explain-report) physical plan is a `ParquetExec` execution plan:
|
||||
|
||||
```text
|
||||
ParquetExec: file_groups={...}, projection=[city, min_temp, time]
|
||||
```
|
||||
|
||||
Because `ParquetExec` and `RecordBatchesExec` nodes retrieve and scan data in InfluxDB queries, every query plan starts with one or more of these nodes.
|
||||
|
||||
#### Physical plan data flow
|
||||
|
||||
Data flows _up_ in a query plan.
|
||||
|
||||
The following diagram shows the data flow and sequence of nodes in the [`EXPLAIN` report](#explain-report) physical plan:
|
||||
|
||||
<!-- BEGIN Query plan diagram -->
|
||||
{{< html-diagram/query-plan >}}
|
||||
<!-- END Query plan diagram -->
|
||||
|
||||
{{% caption %}}
|
||||
Execution and data flow in the [`EXPLAIN` report](#explain-report) physical plan.
|
||||
`ParquetExec` nodes execute in parallel and `UnionExec` combines their output.
|
||||
{{% /caption %}}
|
||||
|
||||
The following steps summarize the [physical plan execution and data flow](#physical-plan-data-flow):
|
||||
|
||||
1. Two `ParquetExec` plans, in parallel, read data from Parquet files:
|
||||
- Each `ParquetExec` node processes one or more _file groups_.
|
||||
- Each file group contains one or more Parquet file paths.
|
||||
- A `ParquetExec` node processes its groups in parallel, reading each group's files sequentially.
|
||||
- The output is a stream of data to the corresponding `SortExec` node.
|
||||
2. The `SortExec` nodes, in parallel, sort the data by `city` (ascending) and `time` (descending). Sorting is required by the `SortPreservingMergeExec` plan.
|
||||
3. The `UnionExec` node concatenates the streams to union the output of the parallel `SortExec` nodes.
|
||||
4. The `SortPreservingMergeExec` node merges the previously sorted and unioned data from `UnionExec`.
|
||||
|
||||
### Example `EXPLAIN` report for an empty result set
|
||||
|
||||
If your table doesn't contain data for the time range in your query, the physical plan starts with an `EmptyExec` leaf node--for example:
|
||||
|
||||
{{% code-callout "EmptyExec"%}}
|
||||
|
||||
```sql
|
||||
ProjectionExec: expr=[temp@0 as temp]
|
||||
SortExec: expr=[time@1 ASC NULLS LAST]
|
||||
EmptyExec: produce_one_row=false
|
||||
```
|
||||
|
||||
{{% /code-callout %}}
|
||||
|
||||
## Analyze a query plan for leading edge data
|
||||
|
||||
The following sections guide you through analyzing a physical query plan for a typical time series use case--aggregating recently written (_leading edge_) data.
|
||||
Although the query and plan are more complex than in the [preceding example](#example-physical-plan-for-a-select---order-by-query), you'll follow the same [steps to read the query plan](#read-a-query-plan).
|
||||
After learning how to read the query plan, you'll have an understanding of `ExecutionPlans`, data flow, and potential query bottlenecks.
|
||||
|
||||
### Sample data
|
||||
|
||||
Consider the following `h2o` data, represented as "chunks" of line protocol, written to InfluxDB:
|
||||
|
||||
```text
|
||||
// h2o data
|
||||
// The following data represents 5 batches, or "chunks", of line protocol
|
||||
// written to InfluxDB.
|
||||
// - Chunks 1-4 are ingested and each is persisted to a separate Parquet file in storage.
|
||||
// - Chunk 5 is ingested and not yet persisted to storage.
|
||||
// - Chunks 1 and 2 cover short windows of time that don't overlap times in other chunks.
|
||||
// - Chunks 3 and 4 cover larger windows of time and the time ranges overlap each other.
|
||||
// - Chunk 5 contains the largest time range and overlaps with chunk 4, the Parquet file with the largest time-range.
|
||||
// - In InfluxDB, a chunk never duplicates its own data.
|
||||
//
|
||||
// Chunk 1: stored Parquet file
|
||||
// - time range: 50-249
|
||||
// - no duplicates in its own chunk
|
||||
// - no overlap with any other chunks
|
||||
[
|
||||
"h2o,state=MA,city=Bedford min_temp=71.59 150",
|
||||
"h2o,state=MA,city=Boston min_temp=70.4, 50",
|
||||
"h2o,state=MA,city=Andover max_temp=69.2, 249",
|
||||
],
|
||||
|
||||
// Chunk 2: stored Parquet file
|
||||
// - time range: 250-349
|
||||
// - no duplicates in its own chunk
|
||||
// - no overlap with any other chunks
|
||||
// - adds a new field (area)
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=79.0,max_temp=87.2,area=500u 300",
|
||||
"h2o,state=CA,city=SJ min_temp=75.5,max_temp=84.08 349",
|
||||
"h2o,state=MA,city=Bedford max_temp=78.75,area=742u 300",
|
||||
"h2o,state=MA,city=Boston min_temp=65.4 250",
|
||||
],
|
||||
|
||||
// Chunk 3: stored Parquet file
|
||||
// - time range: 350-500
|
||||
// - no duplicates in its own chunk
|
||||
// - overlaps chunk 4
|
||||
[
|
||||
"h2o,state=CA,city=SJ min_temp=77.0,max_temp=90.7 450",
|
||||
"h2o,state=CA,city=SJ min_temp=69.5,max_temp=88.2 500",
|
||||
"h2o,state=MA,city=Boston min_temp=68.4 350",
|
||||
],
|
||||
|
||||
// Chunk 4: stored Parquet file
|
||||
// - time range: 400-600
|
||||
// - no duplicates in its own chunk
|
||||
// - overlaps chunk 3
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600",
|
||||
"h2o,state=CA,city=SJ min_temp=69.5,max_temp=89.2 600", // duplicates row 3 in chunk 5
|
||||
"h2o,state=MA,city=Bedford max_temp=80.75,area=742u 400", // overlaps chunk 3
|
||||
"h2o,state=MA,city=Boston min_temp=65.40,max_temp=82.67 400", // overlaps chunk 3
|
||||
],
|
||||
|
||||
// Chunk 5: Ingester data
|
||||
// - time range: 550-700
|
||||
// - overlaps and duplicates data in chunk 4
|
||||
[
|
||||
"h2o,state=MA,city=Bedford max_temp=88.75,area=742u 600", // overlaps chunk 4
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 650",
|
||||
"h2o,state=CA,city=SJ min_temp=68.5,max_temp=90.0 600", // duplicates row 2 in chunk 4
|
||||
"h2o,state=CA,city=SJ min_temp=75.5,max_temp=84.08 700",
|
||||
"h2o,state=MA,city=Boston min_temp=67.4 550", // overlaps chunk 4
|
||||
]
|
||||
```
|
||||
|
||||
The following query selects all the data:
|
||||
|
||||
```sql
|
||||
SELECT state, city, min_temp, max_temp, area, time
|
||||
FROM h2o
|
||||
ORDER BY state asc, city asc, time desc;
|
||||
```
|
||||
|
||||
The output is the following:
|
||||
|
||||
```sql
|
||||
+-------+---------+----------+----------+------+--------------------------------+
|
||||
| state | city | min_temp | max_temp | area | time |
|
||||
+-------+---------+----------+----------+------+--------------------------------+
|
||||
| CA | SF | 68.4 | 85.7 | 500 | 1970-01-01T00:00:00.000000650Z |
|
||||
| CA | SF | 68.4 | 85.7 | 500 | 1970-01-01T00:00:00.000000600Z |
|
||||
| CA | SF | 79.0 | 87.2 | 500 | 1970-01-01T00:00:00.000000300Z |
|
||||
| CA | SJ | 75.5 | 84.08 | | 1970-01-01T00:00:00.000000700Z |
|
||||
| CA | SJ | 68.5 | 90.0 | | 1970-01-01T00:00:00.000000600Z |
|
||||
| CA | SJ | 69.5 | 88.2 | | 1970-01-01T00:00:00.000000500Z |
|
||||
| CA | SJ | 77.0 | 90.7 | | 1970-01-01T00:00:00.000000450Z |
|
||||
| CA | SJ | 75.5 | 84.08 | | 1970-01-01T00:00:00.000000349Z |
|
||||
| MA | Andover | | 69.2 | | 1970-01-01T00:00:00.000000249Z |
|
||||
| MA | Bedford | | 88.75 | 742 | 1970-01-01T00:00:00.000000600Z |
|
||||
| MA | Bedford | | 80.75 | 742 | 1970-01-01T00:00:00.000000400Z |
|
||||
| MA | Bedford | | 78.75 | 742 | 1970-01-01T00:00:00.000000300Z |
|
||||
| MA | Bedford | 71.59 | | | 1970-01-01T00:00:00.000000150Z |
|
||||
| MA | Boston | 67.4 | | | 1970-01-01T00:00:00.000000550Z |
|
||||
| MA | Boston | 65.4 | 82.67 | | 1970-01-01T00:00:00.000000400Z |
|
||||
| MA | Boston | 68.4 | | | 1970-01-01T00:00:00.000000350Z |
|
||||
| MA | Boston | 65.4 | | | 1970-01-01T00:00:00.000000250Z |
|
||||
| MA | Boston | 70.4 | | | 1970-01-01T00:00:00.000000050Z |
|
||||
+-------+---------+----------+----------+------+--------------------------------+
|
||||
```
|
||||
|
||||
### Sample query
|
||||
|
||||
The following query selects leading edge data from the [sample data](#sample-data):
|
||||
|
||||
```sql
|
||||
SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
The output is the following:
|
||||
|
||||
```sql
|
||||
+---------+-----------------+
|
||||
| city | COUNT(Int64(1)) |
|
||||
+---------+-----------------+
|
||||
| Andover | 1 |
|
||||
| Bedford | 3 |
|
||||
| Boston | 4 |
|
||||
+---------+-----------------+
|
||||
```
|
||||
|
||||
### EXPLAIN report for the leading edge data query
|
||||
|
||||
The following query generates the `EXPLAIN` report for the preceding [sample query](#sample-query):
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "EXPLAIN report for a leading edge data query" %}}
|
||||
|
||||
```sql
|
||||
| plan_type | plan |
|
||||
| logical_plan | Sort: h2o.city ASC NULLS LAST |
|
||||
| | Aggregate: groupBy=[[h2o.city]], aggr=[[COUNT(Int64(1))]] |
|
||||
| | TableScan: h2o projection=[city], full_filters=[h2o.time >= TimestampNanosecond(200, None), h2o.time < TimestampNanosecond(700, None), h2o.state = Dictionary(Int32, Utf8("MA"))] |
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST] |
|
||||
| | AggregateExec: mode=FinalPartitioned, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | RepartitionExec: partitioning=Hash([city@0], 4), input_partitions=4 |
|
||||
| | AggregateExec: mode=Partial, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=3 |
|
||||
| | UnionExec |
|
||||
| | ProjectionExec: expr=[city@0 as city] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | FilterExec: time@2 >= 200 AND time@2 < 700 AND state@1 = MA |
|
||||
| | ParquetExec: file_groups={2 groups: [[1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/243db601-f3f1-401b-afda-82160d8cc1a8.Parquet], [1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/f5fb7c7d-16ac-49ba-a811-69578d05843f.Parquet]]}, projection=[city, state, time], output_ordering=[state@1 ASC, city@0 ASC, time@2 ASC], predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA, pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3 |
|
||||
| | ProjectionExec: expr=[city@1 as city] |
|
||||
| | DeduplicateExec: [state@2 ASC,city@1 ASC,time@3 ASC] |
|
||||
| | SortPreservingMergeExec: [state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC] |
|
||||
| | UnionExec |
|
||||
| | SortExec: expr=[state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA |
|
||||
| | RecordBatchesExec: chunks=1, projection=[__chunk_order, city, state, time] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA |
|
||||
| | ParquetExec: file_groups={2 groups: [[1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/2cbb3992-4607-494d-82e4-66c480123189.Parquet], [1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/9255eb7f-2b51-427b-9c9b-926199c85bdf.Parquet]]}, projection=[__chunk_order, city, state, time], output_ordering=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC], predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA, pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3 |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
`EXPLAIN` report for a typical leading edge data query
|
||||
{{% /caption %}}
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
The comments in the [sample data](#sample-data) tell you which data chunks _overlap_ or duplicate data in other chunks.
|
||||
Two chunks of data overlap if there are portions of time for which data exists in both chunks.
|
||||
_You'll learn how to [recognize overlapping and duplicate data](#recognize-overlapping-and-duplicate-data) in a query plan later in this guide._
|
||||
|
||||
Unlike the sample data, your data likely doesn't tell you where overlaps or duplicates exist.
|
||||
A physical plan can reveal overlaps and duplicates in your data and how they affect your queries--for example, after learning how to read a physical plan, you might summarize the data scanning steps as follows:
|
||||
|
||||
- Query execution starts with two `ParquetExec` and one `RecordBatchesExec` execution plans that run in parallel.
|
||||
- The first `ParquetExec` node reads two files that don't overlap any other files and don't duplicate data; the files don't require deduplication.
|
||||
- The second `ParquetExec` node reads two files that overlap each other and overlap the ingested data scanned in the `RecordBatchesExec` node; the query plan must include the deduplication process for these nodes before completing the query.
|
||||
|
||||
The remaining sections analyze `ExecutionPlan` node structure and arguments in the example physical plan.
|
||||
The example includes DataFusion and InfluxDB-specific [`ExecutionPlan` nodes](/influxdb/cloud-serverless/reference/internals/query-plans/#executionplan-nodes).
|
||||
|
||||
### Locate the physical plan
|
||||
|
||||
To begin analyzing the physical plan for the query, find the row in the [`EXPLAIN` report](#explain-report-for-the-leading-edge-data-query) where the `plan_type` column has the value `physical_plan`.
|
||||
The `plan` column for the row contains the physical plan.
|
||||
|
||||
### Read the physical plan
|
||||
|
||||
The following sections follow the steps to [read a query plan](#read-a-query-plan) and examine the physical plan nodes and their input and output.
|
||||
|
||||
{{% note %}}
|
||||
To [read the execution flow of a query plan](#read-a-query-plan), always start from the innermost (leaf) nodes and read up toward the outermost _root_ node.
|
||||
{{% /note %}}
|
||||
|
||||
#### Physical plan leaf nodes
|
||||
|
||||
<img src="/img/influxdb/3-0-query-plan-tree.png" alt="Query physical plan leaf node structures" />
|
||||
|
||||
{{% caption %}}
|
||||
Leaf node structures in the physical plan
|
||||
{{% /caption %}}
|
||||
|
||||
### Data scanning nodes (ParquetExec and RecordBatchesExec)
|
||||
|
||||
The [example physical plan](#physical-plan-leaf-nodes) contains three [leaf nodes](#physical-plan-leaf-nodes)--the innermost nodes where the execution flow begins:
|
||||
|
||||
- [`ParquetExec`](/influxdb/cloud-serverless/reference/internals/query-plans/#parquetexec) nodes retrieve and scan data from Parquet files in the [Object store](/influxdb/cloud-serverless/reference/internals/storage-engine/#object-store)
|
||||
- a [`RecordBatchesExec`](/influxdb/cloud-serverless/reference/internals/query-plans/#recordbatchesexec) node retrieves recently written, yet-to-be-persisted data from the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester)
|
||||
|
||||
Because `ParquetExec` and `RecordBatchesExec` retrieve and scan data for a query, every query plan starts with one or more of these nodes.
|
||||
|
||||
The number of `ParquetExec` and `RecordBatchesExec` nodes and their parameter values can tell you which data (and how much) is retrieved for your query, and how efficiently the plan handles the organization (for example, partitioning and deduplication) of your data.
|
||||
|
||||
For convenience, this guide uses the names _ParquetExec_A_ and _ParquetExec_B_ for the `ParquetExec` nodes in the [example physical plan](#physical-plan-leaf-nodes).
|
||||
Reading from the top of the physical plan, **ParquetExec_A** is the first leaf node in the physical plan and **ParquetExec_B** is the last (bottom) leaf node.
|
||||
|
||||
_The names indicate the nodes' locations in the report, not their order of execution._
|
||||
|
||||
- [ParquetExec_A](#parquetexec_a)
|
||||
- [RecordBatchesExec](#recordbatchesexec)
|
||||
- [ParquetExec_B](#parquetexec_b)
|
||||
|
||||
#### ParquetExec_A
|
||||
|
||||
```sql
|
||||
ParquetExec: file_groups={2 groups: [[1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/243db601-f3f1-401b-afda-82160d8cc1a8.Parquet], [1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/f5fb7c7d-16ac-49ba-a811-69578d05843f.Parquet]]}, projection=[city, state, time], output_ordering=[state@1 ASC, city@0 ASC, time@2 ASC], predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA, pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3 |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
ParquetExec_A, the first ParquetExec node
|
||||
{{% /caption %}}
|
||||
|
||||
ParquetExec_A has the following expressions:
|
||||
|
||||
##### `file_groups`
|
||||
|
||||
A _file group_ is a list of files for the operator to read.
|
||||
Files are referenced by path:
|
||||
|
||||
- `1/1/b862a7e9b.../243db601-....parquet`
|
||||
- `1/1/b862a7e9b.../f5fb7c7d-....parquet`
|
||||
|
||||
The path structure represents how your data is organized.
|
||||
You can use the file paths to gather more information about the query--for example:
|
||||
|
||||
- to find file information (for example: size and number of rows) in the catalog
|
||||
- to download the Parquet file from the Object store for debugging
|
||||
- to find how many partitions the query reads
|
||||
|
||||
A path has the following structure:
|
||||
|
||||
```text
|
||||
<namespace_id>/<table_id>/<partition_hash_id>/<uuid_of_the_file>.Parquet
|
||||
1 / 1 /b862a7e9b329ee6a4.../243db601-f3f1-4....Parquet
|
||||
```
|
||||
|
||||
- `namespace_id`: the namespace (database) being queried
|
||||
- `table_id`: the table (measurement) being queried
|
||||
- `partition_hash_id`: the partition this file belongs to.
|
||||
You can count partition IDs to find how many partitions the query reads.
|
||||
- `uuid_of_the_file`: the file identifier.
|
||||
|
||||
`ParquetExec` processes groups in parallel and reads the files in each group sequentially.
|
||||
|
||||
```text
|
||||
file_groups={2 groups: [[1/1/b862a7e9b329ee6a4/243db601....parquet], [1/1/b862a7e9b329ee6a4/f5fb7c7d....parquet]]}
|
||||
```
|
||||
|
||||
- `{2 groups: [[file], [file]]}`: ParquetExec_A receives two groups with one file per group.
|
||||
Therefore, ParquetExec_A reads two files in parallel.
|
||||
|
||||
##### `projection`
|
||||
|
||||
`projection` lists the table columns for the `ExecutionPlan` to read and output.
|
||||
|
||||
```text
|
||||
projection=[city, state, time]
|
||||
```
|
||||
|
||||
- `[city, state, time]`: the [sample data](#sample-data) contains many columns, but the [sample query](#sample-query) requires the Querier to read only three
|
||||
|
||||
##### `output_ordering`
|
||||
|
||||
`output_ordering` specifies the sort order for the `ExecutionPlan` output.
|
||||
The query planner passes the parameter if the output should be ordered and if the planner knows the order.
|
||||
|
||||
```text
|
||||
output_ordering=[state@1 ASC, city@0 ASC, time@2 ASC]
|
||||
```
|
||||
|
||||
When storing data to Parquet files, InfluxDB sorts the data to improve storage compression and query efficiency, and the planner tries to preserve that order for as long as possible.
|
||||
Generally, the `output_ordering` value that `ParquetExec` receives is the ordering (or a subset of the ordering) of stored data.
|
||||
|
||||
_By design, [`RecordBatchesExec`](#recordbatchesexec) data isn't sorted._
|
||||
|
||||
In the example, the planner specifies that ParquetExec_A use the existing sort order `state ASC, city ASC, time ASC` for output.
|
||||
|
||||
{{% note %}}
|
||||
To view the sort order of your stored data, generate an `EXPLAIN` report for a `SELECT ALL` query--for example:
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT * FROM TABLE_NAME WHERE time > now() - interval '1 hour'
|
||||
```
|
||||
|
||||
Reduce the time range if the query returns too much data.
|
||||
{{% /note %}}
|
||||
|
||||
##### `predicate`
|
||||
|
||||
`predicate` is the data filter specified in the query.
|
||||
|
||||
```text
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA
|
||||
```
|
||||
|
||||
##### `pruning_predicate`
|
||||
|
||||
`pruning_predicate` is created from the [`predicate`](#predicate) value and is the predicate actually used for pruning data and files from the chosen partitions.
|
||||
By default, files are filtered by `time`.
|
||||
|
||||
```text
|
||||
pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3
|
||||
```
|
||||
|
||||
_Before the physical plan is generated, an additional `partition pruning` step uses predicates on partitioning columns to prune partitions._
|
||||
|
||||
#### `RecordBatchesExec`
|
||||
|
||||
```sql
|
||||
RecordBatchesExec: chunks=1, projection=[__chunk_order, city, state, time]
|
||||
```
|
||||
|
||||
{{% caption %}}RecordBatchesExec{{% /caption %}}
|
||||
|
||||
[`RecordBatchesExec`](/influxdb/cloud-serverless/reference/internals/query-plans/#recordbatchesexec) is an InfluxDB-specific `ExecutionPlan` implementation that retrieves recently written, yet-to-be-persisted data from the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester).
|
||||
|
||||
In the example, `RecordBatchesExec` contains the following expressions:
|
||||
|
||||
##### `chunks`
|
||||
|
||||
`chunks` is the number of data chunks received from the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester).
|
||||
|
||||
```text
|
||||
chunks=1
|
||||
```
|
||||
|
||||
- `chunks=1`: `RecordBatchesExec` receives one data chunk.
|
||||
|
||||
##### `projection`
|
||||
|
||||
The `projection` list specifies the columns or expressions for the node to read and output.
|
||||
|
||||
```text
|
||||
[__chunk_order, city, state, time]
|
||||
```
|
||||
|
||||
- `__chunk_order`: orders chunks and files for deduplication
|
||||
- `city, state, time`: the same columns specified in [ParquetExec_A `projection`](#projection)
|
||||
|
||||
{{% note %}}
|
||||
The presence of `__chunk_order` in data scanning nodes indicates that data overlaps, and is possibly duplicated, among the nodes.
|
||||
{{% /note %}}
|
||||
|
||||
#### ParquetExec_B
|
||||
|
||||
The bottom leaf node in the [example physical plan](#physical-plan-leaf-nodes) is another `ParquetExec` operator, _ParquetExec_B_.
|
||||
|
||||
##### ParquetExec_B expressions
|
||||
|
||||
```sql
|
||||
ParquetExec:
|
||||
file_groups={2 groups: [[1/1/b862a7e9b.../2cbb3992-....Parquet],
|
||||
[1/1/b862a7e9b.../9255eb7f-....Parquet]]},
|
||||
projection=[__chunk_order, city, state, time],
|
||||
output_ordering=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC],
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA,
|
||||
pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3
|
||||
```
|
||||
|
||||
{{% caption %}}ParquetExec_B, the second ParquetExec{{% /caption %}}
|
||||
|
||||
Because ParquetExec_B has overlaps, the `projection` and `output_ordering` expressions use the `__chunk_order` column used in [`RecordBatchesExec` `projection`](#projection-1).
|
||||
|
||||
{{% note %}}
|
||||
The presence of `__chunk_order` in data scanning nodes indicates that data overlaps, and is possibly duplicated, among the nodes.
|
||||
{{% /note %}}
|
||||
|
||||
The remaining ParquetExec_B expressions are similar to those in [ParquetExec_A](#parquetexec_a).
|
||||
|
||||
##### How a query plan distributes data for scanning
|
||||
|
||||
If you compare [`file_group`](#file_groups) paths in [ParquetExec_A](#parquetexec_a) to those in [ParquetExec_B](#parquetexec_b), you'll notice that both contain files from the same partition:
|
||||
|
||||
{{% code-callout "b862a7e9b329ee6a4..." %}}
|
||||
|
||||
```text
|
||||
1/1/b862a7e9b329ee6a4.../...
|
||||
```
|
||||
|
||||
{{% /code-callout %}}
|
||||
|
||||
The planner may distribute files from the same partition to different scan nodes for several reasons, including optimizations for handling [overlaps](/influxdb/cloud-serverless/reference/internals/query-plans/#overlapping-data-and-deduplication)--for example:
|
||||
|
||||
- to separate non-overlapped files from overlapped files to minimize work required for deduplication (which is the case in this example)
|
||||
- to distribute non-overlapped files to increase parallel execution
|
||||
|
||||
### Analyze branch structures
|
||||
|
||||
After data is output from a data scanning node, it flows up to the next parent (outer) node.
|
||||
|
||||
In the example plan:
|
||||
|
||||
- Each leaf node is the first step in a branch of nodes planned for processing the scanned data.
|
||||
- The three branches execute in parallel.
|
||||
- After the leaf node, each branch contains the following similar node structure:
|
||||
|
||||
```sql
|
||||
...
|
||||
CoalesceBatchesExec: target_batch_size=8192
|
||||
FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA
|
||||
...
|
||||
```
|
||||
|
||||
- `FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA`: filters data for the condition `time@3 >= 200 AND time@3 < 700 AND state@2 = MA` and guarantees that rows that don't match the condition are removed.
|
||||
- `CoalesceBatchesExec: target_batch_size=8192`: combines small batches into larger batches. See the DataFusion `CoalesceBatchesExec` documentation.
|
||||
|
||||
#### Sorting yet-to-be-persisted data
|
||||
|
||||
In the `RecordBatchesExec` branch, the node that follows `CoalesceBatchesExec` is a `SortExec` node:
|
||||
|
||||
```sql
|
||||
SortExec: expr=[state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC]
|
||||
```
|
||||
|
||||
The node uses the specified expression `state ASC, city ASC, time ASC, __chunk_order ASC` to sort the yet-to-be-persisted data.
|
||||
Neither ParquetExec_A nor ParquetExec_B contains a similar node because data in the Object store is already sorted (by the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester) or the [Compactor](/influxdb/cloud-serverless/reference/internals/storage-engine/#compactor)) in the given order; the query plan only needs to sort data that arrives from the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester).
|
||||
|
||||
#### Recognize overlapping and duplicate data
|
||||
|
||||
In the example physical plan, the ParquetExec_B and `RecordBatchesExec` nodes share the following parent nodes:
|
||||
|
||||
```sql
|
||||
...
|
||||
DeduplicateExec: [state@2 ASC,city@1 ASC,time@3 ASC]
|
||||
SortPreservingMergeExec: [state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC]
|
||||
UnionExec
|
||||
...
|
||||
```
|
||||
|
||||
{{% caption %}}Overlapped data node structure{{% /caption %}}
|
||||
|
||||
1. `UnionExec`: unions multiple streams of input data by concatenating the partitions. `UnionExec` doesn't do any merging and is fast to execute.
|
||||
2. `SortPreservingMergeExec: [state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC]`: merges already sorted data; indicates that preceding data (from nodes below it) is already sorted. The output data is a single sorted stream.
|
||||
3. `DeduplicateExec: [state@2 ASC,city@1 ASC,time@3 ASC]`: deduplicates an input stream of sorted data.
|
||||
Because `SortPreservingMergeExec` ensures a single sorted stream, it often, but not always, precedes `DeduplicateExec`.
|
||||
|
||||
A `DeduplicateExec` node indicates that encompassed nodes have [_overlapped data_](/influxdb/cloud-serverless/reference/internals/query-plans/#overlapping-data-and-deduplication)--data in a file or batch have timestamps in the same range as data in another file or batch.
|
||||
Due to how InfluxDB organizes data, data is never duplicated _within_ a file.
|
||||
|
||||
In the example, the `DeduplicateExec` node encompasses ParquetExec_B and the `RecordBatchesExec` node, which indicates that ParquetExec_B [file group](#file_groups) files overlap the yet-to-be-persisted data.
|
||||
|
||||
The following [sample data](#sample-data) excerpt shows overlapping data between a file and Ingester data:
|
||||
|
||||
```text
|
||||
// Chunk 4: stored Parquet file
|
||||
// - time range: 400-600
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600",
|
||||
],
|
||||
|
||||
// Chunk 5: Ingester data
|
||||
// - time range: 550-700
|
||||
// - overlaps and duplicates data in chunk 4
|
||||
[
|
||||
"h2o,state=MA,city=Bedford max_temp=88.75,area=742u 600", // overlaps chunk 4
|
||||
...
|
||||
"h2o,state=MA,city=Boston min_temp=67.4 550", // overlaps chunk 4
|
||||
]
|
||||
```
|
||||
|
||||
If files or ingested data overlap, the Querier must include the `DeduplicateExec` in the query plan to remove any duplicates.
|
||||
`DeduplicateExec` doesn't necessarily indicate that data is duplicated.
|
||||
If a plan reads many files and performs deduplication on all of them, it might be for the following reasons:
|
||||
|
||||
- the files contain duplicate data
|
||||
- the Object store has many small overlapped files that the Compactor hasn't compacted yet. After compaction, your query may perform better because it has fewer files to read
|
||||
- the Compactor isn't keeping up
|
||||
|
||||
A leaf node that doesn't have a `DeduplicateExec` node in its branch doesn't require deduplication and doesn't overlap other files or [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester) data--for example, ParquetExec_A has no overlaps:
|
||||
|
||||
```sql
|
||||
ProjectionExec:...
|
||||
CoalesceBatchesExec:...
|
||||
FilterExec:...
|
||||
ParquetExec:...
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
The absence of a `DeduplicateExec` node means that files don't overlap.
|
||||
{{% /caption %}}
|
||||
|
||||
##### Data scan output
|
||||
|
||||
`ProjectionExec` nodes filter columns so that only the `city` column remains in the output:
|
||||
|
||||
```sql
|
||||
ProjectionExec: expr=[city@0 as city]
|
||||
```
|
||||
|
||||
##### Final processing
|
||||
|
||||
After deduplicating and filtering data in each leaf node, the plan combines the output and then applies aggregation and sorting operators for the final result:
|
||||
|
||||
```sql
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST] |
|
||||
| | AggregateExec: mode=FinalPartitioned, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | RepartitionExec: partitioning=Hash([city@0], 4), input_partitions=4 |
|
||||
| | AggregateExec: mode=Partial, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=3 |
|
||||
| | UnionExec
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
Operator structure for aggregating, sorting, and final output.
|
||||
{{% /caption %}}
|
||||
|
||||
- `UnionExec`: unions data streams. Note that the number of output streams is the same as the number of input streams--the `UnionExec` node is an intermediate step to downstream operators that actually merge or split data streams.
|
||||
- `RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=3`: Splits three input streams into four output streams in round-robin fashion. The plan splits streams to increase parallel execution.
|
||||
- `AggregateExec: mode=Partial, gby=[city@0 as city], aggr=[COUNT(Int64(1))]`: Groups data as specified in the [query](#sample-query): `city, count(1)`.
|
||||
This node aggregates each of the four streams separately, and then outputs four streams, indicated by `mode=Partial`--the data isn't fully aggregated.
|
||||
- `RepartitionExec: partitioning=Hash([city@0], 4), input_partitions=4`: Repartitions data on `Hash([city])` into four streams so that all rows for a given city are in the same stream.
|
||||
- `AggregateExec: mode=FinalPartitioned, gby=[city@0 as city], aggr=[COUNT(Int64(1))]`: Applies the final aggregation (`aggr=[COUNT(Int64(1))]`) to the data. `mode=FinalPartitioned` indicates that the data has already been partitioned (by city) and doesn't need further grouping by `AggregateExec`.
|
||||
- `SortExec: expr=[city@0 ASC NULLS LAST]`: Sorts the four streams of data, each on `city`, as specified in the query.
|
||||
- `SortPreservingMergeExec: [city@0 ASC NULLS LAST]`: Merges and sorts the four sorted streams for the final output.
|
||||
|
||||
In the preceding examples, the `EXPLAIN` report shows the query plan without executing the query.
|
||||
To view runtime metrics, such as execution time for a plan and its operators, use [`EXPLAIN ANALYZE`](/influxdb/cloud-serverless/reference/sql/explain/#explain-analyze) to generate the report, and use [tracing](/influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/optimize-queries/#enable-trace-logging-for-a-query) for further debugging, if necessary.
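For example, the following statement is a sketch of requesting runtime metrics for the preceding [sample query](#sample-query):

```sql
EXPLAIN ANALYZE SELECT city, count(1)
FROM h2o
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
  AND state = 'MA'
GROUP BY city
ORDER BY city ASC;
```

The resulting report includes the physical plan annotated with per-node metrics, such as output row counts and elapsed compute time.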
|
|
@ -6,28 +6,22 @@ weight: 401
|
|||
menu:
|
||||
influxdb_cloud_serverless:
|
||||
name: Understand Flight responses
|
||||
parent: Execute queries
|
||||
influxdb/cloud-serverless/tags: [query, sql, influxql]
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/cloud-serverless/tags: [query, errors, flight]
|
||||
related:
|
||||
- /influxdb/cloud-serverless/query-data/sql/
|
||||
- /influxdb/cloud-serverless/query-data/influxql/
|
||||
- /influxdb/cloud-serverless/reference/client-libraries/v3/
|
||||
---
|
||||
|
||||
Learn how to handle responses and troubleshoot errors encountered when querying {{% product-name %}} with Flight+gRPC and Arrow Flight clients.
|
||||
|
||||
<!-- TOC -->
|
||||
|
||||
- [InfluxDB Flight responses](#influxdb-flight-responses)
|
||||
- [Stream](#stream)
|
||||
- [Schema](#schema)
|
||||
- [Example](#example)
|
||||
- [RecordBatch](#recordbatch)
|
||||
- [InfluxDB status and error codes](#influxdb-status-and-error-codes)
|
||||
- [Troubleshoot errors](#troubleshoot-errors)
|
||||
- [Internal Error: Received RST_STREAM](#internal-error-received-rst_stream)
|
||||
- [Internal Error: stream terminated by RST_STREAM with NO_ERROR](#internal-error-stream-terminated-by-rst_stream-with-no_error)
|
||||
- [Invalid Argument Error: bucket <BUCKET_ID> not found](#invalid-argument-error-bucket-bucket_id-not-found)
|
||||
- [Invalid Argument: Invalid ticket](#invalid-argument-invalid-ticket)
|
||||
- [Unauthenticated: Unauthenticated](#unauthenticated-unauthenticated)
|
||||
- [Unauthenticated: read:<BUCKET_ID> is unauthorized](#unauthenticated-readbucket_id-is-unauthorized)
|
||||
- [FlightUnavailableError: Could not get default pem root certs](#flightunavailableerror-could-not-get-default-pem-root-certs)
|
||||
|
||||
## InfluxDB Flight responses
|
||||
|
||||
|
@ -42,7 +36,7 @@ For example, if you use the [`influxdb3-python` Python client library](/influxdb
|
|||
InfluxDB responds with one of the following:
|
||||
|
||||
- A [stream](#stream) in Arrow IPC streaming format
|
||||
- An [error status code](#influxdb-error-codes) and an optional `details` field that contains the status and a message that describes the error
|
||||
- An [error status code](#influxdb-status-and-error-codes) and an optional `details` field that contains the status and a message that describes the error
|
||||
|
||||
### Stream
|
||||
|
||||
|
@ -81,7 +75,8 @@ SELECT co, delete, hum, room, temp, time
|
|||
|
||||
The Python client library outputs the following schema representation:
|
||||
|
||||
```py
|
||||
<!--pytest.mark.skip-->
|
||||
```python
|
||||
Schema:
|
||||
co: int64
|
||||
-- field metadata --
|
||||
|
@ -128,7 +123,7 @@ In gRPC, every call returns a status object that contains an integer code and a
|
|||
During a request, the gRPC client and server may each return a status--for example:
|
||||
|
||||
- The server fails to process the query; responds with status `internal error` and gRPC status `13`.
|
||||
- The request is missing an API token; the server responds with status `unauthenticated` and gRPC status `16`.
|
||||
- The request is missing a [token](/influxdb/cloud-serverless/admin/tokens/); the server responds with status `unauthenticated` and gRPC status `16`.
|
||||
- The server responds with a stream, but the client loses the connection due to a network failure and returns status `unavailable`.
|
||||
|
||||
gRPC defines the integer [status codes](https://grpc.github.io/grpc/core/status_8h.html) and definitions for servers and clients and
|
||||
|
@ -169,14 +164,13 @@ _For a list of gRPC codes that servers and clients may return, see [Status codes
|
|||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
|
||||
### Troubleshoot errors
|
||||
|
||||
#### Internal Error: Received RST_STREAM
|
||||
|
||||
**Example**:
|
||||
|
||||
```sh
|
||||
```structuredtext
|
||||
Flight returned internal error, with message: Received RST_STREAM with error code 2. gRPC client debug context: UNKNOWN:Error received from peer ipv4:34.196.233.7:443 {grpc_message:"Received RST_STREAM with error code 2"}
|
||||
```
|
||||
|
||||
|
@ -187,12 +181,11 @@ Flight returned internal error, with message: Received RST_STREAM with error cod
|
|||
- Server might have closed the connection due to an internal error.
|
||||
- The client exceeded the server's maximum number of concurrent streams.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
#### Internal Error: stream terminated by RST_STREAM with NO_ERROR
|
||||
|
||||
**Example**:
|
||||
|
||||
<!--pytest.mark.skip-->
|
||||
```sh
|
||||
pyarrow._flight.FlightInternalError: Flight returned internal error, with message: stream terminated by RST_STREAM with error code: NO_ERROR. gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.123.149.45:443 {created_time:"2023-07-26T14:12:44.992317+02:00", grpc_status:13, grpc_message:"stream terminated by RST_STREAM with error code: NO_ERROR"}. Client context: OK
|
||||
```
|
||||
|
@ -203,12 +196,11 @@ pyarrow._flight.FlightInternalError: Flight returned internal error, with messag
|
|||
- Possible network disruption, even if it's temporary.
|
||||
- The server might have reached its maximum capacity or other internal limits.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
#### Invalid Argument Error: bucket <BUCKET_ID> not found
|
||||
|
||||
**Example**:
|
||||
|
||||
<!--pytest.mark.skip-->
|
||||
```sh
|
||||
ArrowInvalid: Flight returned invalid argument error, with message: bucket "otel5" not found. gRPC client debug context: UNKNOWN:Error received from peer ipv4:3.123.149.45:443 {grpc_message:"bucket \"otel5\" not found", grpc_status:3, created_time:"2023-08-09T16:37:30.093946+01:00"}. Client context: IOError: Server never sent a data message. Detail: Internal
|
||||
```
|
||||
|
@ -217,12 +209,11 @@ ArrowInvalid: Flight returned invalid argument error, with message: bucket "otel
|
|||
|
||||
- The specified bucket doesn't exist.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
#### Invalid Argument: Invalid ticket
|
||||
|
||||
**Example**:
|
||||
|
||||
<!--pytest.mark.skip-->
|
||||
```sh
|
||||
pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message: Invalid ticket. Error: Invalid ticket. gRPC client debug context: UNKNOWN:Error received from peer ipv4:54.158.68.83:443 {created_time:"2023-08-31T17:56:42.909129-05:00", grpc_status:3, grpc_message:"Invalid ticket. Error: Invalid ticket"}. Client context: IOError: Server never sent a data message. Detail: Internal
|
||||
```
|
||||
|
@ -232,12 +223,11 @@ pyarrow.lib.ArrowInvalid: Flight returned invalid argument error, with message:
|
|||
- The request is missing the bucket name or some other required metadata value.
|
||||
- The request contains bad query syntax.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
##### Unauthenticated: Unauthenticated
|
||||
#### Unauthenticated: Unauthenticated
|
||||
|
||||
**Example**:
|
||||
|
||||
<!--pytest.mark.skip-->
|
||||
```sh
|
||||
Flight returned unauthenticated error, with message: unauthenticated. gRPC client debug context: UNKNOWN:Error received from peer ipv4:34.196.233.7:443 {grpc_message:"unauthenticated", grpc_status:16, created_time:"2023-08-28T15:38:33.380633-05:00"}. Client context: IOError: Server never sent a data message. Detail: Internal
|
||||
```
|
||||
|
@ -247,12 +237,11 @@ Flight returned unauthenticated error, with message: unauthenticated. gRPC clien
|
|||
- Token is missing from the request.
|
||||
- The specified token doesn't exist for the specified organization.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
#### Unauthenticated: read:<BUCKET_ID> is unauthorized
|
||||
#### Unauthorized: Permission denied
|
||||
|
||||
**Example**:
|
||||
|
||||
<!--pytest.mark.skip-->
|
||||
```sh
|
||||
Flight returned unauthenticated error, with message: read:orgs/28d1f2f565460a6c/buckets/756fa4f8c8ba6913 is unauthorized. gRPC client debug context: UNKNOWN:Error received from peer ipv4:54.174.236.48:443 {grpc_message:"read:orgs/28d1f2f565460a6c/buckets/756fa4f8c8ba6913 is unauthorized", grpc_status:16, created_time:"2023-08-28T15:42:04.462655-05:00"}. Client context: IOError: Server never sent a data message. Detail: Internal
|
||||
```
|
||||
|
@ -261,14 +250,13 @@ Flight returned unauthenticated error, with message: read:orgs/28d1f2f565460a6c/
|
|||
|
||||
- The specified token doesn't have read permission for the specified bucket.
|
||||
|
||||
<!-- END -->
|
||||
|
||||
#### FlightUnavailableError: Could not get default pem root certs
|
||||
|
||||
**Example**:
|
||||
|
||||
If unable to locate a root certificate for _gRPC+TLS_, the Flight client returns errors similar to the following:
|
||||
|
||||
<!--pytest.mark.skip-->
|
||||
```sh
|
||||
UNKNOWN:Failed to load file... filename:"/usr/share/grpc/roots.pem",
|
||||
children:[UNKNOWN:No such file or directory
|
|
@ -0,0 +1,69 @@
|
|||
---
|
||||
title: Optimize queries
|
||||
description: >
|
||||
Optimize queries to improve performance and reduce their memory and compute (CPU) requirements in InfluxDB.
|
||||
Learn how to use observability tools to analyze query execution and view metrics.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_cloud_serverless:
|
||||
name: Optimize queries
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/cloud-serverless/tags: [query, performance, observability, errors, sql, influxql]
|
||||
related:
|
||||
- /influxdb/cloud-serverless/query-data/sql/
|
||||
- /influxdb/cloud-serverless/query-data/influxql/
|
||||
  - /influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/analyze-query-plan/
|
||||
aliases:
|
||||
- /influxdb/cloud-serverless/query-data/execute-queries/optimize-queries/
|
||||
---
|
||||
|
||||
Optimize SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
|
||||
Learn how to use observability tools to analyze query execution and view metrics.
|
||||
|
||||
- [Why is my query slow?](#why-is-my-query-slow)
|
||||
- [Strategies for improving query performance](#strategies-for-improving-query-performance)
|
||||
- [Analyze and troubleshoot queries](#analyze-and-troubleshoot-queries)
|
||||
|
||||
## Why is my query slow?
|
||||
|
||||
Query performance depends on time range and complexity.
|
||||
If a query is slower than you expect, it might be due to the following reasons:
|
||||
|
||||
- It queries data from a large time range.
|
||||
- It includes intensive operations, such as querying many string values, or sorting or re-sorting large amounts of data with `ORDER BY`.
|
||||
|
||||
## Strategies for improving query performance
|
||||
|
||||
The following design strategies generally improve query performance and resource use:
|
||||
|
||||
- Follow [schema design best practices](/influxdb/cloud-serverless/write-data/best-practices/schema-design/) to make querying easier and more performant.
|
||||
- Query only the data you need--for example, include a [`WHERE` clause](/influxdb/cloud-serverless/reference/sql/where/) that filters data by a time range.
|
||||
InfluxDB v3 stores data in a Parquet file for each measurement and day, and retrieves files from the Object store to answer a query.
|
||||
  The smaller the time range in your query, the fewer files InfluxDB needs to retrieve from the Object store (see the example query after this list).
|
||||
- [Downsample data](/influxdb/cloud-serverless/process-data/downsample/) to reduce the amount of data you need to query.
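For example, the following sketch (assuming a `home` table with a `room` tag and a `temp` field, as used elsewhere in this documentation) narrows a query to the data it needs by filtering on a recent time range and a single tag value:

```sql
SELECT room, temp, time
FROM home
WHERE time >= now() - INTERVAL '1 hour'
  AND room = 'Kitchen';
```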
|
||||
|
||||
Some bottlenecks may be out of your control and are the result of a suboptimal execution plan, such as:
|
||||
|
||||
- Applying the same sort (`ORDER BY`) to already sorted data.
|
||||
- Retrieving many Parquet files from the Object store--the same query performs better if it retrieves fewer (though larger) files.
|
||||
- Querying many overlapped Parquet files.
|
||||
- Performing a large number of table scans.
|
||||
|
||||
{{% note %}}
|
||||
#### Analyze query plans to view metrics and recognize bottlenecks
|
||||
|
||||
To view runtime metrics for a query, such as the number of files scanned, use the [`EXPLAIN ANALYZE` keywords](/influxdb/cloud-serverless/reference/sql/explain/#explain-analyze) and learn how to [analyze a query plan](/influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/analyze-query-plan/).
|
||||
{{% /note %}}
|
||||
|
||||
## Analyze and troubleshoot queries
|
||||
|
||||
Use the following tools to analyze and troubleshoot queries and find performance bottlenecks:
|
||||
|
||||
- [Analyze a query plan](/influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/analyze-query-plan/)
|
||||
- [Enable trace logging for a query](#enable-trace-logging-for-a-query)
|
||||
|
||||
### Enable trace logging for a query
|
||||
|
||||
Customers with an {{% product-name %}} [annual or support contract](https://www.influxdata.com/influxdb-cloud-pricing/) can [contact InfluxData Support](https://support.influxdata.com/) to enable tracing and request help troubleshooting your query.
|
||||
With tracing enabled, InfluxData Support can trace system processes and analyze log information for a query instance.
|
||||
The tracing system follows the [OpenTelemetry traces](https://opentelemetry.io/docs/concepts/signals/traces/) model for providing observability into a request.
|
|
@ -0,0 +1,44 @@
|
|||
---
|
||||
title: Troubleshoot queries
|
||||
description: >
|
||||
Troubleshoot SQL and InfluxQL queries in InfluxDB.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_cloud_serverless:
|
||||
name: Troubleshoot queries
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/cloud-serverless/tags: [query, performance, observability, errors, sql, influxql]
|
||||
related:
|
||||
- /influxdb/cloud-serverless/query-data/sql/
|
||||
- /influxdb/cloud-serverless/query-data/influxql/
|
||||
- /influxdb/cloud-serverless/reference/client-libraries/v3/
|
||||
aliases:
|
||||
- /influxdb/cloud-serverless/query-data/execute-queries/troubleshoot/
|
||||
---
|
||||
|
||||
Troubleshoot SQL and InfluxQL queries that return unexpected results.
|
||||
|
||||
- [Why doesn't my query return data?](#why-doesnt-my-query-return-data)
|
||||
- [Optimize slow or expensive queries](#optimize-slow-or-expensive-queries)
|
||||
|
||||
## Why doesn't my query return data?
|
||||
|
||||
If a query doesn't return any data, it might be due to the following:
|
||||
|
||||
- Your data falls outside the time range (or other conditions) in the query--for example, the InfluxQL `SHOW TAG VALUES` command uses a default time range of 1 day.
|
||||
- The query (InfluxDB server) timed out.
|
||||
- The query client timed out.
|
||||
|
||||
If a query times out or returns an error, it might be due to the following:
|
||||
|
||||
- a bad request
|
||||
- a server or network problem
|
||||
- the query requests too much data
|
||||
|
||||
[Understand Arrow Flight responses](/influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/flight-responses/) and error messages for queries.
|
||||
|
||||
## Optimize slow or expensive queries
|
||||
|
||||
If a query is slow or uses too many compute resources, limit the amount of data that it queries.
|
||||
|
||||
See how to [optimize queries](/influxdb/cloud-serverless/query-data/troubleshoot-and-optimize/optimize-queries/) and use tools to view runtime metrics, identify bottlenecks, and debug queries.
|
|
@ -23,7 +23,7 @@ prepend:
|
|||
[**Compare tools you can use**](/influxdb/cloud-serverless/get-started/#tools-to-use) to interact with {{% product-name %}}.
|
||||
---
|
||||
|
||||
Arduino is an open-source hardware and software platform used for building electronics projects.
|
||||
Arduino is an open source hardware and software platform used for building electronics projects.
|
||||
|
||||
The documentation for this client library is available on GitHub.
|
||||
|
||||
|
|
|
@ -22,7 +22,7 @@ prepend:
|
|||
[**Compare tools you can use**](/influxdb/cloud-serverless/get-started/#tools-to-use) to interact with {{% product-name %}}.
|
||||
---
|
||||
|
||||
Kotlin is an open-source programming language that runs on the Java Virtual Machine (JVM).
|
||||
Kotlin is an open source programming language that runs on the Java Virtual Machine (JVM).
|
||||
|
||||
The documentation for this client library is available on GitHub.
|
||||
|
||||
|
|
|
@ -79,7 +79,7 @@ InfluxData typically recommends batch sizes of 5,000-10,000 points.
|
|||
In some use cases, performance may improve with significantly smaller or larger batches.
|
||||
|
||||
Related entries:
|
||||
[line protocol](#line-protocol),
|
||||
[line protocol](#line-protocol-lp),
|
||||
[point](#point)
|
||||
|
||||
### batch size
|
||||
|
@ -270,7 +270,7 @@ Aggregating high resolution data into lower resolution data to preserve disk spa
|
|||
|
||||
### duration
|
||||
|
||||
A data type that represents a duration of time (1s, 1m, 1h, 1d).
|
||||
A data type that represents a duration of time--for example, `1s`, `1m`, `1h`, `1d`.
|
||||
Retention periods are set using durations.
|
||||
|
||||
Related entries:
|
||||
|
@ -347,9 +347,6 @@ Related entries:
|
|||
|
||||
A file block is a fixed-length chunk of data read into memory when requested by an application.
|
||||
|
||||
Related entries:
|
||||
[block](#block)
|
||||
|
||||
### float
|
||||
|
||||
A real number written with a decimal point dividing the integer and fractional parts (`1.0`, `3.14`, `-20.1`).
|
||||
|
@ -413,11 +410,10 @@ Identifiers are tokens that refer to specific database objects such as database
|
|||
names, field keys, measurement names, tag keys, etc.
|
||||
|
||||
Related entries:
|
||||
[database](#database)
|
||||
[database](#database),
|
||||
[field key](#field-key),
|
||||
[measurement](#measurement),
|
||||
[tag key](#tag-key),
|
||||
|
||||
[tag key](#tag-key)
|
||||
|
||||
### influx
|
||||
|
||||
|
@ -438,7 +434,7 @@ and other required processes.
|
|||
|
||||
### InfluxDB
|
||||
|
||||
An open-source time series database (TSDB) developed by InfluxData.
|
||||
An open source time series database (TSDB) developed by InfluxData.
|
||||
Written in Go and optimized for fast, high-availability storage and retrieval of
|
||||
time series data in fields such as operations monitoring, application metrics,
|
||||
Internet of Things sensor data, and real-time analytics.
|
||||
|
@ -478,7 +474,7 @@ Related entries:
|
|||
The IOx (InfluxDB v3) storage engine is a real-time, columnar database optimized for time series
|
||||
data built in Rust on top of [Apache Arrow](https://arrow.apache.org/) and
|
||||
[DataFusion](https://arrow.apache.org/datafusion/user-guide/introduction.html).
|
||||
IOx replaces the [TSM](#tsm) storage engine.
|
||||
IOx replaces the [TSM (Time Structured Merge tree)](#tsm-time-structured-merge-tree) storage engine.
|
||||
|
||||
## J
|
||||
|
||||
|
@ -508,11 +504,13 @@ and array data types.
|
|||
### keyword
|
||||
|
||||
A keyword is reserved by a program because it has special meaning.
|
||||
Every programming language has a set of keywords (reserved names) that cannot be used as an identifier.
|
||||
Every programming language has a set of keywords (reserved names) that cannot be used as identifiers--for example,
|
||||
you can't use `SELECT` (an SQL keyword) as a variable name in an SQL query.
|
||||
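As a minimal sketch (assuming a table with a column named `select`), double-quote the name so the parser treats it as an identifier instead of the keyword:

```sql
-- Double quotes mark "select" as a column identifier, not the SQL keyword
SELECT "select" FROM example_table;
```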
|
||||
See a list of [SQL keywords](/influxdb/cloud-serverless/reference/sql/#keywords).
|
||||
See keyword lists:
|
||||
|
||||
<!-- TODO: Add a link to InfluxQL keywords -->
|
||||
- [SQL keywords](/influxdb/cloud-serverless/reference/sql/#keywords)
|
||||
- [InfluxQL keywords](/influxdb/cloud-serverless/reference/influxql/#keywords)
|
||||
|
||||
## L
|
||||
|
||||
|
@ -582,7 +580,6 @@ Related entries:
|
|||
[cluster](#cluster),
|
||||
[server](#server)
|
||||
|
||||
|
||||
### now
|
||||
|
||||
The local server's nanosecond timestamp.
|
||||
|
@ -623,7 +620,7 @@ Owners have read/write permissions.
|
|||
Users can have owner roles for databases and other resources.
|
||||
|
||||
Role permissions are separate from API token permissions.
|
||||
For additional information on API tokens, see [token](#tokens).
|
||||
For additional information on API tokens, see [token](#token).
|
||||
|
||||
### output plugin
|
||||
|
||||
|
@ -730,6 +727,15 @@ An InfluxDB query returns time series data.
|
|||
|
||||
See [Query data in InfluxDB](/influxdb/cloud-serverless/query-data/).
|
||||
|
||||
### query plan
|
||||
|
||||
A sequence of steps (_nodes_) that the InfluxDB Querier devises and executes to calculate the result of the query in the least amount of time.
|
||||
A _logical plan_ is a high level representation of a query and doesn't consider cluster configuration or data organization.
|
||||
A _physical plan_ represents the query execution plan and data flow through plan nodes that read (_scan_), deduplicate, merge, filter, and sort data.
|
||||
A physical plan is optimized for the cluster configuration and data organization.
|
||||
|
||||
See [Query plans](/influxdb/cloud-serverless/reference/internals/query-plan/).
|
||||
|
||||
## R
|
||||
|
||||
### REPL
|
||||
|
@ -755,8 +761,7 @@ relative to [now](#now).
|
|||
The minimum retention period is **one hour**.
|
||||
|
||||
Related entries:
|
||||
[bucket](#bucket),
|
||||
[shard group duration](#shard-group-duration)
|
||||
[bucket](#bucket)
|
||||
|
||||
### retention policy (RP)
|
||||
|
||||
|
@ -797,6 +802,18 @@ Related entries:
|
|||
[timestamp](#timestamp),
|
||||
[unix timestamp](#unix-timestamp)
|
||||
|
||||
### row
|
||||
|
||||
A row in a [table](#table) represents a specific record or instance of data.
|
||||
[Column](#column) values in a row represent specific attributes or properties of the instance.
|
||||
Each row has a [primary key](#primary-key) that makes the row unique from other rows in the table.
|
||||
|
||||
Related entries:
|
||||
[column](#column),
|
||||
[primary key](#primary-key),
|
||||
[series](#series),
|
||||
[table](#table)
|
||||
|
||||
## S
|
||||
|
||||
### schema
|
||||
|
@ -816,7 +833,7 @@ Related entries:
|
|||
### secret
|
||||
|
||||
Secrets are key-value pairs that contain information you want to control access
|
||||
o, such as API keys, passwords, or certificates.
|
||||
to, such as API keys, passwords, or certificates.
|
||||
|
||||
### selector
|
||||
|
||||
|
@ -886,7 +903,7 @@ A series key identifies a particular series by measurement, tag set, and field k
|
|||
|
||||
For example:
|
||||
|
||||
```
|
||||
```text
|
||||
# measurement, tag set, field key
|
||||
h2o_level, location=santa_monica, h2o_feet
|
||||
```
|
||||
|
@ -953,7 +970,6 @@ Related entries:
|
|||
The key of a tag key-value pair.
|
||||
Tag keys are strings and store metadata.
|
||||
|
||||
|
||||
Related entries:
|
||||
[field key](#field-key),
|
||||
[tag](#tag),
|
||||
|
@ -1025,6 +1041,14 @@ Tokens provide authorization to perform specific actions in InfluxDB.
|
|||
Related entries:
|
||||
[Manage tokens](/influxdb/cloud-serverless/admin/tokens/)
|
||||
|
||||
### transformation
|
||||
|
||||
Data transformation refers to the process of converting or modifying input data from one format, value, or structure to another.
|
||||
|
||||
InfluxQL [transformation functions](/influxdb/cloud-serverless/reference/influxql/functions/transformations/) modify and return values in each row of queried data, but do not return an aggregated value across those rows.
|
||||
|
||||
Related entries: [aggregate](#aggregate), [function](#function), [selector](#selector)
|
||||
|
||||
### TSM (Time Structured Merge tree)
|
||||
|
||||
The InfluxDB v1 and v2 data storage format that allows greater compaction and
|
||||
|
@ -1085,7 +1109,7 @@ InfluxDB users are granted permission to access to InfluxDB.
|
|||
|
||||
### values per second
|
||||
|
||||
The preferred measurement of the rate at which data are persisted to InfluxDB.
|
||||
The preferred measurement of the rate at which data is persisted to InfluxDB.
|
||||
Write speeds are generally quoted in values per second.
|
||||
|
||||
To calculate the values per second rate, multiply the number of points written
|
||||
|
|
|
@ -9,7 +9,7 @@ menu:
|
|||
influxdb_cloud_serverless:
|
||||
name: Data durability
|
||||
parent: InfluxDB Cloud internals
|
||||
influxdb/cloud-dedicated/tags: [backups, internals]
|
||||
influxdb/cloud-serverless/tags: [backups, internals]
|
||||
related:
|
||||
- https://docs.aws.amazon.com/AmazonS3/latest/userguide/DataDurability.html, AWS S3 Data Durability
|
||||
---
|
||||
|
@ -43,7 +43,7 @@ youngest data in the Parquet file ages out of retention.
|
|||
## Data ingest
|
||||
|
||||
When data is written to {{< product-name >}}, the data is first written to a
|
||||
Write-Ahead-Log (WAL) on locally-attached storage on the ingester node before
|
||||
Write-Ahead-Log (WAL) on locally attached storage on the ingester node before
|
||||
the write request is acknowledged. After acknowledging the write request, the
|
||||
ingester holds the data in memory temporarily and then writes the contents of
|
||||
the WAL to Parquet files in object storage and updates the InfluxDB catalog to
|
||||
|
@ -55,7 +55,7 @@ the WAL to the Parquet files before shutting down.
|
|||
|
||||
{{< product-name >}} implements the following data backup strategies:
|
||||
|
||||
- **Backup of WAL file**: The WAL file is written on locally-attached storage.
|
||||
- **Backup of WAL file**: The WAL file is written on locally attached storage.
|
||||
If an ingester process fails, the new ingester simply reads the WAL file on
|
||||
startup and continues normal operation. WAL files are maintained until their
|
||||
contents have been written to the Parquet files in object storage.
|
||||
|
|
|
@ -0,0 +1,392 @@
|
|||
---
|
||||
title: Query plans
|
||||
description: >
|
||||
A query plan is a sequence of steps that the InfluxDB Querier devises and executes to calculate the result of a query in the least amount of time.
|
||||
InfluxDB query plans include DataFusion and InfluxDB logical plan and execution plan nodes for scanning, deduplicating, filtering, merging, and sorting data.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_cloud_serverless:
|
||||
name: Query plans
|
||||
parent: InfluxDB internals
|
||||
influxdb/cloud-serverless/tags: [query, sql, influxql]
|
||||
related:
|
||||
- /influxdb/cloud-serverless/query-data/sql/
|
||||
- /influxdb/cloud-serverless/query-data/influxql/
|
||||
- /influxdb/cloud-serverless/query-data/execute-queries/analyze-query-plan/
|
||||
- /influxdb/cloud-serverless/query-data/execute-queries/troubleshoot/
|
||||
- /influxdb/cloud-serverless/reference/internals/storage-engine/
|
||||
---
|
||||
|
||||
A query plan is a sequence of steps that the InfluxDB v3 [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) devises and executes to calculate the result of a query.
|
||||
The Querier uses DataFusion and Arrow to build and execute query plans
|
||||
that call DataFusion and InfluxDB-specific operators to read data from the [Object store](/influxdb/cloud-serverless/reference/internals/storage-engine/#object-store) and the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester), and to apply query transformations--such as deduplicating, filtering, aggregating, merging, projecting, and sorting--to calculate the final result.
|
||||
|
||||
Like many other databases, the [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) contains a Query Optimizer.
|
||||
After it parses an incoming query, the [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) builds a _logical plan_--a sequence of high-level steps (such as scanning, filtering, and sorting) required for the query.
|
||||
Following the logical plan, the [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) then builds the optimal _physical plan_ to calculate the correct result in the least amount of time.
|
||||
The plan takes advantage of data partitioning by the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester) to parallelize plan operations and prune unnecessary data before executing the plan.
|
||||
The [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) also applies common techniques of predicate and projection pushdown to further prune data as early as possible.
|
||||
|
||||
- [Display syntax](#display-syntax)
|
||||
- [Example logical and physical plan](#example-logical-and-physical-plan)
|
||||
- [Data flow](#data-flow)
|
||||
- [Logical plan](#logical-plan)
|
||||
- [`LogicalPlan` nodes](#logicalplan-nodes)
|
||||
- [`TableScan`](#tablescan)
|
||||
- [`Projection`](#projection)
|
||||
- [`Filter`](#filter)
|
||||
- [`Sort`](#sort)
|
||||
- [Physical plan](#physical-plan)
|
||||
- [`ExecutionPlan` nodes](#executionplan-nodes)
|
||||
- [`DeduplicateExec`](#deduplicateexec)
|
||||
- [`EmptyExec`](#emptyexec)
|
||||
- [`FilterExec`](#filterexec)
|
||||
- [`ParquetExec`](#parquetexec)
|
||||
- [`ProjectionExec`](#projectionexec)
|
||||
- [`RecordBatchesExec`](#recordbatchesexec)
|
||||
- [`SortExec`](#sortexec)
|
||||
- [`SortPreservingMergeExec`](#sortpreservingmergeexec)
|
||||
- [Overlapping data and deduplication](#overlapping-data-and-deduplication)
|
||||
- [Example of overlapping data](#example-of-overlapping-data)
|
||||
- [DataFusion query plans](#datafusion-query-plans)
|
||||
|
||||
## Display syntax
|
||||
|
||||
[Logical](#logical-plan) and [physical query plans](#physical-plan) are represented (for example, in an `EXPLAIN` report) in _tree syntax_.
|
||||
|
||||
- Each plan is represented as an upside-down tree composed of _nodes_.
|
||||
- A parent node awaits the output of its child nodes.
|
||||
- Data flows up from the bottom innermost nodes of the tree to the outermost _root node_ at the top.
|
||||
|
||||
### Example logical and physical plan
|
||||
|
||||
The following query generates an `EXPLAIN` report that includes a logical and a physical plan:
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;
|
||||
```
|
||||
|
||||
The output is the following:
|
||||
|
||||
#### Figure 1. EXPLAIN report
|
||||
|
||||
```text
|
||||
| plan_type | plan |
|
||||
+---------------+--------------------------------------------------------------------------+
|
||||
| logical_plan | Sort: h2o.city ASC NULLS LAST, h2o.time DESC NULLS FIRST |
|
||||
| | TableScan: h2o projection=[city, min_temp, time] |
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | UnionExec |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
Output from `EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;`
|
||||
{{% /caption %}}
|
||||
|
||||
The leaf nodes in the [Figure 1](#figure-1-explain-report) physical plan are parallel `ParquetExec` nodes:
|
||||
|
||||
```text
|
||||
ParquetExec: file_groups={...}, projection=[city, min_temp, time]
|
||||
...
|
||||
ParquetExec: file_groups={...}, projection=[city, min_temp, time]
|
||||
```
|
||||
|
||||
## Data flow
|
||||
|
||||
A [physical plan](#physical-plan) node represents a specific implementation of `ExecutionPlan` that receives an input stream, applies expressions for filtering and sorting, and then yields an output stream to its parent node.
|
||||
|
||||
The following diagram shows the data flow and sequence of `ExecutionPlan` nodes in the [Figure 1](#figure-1-explain-report) physical plan:
|
||||
|
||||
<!-- BEGIN Query plan diagram -->
|
||||
{{< html-diagram/query-plan >}}
|
||||
<!-- END Query plan diagram -->
|
||||
|
||||
{{% product-name %}} includes the following plan expressions:
|
||||
|
||||
## Logical plan
|
||||
|
||||
A logical plan for a query:
|
||||
|
||||
- is a high-level plan that expresses the "intent" of a query and the steps required for calculating the result.
|
||||
- requires information about the data schema
|
||||
- is independent of the [physical execution](#physical-plan), cluster configuration, data source (Ingester or Object store), or how data is organized or partitioned
|
||||
- is displayed as a tree of [DataFusion `LogicalPlan` nodes](#logicalplan-nodes)
|
||||
|
||||
## `LogicalPlan` nodes
|
||||
|
||||
Each node in an {{% product-name %}} logical plan tree represents a [`LogicalPlan` implementation](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html#variants) that receives criteria extracted from the query and applies relational operators and optimizations for transforming input data to an output table.
|
||||
|
||||
The following are some `LogicalPlan` nodes used in InfluxDB logical plans.
|
||||
|
||||
### `TableScan`
|
||||
|
||||
[`TableScan`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.TableScan.html) retrieves rows from a table provider by reference or from the context.
|
||||
|
||||
### `Projection`
|
||||
|
||||
[`Projection`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Projection.html) evaluates an arbitrary list of expressions on the input; equivalent to an SQL `SELECT` statement with an expression list.
|
||||
|
||||
### `Filter`
|
||||
|
||||
[`Filter`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Filter.html) filters rows from the input that do not satisfy the specified expression; equivalent to an SQL `WHERE` clause with a predicate expression.
|
||||
|
||||
### `Sort`
|
||||
|
||||
[`Sort`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Sort.html) sorts the input according to a list of sort expressions; used to implement SQL `ORDER BY`.
|
||||
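The following sketch (reusing the hypothetical `home` table from other examples in this documentation) shows how a simple query maps to these nodes; note that the optimizer typically pushes the `WHERE` predicate into the `TableScan` as `full_filters` rather than emitting a separate `Filter` node:

```sql
EXPLAIN
SELECT temp
FROM home
WHERE room = 'Kitchen'
ORDER BY time;

-- logical_plan (abridged):
--   Projection: home.temp
--     Sort: home.time ASC NULLS LAST
--       Projection: home.temp, home.time
--         TableScan: home projection=[room, temp, time], full_filters=[home.room = Dictionary(Int32, Utf8("Kitchen"))]
```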
|
||||
For details and a list of `LogicalPlan` implementations, see [`Enum datafusion::logical_expr::LogicalPlan` Variants](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html#variants) in the DataFusion documentation.
|
||||
|
||||
## Physical plan
|
||||
|
||||
A physical plan, or _execution plan_, for a query:
|
||||
|
||||
- is an optimized plan that derives from the [logical plan](#logical-plan) and contains the low-level steps for query execution.
|
||||
- considers the cluster configuration (such as CPU and memory allocation) and data organization (such as partitions, the number of files, and whether files overlap)--for example:
|
||||
- If you run the same query with the same data on different clusters with different configurations, each cluster may generate a different physical plan for the query.
|
||||
- If you run the same query on the same cluster at different times, the physical plan may differ each time, depending on the data at query time.
|
||||
- if generated using `ANALYZE`, includes runtime metrics sampled during query execution
|
||||
- is displayed as a tree of [`ExecutionPlan` nodes](#executionplan-nodes)
|
||||
|
||||
## `ExecutionPlan` nodes
|
||||
|
||||
Each node in an {{% product-name %}} physical plan represents a call to a specific implementation of the [DataFusion `ExecutionPlan`](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html)
|
||||
that receives input data, query criteria expressions, and an output schema.
|
||||
|
||||
The following are some `ExecutionPlan` nodes used in InfluxDB physical plans.
|
||||
|
||||
### `DeduplicateExec`
|
||||
|
||||
InfluxDB `DeduplicateExec` takes an input stream of `RecordBatch` sorted on `sort_key` and applies InfluxDB-specific deduplication logic.
|
||||
The output is dependent on the order of the input rows that have the same key.
|
||||
|
||||
### `EmptyExec`
|
||||
|
||||
DataFusion [`EmptyExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/empty/struct.EmptyExec.html) is an execution plan for an empty relation and indicates that the table doesn't contain data for the time range of the query.
|
||||
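For example (a sketch that uses the hypothetical `home` table), querying a time range that contains no data--such as a range entirely in the future--can produce a physical plan whose leaf node is `EmptyExec`:

```sql
EXPLAIN
SELECT temp
FROM home
WHERE time >= now() + INTERVAL '30 days'
ORDER BY time;

-- physical_plan (abridged):
--   ProjectionExec: expr=[temp@0 as temp]
--     SortExec: expr=[time@1 ASC NULLS LAST]
--       EmptyExec: produce_one_row=false
```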
|
||||
### `FilterExec`
|
||||
|
||||
The execution plan for the [`Filter`](#filter) `LogicalPlan`.
|
||||
|
||||
DataFusion [`FilterExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/filter/struct.FilterExec.html) evaluates a boolean predicate against all input batches to determine which rows to include in the output batches.
|
||||
|
||||
### `ParquetExec`
|
||||
|
||||
DataFusion [`ParquetExec`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/parquet/struct.ParquetExec.html) scans one or more Parquet partitions.
|
||||
|
||||
#### `ParquetExec` expressions
|
||||
|
||||
##### `file_groups`
|
||||
|
||||
A _file group_ is a list of files to scan.
|
||||
Files are referenced by path:
|
||||
|
||||
- `1/1/b862a7e9b.../243db601-....parquet`
|
||||
- `1/1/b862a7e9b.../f5fb7c7d-....parquet`
|
||||
|
||||
In InfluxDB v3, the path structure represents how data is organized.
|
||||
|
||||
A path has the following structure:
|
||||
|
||||
```text
|
||||
<namespace_id>/<table_id>/<partition_hash_id>/<uuid_of_the_file>.parquet
|
||||
1 / 1 /b862a7e9b329ee6a4.../243db601-f3f1-4....parquet
|
||||
```
|
||||
|
||||
- `namespace_id`: the namespace (database) being queried
|
||||
- `table_id`: the table (measurement) being queried
|
||||
- `partition_hash_id`: the partition this file belongs to.
|
||||
You can count partition IDs to find how many partitions the query reads.
|
||||
- `uuid_of_the_file`: the file identifier.
|
||||
|
||||
`ParquetExec` processes groups in parallel and reads the files in each group sequentially.
|
||||
|
||||
##### `projection`
|
||||
|
||||
`projection` lists the table columns that the query plan needs to read to execute the query.
|
||||
The parameter name `projection` refers to _projection pushdown_, the action of filtering columns.
|
||||
|
||||
Consider the following sample data that contains many columns:
|
||||
|
||||
```text
|
||||
h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600
|
||||
```
|
||||
|
||||
| table | state | city | min_temp | max_temp | area | time |
|
||||
|:-----:|:-----:|:----:|:--------:|:--------:|:----:|:----:|
|
||||
| h2o | CA | SF | 68.4 | 85.7 | 500u | 600 |
|
||||
|
||||
However, the following SQL query specifies only three columns (`city`, `state`, and `time`):
|
||||
|
||||
```sql
|
||||
SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
When processing the query, the [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) specifies the three required columns in the projection and the projection is "pushed down" to leaf nodes--columns not specified are pruned as early as possible during query execution.
|
||||
|
||||
```text
|
||||
projection=[city, state, time]
|
||||
```
|
||||
|
||||
##### `output_ordering`
|
||||
|
||||
`output_ordering` specifies the sort order for the output.
|
||||
The Querier specifies `output_ordering` if the output should be ordered and if the [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) knows the order.
|
||||
|
||||
When storing data to Parquet files, InfluxDB sorts the data to improve storage compression and query efficiency and the planner tries to preserve that order for as long as possible.
|
||||
Generally, the `output_ordering` value that `ParquetExec` receives is the ordering (or a subset of the ordering) of stored data.
|
||||
|
||||
_By design, [`RecordBatchesExec`](#recordbatchesexec) data isn't sorted._
|
||||
|
||||
In the following example, the query planner specifies the output sort order `state ASC, city ASC, time ASC`:
|
||||
|
||||
```text
|
||||
output_ordering=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC]
|
||||
```
|
||||
|
||||
##### `predicate`
|
||||
|
||||
`predicate` is the data filter specified in the query and used for row filtering when scanning Parquet files.
|
||||
|
||||
For example, given the following SQL query:
|
||||
|
||||
```sql
|
||||
SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
The `predicate` value is the boolean expression in the `WHERE` statement:
|
||||
|
||||
```text
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA
|
||||
```
|
||||
|
||||
##### `pruning predicate`
|
||||
|
||||
`pruning_predicate` is created from the [`predicate`](#predicate) value and is used for pruning data and files from the chosen partitions.
|
||||
|
||||
For example, given the following `predicate` parsed from the SQL:
|
||||
|
||||
```text
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA,
|
||||
```
|
||||
|
||||
The Querier creates the following `pruning_predicate`:
|
||||
|
||||
```text
|
||||
pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3
|
||||
```
|
||||
|
||||
By default, the pruning predicate filters files by `time`.
|
||||
|
||||
_Before the physical plan is generated, an additional `partition pruning` step uses predicates on partitioning columns to prune partitions._
|
||||
|
||||
### `ProjectionExec`
|
||||
|
||||
DataFusion [`ProjectionExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/projection/struct.ProjectionExec.html) evaluates an arbitrary list of expressions on the input; the execution plan for the [`Projection`](#projection) `LogicalPlan`.
|
||||
|
||||
### `RecordBatchesExec`
|
||||
|
||||
The InfluxDB `RecordBatchesExec` implementation retrieves and scans recently written, yet-to-be-persisted, data from the InfluxDB v3 [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester).
|
||||
|
||||
When generating the plan, the [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) sends the query criteria, such as database (bucket), table (measurement), and columns, to the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester) to retrieve data not yet persisted to Parquet files.
|
||||
If the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester) has data that meets the criteria (the chunk size is non-zero), then the plan includes `RecordBatchesExec`.
|
||||
|
||||
#### `RecordBatchesExec` attributes
|
||||
|
||||
##### `chunks`
|
||||
|
||||
`chunks` is the number of data chunks from the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester).
|
||||
Often one (`1`), but it can be many.
|
||||
|
||||
##### `projection`
|
||||
|
||||
`projection` specifies a list of columns to read and output.
|
||||
|
||||
`__chunk_order` in a list of columns is an InfluxDB-generated column used to keep the chunks and files ordered for deduplication--for example:
|
||||
|
||||
```text
|
||||
projection=[__chunk_order, city, state, time]
|
||||
```
|
||||
|
||||
For details and other DataFusion `ExecutionPlan` implementations, see [`ExecutionPlan` implementors](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html) in the DataFusion documentation.
|
||||
|
||||
### `SortExec`
|
||||
|
||||
The execution plan for the [`Sort`](#sort) `LogicalPlan`.
|
||||
|
||||
DataFusion [`SortExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/sorts/sort/struct.SortExec.html) supports sorting datasets that are larger than the memory allotted by the memory manager, by spilling to disk.
|
||||
|
||||
### `SortPreservingMergeExec`
|
||||
|
||||
DataFusion [`SortPreservingMergeExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/sorts/sort_preserving_merge/struct.SortPreservingMergeExec.html) takes an input execution plan and a list of sort expressions and, provided each partition of the input plan is sorted with respect to these sort expressions, yields a single partition sorted with respect to them.
|
||||
|
||||
### `UnionExec`
|
||||
|
||||
DataFusion [`UnionExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/union/struct.UnionExec.html) is the `UNION ALL` execution plan for combining multiple inputs that have the same schema.
|
||||
`UnionExec` concatenates the partitions and does not mix or copy data within or across partitions.
|
||||
|
||||
## Overlapping data and deduplication
|
||||
|
||||
_Overlapping data_ refers to files or batches in which the time ranges (represented by timestamps) intersect.
|
||||
Two _chunks_ of data overlap if both chunks contain data for the same portion of time.
|
||||
|
||||
### Example of overlapping data
|
||||
|
||||
For example, the following chunks represent line protocol written to InfluxDB:
|
||||
|
||||
```text
|
||||
// Chunk 4: stored parquet file
|
||||
// - time range: 400-600
|
||||
// - no duplicates in its own chunk
|
||||
// - overlaps chunk 3
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600",
|
||||
"h2o,state=CA,city=SJ min_temp=69.5,max_temp=89.2 600", // duplicates row 3 in chunk 5
|
||||
"h2o,state=MA,city=Bedford max_temp=80.75,area=742u 400", // overlaps chunk 3
|
||||
"h2o,state=MA,city=Boston min_temp=65.40,max_temp=82.67 400", // overlaps chunk 3
|
||||
],
|
||||
|
||||
// Chunk 5: Ingester data
|
||||
// - time range: 550-700
|
||||
// - overlaps & duplicates data in chunk 4
|
||||
[
|
||||
"h2o,state=MA,city=Bedford max_temp=88.75,area=742u 600", // overlaps chunk 4
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 650",
|
||||
"h2o,state=CA,city=SJ min_temp=68.5,max_temp=90.0 600", // duplicates row 2 in chunk 4
|
||||
"h2o,state=CA,city=SJ min_temp=75.5,max_temp=84.08 700",
|
||||
"h2o,state=MA,city=Boston min_temp=67.4 550", // overlaps chunk 4
|
||||
]
|
||||
```
|
||||
|
||||
- `Chunk 4` spans the time range `400-600` and represents data persisted to a Parquet file in the [Object store](/influxdb/cloud-serverless/reference/internals/storage-engine/#object-store).
|
||||
- `Chunk 5` spans the time range `550-700` and represents yet-to-be persisted data from the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester).
|
||||
- The chunks overlap the range `550-600`.
|
||||
|
||||
If data overlaps at query time, the [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) must include the _deduplication_ process in the query plan, which uses the same multi-column sort-merge operators used by the [Ingester](/influxdb/cloud-serverless/reference/internals/storage-engine/#ingester).
|
||||
Compared to an ingestion plan that uses sort-merge operators, a query plan is more complex and ensures that data streams through the plan after deduplication.
|
||||
|
||||
Because sort-merge operations used in deduplication have a non-trivial execution cost, InfluxDB v3 tries to avoid the need for deduplication.
|
||||
Due to how InfluxDB organizes data, a Parquet file never contains duplicates of the data it stores; only overlapped data can contain duplicates.
|
||||
During compaction, the [Compactor](/influxdb/cloud-serverless/reference/internals/storage-engine/#compactor) sorts stored data to reduce overlaps and optimize query performance.
|
||||
For data that doesn't have overlaps, the [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) doesn't need to include the deduplication process and the query plan can further distribute non-overlapping data for parallel processing.
|
||||
|
||||
## DataFusion query plans
|
||||
|
||||
For more information about DataFusion query plans and the DataFusion API used in InfluxDB v3, see the following:
|
||||
|
||||
- [Query Planning and Execution Overview](https://docs.rs/datafusion/latest/datafusion/index.html#query-planning-and-execution-overview) in the DataFusion documentation.
|
||||
- [Plan representations](https://docs.rs/datafusion/latest/datafusion/#plan-representations) in the DataFusion documentation.
|
|
@ -1,31 +1,41 @@
|
|||
---
|
||||
title: EXPLAIN command
|
||||
description: >
|
||||
The `EXPLAIN` command shows the logical and physical execution plan for the
|
||||
specified SQL statement.
|
||||
The `EXPLAIN` command returns the logical and physical execution plans for the specified SQL statement.
|
||||
menu:
|
||||
influxdb_cloud_serverless:
|
||||
name: EXPLAIN command
|
||||
parent: SQL reference
|
||||
weight: 207
|
||||
related:
|
||||
- /influxdb/cloud-serverless/reference/internals/query-plan/
|
||||
- /influxdb/cloud-serverless/query-data/execute-queries/analyze-query-plan/
|
||||
- /influxdb/cloud-serverless/query-data/execute-queries/troubleshoot/
|
||||
---
|
||||
|
||||
The `EXPLAIN` command returns the logical and physical execution plan for the
|
||||
The `EXPLAIN` command returns the [logical plan](/influxdb/cloud-serverless/reference/internals/query-plan/#logical-plan) and the [physical plan](/influxdb/cloud-serverless/reference/internals/query-plan/#physical-plan) for the
|
||||
specified SQL statement.
|
||||
|
||||
```sql
|
||||
EXPLAIN [ANALYZE] [VERBOSE] statement
|
||||
```
|
||||
|
||||
- [EXPLAIN](#explain)
|
||||
- [EXPLAIN ANALYZE](#explain-analyze)
|
||||
- [`EXPLAIN`](#explain)
|
||||
- [Example `EXPLAIN`](#example-explain)
|
||||
- [`EXPLAIN ANALYZE`](#explain-analyze)
|
||||
- [Example `EXPLAIN ANALYZE`](#example-explain-analyze)
|
||||
- [`EXPLAIN ANALYZE VERBOSE`](#explain-analyze-verbose)
|
||||
- [Example `EXPLAIN ANALYZE VERBOSE`](#example-explain-analyze-verbose)
|
||||
|
||||
## EXPLAIN
|
||||
## `EXPLAIN`
|
||||
|
||||
Returns the execution plan of a statement.
|
||||
Returns the logical plan and physical (execution) plan of a statement.
|
||||
To output more details, use `EXPLAIN VERBOSE`.
|
||||
|
||||
##### Example EXPLAIN ANALYZE
|
||||
`EXPLAIN` doesn't execute the statement.
|
||||
To execute the statement and view runtime metrics, use [`EXPLAIN ANALYZE`](#explain-analyze).
|
||||
|
||||
### Example `EXPLAIN`
|
||||
|
||||
```sql
|
||||
EXPLAIN
|
||||
|
@ -39,20 +49,30 @@ GROUP BY room
|
|||
{{< expand-wrapper >}}
|
||||
{{% expand "View `EXPLAIN` example output" %}}
|
||||
|
||||
| plan_type | plan |
|
||||
| :------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| logical_plan | Projection: home.room, AVG(home.temp) AS temp Aggregate: groupBy=[[home.room]], aggr=[[AVG(home.temp)]] TableScan: home projection=[room, temp] |
|
||||
| physical_plan | ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp] AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)] CoalesceBatchesExec: target_batch_size=8192 RepartitionExec: partitioning=Hash([Column { name: "room", index: 0 }], 4), input_partitions=4 RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)] ParquetExec: limit=None, partitions={1 group: [[136/316/1120/1ede0031-e86e-06e5-12ba-b8e6fd76a202.parquet]]}, projection=[room, temp] |
|
||||
| | plan_type | plan |
|
||||
|---:|:--------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| 0 | logical_plan |<span style="white-space:pre-wrap;"> Projection: home.room, AVG(home.temp) AS temp </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> Aggregate: groupBy=[[home.room]], aggr=[[AVG(home.temp)]] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> TableScan: home projection=[room, temp] </span>|
|
||||
| 1 | physical_plan |<span style="white-space:pre-wrap;"> ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> CoalesceBatchesExec: target_batch_size=8192 </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> RepartitionExec: partitioning=Hash([room@0], 8), input_partitions=8 </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> ParquetExec: file_groups={8 groups: [[70434/116281/404d73cea0236530ea94f5470701eb814a8f0565c0e4bef5a2d2e33dfbfc3567/1be334e8-0af8-00da-2615-f67cd4be90f7.parquet, 70434/116281/b7a9e7c57fbfc3bba9427e4b3e35c89e001e2e618b0c7eb9feb4d50a3932f4db/d29370d4-262f-0d32-2459-fe7b099f682f.parquet], [70434/116281/c14418ba28a22a3abb693a1cb326a63b62dc611aec58c9bed438fdafd3bc5882/8b29ae98-761f-0550-2fe4-ee77503658e9.parquet], [70434/116281/fa677477eed622ae8123da1251aa7c351f801e2ee2f0bc28c0fe3002a30b3563/65bb4dc3-04e1-0e02-107a-90cee83c51b0.parquet], [70434/116281/db162bdd30261019960dd70da182e6ebd270284569ecfb5deffea7e65baa0df9/2505e079-67c5-06d9-3ede-89aca542dd18.parquet], [70434/116281/0c025dcccae8691f5fd70b0f131eea4ca6fafb95a02f90a3dc7bb015efd3ab4f/3f3e44c3-b71e-0ca4-3dc7-8b2f75b9ff86.parquet], ...]}, projection=[room, temp] </span>|
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## EXPLAIN ANALYZE
|
||||
## `EXPLAIN ANALYZE`
|
||||
|
||||
Returns the execution plan and metrics of a statement.
|
||||
To output more information, use `EXPLAIN ANALYZE VERBOSE`.
|
||||
Executes a statement and returns the execution plan and runtime metrics of the statement.
|
||||
The report includes the [logical plan](/influxdb/cloud-serverless/reference/internals/query-plan/#logical-plan) and the [physical plan](/influxdb/cloud-serverless/reference/internals/query-plan/#physical-plan) annotated with execution counters, number of rows produced, and runtime metrics sampled during the query execution.
|
||||
|
||||
##### Example EXPLAIN ANALYZE
|
||||
If the plan requires reading lots of data files, `EXPLAIN` and `EXPLAIN ANALYZE` may truncate the list of files in the report.
|
||||
To output more information, including intermediate plans and paths for all scanned Parquet files, use [`EXPLAIN ANALYZE VERBOSE`](#explain-analyze-verbose).
|
||||
|
||||
### Example `EXPLAIN ANALYZE`
|
||||
|
||||
```sql
|
||||
EXPLAIN ANALYZE
|
||||
|
@ -60,15 +80,44 @@ SELECT
|
|||
room,
|
||||
avg(temp) AS temp
|
||||
FROM home
|
||||
WHERE time >= '2023-01-01' AND time <= '2023-12-31'
|
||||
GROUP BY room
|
||||
```
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View `EXPLAIN ANALYZE` example output" %}}
|
||||
|
||||
| plan_type | plan |
|
||||
| :---------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| Plan with Metrics | CoalescePartitionsExec, metrics=[output_rows=2, elapsed_compute=8.892µs, spill_count=0, spilled_bytes=0, mem_used=0] ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp], metrics=[output_rows=2, elapsed_compute=3.608µs, spill_count=0, spilled_bytes=0, mem_used=0] AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)], metrics=[output_rows=2, elapsed_compute=121.771µs, spill_count=0, spilled_bytes=0, mem_used=0] CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=2, elapsed_compute=23.711µs, spill_count=0, spilled_bytes=0, mem_used=0] RepartitionExec: partitioning=Hash([Column { name: "room", index: 0 }], 4), input_partitions=4, metrics=[repart_time=25.117µs, fetch_time=1.614597ms, send_time=6.705µs] RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1, metrics=[repart_time=1ns, fetch_time=319.754µs, send_time=2.067µs] AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)], metrics=[output_rows=2, elapsed_compute=75.615µs, spill_count=0, spilled_bytes=0, mem_used=0] ParquetExec: limit=None, partitions={1 group: [[136/316/1120/1ede0031-e86e-06e5-12ba-b8e6fd76a202.parquet]]}, projection=[room, temp], metrics=[output_rows=26, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, pushdown_rows_filtered=0, bytes_scanned=290, row_groups_pruned=0, num_predicate_creation_errors=0, predicate_evaluation_errors=0, page_index_rows_filtered=0, time_elapsed_opening=100.37µs, page_index_eval_time=2ns, time_elapsed_scanning_total=157.086µs, time_elapsed_processing=226.644µs, pushdown_eval_time=2ns, time_elapsed_scanning_until_data=116.875µs] |
|
||||
| | plan_type | plan |
|
||||
|---:|:------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| 0 | Plan with Metrics |<span style="white-space:pre-wrap;"> ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp], metrics=[output_rows=2, elapsed_compute=4.768µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)], ordering_mode=Sorted, metrics=[output_rows=2, elapsed_compute=140.405µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=2, elapsed_compute=6.821µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> RepartitionExec: partitioning=Hash([room@0], 8), input_partitions=8, preserve_order=true, sort_exprs=room@0 ASC, metrics=[output_rows=2, elapsed_compute=18.408µs, repart_time=59.698µs, fetch_time=1.057882762s, send_time=5.83µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)], ordering_mode=Sorted, metrics=[output_rows=2, elapsed_compute=137.577µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=6, preserve_order=true, sort_exprs=room@0 ASC, metrics=[output_rows=46, elapsed_compute=26.637µs, repart_time=6ns, fetch_time=399.971411ms, send_time=6.658µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> ProjectionExec: expr=[room@0 as room, temp@2 as temp], metrics=[output_rows=46, elapsed_compute=3.102µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=46, elapsed_compute=25.585µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> FilterExec: time@1 >= 1672531200000000000 AND time@1 <= 1703980800000000000, metrics=[output_rows=46, elapsed_compute=26.51µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> ParquetExec: file_groups={6 groups: [[70434/116281/404d73cea0236530ea94f5470701eb814a8f0565c0e4bef5a2d2e33dfbfc3567/1be334e8-0af8-00da-2615-f67cd4be90f7.parquet], [70434/116281/c14418ba28a22a3abb693a1cb326a63b62dc611aec58c9bed438fdafd3bc5882/8b29ae98-761f-0550-2fe4-ee77503658e9.parquet], [70434/116281/fa677477eed622ae8123da1251aa7c351f801e2ee2f0bc28c0fe3002a30b3563/65bb4dc3-04e1-0e02-107a-90cee83c51b0.parquet], [70434/116281/db162bdd30261019960dd70da182e6ebd270284569ecfb5deffea7e65baa0df9/2505e079-67c5-06d9-3ede-89aca542dd18.parquet], [70434/116281/0c025dcccae8691f5fd70b0f131eea4ca6fafb95a02f90a3dc7bb015efd3ab4f/3f3e44c3-b71e-0ca4-3dc7-8b2f75b9ff86.parquet], ...]}, projection=[room, time, temp], output_ordering=[room@0 ASC, time@1 ASC], predicate=time@6 >= 1672531200000000000 AND time@6 <= 1703980800000000000, pruning_predicate=time_max@0 >= 1672531200000000000 AND time_min@1 <= 1703980800000000000, required_guarantees=[], metrics=[output_rows=46, elapsed_compute=6ns, predicate_evaluation_errors=0, bytes_scanned=3279, row_groups_pruned_statistics=0, file_open_errors=0, file_scan_errors=0, pushdown_rows_filtered=0, num_predicate_creation_errors=0, row_groups_pruned_bloom_filter=0, page_index_rows_filtered=0, time_elapsed_opening=398.462968ms, time_elapsed_processing=1.626106ms, time_elapsed_scanning_total=1.36822ms, page_index_eval_time=33.474µs, pushdown_eval_time=14.267µs, time_elapsed_scanning_until_data=1.27694ms] </span>|
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## `EXPLAIN ANALYZE VERBOSE`
|
||||
|
||||
Executes a statement and returns the execution plan, runtime metrics, and additional details helpful for debugging the statement.
|
||||
|
||||
The report includes the following:
|
||||
|
||||
- The [logical plan](/influxdb/cloud-serverless/reference/internals/query-plan/#logical-plan).
|
||||
- The [physical plan](/influxdb/cloud-serverless/reference/internals/query-plan/#physical-plan) annotated with execution counters, number of rows produced, and runtime metrics sampled during the query execution.
|
||||
- Information truncated in the `EXPLAIN` report--for example, the paths for all [Parquet files retrieved for the query](/influxdb/cloud-serverless/reference/internals/query-plan/#file_groups).
|
||||
- All intermediate physical plans that DataFusion and the [Querier](/influxdb/cloud-serverless/reference/internals/storage-engine/#querier) generate before settling on the final physical plan--helpful in debugging to see when an [`ExecutionPlan` node](/influxdb/cloud-serverless/reference/internals/query-plan/#executionplan-nodes) is added or removed, and how InfluxDB optimizes the query.
|
||||
|
||||
### Example `EXPLAIN ANALYZE VERBOSE`
|
||||
|
||||
```SQL
|
||||
EXPLAIN ANALYZE VERBOSE SELECT temp FROM home
|
||||
WHERE time >= now() - INTERVAL '7 days' AND room = 'Kitchen'
|
||||
ORDER BY time
|
||||
```
|
||||
|
|
|
@ -1,6 +1,6 @@
|
|||
StylesPath = "../../../.ci/vale/styles"
|
||||
|
||||
Vocab = InfluxData, Clustered
|
||||
Vocab = Clustered
|
||||
|
||||
MinAlertLevel = warning
|
||||
|
||||
|
|
|
@ -463,7 +463,7 @@ spec:
|
|||
secretKey:
|
||||
value: S3_SECRET_KEY
|
||||
|
||||
# Bucket that the parquet files will be stored in
|
||||
# Bucket that the Parquet files will be stored in
|
||||
bucket: S3_BUCKET_NAME
|
||||
|
||||
# This value is required for AWS S3, it may or may not be required for other providers.
|
||||
|
|
|
@ -14,7 +14,7 @@ weight: 101
|
|||
InfluxDB Clustered requires the following prerequisites:
|
||||
|
||||
- **Kubernetes cluster**: version 1.25 or higher
|
||||
- **Object storage**: AWS S3 or S3-compatible storage used to store the InfluxDB parquet files.
|
||||
- **Object storage**: AWS S3 or S3-compatible storage used to store the InfluxDB Parquet files.
|
||||
|
||||
{{% note %}}
|
||||
We **strongly** recommend that you enable object versioning in your object store.
|
||||
|
|
|
@ -14,7 +14,7 @@ aliases:
|
|||
- /influxdb/clustered/query-data/influxql/execute-queries/
|
||||
---
|
||||
|
||||
Use tools and libraries to query data stored in an {{% product-name %}} bucket.
|
||||
Use tools and libraries to query data stored in an {{% product-name %}} database.
|
||||
|
||||
InfluxDB client libraries and Flight clients can use the Flight+gRPC protocol to query with SQL or InfluxQL and retrieve data in the [Arrow in-memory format](https://arrow.apache.org/docs/format/Columnar.html).
|
||||
HTTP clients can use the InfluxDB v1 `/query` REST API to query with InfluxQL and retrieve data in JSON format.
|
||||
|
|
|
@ -1,115 +0,0 @@
|
|||
---
|
||||
title: Optimize queries
|
||||
description: >
|
||||
Optimize your SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
|
||||
weight: 401
|
||||
menu:
|
||||
influxdb_clustered:
|
||||
name: Optimize queries
|
||||
parent: Execute queries
|
||||
influxdb/clustered/tags: [query, sql, influxql]
|
||||
related:
|
||||
- /influxdb/clustered/query-data/sql/
|
||||
- /influxdb/clustered/query-data/influxql/
|
||||
- /influxdb/clustered/query-data/execute-queries/troubleshoot/
|
||||
- /influxdb/clustered/reference/client-libraries/v3/
|
||||
---
|
||||
|
||||
Use the following tools to help you identify performance bottlenecks and troubleshoot problems in queries:
|
||||
|
||||
<!-- TOC -->
|
||||
|
||||
- [EXPLAIN and ANALYZE](#explain-and-analyze)
|
||||
|
||||
<!-- /TOC -->
|
||||
|
||||
### EXPLAIN and ANALYZE
|
||||
|
||||
To view the query engine's execution plan and metrics for an SQL query, prepend [`EXPLAIN`](/influxdb/clustered/reference/sql/explain/) or [`EXPLAIN ANALYZE`](/influxdb/clustered/reference/sql/explain/#explain-analyze) to the query.
|
||||
The report can reveal query bottlenecks such as a large number of table scans or parquet files, and can help triage the question, "Is the query slow due to the amount of work required or due to a problem with the schema, compactor, etc.?"
|
||||
|
||||
The following example shows how to use the InfluxDB v3 Python client library and pandas to view `EXPLAIN` and `EXPLAIN ANALYZE` results for a query:
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
{{% code-placeholders "DATABASE_(NAME|TOKEN)" %}}
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
```python
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import pandas as pd
|
||||
import tabulate # Required for pandas.to_markdown()
|
||||
|
||||
# Instantiate an InfluxDB client.
|
||||
client = InfluxDBClient3(token = f"DATABASE_TOKEN",
|
||||
host = f"{{< influxdb/host >}}",
|
||||
database = f"DATABASE_NAME")
|
||||
|
||||
sql_explain = '''EXPLAIN
|
||||
SELECT temp
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '90 days'
|
||||
AND room = 'Kitchen'
|
||||
ORDER BY time'''
|
||||
|
||||
table = client.query(sql_explain)
|
||||
df = table.to_pandas()
|
||||
print(df.to_markdown(index=False))
|
||||
|
||||
assert df.shape == (2, 2), f'Expect {df.shape} to have 2 columns, 2 rows'
|
||||
assert 'physical_plan' in df.plan_type.values, "Expect physical_plan"
|
||||
assert 'logical_plan' in df.plan_type.values, "Expect logical_plan"
|
||||
```
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View EXPLAIN example results" %}}
|
||||
| plan_type | plan |
|
||||
|:--------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| logical_plan | Projection: home.temp |
|
||||
| | Sort: home.time ASC NULLS LAST |
|
||||
| | Projection: home.temp, home.time |
|
||||
| | TableScan: home projection=[room, temp, time], full_filters=[home.time >= TimestampNanosecond(1688676582918581320, None), home.room = Dictionary(Int32, Utf8("Kitchen"))] |
|
||||
| physical_plan | ProjectionExec: expr=[temp@0 as temp] |
|
||||
| | SortExec: expr=[time@1 ASC NULLS LAST] |
|
||||
| | EmptyExec: produce_one_row=false |
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
```python
|
||||
sql_explain_analyze = '''EXPLAIN ANALYZE
|
||||
SELECT *
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '90 days'
|
||||
ORDER BY time'''
|
||||
|
||||
table = client.query(sql_explain_analyze)
|
||||
df = table.to_pandas()
|
||||
print(df.to_markdown(index=False))
|
||||
|
||||
assert df.shape == (1,2)
|
||||
assert 'Plan with Metrics' in df.plan_type.values, "Expect plan metrics"
|
||||
|
||||
client.close()
|
||||
```
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
|
||||
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/cloud-dedicated/admin/tokens/) with sufficient permissions to the specified database
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View EXPLAIN ANALYZE example results" %}}
|
||||
| plan_type | plan |
|
||||
|:------------------|:-----------------------------------------------------------------------------------------------------------------------|
|
||||
| Plan with Metrics | ProjectionExec: expr=[temp@0 as temp], metrics=[output_rows=0, elapsed_compute=1ns] |
|
||||
| | SortExec: expr=[time@1 ASC NULLS LAST], metrics=[output_rows=0, elapsed_compute=1ns, spill_count=0, spilled_bytes=0] |
|
||||
| | EmptyExec: produce_one_row=false, metrics=[] |
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
|
@ -42,10 +42,10 @@ Use InfluxQL `SHOW` statements to return information about your data schema.
|
|||
The following examples use data provided in [sample data sets](/influxdb/clustered/reference/sample-data/).
|
||||
To run the example queries and return identical results, follow the instructions
|
||||
provided for each sample data set to write the data to your {{% product-name %}}
|
||||
bucket.
|
||||
database.
|
||||
{{% /note %}}
|
||||
|
||||
- [List measurements in a bucket](#list-measurements-in-a-bucket)
|
||||
- [List measurements in a database](#list-measurements-in-a-database)
|
||||
- [List measurements that contain specific tag key-value pairs](#list-measurements-that-contain-specific-tag-key-value-pairs)
|
||||
- [List measurements that match a regular expression](#list-measurements-that-match-a-regular-expression)
|
||||
- [List field keys in a measurement](#list-field-keys-in-a-measurement)
|
||||
|
@ -56,10 +56,10 @@ bucket.
|
|||
- [List tag values for tags that match a regular expression](#list-tag-values-for-tags-that-match-a-regular-expression)
|
||||
- [List tag values associated with a specific tag key-value pair](#list-tag-values-associated-with-a-specific-tag-key-value-pair)
|
||||
|
||||
## List measurements in a bucket
|
||||
## List measurements in a database
|
||||
|
||||
Use [`SHOW MEASUREMENTS`](/influxdb/clustered/reference/influxql/show/#show-measurements)
|
||||
to list measurements in your InfluxDB bucket.
|
||||
to list measurements in your InfluxDB database.
|
||||
|
||||
```sql
|
||||
SHOW MEASUREMENTS
|
||||
|
@ -83,7 +83,7 @@ name: measurements
|
|||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
#### List measurements that contain specific tag key-value pairs
|
||||
### List measurements that contain specific tag key-value pairs
|
||||
|
||||
To return only measurements with specific tag key-value pairs, include a `WHERE`
|
||||
clause with tag key-value pairs to query for.
|
||||
|
@ -107,7 +107,7 @@ name: measurements
|
|||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
#### List measurements that match a regular expression
|
||||
### List measurements that match a regular expression
|
||||
|
||||
To return only measurements with names that match a
|
||||
[regular expression](/influxdb/clustered/reference/influxql/regular-expressions/),
|
||||
|
@ -137,7 +137,7 @@ name: measurements
|
|||
Use [`SHOW FIELD KEYS`](/influxdb/clustered/reference/influxql/show/#show-field-keys)
|
||||
to return all field keys in a measurement.
|
||||
Include a `FROM` clause to specify the measurement.
|
||||
If no measurement is specified, the query returns all field keys in the bucket.
|
||||
If no measurement is specified, the query returns all field keys in the database.
|
||||
|
||||
```sql
|
||||
SHOW FIELD KEYS FROM home
|
||||
|
@ -164,7 +164,7 @@ name: home
|
|||
Use [`SHOW TAG KEYS`](/influxdb/clustered/reference/influxql/show/#show-tag-keys)
|
||||
to return all tag keys in a measurement.
|
||||
Include a `FROM` clause to specify the measurement.
|
||||
If no measurement is specified, the query returns all tag keys in the bucket.
|
||||
If no measurement is specified, the query returns all tag keys in the database.
|
||||
|
||||
```sql
|
||||
SHOW TAG KEYS FROM home_actions
|
||||
|
@ -186,7 +186,7 @@ name: home_actions
|
|||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
#### List tag keys in measurements that contain a specific tag key-value pair
|
||||
### List tag keys in measurements that contain a specific tag key-value pair
|
||||
|
||||
To return all tag keys in measurements that contain specific tag key-value pairs,
|
||||
include a `WHERE` clause with the tag key-value pairs to query for.
|
||||
|
@ -264,7 +264,7 @@ name: weather
|
|||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
#### List tag values for multiple tags
|
||||
### List tag values for multiple tags
|
||||
|
||||
To return tag values for multiple specific tag keys, use the `IN` operator in
|
||||
the `WITH` clause to compare `KEY` to a list of tag keys.
|
||||
|
@ -290,7 +290,7 @@ name: home_actions
|
|||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
#### List tag values for tags that match a regular expression
|
||||
### List tag values for tags that match a regular expression
|
||||
|
||||
To return only tag values from tag keys that match a regular expression, use
|
||||
regular expression comparison operators in your `WITH` clause to compare `KEY`
|
||||
|
@ -324,7 +324,7 @@ name: home_actions
|
|||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
#### List tag values associated with a specific tag key-value pair
|
||||
### List tag values associated with a specific tag key-value pair
|
||||
|
||||
To list tag values for tags associated with a specific tag key-value pair:
|
||||
|
||||
|
|
|
@ -0,0 +1,23 @@
|
|||
---
|
||||
title: Troubleshoot and optimize queries
|
||||
description: >
|
||||
Troubleshoot errors and optimize performance for SQL and InfluxQL queries in InfluxDB.
|
||||
Use observability tools to view query execution and metrics.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_clustered:
|
||||
name: Troubleshoot and optimize queries
|
||||
parent: Query data
|
||||
influxdb/clustered/tags: [query, performance, observability, errors, sql, influxql]
|
||||
related:
|
||||
- /influxdb/clustered/query-data/sql/
|
||||
- /influxdb/clustered/query-data/influxql/
|
||||
aliases:
|
||||
- /influxdb/clustered/query-data/execute-queries/troubleshoot/
|
||||
|
||||
---
|
||||
|
||||
Troubleshoot errors and optimize performance for SQL and InfluxQL queries in {{% product-name %}}.
|
||||
Use observability tools to view query execution and metrics.
|
||||
|
||||
{{< children >}}
|
|
@ -0,0 +1,771 @@
|
|||
---
|
||||
title: Analyze a query plan
|
||||
description: >
|
||||
Learn how to read and analyze a query plan to
|
||||
understand how a query is executed and find performance bottlenecks.
|
||||
weight: 401
|
||||
menu:
|
||||
influxdb_clustered:
|
||||
name: Analyze a query plan
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/clustered/tags: [query, sql, influxql, observability, query plan]
|
||||
related:
|
||||
- /influxdb/clustered/query-data/sql/
|
||||
- /influxdb/clustered/query-data/influxql/
|
||||
- /influxdb/clustered/reference/internals/query-plan/
|
||||
- /influxdb/clustered/reference/internals/storage-engine
|
||||
---
|
||||
|
||||
Learn how to read and analyze a [query plan](/influxdb/clustered/reference/glossary/#query-plan) to
|
||||
understand query execution steps and data organization, and find performance bottlenecks.
|
||||
|
||||
When you query InfluxDB v3, the Querier devises a query plan for executing the query.
|
||||
The engine tries to determine the optimal plan for the query structure and data.
|
||||
By learning how to generate and interpret reports for the query plan,
|
||||
you can better understand how the query is executed and identify bottlenecks that affect the performance of your query.
|
||||
|
||||
For example, if the query plan reveals that your query reads a large number of Parquet files,
|
||||
you can then take steps to [optimize your query](/influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries/), such as adding filters to read less data or
|
||||
configuring your cluster to store fewer and larger files.
|
||||
|
||||
- [Use EXPLAIN keywords to view a query plan](#use-explain-keywords-to-view-a-query-plan)
|
||||
- [Read an EXPLAIN report](#read-an-explain-report)
|
||||
- [Read a query plan](#read-a-query-plan)
|
||||
- [Example physical plan for a SELECT - ORDER BY query](#example-physical-plan-for-a-select---order-by-query)
|
||||
- [Example `EXPLAIN` report for an empty result set](#example-explain-report-for-an-empty-result-set)
|
||||
- [Analyze a query plan for leading edge data](#analyze-a-query-plan-for-leading-edge-data)
|
||||
- [Sample data](#sample-data)
|
||||
- [Sample query](#sample-query)
|
||||
- [EXPLAIN report for the leading edge data query](#explain-report-for-the-leading-edge-data-query)
|
||||
- [Locate the physical plan](#locate-the-physical-plan)
|
||||
- [Read the physical plan](#read-the-physical-plan)
|
||||
- [Data scanning nodes (ParquetExec and RecordBatchesExec)](#data-scanning-nodes-parquetexec-and-recordbatchesexec)
|
||||
- [Analyze branch structures](#analyze-branch-structures)
|
||||
|
||||
## Use EXPLAIN keywords to view a query plan
|
||||
|
||||
Use the `EXPLAIN` keyword (and the optional [`ANALYZE`](/influxdb/clustered/reference/sql/explain/#explain-analyze) and [`VERBOSE`](/influxdb/clustered/reference/sql/explain/#explain-analyze-verbose) keywords) to view the query plans for a query.
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "Use Python and pandas to view an EXPLAIN report" %}}
|
||||
|
||||
The following example shows how to use the InfluxDB v3 Python client library and pandas to view the `EXPLAIN` report for a query:
|
||||
|
||||
<!-- Import for tests and hide from users.
|
||||
```python
|
||||
import os
|
||||
```
|
||||
-->
|
||||
|
||||
<!--pytest-codeblocks:cont-->
|
||||
|
||||
{{% code-placeholders "DATABASE_(NAME|TOKEN)" %}}
|
||||
|
||||
```python
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import pandas as pd
|
||||
import tabulate # Required for pandas.to_markdown()
|
||||
|
||||
# Instantiate an InfluxDB client.
|
||||
client = InfluxDBClient3(token = f"DATABASE_TOKEN",
|
||||
host = f"{{< influxdb/host >}}",
|
||||
database = f"DATABASE_NAME")
|
||||
|
||||
sql_explain = '''EXPLAIN
|
||||
SELECT temp
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '7 days'
|
||||
AND room = 'Kitchen'
|
||||
ORDER BY time'''
|
||||
|
||||
table = client.query(sql_explain)
|
||||
df = table.to_pandas()
|
||||
print(df.to_markdown(index=False))
|
||||
|
||||
assert df.shape == (2, 2), f'Expect {df.shape} to have 2 columns, 2 rows'
|
||||
assert 'physical_plan' in df.plan_type.values, "Expect physical_plan"
|
||||
assert 'logical_plan' in df.plan_type.values, "Expect logical_plan"
|
||||
```
|
||||
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
|
||||
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: a [database token](/influxdb/clustered/admin/tokens/) with sufficient permissions to the specified database
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## Read an EXPLAIN report
|
||||
|
||||
When you [use `EXPLAIN` keywords to view a query plan](#use-explain-keywords-to-view-a-query-plan), the report contains the following:
|
||||
|
||||
- two columns: `plan_type` and `plan`
|
||||
- one row for the [logical plan](/influxdb/clustered/reference/internals/query-plan/#logical-plan) (`logical_plan`)
|
||||
- one row for the [physical plan](/influxdb/clustered/reference/internals/query-plan/#physical-plan) (`physical_plan`)
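
For example, if you generated the report with the preceding Python example, you can use pandas to extract each plan from the `df` DataFrame--a minimal sketch:

```python
# Minimal sketch: `df` is the DataFrame returned by the earlier EXPLAIN example.
logical_plan = df.loc[df.plan_type == 'logical_plan', 'plan'].iloc[0]
physical_plan = df.loc[df.plan_type == 'physical_plan', 'plan'].iloc[0]

# Print the physical plan text for closer inspection.
print(physical_plan)
```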
|
||||
|
||||
## Read a query plan
|
||||
|
||||
Plans are in _tree format_--each plan is an upside-down tree in which
|
||||
execution and data flow from _leaf nodes_, the innermost steps in the plan, to outer _branch nodes_.
|
||||
Whether reading a logical or physical plan, keep the following in mind:
|
||||
|
||||
- Start at the _leaf nodes_ and read upward.
|
||||
- At the top of the plan, the _root node_ represents the final, encompassing step.
|
||||
|
||||
In a [physical plan](/influxdb/clustered/reference/internals/query-plan/#physical-plan), each step is an [`ExecutionPlan` node](/influxdb/clustered/reference/internals/query-plan/#executionplan-nodes) that receives expressions for input data and output requirements, and computes a partition of data.
|
||||
|
||||
Use the following steps to analyze a query plan and estimate how much work is required to complete the query.
|
||||
The same steps apply regardless of how large or complex the plan might seem.
|
||||
|
||||
1. Start from the furthest indented steps (the _leaf nodes_), and read upward.
|
||||
2. Understand the job of each [`ExecutionPlan` node](/influxdb/clustered/reference/internals/query-plan/#executionplan-nodes)--for example, a [`UnionExec`](/influxdb/clustered/reference/internals/query-plan/#unionexec) node encompassing the leaf nodes means that the `UnionExec` concatenates the output of all the leaves.
|
||||
3. For each expression, answer the following questions:
|
||||
- What is the shape and size of data input to the plan?
|
||||
- What is the shape and size of data output from the plan?
|
||||
|
||||
The remainder of this guide walks you through analyzing a physical plan.
|
||||
Understanding the sequence, role, input, and output of nodes in your query plan can help you estimate the overall workload and find potential bottlenecks in the query.
|
||||
|
||||
### Example physical plan for a SELECT - ORDER BY query
|
||||
|
||||
The following example shows how to read an `EXPLAIN` report and a physical query plan.
|
||||
|
||||
Given `h2o` measurement data and the following query:
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;
|
||||
```
|
||||
|
||||
The output is similar to the following:
|
||||
|
||||
#### EXPLAIN report
|
||||
|
||||
```sql
|
||||
| plan_type | plan |
|
||||
+---------------+--------------------------------------------------------------------------+
|
||||
| logical_plan | Sort: h2o.city ASC NULLS LAST, h2o.time DESC NULLS FIRST |
|
||||
| | TableScan: h2o projection=[city, min_temp, time] |
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | UnionExec |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
Output from `EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;`
|
||||
{{% /caption %}}
|
||||
|
||||
Each step, or _node_, in the physical plan consists of an `ExecutionPlan` name and the key-value _expressions_ that contain relevant parts of the query--for example, the innermost node in the [`EXPLAIN` report](#explain-report) physical plan, and the first to execute, is a `ParquetExec` execution plan:
|
||||
|
||||
```text
|
||||
ParquetExec: file_groups={...}, projection=[city, min_temp, time]
|
||||
```
|
||||
|
||||
Because `ParquetExec` and `RecordBatchesExec` nodes retrieve and scan data in InfluxDB queries, every query plan starts with one or more of these nodes.
|
||||
|
||||
#### Physical plan data flow
|
||||
|
||||
Data flows _up_ in a query plan.
|
||||
|
||||
The following diagram shows the data flow and sequence of nodes in the [`EXPLAIN` report](#explain-report) physical plan:
|
||||
|
||||
<!-- BEGIN Query plan diagram -->
|
||||
{{< html-diagram/query-plan >}}
|
||||
<!-- END Query plan diagram -->
|
||||
|
||||
{{% caption %}}
|
||||
Execution and data flow in the [`EXPLAIN` report](#explain-report) physical plan.
|
||||
`ParquetExec` nodes execute in parallel and `UnionExec` combines their output.
|
||||
{{% /caption %}}
|
||||
|
||||
The following steps summarize the [physical plan execution and data flow](#physical-plan-data-flow):
|
||||
|
||||
1. Two `ParquetExec` plans, in parallel, read data from Parquet files:
|
||||
- Each `ParquetExec` node processes one or more _file groups_.
|
||||
- Each file group contains one or more Parquet file paths.
|
||||
- A `ParquetExec` node processes its groups in parallel, reading each group's files sequentially.
|
||||
- The output is a stream of data to the corresponding `SortExec` node.
|
||||
2. The `SortExec` nodes, in parallel, sort the data by `city` (ascending) and `time` (descending). Sorting is required by the `SortPreservingMergeExec` plan.
|
||||
3. The `UnionExec` node concatenates the streams to union the output of the parallel `SortExec` nodes.
|
||||
4. The `SortPreservingMergeExec` node merges the previously sorted and unioned data from `UnionExec`.
|
||||
|
||||
### Example `EXPLAIN` report for an empty result set
|
||||
|
||||
If your table doesn't contain data for the time range in your query, the physical plan starts with an `EmptyExec` leaf node--for example:
|
||||
|
||||
{{% code-callout "EmptyExec"%}}
|
||||
|
||||
```sql
|
||||
ProjectionExec: expr=[temp@0 as temp]
|
||||
SortExec: expr=[time@1 ASC NULLS LAST]
|
||||
EmptyExec: produce_one_row=false
|
||||
```
|
||||
|
||||
{{% /code-callout %}}
|
||||
|
||||
## Analyze a query plan for leading edge data
|
||||
|
||||
The following sections guide you through analyzing a physical query plan for a typical time series use case--aggregating recently written (_leading edge_) data.
|
||||
Although the query and plan are more complex than in the [preceding example](#example-physical-plan-for-a-select---order-by-query), you'll follow the same [steps to read the query plan](#read-a-query-plan).
|
||||
After learning how to read the query plan, you'll have an understanding of `ExecutionPlans`, data flow, and potential query bottlenecks.
|
||||
|
||||
### Sample data
|
||||
|
||||
Consider the following `h2o` data, represented as "chunks" of line protocol, written to InfluxDB:
|
||||
|
||||
```text
|
||||
// h2o data
|
||||
// The following data represents 5 batches, or "chunks", of line protocol
|
||||
// written to InfluxDB.
|
||||
// - Chunks 1-4 are ingested and each is persisted to a separate partition file in storage.
|
||||
// - Chunk 5 is ingested and not yet persisted to storage.
|
||||
// - Chunks 1 and 2 cover short windows of time that don't overlap times in other chunks.
|
||||
// - Chunks 3 and 4 cover larger windows of time and the time ranges overlap each other.
|
||||
// - Chunk 5 contains the largest time range and overlaps with chunk 4, the Parquet file with the largest time range.
|
||||
// - In InfluxDB, a chunk never duplicates its own data.
|
||||
//
|
||||
// Chunk 1: stored Parquet file
|
||||
// - time range: 50-249
|
||||
// - no duplicates in its own chunk
|
||||
// - no overlap with any other chunks
|
||||
[
|
||||
"h2o,state=MA,city=Bedford min_temp=71.59 150",
|
||||
"h2o,state=MA,city=Boston min_temp=70.4, 50",
|
||||
"h2o,state=MA,city=Andover max_temp=69.2, 249",
|
||||
],
|
||||
|
||||
// Chunk 2: stored Parquet file
|
||||
// - time range: 250-349
|
||||
// - no duplicates in its own chunk
|
||||
// - no overlap with any other chunks
|
||||
// - adds a new field (area)
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=79.0,max_temp=87.2,area=500u 300",
|
||||
"h2o,state=CA,city=SJ min_temp=75.5,max_temp=84.08 349",
|
||||
"h2o,state=MA,city=Bedford max_temp=78.75,area=742u 300",
|
||||
"h2o,state=MA,city=Boston min_temp=65.4 250",
|
||||
],
|
||||
|
||||
// Chunk 3: stored Parquet file
|
||||
// - time range: 350-500
|
||||
// - no duplicates in its own chunk
|
||||
// - overlaps chunk 4
|
||||
[
|
||||
"h2o,state=CA,city=SJ min_temp=77.0,max_temp=90.7 450",
|
||||
"h2o,state=CA,city=SJ min_temp=69.5,max_temp=88.2 500",
|
||||
"h2o,state=MA,city=Boston min_temp=68.4 350",
|
||||
],
|
||||
|
||||
// Chunk 4: stored Parquet file
|
||||
// - time range: 400-600
|
||||
// - no duplicates in its own chunk
|
||||
// - overlaps chunk 3
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600",
|
||||
"h2o,state=CA,city=SJ min_temp=69.5,max_temp=89.2 600", // duplicates row 3 in chunk 5
|
||||
"h2o,state=MA,city=Bedford max_temp=80.75,area=742u 400", // overlaps chunk 3
|
||||
"h2o,state=MA,city=Boston min_temp=65.40,max_temp=82.67 400", // overlaps chunk 3
|
||||
],
|
||||
|
||||
// Chunk 5: Ingester data
|
||||
// - time range: 550-700
|
||||
// - overlaps and duplicates data in chunk 4
|
||||
[
|
||||
"h2o,state=MA,city=Bedford max_temp=88.75,area=742u 600", // overlaps chunk 4
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 650",
|
||||
"h2o,state=CA,city=SJ min_temp=68.5,max_temp=90.0 600", // duplicates row 2 in chunk 4
|
||||
"h2o,state=CA,city=SJ min_temp=75.5,max_temp=84.08 700",
|
||||
"h2o,state=MA,city=Boston min_temp=67.4 550", // overlaps chunk 4
|
||||
]
|
||||
```
|
||||
|
||||
The following query selects all the data:
|
||||
|
||||
```sql
|
||||
SELECT state, city, min_temp, max_temp, area, time
|
||||
FROM h2o
|
||||
ORDER BY state asc, city asc, time desc;
|
||||
```
|
||||
|
||||
The output is the following:
|
||||
|
||||
```sql
|
||||
+-------+---------+----------+----------+------+--------------------------------+
|
||||
| state | city | min_temp | max_temp | area | time |
|
||||
+-------+---------+----------+----------+------+--------------------------------+
|
||||
| CA | SF | 68.4 | 85.7 | 500 | 1970-01-01T00:00:00.000000650Z |
|
||||
| CA | SF | 68.4 | 85.7 | 500 | 1970-01-01T00:00:00.000000600Z |
|
||||
| CA | SF | 79.0 | 87.2 | 500 | 1970-01-01T00:00:00.000000300Z |
|
||||
| CA | SJ | 75.5 | 84.08 | | 1970-01-01T00:00:00.000000700Z |
|
||||
| CA | SJ | 68.5 | 90.0 | | 1970-01-01T00:00:00.000000600Z |
|
||||
| CA | SJ | 69.5 | 88.2 | | 1970-01-01T00:00:00.000000500Z |
|
||||
| CA | SJ | 77.0 | 90.7 | | 1970-01-01T00:00:00.000000450Z |
|
||||
| CA | SJ | 75.5 | 84.08 | | 1970-01-01T00:00:00.000000349Z |
|
||||
| MA | Andover | | 69.2 | | 1970-01-01T00:00:00.000000249Z |
|
||||
| MA | Bedford | | 88.75 | 742 | 1970-01-01T00:00:00.000000600Z |
|
||||
| MA | Bedford | | 80.75 | 742 | 1970-01-01T00:00:00.000000400Z |
|
||||
| MA | Bedford | | 78.75 | 742 | 1970-01-01T00:00:00.000000300Z |
|
||||
| MA | Bedford | 71.59 | | | 1970-01-01T00:00:00.000000150Z |
|
||||
| MA | Boston | 67.4 | | | 1970-01-01T00:00:00.000000550Z |
|
||||
| MA | Boston | 65.4 | 82.67 | | 1970-01-01T00:00:00.000000400Z |
|
||||
| MA | Boston | 68.4 | | | 1970-01-01T00:00:00.000000350Z |
|
||||
| MA | Boston | 65.4 | | | 1970-01-01T00:00:00.000000250Z |
|
||||
| MA | Boston | 70.4 | | | 1970-01-01T00:00:00.000000050Z |
|
||||
+-------+---------+----------+----------+------+--------------------------------+
|
||||
```
|
||||
|
||||
### Sample query
|
||||
|
||||
The following query selects leading edge data from the [sample data](#sample-data):
|
||||
|
||||
```sql
|
||||
SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
The output is the following:
|
||||
|
||||
```sql
|
||||
+---------+-----------------+
|
||||
| city | COUNT(Int64(1)) |
|
||||
+---------+-----------------+
|
||||
| Andover | 1 |
|
||||
| Bedford | 3 |
|
||||
| Boston | 4 |
|
||||
+---------+-----------------+
|
||||
```
|
||||
|
||||
### EXPLAIN report for the leading edge data query
|
||||
|
||||
The following query generates the `EXPLAIN` report for the preceding [sample query](#sample-query):
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "EXPLAIN report for a leading edge data query" %}}
|
||||
|
||||
```sql
|
||||
| plan_type | plan |
|
||||
| logical_plan | Sort: h2o.city ASC NULLS LAST |
|
||||
| | Aggregate: groupBy=[[h2o.city]], aggr=[[COUNT(Int64(1))]] |
|
||||
| | TableScan: h2o projection=[city], full_filters=[h2o.time >= TimestampNanosecond(200, None), h2o.time < TimestampNanosecond(700, None), h2o.state = Dictionary(Int32, Utf8("MA"))] |
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST] |
|
||||
| | AggregateExec: mode=FinalPartitioned, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | RepartitionExec: partitioning=Hash([city@0], 4), input_partitions=4 |
|
||||
| | AggregateExec: mode=Partial, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=3 |
|
||||
| | UnionExec |
|
||||
| | ProjectionExec: expr=[city@0 as city] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | FilterExec: time@2 >= 200 AND time@2 < 700 AND state@1 = MA |
|
||||
| | ParquetExec: file_groups={2 groups: [[1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/243db601-f3f1-401b-afda-82160d8cc1a8.Parquet], [1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/f5fb7c7d-16ac-49ba-a811-69578d05843f.Parquet]]}, projection=[city, state, time], output_ordering=[state@1 ASC, city@0 ASC, time@2 ASC], predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA, pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3 |
|
||||
| | ProjectionExec: expr=[city@1 as city] |
|
||||
| | DeduplicateExec: [state@2 ASC,city@1 ASC,time@3 ASC] |
|
||||
| | SortPreservingMergeExec: [state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC] |
|
||||
| | UnionExec |
|
||||
| | SortExec: expr=[state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA |
|
||||
| | RecordBatchesExec: chunks=1, projection=[__chunk_order, city, state, time] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA |
|
||||
| | ParquetExec: file_groups={2 groups: [[1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/2cbb3992-4607-494d-82e4-66c480123189.Parquet], [1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/9255eb7f-2b51-427b-9c9b-926199c85bdf.Parquet]]}, projection=[__chunk_order, city, state, time], output_ordering=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC], predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA, pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3 |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
`EXPLAIN` report for a typical leading edge data query
|
||||
{{% /caption %}}
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
The comments in the [sample data](#sample-data) tell you which data chunks _overlap_ or duplicate data in other chunks.
|
||||
Two chunks of data overlap if there are portions of time for which data exists in both chunks.
|
||||
_You'll learn how to [recognize overlapping and duplicate data](#recognize-overlapping-and-duplicate-data) in a query plan later in this guide._
|
||||
|
||||
Unlike the sample data, your data likely doesn't tell you where overlaps or duplicates exist.
|
||||
A physical plan can reveal overlaps and duplicates in your data and how they affect your queries--for example, after learning how to read a physical plan, you might summarize the data scanning steps as follows:
|
||||
|
||||
- Query execution starts with two `ParquetExec` and one `RecordBatchesExec` execution plans that run in parallel.
|
||||
- The first `ParquetExec` node reads two files that don't overlap any other files and don't duplicate data; the files don't require deduplication.
|
||||
- The second `ParquetExec` node reads two files that overlap each other and overlap the ingested data scanned in the `RecordBatchesExec` node; the query plan must include the deduplication process for these nodes before completing the query.
|
||||
|
||||
The remaining sections analyze `ExecutionPlan` node structure and arguments in the example physical plan.
|
||||
The example includes DataFusion and InfluxDB-specific [`ExecutionPlan` nodes](/influxdb/clustered/reference/internals/query-plan/#executionplan-nodes).
|
||||
|
||||
### Locate the physical plan
|
||||
|
||||
To begin analyzing the physical plan for the query, find the row in the [`EXPLAIN` report](#explain-report-for-the-leading-edge-data-query) where the `plan_type` column has the value `physical_plan`.
|
||||
The `plan` column for the row contains the physical plan.
|
||||
|
||||
### Read the physical plan
|
||||
|
||||
The following sections follow the steps to [read a query plan](#read-a-query-plan) and examine the physical plan nodes and their input and output.
|
||||
|
||||
{{% note %}}
|
||||
To [read the execution flow of a query plan](#read-a-query-plan), always start from the innermost (leaf) nodes and read up toward the top outermost root node.
|
||||
{{% /note %}}
|
||||
|
||||
#### Physical plan leaf nodes
|
||||
|
||||
<img src="/img/influxdb/3-0-query-plan-tree.png" alt="Query physical plan leaf node structures" />
|
||||
|
||||
{{% caption %}}
|
||||
Leaf node structures in the physical plan
|
||||
{{% /caption %}}
|
||||
|
||||
### Data scanning nodes (ParquetExec and RecordBatchesExec)
|
||||
|
||||
The [example physical plan](#physical-plan-leaf-nodes) contains three [leaf nodes](#physical-plan-leaf-nodes)--the innermost nodes where the execution flow begins:
|
||||
|
||||
- [`ParquetExec`](/influxdb/clustered/reference/internals/query-plan/#parquetexec) nodes retrieve and scan data from Parquet files in the [Object store](/influxdb/clustered/reference/internals/storage-engine/#object-store)
|
||||
- a [`RecordBatchesExec`](/influxdb/clustered/reference/internals/query-plan/#recordbatchesexec) node retrieves recently written, yet-to-be-persisted data from the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester)
|
||||
|
||||
Because `ParquetExec` and `RecordBatchesExec` retrieve and scan data for a query, every query plan starts with one or more of these nodes.
|
||||
|
||||
The number of `ParquetExec` and `RecordBatchesExec` nodes and their parameter values can tell you which data (and how much) is retrieved for your query, and how efficiently the plan handles the organization (for example, partitioning and deduplication) of your data.
|
||||
|
||||
For convenience, this guide uses the names _ParquetExec_A_ and _ParquetExec_B_ for the `ParquetExec` nodes in the [example physical plan](#physical-plan-leaf-nodes).
|
||||
Reading from the top of the physical plan, **ParquetExec_A** is the first leaf node in the physical plan and **ParquetExec_B** is the last (bottom) leaf node.
|
||||
|
||||
_The names indicate the nodes' locations in the report, not their order of execution._
|
||||
|
||||
- [ParquetExec_A](#parquetexec_a)
|
||||
- [RecordBatchesExec](#recordbatchesexec)
|
||||
- [ParquetExec_B](#parquetexec_b)
|
||||
|
||||
#### ParquetExec_A
|
||||
|
||||
```sql
|
||||
ParquetExec: file_groups={2 groups: [[1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/243db601-f3f1-401b-afda-82160d8cc1a8.Parquet], [1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/f5fb7c7d-16ac-49ba-a811-69578d05843f.Parquet]]}, projection=[city, state, time], output_ordering=[state@1 ASC, city@0 ASC, time@2 ASC], predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA, pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3 |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
ParquetExec_A, the first ParquetExec node
|
||||
{{% /caption %}}
|
||||
|
||||
ParquetExec_A has the following traits:
|
||||
|
||||
##### `file_groups`
|
||||
|
||||
A _file group_ is a list of files for the operator to read.
|
||||
Files are referenced by path:
|
||||
|
||||
- `1/1/b862a7e9b.../243db601-....parquet`
|
||||
- `1/1/b862a7e9b.../f5fb7c7d-....parquet`
|
||||
|
||||
The path structure represents how your data is organized.
|
||||
You can use the file paths to gather more information about the query--for example:
|
||||
|
||||
- to find file information (for example: size and number of rows) in the catalog
|
||||
- to download the Parquet file from the Object store for debugging
|
||||
- to find how many partitions the query reads
|
||||
|
||||
A path has the following structure:
|
||||
|
||||
```text
|
||||
<namespace_id>/<table_id>/<partition_hash_id>/<uuid_of_the_file>.parquet
|
||||
1 / 1 /b862a7e9b329ee6a4.../243db601-f3f1-4....parquet
|
||||
```
|
||||
|
||||
- `namespace_id`: the namespace (database) being queried
|
||||
- `table_id`: the table (measurement) being queried
|
||||
- `partition_hash_id`: the partition this file belongs to.
|
||||
You can count partition IDs to find how many partitions the query reads.
|
||||
- `uuid_of_the_file`: the file identifier.
|
||||
|
||||
`ParquetExec` processes groups in parallel and reads the files in each group sequentially.
|
||||
|
||||
```text
|
||||
file_groups={2 groups: [[1/1/b862a7e9b329ee6a4/243db601....parquet], [1/1/b862a7e9b329ee6a4/f5fb7c7d....parquet]]}
|
||||
```
|
||||
|
||||
- `{2 groups: [[file], [file]]}`: ParquetExec_A receives two groups with one file per group.
|
||||
Therefore, ParquetExec_A reads two files in parallel.
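
For example, the following sketch uses only the Python standard library to split the ParquetExec_A file paths (copied from the example plan, shown here with a lowercase `.parquet` extension) and count the files and distinct partitions they belong to:

```python
# Minimal sketch: count the files and distinct partitions read by ParquetExec_A.
paths = [
    "1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/243db601-f3f1-401b-afda-82160d8cc1a8.parquet",
    "1/1/b862a7e9b329ee6a418cde191198eaeb1512753f19b87a81def2ae6c3d0ed237/f5fb7c7d-16ac-49ba-a811-69578d05843f.parquet",
]

partitions = set()
for path in paths:
    namespace_id, table_id, partition_hash_id, file_name = path.split("/")
    partitions.add(partition_hash_id)

# Both files belong to the same partition.
print(f"files read: {len(paths)}, partitions read: {len(partitions)}")
```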
|
||||
|
||||
##### `projection`
|
||||
|
||||
`projection` lists the table columns for the `ExecutionPlan` to read and output.
|
||||
|
||||
```text
|
||||
projection=[city, state, time]
|
||||
```
|
||||
|
||||
- `[city, state, time]`: the [sample data](#sample-data) contains many columns, but the [sample query](#sample-query) requires the Querier to read only three
|
||||
|
||||
##### `output_ordering`
|
||||
|
||||
`output_ordering` specifies the sort order for the `ExecutionPlan` output.
|
||||
The query planner passes the parameter if the output should be ordered and if the planner knows the order.
|
||||
|
||||
```text
|
||||
output_ordering=[state@1 ASC, city@0 ASC, time@2 ASC]
|
||||
```
|
||||
|
||||
When storing data to Parquet files, InfluxDB sorts the data to improve storage compression and query efficiency and the planner tries to preserve that order for as long as possible.
|
||||
Generally, the `output_ordering` value that `ParquetExec` receives is the ordering (or a subset of the ordering) of stored data.
|
||||
|
||||
_By design, [`RecordBatchesExec`](#recordbatchesexec) data isn't sorted._
|
||||
|
||||
In the example, the planner specifies that ParquetExec_A use the existing sort order `state ASC, city ASC, time ASC` for output.
|
||||
|
||||
{{% note %}}
|
||||
To view the sort order of your stored data, generate an `EXPLAIN` report for a `SELECT *` query--for example:
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT * FROM TABLE_NAME WHERE time > now() - interval '1 hour'
|
||||
```
|
||||
|
||||
Reduce the time range if the query returns too much data.
|
||||
{{% /note %}}
|
||||
|
||||
##### `predicate`
|
||||
|
||||
`predicate` is the data filter specified in the query.
|
||||
|
||||
```text
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA
|
||||
```
|
||||
|
||||
##### `pruning_predicate`
|
||||
|
||||
`pruning_predicate` is created from the [`predicate`](#predicate) value and is the predicate actually used for pruning data and files from the chosen partitions.
|
||||
By default, pruning filters files by `time`.
|
||||
|
||||
```text
|
||||
pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3
|
||||
```
|
||||
|
||||
_Before the physical plan is generated, an additional `partition pruning` step uses predicates on partitioning columns to prune partitions._
|
||||
|
||||
#### `RecordBatchesExec`
|
||||
|
||||
```sql
|
||||
RecordBatchesExec: chunks=1, projection=[__chunk_order, city, state, time]
|
||||
```
|
||||
|
||||
{{% caption %}}RecordBatchesExec{{% /caption %}}
|
||||
|
||||
[`RecordBatchesExec`](/influxdb/clustered/reference/internals/query-plan/#recordbatchesexec) is an InfluxDB-specific `ExecutionPlan` implementation that retrieves recently written, yet-to-be-persisted data from the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester).
|
||||
|
||||
In the example, `RecordBatchesExec` contains the following expressions:
|
||||
|
||||
##### `chunks`
|
||||
|
||||
`chunks` is the number of data chunks received from the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester).
|
||||
|
||||
```text
|
||||
chunks=1
|
||||
```
|
||||
|
||||
- `chunks=1`: `RecordBatchesExec` receives one data chunk.
|
||||
|
||||
##### `projection`
|
||||
|
||||
The `projection` list specifies the columns or expressions for the node to read and output.
|
||||
|
||||
```text
|
||||
[__chunk_order, city, state, time]
|
||||
```
|
||||
|
||||
- `__chunk_order`: orders chunks and files for deduplication
|
||||
- `city, state, time`: the same columns specified in [ParquetExec_A `projection`](#projection)
|
||||
|
||||
{{% note %}}
|
||||
The presence of `__chunk_order` in data scanning nodes indicates that data overlaps, and is possibly duplicated, among the nodes.
|
||||
{{% /note %}}
|
||||
|
||||
#### ParquetExec_B
|
||||
|
||||
The bottom leaf node in the [example physical plan](#physical-plan-leaf-nodes) is another `ParquetExec` operator, _ParquetExec_B_.
|
||||
|
||||
##### ParquetExec_B expressions
|
||||
|
||||
```sql
|
||||
ParquetExec:
|
||||
file_groups={2 groups: [[1/1/b862a7e9b.../2cbb3992-....Parquet],
|
||||
[1/1/b862a7e9b.../9255eb7f-....Parquet]]},
|
||||
projection=[__chunk_order, city, state, time],
|
||||
output_ordering=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC],
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA,
|
||||
pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3
|
||||
```
|
||||
|
||||
{{% caption %}}ParquetExec_B, the second ParquetExec{{% /caption %}}
|
||||
|
||||
Because ParquetExec_B has overlaps, the `projection` and `output_ordering` expressions use the `__chunk_order` column used in [`RecordBatchesExec` `projection`](#projection-1).
|
||||
|
||||
{{% note %}}
|
||||
The presence of `__chunk_order` in data scanning nodes indicates that data overlaps, and is possibly duplicated, among the nodes.
|
||||
{{% /note %}}
|
||||
|
||||
The remaining ParquetExec_B expressions are similar to those in [ParquetExec_A](#parquetexec_a).
|
||||
|
||||
##### How a query plan distributes data for scanning
|
||||
|
||||
If you compare [`file_group`](#file_groups) paths in [ParquetExec_A](#parquetexec_a) to those in [ParquetExec_B](#parquetexec_b), you'll notice that both contain files from the same partition:
|
||||
|
||||
{{% code-callout "b862a7e9b329ee6a4..." %}}
|
||||
|
||||
```text
|
||||
1/1/b862a7e9b329ee6a4.../...
|
||||
```
|
||||
|
||||
{{% /code-callout %}}
|
||||
|
||||
The planner may distribute files from the same partition to different scan nodes for several reasons, including optimizations for handling [overlaps](/influxdb/clustered/reference/internals/query-plan/#overlapping-data-and-deduplication)--for example:
|
||||
|
||||
- to separate non-overlapped files from overlapped files to minimize work required for deduplication (which is the case in this example)
|
||||
- to distribute non-overlapped files to increase parallel execution
|
||||
|
||||
### Analyze branch structures
|
||||
|
||||
After data is output from a data scanning node, it flows up to the next parent (outer) node.
|
||||
|
||||
In the example plan:
|
||||
|
||||
- Each leaf node is the first step in a branch of nodes planned for processing the scanned data.
|
||||
- The three branches execute in parallel.
|
||||
- After the leaf node, each branch contains the following similar node structure:
|
||||
|
||||
```sql
|
||||
...
|
||||
CoalesceBatchesExec: target_batch_size=8192
|
||||
FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA
|
||||
...
|
||||
```
|
||||
|
||||
- `FilterExec: time@3 >= 200 AND time@3 < 700 AND state@2 = MA`: filters data for the condition `time@3 >= 200 AND time@3 < 700 AND state@2 = MA` and guarantees that all output rows satisfy the condition--earlier file-level pruning is only approximate.
|
||||
- `CoalesceBatchesExec: target_batch_size=8192`: combines small batches into larger batches. See the DataFusion `CoalesceBatchesExec` documentation.
|
||||
|
||||
#### Sorting yet-to-be-persisted data
|
||||
|
||||
In the `RecordBatchesExec` branch, the node that follows `CoalesceBatchesExec` is a `SortExec` node:
|
||||
|
||||
```sql
|
||||
SortExec: expr=[state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC]
|
||||
```
|
||||
|
||||
The node uses the specified expression `state ASC, city ASC, time ASC, __chunk_order ASC` to sort the yet-to-be-persisted data.
|
||||
Neither ParquetExec_A nor ParquetExec_B contains a similar node because data in the Object store is already sorted (by the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester) or the [Compactor](/influxdb/clustered/reference/internals/storage-engine/#compactor)) in the given order; the query plan only needs to sort data that arrives from the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester).
|
||||
|
||||
#### Recognize overlapping and duplicate data
|
||||
|
||||
In the example physical plan, the ParquetExec_B and `RecordBatchesExec` nodes share the following parent nodes:
|
||||
|
||||
```sql
|
||||
...
|
||||
DeduplicateExec: [state@2 ASC,city@1 ASC,time@3 ASC]
|
||||
SortPreservingMergeExec: [state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC]
|
||||
UnionExec
|
||||
...
|
||||
```
|
||||
|
||||
{{% caption %}}Overlapped data node structure{{% /caption %}}
|
||||
|
||||
1. `UnionExec`: unions multiple streams of input data by concatenating the partitions. `UnionExec` doesn't do any merging and is fast to execute.
|
||||
2. `SortPreservingMergeExec: [state@2 ASC,city@1 ASC,time@3 ASC,__chunk_order@0 ASC]`: merges already sorted data; indicates that preceding data (from nodes below it) is already sorted. The output data is a single sorted stream.
|
||||
3. `DeduplicateExec: [state@2 ASC,city@1 ASC,time@3 ASC]`: deduplicates an input stream of sorted data.
|
||||
Because `SortPreservingMergeExec` ensures a single sorted stream, it often, but not always, precedes `DeduplicateExec`.
|
||||
|
||||
A `DeduplicateExec` node indicates that encompassed nodes have [_overlapped data_](/influxdb/clustered/reference/internals/query-plan/#overlapping-data-and-deduplication)--data in a file or batch has timestamps in the same range as data in another file or batch.
|
||||
Due to how InfluxDB organizes data, data is never duplicated _within_ a file.
|
||||
|
||||
In the example, the `DeduplicateExec` node encompasses ParquetExec_B and the `RecordBatchesExec` node, which indicates that ParquetExec_B [file group](#file_groups) files overlap the yet-to-be-persisted data.
|
||||
|
||||
The following [sample data](#sample-data) excerpt shows overlapping data between a file and [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester) data:
|
||||
|
||||
```text
|
||||
// Chunk 4: stored Parquet file
|
||||
// - time range: 400-600
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600",
|
||||
],
|
||||
|
||||
// Chunk 5: Ingester data
|
||||
// - time range: 550-700
|
||||
// - overlaps and duplicates data in chunk 4
|
||||
[
|
||||
"h2o,state=MA,city=Bedford max_temp=88.75,area=742u 600", // overlaps chunk 4
|
||||
...
|
||||
"h2o,state=MA,city=Boston min_temp=67.4 550", // overlaps chunk 4
|
||||
]
|
||||
```
|
||||
|
||||
If files or ingested data overlap, the Querier must include the `DeduplicateExec` in the query plan to remove any duplicates.
|
||||
`DeduplicateExec` doesn't necessarily indicate that data is duplicated.
|
||||
If a plan reads many files and performs deduplication on all of them, it might be for the following reasons:
|
||||
|
||||
- the files contain duplicate data
|
||||
- the Object store has many small overlapped files that the Compactor hasn't compacted yet. After compaction, your query may perform better because it has fewer files to read
|
||||
- the Compactor isn't keeping up. If the data isn't duplicated and you still have many small overlapping files after compaction, then you might want to review the Compactor's workload and add more resources as needed
|
||||
|
||||
A leaf node that doesn't have a `DeduplicateExec` node in its branch doesn't require deduplication and doesn't overlap other files or [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester) data--for example, ParquetExec_A has no overlaps:
|
||||
|
||||
```sql
|
||||
ProjectionExec:...
|
||||
CoalesceBatchesExec:...
|
||||
FilterExec:...
|
||||
ParquetExec:...
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
The absence of a `DeduplicateExec` node means that files don't overlap.
|
||||
{{% /caption %}}
|
||||
|
||||
##### Data scan output
|
||||
|
||||
`ProjectionExec` nodes filter columns so that only the `city` column remains in the output:
|
||||
|
||||
```sql
|
||||
ProjectionExec: expr=[city@0 as city]
|
||||
```
|
||||
|
||||
##### Final processing
|
||||
|
||||
After deduplicating and filtering data in each leaf node, the plan combines the output and then applies aggregation and sorting operators for the final result:
|
||||
|
||||
```sql
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST] |
|
||||
| | AggregateExec: mode=FinalPartitioned, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | CoalesceBatchesExec: target_batch_size=8192 |
|
||||
| | RepartitionExec: partitioning=Hash([city@0], 4), input_partitions=4 |
|
||||
| | AggregateExec: mode=Partial, gby=[city@0 as city], aggr=[COUNT(Int64(1))] |
|
||||
| | RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=3 |
|
||||
| | UnionExec
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
Operator structure for aggregating, sorting, and final output.
|
||||
{{% /caption %}}
|
||||
|
||||
- `UnionExec`: unions data streams. Note that the number of output streams is the same as the number of input streams--the `UnionExec` node is an intermediate step to downstream operators that actually merge or split data streams.
|
||||
- `RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=3`: Splits three input streams into four output streams in round-robin fashion. The plan splits streams to increase parallel execution.
|
||||
- `AggregateExec: mode=Partial, gby=[city@0 as city], aggr=[COUNT(Int64(1))]`: Groups data as specified in the [query](#sample-query): `city, count(1)`.
|
||||
This node aggregates each of the four streams separately, and then outputs four streams, indicated by `mode=Partial`--the data isn't fully aggregated.
|
||||
- `RepartitionExec: partitioning=Hash([city@0], 4), input_partitions=4`: Repartitions data on `Hash([city])` into four streams--all data for a given city lands in the same stream.
|
||||
- `AggregateExec: mode=FinalPartitioned, gby=[city@0 as city], aggr=[COUNT(Int64(1))]`: Applies the final aggregation (`aggr=[COUNT(Int64(1))]`) to the data. `mode=FinalPartitioned` indicates that the data has already been partitioned (by city) and doesn't need further grouping by `AggregateExec`.
|
||||
- `SortExec: expr=[city@0 ASC NULLS LAST]`: Sorts the four streams of data, each on `city`, as specified in the query.
|
||||
- `SortPreservingMergeExec: [city@0 ASC NULLS LAST]`: Merges and sorts the four sorted streams for the final output.
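
The partial-then-final counting pattern is easier to see outside the plan. The following Python sketch is only an illustration of the two-phase aggregation idea (it isn't InfluxDB code); the hypothetical per-stream values produce the same totals as the [sample query](#sample-query):

```python
from collections import Counter

# Illustration only: three hypothetical input streams of `city` values,
# similar to the three branches that feed UnionExec in the example plan.
streams = [
    ["Bedford", "Boston"],
    ["Andover", "Boston", "Boston"],
    ["Bedford", "Bedford", "Boston"],
]

# AggregateExec mode=Partial: each stream is counted independently.
partial_counts = [Counter(stream) for stream in streams]

# RepartitionExec (Hash) + AggregateExec mode=FinalPartitioned:
# partial counts for the same city are combined into the final result.
final_counts = Counter()
for counts in partial_counts:
    final_counts.update(counts)

print(dict(final_counts))  # {'Bedford': 3, 'Boston': 4, 'Andover': 1}
```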
|
||||
|
||||
In the preceding examples, the `EXPLAIN` report shows the query plan without executing the query.
|
||||
To view runtime metrics, such as execution time for a plan and its operators, use [`EXPLAIN ANALYZE`](/influxdb/clustered/reference/sql/explain/#explain-analyze) to generate the report and use [tracing](/influxdb/clustered/query-data/optimize-queries/#enable-trace-logging) for further debugging, if necessary.
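
For example, the following sketch reuses the client from the [earlier Python example](#use-explain-keywords-to-view-a-query-plan) to print the `EXPLAIN ANALYZE` report for the [sample query](#sample-query); it assumes you've written the `h2o` sample data:

```python
# Minimal sketch: reuse `client` from the earlier EXPLAIN example.
sql_explain_analyze = '''EXPLAIN ANALYZE
SELECT city, count(1)
FROM h2o
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
  AND state = 'MA'
GROUP BY city
ORDER BY city ASC'''

table = client.query(sql_explain_analyze)
df = table.to_pandas()

# The row with plan_type 'Plan with Metrics' contains per-operator runtime
# metrics, such as output_rows and elapsed_compute.
print(df.to_markdown(index=False))

client.close()
```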
|
|
@ -7,27 +7,22 @@ weight: 401
|
|||
menu:
|
||||
influxdb_clustered:
|
||||
name: Understand Flight responses
|
||||
parent: Execute queries
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/clustered/tags: [query, sql, influxql]
|
||||
related:
|
||||
- /influxdb/clustered/query-data/sql/
|
||||
- /influxdb/clustered/query-data/influxql/
|
||||
- /influxdb/clustered/reference/client-libraries/v3/
|
||||
---
|
||||
|
||||
Learn how to handle responses and troubleshoot errors encountered when querying {{% product-name %}} with Flight+gRPC and Arrow Flight clients.
|
||||
|
||||
<!-- TOC -->
|
||||
|
||||
- [InfluxDB Flight responses](#influxdb-flight-responses)
|
||||
- [Stream](#stream)
|
||||
- [Schema](#schema)
|
||||
- [Example](#example)
|
||||
- [RecordBatch](#recordbatch)
|
||||
- [InfluxDB status and error codes](#influxdb-status-and-error-codes)
|
||||
- [Troubleshoot errors](#troubleshoot-errors)
|
||||
- [Internal Error: Received RST_STREAM](#internal-error-received-rst_stream)
|
||||
- [Internal Error: stream terminated by RST_STREAM with NO_ERROR](#internal-error-stream-terminated-by-rst_stream-with-no_error)
|
||||
- [Invalid Argument: Invalid ticket](#invalid-argument-invalid-ticket)
|
||||
- [Unauthenticated: Unauthenticated](#unauthenticated-unauthenticated)
|
||||
- [Unauthorized: Permission denied](#unauthorized-permission-denied)
|
||||
- [FlightUnavailableError: Could not get default pem root certs](#flightunavailableerror-could-not-get-default-pem-root-certs)
|
||||
|
||||
## InfluxDB Flight responses
|
||||
|
||||
|
@ -42,7 +37,7 @@ For example, if you use the [`influxdb3-python` Python client library](/influxdb
|
|||
InfluxDB responds with one of the following:
|
||||
|
||||
- A [stream](#stream) in Arrow IPC streaming format
|
||||
- An [error status code](#influxdb-error-codes) and an optional `details` field that contains the status and a message that describes the error
|
||||
- An [error status code](#influxdb-status-and-error-codes) and an optional `details` field that contains the status and a message that describes the error
|
||||
|
||||
### Stream
|
||||
|
||||
|
@ -169,7 +164,6 @@ _For a list of gRPC codes that servers and clients may return, see [Status codes
|
|||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
|
||||
### Troubleshoot errors
|
||||
|
||||
#### Internal Error: Received RST_STREAM
|
|
@ -0,0 +1,62 @@
|
|||
---
|
||||
title: Optimize queries
|
||||
description: >
|
||||
Optimize queries to improve performance and reduce their memory and compute (CPU) requirements in InfluxDB.
|
||||
Learn how to use observability tools to analyze query execution and view metrics.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_clustered:
|
||||
name: Optimize queries
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/clustered/tags: [query, performance, observability, errors, sql, influxql]
|
||||
related:
|
||||
- /influxdb/clustered/query-data/sql/
|
||||
- /influxdb/clustered/query-data/influxql/
|
||||
aliases:
|
||||
- /influxdb/clustered/query-data/execute-queries/optimize-queries/
|
||||
- /influxdb/clustered/query-data/execute-queries/analyze-query-plan/
|
||||
---
|
||||
|
||||
Optimize SQL and InfluxQL queries to improve performance and reduce their memory and compute (CPU) requirements.
|
||||
Learn how to use observability tools to analyze query execution and view metrics.
|
||||
|
||||
- [Why is my query slow?](#why-is-my-query-slow)
|
||||
- [Strategies for improving query performance](#strategies-for-improving-query-performance)
|
||||
- [Analyze and troubleshoot queries](#analyze-and-troubleshoot-queries)
|
||||
|
||||
## Why is my query slow?
|
||||
|
||||
Query performance depends on time range and complexity.
|
||||
If a query is slower than you expect, it might be due to the following reasons:
|
||||
|
||||
- It queries data from a large time range.
|
||||
- It includes intensive operations, such as querying many string values, or sorting or re-sorting large amounts of data with `ORDER BY`.
|
||||
|
||||
## Strategies for improving query performance
|
||||
|
||||
The following design strategies generally improve query performance and resource use:
|
||||
|
||||
- Follow [schema design best practices](/influxdb/clustered/write-data/best-practices/schema-design/) to make querying easier and more performant.
|
||||
- Query only the data you need--for example, include a [`WHERE` clause](/influxdb/clustered/reference/sql/where/) that filters data by a time range.
|
||||
InfluxDB v3 stores data in a Parquet file for each measurement and day, and retrieves files from the Object store to answer a query.
|
||||
The smaller the time range in your query, the fewer files InfluxDB needs to retrieve from the Object store (see the example following this list).
|
||||
|
||||
- [Downsample data](/influxdb/clustered/process-data/downsample/) to reduce the amount of data you need to query.
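
For example, the following sketch uses the Python client library and the `home` [sample data](/influxdb/clustered/reference/sample-data/) to query a single day of data; the host, database, and token values are placeholders to replace with your own:

```python
from influxdb_client_3 import InfluxDBClient3

# Placeholders -- replace with your cluster host, database, and database token.
client = InfluxDBClient3(host="{{< influxdb/host >}}",
                         database="DATABASE_NAME",
                         token="DATABASE_TOKEN")

# Query only the data you need: a narrow time range and only the required columns.
sql = '''
SELECT room, temp, time
FROM home
WHERE time >= now() - INTERVAL '1 day'
'''

table = client.query(sql)
print(table.to_pandas())
client.close()
```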
|
||||
|
||||
Some bottlenecks may be out of your control and are the result of a suboptimal execution plan, such as:
|
||||
|
||||
- Applying the same sort (`ORDER BY`) to already sorted data.
|
||||
- Retrieving many Parquet files from the Object store--the same query performs better if it retrieves fewer, larger files.
|
||||
- Querying many overlapped Parquet files.
|
||||
- Performing a large number of table scans.
|
||||
|
||||
{{% note %}}
|
||||
#### Analyze query plans to view metrics and recognize bottlenecks
|
||||
|
||||
To view runtime metrics for a query, such as the number of files scanned, use the [`EXPLAIN ANALYZE` keywords](/influxdb/clustered/reference/sql/explain/#explain-analyze) and learn how to [analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/).
|
||||
{{% /note %}}
|
||||
|
||||
## Analyze and troubleshoot queries
|
||||
|
||||
Learn how to [analyze a query plan](/influxdb/clustered/query-data/troubleshoot-and-optimize/analyze-query-plan/)
|
||||
to troubleshoot queries and find performance bottlenecks.
|
|
@ -0,0 +1,44 @@
|
|||
---
|
||||
title: Troubleshoot queries
|
||||
description: >
|
||||
Troubleshoot SQL and InfluxQL queries in InfluxDB.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_clustered:
|
||||
name: Troubleshoot queries
|
||||
parent: Troubleshoot and optimize queries
|
||||
influxdb/clustered/tags: [query, performance, observability, errors, sql, influxql]
|
||||
related:
|
||||
- /influxdb/clustered/query-data/sql/
|
||||
- /influxdb/clustered/query-data/influxql/
|
||||
- /influxdb/clustered/reference/client-libraries/v3/
|
||||
aliases:
|
||||
- /influxdb/clustered/query-data/execute-queries/troubleshoot/
|
||||
---
|
||||
|
||||
Troubleshoot SQL and InfluxQL queries that return unexpected results.
|
||||
|
||||
- [Why doesn't my query return data?](#why-doesnt-my-query-return-data)
|
||||
- [Optimize slow or expensive queries](#optimize-slow-or-expensive-queries)
|
||||
|
||||
## Why doesn't my query return data?
|
||||
|
||||
If a query doesn't return any data, it might be due to the following:
|
||||
|
||||
- Your data falls outside the time range (or other conditions) in the query--for example, the InfluxQL `SHOW TAG VALUES` command uses a default time range of 1 day (see the example following this list).
|
||||
- The query (InfluxDB server) timed out.
|
||||
- The query client timed out.
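
For example, `SHOW TAG VALUES` only searches the most recent day of data by default. The following sketch uses the Python client library and the `home` sample data to widen the search to the last 7 days; the host, database, and token values are placeholders to replace with your own:

```python
from influxdb_client_3 import InfluxDBClient3

# Placeholders -- replace with your cluster host, database, and database token.
client = InfluxDBClient3(host="{{< influxdb/host >}}",
                         database="DATABASE_NAME",
                         token="DATABASE_TOKEN")

# Widen the default 1-day time range with a WHERE time condition.
influxql = '''
SHOW TAG VALUES FROM home WITH KEY = "room"
WHERE time >= now() - 7d
'''

table = client.query(influxql, language="influxql")
print(table.to_pandas())
client.close()
```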
|
||||
|
||||
If a query times out or returns an error, it might be due to the following:
|
||||
|
||||
- a bad request
|
||||
- a server or network problem
|
||||
- a query that requests too much data
|
||||
|
||||
[Understand Arrow Flight responses](/influxdb/clustered/query-data/troubleshoot-and-optimize/flight-responses/) and error messages for queries.
|
||||
|
||||
## Optimize slow or expensive queries
|
||||
|
||||
If a query is slow or uses too many compute resources, limit the amount of data that it queries.
|
||||
|
||||
See how to [optimize queries](/influxdb/clustered/query-data/troubleshoot-and-optimize/optimize-queries/) and use tools to view runtime metrics, identify bottlenecks, and debug queries.
|
|
@ -23,7 +23,7 @@ prepend:
|
|||
[**Compare tools you can use**](/influxdb/clustered/get-started/#tools-to-use) to interact with {{% product-name %}}.
|
||||
---
|
||||
|
||||
Arduino is an open-source hardware and software platform used for building electronics projects.
|
||||
Arduino is an open source hardware and software platform used for building electronics projects.
|
||||
|
||||
The documentation for this client library is available on GitHub.
|
||||
|
||||
|
|
|
@ -77,7 +77,7 @@ The JavaScript client library includes the following convenient features for wri
|
|||
.floatField('value', 24.0)
|
||||
```
|
||||
|
||||
5. Use the `writePoint()` method to write the point to your InfluxDB bucket.
|
||||
5. Use the `writePoint()` method to write the point to your InfluxDB database.
|
||||
Finally, use the `close()` method to flush all pending writes.
|
||||
The example logs the new data point followed by "WRITE FINISHED" to stdout.
|
||||
|
||||
|
|
|
@ -22,7 +22,7 @@ prepend:
|
|||
[**Compare tools you can use**](/influxdb/clustered/get-started/#tools-to-use) to interact with {{% product-name %}}.
|
||||
---
|
||||
|
||||
Kotlin is an open-source programming language that runs on the Java Virtual Machine (JVM).
|
||||
Kotlin is an open source programming language that runs on the Java Virtual Machine (JVM).
|
||||
|
||||
The documentation for this client library is available on GitHub.
|
||||
|
||||
|
|
|
@ -77,7 +77,7 @@ InfluxData typically recommends batch sizes of 5,000-10,000 points.
|
|||
In some use cases, performance may improve with significantly smaller or larger batches.
|
||||
|
||||
Related entries:
|
||||
[line protocol](#line-protocol),
|
||||
[line protocol](#line-protocol-lp),
|
||||
[point](#point)
|
||||
|
||||
### batch size
|
||||
|
@ -133,9 +133,12 @@ to the workload of a single customer.
|
|||
|
||||
### collect
|
||||
|
||||
Collect and write time series data to InfluxDB using line protocol, Telegraf,
|
||||
the InfluxDB v1 and v2 HTTP APIs, v1 and v2 `influx` command line interface (CLI),
|
||||
and InfluxDB client libraries.
|
||||
Collect and write time series data to InfluxDB using line protocol and any of the following tools:
|
||||
|
||||
- Telegraf
|
||||
- the InfluxDB v1 or v2 HTTP APIs
|
||||
- v1 or v2 `influx` command line interface (CLI)
|
||||
- InfluxDB client libraries
|
||||
|
||||
### collection interval
|
||||
|
||||
|
@ -149,7 +152,7 @@ Related entries:
|
|||
|
||||
Collection jitter prevents every input plugin from collecting metrics simultaneously,
|
||||
which can have a measurable effect on the system.
|
||||
For each collection interval, every Telegraf input plugin will sleep for a random
|
||||
For each collection interval, every Telegraf input plugin sleeps for a random
|
||||
time between zero and the collection jitter before collecting the metrics.
|
||||
|
||||
Related entries:
|
||||
|
@ -260,7 +263,7 @@ Aggregating high resolution data into lower resolution data to preserve disk spa
|
|||
|
||||
### duration
|
||||
|
||||
A data type that represents a duration of time (1s, 1m, 1h, 1d).
|
||||
A data type that represents a duration of time--for example, `1s`, `1m`, `1h`, `1d`.
|
||||
Retention periods are set using durations.
|
||||
|
||||
Related entries:
|
||||
|
@ -337,9 +340,6 @@ Related entries:
|
|||
|
||||
A file block is a fixed-length chunk of data read into memory when requested by an application.
|
||||
|
||||
Related entries:
|
||||
[block](#block)
|
||||
|
||||
### float
|
||||
|
||||
A real number written with a decimal point dividing the integer and fractional parts (`1.0`, `3.14`, `-20.1`).
|
||||
|
@ -359,7 +359,7 @@ Related entries:
|
|||
|
||||
Flush jitter prevents every Telegraf output plugin from sending writes
|
||||
simultaneously, which can overwhelm some data sinks.
|
||||
Each flush interval, every Telegraf output plugin will sleep for a random time
|
||||
Each flush interval, every Telegraf output plugin sleeps for a random time
|
||||
between zero and the flush jitter before emitting metrics.
|
||||
Flush jitter smooths out write spikes when running a large number of Telegraf instances.
|
||||
|
||||
|
@ -408,7 +408,6 @@ Related entries:
|
|||
[measurement](#measurement),
|
||||
[tag key](#tag-key),
|
||||
|
||||
|
||||
### influx
|
||||
|
||||
`influx` is a command line interface (CLI) that interacts with the InfluxDB v1.x and v2.x server.
|
||||
|
@ -426,7 +425,7 @@ and other required processes.
|
|||
|
||||
### InfluxDB
|
||||
|
||||
An open-source time series database (TSDB) developed by InfluxData.
|
||||
An open source time series database (TSDB) developed by InfluxData.
|
||||
Written in Go and optimized for fast, high-availability storage and retrieval of
|
||||
time series data in fields such as operations monitoring, application metrics,
|
||||
Internet of Things sensor data, and real-time analytics.
|
||||
|
@ -463,10 +462,10 @@ Related entries:
|
|||
|
||||
### IOx
|
||||
|
||||
The IOx (InfluxDB v3) storage engine is real-time, columnar database optimized for time series
|
||||
The IOx (InfluxDB v3) storage engine is a real-time, columnar database optimized for time series
|
||||
data built in Rust on top of [Apache Arrow](https://arrow.apache.org/) and
|
||||
[DataFusion](https://arrow.apache.org/datafusion/user-guide/introduction.html).
|
||||
IOx replaces the [TSM](#tsm) storage engine.
|
||||
IOx replaces the [TSM (Time Structured Merge tree)](#tsm-time-structured-merge-tree) storage engine.
|
||||
|
||||
## J
|
||||
|
||||
|
@ -496,11 +495,13 @@ and array data types.
|
|||
### keyword
|
||||
|
||||
A keyword is reserved by a program because it has special meaning.
|
||||
Every programming language has a set of keywords (reserved names) that cannot be used as an identifier.
|
||||
Every programming language has a set of keywords (reserved names) that cannot be used as identifiers--for example,
|
||||
you can't use `SELECT` (an SQL keyword) as a variable name in an SQL query.
|
||||
|
||||
See a list of [SQL keywords](/influxdb/clustered/reference/sql/#keywords).
|
||||
See keyword lists:
|
||||
|
||||
<!-- TODO: Add a link to InfluxQL keywords -->
|
||||
- [SQL keywords](/influxdb/clustered/reference/sql/#keywords)
|
||||
- [InfluxQL keywords](/influxdb/clustered/reference/influxql/#keywords)
|
||||
|
||||
## L
|
||||
|
||||
|
@ -570,7 +571,6 @@ Related entries:
|
|||
[cluster](#cluster),
|
||||
[server](#server)
|
||||
|
||||
|
||||
### now
|
||||
|
||||
The local server's nanosecond timestamp.
|
||||
|
@ -611,7 +611,7 @@ Owners have read/write permissions.
|
|||
Users can have owner roles for databases and other resources.
|
||||
|
||||
Role permissions are separate from API token permissions. For additional
|
||||
information on API tokens, see [token](#tokens).
|
||||
information on API tokens, see [token](#token).
|
||||
|
||||
### output plugin
|
||||
|
||||
|
@ -718,6 +718,15 @@ An InfluxDB query returns time series data.
|
|||
|
||||
See [Query data in InfluxDB](/influxdb/clustered/query-data/).
|
||||
|
||||
### query plan
|
||||
|
||||
A sequence of steps (_nodes_) that the InfluxDB Querier devises and executes to calculate the result of the query in the least amount of time.
|
||||
A _logical plan_ is a high-level representation of a query and doesn't consider cluster configuration or data organization.
|
||||
A _physical plan_ represents the query execution plan and data flow through plan nodes that read (_scan_), deduplicate, merge, filter, and sort data.
|
||||
A physical plan is optimized for the cluster configuration and data organization.
|
||||
|
||||
See [Query plans](/influxdb/clustered/reference/internals/query-plans/).
|
||||
|
||||
## R
|
||||
|
||||
### REPL
|
||||
|
@ -744,7 +753,6 @@ The minimum retention period is **one hour**.
|
|||
|
||||
Related entries:
|
||||
[bucket](#bucket),
|
||||
[shard group duration](#shard-group-duration)
|
||||
|
||||
### retention policy (RP)
|
||||
|
||||
|
@ -785,6 +793,18 @@ Related entries:
|
|||
[timestamp](#timestamp),
|
||||
[unix timestamp](#unix-timestamp)
|
||||
|
||||
### row
|
||||
|
||||
A row in a [table](#table) represents a specific record or instance of data.
|
||||
[Column](#column) values in a row represent specific attributes or properties of the instance.
|
||||
Each row has a [primary key](#primary-key) that makes the row unique from other rows in the table.
|
||||
|
||||
Related entries:
|
||||
[column](#column),
|
||||
[primary key](#primary-key),
|
||||
[series](#series),
|
||||
[table](#table)
|
||||
|
||||
## S
|
||||
|
||||
### schema
|
||||
|
@ -804,7 +824,7 @@ Related entries:
|
|||
### secret
|
||||
|
||||
Secrets are key-value pairs that contain information you want to control access
|
||||
o, such as API keys, passwords, or certificates.
|
||||
to, such as API keys, passwords, or certificates.
|
||||
|
||||
### selector
|
||||
|
||||
|
@ -831,7 +851,7 @@ Related entries:
|
|||
|
||||
The number of unique measurement, tag set, and field key combinations in an InfluxDB database.
|
||||
|
||||
For example, assume that an InfluxDB bucket has one measurement.
|
||||
For example, assume that an InfluxDB database has one measurement.
|
||||
The single measurement has two tag keys: `email` and `status`.
|
||||
If there are three different `email`s, and each email address is associated with two
|
||||
different `status`es, the series cardinality for the measurement is 6
|
||||
|
@ -874,7 +894,7 @@ A series key identifies a particular series by measurement, tag set, and field k
|
|||
|
||||
For example:
|
||||
|
||||
```
|
||||
```text
|
||||
# measurement, tag set, field key
|
||||
h2o_level, location=santa_monica, h2o_feet
|
||||
```
|
||||
|
@ -941,7 +961,6 @@ Related entries:
|
|||
The key of a tag key-value pair.
|
||||
Tag keys are strings and store metadata.
|
||||
|
||||
|
||||
Related entries:
|
||||
[field key](#field-key),
|
||||
[tag](#tag),
|
||||
|
@ -1017,6 +1036,14 @@ There are different types of API tokens:
|
|||
Related entries:
|
||||
[Manage token](/influxdb/clustered/admin/tokens/)
|
||||
|
||||
### transformation
|
||||
|
||||
Data transformation refers to the process of converting or modifying input data from one format, value, or structure to another.
|
||||
|
||||
InfluxQL [transformation functions](/influxdb/clustered/reference/influxql/functions/transformations/) modify and return values in each row of queried data, but do not return an aggregated value across those rows.
|
||||
|
||||
Related entries: [aggregate](#aggregate), [function](#function), [selector](#selector)
|
||||
|
||||
### TSM (Time Structured Merge tree)
|
||||
|
||||
The InfluxDB v1 and v2 data storage format that allows greater compaction and
|
||||
|
@ -1077,7 +1104,7 @@ InfluxDB users are granted permission to access to InfluxDB.
|
|||
|
||||
### values per second
|
||||
|
||||
The preferred measurement of the rate at which data are persisted to InfluxDB.
|
||||
The preferred measurement of the rate at which data is persisted to InfluxDB.
|
||||
Write speeds are generally quoted in values per second.
|
||||
|
||||
To calculate the values per second rate, multiply the number of points written
|
||||
|
|
|
@ -0,0 +1,392 @@
|
|||
---
|
||||
title: Query plans
|
||||
description: >
|
||||
A query plan is a sequence of steps that the InfluxDB Querier devises and executes to calculate the result of a query in the least amount of time.
|
||||
InfluxDB query plans include DataFusion and InfluxDB logical plan and execution plan nodes for scanning, deduplicating, filtering, merging, and sorting data.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_clustered:
|
||||
name: Query plans
|
||||
parent: InfluxDB internals
|
||||
influxdb/clustered/tags: [query, sql, influxql]
|
||||
related:
|
||||
- /influxdb/clustered/query-data/sql/
|
||||
- /influxdb/clustered/query-data/influxql/
|
||||
- /influxdb/clustered/query-data/execute-queries/analyze-query-plan/
|
||||
- /influxdb/clustered/query-data/execute-queries/troubleshoot/
|
||||
- /influxdb/clustered/reference/internals/storage-engine/
|
||||
---
|
||||
|
||||
A query plan is a sequence of steps that the InfluxDB v3 [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) devises and executes to calculate the result of a query.
|
||||
The Querier uses DataFusion and Arrow to build and execute query plans
|
||||
that call DataFusion and InfluxDB-specific operators to read data from the [Object store](/influxdb/clustered/reference/internals/storage-engine/#object-store) and the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester), and to apply query transformations--such as deduplicating, filtering, aggregating, merging, projecting, and sorting--to calculate the final result.
|
||||
|
||||
Like many other databases, the [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) contains a Query Optimizer.
|
||||
After it parses an incoming query, the [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) builds a _logical plan_--a sequence of high-level steps, such as scanning, filtering, and sorting, that the query requires.
|
||||
Following the logical plan, the [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) then builds the optimal _physical plan_ to calculate the correct result in the least amount of time.
|
||||
The plan takes advantage of data partitioning by the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester) to parallelize plan operations and prune unnecessary data before executing the plan.
|
||||
The [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) also applies common techniques of predicate and projection pushdown to further prune data as early as possible.
|
||||
|
||||
- [Display syntax](#display-syntax)
|
||||
- [Example logical and physical plan](#example-logical-and-physical-plan)
|
||||
- [Data flow](#data-flow)
|
||||
- [Logical plan](#logical-plan)
|
||||
- [`LogicalPlan` nodes](#logicalplan-nodes)
|
||||
- [`TableScan`](#tablescan)
|
||||
- [`Projection`](#projection)
|
||||
- [`Filter`](#filter)
|
||||
- [`Sort`](#sort)
|
||||
- [Physical plan](#physical-plan)
|
||||
- [`ExecutionPlan` nodes](#executionplan-nodes)
|
||||
- [`DeduplicateExec`](#deduplicateexec)
|
||||
- [`EmptyExec`](#emptyexec)
|
||||
- [`FilterExec`](#filterexec)
|
||||
- [`ParquetExec`](#parquetexec)
|
||||
- [`ProjectionExec`](#projectionexec)
|
||||
- [`RecordBatchesExec`](#recordbatchesexec)
|
||||
- [`SortExec`](#sortexec)
|
||||
- [`SortPreservingMergeExec`](#sortpreservingmergeexec)
|
||||
- [Overlapping data and deduplication](#overlapping-data-and-deduplication)
|
||||
- [Example of overlapping data](#example-of-overlapping-data)
|
||||
- [DataFusion query plans](#datafusion-query-plans)
|
||||
|
||||
## Display syntax
|
||||
|
||||
[Logical](#logical-plan) and [physical query plans](#physical-plan) are represented (for example, in an `EXPLAIN` report) in _tree syntax_.
|
||||
|
||||
- Each plan is represented as an upside-down tree composed of _nodes_.
|
||||
- A parent node awaits the output of its child nodes.
|
||||
- Data flows up from the bottom innermost nodes of the tree to the outermost _root node_ at the top.
|
||||
|
||||
### Example logical and physical plan
|
||||
|
||||
The following query generates an `EXPLAIN` report that includes a logical and a physical plan:
|
||||
|
||||
```sql
|
||||
EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;
|
||||
```
|
||||
|
||||
The output is the following:
|
||||
|
||||
#### Figure 1. EXPLAIN report
|
||||
|
||||
```sql
|
||||
| plan_type | plan |
|
||||
+---------------+--------------------------------------------------------------------------+
|
||||
| logical_plan | Sort: h2o.city ASC NULLS LAST, h2o.time DESC NULLS FIRST |
|
||||
| | TableScan: h2o projection=[city, min_temp, time] |
|
||||
| physical_plan | SortPreservingMergeExec: [city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | UnionExec |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | SortExec: expr=[city@0 ASC NULLS LAST,time@2 DESC] |
|
||||
| | ParquetExec: file_groups={...}, projection=[city, min_temp, time] |
|
||||
| | |
|
||||
```
|
||||
|
||||
{{% caption %}}
|
||||
Output from `EXPLAIN SELECT city, min_temp, time FROM h2o ORDER BY city ASC, time DESC;`
|
||||
{{% /caption %}}
|
||||
|
||||
The leaf nodes in the [Figure 1](#figure-1-explain-report) physical plan are parallel `ParquetExec` nodes:
|
||||
|
||||
```text
|
||||
ParquetExec: file_groups={...}, projection=[city, min_temp, time]
|
||||
...
|
||||
ParquetExec: file_groups={...}, projection=[city, min_temp, time]
|
||||
```
|
||||
|
||||
## Data flow
|
||||
|
||||
A [physical plan](#physical-plan) node represents a specific implementation of `ExecutionPlan` that receives an input stream, applies expressions for filtering and sorting, and then yields an output stream to its parent node.
|
||||
|
||||
The following diagram shows the data flow and sequence of `ExecutionPlan` nodes in the [Figure 1](#figure-1-explain-report) physical plan:
|
||||
|
||||
<!-- BEGIN Query plan diagram -->
|
||||
{{< html-diagram/query-plan >}}
|
||||
<!-- END Query plan diagram -->
|
||||
|
||||
{{% product-name %}} query plans include the following plan types and nodes:
|
||||
|
||||
## Logical plan
|
||||
|
||||
A logical plan for a query:
|
||||
|
||||
- is a high-level plan that expresses the "intent" of a query and the steps required for calculating the result.
|
||||
- requires information about the data schema
|
||||
- is independent of the [physical execution](#physical-plan), cluster configuration, data source (Ingester or Object store), or how data is organized or partitioned
|
||||
- is displayed as a tree of [DataFusion `LogicalPlan` nodes](#logicalplan-nodes)
|
||||
|
||||
## `LogicalPlan` nodes
|
||||
|
||||
Each node in an {{% product-name %}} logical plan tree represents a [`LogicalPlan` implementation](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html#variants) that receives criteria extracted from the query and applies relational operators and optimizations for transforming input data to an output table.
|
||||
|
||||
The following are some `LogicalPlan` nodes used in InfluxDB logical plans.
|
||||
|
||||
### `TableScan`
|
||||
|
||||
[`TableScan`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.TableScan.html) retrieves rows from a table provider by reference or from the context.
|
||||
|
||||
### `Projection`
|
||||
|
||||
[`Projection`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Projection.html) evaluates an arbitrary list of expressions on the input; equivalent to an SQL `SELECT` statement with an expression list.
|
||||
|
||||
### `Filter`
|
||||
|
||||
[`Filter`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Filter.html) filters rows from the input that do not satisfy the specified expression; equivalent to an SQL `WHERE` clause with a predicate expression.
|
||||
|
||||
### `Sort`
|
||||
|
||||
[`Sort`](https://docs.rs/datafusion/latest/datafusion/logical_expr/struct.Sort.html) sorts the input according to a list of sort expressions; used to implement SQL `ORDER BY`.
|
||||
|
||||
For details and a list of `LogicalPlan` implementations, see [`Enum datafusion::logical_expr::LogicalPlan` Variants](https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html#variants) in the DataFusion documentation.
|
||||
|
||||
## Physical plan
|
||||
|
||||
A physical plan, or _execution plan_, for a query:
|
||||
|
||||
- is an optimized plan that derives from the [logical plan](#logical-plan) and contains the low-level steps for query execution.
|
||||
- considers the cluster configuration (such as CPU and memory allocation) and data organization (such as partitions, the number of files, and whether files overlap)--for example:
|
||||
- If you run the same query with the same data on different clusters with different configurations, each cluster may generate a different physical plan for the query.
|
||||
- If you run the same query on the same cluster at different times, the physical plan may differ each time, depending on the data at query time.
|
||||
- if generated using `ANALYZE`, includes runtime metrics sampled during query execution
|
||||
- is displayed as a tree of [`ExecutionPlan` nodes](#executionplan-nodes)
|
||||
|
||||
## `ExecutionPlan` nodes
|
||||
|
||||
Each node in an {{% product-name %}} physical plan represents a call to a specific implementation of the [DataFusion `ExecutionPlan`](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html)
|
||||
that receives input data, query criteria expressions, and an output schema.
|
||||
|
||||
The following are some `ExecutionPlan` nodes used in InfluxDB physical plans.
|
||||
|
||||
### `DeduplicateExec`
|
||||
|
||||
InfluxDB `DeduplicateExec` takes an input stream of `RecordBatch` sorted on `sort_key` and applies InfluxDB-specific deduplication logic.
|
||||
The output is dependent on the order of the input rows that have the same key.
|
||||
|
||||
### `EmptyExec`
|
||||
|
||||
DataFusion [`EmptyExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/empty/struct.EmptyExec.html) is an execution plan for an empty relation and indicates that the table doesn't contain data for the time range of the query.
|
||||
|
||||
### `FilterExec`
|
||||
|
||||
The execution plan for the [`Filter`](#filter) `LogicalPlan`.
|
||||
|
||||
DataFusion [`FilterExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/filter/struct.FilterExec.html) evaluates a boolean predicate against all input batches to determine which rows to include in the output batches.
|
||||
|
||||
### `ParquetExec`
|
||||
|
||||
DataFusion [`ParquetExec`](https://docs.rs/datafusion/latest/datafusion/datasource/physical_plan/parquet/struct.ParquetExec.html) scans one or more Parquet partitions.
|
||||
|
||||
#### `ParquetExec` expressions
|
||||
|
||||
##### `file_groups`
|
||||
|
||||
A _file group_ is a list of files to scan.
|
||||
Files are referenced by path:
|
||||
|
||||
- `1/1/b862a7e9b.../243db601-....parquet`
|
||||
- `1/1/b862a7e9b.../f5fb7c7d-....parquet`
|
||||
|
||||
In InfluxDB v3, the path structure represents how data is organized.
|
||||
|
||||
A path has the following structure:
|
||||
|
||||
```text
|
||||
<namespace_id>/<table_id>/<partition_hash_id>/<uuid_of_the_file>.parquet
|
||||
1 / 1 /b862a7e9b329ee6a4.../243db601-f3f1-4....parquet
|
||||
```
|
||||
|
||||
- `namespace_id`: the namespace (database) being queried
|
||||
- `table_id`: the table (measurement) being queried
|
||||
- `partition_hash_id`: the partition this file belongs to.
|
||||
You can count partition IDs to find how many partitions the query reads.
|
||||
- `uuid_of_the_file`: the file identifier.
|
||||
|
||||
`ParquetExec` processes groups in parallel and reads the files in each group sequentially.
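For example, the two files listed earlier, split into two file groups, might appear in a plan as follows--a simplified sketch that keeps the truncated paths as shown:

```text
file_groups={2 groups: [[1/1/b862a7e9b.../243db601-....parquet], [1/1/b862a7e9b.../f5fb7c7d-....parquet]]}
```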
|
||||
|
||||
##### `projection`
|
||||
|
||||
`projection` lists the table columns that the query plan needs to read to execute the query.
|
||||
The parameter name `projection` refers to _projection pushdown_, the action of filtering columns.
|
||||
|
||||
Consider the following sample data that contains many columns:
|
||||
|
||||
```text
|
||||
h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600
|
||||
```
|
||||
|
||||
| table | state | city | min_temp | max_temp | area | time |
|
||||
|:-----:|:-----:|:----:|:--------:|:--------:|:----:|:----:|
|
||||
| h2o | CA | SF | 68.4 | 85.7 | 500u | 600 |
|
||||
|
||||
However, the following SQL query specifies only three columns (`city`, `state`, and `time`):
|
||||
|
||||
```sql
|
||||
SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
When processing the query, the [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) specifies the three required columns in the projection, and the projection is "pushed down" to leaf nodes--columns not specified are pruned as early as possible during query execution.
|
||||
|
||||
```text
|
||||
projection=[city, state, time]
|
||||
```
|
||||
|
||||
##### `output_ordering`
|
||||
|
||||
`output_ordering` specifies the sort order for the output.
|
||||
The Querier specifies `output_ordering` if the output should be ordered and if the [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) knows the order.
|
||||
|
||||
When storing data to Parquet files, InfluxDB sorts the data to improve storage compression and query efficiency, and the planner tries to preserve that order for as long as possible.
|
||||
Generally, the `output_ordering` value that `ParquetExec` receives is the ordering (or a subset of the ordering) of stored data.
|
||||
|
||||
_By design, [`RecordBatchesExec`](#recordbatchesexec) data isn't sorted._
|
||||
|
||||
In the following example, the query planner specifies the output sort order `state ASC, city ASC, time ASC`:
|
||||
|
||||
```text
|
||||
output_ordering=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC]
|
||||
```
|
||||
|
||||
##### `predicate`
|
||||
|
||||
`predicate` is the data filter specified in the query and used for row filtering when scanning Parquet files.
|
||||
|
||||
For example, given the following SQL query:
|
||||
|
||||
```sql
|
||||
SELECT city, count(1)
|
||||
FROM h2o
|
||||
WHERE time >= to_timestamp(200) AND time < to_timestamp(700)
|
||||
AND state = 'MA'
|
||||
GROUP BY city
|
||||
ORDER BY city ASC;
|
||||
```
|
||||
|
||||
The `predicate` value is the boolean expression in the `WHERE` statement:
|
||||
|
||||
```text
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA
|
||||
```
|
||||
|
||||
##### `pruning_predicate`
|
||||
|
||||
`pruning_predicate` is created from the [`predicate`](#predicate) value and is used for pruning data and files from the chosen partitions.
|
||||
|
||||
For example, given the following `predicate` parsed from the SQL:
|
||||
|
||||
```text
|
||||
predicate=time@5 >= 200 AND time@5 < 700 AND state@4 = MA
|
||||
```
|
||||
|
||||
The Querier creates the following `pruning_predicate`:
|
||||
|
||||
```text
|
||||
pruning_predicate=time_max@0 >= 200 AND time_min@1 < 700 AND state_min@2 <= MA AND MA <= state_max@3
|
||||
```
|
||||
|
||||
By default, the pruning predicate filters files by `time`.
|
||||
|
||||
_Before the physical plan is generated, an additional `partition pruning` step uses predicates on partitioning columns to prune partitions._
|
||||
|
||||
### `ProjectionExec`
|
||||
|
||||
DataFusion [`ProjectionExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/projection/struct.ProjectionExec.html) evaluates an arbitrary list of expressions on the input; it is the execution plan for the [`Projection`](#projection) `LogicalPlan`.
|
||||
|
||||
### `RecordBatchesExec`
|
||||
|
||||
The InfluxDB `RecordBatchesExec` implementation retrieves and scans recently written, yet-to-be-persisted data from the InfluxDB v3 [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester).
|
||||
|
||||
When generating the plan, the [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) sends the query criteria, such as database, table, and columns, to the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester) to retrieve data not yet persisted to Parquet files.
|
||||
If the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester) has data that meets the criteria (the chunk size is non-zero), then the plan includes `RecordBatchesExec`.
|
||||
|
||||
#### `RecordBatchesExec` attributes
|
||||
|
||||
##### `chunks`
|
||||
|
||||
`chunks` is the number of data chunks from the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester).
|
||||
Often one (`1`), but it can be many.
|
||||
|
||||
##### `projection`
|
||||
|
||||
`projection` specifies a list of columns to read and output.
|
||||
|
||||
`__chunk_order` in a list of columns is an InfluxDB-generated column used to keep the chunks and files ordered for deduplication--for example:
|
||||
|
||||
```text
|
||||
projection=[__chunk_order, city, state, time]
|
||||
```
|
||||
|
||||
For details and other DataFusion `ExecutionPlan` implementations, see [`ExecutionPlan` implementors](https://docs.rs/datafusion/latest/datafusion/physical_plan/trait.ExecutionPlan.html) in the DataFusion documentation.
|
||||
|
||||
### `SortExec`
|
||||
|
||||
The execution plan for the [`Sort`](#sort) `LogicalPlan`.
|
||||
|
||||
DataFusion [`SortExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/sorts/sort/struct.SortExec.html) supports sorting datasets that are larger than the memory allotted by the memory manager, by spilling to disk.
|
||||
|
||||
### `SortPreservingMergeExec`
|
||||
|
||||
DataFusion [`SortPreservingMergeExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/sorts/sort_preserving_merge/struct.SortPreservingMergeExec.html) takes an input execution plan and a list of sort expressions and, provided each partition of the input plan is sorted with respect to these sort expressions, yields a single partition sorted with respect to them.
|
||||
|
||||
### `UnionExec`
|
||||
|
||||
DataFusion [`UnionExec`](https://docs.rs/datafusion/latest/datafusion/physical_plan/union/struct.UnionExec.html) is the `UNION ALL` execution plan for combining multiple inputs that have the same schema.
|
||||
`UnionExec` concatenates the partitions and does not mix or copy data within or across partitions.
|
||||
|
||||
## Overlapping data and deduplication
|
||||
|
||||
_Overlapping data_ refers to files or batches in which the time ranges (represented by timestamps) intersect.
|
||||
Two _chunks_ of data overlap if both chunks contain data for the same portion of time.
|
||||
|
||||
### Example of overlapping data
|
||||
|
||||
For example, the following chunks represent line protocol written to InfluxDB:
|
||||
|
||||
```text
|
||||
// Chunk 4: stored parquet file
|
||||
// - time range: 400-600
|
||||
// - no duplicates in its own chunk
|
||||
// - overlaps chunk 3
|
||||
[
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 600",
|
||||
"h2o,state=CA,city=SJ min_temp=69.5,max_temp=89.2 600", // duplicates row 3 in chunk 5
|
||||
"h2o,state=MA,city=Bedford max_temp=80.75,area=742u 400", // overlaps chunk 3
|
||||
"h2o,state=MA,city=Boston min_temp=65.40,max_temp=82.67 400", // overlaps chunk 3
|
||||
],
|
||||
|
||||
// Chunk 5: Ingester data
|
||||
// - time range: 550-700
|
||||
// - overlaps & duplicates data in chunk 4
|
||||
[
|
||||
"h2o,state=MA,city=Bedford max_temp=88.75,area=742u 600", // overlaps chunk 4
|
||||
"h2o,state=CA,city=SF min_temp=68.4,max_temp=85.7,area=500u 650",
|
||||
"h2o,state=CA,city=SJ min_temp=68.5,max_temp=90.0 600", // duplicates row 2 in chunk 4
|
||||
"h2o,state=CA,city=SJ min_temp=75.5,max_temp=84.08 700",
|
||||
"h2o,state=MA,city=Boston min_temp=67.4 550", // overlaps chunk 4
|
||||
]
|
||||
```
|
||||
|
||||
- `Chunk 4` spans the time range `400-600` and represents data persisted to a Parquet file in the [Object store](/influxdb/clustered/reference/internals/storage-engine/#object-store).
|
||||
- `Chunk 5` spans the time range `550-700` and represents yet-to-be persisted data from the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester).
|
||||
- The chunks overlap the range `550-600`.
|
||||
|
||||
If data overlaps at query time, the [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) must include the _deduplication_ process in the query plan, which uses the same multi-column sort-merge operators used by the [Ingester](/influxdb/clustered/reference/internals/storage-engine/#ingester).
|
||||
Compared to an ingestion plan that uses sort-merge operators, a query plan is more complex and ensures that data streams through the plan after deduplication.
|
||||
|
||||
Because sort-merge operations used in deduplication have a non-trivial execution cost, InfluxDB v3 tries to avoid the need for deduplication.
|
||||
Due to how InfluxDB organizes data, a Parquet file never contains duplicates of the data it stores; only overlapped data can contain duplicates.
|
||||
During compaction, the [Compactor](/influxdb/clustered/reference/internals/storage-engine/#compactor) sorts stored data to reduce overlaps and optimize query performance.
|
||||
For data that doesn't have overlaps, the [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) doesn't need to include the deduplication process and the query plan can further distribute non-overlapping data for parallel processing.
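For example, a physical plan that deduplicates the overlapping chunks 4 and 5 from the previous example might contain a node structure like the following--a simplified sketch in which file details and column indexes are illustrative:

```text
DeduplicateExec: [state@2 ASC, city@1 ASC, time@3 ASC]
  SortPreservingMergeExec: [state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC]
    UnionExec
      SortExec: expr=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC]
        RecordBatchesExec: chunks=1, projection=[__chunk_order, city, state, time]
      ParquetExec: file_groups={...}, projection=[__chunk_order, city, state, time], output_ordering=[state@2 ASC, city@1 ASC, time@3 ASC, __chunk_order@0 ASC]
```

In this sketch, the `DeduplicateExec` node sits above the operators that sort and merge the overlapping chunks, so duplicate rows are removed before data continues up the plan.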
|
||||
|
||||
## DataFusion query plans
|
||||
|
||||
For more information about DataFusion query plans and the DataFusion API used in InfluxDB v3, see the following:
|
||||
|
||||
- [Query Planning and Execution Overview](https://docs.rs/datafusion/latest/datafusion/index.html#query-planning-and-execution-overview) in the DataFusion documentation.
|
||||
- [Plan representations](https://docs.rs/datafusion/latest/datafusion/#plan-representations) in the DataFusion documentation.
|
|
@ -278,6 +278,7 @@ Use the SQL `SELECT` statement to query data from a specific measurement or meas
|
|||
```sql
|
||||
SELECT * FROM "h2o_feet"
|
||||
```
|
||||
|
||||
### WHERE clause
|
||||
|
||||
Use the `WHERE` clause to filter results based on `fields`, `tags`, and `timestamps`.
|
||||
|
@ -290,6 +291,7 @@ Rows that evaluate as `FALSE` are omitted from the result set.
|
|||
```sql
|
||||
SELECT * FROM "h2o_feet" WHERE "water_level" <= 9
|
||||
```
|
||||
|
||||
```sql
|
||||
SELECT
|
||||
*
|
||||
|
@ -489,7 +491,7 @@ FROM
|
|||
time,
|
||||
"water_level",
|
||||
row_number() OVER (
|
||||
order by
|
||||
ORDER BY
|
||||
water_level desc
|
||||
) as rn
|
||||
FROM
|
||||
|
@ -559,7 +561,8 @@ GROUP BY "location"
|
|||
|
||||
### Selector functions
|
||||
|
||||
Selector functions are unique to InfluxDB. They behave like aggregate functions in that they take a row of data and compute it down to a single value. However, selectors are unique in that they return a **time value** in addition to the computed value. In short, selectors return an aggregated value along with a timestamp.
|
||||
Selector functions are unique to InfluxDB. They behave like aggregate functions in that they take a row of data and compute it down to a single value.
|
||||
However, selectors are unique in that they return a **time value** in addition to the computed value. In short, selectors return an aggregated value along with a timestamp.
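For example, the following query is a minimal sketch that uses the `selector_max()` selector function to return the maximum `water_level` value along with its timestamp; in this sketch, the returned struct's `time` and `value` properties are accessed with bracket notation:

```sql
SELECT
  selector_max("water_level", time)['time'] AS time,
  selector_max("water_level", time)['value'] AS water_level
FROM "h2o_feet"
```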
|
||||
|
||||
| Function | Description |
|
||||
| :--------------- | :-------------------------------------------------------------- |
|
||||
|
@ -618,7 +621,6 @@ GROUP BY time
|
|||
| APPROX_PERCENTILE_CONT | Returns the approximate percentile of input values. |
|
||||
| APPROX_PERCENTILE_CONT_WITH_WEIGHT | Returns the approximate percentile of input values with weight. |
|
||||
|
||||
|
||||
### Math functions
|
||||
|
||||
| Function | Description |
|
||||
|
@ -650,7 +652,6 @@ GROUP BY time
|
|||
| COALESCE | Returns the first argument that is not null. If all arguments are null, then `COALESCE` will return nulls. |
|
||||
| NULLIF | Returns a null value if value1 equals value2, otherwise returns value1. |
|
||||
|
||||
|
||||
### Regular expression functions
|
||||
|
||||
| Function | Description |
|
||||
|
|
|
@ -1,31 +1,41 @@
|
|||
---
|
||||
title: EXPLAIN command
|
||||
description: >
|
||||
The `EXPLAIN` command shows the logical and physical execution plan for the
|
||||
specified SQL statement.
|
||||
The `EXPLAIN` command returns the logical and physical execution plans for the specified SQL statement.
|
||||
menu:
|
||||
influxdb_clustered:
|
||||
name: EXPLAIN command
|
||||
parent: SQL reference
|
||||
weight: 207
|
||||
related:
|
||||
- /influxdb/clustered/reference/internals/query-plan/
|
||||
- /influxdb/clustered/query-data/execute-queries/analyze-query-plan/
|
||||
- /influxdb/clustered/query-data/execute-queries/troubleshoot/
|
||||
---
|
||||
|
||||
The `EXPLAIN` command returns the logical and physical execution plan for the
|
||||
The `EXPLAIN` command returns the [logical plan](/influxdb/clustered/reference/internals/query-plan/#logical-plan) and the [physical plan](/influxdb/clustered/reference/internals/query-plan/#physical-plan) for the
|
||||
specified SQL statement.
|
||||
|
||||
```sql
|
||||
EXPLAIN [ANALYZE] [VERBOSE] statement
|
||||
```
|
||||
|
||||
- [EXPLAIN](#explain)
|
||||
- [EXPLAIN ANALYZE](#explain-analyze)
|
||||
- [`EXPLAIN`](#explain)
|
||||
- [Example `EXPLAIN`](#example-explain)
|
||||
- [`EXPLAIN ANALYZE`](#explain-analyze)
|
||||
- [Example `EXPLAIN ANALYZE`](#example-explain-analyze)
|
||||
- [`EXPLAIN ANALYZE VERBOSE`](#explain-analyze-verbose)
|
||||
- [Example `EXPLAIN ANALYZE VERBOSE`](#example-explain-analyze-verbose)
|
||||
|
||||
## EXPLAIN
|
||||
## `EXPLAIN`
|
||||
|
||||
Returns the execution plan of a statement.
|
||||
Returns the logical plan and physical (execution) plan of a statement.
|
||||
To output more details, use `EXPLAIN VERBOSE`.
|
||||
|
||||
##### Example EXPLAIN ANALYZE
|
||||
`EXPLAIN` doesn't execute the statement.
|
||||
To execute the statement and view runtime metrics, use [`EXPLAIN ANALYZE`](#explain-analyze).
|
||||
|
||||
### Example `EXPLAIN`
|
||||
|
||||
```sql
|
||||
EXPLAIN
|
||||
|
@ -39,20 +49,30 @@ GROUP BY room
|
|||
{{< expand-wrapper >}}
|
||||
{{% expand "View `EXPLAIN` example output" %}}
|
||||
|
||||
| plan_type | plan |
|
||||
| :------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| logical_plan | Projection: home.room, AVG(home.temp) AS temp Aggregate: groupBy=[[home.room]], aggr=[[AVG(home.temp)]] TableScan: home projection=[room, temp] |
|
||||
| physical_plan | ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp] AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)] CoalesceBatchesExec: target_batch_size=8192 RepartitionExec: partitioning=Hash([Column { name: "room", index: 0 }], 4), input_partitions=4 RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)] ParquetExec: limit=None, partitions={1 group: [[136/316/1120/1ede0031-e86e-06e5-12ba-b8e6fd76a202.parquet]]}, projection=[room, temp] |
|
||||
| | plan_type | plan |
|
||||
|---:|:--------------|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| 0 | logical_plan |<span style="white-space:pre-wrap;"> Projection: home.room, AVG(home.temp) AS temp </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> Aggregate: groupBy=[[home.room]], aggr=[[AVG(home.temp)]] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> TableScan: home projection=[room, temp] </span>|
|
||||
| 1 | physical_plan |<span style="white-space:pre-wrap;"> ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> CoalesceBatchesExec: target_batch_size=8192 </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> RepartitionExec: partitioning=Hash([room@0], 8), input_partitions=8 </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> ParquetExec: file_groups={8 groups: [[70434/116281/404d73cea0236530ea94f5470701eb814a8f0565c0e4bef5a2d2e33dfbfc3567/1be334e8-0af8-00da-2615-f67cd4be90f7.parquet, 70434/116281/b7a9e7c57fbfc3bba9427e4b3e35c89e001e2e618b0c7eb9feb4d50a3932f4db/d29370d4-262f-0d32-2459-fe7b099f682f.parquet], [70434/116281/c14418ba28a22a3abb693a1cb326a63b62dc611aec58c9bed438fdafd3bc5882/8b29ae98-761f-0550-2fe4-ee77503658e9.parquet], [70434/116281/fa677477eed622ae8123da1251aa7c351f801e2ee2f0bc28c0fe3002a30b3563/65bb4dc3-04e1-0e02-107a-90cee83c51b0.parquet], [70434/116281/db162bdd30261019960dd70da182e6ebd270284569ecfb5deffea7e65baa0df9/2505e079-67c5-06d9-3ede-89aca542dd18.parquet], [70434/116281/0c025dcccae8691f5fd70b0f131eea4ca6fafb95a02f90a3dc7bb015efd3ab4f/3f3e44c3-b71e-0ca4-3dc7-8b2f75b9ff86.parquet], ...]}, projection=[room, temp] </span>|
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## EXPLAIN ANALYZE
|
||||
## `EXPLAIN ANALYZE`
|
||||
|
||||
Returns the execution plan and metrics of a statement.
|
||||
To output more information, use `EXPLAIN ANALYZE VERBOSE`.
|
||||
Executes a statement and returns the execution plan and runtime metrics of the statement.
|
||||
The report includes the [logical plan](/influxdb/clustered/reference/internals/query-plan/#logical-plan) and the [physical plan](/influxdb/clustered/reference/internals/query-plan/#physical-plan) annotated with execution counters, number of rows produced, and runtime metrics sampled during the query execution.
|
||||
|
||||
##### Example EXPLAIN ANALYZE
|
||||
If the plan requires reading lots of data files, `EXPLAIN` and `EXPLAIN ANALYZE` may truncate the list of files in the report.
|
||||
To output more information, including intermediate plans and paths for all scanned Parquet files, use [`EXPLAIN ANALYZE VERBOSE`](#explain-analyze-verbose).
|
||||
|
||||
### Example `EXPLAIN ANALYZE`
|
||||
|
||||
```sql
|
||||
EXPLAIN ANALYZE
|
||||
|
@ -60,15 +80,44 @@ SELECT
|
|||
room,
|
||||
avg(temp) AS temp
|
||||
FROM home
|
||||
WHERE time >= '2023-01-01' AND time <= '2023-12-31'
|
||||
GROUP BY room
|
||||
```
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View `EXPLAIN ANALYZE` example output" %}}
|
||||
|
||||
| plan_type | plan |
|
||||
| :---------------- | :-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
|
||||
| Plan with Metrics | CoalescePartitionsExec, metrics=[output_rows=2, elapsed_compute=8.892µs, spill_count=0, spilled_bytes=0, mem_used=0] ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp], metrics=[output_rows=2, elapsed_compute=3.608µs, spill_count=0, spilled_bytes=0, mem_used=0] AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)], metrics=[output_rows=2, elapsed_compute=121.771µs, spill_count=0, spilled_bytes=0, mem_used=0] CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=2, elapsed_compute=23.711µs, spill_count=0, spilled_bytes=0, mem_used=0] RepartitionExec: partitioning=Hash([Column { name: "room", index: 0 }], 4), input_partitions=4, metrics=[repart_time=25.117µs, fetch_time=1.614597ms, send_time=6.705µs] RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1, metrics=[repart_time=1ns, fetch_time=319.754µs, send_time=2.067µs] AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)], metrics=[output_rows=2, elapsed_compute=75.615µs, spill_count=0, spilled_bytes=0, mem_used=0] ParquetExec: limit=None, partitions={1 group: [[136/316/1120/1ede0031-e86e-06e5-12ba-b8e6fd76a202.parquet]]}, projection=[room, temp], metrics=[output_rows=26, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, pushdown_rows_filtered=0, bytes_scanned=290, row_groups_pruned=0, num_predicate_creation_errors=0, predicate_evaluation_errors=0, page_index_rows_filtered=0, time_elapsed_opening=100.37µs, page_index_eval_time=2ns, time_elapsed_scanning_total=157.086µs, time_elapsed_processing=226.644µs, pushdown_eval_time=2ns, time_elapsed_scanning_until_data=116.875µs] |
|
||||
| | plan_type | plan |
|
||||
|---:|:------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
|
||||
| 0 | Plan with Metrics |<span style="white-space:pre-wrap;"> ProjectionExec: expr=[room@0 as room, AVG(home.temp)@1 as temp], metrics=[output_rows=2, elapsed_compute=4.768µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=FinalPartitioned, gby=[room@0 as room], aggr=[AVG(home.temp)], ordering_mode=Sorted, metrics=[output_rows=2, elapsed_compute=140.405µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=2, elapsed_compute=6.821µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> RepartitionExec: partitioning=Hash([room@0], 8), input_partitions=8, preserve_order=true, sort_exprs=room@0 ASC, metrics=[output_rows=2, elapsed_compute=18.408µs, repart_time=59.698µs, fetch_time=1.057882762s, send_time=5.83µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> AggregateExec: mode=Partial, gby=[room@0 as room], aggr=[AVG(home.temp)], ordering_mode=Sorted, metrics=[output_rows=2, elapsed_compute=137.577µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> RepartitionExec: partitioning=RoundRobinBatch(8), input_partitions=6, preserve_order=true, sort_exprs=room@0 ASC, metrics=[output_rows=46, elapsed_compute=26.637µs, repart_time=6ns, fetch_time=399.971411ms, send_time=6.658µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> ProjectionExec: expr=[room@0 as room, temp@2 as temp], metrics=[output_rows=46, elapsed_compute=3.102µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=46, elapsed_compute=25.585µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> FilterExec: time@1 >= 1672531200000000000 AND time@1 <= 1703980800000000000, metrics=[output_rows=46, elapsed_compute=26.51µs] </span>|
|
||||
| | |<span style="white-space:pre-wrap;"> ParquetExec: file_groups={6 groups: [[70434/116281/404d73cea0236530ea94f5470701eb814a8f0565c0e4bef5a2d2e33dfbfc3567/1be334e8-0af8-00da-2615-f67cd4be90f7.parquet], [70434/116281/c14418ba28a22a3abb693a1cb326a63b62dc611aec58c9bed438fdafd3bc5882/8b29ae98-761f-0550-2fe4-ee77503658e9.parquet], [70434/116281/fa677477eed622ae8123da1251aa7c351f801e2ee2f0bc28c0fe3002a30b3563/65bb4dc3-04e1-0e02-107a-90cee83c51b0.parquet], [70434/116281/db162bdd30261019960dd70da182e6ebd270284569ecfb5deffea7e65baa0df9/2505e079-67c5-06d9-3ede-89aca542dd18.parquet], [70434/116281/0c025dcccae8691f5fd70b0f131eea4ca6fafb95a02f90a3dc7bb015efd3ab4f/3f3e44c3-b71e-0ca4-3dc7-8b2f75b9ff86.parquet], ...]}, projection=[room, time, temp], output_ordering=[room@0 ASC, time@1 ASC], predicate=time@6 >= 1672531200000000000 AND time@6 <= 1703980800000000000, pruning_predicate=time_max@0 >= 1672531200000000000 AND time_min@1 <= 1703980800000000000, required_guarantees=[], metrics=[output_rows=46, elapsed_compute=6ns, predicate_evaluation_errors=0, bytes_scanned=3279, row_groups_pruned_statistics=0, file_open_errors=0, file_scan_errors=0, pushdown_rows_filtered=0, num_predicate_creation_errors=0, row_groups_pruned_bloom_filter=0, page_index_rows_filtered=0, time_elapsed_opening=398.462968ms, time_elapsed_processing=1.626106ms, time_elapsed_scanning_total=1.36822ms, page_index_eval_time=33.474µs, pushdown_eval_time=14.267µs, time_elapsed_scanning_until_data=1.27694ms] </span>|
|
||||
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
## `EXPLAIN ANALYZE VERBOSE`
|
||||
|
||||
Executes a statement and returns the execution plan, runtime metrics, and additional details helpful for debugging the statement.
|
||||
|
||||
The report includes the following:
|
||||
|
||||
- the [logical plan](/influxdb/clustered/reference/internals/query-plan/#logical-plan)
|
||||
- the [physical plan](/influxdb/clustered/reference/internals/query-plan/#physical-plan) annotated with execution counters, number of rows produced, and runtime metrics sampled during the query execution
|
||||
- Information truncated in the `EXPLAIN` report--for example, the paths for all [Parquet files retrieved for the query](/influxdb/clustered/reference/internals/query-plan/#file_groups).
|
||||
- All intermediate physical plans that DataFusion and the [Querier](/influxdb/clustered/reference/internals/storage-engine/#querier) generate before generating the final physical plan--helpful in debugging to see when an [`ExecutionPlan` node](/influxdb/clustered/reference/internals/query-plan/#executionplan-nodes) is added or removed, and how InfluxDB optimizes the query.
|
||||
|
||||
### Example `EXPLAIN ANALYZE VERBOSE`
|
||||
|
||||
```sql
|
||||
EXPLAIN ANALYZE VERBOSE SELECT temp FROM home
|
||||
WHERE time >= now() - INTERVAL '7 days' AND room = 'Kitchen'
|
||||
ORDER BY time
|
||||
```
|
||||
|
|
|
@ -30,7 +30,7 @@ by columns in a table.
|
|||
|
||||
## SHOW TABLES
|
||||
|
||||
Returns information about tables (measurements) in an InfluxDB bucket.
|
||||
Returns information about tables (measurements) in an InfluxDB database.
|
||||
|
||||
```sql
|
||||
SHOW TABLES
|
||||
|
@ -57,7 +57,7 @@ _Measurements are those that use the **`iox` table schema**._
|
|||
|
||||
## SHOW COLUMNS
|
||||
|
||||
Returns information about the schema of a table (measurement) in an InfluxDB bucket.
|
||||
Returns information about the schema of a table (measurement) in an InfluxDB database.
|
||||
|
||||
```sql
|
||||
SHOW COLUMNS FROM example_table
|
||||
|
|
|
@ -35,7 +35,6 @@ An InfluxQL query that runs automatically and periodically within a database.
|
|||
Continuous queries require a function in the `SELECT` clause and must include a `GROUP BY time()` clause.
|
||||
See [Continuous Queries](/influxdb/v1/query_language/continuous_queries/).
|
||||
|
||||
|
||||
Related entries: [function](/influxdb/v1/concepts/glossary/#function)
|
||||
|
||||
## database
|
||||
|
@ -151,7 +150,7 @@ Related entries: [field set](/influxdb/v1/concepts/glossary/#field-set), [series
|
|||
|
||||
## points per second
|
||||
|
||||
A deprecated measurement of the rate at which data are persisted to InfluxDB.
|
||||
A deprecated measurement of the rate at which data is persisted to InfluxDB.
|
||||
The schema allows and even encourages the recording of multiple metric values per point, rendering points per second ambiguous.
|
||||
|
||||
Write speeds are generally quoted in values per second, a more precise metric.
|
||||
|
@ -184,7 +183,7 @@ Related entries: [duration](/influxdb/v1/concepts/glossary/#duration), [measurem
|
|||
|
||||
## schema
|
||||
|
||||
How the data are organized in InfluxDB.
|
||||
How data is organized in InfluxDB.
|
||||
The fundamentals of the InfluxDB schema are databases, retention policies, series, measurements, tag keys, tag values, and field keys.
|
||||
See [Schema Design](/influxdb/v1/concepts/schema_and_data_layout/) for more information.
|
||||
|
||||
|
@ -364,7 +363,7 @@ See [Authentication and Authorization](/influxdb/v1/administration/authenticatio
|
|||
|
||||
## values per second
|
||||
|
||||
The preferred measurement of the rate at which data are persisted to InfluxDB. Write speeds are generally quoted in values per second.
|
||||
The preferred measurement of the rate at which data is persisted to InfluxDB. Write speeds are generally quoted in values per second.
|
||||
|
||||
To calculate the values per second rate, multiply the number of points written per second by the number of values stored per point. For example, if the points have four fields each, and a batch of 5000 points is written 10 times per second, then the values per second rate is `4 field values per point * 5000 points per batch * 10 batches per second = 200,000 values per second`.
|
||||
|
||||
|
|
|
@ -14,7 +14,7 @@ menu:
|
|||
weight: 201
|
||||
---
|
||||
|
||||
Arduino is an open-source hardware and software platform used for building electronics projects.
|
||||
Arduino is an open source hardware and software platform used for building electronics projects.
|
||||
|
||||
The documentation for this client library is available on GitHub.
|
||||
|
||||
|
|
|
@ -13,7 +13,7 @@ menu:
|
|||
weight: 201
|
||||
---
|
||||
|
||||
Kotlin is an open-source programming language that runs on the Java Virtual Machine (JVM).
|
||||
Kotlin is an open source programming language that runs on the Java Virtual Machine (JVM).
|
||||
|
||||
The documentation for this client library is available on GitHub.
|
||||
|
||||
|
|
|
@ -106,7 +106,6 @@ A bucket is a named location where time series data is stored.
|
|||
All buckets have a [retention period](#retention-period).
|
||||
A bucket belongs to an organization.
|
||||
|
||||
|
||||
### bucket schema
|
||||
|
||||
In InfluxDB Cloud, an explicit bucket schema lets you strictly enforce the data that can be written into one or more measurements in a bucket by defining the column names, tags, fields, and data types allowed for each measurement. By default, buckets in InfluxDB {{< current-version >}} have an `implicit` schema that lets you write data without restrictions on columns, fields, or data types.
|
||||
|
@ -279,6 +278,7 @@ InfluxDB supports the following data types:
|
|||
| time | dateTime |
|
||||
|
||||
For more information about different data types, see:
|
||||
|
||||
- [annotated CSV](/influxdb/v2/reference/syntax/annotated-csv/)
|
||||
- [extended annotated CSV](/influxdb/cloud/reference/syntax/annotated-csv/extended/#datatype)
|
||||
- [line protocol](/influxdb/v2/reference/syntax/line-protocol/#data-types-and-format)
|
||||
|
@ -306,7 +306,7 @@ Aggregating high resolution data into lower resolution data to preserve disk spa
|
|||
|
||||
### duration
|
||||
|
||||
A data type that represents a duration of time (1s, 1m, 1h, 1d).
|
||||
A data type that represents a duration of time--for example, `1s`, `1m`, `1h`, `1d`.
|
||||
Retention policies are set using durations.
|
||||
Data older than the duration is automatically dropped from the database.
|
||||
|
||||
|
@ -468,10 +468,10 @@ Related entries:
|
|||
|
||||
In Flux, an implicit block is a possibly empty sequence of statements within matching braces ({ }) that includes the following types:
|
||||
|
||||
- Universe: Encompasses all Flux source text.
|
||||
- Package: Each package includes a package block that contains Flux source text for the package.
|
||||
- File: Each file has a file block containing Flux source text in the file.
|
||||
- Function: Each function literal has a function block with Flux source text (even if not explicitly declared).
|
||||
- Universe: Encompasses all Flux source text.
|
||||
- Package: Each package includes a package block that contains Flux source text for the package.
|
||||
- File: Each file has a file block containing Flux source text in the file.
|
||||
- Function: Each function literal has a function block with Flux source text (even if not explicitly declared).
|
||||
|
||||
Related entries: [explicit block](#explicit-block), [block](#block)
|
||||
|
||||
|
@ -485,7 +485,7 @@ Related entries: [explicit block](#explicit-block), [block](#block)
|
|||
|
||||
### InfluxDB
|
||||
|
||||
An open-source time series database (TSDB) developed by InfluxData.
|
||||
An open source time series database (TSDB) developed by InfluxData.
|
||||
Written in Go and optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics.
|
||||
|
||||
### InfluxDB UI
|
||||
|
@@ -712,7 +712,6 @@ Learn about the [option assignment](/flux/v0/spec/assignment-scope/#option-assig
A workspace for a group of users.
All dashboards, tasks, buckets, members, and so on, belong to an organization.

### owner

A type of role for a user.

@@ -720,7 +719,7 @@ Owners have read/write permissions.
Users can have owner roles for bucket and organization resources.

Role permissions are separate from API token permissions. For additional
-information on API tokens, see [token](#tokens).
+information on API tokens, see [token](#token).

### output plugin

@@ -794,6 +793,7 @@ A Flux predicate function is an anonymous function that returns `true` or `false`
based on one or more [predicate expressions](#predicate-expression).

###### Example predicate function

```js
(r) => r.foo == "bar" and r.baz != "quz"
```

@@ -954,6 +954,7 @@ The series cardinality would remain unchanged at `6`, as `firstname` is already
| cliff@influxdata.com | finish | clifford |

##### Query for cardinality:

- **Flux:** [influxdb.cardinality()](/flux/v0/stdlib/influxdb/cardinality/)
- **InfluxQL:** [SHOW CARDINALITY](/influxdb/v1/query_language/spec/#show-cardinality)

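A minimal Flux sketch of the first option, using a placeholder bucket name:

```js
import "influxdata/influxdb"

// Return the series cardinality of data written to the bucket over the last 30 days.
influxdb.cardinality(bucket: "example-bucket", start: -30d)
```
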
@@ -1000,7 +1001,7 @@ A shard belongs to a single [shard group](#shard-group).

For more information, see [Shards and shard groups (OSS)](/influxdb/v2/reference/internals/shards/).

-Related entries: [series](#series), [shard duration](#shard-duration),
+Related entries: [series](#series), [shard group duration](#shard-group-duration),
[shard group](#shard-group), [tsm](#tsm-time-structured-merge-tree)

### shard group

@@ -1013,7 +1014,7 @@ The interval spanned by each shard group is the [shard group duration](#shard-gr
For more information, see [Shards and shard groups (OSS)](/influxdb/v2/reference/internals/shards/).

Related entries: [bucket](#bucket), [retention period](#retention-period),
-[series](#series), [shard](#shard), [shard duration](#shard-duration)
+[series](#series), [shard](#shard), [shard group duration](#shard-group-duration)

### shard group duration

@@ -1254,7 +1255,7 @@ Users are added as a member of an organization and are given a unique API token.

### values per second

-The preferred measurement of the rate at which data are persisted to InfluxDB.
+The preferred measurement of the rate at which data is persisted to InfluxDB.
Write speeds are generally quoted in values per second.

To calculate the values per second rate, multiply the number of points written

@@ -1264,7 +1265,6 @@ written 10 times per second, the values per second rate is:

**4 field values per point** × **5000 points per batch** × **10 batches per second** = **200,000 values per second**

Related entries: [batch](#batch), [field](#field), [point](#point)

### variable

@@ -0,0 +1,34 @@
<div id="query-plan-diagram">
  <div class="plan-single-column">
    <div class="plan-column">
      <div class="plan-block">
        <code>SortPreservingMergeExec</code>
      </div>
      <div class="plan-arrow"></div>
      <div class="plan-block">
        <code>UnionExec</code>
      </div>
    </div>
  </div>
  <div class="plan-arrow split"></div>
  <div class="plan-double-column">
    <div class="plan-column">
      <div class="plan-block">
        <code>SortExec</code>
      </div>
      <div class="plan-arrow"></div>
      <div class="plan-block">
        <code>ParquetExec</code>
      </div>
    </div>
    <div class="plan-column">
      <div class="plan-block">
        <code>SortExec</code>
      </div>
      <div class="plan-arrow"></div>
      <div class="plan-block">
        <code>ParquetExec</code>
      </div>
    </div>
  </div>
</div>

@@ -5,6 +5,7 @@
"description": "InfluxDB documentation",
"license": "MIT",
"devDependencies": {
+  "@vvago/vale": "^3.0.7",
  "autoprefixer": ">=10.2.5",
  "hugo-extended": ">=0.101.0",
  "postcss": ">=8.4.31",

Binary file not shown (new image, 342 KiB).

@@ -17,18 +17,14 @@ RUN apt-get update && apt-get upgrade -y && apt-get install -y \
RUN apt-get install -y \
    python3 \
-   python3-pip \
-   && rm -rf /var/lib/apt/lists/*
+   python3-venv

RUN ln -s /usr/bin/python3 /usr/bin/python

-WORKDIR /usr/src/app
-
-ARG SOURCE_DIR
-
-COPY test ./test
-COPY data ./test/data
-
-RUN chmod -R 755 .
+# Create a virtual environment for Python to avoid conflicts with the system Python and having to use the --break-system-packages flag when installing packages with pip.
+RUN python -m venv /opt/venv
+# Enable venv
+ENV PATH="/opt/venv/bin:$PATH"

# Prevents Python from writing pyc files.
ENV PYTHONDONTWRITEBYTECODE=1

@@ -36,11 +32,6 @@ ENV PYTHONDONTWRITEBYTECODE=1
# the application crashes without emitting any logs due to buffering.
ENV PYTHONUNBUFFERED=1

-WORKDIR /usr/src/app/test
-
-COPY test/run-tests.sh /usr/local/bin/run-tests.sh
-RUN chmod +x /usr/local/bin/run-tests.sh
-
# Some Python test dependencies (pytest-dotenv and pytest-codeblocks) aren't
# available as packages in apt-cache, so use pip to download dependencies in a separate step and use Docker's caching.
# Leverage a cache mount to /root/.cache/pip to speed up subsequent builds.

@@ -48,38 +39,54 @@ RUN chmod +x /usr/local/bin/run-tests.sh
# this layer.
RUN --mount=type=cache,target=/root/.cache/pip \
    --mount=type=bind,source=test/requirements.txt,target=./requirements.txt \
-   python -m pip install --break-system-packages -r ./requirements.txt
+   pip install -Ur ./requirements.txt

# RUN --mount=type=cache,target=/root/.cache/node_modules \
#   --mount=type=bind,source=package.json,target=package.json \
#   npm install

+# Copy docs test directory to the image.
+WORKDIR /usr/src/app
+
+RUN chmod -R 755 .
+
+ARG SOURCE_DIR
+
+COPY test ./test
+COPY data ./test/data
+
+WORKDIR /usr/src/app/test
+
+COPY test/run-tests.sh /usr/local/bin/run-tests.sh
+RUN chmod +x /usr/local/bin/run-tests.sh
+
+# Install parse_yaml.sh and parse YAML config files into dotenv files to be used by tests.
+RUN /bin/bash -c 'curl -sO https://raw.githubusercontent.com/mrbaseman/parse_yaml/master/src/parse_yaml.sh'
+RUN /bin/bash -c 'source ./parse_yaml.sh && parse_yaml ./data/products.yml > .env.products'

# Install Telegraf for use in tests.
# Follow the install instructions (https://docs.influxdata.com/telegraf/v1/install/?t=curl), except for sudo (which isn't available in Docker).
-# influxdata-archive_compat.key GPG Fingerprint: 9D539D90D3328DC7D6C8D3B9D8FF8E1F7DF8B07E
-RUN curl -s https://repos.influxdata.com/influxdata-archive.key > influxdata-archive.key \
-  && \
-  echo '943666881a1b8d9b849b74caebf02d3465d6beb716510d86a39f6c8e8dac7515 influxdata-archive.key' | sha256sum -c && cat influxdata-archive.key | gpg --dearmor | tee /etc/apt/trusted.gpg.d/influxdata-archive.gpg > /dev/null \
-  && \
-  echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive.gpg] https://repos.influxdata.com/debian stable main' | tee /etc/apt/sources.list.d/influxdata.list \
-  && \
-  apt-get update && apt-get install telegraf
+# influxdata-archive_compat.key GPG fingerprint:
+# 9D53 9D90 D332 8DC7 D6C8 D3B9 D8FF 8E1F 7DF8 B07E
+RUN wget -q https://repos.influxdata.com/influxdata-archive_compat.key
+
+RUN echo '393e8779c89ac8d958f81f942f9ad7fb82a25e133faddaf92e15b16e6ac9ce4c influxdata-archive_compat.key' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
+
+RUN echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | tee /etc/apt/sources.list.d/influxdata.list
+
+RUN apt-get update && apt-get install telegraf

# Install influx v2 Cloud CLI for use in tests.
# Follow the install instructions(https://portal.influxdata.com/downloads/), except for sudo (which isn't available in Docker).
# influxdata-archive_compat.key GPG fingerprint:
# 9D53 9D90 D332 8DC7 D6C8 D3B9 D8FF 8E1F 7DF8 B07E
-RUN wget -q https://repos.influxdata.com/influxdata-archive_compat.key \
-  && \
-  echo '393e8779c89ac8d958f81f942f9ad7fb82a25e133faddaf92e15b16e6ac9ce4c influxdata-archive_compat.key' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null \
-  && \
-  echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | tee /etc/apt/sources.list.d/influxdata.list \
-  && \
-  apt-get update && apt-get install influxdb2-cli
+RUN wget -q https://repos.influxdata.com/influxdata-archive_compat.key
+
+RUN echo '393e8779c89ac8d958f81f942f9ad7fb82a25e133faddaf92e15b16e6ac9ce4c influxdata-archive_compat.key' | sha256sum -c && cat influxdata-archive_compat.key | gpg --dearmor | tee /etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg > /dev/null
+
+RUN echo 'deb [signed-by=/etc/apt/trusted.gpg.d/influxdata-archive_compat.gpg] https://repos.influxdata.com/debian stable main' | tee /etc/apt/sources.list.d/influxdata.list
+
+RUN apt-get update && apt-get install influxdb2-cli

ENV TEMP_DIR=/usr/src/app/test/tmp
ENTRYPOINT [ "run-tests.sh" ]

test.sh

@@ -52,7 +52,7 @@ docker compose up test
# If you want to examine files or run commands for debugging tests,
# start the container and use `exec` to open an interactive shell--for example:

-# docker compose start test && docker compose exec -it test /bin/bash
+# docker compose run -it --entrypoint=/bin/bash test

# To build and run a new container and debug test failures, use `docker compose run` which runs a one-off command in a new container. Pass additional flags to be used by the container's entrypoint and the test runners it executes--for example:

@@ -62,6 +62,8 @@ _Note_: `pytest --codeblocks` uses Python's `subprocess.run()` to execute shell
To assert (and display) the expected output of your code, follow the code block with the `<!--pytest-codeblocks:expected-output-->` comment tag, and then the expected output in a code block--for example:

<!-- Your Markdown content -->

```python
print("Hello, world!")
```

@@ -74,6 +76,8 @@ If successful, the output is the following:
Hello, world!
```

<!-- End Markdown content -->

pytest-codeblocks has features for skipping tests and marking blocks as failed.
To learn more, see the pytest-codeblocks README and tests.

@@ -5,8 +5,8 @@ pytest-codeblocks==0.16.1
python-dotenv==1.0.0
pytest-dotenv==0.5.2
# Code sample dependencies
## TODO: install these using virtual environments in the docs and remove from here.
influxdb3-python
influxdb3-python-cli
pandas
## Tabulate for printing pandas DataFrames.
tabulate

@@ -1,5 +1,8 @@
#!/bin/bash

# This script is used to run tests for the InfluxDB documentation.
# The script is designed to be run in a Docker container. It is used to substitute placeholder values.

# Function to check if an option is present in the arguments
has_option() {
  local target="$1"

@@ -26,7 +29,7 @@ fi
BASE_DIR=$(pwd)
cd $TEMP_DIR

-for file in `find . -type f` ; do
+for file in `find . -type f \( -iname '*.md' \)` ; do
  if [ -f "$file" ]; then
    echo "PRETEST: substituting values in $file"

@@ -93,6 +96,9 @@ mkdir -p ~/Downloads && rm -rf ~/Downloads/*
# Clean up installed files from previous runs.
gpg -q --batch --yes --delete-key D8FF8E1F7DF8B07E > /dev/null 2>&1

# Activate the Python virtual environment configured in the Dockerfile.
. /opt/venv/bin/activate

# Run test commands with options provided in the CMD of the Dockerfile.
# pytest rootdir is the directory where pytest.ini is located (/test).
if [ -d ./content/influxdb/cloud-dedicated/ ]; then