4806-document-different-ways-to-execute-queries-against-iox (#4850)
* chore(refactor): refactor the SQL schema intro into a partial shortcode for reuse.
* fix(cloud-iox): #4806 Setup and query Grafana Flight SQL
  - Help configuring a Homebrew installed Grafana
  - Add some query help
  - Add note about schema elements.
  - Warn about functions not working.
  - Screenshot of query builder.
* fix(cloud-iox): Grafana frontmatter
* fix(cloud-iox): Remove incorrect note about aggregate function support. Clarify required time column and the use of aggregations.
* wip: pandas.md
* wip: python.md
* chore(cloud-iox): update frontmatter for Execute queries
* feature(cloud-iox): use Python and Flight SQL to query, pandas and pyarrow to analyze:
  - Adds /tools to Query Data
  - Adds using Python with flightsql-dbapi to query data
  - Adds starter for using PyArrow to analyze data
  - Adds starter for using pandas to analyze data
* chore(cloud-iox): Move pages from Visualize data into Query Data > Tools
  - Move Grafana
  - Move Superset
  - #4806
* Update content/influxdb/cloud-iox/query-data/tools/grafana.md
* Apply suggestions from code review

  @sanderson Thanks for the review! Sorry for all the whitespace fixes.

  Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
* fix(cloud-iox): infinite cardinality (#4851) (#4853)
* fix(cloud-iox): pyarrow examples.
* fix(cloud-iox): wip - python examples.
* fix(cloud-iox): Python flightsql-dbapi, pandas, pyarrow guides
  - part of #4806
  - Add list code example
  - Add code comments
  - Fix whitespace
  - Fix description
  - add related
  - add steps
  - fix frontmatter
  - add comments
  - cleanup example
* Update content/influxdb/cloud-iox/query-data/execute-queries/flight-sql/python.md
* Update content/influxdb/cloud-iox/query-data/tools/grafana.md
* fix(cloud-iox): #4806 Grafana instructions
* fix(cloud-iox): #4806 pandas instructions

---------

Co-authored-by: Scott Anderson <sanderson@users.noreply.github.com>
pull/4858/head
parent
f57adc4118
commit
07650896dc
|
@ -1,7 +1,7 @@
|
|||
---
|
||||
title: Execute queries
|
||||
seotitle: Different ways to query InfluxDB
|
||||
description: There are multiple ways to query data from InfluxDB including the InfluxDB UI, CLI, and API.
|
||||
seotitle: Execute queries for data stored in an InfluxDB bucket powered by IOx
|
||||
description: Use tools and libraries to query data stored in an InfluxDB bucket powered by IOx.
|
||||
weight: 103
|
||||
menu:
|
||||
influxdb_cloud_iox:
|
||||
|
|
|
@ -0,0 +1,298 @@
|
|||
---
|
||||
title: Use Python and the Flight SQL library to query data
|
||||
description: >
|
||||
Use Python and the `flightsql-dbapi` Flight SQL library to query data
|
||||
stored in a bucket powered by InfluxDB IOx.
|
||||
weight: 101
|
||||
menu:
|
||||
influxdb_cloud_iox:
|
||||
parent: Query with Flight SQL
|
||||
name: Use Python
|
||||
identifier: query_with_python
|
||||
influxdb/cloud-iox/tags: [query, flightsql, python]
|
||||
related:
|
||||
- /influxdb/cloud-iox/query-data/tools/pandas/
|
||||
- /influxdb/cloud-iox/query-data/tools/pyarrow/
|
||||
- /influxdb/cloud-iox/query-data/sql/
|
||||
list_code_example: |
|
||||
```py
|
||||
from flightsql import FlightSQLClient
|
||||
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'bucket-name': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
|
||||
info = client.execute("SELECT * FROM home")
|
||||
|
||||
ticket = info.endpoints[0].ticket
|
||||
|
||||
reader = client.do_get(ticket)
|
||||
```
|
||||
---
|
||||
|
||||
Use Python and the Flight SQL library to query data stored in a bucket powered by InfluxDB IOx.
|
||||
|
||||
- [Get started using Python to query InfluxDB](#get-started-using-python-to-query-influxdb)
|
||||
- [Create a Python virtual environment](#create-a-python-virtual-environment)
|
||||
- [Install Python](#install-python)
|
||||
- [Create a project virtual environment](#create-a-project-virtual-environment)
|
||||
- [Install Anaconda](#install-anaconda)
|
||||
- [Query InfluxDB using Flight SQL](#query-influxdb-using-flight-sql)
|
||||
- [Install the Flight SQL Python Library](#install-the-flight-sql-python-library)
|
||||
- [Create a query client](#create-a-query-client)
|
||||
- [Execute a query](#execute-a-query)
|
||||
- [Retrieve data for Flight SQL query results](#retrieve-data-for-flight-sql-query-results)
|
||||
|
||||
## Get started using Python to query InfluxDB
|
||||
|
||||
This guide follows the recommended practice of using Python _virtual environments_.
|
||||
If you don't want to use virtual environments and you have Python installed,
|
||||
continue to [Query InfluxDB using Flight SQL](#query-influxdb-using-flight-sql).
|
||||
|
||||
## Create a Python virtual environment
|
||||
|
||||
Python [virtual environments](https://docs.python.org/3/library/venv.html) keep the Python interpreter and dependencies for your project self-contained and isolated from other projects.
|
||||
|
||||
To install Python and create a virtual environment, choose one of the following options:
|
||||
|
||||
- [Python venv](?t=venv#venv-install): The [`venv` module](https://docs.python.org/3/library/venv.html) is part of the Python standard library (added in Python 3.3).
|
||||
- [Anaconda® Distribution](?t=Anaconda#conda-install): A Python/R data science distribution that provides Python and the **conda** package and environment manager.
|
||||
|
||||
{{< code-tabs-wrapper >}}
|
||||
{{% code-tabs %}}
|
||||
[venv](#venv-install)
|
||||
[Anaconda](#conda-install)
|
||||
{{% /code-tabs %}}
|
||||
{{% code-tab-content %}}
|
||||
<!-- Begin venv -->
|
||||
|
||||
### Install Python
|
||||
|
||||
1. Follow the [Python installation instructions](https://wiki.python.org/moin/BeginnersGuide/Download)
|
||||
to install a recent version of the Python programming language for your system.
|
||||
2. Check that you can run `python` and `pip` commands.
|
||||
`pip` is a package manager included in most Python distributions.
|
||||
|
||||
In your terminal, enter the following commands:
|
||||
|
||||
```sh
|
||||
python --version
|
||||
```
|
||||
|
||||
```sh
|
||||
pip --version
|
||||
```
|
||||
|
||||
Depending on your system, you may need to use version-specific commands--for example:
|
||||
|
||||
```sh
|
||||
python3 --version
|
||||
```
|
||||
|
||||
```sh
|
||||
pip3 --version
|
||||
```
|
||||
|
||||
If neither `pip` nor `pip<PYTHON_VERSION>` works, follow one of the [Pypa.io Pip installation](https://pip.pypa.io/en/stable/installation/) methods for your system.
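You can also run the check from inside the interpreter, which confirms that `pip` is available to the Python you are actually running. The following is a minimal sketch that uses only the standard library; the function name `describe_python` is illustrative and not part of any InfluxDB tooling:

```python
import importlib.util
import sys


def describe_python():
    """Return the running interpreter's version and whether pip is importable."""
    version = "{}.{}.{}".format(*sys.version_info[:3])
    # find_spec returns None when the module isn't installed for this interpreter
    has_pip = importlib.util.find_spec("pip") is not None
    return {"version": version, "executable": sys.executable, "pip": has_pip}


print(describe_python())
```

If `pip` is reported as `False`, the `pip` command on your path likely belongs to a different Python installation.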
|
||||
|
||||
### Create a project virtual environment
|
||||
|
||||
1. Create a directory for your Python project and change to the new directory--for example:
|
||||
|
||||
```sh
|
||||
mkdir ./PROJECT_DIRECTORY && cd $_
|
||||
```
|
||||
|
||||
2. Use the Python `venv` module to create a virtual environment--for example:
|
||||
|
||||
```sh
|
||||
python -m venv envs/virtualenv-1
|
||||
```
|
||||
|
||||
`venv` creates the new virtual environment directory in your project.
|
||||
|
||||
3. To activate the new virtual environment in your terminal, run the `source` command and pass the file path of the virtual environment `activate` script:
|
||||
|
||||
```sh
|
||||
source envs/VIRTUAL_ENVIRONMENT_NAME/bin/activate
|
||||
```
|
||||
|
||||
For example:
|
||||
|
||||
```sh
|
||||
source envs/virtualenv-1/bin/activate
|
||||
```
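To confirm from Python that a `venv` environment is active, compare `sys.prefix` with `sys.base_prefix`; they differ when the interpreter runs inside a `venv` environment. The sketch below also creates a throwaway environment with the standard-library `venv` module, the programmatic counterpart of `python -m venv`. The temporary directory and the `with_pip=False` setting are choices made for this example only:

```python
import sys
import tempfile
import venv
from pathlib import Path


def in_virtualenv():
    """Return True when the interpreter is running inside a venv environment."""
    return sys.prefix != sys.base_prefix


# Create a throwaway environment, similar to `python -m venv envs/virtualenv-1`
with tempfile.TemporaryDirectory() as project_dir:
    env_dir = Path(project_dir) / "envs" / "virtualenv-1"
    venv.create(env_dir, with_pip=False)  # skip pip to keep the example fast
    # Every venv environment has a pyvenv.cfg file at its root
    created = (env_dir / "pyvenv.cfg").exists()

print("environment created:", created)
print("running inside a venv:", in_virtualenv())
```

This check applies to `venv` environments; it isn't a reliable test for an active conda environment.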
|
||||
<!-- End venv -->
|
||||
{{% /code-tab-content %}}
|
||||
{{% code-tab-content %}}
|
||||
<!-- Begin conda -->
|
||||
|
||||
### Install Anaconda
|
||||
|
||||
1. Follow the [Anaconda installation instructions](https://docs.continuum.io/anaconda/install/) for your system.
|
||||
2. Check that you can run the `conda` command:
|
||||
|
||||
```sh
|
||||
conda
|
||||
```
|
||||
|
||||
3. Use `conda` to create a virtual environment--for example:
|
||||
|
||||
```sh
|
||||
conda create --prefix envs/virtualenv-1
|
||||
```
|
||||
|
||||
`conda` creates a virtual environment in a directory named `./envs/virtualenv-1`.
|
||||
|
||||
4. To activate the new virtual environment, use the `conda activate` command and pass the directory path of the virtual environment:
|
||||
|
||||
```sh
|
||||
conda activate envs/VIRTUAL_ENVIRONMENT_NAME
|
||||
```
|
||||
|
||||
For example:
|
||||
|
||||
```sh
|
||||
conda activate ./envs/virtualenv-1
|
||||
```
|
||||
{{% /code-tab-content %}}
|
||||
{{< /code-tabs-wrapper >}}
|
||||
|
||||
When a virtual environment is activated, the name displays at the beginning of your terminal command line--for example:
|
||||
{{% code-callout "(virtualenv-1)"%}}
|
||||
```sh
|
||||
(virtualenv-1) $ PROJECT_DIRECTORY
|
||||
```
|
||||
{{% /code-callout %}}
|
||||
|
||||
## Query InfluxDB using Flight SQL
|
||||
|
||||
1. [Install the Flight SQL Python Library](#install-the-flight-sql-python-library)
|
||||
2. [Create a query client](#create-a-query-client)
|
||||
3. [Execute a query](#execute-a-query)
|
||||
|
||||
### Install the Flight SQL Python Library
|
||||
|
||||
The [`flightsql-dbapi`](https://github.com/influxdata/flightsql-dbapi) Flight SQL library for Python provides a
|
||||
[DB API 2](https://peps.python.org/pep-0249/) interface and
|
||||
[SQLAlchemy](https://www.sqlalchemy.org/) dialect for
|
||||
[Flight SQL](https://arrow.apache.org/docs/format/FlightSql.html).
|
||||
Installing `flightsql-dbapi` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that you'll use for working with Arrow data.
|
||||
|
||||
In your terminal, use `pip` to install `flightsql-dbapi`:
|
||||
|
||||
```sh
|
||||
pip install flightsql-dbapi
|
||||
```
|
||||
|
||||
With `flightsql-dbapi` and `pyarrow` installed, you're ready to query and analyze data stored in an InfluxDB bucket.
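To verify the installation in your active environment, you can check that both modules are importable. This is a small sketch using the standard library; the helper name `installed` is illustrative, and `flightsql` is the module name that the examples in this guide import:

```python
import importlib.util


def installed(*module_names):
    """Map each module name to True if it can be imported, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in module_names}


# `flightsql` is the import name used with the flightsql-dbapi package
print(installed("flightsql", "pyarrow"))
```

If either value is `False`, make sure the virtual environment is activated and run `pip install flightsql-dbapi` again.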
|
||||
|
||||
### Create a query client
|
||||
|
||||
The following example shows how to use Python with `flightsql-dbapi`
|
||||
and the _DB API 2_ interface to instantiate a Flight SQL client configured for an InfluxDB bucket.
|
||||
|
||||
1. In your editor, copy and paste the following sample code to a new file--for example, `query-example.py`:
|
||||
|
||||
```py
|
||||
# query-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
|
||||
# Instantiate a FlightSQLClient configured for your bucket
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'bucket-name': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
```
|
||||
|
||||
2. Replace the following configuration values:
|
||||
|
||||
- **`INFLUX_READ_WRITE_TOKEN`**: Your InfluxDB token with read permissions on the bucket you want to query.
|
||||
- **`INFLUX_BUCKET`**: The name of your InfluxDB bucket.
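Rather than pasting the token into the source file, you might read the configuration values from environment variables. The following sketch builds the same keyword arguments used in the sample above; the variable names `INFLUX_HOST`, `INFLUX_TOKEN`, and `INFLUX_BUCKET` and the helper `client_settings` are assumptions for illustration, not names defined by `flightsql-dbapi`:

```python
import os


def client_settings(env=os.environ):
    """Build FlightSQLClient keyword arguments from environment variables."""
    return {
        # Hypothetical variable names; fall back to the host used in this guide
        "host": env.get("INFLUX_HOST", "cloud2.influxdata.com"),
        "token": env["INFLUX_TOKEN"],  # raises KeyError if unset
        "metadata": {"bucket-name": env["INFLUX_BUCKET"]},
        "features": {"metadata-reflection": "true"},
    }


# Usage, assuming flightsql-dbapi is installed and the variables are set:
#   from flightsql import FlightSQLClient
#   client = FlightSQLClient(**client_settings())
```

Failing with a `KeyError` when a variable is missing keeps a misconfigured script from sending a request with an empty token.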
|
||||
|
||||
### Execute a query
|
||||
|
||||
To execute an SQL query, call the query client's `execute(query)` method and pass the query as a string.
|
||||
|
||||
#### Syntax {#execute-query-syntax}
|
||||
|
||||
```py
|
||||
execute(query: str, call_options: Optional[FlightSQLCallOptions] = None)
|
||||
```
|
||||
|
||||
#### Example {#execute-query-example}
|
||||
|
||||
```py
|
||||
# query-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'bucket-name': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
|
||||
# Execute the query
|
||||
info = client.execute("SELECT * FROM home")
|
||||
```
|
||||
|
||||
The `execute()` method returns a `pyarrow.flight.FlightInfo` object that contains query metadata and a list of `endpoints`. Each endpoint contains the following:
|
||||
|
||||
- A list of addresses where you can retrieve the data.
|
||||
- A `ticket` value that identifies the data to retrieve.
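The endpoint structure described above can be inspected with a short helper. This is a sketch that relies only on the `endpoints`, `ticket`, and `locations` attributes of the Flight objects; the function name `describe_endpoints` is hypothetical:

```python
def describe_endpoints(info):
    """Return one summary dict per endpoint in a FlightInfo-like object."""
    summaries = []
    for position, endpoint in enumerate(info.endpoints):
        summaries.append({
            "position": position,
            "ticket": endpoint.ticket,
            # An empty list means the ticket is redeemed on the same service
            "locations": list(endpoint.locations),
        })
    return summaries
```

For example, `for summary in describe_endpoints(info): print(summary)` prints each endpoint returned for the query. The examples in this guide read only `info.endpoints[0]`, which assumes the result has a single endpoint.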
|
||||
|
||||
Next, use the ticket to [retrieve data for Flight SQL query results](#retrieve-data-for-flight-sql-query-results).
|
||||
|
||||
### Retrieve data for Flight SQL query results
|
||||
|
||||
To retrieve Arrow data for a query result, call the client's `do_get(ticket)` method.
|
||||
|
||||
#### Syntax {#retrieve-data-syntax}
|
||||
|
||||
```py
|
||||
do_get(ticket, call_options: Optional[FlightSQLCallOptions] = None)
|
||||
```
|
||||
|
||||
#### Example {#retrieve-data-example}
|
||||
|
||||
The following sample shows how to use Python with `flightsql-dbapi` and `pyarrow` to query InfluxDB and retrieve data.
|
||||
|
||||
```py
|
||||
# query-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
|
||||
# Instantiate a FlightSQLClient configured for a bucket
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'bucket-name': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
|
||||
# Execute the query to retrieve FlightInfo
|
||||
info = client.execute("SELECT * FROM home")
|
||||
|
||||
# Extract the ticket for retrieving data
|
||||
ticket = info.endpoints[0].ticket
|
||||
|
||||
# Use the ticket to request the Arrow data stream.
|
||||
# Return a FlightStreamReader for streaming the results.
|
||||
reader = client.do_get(ticket)
|
||||
|
||||
# Read all data to a pyarrow.Table
|
||||
table = reader.read_all()
|
||||
```
|
||||
|
||||
`do_get(ticket)` returns a [`pyarrow.flight.FlightStreamReader`](https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightStreamReader.html) for streaming Arrow [record batches](https://arrow.apache.org/docs/python/data.html#record-batches).
|
||||
|
||||
To read data from the stream, call one of the following `FlightStreamReader` methods:
|
||||
|
||||
- `read_all()`: Read all record batches as a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html).
|
||||
- `read_chunk()`: Read the next RecordBatch and metadata.
|
||||
- `read_pandas()`: Read all record batches and convert them to a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/frame.html).
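For large results, you may prefer to process record batches as they arrive instead of loading everything with `read_all()`. The following sketch loops over every endpoint and calls `read_chunk()` until the stream signals its end with `StopIteration`. The helper names `iter_record_batches` and `count_rows` are illustrative; `client` and `info` are assumed to be the objects created in the example above:

```python
def iter_record_batches(client, info):
    """Yield every record batch for a query, one endpoint at a time."""
    for endpoint in info.endpoints:
        reader = client.do_get(endpoint.ticket)
        while True:
            try:
                chunk = reader.read_chunk()
            except StopIteration:
                break  # this endpoint's stream is finished
            yield chunk.data  # chunk.data is the record batch


def count_rows(client, info):
    """Count result rows without holding the full result in memory."""
    return sum(batch.num_rows for batch in iter_record_batches(client, info))
```

With the client from the example above, `count_rows(client, client.execute("SELECT * FROM home"))` returns the number of rows in the result.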
|
||||
|
||||
Next, learn how to use Python tools to work with time series data:
|
||||
|
||||
- [Use PyArrow](/influxdb/cloud-iox/query-data/tools/pyarrow/)
|
||||
- [Use pandas](/influxdb/cloud-iox/query-data/tools/pandas/)
|
|
@ -86,7 +86,7 @@ pip3 --version
|
|||
|
||||
If neither `pip` nor `pip3` works, follow one of the [Pypa.io Pip Installation](https://pip.pypa.io/en/stable/installation/) methods for your system.
|
||||
|
||||
3. Use Pip to install the `flightsql-dbapi` Flight SQL SQL Alchemy library.
|
||||
3. Use Pip to install the `flightsql-dbapi` library.
|
||||
|
||||
{{< code-tabs-wrapper >}}
|
||||
{{% code-tabs %}}
|
||||
|
@ -105,11 +105,11 @@ pip3 install flightsql-dbapi
|
|||
{{% /code-tab-content %}}
|
||||
{{< /code-tabs-wrapper >}}
|
||||
|
||||
Flight SQL SQL Alchemy is a Python library that provides a
|
||||
The `flightsql-dbapi` library for Python provides a
|
||||
[DB API 2](https://peps.python.org/pep-0249/) interface and
|
||||
[SQLAlchemy](https://www.sqlalchemy.org/) dialect for
|
||||
[Flight SQL](https://arrow.apache.org/docs/format/FlightSql.html).
|
||||
Later, you'll add it to Superset's Docker configuration.
|
||||
Later, you'll add `flightsql-dbapi` to Superset's Docker configuration.
|
||||
|
||||
{{% warn %}}
|
||||
The `flightsql-dbapi` library is experimental and under active development.
|
||||
|
|
|
@ -0,0 +1,14 @@
|
|||
---
|
||||
title: Use analysis and visualization tools with InfluxDB Cloud (IOx) APIs
|
||||
description: Use popular tools to analyze and visualize time series data stored in an InfluxDB bucket powered by IOx.
|
||||
weight: 201
|
||||
menu:
|
||||
influxdb_cloud_iox:
|
||||
name: Analyze and visualize data
|
||||
parent: Query data
|
||||
influxdb/cloud-iox/tags: [analysis, visualization, tools]
|
||||
aliases:
|
||||
- /influxdb/cloud-iox/visualize-data/
|
||||
---
|
||||
|
||||
{{< children >}}
|
|
@ -9,8 +9,10 @@ weight: 101
|
|||
menu:
|
||||
influxdb_cloud_iox:
|
||||
name: Use Grafana
|
||||
parent: Visualize data
|
||||
influxdb/cloud-iox/tags: [visualization]
|
||||
parent: Analyze and visualize data
|
||||
influxdb/cloud-iox/tags: [query, visualization]
|
||||
aliases:
|
||||
- /influxdb/cloud-iox/query-data/tools/grafana/
|
||||
alt_engine: /influxdb/cloud/tools/grafana/
|
||||
---
|
||||
|
||||
|
@ -28,11 +30,11 @@ Install the [grafana-flight-sql-plugin](https://github.com/influxdata/grafana-fl
|
|||
<!-- TOC -->
|
||||
|
||||
- [Install Grafana](#install-grafana)
|
||||
- [Download the Grafana Flight SQL Plugin](#download-the-grafana-flight-sql-plugin)
|
||||
- [Download the Grafana Flight SQL plugin](#download-the-grafana-flight-sql-plugin)
|
||||
- [Extract the Flight SQL plugin](#extract-the-flight-sql-plugin)
|
||||
- [Install the Grafana Flight SQL plugin](#install-the-grafana-flight-sql-plugin)
|
||||
- [Install with Docker Run](#install-with-docker-run)
|
||||
- [Install with Docker-Compose](#install-with-docker-compose)
|
||||
- [Install with Docker Run](#install-with-docker-run)
|
||||
- [Install with Docker-Compose](#install-with-docker-compose)
|
||||
- [Configure the Flight SQL datasource](#configure-the-flight-sql-datasource)
|
||||
- [Query InfluxDB with Grafana](#query-influxdb-with-grafana)
|
||||
- [Build visualizations with Grafana](#build-visualizations-with-grafana)
|
||||
|
@ -41,18 +43,19 @@ Install the [grafana-flight-sql-plugin](https://github.com/influxdata/grafana-fl
|
|||
|
||||
## Install Grafana
|
||||
|
||||
Follow [Grafana installations instructions](https://grafana.com/docs/grafana/latest/setup-grafana/installation/)
|
||||
for your operating system to Install Grafana.
|
||||
Follow [Grafana instructions](https://grafana.com/docs/grafana/latest/setup-grafana/installation/)
|
||||
to install Grafana for your operating system.
|
||||
|
||||
{{% warn %}}
|
||||
Because Grafana Flight SQL Plugin is a custom plugin, you can't use it with Grafana Cloud.
|
||||
For more information, see [Find and Use Plugins in the Grafana Cloud documentation](https://grafana.com/docs/grafana-cloud/fundamentals/find-and-use-plugins/)
|
||||
{{% /warn %}}
|
||||
|
||||
## Download the Grafana Flight SQL plugin
|
||||
|
||||
Download the latest release from [influxdata/grafana-flightsql-datasource releases](https://github.com/influxdata/grafana-flightsql-datasource/releases).
|
||||
|
||||
{{% warn %}}
|
||||
Because Grafana Flight SQL Plugin is a custom plugin, you can't use it with Grafana Cloud.
|
||||
For more information, see [Find and Use Plugins in the Grafana Cloud documentation](https://grafana.com/docs/grafana-cloud/fundamentals/find-and-use-plugins/).
|
||||
|
||||
The Grafana Flight SQL plugin is experimental and subject to change.
|
||||
{{% /warn %}}
|
||||
|
||||
|
@ -78,11 +81,6 @@ unzip influxdata-flightsql-datasource.zip -d /custom/plugins/directory/
|
|||
Install the custom-built Flight SQL plugin in a local or Docker-based instance
|
||||
of Grafana OSS or Grafana Enterprise.
|
||||
|
||||
{{% warn %}}
|
||||
Because Grafana Flight SQL Plugin is a custom plugin, you can't use it with Grafana Cloud.
|
||||
For more information, see [Find and Use Plugins in the Grafana Cloud documentation](https://grafana.com/docs/grafana-cloud/fundamentals/find-and-use-plugins/)
|
||||
{{% /warn %}}
|
||||
|
||||
{{< tabs-wrapper >}}
|
||||
{{% tabs %}}
|
||||
[Local](#)
|
|
@ -0,0 +1,200 @@
|
|||
---
|
||||
title: Use pandas to analyze and visualize data
|
||||
seotitle: Use Python and pandas to analyze and visualize data
|
||||
description: >
|
||||
Use the [pandas](https://pandas.pydata.org/) Python data analysis library
|
||||
to analyze and visualize data stored in a bucket powered by InfluxDB IOx.
|
||||
weight: 101
|
||||
menu:
|
||||
influxdb_cloud_iox:
|
||||
parent: Analyze and visualize data
|
||||
name: Use pandas
|
||||
influxdb/cloud-iox/tags: [analysis, pandas, pyarrow, python, visualization]
|
||||
related:
|
||||
- /influxdb/cloud-iox/query-data/tools/python/
|
||||
- /influxdb/cloud-iox/query-data/tools/pyarrow/
|
||||
- /influxdb/cloud-iox/query-data/sql/
|
||||
list_code_example: |
|
||||
```py
|
||||
...
|
||||
dataframe = reader.read_pandas()
|
||||
dataframe = dataframe.set_index('time')
|
||||
|
||||
print(dataframe.index)
|
||||
|
||||
resample = dataframe.resample("1H")
|
||||
|
||||
resample['temp'].mean()
|
||||
```
|
||||
---
|
||||
|
||||
Use [pandas](https://pandas.pydata.org/), the Python data analysis library, to process, analyze, and visualize data
|
||||
stored in an InfluxDB bucket powered by InfluxDB IOx.
|
||||
|
||||
> **pandas** is an open source, BSD-licensed library providing high-performance,
|
||||
> easy-to-use data structures and data analysis tools for the Python programming language.
|
||||
>
|
||||
> {{% caption %}}[pandas documentation](https://pandas.pydata.org/docs/){{% /caption %}}
|
||||
|
||||
<!-- TOC -->
|
||||
|
||||
- [Install prerequisites](#install-prerequisites)
|
||||
- [Install pandas](#install-pandas)
|
||||
- [Use PyArrow to convert query results to pandas](#use-pyarrow-to-convert-query-results-to-pandas)
|
||||
- [Use pandas to analyze data](#use-pandas-to-analyze-data)
|
||||
- [View data information and statistics](#view-data-information-and-statistics)
|
||||
- [Downsample time series](#downsample-time-series)
|
||||
|
||||
<!-- /TOC -->
|
||||
|
||||
## Install prerequisites
|
||||
|
||||
The examples in this guide assume that you're using a Python virtual environment and the Flight SQL library for Python.
|
||||
Installing `flightsql-dbapi` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
|
||||
|
||||
For more information, see how to [get started querying InfluxDB with Python and flightsql-dbapi](/influxdb/cloud-iox/query-data/execute-queries/flight-sql/python/).
|
||||
|
||||
## Install pandas
|
||||
|
||||
To use pandas, you need to install and import the `pandas` library.
|
||||
|
||||
In your terminal, use `pip` to install `pandas` in your active [Python virtual environment](/influxdb/cloud-iox/query-data/execute-queries/flight-sql/python/#create-a-project-virtual-environment):
|
||||
|
||||
```sh
|
||||
pip install pandas
|
||||
```
|
||||
|
||||
## Use PyArrow to convert query results to pandas
|
||||
|
||||
The following steps use Python, `flightsql-dbapi`, and `pyarrow` to query InfluxDB and stream Arrow data to a pandas `DataFrame`.
|
||||
|
||||
1. In your editor, copy and paste the following code to a new file--for example, `pandas-example.py`:
|
||||
|
||||
```py
|
||||
# pandas-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
import pandas
|
||||
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'bucket-name': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
|
||||
info = client.execute("SELECT * FROM home")
|
||||
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
|
||||
# Read all record batches in the stream to a pandas DataFrame
|
||||
dataframe = reader.read_pandas()
|
||||
|
||||
dataframe.info()
|
||||
```
|
||||
|
||||
2. Replace the following configuration values:
|
||||
|
||||
- **`INFLUX_READ_WRITE_TOKEN`**: Your InfluxDB token with read permissions on the bucket you want to query.
|
||||
- **`INFLUX_BUCKET`**: The name of your InfluxDB bucket.
|
||||
|
||||
3. In your terminal, use the Python interpreter to run the file:
|
||||
|
||||
```sh
|
||||
python pandas-example.py
|
||||
```
|
||||
|
||||
The `pyarrow.flight.FlightStreamReader` [`read_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightStreamReader.html#pyarrow.flight.FlightStreamReader.read_pandas) method:
|
||||
|
||||
- Takes the same options as [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas).
|
||||
- Reads all Arrow record batches in the stream to a `pyarrow.Table` and then converts the `Table` to a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame).
|
||||
|
||||
Next, [use pandas to analyze data](#use-pandas-to-analyze-data).
|
||||
|
||||
## Use pandas to analyze data
|
||||
|
||||
- [View data information and statistics](#view-data-information-and-statistics)
|
||||
- [Downsample time series](#downsample-time-series)
|
||||
|
||||
### View data information and statistics
|
||||
|
||||
The following example uses the DataFrame `info()` and `describe()`
|
||||
methods to print information about the DataFrame.
|
||||
|
||||
```py
|
||||
# pandas-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
import pandas
|
||||
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'bucket-name': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
|
||||
info = client.execute("SELECT * FROM home")
|
||||
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
|
||||
dataframe = reader.read_pandas()
|
||||
|
||||
# Print a summary of the DataFrame to stdout
|
||||
dataframe.info()
|
||||
|
||||
# Calculate summary statistics for the data
|
||||
print(dataframe.describe())
|
||||
```
|
||||
|
||||
### Downsample time series
|
||||
|
||||
The pandas library provides extensive features for working with time series data.
|
||||
|
||||
The [`pandas.DataFrame.resample()` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) downsamples and upsamples data to time-based groups--for example:
|
||||
|
||||
```py
|
||||
from flightsql import FlightSQLClient
|
||||
import pandas
|
||||
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'bucket-name': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
|
||||
info = client.execute("SELECT * FROM home")
|
||||
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
|
||||
dataframe = reader.read_pandas()
|
||||
|
||||
# Use the `time` column to generate a DatetimeIndex for the DataFrame
|
||||
dataframe = dataframe.set_index('time')
|
||||
|
||||
# Print information about the index
|
||||
print(dataframe.index)
|
||||
|
||||
# Downsample data into 1-hour groups based on the DatetimeIndex
|
||||
resample = dataframe.resample("1H")
|
||||
|
||||
# Print a summary that shows the start time and average temp for each group
|
||||
print(resample['temp'].mean())
|
||||
```
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View example results" %}}
|
||||
```sh
|
||||
time
|
||||
1970-01-01 00:00:00 22.374138
|
||||
1970-01-01 01:00:00 NaN
|
||||
1970-01-01 02:00:00 NaN
|
||||
1970-01-01 03:00:00 NaN
|
||||
1970-01-01 04:00:00 NaN
|
||||
...
|
||||
2023-07-16 22:00:00 NaN
|
||||
2023-07-16 23:00:00 22.600000
|
||||
2023-07-17 00:00:00 22.513889
|
||||
2023-07-17 01:00:00 22.208333
|
||||
2023-07-17 02:00:00 22.300000
|
||||
Freq: H, Name: temp, Length: 469323, dtype: float64
|
||||
```
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
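The `NaN` rows in the example results appear because `resample()` creates a group for every interval between the first and last timestamp, and a group with no rows has no mean. The following self-contained sketch reproduces this with made-up readings (the timestamps and temperatures are invented for illustration) and uses `dropna()` to keep only the intervals that contain data:

```python
import pandas as pd

# Synthetic readings with a gap between 08:30 and 10:00
dataframe = pd.DataFrame({
    "time": pd.to_datetime(
        ["2023-07-17 08:00", "2023-07-17 08:30", "2023-07-17 10:00"]),
    "temp": [21.0, 23.0, 22.0],
})
dataframe = dataframe.set_index("time")

# "60min" groups are hourly; the 09:00 group has no rows, so its mean is NaN
hourly = dataframe.resample("60min")["temp"].mean()
print(hourly)

# Drop the empty groups to keep only hours that contain data
print(hourly.dropna())
```

The sketch passes `"60min"` instead of `"1H"` because the minute alias is accepted by both older and newer pandas releases.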
|
||||
|
||||
For more detail and examples, see the [pandas documentation](https://pandas.pydata.org/docs/index.html).
|
|
@ -0,0 +1,136 @@
|
|||
---
|
||||
title: Use the PyArrow library to analyze data
|
||||
description: >
|
||||
Use [PyArrow](https://arrow.apache.org/docs/python/) to read and analyze InfluxDB query results from a bucket powered by InfluxDB IOx.
|
||||
weight: 101
|
||||
menu:
|
||||
influxdb_cloud_iox:
|
||||
parent: Analyze and visualize data
|
||||
name: Use PyArrow
|
||||
influxdb/cloud-iox/tags: [analysis, arrow, pyarrow, python]
|
||||
related:
|
||||
- /influxdb/cloud-iox/query-data/tools/pandas/
|
||||
- /influxdb/cloud-iox/query-data/tools/pyarrow/
|
||||
- /influxdb/cloud-iox/query-data/sql/
|
||||
list_code_example: |
|
||||
```py
|
||||
...
|
||||
table = reader.read_all()
|
||||
table.group_by('room').aggregate([('temp', 'mean')])
|
||||
```
|
||||
---
|
||||
|
||||
Use [PyArrow](https://arrow.apache.org/docs/python/) to read and analyze query results
|
||||
from an InfluxDB bucket powered by InfluxDB IOx.
|
||||
The PyArrow library provides efficient computation, aggregation, serialization, and conversion of Arrow format data.
|
||||
|
||||
> Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable
|
||||
> big data systems to store, process and move data fast.
|
||||
>
|
||||
> The Arrow Python bindings (also named “PyArrow”) have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.
|
||||
> {{% caption %}}[PyArrow documentation](https://arrow.apache.org/docs/python/index.html){{% /caption %}}
|
||||
|
||||
<!-- TOC -->
|
||||
|
||||
- [Install prerequisites](#install-prerequisites)
|
||||
- [Use PyArrow to read query results](#use-pyarrow-to-read-query-results)
|
||||
- [Use PyArrow to analyze data](#use-pyarrow-to-analyze-data)
|
||||
- [Group and aggregate data](#group-and-aggregate-data)
|
||||
|
||||
<!-- /TOC -->
|
||||
|
||||
## Install prerequisites
|
||||
|
||||
The examples in this guide assume that you're using a Python virtual environment and the Flight SQL library for Python.
|
||||
For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-iox/query-data/execute-queries/flight-sql/python/).
|
||||
|
||||
Installing `flightsql-dbapi` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
|
||||
|
||||
## Use PyArrow to read query results
|
||||
|
||||
The following example shows how to use Python with `flightsql-dbapi` and `pyarrow` to query InfluxDB and view Arrow data as a PyArrow `Table`.
|
||||
|
||||
1. In your editor, copy and paste the following sample code to a new file--for example, `pyarrow-example.py`:
|
||||
|
||||
```py
|
||||
# pyarrow-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
|
||||
# Instantiate a FlightSQLClient configured for a bucket
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'bucket-name': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
|
||||
# Execute the query to retrieve FlightInfo
|
||||
info = client.execute('SELECT * FROM home')
|
||||
|
||||
# Use the ticket to request the Arrow data stream.
|
||||
# Return a FlightStreamReader for streaming the results.
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
|
||||
# Read all data to a pyarrow.Table
|
||||
table = reader.read_all()
|
||||
|
||||
print(table)
|
||||
```
|
||||
|
||||
2. Replace the following configuration values:
|
||||
|
||||
- **`INFLUX_READ_WRITE_TOKEN`**: Your InfluxDB token with read permissions on the bucket you want to query.
|
||||
- **`INFLUX_BUCKET`**: The name of your InfluxDB bucket.
|
||||
|
||||
|
||||
3. In your terminal, use the Python interpreter to run the file:
|
||||
|
||||
```sh
|
||||
python pyarrow-example.py
|
||||
```
|
||||
|
||||
The `FlightStreamReader.read_all()` method reads all Arrow record batches in the stream as a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html).
|
||||
|
||||
Next, [use PyArrow to analyze data](#use-pyarrow-to-analyze-data).
|
||||
|
||||
## Use PyArrow to analyze data
|
||||
|
||||
### Group and aggregate data
|
||||
|
||||
With a `pyarrow.Table`, you can use values in a column as _keys_ for grouping.
|
||||
|
||||
The following example shows how to query InfluxDB, group the table data, and then calculate an aggregate value for each group:
|
||||
|
||||
```py
|
||||
# pyarrow-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'bucket-name': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
|
||||
info = client.execute('SELECT * FROM home')
|
||||
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
|
||||
table = reader.read_all()
|
||||
|
||||
# Use PyArrow to aggregate data
|
||||
print(table.group_by('room').aggregate([('temp', 'mean')]))
|
||||
```
|
||||
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View example results" %}}
|
||||
```arrow
|
||||
pyarrow.Table
|
||||
temp_mean: double
|
||||
room: string
|
||||
----
|
||||
temp_mean: [[22.581987577639747,22.10807453416151]]
|
||||
room: [["Kitchen","Living Room"]]
|
||||
```
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
|
||||
For more detail and examples, see the [PyArrow documentation](https://arrow.apache.org/docs/python/getstarted.html) and the [Apache Arrow Python Cookbook](https://arrow.apache.org/cookbook/py/data.html).
|
|
@ -8,10 +8,12 @@ description: >
|
|||
weight: 101
|
||||
menu:
|
||||
influxdb_cloud_iox:
|
||||
parent: Visualize data
|
||||
parent: Analyze and visualize data
|
||||
name: Use Superset
|
||||
identifier: visualize_with_superset
|
||||
influxdb/cloud-iox/tags: [visualization]
|
||||
influxdb/cloud-iox/tags: [query, visualization]
|
||||
aliases:
|
||||
- /influxdb/cloud-iox/query-data/tools/superset/
|
||||
related:
|
||||
- /influxdb/cloud-iox/query-data/execute-queries/flight-sql/superset/
|
||||
---
|
|
@ -1,17 +0,0 @@
|
|||
---
|
||||
title: Visualize data
|
||||
seotitle: Visualize data stored in InfluxDB
|
||||
description: >
|
||||
Use tools like Grafana and Apache Superset to visualize time series data
|
||||
stored in InfluxDB.
|
||||
weight: 5
|
||||
menu:
|
||||
influxdb_cloud_iox:
|
||||
name: Visualize data
|
||||
influxdb/cloud-iox/tags: [visualization]
|
||||
---
|
||||
|
||||
Use visualization tools like Grafana and Apache Superset to visualize your
|
||||
time series data stored in InfluxDB.
|
||||
|
||||
{{< children >}}
|
|
@ -1,3 +1,3 @@
|
|||
When working with the InfluxDB SQL implementation, a **bucket** is equivalent
|
||||
to a databases, a **measurement** is structured as a table, and **time**,
|
||||
to a database, a **measurement** is structured as a table, and **time**,
|
||||
**fields**, and **tags** are structured as columns.
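As a sketch of how this mapping looks in a query, the following helper builds a `SELECT` statement in which the measurement is the table and `time`, a tag, and a field are columns. The helper `build_query` is hypothetical, and treating `room` as a tag and `temp` as a field of the `home` measurement is an assumption about the sample data:

```python
def build_query(measurement, columns, limit=10):
    """Build a SELECT statement for a measurement (table) and its columns."""
    quoted = ", ".join('"{}"'.format(name) for name in columns)
    return 'SELECT {} FROM "{}" LIMIT {}'.format(quoted, measurement, int(limit))


# Assumed schema: `room` is a tag and `temp` is a field in the `home` measurement
query = build_query("home", ["time", "room", "temp"])
print(query)
```

The helper does no escaping beyond wrapping names in double quotes, so use it only with names you control.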
|