chore(v3): Use python client library (#5046)
- Replaces Flight SQL examples with Client Library examples. - Closes #5025.pull/5052/head
parent
ac47b3ec05
commit
1cb3cf856a
|
|
@ -1,11 +1,11 @@
|
|||
---
|
||||
title: Use pandas to analyze data
|
||||
list_title: pandas
|
||||
seotitle: Use Python and pandas to analyze data
|
||||
seotitle: Use Python and pandas to analyze and visualize data
|
||||
description: >
|
||||
Use the [pandas](https://pandas.pydata.org/) Python data analysis library
|
||||
to analyze time series data stored in InfluxDB.
|
||||
weight: 201
|
||||
to analyze and visualize time series data stored in InfluxDB Cloud Dedicated.
|
||||
weight: 101
|
||||
menu:
|
||||
influxdb_cloud_dedicated:
|
||||
parent: Use data analysis tools
|
||||
|
|
@ -51,10 +51,10 @@ stored in InfluxDB.
|
|||
|
||||
## Install prerequisites
|
||||
|
||||
The examples in this guide assume using a Python virtual environment and the Flight SQL library for Python.
|
||||
Installing `flightsql-dbapi` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
|
||||
The examples in this guide assume using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/cloud-dedicated/reference/client-libraries/v3/python/).
|
||||
For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/).
|
||||
|
||||
For more information, see how to [get started querying InfluxDB with Python and flightsql-dbapi](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/)
|
||||
Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
|
||||
|
||||
## Install pandas
|
||||
|
||||
|
|
@ -68,35 +68,47 @@ pip install pandas
|
|||
|
||||
## Use PyArrow to convert query results to pandas
|
||||
|
||||
The following steps use Python, `flightsql-dbapi`, and `pyarrow` to query InfluxDB and stream Arrow data to a pandas `DataFrame`.
|
||||
The following steps use Python, `influxdb3-python`, and `pyarrow` to query InfluxDB and stream Arrow data to a pandas `DataFrame`.
|
||||
|
||||
1. In your editor, copy and paste the following code to a new file--for example, `pandas-example.py`:
|
||||
|
||||
```py
|
||||
# pandas-example.py
|
||||
{{% tabs-wrapper %}}
|
||||
{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}}
|
||||
```py
|
||||
# pandas-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
import pandas
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import pandas
|
||||
|
||||
client = FlightSQLClient(host='cluster-id.influxdb.io',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'database': 'INFLUX_DATABASE'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
# Instantiate an InfluxDB client configured for a database
|
||||
client = InfluxDBClient3(
|
||||
"https://cluster-id.influxdb.io",
|
||||
database="DATABASE_NAME",
|
||||
token="DATABASE_TOKEN")
|
||||
|
||||
info = client.execute("SELECT * FROM home")
|
||||
# Execute the query to retrieve all record batches in the stream
|
||||
# formatted as a PyArrow Table.
|
||||
table = client.query(
|
||||
'''SELECT *
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '90 days'
|
||||
ORDER BY time'''
|
||||
)
|
||||
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
client.close()
|
||||
|
||||
# Read all record batches in the stream to a pandas DataFrame
|
||||
dataframe = reader.read_pandas()
|
||||
# Convert the PyArrow Table to a pandas DataFrame.
|
||||
dataframe = table.to_pandas()
|
||||
|
||||
dataframe.info()
|
||||
```
|
||||
print(dataframe)
|
||||
```
|
||||
{{% /code-placeholders %}}
|
||||
{{% /tabs-wrapper %}}
|
||||
|
||||
2. Replace the following configuration values:
|
||||
|
||||
- **`INFLUX_READ_WRITE_TOKEN`**: Your InfluxDB token with read permissions on the databases you want to query.
|
||||
- **`INFLUX_DATABASE`**: The name of your InfluxDB database.
|
||||
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-dedicated/admin/tokens/) with read permissions on the databases you want to query.
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [database](/influxdb/cloud-dedicated/admin/databases/) to query.
|
||||
|
||||
3. In your terminal, use the Python interpreter to run the file:
|
||||
|
||||
|
|
@ -104,10 +116,47 @@ The following steps use Python, `flightsql-dbapi`, and `pyarrow` to query Influx
|
|||
python pandas-example.py
|
||||
```
|
||||
|
||||
The `pyarrow.flight.FlightStreamReader` [`read_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightStreamReader.html#pyarrow.flight.FlightStreamReader.read_pandas) method:
|
||||
The example calls the following methods:
|
||||
|
||||
- Takes the same options as [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas).
|
||||
- Reads all Arrow record batches in the stream to a `pyarrow.Table` and then converts the `Table` to a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame).
|
||||
- [`InfluxDBClient3.query()`](/influxdb/cloud-dedicated/reference/client-libraries/v3/python/#influxdbclient3query): sends the query request and returns a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) that contains all the Arrow record batches from the response stream.
|
||||
|
||||
- [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas): Creates a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) from the data in the PyArrow `Table`.
|
||||
|
||||
{{% influxdb/custom-timestamps %}}
|
||||
{{% expand-wrapper %}}
|
||||
{{% expand "View example results" %}}
|
||||
```sh
|
||||
co hum room temp time
|
||||
0 0 35.9 Living Room 21.1 2022-01-02 11:46:40
|
||||
1 0 35.9 Kitchen 21.0 2022-01-02 11:46:40
|
||||
2 0 36.2 Kitchen 23.0 2022-01-02 12:46:40
|
||||
3 0 35.9 Living Room 21.4 2022-01-02 12:46:40
|
||||
4 0 36.1 Kitchen 22.7 2022-01-02 13:46:40
|
||||
5 0 36.0 Living Room 21.8 2022-01-02 13:46:40
|
||||
6 0 36.0 Kitchen 22.4 2022-01-02 14:46:40
|
||||
7 0 36.0 Living Room 22.2 2022-01-02 14:46:40
|
||||
8 0 36.0 Kitchen 22.5 2022-01-02 15:46:40
|
||||
9 0 35.9 Living Room 22.2 2022-01-02 15:46:40
|
||||
10 1 36.5 Kitchen 22.8 2022-01-02 16:46:40
|
||||
11 0 36.0 Living Room 22.4 2022-01-02 16:46:40
|
||||
12 1 36.3 Kitchen 22.8 2022-01-02 17:46:40
|
||||
13 0 36.1 Living Room 22.3 2022-01-02 17:46:40
|
||||
14 3 36.2 Kitchen 22.7 2022-01-02 18:46:40
|
||||
15 1 36.1 Living Room 22.3 2022-01-02 18:46:40
|
||||
16 7 36.0 Kitchen 22.4 2022-01-02 19:46:40
|
||||
17 4 36.0 Living Room 22.4 2022-01-02 19:46:40
|
||||
18 9 36.0 Kitchen 22.7 2022-01-02 20:46:40
|
||||
19 5 35.9 Living Room 22.6 2022-01-02 20:46:40
|
||||
20 18 36.9 Kitchen 23.3 2022-01-02 21:46:40
|
||||
21 9 36.2 Living Room 22.8 2022-01-02 21:46:40
|
||||
22 22 36.6 Kitchen 23.1 2022-01-02 22:46:40
|
||||
23 14 36.3 Living Room 22.5 2022-01-02 22:46:40
|
||||
24 26 36.5 Kitchen 22.7 2022-01-02 23:46:40
|
||||
25 17 36.4 Living Room 22.2 2022-01-02 23:46:40
|
||||
```
|
||||
{{% /expand %}}
|
||||
{{% /expand-wrapper %}}
|
||||
{{% /influxdb/custom-timestamps %}}
|
||||
|
||||
Next, [use pandas to analyze data](#use-pandas-to-analyze-data).
|
||||
|
||||
|
|
@ -118,25 +167,34 @@ Next, [use pandas to analyze data](#use-pandas-to-analyze-data).
|
|||
|
||||
### View data information and statistics
|
||||
|
||||
The following example uses the DataFrame `info()` and `describe()`
|
||||
methods to print information about the DataFrame.
|
||||
The following example shows how to use pandas `DataFrame` methods to transform and summarize data stored in {{% cloud-name %}}.
|
||||
|
||||
{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}}
|
||||
```py
|
||||
# pandas-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import pandas
|
||||
|
||||
client = FlightSQLClient(host='cluster-id.influxdb.io',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'database': 'INFLUX_DATABASE'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
# Instantiate an InfluxDB client configured for a database
|
||||
client = InfluxDBClient3(
|
||||
"https://cluster-id.influxdb.io",
|
||||
database="DATABASE_NAME",
|
||||
token="DATABASE_TOKEN")
|
||||
|
||||
info = client.execute("SELECT * FROM home")
|
||||
# Execute the query to retrieve all record batches in the stream
|
||||
# formatted as a PyArrow Table.
|
||||
table = client.query(
|
||||
'''SELECT *
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '90 days'
|
||||
ORDER BY time'''
|
||||
)
|
||||
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
client.close()
|
||||
|
||||
dataframe = reader.read_pandas()
|
||||
# Convert the PyArrow Table to a pandas DataFrame.
|
||||
dataframe = table.to_pandas()
|
||||
|
||||
# Print information about the results DataFrame,
|
||||
# including the index dtype and columns, non-null values, and memory usage.
|
||||
|
|
@ -144,8 +202,20 @@ dataframe.info()
|
|||
|
||||
# Calculate descriptive statistics that summarize the distribution of the results.
|
||||
print(dataframe.describe())
|
||||
```
|
||||
|
||||
# Extract a DataFrame column.
|
||||
print(dataframe['temp'])
|
||||
|
||||
# Print the DataFrame in Markdown format.
|
||||
print(dataframe.to_markdown())
|
||||
```
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following configuration values:
|
||||
|
||||
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-dedicated/admin/tokens/) with read permissions on the databases you want to query.
|
||||
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [database](/influxdb/cloud-dedicated/admin/databases/) to query.
|
||||
|
||||
### Downsample time series
|
||||
|
||||
The pandas library provides extensive features for working with time series data.
|
||||
|
|
@ -153,19 +223,9 @@ The pandas library provides extensive features for working with time series data
|
|||
The [`pandas.DataFrame.resample()` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) downsamples and upsamples data to time-based groups--for example:
|
||||
|
||||
```py
|
||||
from flightsql import FlightSQLClient
|
||||
import pandas
|
||||
# pandas-example.py
|
||||
|
||||
client = FlightSQLClient(host='cluster-id.influxdb.io',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'database': 'INFLUX_DATABASE'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
|
||||
info = client.execute("SELECT * FROM home")
|
||||
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
|
||||
dataframe = reader.read_pandas()
|
||||
...
|
||||
|
||||
# Use the `time` column to generate a DatetimeIndex for the DataFrame
|
||||
dataframe = dataframe.set_index('time')
|
||||
|
|
@ -180,24 +240,21 @@ resample = dataframe.resample("1H")
|
|||
print(resample['temp'].mean())
|
||||
```
|
||||
|
||||
{{% influxdb/custom-timestamps %}}
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View example results" %}}
|
||||
```sh
|
||||
time
|
||||
1970-01-01 00:00:00 22.374138
|
||||
1970-01-01 01:00:00 NaN
|
||||
1970-01-01 02:00:00 NaN
|
||||
1970-01-01 03:00:00 NaN
|
||||
1970-01-01 04:00:00 NaN
|
||||
...
|
||||
time
|
||||
2023-07-16 22:00:00 NaN
|
||||
2023-07-16 23:00:00 22.600000
|
||||
2023-07-17 00:00:00 22.513889
|
||||
2023-07-17 01:00:00 22.208333
|
||||
2023-07-17 02:00:00 22.300000
|
||||
...
|
||||
Freq: H, Name: temp, Length: 469323, dtype: float64
|
||||
```
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
{{% /influxdb/custom-timestamps %}}
|
||||
|
||||
For more detail and examples, see the [pandas documentation](https://pandas.pydata.org/docs/index.html).
|
||||
|
|
|
|||
|
|
@ -52,7 +52,7 @@ and conversion of Arrow format data.
|
|||
## Install prerequisites
|
||||
|
||||
The examples in this guide assume using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/cloud-dedicated/reference/client-libraries/v3/python/).
|
||||
For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/)
|
||||
For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/).
|
||||
|
||||
Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
|
||||
|
||||
|
|
@ -78,7 +78,7 @@ def querySQL():
|
|||
database="DATABASE_NAME",
|
||||
token="DATABASE_TOKEN")
|
||||
|
||||
# Execute the query to retrieve data formatted as a PyArrow Table
|
||||
# Execute the query to retrieve all record batches in the stream formatted as a PyArrow Table.
|
||||
table = client.query(
|
||||
'''SELECT *
|
||||
FROM home
|
||||
|
|
@ -131,7 +131,8 @@ def querySQL():
|
|||
database="DATABASE_NAME",
|
||||
token="DATABASE_TOKEN")
|
||||
|
||||
# Execute the query to retrieve data formatted as a PyArrow Table
|
||||
# Execute the query to retrieve data
|
||||
# formatted as a PyArrow Table
|
||||
table = client.query(
|
||||
'''SELECT *
|
||||
FROM home
|
||||
|
|
|
|||
|
|
@ -4,7 +4,7 @@ list_title: pandas
|
|||
seotitle: Use Python and pandas to analyze and visualize data
|
||||
description: >
|
||||
Use the [pandas](https://pandas.pydata.org/) Python data analysis library
|
||||
to analyze and visualize data stored in InfluxDB Cloud Serverless.
|
||||
to analyze and visualize time series data stored in InfluxDB Cloud Serverless.
|
||||
weight: 101
|
||||
menu:
|
||||
influxdb_cloud_serverless:
|
||||
|
|
@ -53,10 +53,10 @@ stored in an InfluxDB Cloud Serverless bucket.
|
|||
|
||||
## Install prerequisites
|
||||
|
||||
The examples in this guide assume using a Python virtual environment and the Flight SQL library for Python.
|
||||
Installing `flightsql-dbapi` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
|
||||
The examples in this guide assume using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/cloud-serverless/reference/client-libraries/v3/python/).
|
||||
For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-serverless/query-data/execute-queries/flight-sql/python/).
|
||||
|
||||
For more information, see how to [get started querying InfluxDB with Python and flightsql-dbapi](/influxdb/cloud-serverless/query-data/sql/execute-queries/python/)
|
||||
Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
|
||||
|
||||
## Install pandas
|
||||
|
||||
|
|
@ -70,35 +70,47 @@ pip install pandas
|
|||
|
||||
## Use PyArrow to convert query results to pandas
|
||||
|
||||
The following steps use Python, `flightsql-dbapi`, and `pyarrow` to query InfluxDB and stream Arrow data to a pandas `DataFrame`.
|
||||
The following steps use Python, `influxdb3-python`, and `pyarrow` to query InfluxDB and stream Arrow data to a pandas `DataFrame`.
|
||||
|
||||
1. In your editor, copy and paste the following code to a new file--for example, `pandas-example.py`:
|
||||
|
||||
```py
|
||||
# pandas-example.py
|
||||
{{% tabs-wrapper %}}
|
||||
{{% code-placeholders "BUCKET_NAME|API_TOKEN" %}}
|
||||
```py
|
||||
# pandas-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
import pandas
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import pandas
|
||||
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'database': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
# Instantiate an InfluxDB client configured for a bucket
|
||||
client = InfluxDBClient3(
|
||||
"https://cloud2.influxdata.com",
|
||||
database="BUCKET_NAME",
|
||||
token="API_TOKEN")
|
||||
|
||||
info = client.execute("SELECT * FROM home")
|
||||
# Execute the query to retrieve all record batches in the stream
|
||||
# formatted as a PyArrow Table.
|
||||
table = client.query(
|
||||
'''SELECT *
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '90 days'
|
||||
ORDER BY time'''
|
||||
)
|
||||
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
client.close()
|
||||
|
||||
# Read all record batches in the stream to a pandas DataFrame
|
||||
dataframe = reader.read_pandas()
|
||||
# Convert the PyArrow Table to a pandas DataFrame.
|
||||
dataframe = table.to_pandas()
|
||||
|
||||
dataframe.info()
|
||||
```
|
||||
print(dataframe)
|
||||
```
|
||||
{{% /code-placeholders %}}
|
||||
{{% /tabs-wrapper %}}
|
||||
|
||||
2. Replace the following configuration values:
|
||||
|
||||
- **`INFLUX_READ_WRITE_TOKEN`**: An InfluxDB token with _read_ permission to the bucket.
|
||||
- **`INFLUX_BUCKET`**: The name of the InfluxDB bucket to query.
|
||||
- {{% code-placeholder-key %}}`API_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-serverless/admin/tokens/) with read permissions on the buckets you want to query.
|
||||
- {{% code-placeholder-key %}}`BUCKET_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [bucket](/influxdb/cloud-serverless/admin/buckets/) to query.
|
||||
|
||||
3. In your terminal, use the Python interpreter to run the file:
|
||||
|
||||
|
|
@ -106,39 +118,85 @@ The following steps use Python, `flightsql-dbapi`, and `pyarrow` to query Influx
|
|||
python pandas-example.py
|
||||
```
|
||||
|
||||
The `pyarrow.flight.FlightStreamReader` [`read_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightStreamReader.html#pyarrow.flight.FlightStreamReader.read_pandas) method:
|
||||
The example calls the following methods:
|
||||
|
||||
- Takes the same options as [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas).
|
||||
- Reads all Arrow record batches in the stream to a `pyarrow.Table` and then converts the `Table` to a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame).
|
||||
- [`InfluxDBClient3.query()`](/influxdb/cloud-serverless/reference/client-libraries/v3/python/#influxdbclient3query): Sends the query request and returns a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) that contains all the Arrow record batches from the response stream.
|
||||
|
||||
- [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas): Creates a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) from the data in the PyArrow `Table`.
|
||||
|
||||
{{% influxdb/custom-timestamps %}}
|
||||
{{% expand-wrapper %}}
|
||||
{{% expand "View example results" %}}
|
||||
```sh
|
||||
co hum room temp time
|
||||
0 0 35.9 Living Room 21.1 2022-01-02 11:46:40
|
||||
1 0 35.9 Kitchen 21.0 2022-01-02 11:46:40
|
||||
2 0 36.2 Kitchen 23.0 2022-01-02 12:46:40
|
||||
3 0 35.9 Living Room 21.4 2022-01-02 12:46:40
|
||||
4 0 36.1 Kitchen 22.7 2022-01-02 13:46:40
|
||||
5 0 36.0 Living Room 21.8 2022-01-02 13:46:40
|
||||
6 0 36.0 Kitchen 22.4 2022-01-02 14:46:40
|
||||
7 0 36.0 Living Room 22.2 2022-01-02 14:46:40
|
||||
8 0 36.0 Kitchen 22.5 2022-01-02 15:46:40
|
||||
9 0 35.9 Living Room 22.2 2022-01-02 15:46:40
|
||||
10 1 36.5 Kitchen 22.8 2022-01-02 16:46:40
|
||||
11 0 36.0 Living Room 22.4 2022-01-02 16:46:40
|
||||
12 1 36.3 Kitchen 22.8 2022-01-02 17:46:40
|
||||
13 0 36.1 Living Room 22.3 2022-01-02 17:46:40
|
||||
14 3 36.2 Kitchen 22.7 2022-01-02 18:46:40
|
||||
15 1 36.1 Living Room 22.3 2022-01-02 18:46:40
|
||||
16 7 36.0 Kitchen 22.4 2022-01-02 19:46:40
|
||||
17 4 36.0 Living Room 22.4 2022-01-02 19:46:40
|
||||
18 9 36.0 Kitchen 22.7 2022-01-02 20:46:40
|
||||
19 5 35.9 Living Room 22.6 2022-01-02 20:46:40
|
||||
20 18 36.9 Kitchen 23.3 2022-01-02 21:46:40
|
||||
21 9 36.2 Living Room 22.8 2022-01-02 21:46:40
|
||||
22 22 36.6 Kitchen 23.1 2022-01-02 22:46:40
|
||||
23 14 36.3 Living Room 22.5 2022-01-02 22:46:40
|
||||
24 26 36.5 Kitchen 22.7 2022-01-02 23:46:40
|
||||
25 17 36.4 Living Room 22.2 2022-01-02 23:46:40
|
||||
```
|
||||
{{% /expand %}}
|
||||
{{% /expand-wrapper %}}
|
||||
{{% /influxdb/custom-timestamps %}}
|
||||
|
||||
Next, [use pandas to analyze data](#use-pandas-to-analyze-data).
|
||||
|
||||
## Use pandas to analyze data
|
||||
|
||||
- [View information and statistics for data](#view-information-and-statistics-for-data)
|
||||
- [View data information and statistics](#view-data-information-and-statistics)
|
||||
- [Downsample time series](#downsample-time-series)
|
||||
|
||||
### View data information and statistics
|
||||
|
||||
The following example uses the DataFrame `info()` and `describe()`
|
||||
methods to print information about the DataFrame.
|
||||
The following example shows how to use pandas `DataFrame` methods to transform and summarize data stored in {{% cloud-name %}}.
|
||||
|
||||
{{% code-placeholders "BUCKET_NAME|API_TOKEN" %}}
|
||||
```py
|
||||
# pandas-example.py
|
||||
|
||||
from flightsql import FlightSQLClient
|
||||
from influxdb_client_3 import InfluxDBClient3
|
||||
import pandas
|
||||
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'database': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
# Instantiate an InfluxDB client configured for a bucket
|
||||
client = InfluxDBClient3(
|
||||
"https://cloud2.influxdata.com",
|
||||
database="BUCKET_NAME",
|
||||
token="API_TOKEN")
|
||||
|
||||
info = client.execute("SELECT * FROM home")
|
||||
# Execute the query to retrieve all record batches in the stream
|
||||
# formatted as a PyArrow Table.
|
||||
table = client.query(
|
||||
'''SELECT *
|
||||
FROM home
|
||||
WHERE time >= now() - INTERVAL '90 days'
|
||||
ORDER BY time'''
|
||||
)
|
||||
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
client.close()
|
||||
|
||||
dataframe = reader.read_pandas()
|
||||
# Convert the PyArrow Table to a pandas DataFrame.
|
||||
dataframe = table.to_pandas()
|
||||
|
||||
# Print information about the results DataFrame,
|
||||
# including the index dtype and columns, non-null values, and memory usage.
|
||||
|
|
@ -146,8 +204,20 @@ dataframe.info()
|
|||
|
||||
# Calculate descriptive statistics that summarize the distribution of the results.
|
||||
print(dataframe.describe())
|
||||
```
|
||||
|
||||
# Extract a DataFrame column.
|
||||
print(dataframe['temp'])
|
||||
|
||||
# Print the DataFrame in Markdown format.
|
||||
print(dataframe.to_markdown())
|
||||
```
|
||||
{{% /code-placeholders %}}
|
||||
|
||||
Replace the following configuration values:
|
||||
|
||||
- {{% code-placeholder-key %}}`API_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-serverless/admin/tokens/) with read permissions on the buckets you want to query.
|
||||
- {{% code-placeholder-key %}}`BUCKET_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [bucket](/influxdb/cloud-serverless/admin/buckets/) to query.
|
||||
|
||||
### Downsample time series
|
||||
|
||||
The pandas library provides extensive features for working with time series data.
|
||||
|
|
@ -155,19 +225,9 @@ The pandas library provides extensive features for working with time series data
|
|||
The [`pandas.DataFrame.resample()` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) downsamples and upsamples data to time-based groups--for example:
|
||||
|
||||
```py
|
||||
from flightsql import FlightSQLClient
|
||||
import pandas
|
||||
# pandas-example.py
|
||||
|
||||
client = FlightSQLClient(host='cloud2.influxdata.com',
|
||||
token='INFLUX_READ_WRITE_TOKEN',
|
||||
metadata={'database': 'INFLUX_BUCKET'},
|
||||
features={'metadata-reflection': 'true'})
|
||||
|
||||
info = client.execute("SELECT * FROM home")
|
||||
|
||||
reader = client.do_get(info.endpoints[0].ticket)
|
||||
|
||||
dataframe = reader.read_pandas()
|
||||
...
|
||||
|
||||
# Use the `time` column to generate a DatetimeIndex for the DataFrame
|
||||
dataframe = dataframe.set_index('time')
|
||||
|
|
@ -182,24 +242,21 @@ resample = dataframe.resample("1H")
|
|||
print(resample['temp'].mean())
|
||||
```
|
||||
|
||||
{{% influxdb/custom-timestamps %}}
|
||||
{{< expand-wrapper >}}
|
||||
{{% expand "View example results" %}}
|
||||
```sh
|
||||
time
|
||||
1970-01-01 00:00:00 22.374138
|
||||
1970-01-01 01:00:00 NaN
|
||||
1970-01-01 02:00:00 NaN
|
||||
1970-01-01 03:00:00 NaN
|
||||
1970-01-01 04:00:00 NaN
|
||||
...
|
||||
time
|
||||
2023-07-16 22:00:00 NaN
|
||||
2023-07-16 23:00:00 22.600000
|
||||
2023-07-17 00:00:00 22.513889
|
||||
2023-07-17 01:00:00 22.208333
|
||||
2023-07-17 02:00:00 22.300000
|
||||
...
|
||||
Freq: H, Name: temp, Length: 469323, dtype: float64
|
||||
```
|
||||
{{% /expand %}}
|
||||
{{< /expand-wrapper >}}
|
||||
{{% /influxdb/custom-timestamps %}}
|
||||
|
||||
For more detail and examples, see the [pandas documentation](https://pandas.pydata.org/docs/index.html).
|
||||
|
|
|
|||
|
|
@ -53,7 +53,7 @@ and conversion of Arrow format data.
|
|||
## Install prerequisites
|
||||
|
||||
The examples in this guide assume using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/cloud-serverless/reference/client-libraries/v3/python/).
|
||||
For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/)
|
||||
For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-serverless/query-data/execute-queries/flight-sql/python/).
|
||||
|
||||
Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
|
||||
|
||||
|
|
@ -79,7 +79,7 @@ def querySQL():
|
|||
database="BUCKET_NAME",
|
||||
token="API_TOKEN")
|
||||
|
||||
# Execute the query to retrieve data formatted as a PyArrow Table
|
||||
# Execute the query to retrieve all record batches in the stream formatted as a PyArrow Table.
|
||||
table = client.query(
|
||||
'''SELECT *
|
||||
FROM home
|
||||
|
|
@ -132,7 +132,8 @@ def querySQL():
|
|||
database="BUCKET_NAME",
|
||||
token="API_TOKEN")
|
||||
|
||||
# Execute the query to retrieve data formatted as a PyArrow Table
|
||||
# Execute the query to retrieve data
|
||||
# formatted as a PyArrow Table
|
||||
table = client.query(
|
||||
'''SELECT *
|
||||
FROM home
|
||||
|
|
|
|||
Loading…
Reference in New Issue