fix(v3): Replace FlightSQL examples with Python client library (part of #5025) (#5043)

pull/5046/head
Jason Stirnaman 2023-07-24 17:21:07 -05:00 committed by GitHub
parent e96d3313f1
commit 580fa03a35
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
4 changed files with 186 additions and 85 deletions

View File

@ -11,13 +11,20 @@ menu:
identifier: analyze_with_pyarrow
influxdb/cloud-dedicated/tags: [analysis, arrow, pyarrow, python]
related:
- /influxdb/cloud-dedicated/process-data/tools/pandas/
- /influxdb/cloud-dedicated/query-data/sql/
- /influxdb/cloud-dedicated/query-data/sql/execute-queries/python/
aliases:
- /influxdb/cloud-dedicated/visualize-data/pyarrow/
list_code_example: |
```py
...
table = reader.read_all()
table = client.query(
'''SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
)
table.group_by('room').aggregate([('temp', 'mean')])
```
---
@ -44,46 +51,52 @@ and conversion of Arrow format data.
## Install prerequisites
The examples in this guide assume using a Python virtual environment and the Flight SQL library for Python.
The examples in this guide assume using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/cloud-dedicated/reference/client-libraries/v3/python/).
For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/)
Installing `flightsql-dbapi` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
## Use PyArrow to read query results
The following example shows how to use Python with `flightsql-dbapi` and `pyarrow` to query InfluxDB and view Arrow data as a PyArrow `Table`.
The following example shows how to use `influxdb3-python` and `pyarrow` to query InfluxDB and view Arrow data as a PyArrow `Table`.
1. In your editor, copy and paste the following sample code to a new file--for example, `pyarrow-example.py`:
1. In your editor, copy and paste the following sample code to a new file--for example, `pyarrow-example.py`:
```py
# pyarrow-example.py
{{% tabs-wrapper %}}
{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}}
```py
# pyarrow-example.py
from flightsql import FlightSQLClient
from influxdb_client_3 import InfluxDBClient3
import pandas
# Instantiate a FlightSQLClient configured for a database
client = FlightSQLClient(host='cluster-id.influxdb.io',
token='INFLUX_READ_WRITE_TOKEN',
metadata={'database': 'INFLUX_DATABASE'},
features={'metadata-reflection': 'true'})
def querySQL():
# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
"https://cluster-id.influxdb.io",
database="DATABASE_NAME",
token="DATABASE_TOKEN")
# Execute the query to retrieve FlightInfo
info = client.execute('SELECT * FROM home')
# Execute the query to retrieve data formatted as a PyArrow Table
table = client.query(
'''SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
)
# Use the ticket to request the Arrow data stream.
# Return a FlightStreamReader for streaming the results.
reader = client.do_get(info.endpoints[0].ticket)
client.close()
# Read all data to a pyarrow.Table
table = reader.read_all()
print(querySQL())
```
{{% /code-placeholders %}}
{{% /tabs-wrapper %}}
print(table)
```
2. Replace the following configuration values:
- **`INFLUX_READ_WRITE_TOKEN`**: Your InfluxDB token with read permissions on the databases you want to query.
- **`INFLUX_DATABASE`**: The name of your InfluxDB database.
2. Replace the following configuration values:
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-dedicated/admin/tokens/) with read permissions on the databases you want to query.
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [database](/influxdb/cloud-dedicated/admin/databases/) to query.
3. In your terminal, use the Python interpreter to run the file:
@ -91,7 +104,7 @@ The following example shows how to use Python with `flightsql-dbapi` and `pyarro
python pyarrow-example.py
```
The `FlightStreamReader.read_all()` method reads all Arrow record batches in the stream as a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html).
The `InfluxDBClient3.query()` method sends the query request, and then returns a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) that contains all the Arrow record batches from the response stream.
Next, [use PyArrow to analyze data](#use-pyarrow-to-analyze-data).
@ -101,27 +114,46 @@ Next, [use PyArrow to analyze data](#use-pyarrow-to-analyze-data).
With a `pyarrow.Table`, you can use values in a column as _keys_ for grouping.
The following example shows how to query InfluxDB, group the table data, and then calculate an aggregate value for each group:
The following example shows how to query InfluxDB, and then use PyArrow to group the table data and calculate an aggregate value for each group:
{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}}
```py
# pyarrow-example.py
from flightsql import FlightSQLClient
from influxdb_client_3 import InfluxDBClient3
import pandas
client = FlightSQLClient(host='cluster-id.influxdb.io',
token='INFLUX_READ_WRITE_TOKEN',
metadata={'database': 'INFLUX_DATABASE'},
features={'metadata-reflection': 'true'})
def querySQL():
# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
"https://cluster-id.influxdb.io",
database="DATABASE_NAME",
token="DATABASE_TOKEN")
info = client.execute('SELECT * FROM home')
# Execute the query to retrieve data formatted as a PyArrow Table
table = client.query(
'''SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
)
reader = client.do_get(info.endpoints[0].ticket)
client.close()
table = reader.read_all()
return table
table = querySQL()
# Use PyArrow to aggregate data
print(table.group_by('room').aggregate([('temp', 'mean')]))
```
{{% /code-placeholders %}}
Replace the following:
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-dedicated/admin/tokens/) with read permissions on the databases you want to query.
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [database](/influxdb/cloud-dedicated/admin/tokens/) to query.
{{< expand-wrapper >}}
{{% expand "View example results" %}}

View File

@ -12,6 +12,27 @@ influxdb/cloud-dedicated/tags: [python, gRPC, SQL, Flight SQL, client libraries]
weight: 201
aliases:
- /influxdb/cloud-dedicated/reference/client-libraries/v3/pyinflux3/
list_code_example: >
```py
from influxdb_client_3 import InfluxDBClient3
# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
"https://us-east-1-1.aws.cloud2.influxdata.com",
database="DATABASE_NAME",
token="DATABASE_TOKEN")
# Execute the query and retrieve data formatted as a PyArrow Table
table = client.query(
'''SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
)
```
---
The InfluxDB v3 [`influxdb3-python` Python client library](https://github.com/InfluxCommunity/influxdb3-python)

View File

@ -11,22 +11,29 @@ menu:
name: Use PyArrow
identifier: analyze-with-pyarrow
influxdb/cloud-serverless/tags: [analysis, arrow, pyarrow, python]
aliases:
- /influxdb/cloud-serverless/visualize-data/pyarrow/
related:
- /influxdb/cloud-serverless/process-data/tools/pandas/
- /influxdb/cloud-serverless/query-data/sql/
- /influxdb/cloud-serverless/query-data/sql/execute-queries/python/
aliases:
- /influxdb/cloud-serverless/visualize-data/pyarrow/
list_code_example: |
```py
...
table = reader.read_all()
table = client.query(
'''SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
)
table.group_by('room').aggregate([('temp', 'mean')])
```
---
Use [PyArrow](https://arrow.apache.org/docs/python/) to read and analyze query results
from a InfluxDB Cloud Serverless.
The PyArrow library provides efficient computation, aggregation, serialization, and conversion of Arrow format data.
Use [PyArrow](https://arrow.apache.org/docs/python/) to read and analyze query
results from {{% cloud-name %}}.
The PyArrow library provides efficient computation, aggregation, serialization,
and conversion of Arrow format data.
> Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable
> big data systems to store, process and move data fast.
@ -45,46 +52,52 @@ The PyArrow library provides efficient computation, aggregation, serialization,
## Install prerequisites
The examples in this guide assume using a Python virtual environment and the Flight SQL library for Python.
For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-serverless/query-data/sql/execute-queries/python/)
The examples in this guide assume using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/cloud-serverless/reference/client-libraries/v3/python/).
For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/)
Installing `flightsql-dbapi` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.
## Use PyArrow to read query results
The following example shows how to use Python with `flightsql-dbapi` and `pyarrow` to query InfluxDB and view Arrow data as a PyArrow `Table`.
The following example shows how to use `influxdb3-python` and `pyarrow` to query InfluxDB and view Arrow data as a PyArrow `Table`.
1. In your editor, copy and paste the following sample code to a new file--for example, `pyarrow-example.py`:
1. In your editor, copy and paste the following sample code to a new file--for example, `pyarrow-example.py`:
```py
# pyarrow-example.py
{{% tabs-wrapper %}}
{{% code-placeholders "BUCKET_NAME|API_TOKEN" %}}
```py
# pyarrow-example.py
from flightsql import FlightSQLClient
from influxdb_client_3 import InfluxDBClient3
import pandas
# Instantiate a FlightSQLClient configured for a bucket
client = FlightSQLClient(host='cloud2.influxdata.com',
token='INFLUX_READ_WRITE_TOKEN',
metadata={'database': 'BUCKET_NAME'},
features={'metadata-reflection': 'true'})
def querySQL():
# Instantiate an InfluxDB client configured for a bucket
client = InfluxDBClient3(
"https://cloud2.influxdata.com",
database="BUCKET_NAME",
token="API_TOKEN")
# Execute the query to retrieve FlightInfo
info = client.execute('SELECT * FROM home')
# Execute the query to retrieve data formatted as a PyArrow Table
table = client.query(
'''SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
)
# Use the ticket to request the Arrow data stream.
# Return a FlightStreamReader for streaming the results.
reader = client.do_get(info.endpoints[0].ticket)
client.close()
# Read all data to a pyarrow.Table
table = reader.read_all()
print(querySQL())
```
{{% /code-placeholders %}}
{{% /tabs-wrapper %}}
print(table)
```
2. Replace the following configuration values:
- **`INFLUX_READ_WRITE_TOKEN`**: An InfluxDB token with _read_ permission to the bucket.
- **`BUCKET_NAME`**: The name of the InfluxDB bucket to query.
2. Replace the following configuration values:
- {{% code-placeholder-key %}}`API_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-serverless/admin/tokens/) with read permissions on the buckets you want to query.
- {{% code-placeholder-key %}}`BUCKET_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [bucket](/influxdb/cloud-serverless/admin/buckets/) to query.
3. In your terminal, use the Python interpreter to run the file:
@ -92,7 +105,7 @@ The following example shows how to use Python with `flightsql-dbapi` and `pyarro
python pyarrow-example.py
```
The `FlightStreamReader.read_all()` method reads all Arrow record batches in the stream as a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html).
The `InfluxDBClient3.query()` method sends the query request, and then returns a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) that contains all the Arrow record batches from the response stream.
Next, [use PyArrow to analyze data](#use-pyarrow-to-analyze-data).
@ -102,27 +115,46 @@ Next, [use PyArrow to analyze data](#use-pyarrow-to-analyze-data).
With a `pyarrow.Table`, you can use values in a column as _keys_ for grouping.
The following example shows how to query InfluxDB, group the table data, and then calculate an aggregate value for each group:
The following example shows how to query InfluxDB, and then use PyArrow to group the table data and calculate an aggregate value for each group:
{{% code-placeholders "BUCKET_NAME|API_TOKEN" %}}
```py
# pyarrow-example.py
from flightsql import FlightSQLClient
from influxdb_client_3 import InfluxDBClient3
import pandas
client = FlightSQLClient(host='cloud2.influxdata.com',
token='INFLUX_READ_WRITE_TOKEN',
metadata={'database': 'BUCKET_NAME'},
features={'metadata-reflection': 'true'})
def querySQL():
# Instantiate an InfluxDB client configured for a bucket
client = InfluxDBClient3(
"https://cloud2.influxdata.com",
database="BUCKET_NAME",
token="API_TOKEN")
info = client.execute('SELECT * FROM home')
# Execute the query to retrieve data formatted as a PyArrow Table
table = client.query(
'''SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
)
reader = client.do_get(info.endpoints[0].ticket)
client.close()
table = reader.read_all()
return table
table = querySQL()
# Use PyArrow to aggregate data
print(table.group_by('room').aggregate([('temp', 'mean')]))
```
{{% /code-placeholders %}}
Replace the following:
- {{% code-placeholder-key %}}`API_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-serverless/admin/tokens/) with read permissions on the buckets you want to query.
- {{% code-placeholder-key %}}`BUCKET_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [bucket](/influxdb/cloud-serverless/admin/tokens/) to query.
{{< expand-wrapper >}}
{{% expand "View example results" %}}
@ -137,9 +169,4 @@ room: [["Kitchen","Living Room"]]
{{% /expand %}}
{{< /expand-wrapper >}}
Replace the following:
- **`INFLUX_READ_WRITE_TOKEN`**: An InfluxDB token with _read_ permission to the bucket.
- **`BUCKET_NAME`**: The name of the InfluxDB bucket to query.
For more detail and examples, see the [PyArrow documentation](https://arrow.apache.org/docs/python/getstarted.html) and the [Apache Arrow Python Cookbook](https://arrow.apache.org/cookbook/py/data.html).

View File

@ -12,6 +12,27 @@ weight: 201
influxdb/cloud-serverless/tags: [python, gRPC, SQL, Flight SQL, client libraries]
aliases:
- /influxdb/cloud-serverless/reference/client-libraries/v3/pyinflux3/
list_code_example: >
```py
from influxdb_client_3 import InfluxDBClient3
# Instantiate an InfluxDB client configured for a bucket
client = InfluxDBClient3(
"https://cloud2.influxdata.com",
database="BUCKET_NAME",
token="API_TOKEN")
# Execute the query and retrieve data formatted as a PyArrow Table
table = client.query(
'''SELECT *
FROM home
WHERE time >= now() - INTERVAL '90 days'
ORDER BY time'''
)
```
---
The InfluxDB v3 [`influxdb3-python` Python client library](https://github.com/InfluxCommunity/influxdb3-python)