107 lines
3.6 KiB
Markdown
107 lines
3.6 KiB
Markdown
[Apache Arrow Python bindings](https://arrow.apache.org/docs/python/index.html) integrate with Python scripts and applications to query data stored in InfluxDB.
|
|
|
|
> [!Note]
|
|
> #### Use InfluxDB 3 client libraries
|
|
>
|
|
> We recommend using the [`influxdb3-python` Python client library](/influxdb3/version/reference/client-libraries/v3/python/) for integrating InfluxDB 3 with your Python application code.
|
|
>
|
|
> [InfluxDB 3 client libraries](/influxdb3/version/reference/client-libraries/v3/) wrap Apache Arrow Flight clients
|
|
> and provide convenient methods for [writing](/influxdb3/version/write-data/api-client-libraries/), [querying](/influxdb3/version/query-data/execute-queries/), and processing data stored in {{% product-name %}}.
|
|
> Client libraries can query using SQL or InfluxQL.
|
|
|
|
The following examples show how to use the `pyarrow.flight` and `pandas` Python modules to query and format data stored in an {{% product-name %}} database:
|
|
|
|
{{< code-tabs-wrapper >}}
|
|
{{% code-tabs %}}
|
|
[SQL](#sql-python)
|
|
[InfluxQL](#influxql-python)
|
|
{{% /code-tabs %}}
|
|
{{% code-tab-content %}}
|
|
<!-- BEGIN SQL -->
|
|
{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}}
|
|
```python
|
|
# Using pyarrow>=12.0.0 FlightClient
|
|
from pyarrow.flight import FlightClient, Ticket, FlightCallOptions
|
|
import json
|
|
import pandas
|
|
import tabulate
|
|
|
|
# Downsampling query groups data into 2-hour bins
|
|
sql="""
|
|
SELECT DATE_BIN(INTERVAL '2 hours', time) AS time,
|
|
room,
|
|
selector_max(temp, time)['value'] AS 'max temp',
|
|
selector_min(temp, time)['value'] AS 'min temp',
|
|
avg(temp) AS 'average temp'
|
|
FROM home
|
|
GROUP BY
|
|
1,
|
|
room
|
|
ORDER BY room, 1"""
|
|
|
|
flight_ticket = Ticket(json.dumps({
|
|
"namespace_name": "DATABASE_NAME",
|
|
"sql_query": sql,
|
|
"query_type": "sql"
|
|
}))
|
|
|
|
token = (b"authorization", bytes(f"Bearer DATABASE_TOKEN".encode('utf-8')))
|
|
options = FlightCallOptions(headers=[token])
|
|
client = FlightClient(f"grpc+tls://{{< influxdb/host >}}:443")
|
|
|
|
reader = client.do_get(flight_ticket, options)
|
|
arrow_table = reader.read_all()
|
|
# Use pyarrow and pandas to view and analyze data
|
|
data_frame = arrow_table.to_pandas()
|
|
print(data_frame.to_markdown())
|
|
```
|
|
{{% /code-placeholders %}}
|
|
<!-- END SQL -->
|
|
{{% /code-tab-content %}}
|
|
{{% code-tab-content %}}
|
|
<!-- BEGIN INFLUXQL -->
|
|
{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}}
|
|
```python
|
|
# Using pyarrow>=12.0.0 FlightClient
|
|
from pyarrow.flight import FlightClient, Ticket, FlightCallOptions
|
|
import json
|
|
import pandas
|
|
import tabulate
|
|
|
|
# Downsampling query groups data into 2-hour bins
|
|
influxql="""
|
|
SELECT FIRST(temp)
|
|
FROM home
|
|
WHERE room = 'kitchen'
|
|
AND time >= now() - 100d
|
|
AND time <= now() - 10d
|
|
GROUP BY time(2h)"""
|
|
|
|
flight_ticket = Ticket(json.dumps({
|
|
"namespace_name": "DATABASE_NAME",
|
|
"sql_query": influxql,
|
|
"query_type": "influxql"
|
|
}))
|
|
|
|
token = (b"authorization", bytes(f"Bearer DATABASE_TOKEN".encode('utf-8')))
|
|
options = FlightCallOptions(headers=[token])
|
|
client = FlightClient(f"grpc+tls://{{< influxdb/host >}}:443")
|
|
|
|
reader = client.do_get(flight_ticket, options)
|
|
arrow_table = reader.read_all()
|
|
# Use pyarrow and pandas to view and analyze data
|
|
data_frame = arrow_table.to_pandas()
|
|
print(data_frame.to_markdown())
|
|
```
|
|
{{% /code-placeholders %}}
|
|
<!-- END INFLUXQL -->
|
|
{{% /code-tab-content %}}
|
|
{{< /code-tabs-wrapper >}}
|
|
|
|
Replace the following:
|
|
|
|
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: your {{% product-name %}} database
|
|
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
|
|
a [database token](/influxdb3/version/admin/tokens/database/)
|
|
with sufficient permissions to the specified database
|