docs-v2/content/influxdb3/clustered/process-data/tools/pyarrow.md

---
title: Use the PyArrow library to analyze data
list_title: PyArrow
description: >
  Use [PyArrow](https://arrow.apache.org/docs/python/) to read and analyze InfluxDB query results.
weight: 201
menu:
  influxdb3_clustered:
    parent: Use data analysis tools
    name: Use PyArrow
    identifier: analyze_with_pyarrow
influxdb3/clustered/tags: [analysis, arrow, pyarrow, python]
related:
  - /influxdb3/clustered/process-data/tools/pandas/
  - /influxdb3/clustered/query-data/sql/
  - /influxdb3/clustered/query-data/execute-queries/client-libraries/python/
aliases:
  - /influxdb3/clustered/visualize-data/pyarrow/
list_code_example: |
  ```py
  ...
  table = client.query(
          '''SELECT *
            FROM home
            WHERE time >= now() - INTERVAL '90 days'
            ORDER BY time'''
        )
  table.group_by('room').aggregate([('temp', 'mean')])
  ```
---

Use [PyArrow](https://arrow.apache.org/docs/python/) to read and analyze query
results from {{% product-name %}}.
The PyArrow library provides efficient computation, aggregation, serialization,
and conversion of Arrow format data.

> Apache Arrow is a development platform for in-memory analytics. It contains a set of technologies that enable
> big data systems to store, process and move data fast.
>
> The Arrow Python bindings (also named “PyArrow”) have first-class integration with NumPy, pandas, and built-in Python objects. They are based on the C++ implementation of Arrow.
> {{% caption %}}[PyArrow documentation](https://arrow.apache.org/docs/python/index.html){{% /caption %}}

<!-- TOC -->

- [Install prerequisites](#install-prerequisites)
- [Use PyArrow to read query results](#use-pyarrow-to-read-query-results)
- [Use PyArrow to analyze data](#use-pyarrow-to-analyze-data)
  - [Group and aggregate data](#group-and-aggregate-data)

<!-- /TOC -->

## Install prerequisites

The examples in this guide assume using a Python virtual environment and the InfluxDB 3 [`influxdb3-python` Python client library](/influxdb3/clustered/reference/client-libraries/v3/python/).
For more information, see how to [get started using Python to query InfluxDB](/influxdb3/clustered/query-data/execute-queries/client-libraries/python/).

Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.

## Use PyArrow to read query results

The following example shows how to use `influxdb3-python` and `pyarrow` to query InfluxDB and view Arrow data as a PyArrow `Table`.

1.  In your editor, copy and paste the following sample code to a new file--for example, `pyarrow-example.py`:

    {{% tabs-wrapper %}}
{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}}
```py
# pyarrow-example.py

from influxdb_client_3 import InfluxDBClient3
import pandas

def querySQL():

  # Instantiate an InfluxDB client configured for a database
  client = InfluxDBClient3(
    "https://{{< influxdb/host >}}",
    database="DATABASE_NAME",
    token="DATABASE_TOKEN")

  # Execute the query to retrieve all record batches in the stream formatted as a PyArrow Table.
  table = client.query(
    '''SELECT *
      FROM home
      WHERE time >= now() - INTERVAL '90 days'
      ORDER BY time'''
  )

  client.close()

print(querySQL())
```
{{% /code-placeholders %}}
  {{% /tabs-wrapper %}}

2.  Replace the following configuration values:

    - {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
      a [database token](/influxdb3/clustered/admin/tokens/#database-tokens)
      with read permissions on the databases you want to query
    - {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: the name of the [database](/influxdb3/clustered/admin/databases/) to query

3. In your terminal, use the Python interpreter to run the file:

    ```sh
    python pyarrow-example.py
    ```

The `InfluxDBClient3.query()` method sends the query request, and then returns a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) that contains all the Arrow record batches from the response stream.

Next, [use PyArrow to analyze data](#use-pyarrow-to-analyze-data).

## Use PyArrow to analyze data

### Group and aggregate data

With a `pyarrow.Table`, you can use values in a column as _keys_ for grouping.

The following example shows how to query InfluxDB, and then use PyArrow to group the table data and calculate an aggregate value for each group:

{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}}
```py
# pyarrow-example.py

from influxdb_client_3 import InfluxDBClient3
import pandas

def querySQL():

  # Instantiate an InfluxDB client configured for a database
  client = InfluxDBClient3(
    "https://{{< influxdb/host >}}",
    database="DATABASE_NAME",
    token="DATABASE_TOKEN")

  # Execute the query to retrieve data
  # formatted as a PyArrow Table
  table = client.query(
    '''SELECT *
      FROM home
      WHERE time >= now() - INTERVAL '90 days'
      ORDER BY time'''
  )

  client.close()

  return table

table = querySQL()

# Use PyArrow to aggregate data
print(table.group_by('room').aggregate([('temp', 'mean')]))
```
{{% /code-placeholders %}}

Replace the following:

- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
  a [database token](/influxdb3/clustered/admin/tokens/#database-tokens)
  with read permissions on the databases you want to query
- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}:
  the name of the [database](/influxdb3/clustered/admin/databases/) to query

{{< expand-wrapper >}}
{{% expand "View example results" %}}
```arrow
pyarrow.Table
temp_mean: double
room: string
----
temp_mean: [[22.581987577639747,22.10807453416151]]
room: [["Kitchen","Living Room"]]
```
{{% /expand %}}
{{< /expand-wrapper >}}

For more detail and examples, see the [PyArrow documentation](https://arrow.apache.org/docs/python/getstarted.html) and the [Apache Arrow Python Cookbook](https://arrow.apache.org/cookbook/py/data.html).