---
title: Use pandas to analyze data
list_title: pandas
seotitle: Use Python and pandas to analyze and visualize data
description: >
  Use the [pandas](https://pandas.pydata.org/) Python data analysis library
  to analyze and visualize time series data stored in InfluxDB Clustered.
weight: 101
menu:
  influxdb_clustered:
    parent: Use data analysis tools
    name: Use pandas
    identifier: analyze-with-pandas
influxdb/clustered/tags: [analysis, pandas, pyarrow, python]
aliases:
  - /influxdb/clustered/visualize-data/pandas/
related:
  - /influxdb/clustered/query-data/execute-queries/client-libraries/python/
list_code_example: |
    ```py
    ...
    dataframe = reader.read_pandas()
    dataframe = dataframe.set_index('time')

    print(dataframe.index)

    resample = dataframe.resample("1h")

    resample['temp'].mean()
    ```
---

Use [pandas](https://pandas.pydata.org/), the Python data analysis library, to process, analyze, and visualize data
stored in an {{% product-name %}} database.

> **pandas** is an open source, BSD-licensed library providing high-performance,
> easy-to-use data structures and data analysis tools for the Python programming language.
>
> {{% caption %}}[pandas documentation](https://pandas.pydata.org/docs/){{% /caption %}}

<!-- TOC -->

- [Install prerequisites](#install-prerequisites)
- [Install pandas](#install-pandas)
- [Use PyArrow to convert query results to pandas](#use-pyarrow-to-convert-query-results-to-pandas)
- [Use pandas to analyze data](#use-pandas-to-analyze-data)
  - [View data information and statistics](#view-data-information-and-statistics)
  - [Downsample time series](#downsample-time-series)

<!-- /TOC -->

## Install prerequisites

The examples in this guide assume you're using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/clustered/reference/client-libraries/v3/python/).
For more information, see how to [get started using Python to query InfluxDB](/influxdb/clustered/query-data/execute-queries/client-libraries/python/).

Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow.

## Install pandas

To use pandas, you need to install and import the `pandas` library.

In your terminal, use `pip` to install `pandas` in your active [Python virtual environment](/influxdb/clustered/query-data/execute-queries/client-libraries/python/#create-a-project-virtual-environment):

```sh
pip install pandas
```
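
To confirm that the libraries this guide relies on are importable in your active virtual environment, you can run a small check like the following. It's a minimal sketch: the `check_install.py` file name and the `installed_versions()` helper are hypothetical and aren't part of pandas or `influxdb3-python`. It checks `influxdb_client_3`, the module name the examples in this guide import.

```python
# check_install.py (hypothetical helper script)
import importlib


def installed_versions(packages):
    """Map each module name to its version string, or None if it can't be imported."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Not every package exposes __version__, so fall back to "unknown"
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(
        ["pandas", "pyarrow", "influxdb_client_3"]
    ).items():
        print(f"{name}: {version or 'not installed'}")
```

If a module reports `not installed`, activate the correct virtual environment and install the missing package there.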

## Use PyArrow to convert query results to pandas

The following steps use Python, `influxdb3-python`, and `pyarrow` to query InfluxDB and convert the Arrow query results to a pandas `DataFrame`.

1.  In your editor, copy and paste the following code to a new file--for example, `pandas-example.py`:

    {{% tabs-wrapper %}}
{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}}
```py
# pandas-example.py

from influxdb_client_3 import InfluxDBClient3
import pandas

# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
  "https://{{< influxdb/host >}}",
  database="DATABASE_NAME",
  token="DATABASE_TOKEN")

# Execute the query to retrieve all record batches in the stream
# formatted as a PyArrow Table.
table = client.query(
  '''SELECT *
    FROM home
    WHERE time >= now() - INTERVAL '90 days'
    ORDER BY time'''
)

client.close()

# Convert the PyArrow Table to a pandas DataFrame.
dataframe = table.to_pandas()

print(dataframe)
```
{{% /code-placeholders %}}
    {{% /tabs-wrapper %}}

2.  Replace the following configuration values:

    - {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: the name of the [database](/influxdb/clustered/admin/databases/) to query
    - {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
      a [database token](/influxdb/clustered/admin/tokens/#database-tokens)
      with _read_ permission on the specified database

3.  In your terminal, use the Python interpreter to run the file:

    ```sh
    python pandas-example.py
    ```

The example calls the following methods:

- [`InfluxDBClient3.query()`](/influxdb/clustered/reference/client-libraries/v3/python/#influxdbclient3query): sends the query request and returns a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) that contains all the Arrow record batches from the response stream.

- [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas): creates a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) from the data in the PyArrow `Table`.

{{% influxdb/custom-timestamps %}}
{{% expand-wrapper %}}
{{% expand "View example results" %}}
```sh
    co   hum         room  temp                time
0    0  35.9  Living Room  21.1 2022-01-02 11:46:40
1    0  35.9      Kitchen  21.0 2022-01-02 11:46:40
2    0  36.2      Kitchen  23.0 2022-01-02 12:46:40
3    0  35.9  Living Room  21.4 2022-01-02 12:46:40
4    0  36.1      Kitchen  22.7 2022-01-02 13:46:40
5    0  36.0  Living Room  21.8 2022-01-02 13:46:40
6    0  36.0      Kitchen  22.4 2022-01-02 14:46:40
7    0  36.0  Living Room  22.2 2022-01-02 14:46:40
8    0  36.0      Kitchen  22.5 2022-01-02 15:46:40
9    0  35.9  Living Room  22.2 2022-01-02 15:46:40
10   1  36.5      Kitchen  22.8 2022-01-02 16:46:40
11   0  36.0  Living Room  22.4 2022-01-02 16:46:40
12   1  36.3      Kitchen  22.8 2022-01-02 17:46:40
13   0  36.1  Living Room  22.3 2022-01-02 17:46:40
14   3  36.2      Kitchen  22.7 2022-01-02 18:46:40
15   1  36.1  Living Room  22.3 2022-01-02 18:46:40
16   7  36.0      Kitchen  22.4 2022-01-02 19:46:40
17   4  36.0  Living Room  22.4 2022-01-02 19:46:40
18   9  36.0      Kitchen  22.7 2022-01-02 20:46:40
19   5  35.9  Living Room  22.6 2022-01-02 20:46:40
20  18  36.9      Kitchen  23.3 2022-01-02 21:46:40
21   9  36.2  Living Room  22.8 2022-01-02 21:46:40
22  22  36.6      Kitchen  23.1 2022-01-02 22:46:40
23  14  36.3  Living Room  22.5 2022-01-02 22:46:40
24  26  36.5      Kitchen  22.7 2022-01-02 23:46:40
25  17  36.4  Living Room  22.2 2022-01-02 23:46:40
```
{{% /expand %}}
{{% /expand-wrapper %}}
{{% /influxdb/custom-timestamps %}}

Next, [use pandas to analyze data](#use-pandas-to-analyze-data).

## Use pandas to analyze data

- [View data information and statistics](#view-data-information-and-statistics)
- [Downsample time series](#downsample-time-series)

### View data information and statistics

The following example shows how to use pandas `DataFrame` methods to transform and summarize data stored in {{% product-name %}}.

{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}}
```py
# pandas-example.py

from influxdb_client_3 import InfluxDBClient3
import pandas

# Instantiate an InfluxDB client configured for a database
client = InfluxDBClient3(
  "https://{{< influxdb/host >}}",
  database="DATABASE_NAME",
  token="DATABASE_TOKEN")

# Execute the query to retrieve all record batches in the stream
# formatted as a PyArrow Table.
table = client.query(
  '''SELECT *
    FROM home
    WHERE time >= now() - INTERVAL '90 days'
    ORDER BY time'''
)

client.close()

# Convert the PyArrow Table to a pandas DataFrame.
dataframe = table.to_pandas()

# Print information about the results DataFrame,
# including the index dtype and columns, non-null values, and memory usage.
dataframe.info()

# Calculate descriptive statistics that summarize the distribution of the results.
print(dataframe.describe())

# Extract a DataFrame column.
print(dataframe['temp'])

# Print the DataFrame in Markdown format.
# to_markdown() requires the optional `tabulate` package (pip install tabulate).
print(dataframe.to_markdown())
```
{{% /code-placeholders %}}

Replace the following configuration values:

- {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: the name of the InfluxDB [database](/influxdb/clustered/admin/databases/) to query
- {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}:
  a [database token](/influxdb/clustered/admin/tokens/#database-tokens)
  with read permission on the specified database
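
To experiment with these methods without a database connection, you can build a small `DataFrame` by hand. The following sketch uses the first six rows of the earlier example results as hypothetical input, then pairs `describe()` with `groupby()` to summarize temperature per room instead of across all rooms.

```python
import pandas as pd

# Hypothetical stand-in for `table.to_pandas()`: the first six example rows
dataframe = pd.DataFrame(
    {
        "co": [0, 0, 0, 0, 0, 0],
        "hum": [35.9, 35.9, 36.2, 35.9, 36.1, 36.0],
        "room": ["Living Room", "Kitchen", "Kitchen",
                 "Living Room", "Kitchen", "Living Room"],
        "temp": [21.1, 21.0, 23.0, 21.4, 22.7, 21.8],
        "time": pd.to_datetime(
            ["2022-01-02 11:46:40", "2022-01-02 11:46:40",
             "2022-01-02 12:46:40", "2022-01-02 12:46:40",
             "2022-01-02 13:46:40", "2022-01-02 13:46:40"]
        ),
    }
)

# Statistics across all rows: count, mean, std, min, quartiles, and max
print(dataframe[["hum", "temp"]].describe())

# Temperature statistics for each room
per_room = dataframe.groupby("room")["temp"].agg(["count", "mean", "min", "max"])
print(per_room)
```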
  
### Downsample time series

The pandas library provides extensive features for working with time series data.

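The [`pandas.DataFrame.resample()` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) groups rows into fixed time intervals so you can downsample or upsample the data.
It requires a datetime-like index, so the following example first sets the `time` column as the index: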

```py
# pandas-example.py

...

# Use the `time` column to generate a DatetimeIndex for the DataFrame
dataframe = dataframe.set_index('time')

# Print information about the index
print(dataframe.index)

# Downsample data into 1-hour groups based on the DatetimeIndex.
# "1h" uses the lowercase alias; pandas 2.2 and later deprecate "1H".
resample = dataframe.resample("1h")

# Print a summary that shows the start time and average temp for each group
print(resample['temp'].mean())
```

{{% influxdb/custom-timestamps %}}
{{< expand-wrapper >}}
{{% expand "View example results" %}}
```sh
time
2023-07-16 22:00:00          NaN
2023-07-16 23:00:00    22.600000
2023-07-17 00:00:00    22.513889
2023-07-17 01:00:00    22.208333
2023-07-17 02:00:00    22.300000
...
Freq: H, Name: temp, Length: 469323, dtype: float64
```
{{% /expand %}}
{{< /expand-wrapper >}}
{{% /influxdb/custom-timestamps %}}
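
The example above averages `temp` across every room in each hour. The following sketch, which again uses the first six example rows as hypothetical input, shows two variations: downsampling per room by combining `groupby()` with `resample()`, and upsampling one room's readings onto a finer 30-minute grid with `ffill()`. Passing `origin="start"` aligns the new bins with the first timestamp instead of midnight; the interval choices are illustrative only.

```python
import pandas as pd

# Hypothetical input: the first six rows of the example query results
dataframe = pd.DataFrame(
    {
        "room": ["Living Room", "Kitchen", "Kitchen",
                 "Living Room", "Kitchen", "Living Room"],
        "temp": [21.1, 21.0, 23.0, 21.4, 22.7, 21.8],
        "time": pd.to_datetime(
            ["2022-01-02 11:46:40", "2022-01-02 11:46:40",
             "2022-01-02 12:46:40", "2022-01-02 12:46:40",
             "2022-01-02 13:46:40", "2022-01-02 13:46:40"]
        ),
    }
).set_index("time")

# Downsample: mean temperature per room in 2-hour windows.
# Bins start at midnight by default, so these windows begin at 10:00 and 12:00.
downsampled = dataframe.groupby("room")["temp"].resample("2h").mean()
print(downsampled)

# Upsample: place the Kitchen readings on a 30-minute grid that starts at the
# first reading, carrying the most recent value forward into each new slot
kitchen = dataframe.loc[dataframe["room"] == "Kitchen", "temp"]
upsampled = kitchen.resample("30min", origin="start").ffill()
print(upsampled)
```

Swap in a different aggregation (such as `max()` or `count()`) or fill method (such as `bfill()` or `interpolate()`) to suit your data.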

For more detail and examples, see the [pandas documentation](https://pandas.pydata.org/docs/index.html).