97 lines
4.0 KiB
Markdown
97 lines
4.0 KiB
Markdown
---
|
|
title: Summarize query results and data distribution
|
|
description: >
|
|
Query data stored in InfluxDB and use tools like pandas to summarize the results schema and distribution.
|
|
menu:
|
|
influxdb3_clustered:
|
|
name: Summarize data
|
|
parent: Process & visualize data
|
|
weight: 101
|
|
influxdb3/clustered/tags: [analysis, pandas, pyarrow, python, schema]
|
|
related:
|
|
- /influxdb3/clustered/query-data/execute-queries/client-libraries/python/
|
|
---
|
|
|
|
Query data stored in InfluxDB and use tools like pandas to summarize the results schema and distribution.
|
|
|
|
> [!Note]
|
|
> #### Sample data
|
|
>
|
|
> The following examples use the sample data written in the
|
|
> [Get started writing data guide](/influxdb3/clustered/get-started/write/).
|
|
> To run the example queries and return results,
|
|
> [write the sample data](/influxdb3/clustered/get-started/write/#write-line-protocol-to-influxdb)
|
|
> to your {{% product-name %}} database before running the example queries.
|
|
|
|
### View data information and statistics
|
|
|
|
#### Using Python and pandas
|
|
|
|
The following example uses the [InfluxDB client library for Python](/influxdb3/clustered/reference/client-libraries/v3/python/) to query an {{% product-name %}} database,
|
|
and then uses pandas [`DataFrame.info()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html) and [`DataFrame.describe()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html) methods to summarize the schema and distribution of the data.
|
|
|
|
1. In your editor, create a file (for example, `pandas-example.py`) and enter the following sample code:
|
|
<!-- tabs-wrapper allows code-placeholders to work when indented -->
|
|
{{% tabs-wrapper %}}
|
|
{{% code-placeholders "DATABASE_TOKEN|DATABASE_NAME" %}}
|
|
```py
|
|
# pandas-example.py
|
|
|
|
import influxdb_client_3 as InfluxDBClient3
|
|
import pandas
|
|
|
|
client = InfluxDBClient3.InfluxDBClient3(token='DATABASE_TOKEN',
|
|
host='{{< influxdb/host >}}',
|
|
database='DATABASE_NAME',
|
|
org="",
|
|
write_options=SYNCHRONOUS)
|
|
|
|
table = client.query("select * from home where room like '%'")
|
|
dataframe = table.to_pandas()
|
|
|
|
# Print information about the results DataFrame,
|
|
# including the index dtype and columns, non-null values, and memory usage.
|
|
dataframe.info()
|
|
|
|
# Calculate descriptive statistics that summarize the distribution of the results.
|
|
print(dataframe.describe())
|
|
```
|
|
{{% /code-placeholders %}}
|
|
{{% /tabs-wrapper %}}
|
|
|
|
2. Enter the following command in your terminal to execute the file using the Python interpreter:
|
|
|
|
```sh
|
|
python pandas-example.py
|
|
```
|
|
|
|
The output is similar to the following:
|
|
|
|
```sh
|
|
<class 'pandas.core.frame.DataFrame'>
|
|
RangeIndex: 411 entries, 0 to 410
|
|
Data columns (total 8 columns):
|
|
# Column Non-Null Count Dtype
|
|
--- ------ -------------- -----
|
|
0 co 405 non-null float64
|
|
1 host 2 non-null object
|
|
2 hum 406 non-null float64
|
|
3 room 411 non-null object
|
|
4 sensor 1 non-null object
|
|
5 sensor_id 2 non-null object
|
|
6 temp 411 non-null float64
|
|
7 time 411 non-null datetime64[ns]
|
|
dtypes: datetime64[ns](1), float64(3), object(4)
|
|
memory usage: 25.8+ KB
|
|
|
|
co hum temp time
|
|
count 405.000000 406.000000 411.000000 411
|
|
mean 5.320988 35.860591 23.803893 2008-06-12 13:33:49.074302208
|
|
min 0.000000 20.200000 18.400000 1970-01-01 00:00:01.641024
|
|
25% 0.000000 35.900000 22.200000 1970-01-01 00:00:01.685054600
|
|
50% 1.000000 36.000000 22.500000 2023-03-21 05:46:40
|
|
75% 9.000000 36.300000 22.800000 2023-07-15 21:34:10
|
|
max 26.000000 80.000000 74.000000 2023-07-17 02:07:00
|
|
std 7.640154 3.318794 8.408807 NaN
|
|
```
|