---
title: Summarize query results and data distribution
description: >
  Query data stored in InfluxDB and use tools like pandas to summarize the results schema and distribution.
menu:
  influxdb_cloud_dedicated:
    name: Summarize data
    parent: Process & visualize data
weight: 101
influxdb/cloud-dedicated/tags: [analysis, pandas, pyarrow, python, schema]
related:
  - /influxdb/cloud-dedicated/query-data/execute-queries/client-libraries/python/
---

Query data stored in InfluxDB and use tools like pandas to summarize the schema and distribution of the results.

{{% note %}}
#### Sample data

The following examples use the sample data written in the
[Get started writing data guide](/influxdb/cloud-dedicated/get-started/write/).
To run the example queries and return results,
[write the sample data](/influxdb/cloud-dedicated/get-started/write/#write-line-protocol-to-influxdb)
to your {{% product-name %}} database first.
{{% /note %}}

### View data information and statistics

#### Using Python and pandas

The following example uses the
[InfluxDB client library for Python](/influxdb/cloud-dedicated/reference/client-libraries/v3/python/)
to query an {{% product-name %}} database, and then uses the pandas
[`DataFrame.info()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.info.html)
and
[`DataFrame.describe()`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.describe.html)
methods to summarize the schema and distribution of the data.

1.  In your editor, create a file (for example, `pandas-example.py`) and enter the following sample code:

    {{% tabs-wrapper %}}
    {{% code-placeholders "DATABASE_TOKEN|DATABASE_NAME" %}}
    ```py
    # pandas-example.py

    from influxdb_client_3 import InfluxDBClient3
    import pandas

    # Instantiate a client for querying the database.
    client = InfluxDBClient3(token='DATABASE_TOKEN',
                             host='{{< influxdb/host >}}',
                             database='DATABASE_NAME',
                             org="")

    # Run the SQL query and convert the resulting Arrow table to a DataFrame.
    table = client.query("select * from home where room like '%'")
    dataframe = table.to_pandas()

    # Print information about the results DataFrame,
    # including the index dtype and columns, non-null values, and memory usage.
    dataframe.info()

    # Calculate descriptive statistics that summarize the distribution of the results.
    print(dataframe.describe())
    ```
    {{% /code-placeholders %}}
    {{% /tabs-wrapper %}}

2.  Enter the following command in your terminal to execute the file using the Python interpreter:

    ```sh
    python pandas-example.py
    ```

    The output is similar to the following:

    ```sh
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 411 entries, 0 to 410
    Data columns (total 8 columns):
     #   Column     Non-Null Count  Dtype
    ---  ------     --------------  -----
     0   co         405 non-null    float64
     1   host       2 non-null      object
     2   hum        406 non-null    float64
     3   room       411 non-null    object
     4   sensor     1 non-null      object
     5   sensor_id  2 non-null      object
     6   temp       411 non-null    float64
     7   time       411 non-null    datetime64[ns]
    dtypes: datetime64[ns](1), float64(3), object(4)
    memory usage: 25.8+ KB
                   co         hum        temp                           time
    count  405.000000  406.000000  411.000000                            411
    mean     5.320988   35.860591   23.803893  2008-06-12 13:33:49.074302208
    min      0.000000   20.200000   18.400000     1970-01-01 00:00:01.641024
    25%      0.000000   35.900000   22.200000  1970-01-01 00:00:01.685054600
    50%      1.000000   36.000000   22.500000            2023-03-21 05:46:40
    75%      9.000000   36.300000   22.800000            2023-07-15 21:34:10
    max     26.000000   80.000000   74.000000            2023-07-17 02:07:00
    std      7.640154    3.318794    8.408807                            NaN
    ```
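
The figures that `info()` and `describe()` print can also be read programmatically. The following is a minimal sketch, assuming a small hand-built DataFrame with hypothetical values shaped like the `home` measurement (it does not query InfluxDB). It counts non-null values per column, summarizes the numeric fields, and selects rows whose `time` falls before a hypothetical cutoff. Timestamps near 1970, like the `min` and `25%` values in the output above, may indicate points written with a timestamp precision other than the one intended.

```python
import pandas as pd

# Hypothetical rows shaped like the `home` sample data; not real query results.
dataframe = pd.DataFrame(
    {
        "room": ["Kitchen", "Living Room", "Kitchen", "Living Room"],
        "temp": [21.0, 21.1, 22.7, None],
        "co": [0.0, None, 1.0, 9.0],
        "time": pd.to_datetime(
            [
                "1970-01-01 00:00:01",
                "2023-03-21 05:46:40",
                "2023-07-15 21:34:10",
                "2023-07-17 02:07:00",
            ]
        ),
    }
)

# Non-null count per column: the same counts that info() reports.
non_null_counts = dataframe.count()

# Descriptive statistics for the numeric fields only.
numeric_summary = dataframe[["temp", "co"]].describe()

# Number of rows for each tag value in the `room` column.
rows_per_room = dataframe["room"].value_counts()

# Hypothetical cutoff: rows timestamped before it are treated as suspect.
cutoff = pd.Timestamp("2000-01-01")
suspect_rows = dataframe[dataframe["time"] < cutoff]

print(non_null_counts)
print(numeric_summary)
print(rows_per_room)
print(suspect_rows)
```

The cutoff date is arbitrary; choose one that precedes the earliest timestamp you expect in your own data.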