diff --git a/content/influxdb/cloud-dedicated/process-data/tools/pandas.md b/content/influxdb/cloud-dedicated/process-data/tools/pandas.md index fcb044666..6da4e80d8 100644 --- a/content/influxdb/cloud-dedicated/process-data/tools/pandas.md +++ b/content/influxdb/cloud-dedicated/process-data/tools/pandas.md @@ -1,11 +1,11 @@ --- title: Use pandas to analyze data list_title: pandas -seotitle: Use Python and pandas to analyze data +seotitle: Use Python and pandas to analyze and visualize data description: > Use the [pandas](https://pandas.pydata.org/) Python data analysis library - to analyze time series data stored in InfluxDB. -weight: 201 + to analyze and visualize time series data stored in InfluxDB Cloud Dedicated. +weight: 101 menu: influxdb_cloud_dedicated: parent: Use data analysis tools @@ -51,10 +51,10 @@ stored in InfluxDB. ## Install prerequisites -The examples in this guide assume using a Python virtual environment and the Flight SQL library for Python. -Installing `flightsql-dbapi` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow. +The examples in this guide assume using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/cloud-dedicated/reference/client-libraries/v3/python/). +For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/). -For more information, see how to [get started querying InfluxDB with Python and flightsql-dbapi](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/) +Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow. ## Install pandas @@ -68,35 +68,47 @@ pip install pandas ## Use PyArrow to convert query results to pandas -The following steps use Python, `flightsql-dbapi`, and `pyarrow` to query InfluxDB and stream Arrow data to a pandas `DataFrame`. +The following steps use Python, `influxdb3-python`, and `pyarrow` to query InfluxDB and stream Arrow data to a pandas `DataFrame`. 1. In your editor, copy and paste the following code to a new file--for example, `pandas-example.py`: - ```py - # pandas-example.py + {{% tabs-wrapper %}} +{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}} +```py +# pandas-example.py - from flightsql import FlightSQLClient - import pandas +from influxdb_client_3 import InfluxDBClient3 +import pandas - client = FlightSQLClient(host='cluster-id.influxdb.io', - token='INFLUX_READ_WRITE_TOKEN', - metadata={'database': 'INFLUX_DATABASE'}, - features={'metadata-reflection': 'true'}) +# Instantiate an InfluxDB client configured for a database +client = InfluxDBClient3( + "https://cluster-id.influxdb.io", + database="DATABASE_NAME", + token="DATABASE_TOKEN") - info = client.execute("SELECT * FROM home") +# Execute the query to retrieve all record batches in the stream +# formatted as a PyArrow Table. +table = client.query( + '''SELECT * + FROM home + WHERE time >= now() - INTERVAL '90 days' + ORDER BY time''' +) - reader = client.do_get(info.endpoints[0].ticket) +client.close() - # Read all record batches in the stream to a pandas DataFrame - dataframe = reader.read_pandas() +# Convert the PyArrow Table to a pandas DataFrame. +dataframe = table.to_pandas() - dataframe.info() - ``` +print(dataframe) +``` +{{% /code-placeholders %}} + {{% /tabs-wrapper %}} 2. Replace the following configuration values: - - **`INFLUX_READ_WRITE_TOKEN`**: Your InfluxDB token with read permissions on the databases you want to query. - - **`INFLUX_DATABASE`**: The name of your InfluxDB database. + - {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-dedicated/admin/tokens/) with read permissions on the databases you want to query. + - {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [database](/influxdb/cloud-dedicated/admin/databases/) to query. 3. In your terminal, use the Python interpreter to run the file: @@ -104,10 +116,47 @@ The following steps use Python, `flightsql-dbapi`, and `pyarrow` to query Influx python pandas-example.py ``` -The `pyarrow.flight.FlightStreamReader` [`read_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightStreamReader.html#pyarrow.flight.FlightStreamReader.read_pandas) method: +The example calls the following methods: -- Takes the same options as [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas). -- Reads all Arrow record batches in the stream to a `pyarrow.Table` and then converts the `Table` to a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame). +- [`InfluxDBClient3.query()`](/influxdb/cloud-dedicated/reference/client-libraries/v3/python/#influxdbclient3query): sends the query request and returns a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) that contains all the Arrow record batches from the response stream. + +- [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas): Creates a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) from the data in the PyArrow `Table`. + +{{% influxdb/custom-timestamps %}} +{{% expand-wrapper %}} +{{% expand "View example results" %}} +```sh + co hum room temp time +0 0 35.9 Living Room 21.1 2022-01-02 11:46:40 +1 0 35.9 Kitchen 21.0 2022-01-02 11:46:40 +2 0 36.2 Kitchen 23.0 2022-01-02 12:46:40 +3 0 35.9 Living Room 21.4 2022-01-02 12:46:40 +4 0 36.1 Kitchen 22.7 2022-01-02 13:46:40 +5 0 36.0 Living Room 21.8 2022-01-02 13:46:40 +6 0 36.0 Kitchen 22.4 2022-01-02 14:46:40 +7 0 36.0 Living Room 22.2 2022-01-02 14:46:40 +8 0 36.0 Kitchen 22.5 2022-01-02 15:46:40 +9 0 35.9 Living Room 22.2 2022-01-02 15:46:40 +10 1 36.5 Kitchen 22.8 2022-01-02 16:46:40 +11 0 36.0 Living Room 22.4 2022-01-02 16:46:40 +12 1 36.3 Kitchen 22.8 2022-01-02 17:46:40 +13 0 36.1 Living Room 22.3 2022-01-02 17:46:40 +14 3 36.2 Kitchen 22.7 2022-01-02 18:46:40 +15 1 36.1 Living Room 22.3 2022-01-02 18:46:40 +16 7 36.0 Kitchen 22.4 2022-01-02 19:46:40 +17 4 36.0 Living Room 22.4 2022-01-02 19:46:40 +18 9 36.0 Kitchen 22.7 2022-01-02 20:46:40 +19 5 35.9 Living Room 22.6 2022-01-02 20:46:40 +20 18 36.9 Kitchen 23.3 2022-01-02 21:46:40 +21 9 36.2 Living Room 22.8 2022-01-02 21:46:40 +22 22 36.6 Kitchen 23.1 2022-01-02 22:46:40 +23 14 36.3 Living Room 22.5 2022-01-02 22:46:40 +24 26 36.5 Kitchen 22.7 2022-01-02 23:46:40 +25 17 36.4 Living Room 22.2 2022-01-02 23:46:40 +``` +{{% /expand %}} +{{% /expand-wrapper %}} +{{% /influxdb/custom-timestamps %}} Next, [use pandas to analyze data](#use-pandas-to-analyze-data). @@ -118,25 +167,34 @@ Next, [use pandas to analyze data](#use-pandas-to-analyze-data). ### View data information and statistics -The following example uses the DataFrame `info()` and `describe()` -methods to print information about the DataFrame. +The following example shows how to use pandas `DataFrame` methods to transform and summarize data stored in {{% cloud-name %}}. +{{% code-placeholders "DATABASE_NAME|DATABASE_TOKEN" %}} ```py # pandas-example.py -from flightsql import FlightSQLClient +from influxdb_client_3 import InfluxDBClient3 import pandas -client = FlightSQLClient(host='cluster-id.influxdb.io', - token='INFLUX_READ_WRITE_TOKEN', - metadata={'database': 'INFLUX_DATABASE'}, - features={'metadata-reflection': 'true'}) +# Instantiate an InfluxDB client configured for a database +client = InfluxDBClient3( + "https://cluster-id.influxdb.io", + database="DATABASE_NAME", + token="DATABASE_TOKEN") -info = client.execute("SELECT * FROM home") +# Execute the query to retrieve all record batches in the stream +# formatted as a PyArrow Table. +table = client.query( + '''SELECT * + FROM home + WHERE time >= now() - INTERVAL '90 days' + ORDER BY time''' +) -reader = client.do_get(info.endpoints[0].ticket) +client.close() -dataframe = reader.read_pandas() +# Convert the PyArrow Table to a pandas DataFrame. +dataframe = table.to_pandas() # Print information about the results DataFrame, # including the index dtype and columns, non-null values, and memory usage. @@ -144,8 +202,20 @@ dataframe.info() # Calculate descriptive statistics that summarize the distribution of the results. print(dataframe.describe()) -``` +# Extract a DataFrame column. +print(dataframe['temp']) + +# Print the DataFrame in Markdown format. +print(dataframe.to_markdown()) +``` +{{% /code-placeholders %}} + +Replace the following configuration values: + + - {{% code-placeholder-key %}}`DATABASE_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-dedicated/admin/tokens/) with read permissions on the databases you want to query. + - {{% code-placeholder-key %}}`DATABASE_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [database](/influxdb/cloud-dedicated/admin/databases/) to query. + ### Downsample time series The pandas library provides extensive features for working with time series data. @@ -153,19 +223,9 @@ The pandas library provides extensive features for working with time series data The [`pandas.DataFrame.resample()` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) downsamples and upsamples data to time-based groups--for example: ```py -from flightsql import FlightSQLClient -import pandas +# pandas-example.py -client = FlightSQLClient(host='cluster-id.influxdb.io', - token='INFLUX_READ_WRITE_TOKEN', - metadata={'database': 'INFLUX_DATABASE'}, - features={'metadata-reflection': 'true'}) - -info = client.execute("SELECT * FROM home") - -reader = client.do_get(info.endpoints[0].ticket) - -dataframe = reader.read_pandas() +... # Use the `time` column to generate a DatetimeIndex for the DataFrame dataframe = dataframe.set_index('time') @@ -180,24 +240,21 @@ resample = dataframe.resample("1H") print(resample['temp'].mean()) ``` +{{% influxdb/custom-timestamps %}} {{< expand-wrapper >}} {{% expand "View example results" %}} ```sh -time -1970-01-01 00:00:00 22.374138 -1970-01-01 01:00:00 NaN -1970-01-01 02:00:00 NaN -1970-01-01 03:00:00 NaN -1970-01-01 04:00:00 NaN - ... +time 2023-07-16 22:00:00 NaN 2023-07-16 23:00:00 22.600000 2023-07-17 00:00:00 22.513889 2023-07-17 01:00:00 22.208333 2023-07-17 02:00:00 22.300000 +... Freq: H, Name: temp, Length: 469323, dtype: float64 ``` {{% /expand %}} {{< /expand-wrapper >}} +{{% /influxdb/custom-timestamps %}} For more detail and examples, see the [pandas documentation](https://pandas.pydata.org/docs/index.html). diff --git a/content/influxdb/cloud-dedicated/process-data/tools/pyarrow.md b/content/influxdb/cloud-dedicated/process-data/tools/pyarrow.md index 148127b4b..01ddba7cd 100644 --- a/content/influxdb/cloud-dedicated/process-data/tools/pyarrow.md +++ b/content/influxdb/cloud-dedicated/process-data/tools/pyarrow.md @@ -52,7 +52,7 @@ and conversion of Arrow format data. ## Install prerequisites The examples in this guide assume using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/cloud-dedicated/reference/client-libraries/v3/python/). -For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/) +For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/). Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow. @@ -78,7 +78,7 @@ def querySQL(): database="DATABASE_NAME", token="DATABASE_TOKEN") - # Execute the query to retrieve data formatted as a PyArrow Table + # Execute the query to retrieve all record batches in the stream formatted as a PyArrow Table. table = client.query( '''SELECT * FROM home @@ -131,7 +131,8 @@ def querySQL(): database="DATABASE_NAME", token="DATABASE_TOKEN") - # Execute the query to retrieve data formatted as a PyArrow Table + # Execute the query to retrieve data + # formatted as a PyArrow Table table = client.query( '''SELECT * FROM home diff --git a/content/influxdb/cloud-serverless/process-data/tools/pandas.md b/content/influxdb/cloud-serverless/process-data/tools/pandas.md index 97b55919e..5ae4a9458 100644 --- a/content/influxdb/cloud-serverless/process-data/tools/pandas.md +++ b/content/influxdb/cloud-serverless/process-data/tools/pandas.md @@ -4,7 +4,7 @@ list_title: pandas seotitle: Use Python and pandas to analyze and visualize data description: > Use the [pandas](https://pandas.pydata.org/) Python data analysis library - to analyze and visualize data stored in InfluxDB Cloud Serverless. + to analyze and visualize time series data stored in InfluxDB Cloud Serverless. weight: 101 menu: influxdb_cloud_serverless: @@ -53,10 +53,10 @@ stored in an InfluxDB Cloud Serverless bucket. ## Install prerequisites -The examples in this guide assume using a Python virtual environment and the Flight SQL library for Python. -Installing `flightsql-dbapi` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow. +The examples in this guide assume using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/cloud-serverless/reference/client-libraries/v3/python/). +For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-serverless/query-data/execute-queries/flight-sql/python/). -For more information, see how to [get started querying InfluxDB with Python and flightsql-dbapi](/influxdb/cloud-serverless/query-data/sql/execute-queries/python/) +Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow. ## Install pandas @@ -70,35 +70,47 @@ pip install pandas ## Use PyArrow to convert query results to pandas -The following steps use Python, `flightsql-dbapi`, and `pyarrow` to query InfluxDB and stream Arrow data to a pandas `DataFrame`. +The following steps use Python, `influxdb3-python`, and `pyarrow` to query InfluxDB and stream Arrow data to a pandas `DataFrame`. 1. In your editor, copy and paste the following code to a new file--for example, `pandas-example.py`: - ```py - # pandas-example.py + {{% tabs-wrapper %}} +{{% code-placeholders "BUCKET_NAME|API_TOKEN" %}} +```py +# pandas-example.py - from flightsql import FlightSQLClient - import pandas +from influxdb_client_3 import InfluxDBClient3 +import pandas - client = FlightSQLClient(host='cloud2.influxdata.com', - token='INFLUX_READ_WRITE_TOKEN', - metadata={'database': 'INFLUX_BUCKET'}, - features={'metadata-reflection': 'true'}) +# Instantiate an InfluxDB client configured for a bucket +client = InfluxDBClient3( + "https://cloud2.influxdata.com", + database="BUCKET_NAME", + token="API_TOKEN") - info = client.execute("SELECT * FROM home") +# Execute the query to retrieve all record batches in the stream +# formatted as a PyArrow Table. +table = client.query( + '''SELECT * + FROM home + WHERE time >= now() - INTERVAL '90 days' + ORDER BY time''' +) - reader = client.do_get(info.endpoints[0].ticket) +client.close() - # Read all record batches in the stream to a pandas DataFrame - dataframe = reader.read_pandas() +# Convert the PyArrow Table to a pandas DataFrame. +dataframe = table.to_pandas() - dataframe.info() - ``` +print(dataframe) +``` +{{% /code-placeholders %}} + {{% /tabs-wrapper %}} 2. Replace the following configuration values: - - **`INFLUX_READ_WRITE_TOKEN`**: An InfluxDB token with _read_ permission to the bucket. - - **`INFLUX_BUCKET`**: The name of the InfluxDB bucket to query. + - {{% code-placeholder-key %}}`API_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-serverless/admin/tokens/) with read permissions on the buckets you want to query. + - {{% code-placeholder-key %}}`BUCKET_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [bucket](/influxdb/cloud-serverless/admin/buckets/) to query. 3. In your terminal, use the Python interpreter to run the file: @@ -106,39 +118,85 @@ The following steps use Python, `flightsql-dbapi`, and `pyarrow` to query Influx python pandas-example.py ``` -The `pyarrow.flight.FlightStreamReader` [`read_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.flight.FlightStreamReader.html#pyarrow.flight.FlightStreamReader.read_pandas) method: +The example calls the following methods: -- Takes the same options as [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas). -- Reads all Arrow record batches in the stream to a `pyarrow.Table` and then converts the `Table` to a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame). +- [`InfluxDBClient3.query()`](/influxdb/cloud-serverless/reference/client-libraries/v3/python/#influxdbclient3query): Sends the query request and returns a [`pyarrow.Table`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html) that contains all the Arrow record batches from the response stream. + +- [`pyarrow.Table.to_pandas()`](https://arrow.apache.org/docs/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas): Creates a [`pandas.DataFrame`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame) from the data in the PyArrow `Table`. + +{{% influxdb/custom-timestamps %}} +{{% expand-wrapper %}} +{{% expand "View example results" %}} +```sh + co hum room temp time +0 0 35.9 Living Room 21.1 2022-01-02 11:46:40 +1 0 35.9 Kitchen 21.0 2022-01-02 11:46:40 +2 0 36.2 Kitchen 23.0 2022-01-02 12:46:40 +3 0 35.9 Living Room 21.4 2022-01-02 12:46:40 +4 0 36.1 Kitchen 22.7 2022-01-02 13:46:40 +5 0 36.0 Living Room 21.8 2022-01-02 13:46:40 +6 0 36.0 Kitchen 22.4 2022-01-02 14:46:40 +7 0 36.0 Living Room 22.2 2022-01-02 14:46:40 +8 0 36.0 Kitchen 22.5 2022-01-02 15:46:40 +9 0 35.9 Living Room 22.2 2022-01-02 15:46:40 +10 1 36.5 Kitchen 22.8 2022-01-02 16:46:40 +11 0 36.0 Living Room 22.4 2022-01-02 16:46:40 +12 1 36.3 Kitchen 22.8 2022-01-02 17:46:40 +13 0 36.1 Living Room 22.3 2022-01-02 17:46:40 +14 3 36.2 Kitchen 22.7 2022-01-02 18:46:40 +15 1 36.1 Living Room 22.3 2022-01-02 18:46:40 +16 7 36.0 Kitchen 22.4 2022-01-02 19:46:40 +17 4 36.0 Living Room 22.4 2022-01-02 19:46:40 +18 9 36.0 Kitchen 22.7 2022-01-02 20:46:40 +19 5 35.9 Living Room 22.6 2022-01-02 20:46:40 +20 18 36.9 Kitchen 23.3 2022-01-02 21:46:40 +21 9 36.2 Living Room 22.8 2022-01-02 21:46:40 +22 22 36.6 Kitchen 23.1 2022-01-02 22:46:40 +23 14 36.3 Living Room 22.5 2022-01-02 22:46:40 +24 26 36.5 Kitchen 22.7 2022-01-02 23:46:40 +25 17 36.4 Living Room 22.2 2022-01-02 23:46:40 +``` +{{% /expand %}} +{{% /expand-wrapper %}} +{{% /influxdb/custom-timestamps %}} Next, [use pandas to analyze data](#use-pandas-to-analyze-data). ## Use pandas to analyze data -- [View information and statistics for data](#view-information-and-statistics-for-data) +- [View data information and statistics](#view-data-information-and-statistics) - [Downsample time series](#downsample-time-series) ### View data information and statistics -The following example uses the DataFrame `info()` and `describe()` -methods to print information about the DataFrame. +The following example shows how to use pandas `DataFrame` methods to transform and summarize data stored in {{% cloud-name %}}. +{{% code-placeholders "BUCKET_NAME|API_TOKEN" %}} ```py # pandas-example.py -from flightsql import FlightSQLClient +from influxdb_client_3 import InfluxDBClient3 import pandas -client = FlightSQLClient(host='cloud2.influxdata.com', - token='INFLUX_READ_WRITE_TOKEN', - metadata={'database': 'INFLUX_BUCKET'}, - features={'metadata-reflection': 'true'}) +# Instantiate an InfluxDB client configured for a bucket +client = InfluxDBClient3( + "https://cloud2.influxdata.com", + database="BUCKET_NAME", + token="API_TOKEN") -info = client.execute("SELECT * FROM home") +# Execute the query to retrieve all record batches in the stream +# formatted as a PyArrow Table. +table = client.query( + '''SELECT * + FROM home + WHERE time >= now() - INTERVAL '90 days' + ORDER BY time''' +) -reader = client.do_get(info.endpoints[0].ticket) +client.close() -dataframe = reader.read_pandas() +# Convert the PyArrow Table to a pandas DataFrame. +dataframe = table.to_pandas() # Print information about the results DataFrame, # including the index dtype and columns, non-null values, and memory usage. @@ -146,8 +204,20 @@ dataframe.info() # Calculate descriptive statistics that summarize the distribution of the results. print(dataframe.describe()) -``` +# Extract a DataFrame column. +print(dataframe['temp']) + +# Print the DataFrame in Markdown format. +print(dataframe.to_markdown()) +``` +{{% /code-placeholders %}} + +Replace the following configuration values: + + - {{% code-placeholder-key %}}`API_TOKEN`{{% /code-placeholder-key %}}: An InfluxDB [token](/influxdb/cloud-serverless/admin/tokens/) with read permissions on the buckets you want to query. + - {{% code-placeholder-key %}}`BUCKET_NAME`{{% /code-placeholder-key %}}: The name of the InfluxDB [bucket](/influxdb/cloud-serverless/admin/buckets/) to query. + ### Downsample time series The pandas library provides extensive features for working with time series data. @@ -155,19 +225,9 @@ The pandas library provides extensive features for working with time series data The [`pandas.DataFrame.resample()` method](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html) downsamples and upsamples data to time-based groups--for example: ```py -from flightsql import FlightSQLClient -import pandas +# pandas-example.py -client = FlightSQLClient(host='cloud2.influxdata.com', - token='INFLUX_READ_WRITE_TOKEN', - metadata={'database': 'INFLUX_BUCKET'}, - features={'metadata-reflection': 'true'}) - -info = client.execute("SELECT * FROM home") - -reader = client.do_get(info.endpoints[0].ticket) - -dataframe = reader.read_pandas() +... # Use the `time` column to generate a DatetimeIndex for the DataFrame dataframe = dataframe.set_index('time') @@ -182,24 +242,21 @@ resample = dataframe.resample("1H") print(resample['temp'].mean()) ``` +{{% influxdb/custom-timestamps %}} {{< expand-wrapper >}} {{% expand "View example results" %}} ```sh -time -1970-01-01 00:00:00 22.374138 -1970-01-01 01:00:00 NaN -1970-01-01 02:00:00 NaN -1970-01-01 03:00:00 NaN -1970-01-01 04:00:00 NaN - ... +time 2023-07-16 22:00:00 NaN 2023-07-16 23:00:00 22.600000 2023-07-17 00:00:00 22.513889 2023-07-17 01:00:00 22.208333 2023-07-17 02:00:00 22.300000 +... Freq: H, Name: temp, Length: 469323, dtype: float64 ``` {{% /expand %}} {{< /expand-wrapper >}} +{{% /influxdb/custom-timestamps %}} For more detail and examples, see the [pandas documentation](https://pandas.pydata.org/docs/index.html). diff --git a/content/influxdb/cloud-serverless/process-data/tools/pyarrow.md b/content/influxdb/cloud-serverless/process-data/tools/pyarrow.md index 469751ef1..af882bec4 100644 --- a/content/influxdb/cloud-serverless/process-data/tools/pyarrow.md +++ b/content/influxdb/cloud-serverless/process-data/tools/pyarrow.md @@ -53,7 +53,7 @@ and conversion of Arrow format data. ## Install prerequisites The examples in this guide assume using a Python virtual environment and the InfluxDB v3 [`influxdb3-python` Python client library](/influxdb/cloud-serverless/reference/client-libraries/v3/python/). -For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-dedicated/query-data/execute-queries/flight-sql/python/) +For more information, see how to [get started using Python to query InfluxDB](/influxdb/cloud-serverless/query-data/execute-queries/flight-sql/python/). Installing `influxdb3-python` also installs the [`pyarrow`](https://arrow.apache.org/docs/python/index.html) library that provides Python bindings for Apache Arrow. @@ -79,7 +79,7 @@ def querySQL(): database="BUCKET_NAME", token="API_TOKEN") - # Execute the query to retrieve data formatted as a PyArrow Table + # Execute the query to retrieve all record batches in the stream formatted as a PyArrow Table. table = client.query( '''SELECT * FROM home @@ -132,7 +132,8 @@ def querySQL(): database="BUCKET_NAME", token="API_TOKEN") - # Execute the query to retrieve data formatted as a PyArrow Table + # Execute the query to retrieve data + # formatted as a PyArrow Table table = client.query( '''SELECT * FROM home