docs-v2

8.8 KiB

Raw Blame History

Integrate {{< product-name >}} with Snowflake and other Iceberg-compatible tools without the need for complex ETL processes. Export time series data snapshots from InfluxDB into Apache Iceberg format and query it from Snowflake.

Key Benefits

Efficient data access: Query your data directly from Snowflake.
Cost-effective storage: Optimize data retention and minimize storage costs.
Supports AI and ML workloads: Enhance machine learning applications by making time-series data accessible in Snowflake.

Prerequisites

Before you begin, ensure you have the following:

A Snowflake account with necessary permissions.
Access to an external object store (such as AWS S3).
Familiarity with Apache Iceberg and Snowflake.

Integrate InfluxDB 3 with Snowflake

Configure external storage
Set up a catalog integration in Snowflake
Export InfluxDB data to Iceberg format
Create an Iceberg table in Snowflake
Query your data in Snowflake

Create a Snowflake external stage

Use the CREATE STAGE Snowflake SQL command to set up an external storage location (such as AWS S3) to store Iceberg table data and metadata--for example:

Example: Configure an S3 stage in Snowflake

CREATE STAGE my_s3_stage 
URL='s3://my-bucket/'
STORAGE_INTEGRATION=my_storage_integration;

### Set up a catalog integration in Snowflake

Set up a catalog integration in Snowflake to manage and load Iceberg tables efficiently.

#### Example: Create a catalog integration in Snowflake

```sql
CREATE CATALOG INTEGRATION my_catalog_integration
  CATALOG_SOURCE = 'OBJECT_STORE'
  TABLE_FORMAT = 'ICEBERG'
  ENABLED = TRUE;

For more information, refer to the Snowflake documentation.

Export InfluxDB data to Iceberg format

Use InfluxData's Iceberg exporter to convert and export your time-series data from your {{% product-name omit="Clustered" %}} cluster to the Iceberg table format.

Example: Export data using Iceberg exporter

This example assumes the following:

You have followed the example for writing and querying data in the IOx README.
You've configured compaction to trigger more quickly with these environment variables:
- INFLUXDB_IOX_COMPACTION_MIN_NUM_L0_FILES_TO_COMPACT=1
- INFLUXDB_IOX_COMPACTION_MIN_NUM_L1_FILES_TO_COMPACT=1
You have a config.json.

Example `config.json`

{
    "exports": [
        {
            "namespace": "company_sensors",
            "table_name": "cpu"
        }
    ]
}

Running the export command

$ influxdb_iox iceberg export \
  --catalog-dsn postgresql://postgres@localhost:5432/postgres \
  --source-object-store file 
  --source-data-dir ~/.influxdb_iox/object_store \
  --sink-object-store file \
  --sink-data-dir /tmp/iceberg \
  --export-config-path config.json

The export command outputs an absolute path to an Iceberg metadata file:

/tmp/iceberg/company_sensors/cpu/metadata/v1.metadata.json

Example: Querying the exported metadata using DuckDB

$ duckdb
D SELECT * FROM iceberg_scan('/tmp/iceberg/metadata/v1.metadata.json') LIMIT 1;
┌───────────┬──────────────────────┬─────────────────────┬─────────────┬───┬────────────┬───────────────┬─────────────┬────────────────────┬────────────────────┐
│    cpu    │         host         │        time         │ usage_guest │ … │ usage_nice │ usage_softirq │ usage_steal │    usage_system    │     usage_user     │
│  varchar  │       varchar        │      timestamp      │   double    │   │   double   │    double     │   double    │       double       │       double       │
├───────────┼──────────────────────┼─────────────────────┼─────────────┼───┼────────────┼───────────────┼─────────────┼────────────────────┼────────────────────┤
│ cpu-total │ Andrews-MBP.hsd1.m…  │ 2020-06-11 16:52:00 │         0.0 │ … │        0.0 │           0.0 │         0.0 │ 1.1173184357541899 │ 0.9435133457479826 │
├───────────┴──────────────────────┴─────────────────────┴─────────────┴───┴────────────┴───────────────┴─────────────┴────────────────────┴────────────────────┤
│ 1 rows                                                                                                                                   13 columns (9 shown) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

Next, create an Iceberg table in Snowflake.

Create an Iceberg table in Snowflake

After exporting the data, create an Iceberg table in Snowflake.

Example: Create an Iceberg table in Snowflake

CREATE ICEBERG TABLE my_iceberg_table
  EXTERNAL_VOLUME = 'my_external_volume'
  METADATA_FILE_PATH = 's3://my-bucket/path/to/metadata.json';

Ensure that EXTERNAL_VOLUME and METADATA_FILE_PATH point to your external storage and metadata file.

Query the Iceberg table from Snowflake

Once the Iceberg table is set up, you can query it using standard SQL in Snowflake.

Example: Query the Iceberg table

SELECT * FROM my_iceberg_table
WHERE timestamp > '2025-01-01';

Interfaces for using Iceberg integration

Use the CLI to trigger snapshot exports
Use the API to manage and configure snapshots
Use SQL in Snowflake to query Iceberg tables

Use the CLI to trigger snapshot exports

Example: Enable Iceberg feature and export a snapshot

# Enable Iceberg feature
$ influxctl enable-iceberg

# Export a snapshot
$ influxctl export --namespace foo --table bar

Use the API to manage and configure snapshots

Use the {{% product-name %}} HTTP API to export snapshots and check status.

Example: Export a snapshot

This example demonstrates how to export a snapshot of your data from InfluxDB to an Iceberg table using the HTTP API.

Method: POST
Endpoint: /snapshots/export
Request body:

{
  "namespace": "foo",
  "table": "bar"
}

The POST request to the /snapshots/export endpoint triggers the export of data from the specified namespace and table in InfluxDB to an Iceberg table. The request body specifies the namespace (foo) and the table (bar) to be exported.

Example: Check snapshot status

This example shows how to check the status of an ongoing or completed snapshot export using the HTTP API.

Method: GET
Endpoint: /snapshots/status

The GET request to the /snapshots/status endpoint retrieves the status of the snapshot export. This can be used to monitor the progress of the export or verify its completion.

Considerations and limitations

When exporting data from InfluxDB to an Iceberg table, keep the following considerations and limitations in mind:

Data consistency: Ensure that the exported data in the Iceberg table is consistent with the source data in InfluxDB.
Performance: Query performance may vary based on data size and query complexity.
Feature support: Some advanced features of InfluxDB may not be fully supported in Snowflake through Iceberg integration.

8.8 KiB Raw Blame History