8.8 KiB
Integrate {{< product-name >}} with Snowflake and other Iceberg-compatible tools without the need for complex ETL processes. Export time series data snapshots from InfluxDB into Apache Iceberg format and query it from Snowflake.
Key Benefits
- Efficient data access: Query your data directly from Snowflake.
- Cost-effective storage: Optimize data retention and minimize storage costs.
- Supports AI and ML workloads: Enhance machine learning applications by making time-series data accessible in Snowflake.
Prerequisites
Before you begin, ensure you have the following:
- A Snowflake account with necessary permissions.
- Access to an external object store (such as AWS S3).
- Familiarity with Apache Iceberg and Snowflake.
Integrate InfluxDB 3 with Snowflake
- Configure external storage
- Set up a catalog integration in Snowflake
- Export InfluxDB data to Iceberg format
- Create an Iceberg table in Snowflake
- Query your data in Snowflake
Create a Snowflake external stage
Use the CREATE STAGE
Snowflake SQL command to set up an external storage location
(such as AWS S3) to store Iceberg table data and metadata--for example:
Example: Configure an S3 stage in Snowflake
CREATE STAGE my_s3_stage
URL='s3://my-bucket/'
STORAGE_INTEGRATION=my_storage_integration;
### Set up a catalog integration in Snowflake
Set up a catalog integration in Snowflake to manage and load Iceberg tables efficiently.
#### Example: Create a catalog integration in Snowflake
```sql
CREATE CATALOG INTEGRATION my_catalog_integration
CATALOG_SOURCE = 'OBJECT_STORE'
TABLE_FORMAT = 'ICEBERG'
ENABLED = TRUE;
For more information, refer to the Snowflake documentation.
Export InfluxDB data to Iceberg format
Use InfluxData's Iceberg exporter to convert and export your time-series data from your {{% product-name omit="Clustered" %}} cluster to the Iceberg table format.
Example: Export data using Iceberg exporter
This example assumes the following:
- You have followed the example for writing and querying data in the IOx README.
- You've configured compaction to trigger more quickly with these environment variables:
INFLUXDB_IOX_COMPACTION_MIN_NUM_L0_FILES_TO_COMPACT=1
INFLUXDB_IOX_COMPACTION_MIN_NUM_L1_FILES_TO_COMPACT=1
- You have a
config.json
.
Example config.json
{
"exports": [
{
"namespace": "company_sensors",
"table_name": "cpu"
}
]
}
Running the export command
$ influxdb_iox iceberg export \
--catalog-dsn postgresql://postgres@localhost:5432/postgres \
--source-object-store file
--source-data-dir ~/.influxdb_iox/object_store \
--sink-object-store file \
--sink-data-dir /tmp/iceberg \
--export-config-path config.json
The export command outputs an absolute path to an Iceberg metadata file:
/tmp/iceberg/company_sensors/cpu/metadata/v1.metadata.json
Example: Querying the exported metadata using DuckDB
$ duckdb
D SELECT * FROM iceberg_scan('/tmp/iceberg/metadata/v1.metadata.json') LIMIT 1;
┌───────────┬──────────────────────┬─────────────────────┬─────────────┬───┬────────────┬───────────────┬─────────────┬────────────────────┬────────────────────┐
│ cpu │ host │ time │ usage_guest │ … │ usage_nice │ usage_softirq │ usage_steal │ usage_system │ usage_user │
│ varchar │ varchar │ timestamp │ double │ │ double │ double │ double │ double │ double │
├───────────┼──────────────────────┼─────────────────────┼─────────────┼───┼────────────┼───────────────┼─────────────┼────────────────────┼────────────────────┤
│ cpu-total │ Andrews-MBP.hsd1.m… │ 2020-06-11 16:52:00 │ 0.0 │ … │ 0.0 │ 0.0 │ 0.0 │ 1.1173184357541899 │ 0.9435133457479826 │
├───────────┴──────────────────────┴─────────────────────┴─────────────┴───┴────────────┴───────────────┴─────────────┴────────────────────┴────────────────────┤
│ 1 rows 13 columns (9 shown) │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
Next, create an Iceberg table in Snowflake.
Create an Iceberg table in Snowflake
After exporting the data, create an Iceberg table in Snowflake.
Example: Create an Iceberg table in Snowflake
CREATE ICEBERG TABLE my_iceberg_table
EXTERNAL_VOLUME = 'my_external_volume'
METADATA_FILE_PATH = 's3://my-bucket/path/to/metadata.json';
Ensure that EXTERNAL_VOLUME
and METADATA_FILE_PATH
point to your external storage and metadata file.
Query the Iceberg table from Snowflake
Once the Iceberg table is set up, you can query it using standard SQL in Snowflake.
Example: Query the Iceberg table
SELECT * FROM my_iceberg_table
WHERE timestamp > '2025-01-01';
Interfaces for using Iceberg integration
- Use the CLI to trigger snapshot exports
- Use the API to manage and configure snapshots
- Use SQL in Snowflake to query Iceberg tables
Use the CLI to trigger snapshot exports
Example: Enable Iceberg feature and export a snapshot
# Enable Iceberg feature
$ influxctl enable-iceberg
# Export a snapshot
$ influxctl export --namespace foo --table bar
Use the API to manage and configure snapshots
Use the {{% product-name %}} HTTP API to export snapshots and check status.
Example: Export a snapshot
This example demonstrates how to export a snapshot of your data from InfluxDB to an Iceberg table using the HTTP API.
- Method:
POST
- Endpoint:
/snapshots/export
- Request body:
{
"namespace": "foo",
"table": "bar"
}
The POST
request to the /snapshots/export
endpoint triggers the export of data from the specified namespace and table in InfluxDB to an Iceberg table. The request body specifies the namespace (foo
) and the table (bar
) to be exported.
Example: Check snapshot status
This example shows how to check the status of an ongoing or completed snapshot export using the HTTP API.
- Method:
GET
- Endpoint:
/snapshots/status
The GET
request to the /snapshots/status
endpoint retrieves the status of the snapshot export. This can be used to monitor the progress of the export or verify its completion.
Considerations and limitations
When exporting data from InfluxDB to an Iceberg table, keep the following considerations and limitations in mind:
- Data consistency: Ensure that the exported data in the Iceberg table is consistent with the source data in InfluxDB.
- Performance: Query performance may vary based on data size and query complexity.
- Feature support: Some advanced features of InfluxDB may not be fully supported in Snowflake through Iceberg integration.