iox_data_generator
The `iox_data_generator` tool creates random data points according to a specification and loads them into an IOx instance to simulate real data.
To build and run, first install Rust. Then, from the root of the influxdb_iox repo, run:

```
cargo build --release
```

The built binary has command-line help:

```
./target/release/iox_data_generator --help
```
For examples of specifications, see the `schemas` folder.
Use with two IOx servers and Kafka
The data generator tool can be used to simulate data being written to IOx in various shapes. This is how to set up a local experiment for profiling or debugging purposes using a database in two IOx instances: one writing to Kafka and one reading from Kafka.
If you're profiling IOx, be sure you've compiled and are running a release build using either:

```
cargo build --release
./target/release/influxdb_iox run --server-id 1
```

or:

```
cargo run --release -- run --server-id 1
```
Server ID is the only required attribute for running IOx; see `influxdb_iox run --help` for all the other configuration options for the server you may want to set for your experiment. Note that the default HTTP API address is `127.0.0.1:8080` unless you set something different with `--api-bind`, and the default gRPC address is `127.0.0.1:8082` unless you set something different using `--grpc-bind`.
For the Kafka setup, you'll need to start two IOx servers, so you'll need to set the bind addresses for at least one of them. Here's an example of the two commands to run (each in its own terminal, since both stay in the foreground):

```
cargo run --release -- run --server-id 1
cargo run --release -- run --server-id 2 --api-bind 127.0.0.1:8084 --grpc-bind 127.0.0.1:8086
```
You'll also need to run a Kafka instance. There's a Docker Compose script in the influxdb_iox repo you can run with:

```
docker-compose -f docker/ci-kafka-docker-compose.yml up kafka
```

The Kafka instance will be accessible at `127.0.0.1:9093` if you run it with this script.
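Before creating the database, it can help to wait until Kafka and both IOx servers accept connections. The following is a minimal sketch, assuming bash: it relies on bash's `/dev/tcp/HOST/PORT` redirection, which plain `sh` lacks. `wait_for_port` is a hypothetical helper, not part of the influxdb_iox repo, and it only checks that a TCP port is open, not that the service behind it is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical readiness check, not part of the influxdb_iox repo.
# Requires bash: /dev/tcp/HOST/PORT is a bash redirection feature.

# wait_for_port HOST PORT TRIES
# Tries to open a TCP connection once per second, up to TRIES times.
# Returns 0 as soon as the port accepts a connection, 1 otherwise.
wait_for_port() {
  local host="$1" port="$2" tries="$3" i=1
  while [ "$i" -le "$tries" ]; do
    # Open a connection on fd 3; it is closed when the subshell exits.
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$tries" ]; then
      sleep 1
    fi
    i=$((i + 1))
  done
  echo "gave up waiting for $host:$port" >&2
  return 1
}

# Example, using the addresses from this README:
#   wait_for_port 127.0.0.1 9093 30  # Kafka from the compose script
#   wait_for_port 127.0.0.1 8082 30  # writer gRPC
#   wait_for_port 127.0.0.1 8086 30  # reader gRPC
```

Chaining the three calls with `&&` in front of the `create_database` command makes that command run only once all three ports are reachable.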
Once you have the two IOx servers and one Kafka instance running, create a database with a name in the format `[orgname]_[bucketname]`. For example, if you create a database in IOx named `mlb_pirates`, the org you'll use in the data generator will be `mlb` and the bucket will be `pirates`. The `DatabaseRules` defined in `src/bin/create_database.rs` will set up a database in the "writer" IOx instance to write to Kafka and the database in the "reader" IOx instance to read from Kafka if you run it with the gRPC addresses of the two servers:

```
cargo run --release -p iox_data_generator --bin create_database -- --writer 127.0.0.1:8082 --reader 127.0.0.1:8086 mlb_pirates
```
This script adds 3 rows to a `writer_test` table because of an issue with the Kafka Consumer needing data before it can find partitions.
Once the database is created, decide what kind of data you would like to send it. You can use an existing data generation schema in the `schemas` directory or create a new one, perhaps starting from an existing schema as a guide. In this example, we're going to use `iox_data_generator/schemas/cap-write.toml`.
Next, run the data generation tool as follows:

```
cargo run --release -p iox_data_generator -- --spec iox_data_generator/schemas/cap-write.toml --continue --host 127.0.0.1:8080 --token arbitrary --org mlb --bucket pirates
```
- `--spec iox_data_generator/schemas/cap-write.toml` sets the schema you want to use to generate the data
- `--continue` means the data generation tool should generate data every `sampling_interval` (which is set in the schema) until we stop it
- `--host 127.0.0.1:8080` means to write to the writer IOx server running at the default HTTP API address of `127.0.0.1:8080` (note this is NOT the gRPC address used by the `create_database` command)
- `--token arbitrary`: the data generator requires a token value but IOx doesn't use it, so this can be any value
- `--org mlb` is the part of the database name you created before the `_`
- `--bucket pirates` is the part of the database name you created after the `_`
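Because `--org` and `--bucket` must match the two halves of the database name, it can help to derive both from a single value so they cannot drift apart. The following is a minimal POSIX `sh` sketch, assuming the org name contains no underscore (it splits at the first `_`). `DB_NAME`, `SPEC`, `WRITER_HTTP`, `db_org` and `db_bucket` are hypothetical names for illustration, not part of the repo, and the binary path assumes the `cargo build --release` step above.

```shell
#!/bin/sh
# Hypothetical helper, not part of the influxdb_iox repo.
# Derives --org and --bucket from a database name of the form
# [orgname]_[bucketname], splitting at the FIRST underscore
# (assumes the org name itself contains no underscore).

db_org()    { printf '%s\n' "${1%%_*}"; }  # text before the first `_`
db_bucket() { printf '%s\n' "${1#*_}"; }   # text after the first `_`

# Defaults match the example in this README; override via the environment.
DB_NAME="${DB_NAME:-mlb_pirates}"
SPEC="${SPEC:-iox_data_generator/schemas/cap-write.toml}"
WRITER_HTTP="${WRITER_HTTP:-127.0.0.1:8080}"  # writer's HTTP API, not gRPC

case "$DB_NAME" in
  *_*) ;;
  *) echo "DB_NAME must look like [orgname]_[bucketname]" >&2; exit 1 ;;
esac

ORG="$(db_org "$DB_NAME")"
BUCKET="$(db_bucket "$DB_NAME")"

# Dry run: print the command. Remove `echo` to actually start the generator.
# The token can be any value because IOx doesn't use it.
echo ./target/release/iox_data_generator \
  --spec "$SPEC" --continue --host "$WRITER_HTTP" \
  --token arbitrary --org "$ORG" --bucket "$BUCKET"
```

Since `--continue` runs until stopped, you can bound an experiment by prefixing the real command with GNU coreutils `timeout`, for example `timeout 5m`, which stops the generator after five minutes.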
You should be able to use `influxdb_iox sql -h http://127.0.0.1:8086` to connect to the gRPC API of the reader, then run `use database mlb_pirates;` and query the tables to see that the data is being inserted. For example:

```
# in your influxdb_iox checkout
cargo run --release -- sql -h http://127.0.0.1:8086
```

Connecting to the writer instance won't show any data.