iox_data_generator

The iox_data_generator tool creates random data points according to a specification and loads them into an IOx instance to simulate real data.

To build and run, first install Rust. Then, from the root of the influxdb_iox repo, run:

cargo build --release

The built binary has command-line help:

./target/release/iox_data_generator --help

For example specifications, see the schemas folder. The full_example schema is the most comprehensive, with comments and example output.

Use with two IOx servers and Kafka

The data generator tool can simulate data of various shapes being written to IOx. This section shows how to set up a local experiment, for profiling or debugging, that uses a database in two IOx instances: one that writes to Kafka and one that reads from Kafka.

If you're profiling IOx, make sure you compile and run a release build, using either:

cargo build --release
./target/release/influxdb_iox run database --server-id 1

or:

cargo run --release -- run database --server-id 1

The server ID is the only required setting for running IOx; see influxdb_iox run database --help for the other configuration options you may want to set for your experiment. By default, the HTTP API listens on 127.0.0.1:8080 (change it with --api-bind) and the gRPC API listens on 127.0.0.1:8082 (change it with --grpc-bind).

The Kafka setup needs two IOx servers, so you must give at least one of them different bind addresses to avoid port conflicts. Here's an example of the two commands to run:

cargo run --release -- run router --server-id 1
cargo run --release -- run database --server-id 2 --api-bind 127.0.0.1:8084 --grpc-bind 127.0.0.1:8086

You'll also need to run a Kafka instance. The influxdb_iox repo includes a Docker Compose file that you can start with:

docker-compose -f docker/ci-kafka-docker-compose.yml up kafka

When started this way, the Kafka instance is reachable at 127.0.0.1:9093.
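Before creating the database, it can help to confirm that Kafka and both IOx servers are actually accepting connections. The helper below is a minimal sketch and is not part of the influxdb_iox repo: wait_for_port is a hypothetical name, and it assumes a bash build with /dev/tcp redirection enabled. It is meant for local addresses only; it sets no per-connection timeout, so a remote host that silently drops packets could stall it.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the influxdb_iox repo.
# Polls HOST:PORT once per second until a TCP connection succeeds,
# giving up after TRIES attempts (default 30). Uses bash's built-in
# /dev/tcp redirection, so neither nc nor curl is needed.
wait_for_port() {
  local host=$1 port=$2 tries=${3:-30}
  local i
  for ((i = 1; i <= tries; i++)); do
    # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connect;
    # the descriptor is closed again when the subshell exits.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    # Only sleep if another attempt remains.
    if ((i < tries)); then
      sleep 1
    fi
  done
  echo "timed out waiting for ${host}:${port}" >&2
  return 1
}
```

For example, wait_for_port 127.0.0.1 9093 && wait_for_port 127.0.0.1 8082 && wait_for_port 127.0.0.1 8086 waits for Kafka, then the writer's gRPC address, then the reader's gRPC address.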

Once the two IOx servers and the Kafka instance are running, create a database whose name has the format [orgname]_[bucketname]. For example, if you name the database mlb_pirates, the org you'll use in the data generator is mlb and the bucket is pirates. Running the create_database binary applies the DatabaseRules defined in src/bin/create_database.rs, which set up the database in the "writer" IOx instance to write to Kafka and the database in the "reader" IOx instance to read from Kafka:

cargo run --release -p iox_data_generator --bin create_database -- --writer 127.0.0.1:8082 --reader 127.0.0.1:8086 mlb_pirates

This script also adds 3 rows to a writer_test table, because of an issue where the Kafka consumer needs data before it can find partitions.

Once the database is created, decide what kind of data you would like to send it. You can use an existing data generation schema in the schemas directory or create a new one, perhaps starting from an existing schema as a guide. In this example, we're going to use iox_data_generator/schemas/cap-write.toml.

Next, run the data generation tool as follows:

cargo run --release -p iox_data_generator -- --spec iox_data_generator/schemas/cap-write.toml --continue --host 127.0.0.1:8080 --token arbitrary --org mlb --bucket pirates
  • --spec iox_data_generator/schemas/cap-write.toml sets the schema you want to use to generate the data
  • --continue makes the data generation tool generate data every sampling_interval (set in the schema) until you stop it
  • --host 127.0.0.1:8080 means to write to the writer IOx server running at the default HTTP API address of 127.0.0.1:8080 (note this is NOT the gRPC address used by the create_database command)
  • --token arbitrary supplies a token; the data generator requires one but IOx doesn't use it, so any value works.
  • --org mlb is the part of the database name you created before the _
  • --bucket pirates is the part of the database name you created after the _
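Because --org and --bucket are just the two halves of the database name, the command above can be derived from the name itself. The function below is a minimal sketch, assuming bash; datagen_cmd and the DRY_RUN variable are hypothetical names introduced here, not part of the tool. It splits the name at the first underscore, so it assumes the org name contains no underscore, and it defaults the spec and host to the values used in this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the influxdb_iox repo.
# Builds the iox_data_generator command from a database name of the form
# [orgname]_[bucketname], splitting at the FIRST underscore (assumes the
# org name contains no "_"). Set DRY_RUN=1 to print instead of running.
datagen_cmd() {
  local db_name=$1
  local spec=${2:-iox_data_generator/schemas/cap-write.toml}
  local host=${3:-127.0.0.1:8080}   # writer's HTTP API address, not gRPC

  local org=${db_name%%_*}      # everything before the first "_"
  local bucket=${db_name#*_}    # everything after the first "_"
  if [[ $db_name != *_* || -z $org || -z $bucket ]]; then
    echo "database name must look like org_bucket: $db_name" >&2
    return 1
  fi

  local cmd=(cargo run --release -p iox_data_generator --
    --spec "$spec" --continue --host "$host"
    --token arbitrary --org "$org" --bucket "$bucket")

  if [[ ${DRY_RUN:-0} == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # print the space-joined command
  else
    "${cmd[@]}"
  fi
}
```

With DRY_RUN=1, datagen_cmd mlb_pirates prints the same command shown above instead of running it; pass a different schema path and writer address as the second and third arguments.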

To check that data is being inserted, use influxdb_iox sql -h http://127.0.0.1:8086 to connect to the reader's gRPC address, select the database with use database mlb_pirates; and then query the tables. That is:

# in your influxdb_iox checkout
cargo run --release -- sql -h http://127.0.0.1:8086

Connecting to the writer instance won't show any data.