Merge pull request #7560 from influxdata/cn/remove-obsolete-docs-infra

fix: Remove outdated documentation and infrastructure having to do with Kafka
pull/24376/head
kodiakhq[bot] 2023-04-14 17:25:22 +00:00 committed by GitHub
commit e2b1acf1c0
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
7 changed files with 25 additions and 208 deletions

@@ -229,8 +229,6 @@ jobs:
 # setup multiple docker images (see https://circleci.com/docs/2.0/configuration-reference/#docker)
 docker:
   - image: quay.io/influxdb/rust:ci
-  - image: vectorized/redpanda:v22.1.5
-    command: redpanda start --overprovisioned --smp 1 --memory 1G --reserve-memory 0M
   - image: postgres
     environment:
       POSTGRES_HOST_AUTH_METHOD: trust
@@ -247,7 +245,6 @@ jobs:
 # Run integration tests
 TEST_INTEGRATION: 1
 INFLUXDB_IOX_INTEGRATION_LOCAL: 1
-KAFKA_CONNECT: "localhost:9092"
 POSTGRES_USER: postgres
 TEST_INFLUXDB_IOX_CATALOG_DSN: "postgres://postgres@localhost/iox_shared"
 # When removing this, also remove the ignore on the test in trogging/src/cli.rs
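A minimal sketch, not part of this commit, of the same integration-test environment this CI job sets after the change, exported in a local shell. It assumes postgres is reachable on localhost with trust authentication, as the job's `POSTGRES_HOST_AUTH_METHOD: trust` service provides. The DSN is copied verbatim from the job; if the `iox_shared` database does not exist on your local server, the `postgresql://postgres@localhost:5432/postgres` DSN used in the testing docs is the alternative.

```shell
# Mirror the CI job's integration-test environment (postgres only, no Kafka).
# Values are copied from the job's environment block; adjust the DSN locally.
export TEST_INTEGRATION=1
export INFLUXDB_IOX_INTEGRATION_LOCAL=1
export POSTGRES_USER=postgres
export TEST_INFLUXDB_IOX_CATALOG_DSN="postgres://postgres@localhost/iox_shared"

# This commit drops KAFKA_CONNECT from the job, so clear any value left over
# from an older Kafka-based setup.
unset KAFKA_CONNECT
```

After sourcing these lines, `cargo test --workspace` runs with the same variables the job sets.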

@@ -1,3 +1,2 @@
 # Ignore everything
 **
-!docker/redpanda.gpg

@@ -42,26 +42,6 @@ Here are useful metrics
 | query_access_pruned_chunks_total | pruned_chunks | Number of chunks of a table pruned while running queries |
 | query_access_pruned_rows_total | pruned_rows | Number of chunks of a table pruned while running queries |
-### Read buffer RUB
-| Metric name | Code Name | Description |
-| --- | --- | --- |
-| read_buffer_column_total | columns_total | Total number of columns in read buffer |
-| read_buffer_column_values | column_values_total | Total number of values stored in read buffer column encodings, further segmented by nullness |
-| read_buffer_column_raw_bytes | column_raw_bytes_total | Estimated uncompressed data size for read buffer columns, further segmented by nullness |
-### Ingest Request (from Kafka to Query Server)
-| Metric name | Code Name | Description |
-| --- | --- | --- |
-| write_buffer_ingest_requests_total | red | Total number of write requests |
-| write_buffer_read_bytes_total | bytes_read | Total number of write requested bytes |
-| write_buffer_last_sequence_number | last_sequence_number | sequence number of last write request |
-| write_buffer_sequence_number_lag | sequence_number_lag | The difference between the the last sequence number available (e.g. Kafka offset) and (= minus) last consumed sequence number |
-| write_buffer_last_min_ts | last_min_ts | Minimum timestamp of last write as unix timestamp in nanoseconds |
-| write_buffer_last_max_ts | last_max_ts | Maximum timestamp of last write as unix timestamp in nanoseconds |
-| write_buffer_last_ingest_ts | last_ingest_ts | Last seen ingest timestamp as unix timestamp in nanoseconds |
 ### jemalloc
 | Metric name | Code Name | Description |
 | --- | --- | --- |

@@ -24,11 +24,7 @@ The end to end tests are run using the `cargo test --test end_to_end` command, a
 `TEST_INTEGRATION` and `TEST_INFLUXDB_IOX_CATALOG_DSN` environment variables. NOTE if you don't set
 these variables the tests will "pass" locally (really they will be skipped).
-By default, the integration tests for the Kafka-based write buffer are not run. To run these
-you need to set the `KAFKA_CONNECT` environment variable and `TEST_INTEGRATION=1`.
-For example, you can run this docker compose to get redpanda (a kafka-compatible message queue)
-and postgres running:
+For example, you can run this docker compose to get postgres running:
 ```shell
 docker-compose -f integration-docker-compose.yml up
@@ -38,12 +34,11 @@ In another terminal window, you can run:
 ```shell
 export TEST_INTEGRATION=1
-export KAFKA_CONNECT=localhost:9092
 export TEST_INFLUXDB_IOX_CATALOG_DSN=postgresql://postgres@localhost:5432/postgres
 cargo test --workspace
 ```
-Or for just the end-to-end tests (and not general tests or kafka):
+Or for just the end-to-end tests (and not general tests):
 ```shell
 TEST_INTEGRATION=1 TEST_INFLUXDB_IOX_CATALOG_DSN=postgresql://postgres@localhost:5432/postgres cargo test --test end_to_end
@@ -72,31 +67,6 @@ You can also see more logging using the `LOG_FILTER` variable. For example:
 LOG_FILTER=debug,sqlx=warn,h2=warn
 ```
-## Object storage
-### To run the tests or not run the tests
-If you are testing integration with some or all of the object storage options, you'll have more
-setup to do.
-By default, `cargo test -p object_store` does not run any tests that actually contact
-any cloud services: tests that do contact the services will silently pass.
-To run integration tests, use `TEST_INTEGRATION=1 cargo test -p object_store`, which will run the
-tests that contact the cloud services and fail them if the required environment variables aren't
-set.
-### Configuration differences when running the tests
-When running `influxdb_iox run`, you can pick one object store to use. When running the tests, you
-can run them against all the possible object stores. There's still only one `INFLUXDB_IOX_BUCKET`
-variable, though, so that will set the bucket name for all configured object stores. Use the same
-bucket name when setting up the different services.
-Other than possibly configuring multiple object stores, configuring the tests to use the object
-store services is the same as configuring the server to use an object store service. See the output
-of `influxdb_iox run --help` for instructions.
 ## InfluxDB 2 Client
 The `influxdb2_client` crate may be used by people using InfluxDB 2.0 OSS, and should be compatible
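The testing docs in this file warn that, without `TEST_INTEGRATION` and `TEST_INFLUXDB_IOX_CATALOG_DSN`, the end-to-end tests are skipped yet reported as passing. A minimal sketch of a guard, not part of this commit, that refuses to start `cargo test --test end_to_end` until both are set; `require_vars` and `run_end_to_end` are hypothetical helper names, written in POSIX sh so they also work outside bash.

```shell
# Hypothetical helper: fail loudly when a required variable is unset or empty,
# instead of letting the integration tests skip themselves and "pass".
require_vars() {
  missing=0
  for name in "$@"; do
    # Indirect lookup that works in POSIX sh (no bash-only ${!name}).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "error: $name is not set; the integration tests would be skipped" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Hypothetical wrapper around the end-to-end test command from the docs;
# extra arguments are forwarded to cargo test.
run_end_to_end() {
  require_vars TEST_INTEGRATION TEST_INFLUXDB_IOX_CATALOG_DSN || return 1
  cargo test --test end_to_end "$@"
}
```

With both variables exported as shown in the testing docs, `run_end_to_end` behaves like the plain `cargo test --test end_to_end` command; with either missing, it exits non-zero before compiling anything.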

@@ -19,22 +19,18 @@ cd influxdb_iox
 cargo build --release --features=pprof
 ```
-You can also install the `influxdb_iox` command locally via
+You can also install the `influxdb_iox` command locally via
 ```shell
 cd influxdb_iox
 cargo install --path influxdb_iox
 ```
-## Step 2: Start kafka and postgres
+## Step 2: Start postgres
-Now, start up kafka and postgres locally in docker containers:
+Now, start up postgres locally in a docker container:
 ```shell
-# get rskafka from https://github.com/influxdata/rskafka
-cd rskafka
-# Run kafka on localhost:9010
-docker-compose -f docker-compose-kafka.yml up &
-# now run postgres
+# Run postgres
 docker run -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres &
 ```
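`docker run ... postgres &` returns immediately, while the server inside the container needs a few seconds before it accepts connections, so the catalog setup step can fail if it is started right away. A minimal sketch of a wait helper, not part of this commit; it assumes exactly one running container created from the `postgres` image and relies on `pg_isready`, which ships in the official postgres image. `wait_for_postgres` is a hypothetical name.

```shell
# Hypothetical helper: poll the postgres container until it accepts
# connections, trying once per second up to $1 times (default 30).
wait_for_postgres() {
  tries=${1:-30}
  # Pick the first running container created from the postgres image.
  id=$(docker ps -q --filter ancestor=postgres | head -n 1)
  if [ -z "$id" ]; then
    echo "no running postgres container found" >&2
    return 1
  fi
  while [ "$tries" -gt 0 ]; do
    # pg_isready exits 0 once the server accepts connections.
    if docker exec "$id" pg_isready -U postgres >/dev/null 2>&1; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "postgres did not become ready in time" >&2
  return 1
}
```

Calling `wait_for_postgres` between the `docker run` line and `influxdb_iox catalog setup` turns a startup race into an explicit wait with a clear error if the container never comes up.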
@@ -47,19 +43,13 @@ you have postgres running locally on port 5432).
 ```shell
 # initialize the catalog
-INFLUXDB_IOX_WRITE_BUFFER_TYPE=kafka \
-INFLUXDB_IOX_WRITE_BUFFER_ADDR=localhost:9010 \
-INFLUXDB_IOX_WRITE_BUFFER_AUTO_CREATE_TOPICS=10 \
 INFLUXDB_IOX_CATALOG_DSN=postgres://postgres@localhost:5432/postgres \
 OBJECT_STORE=file \
 DATABASE_DIRECTORY=~/data_dir \
 LOG_FILTER=debug \
 ./target/release/influxdb_iox catalog setup
-# initialize the kafka topic
-INFLUXDB_IOX_WRITE_BUFFER_TYPE=kafka \
-INFLUXDB_IOX_WRITE_BUFFER_ADDR=localhost:9010 \
-INFLUXDB_IOX_WRITE_BUFFER_AUTO_CREATE_TOPICS=10 \
+# initialize the topic
 INFLUXDB_IOX_CATALOG_DSN=postgres://postgres@localhost:5432/postgres \
 OBJECT_STORE=file \
 DATABASE_DIRECTORY=~/data_dir \
@@ -67,10 +57,10 @@ LOG_FILTER=debug \
 ./target/release/influxdb_iox catalog topic update iox-shared
 ```
-## Inspecting Catalog and Kafka / Redpanda state
+## Inspecting Catalog state
 Depending on what you are trying to do, you may want to inspect the
-catalog and/or the contents of Kafka / Redpands.
+catalog.
 You can run psql like this to inspect the catalog:
 ```shell
@@ -111,41 +101,13 @@ postgres=# \d
 postgres=#
 ```
-You can mess with redpanda using `docker exec redpanda-0 rpk` like this:
-```shell
-$ docker exec redpanda-0 rpk topic list
-NAME PARTITIONS REPLICAS
-iox-shared 1 1
-```
 # Step 4: Run the services
-## Run Router on port 8080/8081 (http/grpc)
-```shell
-INFLUXDB_IOX_BIND_ADDR=localhost:8080 \
-INFLUXDB_IOX_GRPC_BIND_ADDR=localhost:8081 \
-INFLUXDB_IOX_WRITE_BUFFER_TYPE=kafka \
-INFLUXDB_IOX_WRITE_BUFFER_ADDR=localhost:9010 \
-INFLUXDB_IOX_WRITE_BUFFER_AUTO_CREATE_TOPICS=10 \
-INFLUXDB_IOX_CATALOG_DSN=postgres://postgres@localhost:5432/postgres \
-OBJECT_STORE=file \
-DATABASE_DIRECTORY=~/data_dir \
-LOG_FILTER=info \
-./target/release/influxdb_iox run router
-```
 ## Run Ingester on port 8083/8083 (http/grpc)
 ```shell
 INFLUXDB_IOX_BIND_ADDR=localhost:8083 \
 INFLUXDB_IOX_GRPC_BIND_ADDR=localhost:8084 \
-INFLUXDB_IOX_WRITE_BUFFER_TYPE=kafka \
-INFLUXDB_IOX_WRITE_BUFFER_ADDR=localhost:9010 \
-xINFLUXDB_IOX_WRITE_BUFFER_AUTO_CREATE_TOPICS=10 \
-INFLUXDB_IOX_SHARD_INDEX_RANGE_START=0 \
-INFLUXDB_IOX_SHARD_INDEX_RANGE_END=0 \
 INFLUXDB_IOX_PAUSE_INGEST_SIZE_BYTES=5000000000 \
 INFLUXDB_IOX_PERSIST_MEMORY_THRESHOLD_BYTES=4000000000 \
 INFLUXDB_IOX_CATALOG_DSN=postgres://postgres@localhost:5432/postgres \
@@ -153,15 +115,26 @@ INFLUXDB_IOX_MAX_HTTP_REQUEST_SIZE=100000000 \
 OBJECT_STORE=file \
 DATABASE_DIRECTORY=~/data_dir \
 LOG_FILTER=info \
-./target/release/influxdb_iox run ingester
+./target/release/influxdb_iox run ingester2
 ```
+## Run Router on port 8080/8081 (http/grpc)
+```shell
+INFLUXDB_IOX_BIND_ADDR=localhost:8080 \
+INFLUXDB_IOX_GRPC_BIND_ADDR=localhost:8081 \
+INFLUXDB_IOX_CATALOG_DSN=postgres://postgres@localhost:5432/postgres \
+OBJECT_STORE=file \
+DATABASE_DIRECTORY=~/data_dir \
+LOG_FILTER=info \
+./target/release/influxdb_iox run router2
+```
 # Step 5: Ingest data
 You can load data using the influxdb_iox client:
 ```shell
-influxdb_iox --host=http://localhost:8080 -v write test_db test_fixtures/lineproto/*.lp
+influxdb_iox --host=http://localhost:8080 -v write test_db test_fixtures/lineproto/*.lp
 ```
 Now you can post data to `http://localhost:8080` with your favorite load generating tool
@@ -180,21 +153,14 @@ data. The default settings at the time of this writing would result in
 posting fairly large requests (necessitating the
 `INFLUXDB_IOX_MAX_HTTP_REQUEST_SIZE` setting above)
 # Step 6: Profile
 See [`profiling.md`](./profiling.md).
 # Step 7: Clean up local state
-If you find yourself needing to clean up postgres / kafka state use these commands:
+If you find yourself needing to clean up postgres state, use this command:
 ```shell
 docker ps -a -q | xargs docker stop
-docker rm rskafka_proxy_1
-docker rm rskafka_kafka-0_1
-docker rm rskafka_kafka-1_1
-docker rm rskafka_kafka-2_1
-docker rm rskafka_zookeeper_1
-docker volume rm rskafka_kafka_0_data rskafka_kafka_1_data rskafka_kafka_2_data rskafka_zookeeper_data
 ```
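The retained cleanup command, `docker ps -a -q | xargs docker stop`, stops every container on the machine, not only postgres. A narrower sketch, not part of this commit, assuming postgres was started from the `postgres` image as in Step 2: it force-removes matching containers together with their anonymous volumes (`-v`), which is where the official image keeps its data by default. The `ancestor` filter also matches any other container created from that image, so check `docker ps` first if you run unrelated postgres containers. `cleanup_postgres` is a hypothetical name.

```shell
# Hypothetical helper: remove only containers created from the postgres
# image, plus their anonymous data volumes, leaving other containers alone.
cleanup_postgres() {
  ids=$(docker ps -a -q --filter ancestor=postgres)
  if [ -z "$ids" ]; then
    echo "no postgres containers to remove"
    return 0
  fi
  # $ids is intentionally unquoted so each container ID becomes its own argument.
  # -f stops running containers first; -v drops their anonymous volumes.
  docker rm -f -v $ids
}
```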

@@ -1,12 +1,5 @@
 version: "3.9"
 services:
-  redpanda:
-    pull_policy: always
-    image: docker.vectorized.io/vectorized/redpanda:latest
-    ports:
-      - 9092:9092
-      - 9644:9644
-    command: start --overprovisioned --smp 1 --memory 1G --reserve-memory 0M --node-id 0 --check=false
   postgres:
     pull_policy: always
     image: postgres:latest

@@ -17,91 +17,3 @@ And the built binary has command line help:
 For examples of specifications see the [schemas folder](schemas). The [full_example](schemas/full_example.toml) is the
 most comprehensive with comments and example output.
-## Use with two IOx servers and Kafka
-The data generator tool can be used to simulate data being written to IOx in various shapes. This
-is how to set up a local experiment for profiling or debugging purposes using a database in two IOx
-instances: one writing to Kafka and one reading from Kafka.
-If you're profiling IOx, be sure you've compiled and are running a release build using either:
-```
-cargo build --release
-./target/release/influxdb_iox run database --server-id 1
-```
-or:
-```
-cargo run --release -- run database --server-id 1
-```
-Server ID is the only required attribute for running IOx; see `influxdb_iox run database --help` for all the
-other configuration options for the server you may want to set for your experiment. Note that the
-default HTTP API address is `127.0.0.1:8080` unless you set something different with `--api-bind`
-and the default gRPC address is `127.0.0.1:8082` unless you set something different using
-`--grpc-bind`.
-For the Kafka setup, you'll need to start two IOx servers, so you'll need to set the bind addresses
-for at least one of them. Here's an example of the two commands to run:
-```
-cargo run --release -- run router --server-id 1
-cargo run --release -- run database --server-id 2 --api-bind 127.0.0.1:8084 --grpc-bind 127.0.0.1:8086
-```
-You'll also need to run a Kafka instance. There's a Docker compose script in the influxdb_iox
-repo you can run with:
-```
-docker-compose -f docker/ci-kafka-docker-compose.yml up kafka
-```
-The Kafka instance will be accessible from `127.0.0.1:9093` if you run it with this script.
-Once you have the two IOx servers and one Kafka instance running, create a database with a name in
-the format `[orgname]_[bucketname]`. For example, create a database in IOx named `mlb_pirates`, and
-the org you'll use in the data generator will be `mlb` and the bucket will be `pirates`. The
-`DatabaseRules` defined in `src/bin/create_database.rs` will set up a database in the "writer" IOx
-instance to write to Kafka and the database in the "reader" IOx instance to read from Kafka if
-you run it with:
-```
-cargo run --release -p iox_data_generator --bin create_database -- --writer 127.0.0.1:8082 --reader 127.0.0.1:8086 mlb_pirates
-```
-This script adds 3 rows to a `writer_test` table because [this issue with the Kafka Consumer
-needing data before it can find partitions](https://github.com/influxdata/influxdb_iox/issues/2189).
-Once the database is created, decide what kind of data you would like to send it. You can use an
-existing data generation schema in the `schemas` directory or create a new one, perhaps starting
-from an existing schema as a guide. In this example, we're going to use
-`iox_data_generator/schemas/cap-write.toml`.
-Next, run the data generation tool as follows:
-```
-cargo run --release -p iox_data_generator -- --spec iox_data_generator/schemas/cap-write.toml --continue --host 127.0.0.1:8080 --token arbitrary --org mlb --bucket pirates
-```
-- `--spec iox_data_generator/schemas/cap-write.toml` sets the schema you want to use to generate the data
-- `--continue` means the data generation tool should generate data every `sampling_interval` (which
-is set in the schema) until we stop it
-- `--host 127.0.0.1:8080` means to write to the writer IOx server running at the default HTTP API address
-of `127.0.0.1:8080` (note this is NOT the gRPC address used by the `create_database` command)
-- `--token arbitrary` - the data generator requires a token value but IOx doesn't use it, so this
-can be any value.
-- `--org mlb` is the part of the database name you created before the `_`
-- `--bucket pirates` is the part of the database name you created after the `_`
-You should be able to use `influxdb_iox sql -h http://127.0.0.1:8086` to connect to the gRPC of the reader
-then `use database mlb_pirates;` and query the tables to see that the data is being inserted. That
-is,
-```
-# in your influxdb_iox checkout
-cargo run --release -- sql -h http://127.0.0.1:8086
-```
-Connecting to the writer instance won't show any data.