Merge pull request #7560 from influxdata/cn/remove-obsolete-docs-infra
fix: Remove outdated documentation and infrastructure having to do with Kafka
commit e2b1acf1c0
@@ -229,8 +229,6 @@ jobs:
     # setup multiple docker images (see https://circleci.com/docs/2.0/configuration-reference/#docker)
     docker:
       - image: quay.io/influxdb/rust:ci
-      - image: vectorized/redpanda:v22.1.5
-        command: redpanda start --overprovisioned --smp 1 --memory 1G --reserve-memory 0M
       - image: postgres
         environment:
           POSTGRES_HOST_AUTH_METHOD: trust
@@ -247,7 +245,6 @@ jobs:
       # Run integration tests
       TEST_INTEGRATION: 1
       INFLUXDB_IOX_INTEGRATION_LOCAL: 1
-      KAFKA_CONNECT: "localhost:9092"
       POSTGRES_USER: postgres
       TEST_INFLUXDB_IOX_CATALOG_DSN: "postgres://postgres@localhost/iox_shared"
      # When removing this, also remove the ignore on the test in trogging/src/cli.rs
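To reproduce this CI integration-test environment locally, a minimal sketch; the container name `ci-postgres` is an arbitrary choice, and it assumes the `iox_shared` database named in the DSN above needs to be created by hand:

```shell
# Start postgres the way CI does: trust auth, default port.
docker run -d --name ci-postgres -p 5432:5432 \
  -e POSTGRES_HOST_AUTH_METHOD=trust postgres

# Wait until the server accepts connections.
until docker exec ci-postgres pg_isready; do sleep 1; done

# Create the catalog database referenced by the DSN above.
docker exec ci-postgres createdb -U postgres iox_shared

# Mirror the CI environment variables and run the tests.
export TEST_INTEGRATION=1
export INFLUXDB_IOX_INTEGRATION_LOCAL=1
export POSTGRES_USER=postgres
export TEST_INFLUXDB_IOX_CATALOG_DSN="postgres://postgres@localhost/iox_shared"
cargo test --workspace
```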
@@ -1,3 +1,2 @@
 # Ignore everything
 **
-!docker/redpanda.gpg
@@ -42,26 +42,6 @@ Here are useful metrics
 | query_access_pruned_chunks_total | pruned_chunks | Number of chunks of a table pruned while running queries |
 | query_access_pruned_rows_total | pruned_rows | Number of rows of a table pruned while running queries |
 
-
-### Read buffer RUB
-| Metric name | Code Name | Description |
-| --- | --- | --- |
-| read_buffer_column_total | columns_total | Total number of columns in read buffer |
-| read_buffer_column_values | column_values_total | Total number of values stored in read buffer column encodings, further segmented by nullness |
-| read_buffer_column_raw_bytes | column_raw_bytes_total | Estimated uncompressed data size for read buffer columns, further segmented by nullness |
-
-
-### Ingest Request (from Kafka to Query Server)
-| Metric name | Code Name | Description |
-| --- | --- | --- |
-| write_buffer_ingest_requests_total | red | Total number of write requests |
-| write_buffer_read_bytes_total | bytes_read | Total number of bytes read for write requests |
-| write_buffer_last_sequence_number | last_sequence_number | Sequence number of the last write request |
-| write_buffer_sequence_number_lag | sequence_number_lag | The difference between the last available sequence number (e.g. the Kafka offset) and the last consumed sequence number |
-| write_buffer_last_min_ts | last_min_ts | Minimum timestamp of last write as unix timestamp in nanoseconds |
-| write_buffer_last_max_ts | last_max_ts | Maximum timestamp of last write as unix timestamp in nanoseconds |
-| write_buffer_last_ingest_ts | last_ingest_ts | Last seen ingest timestamp as unix timestamp in nanoseconds |
-
 ### jemalloc
 | Metric name | Code Name | Description |
 | --- | --- | --- |
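For spot-checking the remaining metrics, a minimal sketch; both the `/metrics` path and the HTTP port 8080 are assumptions about how the server exposes its Prometheus-format metrics:

```shell
# Fetch the metrics endpoint and filter for the pruning counters listed above.
curl -s http://localhost:8080/metrics | grep query_access_pruned
```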
@@ -24,11 +24,7 @@ The end to end tests are run using the `cargo test --test end_to_end` command, a
 `TEST_INTEGRATION` and `TEST_INFLUXDB_IOX_CATALOG_DSN` environment variables. NOTE if you don't set
 these variables the tests will "pass" locally (really they will be skipped).
 
-By default, the integration tests for the Kafka-based write buffer are not run. To run these
-you need to set the `KAFKA_CONNECT` environment variable and `TEST_INTEGRATION=1`.
-
-For example, you can run this docker compose to get redpanda (a kafka-compatible message queue)
-and postgres running:
+For example, you can run this docker compose to get postgres running:
 
 ```shell
 docker-compose -f integration-docker-compose.yml up
@@ -38,12 +34,11 @@ In another terminal window, you can run:
 
 ```shell
 export TEST_INTEGRATION=1
-export KAFKA_CONNECT=localhost:9092
 export TEST_INFLUXDB_IOX_CATALOG_DSN=postgresql://postgres@localhost:5432/postgres
 cargo test --workspace
 ```
 
-Or for just the end-to-end tests (and not general tests or kafka):
+Or for just the end-to-end tests (and not general tests):
 
 ```shell
 TEST_INTEGRATION=1 TEST_INFLUXDB_IOX_CATALOG_DSN=postgresql://postgres@localhost:5432/postgres cargo test --test end_to_end
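Since the tests silently skip when the catalog is unreachable, it can save confusion to first verify that the DSN above actually connects. A quick check, assuming the `psql` client is installed locally:

```shell
# Exits non-zero if the catalog database is unreachable.
psql "postgresql://postgres@localhost:5432/postgres" -c 'SELECT 1;'
```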
@@ -72,31 +67,6 @@ You can also see more logging using the `LOG_FILTER` variable. For example:
 LOG_FILTER=debug,sqlx=warn,h2=warn
 ```
 
-## Object storage
-
-### To run the tests or not run the tests
-
-If you are testing integration with some or all of the object storage options, you'll have more
-setup to do.
-
-By default, `cargo test -p object_store` does not run any tests that actually contact
-any cloud services: tests that do contact the services will silently pass.
-
-To run integration tests, use `TEST_INTEGRATION=1 cargo test -p object_store`, which will run the
-tests that contact the cloud services and fail them if the required environment variables aren't
-set.
-
-### Configuration differences when running the tests
-
-When running `influxdb_iox run`, you can pick one object store to use. When running the tests, you
-can run them against all the possible object stores. There's still only one `INFLUXDB_IOX_BUCKET`
-variable, though, so that will set the bucket name for all configured object stores. Use the same
-bucket name when setting up the different services.
-
-Other than possibly configuring multiple object stores, configuring the tests to use the object
-store services is the same as configuring the server to use an object store service. See the output
-of `influxdb_iox run --help` for instructions.
-
 ## InfluxDB 2 Client
 
 The `influxdb2_client` crate may be used by people using InfluxDB 2.0 OSS, and should be compatible
@@ -19,22 +19,18 @@ cd influxdb_iox
 cargo build --release --features=pprof
 ```
 
 You can also install the `influxdb_iox` command locally via
 
 ```shell
 cd influxdb_iox
 cargo install --path influxdb_iox
 ```
 
-## Step 2: Start kafka and postgres
+## Step 2: Start postgres
 
-Now, start up kafka and postgres locally in docker containers:
+Now, start up postgres locally in a docker container:
 ```shell
-# get rskafka from https://github.com/influxdata/rskafka
-cd rskafka
-# Run kafka on localhost:9010
-docker-compose -f docker-compose-kafka.yml up &
-# now run postgres
+# Run postgres
 docker run -p 5432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust postgres &
 ```
 
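To confirm postgres is accepting connections before moving on, a quick check; it assumes the `pg_isready` client tool is installed on the host:

```shell
# Poll until postgres on the default port answers.
until pg_isready -h localhost -p 5432; do sleep 1; done
```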
@@ -47,19 +43,13 @@ you have postgres running locally on port 5432).
 
 ```shell
 # initialize the catalog
-INFLUXDB_IOX_WRITE_BUFFER_TYPE=kafka \
-INFLUXDB_IOX_WRITE_BUFFER_ADDR=localhost:9010 \
-INFLUXDB_IOX_WRITE_BUFFER_AUTO_CREATE_TOPICS=10 \
 INFLUXDB_IOX_CATALOG_DSN=postgres://postgres@localhost:5432/postgres \
 OBJECT_STORE=file \
 DATABASE_DIRECTORY=~/data_dir \
 LOG_FILTER=debug \
 ./target/release/influxdb_iox catalog setup
 
-# initialize the kafka topic
-INFLUXDB_IOX_WRITE_BUFFER_TYPE=kafka \
-INFLUXDB_IOX_WRITE_BUFFER_ADDR=localhost:9010 \
-INFLUXDB_IOX_WRITE_BUFFER_AUTO_CREATE_TOPICS=10 \
+# initialize the topic
 INFLUXDB_IOX_CATALOG_DSN=postgres://postgres@localhost:5432/postgres \
 OBJECT_STORE=file \
 DATABASE_DIRECTORY=~/data_dir \
@@ -67,10 +57,10 @@ LOG_FILTER=debug \
 ./target/release/influxdb_iox catalog topic update iox-shared
 ```
 
-## Inspecting Catalog and Kafka / Redpanda state
+## Inspecting Catalog state
 
 Depending on what you are trying to do, you may want to inspect the
-catalog and/or the contents of Kafka / Redpanda.
+catalog.
 
 You can run psql like this to inspect the catalog:
 ```shell
@@ -111,41 +101,13 @@ postgres=# \d
 postgres=#
 ```
 
-You can mess with redpanda using `docker exec redpanda-0 rpk` like this:
-
-```shell
-$ docker exec redpanda-0 rpk topic list
-NAME        PARTITIONS  REPLICAS
-iox-shared  1           1
-```
-
 
 # Step 4: Run the services
 
-## Run Router on port 8080/8081 (http/grpc)
-```shell
-INFLUXDB_IOX_BIND_ADDR=localhost:8080 \
-INFLUXDB_IOX_GRPC_BIND_ADDR=localhost:8081 \
-INFLUXDB_IOX_WRITE_BUFFER_TYPE=kafka \
-INFLUXDB_IOX_WRITE_BUFFER_ADDR=localhost:9010 \
-INFLUXDB_IOX_WRITE_BUFFER_AUTO_CREATE_TOPICS=10 \
-INFLUXDB_IOX_CATALOG_DSN=postgres://postgres@localhost:5432/postgres \
-OBJECT_STORE=file \
-DATABASE_DIRECTORY=~/data_dir \
-LOG_FILTER=info \
-./target/release/influxdb_iox run router
-```
-
 ## Run Ingester on port 8083/8084 (http/grpc)
 
 ```shell
 INFLUXDB_IOX_BIND_ADDR=localhost:8083 \
 INFLUXDB_IOX_GRPC_BIND_ADDR=localhost:8084 \
-INFLUXDB_IOX_WRITE_BUFFER_TYPE=kafka \
-INFLUXDB_IOX_WRITE_BUFFER_ADDR=localhost:9010 \
-INFLUXDB_IOX_WRITE_BUFFER_AUTO_CREATE_TOPICS=10 \
 INFLUXDB_IOX_SHARD_INDEX_RANGE_START=0 \
 INFLUXDB_IOX_SHARD_INDEX_RANGE_END=0 \
 INFLUXDB_IOX_PAUSE_INGEST_SIZE_BYTES=5000000000 \
 INFLUXDB_IOX_PERSIST_MEMORY_THRESHOLD_BYTES=4000000000 \
 INFLUXDB_IOX_CATALOG_DSN=postgres://postgres@localhost:5432/postgres \
@@ -153,15 +115,26 @@ INFLUXDB_IOX_MAX_HTTP_REQUEST_SIZE=100000000 \
 OBJECT_STORE=file \
 DATABASE_DIRECTORY=~/data_dir \
 LOG_FILTER=info \
-./target/release/influxdb_iox run ingester
+./target/release/influxdb_iox run ingester2
 ```
 
+## Run Router on port 8080/8081 (http/grpc)
+
+```shell
+INFLUXDB_IOX_BIND_ADDR=localhost:8080 \
+INFLUXDB_IOX_GRPC_BIND_ADDR=localhost:8081 \
+INFLUXDB_IOX_CATALOG_DSN=postgres://postgres@localhost:5432/postgres \
+OBJECT_STORE=file \
+DATABASE_DIRECTORY=~/data_dir \
+LOG_FILTER=info \
+./target/release/influxdb_iox run router2
+```
+
 # Step 5: Ingest data
 
 You can load data using the influxdb_iox client:
 ```shell
 influxdb_iox --host=http://localhost:8080 -v write test_db test_fixtures/lineproto/*.lp
 ```
 
 Now you can post data to `http://localhost:8080` with your favorite load generating tool
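If you'd rather not use the `influxdb_iox` client, a hedged sketch of a raw write; it assumes the router exposes an InfluxDB v2-compatible write endpoint and that the org and bucket parameters combine into the database name (splitting `test_db` as org `test` and bucket `db` is an assumption):

```shell
# Post one line of line protocol; endpoint and org/bucket mapping are assumptions.
curl -s "http://localhost:8080/api/v2/write?org=test&bucket=db" \
  --data-binary 'cpu,host=a usage=0.5 1650000000000000000'
```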
@@ -180,21 +153,14 @@ data. The default settings at the time of this writing would result in
 posting fairly large requests (necessitating the
 `INFLUXDB_IOX_MAX_HTTP_REQUEST_SIZE` setting above)
 
 
 # Step 6: Profile
 
 See [`profiling.md`](./profiling.md).
 
 
 # Step 7: Clean up local state
 
-If you find yourself needing to clean up postgres / kafka state use these commands:
+If you find yourself needing to clean up postgres state, use this command:
 
 ```shell
 docker ps -a -q | xargs docker stop
-docker rm rskafka_proxy_1
-docker rm rskafka_kafka-0_1
-docker rm rskafka_kafka-1_1
-docker rm rskafka_kafka-2_1
-docker rm rskafka_zookeeper_1
-docker volume rm rskafka_kafka_0_data rskafka_kafka_1_data rskafka_kafka_2_data rskafka_zookeeper_data
 ```
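Stopping containers leaves them on disk; to delete them as well, a minimal follow-up (note this removes every stopped container on the machine, not just postgres):

```shell
# Remove all stopped containers; use with care on a shared machine.
docker container prune -f
```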
@@ -1,12 +1,5 @@
 version: "3.9"
 services:
-  redpanda:
-    pull_policy: always
-    image: docker.vectorized.io/vectorized/redpanda:latest
-    ports:
-      - 9092:9092
-      - 9644:9644
-    command: start --overprovisioned --smp 1 --memory 1G --reserve-memory 0M --node-id 0 --check=false
   postgres:
     pull_policy: always
     image: postgres:latest
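With redpanda removed, this compose file now only defines postgres. To bring up just that service, something like the following (the file name comes from the testing docs above):

```shell
docker-compose -f integration-docker-compose.yml up postgres
```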
@@ -17,91 +17,3 @@ And the built binary has command line help:
 
 For examples of specifications see the [schemas folder](schemas). The [full_example](schemas/full_example.toml) is the
 most comprehensive with comments and example output.
-
-## Use with two IOx servers and Kafka
-
-The data generator tool can be used to simulate data being written to IOx in various shapes. This
-is how to set up a local experiment for profiling or debugging purposes using a database in two IOx
-instances: one writing to Kafka and one reading from Kafka.
-
-If you're profiling IOx, be sure you've compiled and are running a release build using either:
-
-```
-cargo build --release
-./target/release/influxdb_iox run database --server-id 1
-```
-
-or:
-
-```
-cargo run --release -- run database --server-id 1
-```
-
-Server ID is the only required attribute for running IOx; see `influxdb_iox run database --help` for all the
-other configuration options for the server you may want to set for your experiment. Note that the
-default HTTP API address is `127.0.0.1:8080` unless you set something different with `--api-bind`
-and the default gRPC address is `127.0.0.1:8082` unless you set something different using
-`--grpc-bind`.
-
-For the Kafka setup, you'll need to start two IOx servers, so you'll need to set the bind addresses
-for at least one of them. Here's an example of the two commands to run:
-
-```
-cargo run --release -- run router --server-id 1
-cargo run --release -- run database --server-id 2 --api-bind 127.0.0.1:8084 --grpc-bind 127.0.0.1:8086
-```
-
-You'll also need to run a Kafka instance. There's a Docker compose script in the influxdb_iox
-repo you can run with:
-
-```
-docker-compose -f docker/ci-kafka-docker-compose.yml up kafka
-```
-
-The Kafka instance will be accessible from `127.0.0.1:9093` if you run it with this script.
-
-Once you have the two IOx servers and one Kafka instance running, create a database with a name in
-the format `[orgname]_[bucketname]`. For example, create a database in IOx named `mlb_pirates`, and
-the org you'll use in the data generator will be `mlb` and the bucket will be `pirates`. The
-`DatabaseRules` defined in `src/bin/create_database.rs` will set up a database in the "writer" IOx
-instance to write to Kafka and the database in the "reader" IOx instance to read from Kafka if
-you run it with:
-
-```
-cargo run --release -p iox_data_generator --bin create_database -- --writer 127.0.0.1:8082 --reader 127.0.0.1:8086 mlb_pirates
-```
-
-This script adds 3 rows to a `writer_test` table because of [this issue with the Kafka Consumer
-needing data before it can find partitions](https://github.com/influxdata/influxdb_iox/issues/2189).
-
-Once the database is created, decide what kind of data you would like to send it. You can use an
-existing data generation schema in the `schemas` directory or create a new one, perhaps starting
-from an existing schema as a guide. In this example, we're going to use
-`iox_data_generator/schemas/cap-write.toml`.
-
-Next, run the data generation tool as follows:
-
-```
-cargo run --release -p iox_data_generator -- --spec iox_data_generator/schemas/cap-write.toml --continue --host 127.0.0.1:8080 --token arbitrary --org mlb --bucket pirates
-```
-
-- `--spec iox_data_generator/schemas/cap-write.toml` sets the schema you want to use to generate the data
-- `--continue` means the data generation tool should generate data every `sampling_interval` (which
-  is set in the schema) until we stop it
-- `--host 127.0.0.1:8080` means to write to the writer IOx server running at the default HTTP API address
-  of `127.0.0.1:8080` (note this is NOT the gRPC address used by the `create_database` command)
-- `--token arbitrary` - the data generator requires a token value but IOx doesn't use it, so this
-  can be any value.
-- `--org mlb` is the part of the database name you created before the `_`
-- `--bucket pirates` is the part of the database name you created after the `_`
-
-You should be able to use `influxdb_iox sql -h http://127.0.0.1:8086` to connect to the gRPC of the reader,
-then `use database mlb_pirates;` and query the tables to see that the data is being inserted. That
-is,
-
-```
-# in your influxdb_iox checkout
-cargo run --release -- sql -h http://127.0.0.1:8086
-```
-
-Connecting to the writer instance won't show any data.