12607 Commits (614ff35998623026bd78b682dbce79827942e639)

Author SHA1 Message Date
dependabot[bot] 89d8207784
chore(deps): Bump io-lifetimes from 1.0.10 to 1.0.11 (#7865)
Bumps [io-lifetimes](https://github.com/sunfishcode/io-lifetimes) from 1.0.10 to 1.0.11.
- [Commits](https://github.com/sunfishcode/io-lifetimes/compare/v1.0.10...v1.0.11)

---
updated-dependencies:
- dependency-name: io-lifetimes
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-25 07:49:53 +00:00
Dom deb0c52fed
Merge pull request #7862 from influxdata/dom/remove-catalog-metric-accessor
refactor(catalog): mark metrics() as test only
2023-05-24 16:48:47 +01:00
Dom Dwyer 2094b45c10
refactor(catalog): mark metrics() as test only
This method is used to enable tests - it's never intended to be used in
production code to access the underlying metric registry. The Catalog
trait is responsible for Catalog things, not acting as a dependency
injection for metrics.

The only current use of this is in test code, so no changes needed.
2023-05-24 17:38:10 +02:00
kodiakhq[bot] 2eebf2a0a7
Merge pull request #7790 from influxdata/cn/store
feat: Set, store, and use custom namespace and table partitions on write
2023-05-24 14:43:10 +00:00
Carol (Nichols || Goulding) d91b75526f
fix: Clarify that the expect is on the Option, not the Result 2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) e67e336a88
docs: Explain why the partition template types are implemented the way they are 2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) efc817c2a8
fix: Remove From impl, leaving TablePartitionTemplateOverride::new as only creation mechanism
This makes it clearer that you do or do not have a custom table override
(in the first argument to `new`).
2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) 46f7e3e48a
fix: Handle potential for data race in catalog table insertion by re-fetching if detected 2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) 90cb4b6ed9
refactor: Extract a function for handling a table missing from the namespace cache 2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) 73b09d895f
feat: Store and handle NULL partition_template database values
Treat them as the default partition template in the application, but
save space and avoid having to backfill the tables by having the
database values be NULL when no custom template has been specified.
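
A minimal sketch of the NULL handling described above, not the actual catalog code. `PartitionTemplate`, `template_from_db`, `template_to_db` and the day-based default are all hypothetical names and values chosen for illustration; the nullable column is modelled as an `Option`.

```rust
/// Hypothetical application-level template: the parts used to build a
/// partition key. Not the real IOx type.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionTemplate {
    pub parts: Vec<String>,
}

impl Default for PartitionTemplate {
    fn default() -> Self {
        // Assumed default for illustration only: partition by day.
        Self {
            parts: vec!["%Y-%m-%d".to_string()],
        }
    }
}

/// Maps the nullable database column to the application value.
/// `None` (SQL NULL) means "no custom template", so the default is used.
pub fn template_from_db(column: Option<Vec<String>>) -> PartitionTemplate {
    column
        .map(|parts| PartitionTemplate { parts })
        .unwrap_or_default()
}

/// Inverse direction: only a custom template is written to the database,
/// otherwise the column stays NULL and existing rows need no backfill.
pub fn template_to_db(custom: Option<&PartitionTemplate>) -> Option<Vec<String>> {
    custom.map(|template| template.parts.clone())
}
```

With this shape, a row that predates the new column reads back as the default template without being rewritten.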
2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) c8712bbc90
fix: Add a fixture test encoding and documenting default partition template assumptions 2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding) fb53faaa2f
refactor: Only use Partitioner::default and derive it 2023-05-24 10:34:31 -04:00
Carol (Nichols || Goulding) aab0acc16a
fix: Panic if attempting to partition on a non-tag column 2023-05-24 10:34:31 -04:00
Carol (Nichols || Goulding) 42804a20bc
fix: Switch to using Sqlite when encoding so there's no extra 1 in the JSON 2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding) d713ba935a
refactor: Reduce duplication of encode/decode implementations
This is much less gobbledygook.
2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding) c479ed184d
refactor: Rearrange definitions in the partition_template module
Move the application types to the top, which puts all the sqlx
conversion gobbledygook at the end because it's an internal
implementation detail I'm about to refactor

Git probably isn't going to display this in a super obvious way, but
this commit is only moving code around, not changing any of it
2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding) a22d809cdf
test: Create an overridden namespace, and create a table from it (no override), read it back and assert the expected partitioning scheme is derived 2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding) 2ab3ea03b8
test: Create a default (not overridden) namespace, read it back, assert the expected partitioning scheme is derived 2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding) 9c0faa66f0
feat: Set a table partition template explicitly or from the namespace
And use the table partition template when partitioning writes to that
table.
2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding) 604bab9508
fix: Make Table create_or_get be only create 2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding) afb3838437
feat: Optionally supply the namespace partition template when creating a namespace 2023-05-24 10:10:34 -04:00
Carol (Nichols || Goulding) 47157015d9
feat: Add columns to store the partition templates 2023-05-24 10:10:34 -04:00
Carol (Nichols || Goulding) 6f92bccc99
feat: Use protobuf for PartitionTemplate in CreateNamespace gRPC API
The service implementation doesn't use this field yet.
2023-05-24 10:10:34 -04:00
Marco Neumann 29dccdc61a
Merge pull request #7859 from influxdata/crepererum/clean_up_parquet_indices
refactor: remove unused `parquet_file` indices
2023-05-24 13:51:28 +02:00
Marco Neumann b71564f455
refactor: remove unused `parquet_file` indices
Remove unused Postgres indices. This lowers database load and also gives
us room to install indices that are actually useful (see #7842).

To detect which indices are used, I ran the following query (on the
actual write/master replica in eu-central-1):

```sql
SELECT
    n.nspname                                      AS namespace_name,
    t.relname                                      AS table_name,
    pg_size_pretty(pg_relation_size(t.oid))        AS table_size,
    t.reltuples::bigint                            AS num_rows,
    psai.indexrelname                              AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END  AS "unique",
    psai.idx_scan                                  AS number_of_scans,
    psai.idx_tup_read                              AS tuples_read,
    psai.idx_tup_fetch                             AS tuples_fetched
FROM
    pg_index i
    INNER JOIN pg_class t               ON t.oid = i.indrelid
    INNER JOIN pg_namespace n           ON n.oid = t.relnamespace
    INNER JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
    n.nspname = 'iox_catalog' AND t.relname = 'parquet_file'
ORDER BY 1, 2, 5;
```

At `2023-05-23T16:00:00Z`:

```text
 namespace_name |  table_name  | table_size | num_rows  |                    index_name                    | index_size | unique | number_of_scans |  tuples_read   | tuples_fetched
----------------+--------------+------------+-----------+--------------------------------------------------+------------+--------+-----------------+----------------+----------------
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_deleted_at_idx                      | 5398 MB    | N      |      1693383413 | 21036174283392 |    21336337964
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_partition_created_idx               | 11 GB      | N      |        34190874 |     4749070532 |       61934212
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_partition_idx                       | 2032 MB    | N      |      1612961601 |  9935669905489 |  8611676799872
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_pkey                                | 7135 MB    | Y      |       453927041 |      454181262 |      453894565
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_shard_compaction_delete_created_idx | 14 GB      | N      |               0 |              0 |              0
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_shard_compaction_delete_idx         | 8767 MB    | N      |               2 |          30717 |           4860
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_table_idx                           | 1602 MB    | N      |         9136844 |   341839537275 |          27551
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_location_unique                          | 4989 MB    | Y      |       332341872 |           3123 |           3123
```

At `2023-05-24T09:50:00Z` (i.e. nearly 18h later):

```text
 namespace_name |  table_name  | table_size | num_rows  |                    index_name                    | index_size | unique | number_of_scans |  tuples_read   | tuples_fetched
----------------+--------------+------------+-----------+--------------------------------------------------+------------+--------+-----------------+----------------+----------------
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_deleted_at_idx                      | 5448 MB    | N      |      1693485804 | 21409285169862 |    21364369704
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_partition_created_idx               | 11 GB      | N      |        34190874 |     4749070532 |       61934212
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_partition_idx                       | 2044 MB    | N      |      1615214409 | 10159380553599 |  8811036969123
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_pkey                                | 7189 MB    | Y      |       455128165 |      455382386 |      455095624
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_shard_compaction_delete_created_idx | 14 GB      | N      |               0 |              0 |              0
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_shard_compaction_delete_idx         | 8849 MB    | N      |               2 |          30717 |           4860
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_table_idx                           | 1618 MB    | N      |         9239071 |   348304417343 |          27551
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_location_unique                          | 5043 MB    | Y      |       343484617 |           3123 |           3123
```

The cluster is currently under load and all components are running.

Conclusion:

- `parquet_file_deleted_at_idx`: Used, likely by the GC. We could
  probably shrink this index by binning `deleted_at` (within the index,
  not within the actual database table), but let's do this in a later PR.
- `parquet_file_partition_created_idx`: Unused and huge (`created_at` is
  NOT binned). So let's remove it.
- `parquet_file_partition_idx`: Used, likely by the compactor and
  querier because we currently don't have a better index (see #7842 as
  well). This includes deleted files as well which is somewhat
  pointless. May become obsolete after #7842, not touching for now.
- `parquet_file_pkey`: Primary key. We should probably use the object
  store UUID as a primary key BTW, which would also make the GC faster.
  Not touching for now.
- `parquet_file_shard_compaction_delete_created_idx`: Huge unused index.
  Shards don't exist anymore. Delete it.
- `parquet_file_shard_compaction_delete_idx`: Same as
  `parquet_file_shard_compaction_delete_created_idx`.
- `parquet_file_table_idx`: Used but is somewhat too large because it
  contains deleted files. Might become obsolete after #7842, don't touch
  for now.
- `parquet_location_unique`: See the note on `parquet_file_pkey`; it's
  pointless to have two IDs here. Not touching for now but this is a
  potential future improvement.

So we remove:

- `parquet_file_partition_created_idx`
- `parquet_file_shard_compaction_delete_created_idx`
- `parquet_file_shard_compaction_delete_idx`
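
The comparison behind this conclusion can be sketched as follows: take the `number_of_scans` column from the two snapshots above and report every index whose counter did not move. This is only an illustration of the reasoning, not tooling that exists in the repository; `unscanned_indices`, `SCANS_BEFORE` and `SCANS_AFTER` are hypothetical names, and the numbers are copied from the two tables.

```rust
/// `(index_name, number_of_scans)` at 2023-05-23T16:00:00Z.
pub const SCANS_BEFORE: [(&str, u64); 8] = [
    ("parquet_file_deleted_at_idx", 1693383413),
    ("parquet_file_partition_created_idx", 34190874),
    ("parquet_file_partition_idx", 1612961601),
    ("parquet_file_pkey", 453927041),
    ("parquet_file_shard_compaction_delete_created_idx", 0),
    ("parquet_file_shard_compaction_delete_idx", 2),
    ("parquet_file_table_idx", 9136844),
    ("parquet_location_unique", 332341872),
];

/// `(index_name, number_of_scans)` at 2023-05-24T09:50:00Z.
pub const SCANS_AFTER: [(&str, u64); 8] = [
    ("parquet_file_deleted_at_idx", 1693485804),
    ("parquet_file_partition_created_idx", 34190874),
    ("parquet_file_partition_idx", 1615214409),
    ("parquet_file_pkey", 455128165),
    ("parquet_file_shard_compaction_delete_created_idx", 0),
    ("parquet_file_shard_compaction_delete_idx", 2),
    ("parquet_file_table_idx", 9239071),
    ("parquet_location_unique", 343484617),
];

/// Returns the names of all indices whose scan counter is identical in both
/// snapshots, i.e. indices that were not scanned in between.
pub fn unscanned_indices<'a>(
    before: &[(&'a str, u64)],
    after: &[(&'a str, u64)],
) -> Vec<&'a str> {
    after
        .iter()
        .filter(|after_row| {
            before
                .iter()
                .any(|before_row| before_row.0 == after_row.0 && before_row.1 == after_row.1)
        })
        .map(|row| row.0)
        .collect()
}
```

Calling `unscanned_indices(&SCANS_BEFORE, &SCANS_AFTER)` yields exactly the three indices listed for removal. A zero delta over one observation window is evidence rather than proof, which is why the commit also notes that the cluster was under load with all components running.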
2023-05-24 12:10:22 +02:00
Marco Neumann bc18c6dc5f
refactor: re-land #7815. (#7852)
* refactor: consolidate pruning code

Let's have a single chunk pruning implementation in our code, not two.

Also removes a bit of cruft from `QueryChunk` since it is technically no
longer responsible for pruning (this part has been pushed into the
querier for early pruning, and bits into `iox_query_influxrpc` for
some RPC shenanigans).

* test: regression test for incident

* fix: chunk pruning

* docs: add some test notes
2023-05-24 09:46:49 +00:00
dependabot[bot] 24a4f36d24
chore(deps): Bump proptest from 1.1.0 to 1.2.0 (#7857)
Bumps [proptest](https://github.com/proptest-rs/proptest) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/proptest-rs/proptest/releases)
- [Changelog](https://github.com/proptest-rs/proptest/blob/master/CHANGELOG.md)
- [Commits](https://github.com/proptest-rs/proptest/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: proptest
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-24 09:21:32 +00:00
Marco Neumann 103e814f22
refactor: clean up catalog `parquet_files` interface (#7853)
* feat: `ParquetFileRepo::list_all`

* refactor: remove `ParquetFileRepo::list_by_table`

* refactor: simplify `ParquetFileRepo::list_by_table`

* refactor: remove `ParquetFileRepo::count`

* refactor: remove `ParquetFileRepo::update_compaction_level`

* refactor: remove `ParquetFileRepo::exists`

* fix: test

---------

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2023-05-24 09:15:03 +00:00
dependabot[bot] b7fbfa6fb2
chore(deps): Bump criterion from 0.4.0 to 0.5.0 (#7856)
Bumps [criterion](https://github.com/bheisler/criterion.rs) from 0.4.0 to 0.5.0.
- [Changelog](https://github.com/bheisler/criterion.rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/bheisler/criterion.rs/compare/0.4.0...0.5.0)

---
updated-dependencies:
- dependency-name: criterion
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-05-24 09:08:37 +00:00
Marco Neumann 6729b5681a
fix(ingester): re-transmit schema over flight if it changes (#7812)
* fix(ingester): re-transmit schema over flight if it changes

Fixes https://github.com/influxdata/idpe/issues/17408 .

A `[Sendable]RecordBatchStream` contains `RecordBatch`es of the SAME
schema. When the ingester crafts a response for a specific partition,
this is almost always the case. However, when there's a persist job
running (I think), it may have multiple snapshots for a partition, and
these snapshots may have different schemas (since the ingester only
creates columns if they contain any data). The current implementation
munches all these snapshots into a single stream and hands them over to
Arrow Flight, which has a high-perf encode routine (i.e. it does not
re-check every single schema): it sends the schema once and then sends
only the data for every batch (schema data is NOT repeated). On the
receiver side (= querier) we decode that data and get confused about why
on earth some batches have a different column count compared to the
schema.

For the OG ingester I carefully crafted the response to ensure that we
do not run into this problem, but apparently a number of rewrites and
refactors broke that. So here is the fix:

- remove the stream that isn't really a stream (and cannot error)
- for each partition go over the `RecordBatch`es and chunk them
  according to the schema (because this check is likely cheaper than
  re-transmitting the schema for every `RecordBatch`)
- adjust a bunch of testing code to cope with this
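
The chunking step in the list above can be sketched roughly like this. It is not the ingester code: `Batch` is a hypothetical stand-in for an Arrow `RecordBatch` (a list of column names plays the role of the schema), and grouping only consecutive batches is an assumption about what "chunk them according to the schema" means.

```rust
/// Hypothetical stand-in for an Arrow `RecordBatch`: the column names act as
/// the schema, and only the row count is kept as payload.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub schema: Vec<String>,
    pub num_rows: usize,
}

/// Groups consecutive batches that share a schema. Each group can then be
/// sent as "schema once, followed by data only", and the schema is
/// re-transmitted only when it actually changes.
pub fn chunk_by_schema(batches: Vec<Batch>) -> Vec<(Vec<String>, Vec<Batch>)> {
    let mut groups: Vec<(Vec<String>, Vec<Batch>)> = Vec::new();
    for batch in batches {
        match groups.last_mut() {
            // Same schema as the current group: append, no new schema message.
            Some((schema, group)) if *schema == batch.schema => group.push(batch),
            // First batch, or the schema changed: start a new group.
            _ => groups.push((batch.schema.clone(), vec![batch])),
        }
    }
    groups
}
```

Comparing schemas once per batch is the cheap check the commit refers to; the alternative of attaching the schema to every batch would avoid the comparison but repeat the schema bytes on the wire.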

* refactor: nicify code

* test: adjust test
2023-05-23 14:27:11 +00:00
kodiakhq[bot] 43078576b8
Merge pull request #7839 from influxdata/dom/cleanup-deps
build: remove unused dependencies from loads of crates
2023-05-23 13:01:47 +00:00
Dom Dwyer 94203287f0
test: fix line number test
This test failed because it references line numbers that changed.
2023-05-23 14:55:44 +02:00
Dom Dwyer e61fb3a78c
test: remove line numbers from asserts
I don't think the tests are that specific that they need to assert the
line.
2023-05-23 14:55:43 +02:00
Dom Dwyer 928a4d163e
build: remove unused dependencies from crates
This commit fixes loads of crates (47!) that had unused dependencies, or
mis-configured dependencies (test deps as normal deps).

I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!

    https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html

This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
2023-05-23 14:55:43 +02:00
Dom Dwyer 983a3b44b8
ci: add missing lints to service_grpc_schema
This crate was missing some of the common lints we use everywhere else.
2023-05-23 14:55:42 +02:00
Dom Dwyer 9696b75476
ci: add missing lints to service_grpc_namespace
This crate was missing some of the common lints we use everywhere else.
2023-05-23 14:55:42 +02:00
Dom Dwyer b783bb1967
refactor(lints): add missing lints to service_grpc_influxrpc
Adds the standard lints to service_grpc_influxrpc and fixes any lint
failures.

Note this doesn't include the normal "document things" lint, because
there's a load of missing docs
2023-05-23 14:55:42 +02:00
Dom Dwyer 6acf7f10dd
refactor(lints): add missing lints to service_grpc_flight
Adds the standard lints to service_grpc_flight and fixes any lint
failures.

Note this doesn't include the normal "document things" lint, because
there's a load of missing docs
2023-05-23 14:55:41 +02:00
Dom Dwyer bc95c70144
refactor(lints): add missing lints to service_common
Adds the standard lints to service_common and fixes any lint failures.

Note this doesn't include the normal "document things" lint, because
there's a load of missing docs
2023-05-23 14:55:41 +02:00
Dom Dwyer 7bddba8eae
ci: add missing lints to schema
This crate was missing some of the common lints we use everywhere else.
2023-05-23 14:55:40 +02:00
Dom Dwyer 4f84b2122c
refactor(lints): add missing lints to parquet_to_line_protocol
Adds the standard lints to parquet_to_line_protocol and fixes any lint
failures.

Note this doesn't include the normal "document things" lint, because
there's a load of missing docs
2023-05-23 14:55:40 +02:00
Dom Dwyer a6147dd03b
ci: add missing lints to object_store_metrics
This crate was missing some of the common lints we use everywhere else.
2023-05-23 14:55:39 +02:00
Dom Dwyer 29ab3a2913
refactor(lints): add missing lints to logfmt
Adds the standard lints to logfmt and fixes any lint failures.

Note this doesn't include the normal "document things" lint, because
there's a load of missing docs
2023-05-23 14:55:39 +02:00
Dom Dwyer 17c5f8d0b5
ci: add missing lints to ioxd_test
This crate was missing some of the common lints we use everywhere else.
2023-05-23 14:55:39 +02:00
Dom Dwyer 86b5dd20f5
ci: add missing lints to iox_query_influxrpc
This crate was missing some of the common lints we use everywhere else.
2023-05-23 14:55:38 +02:00
Dom Dwyer 45ddeaa25e
refactor(lints): add missing lints to ioxd_querier
Adds the standard lints to ioxd_querier and fixes any lint failures.

Note this doesn't include the normal "document things" lint, because
there's a load of missing docs
2023-05-23 14:55:38 +02:00
Dom Dwyer e33c17c6f7
refactor(lints): add missing lints to ioxd_ingester
Adds the standard lints to ioxd_ingester and fixes any lint failures.

Note this doesn't include the normal "document things" lint, because
there's a load of missing docs
2023-05-23 14:55:37 +02:00
Dom Dwyer e15b57a3aa
refactor(lints): add missing lints to ioxd_compactor
Adds the standard lints to ioxd_compactor and fixes any lint failures.

Note this doesn't include the normal "document things" lint, because
there's a load of missing docs
2023-05-23 14:55:37 +02:00
Dom Dwyer 8f49aabc56
refactor(lints): add missing lints to ioxd_common
Adds the standard lints to ioxd_common and fixes any lint failures.

Note this doesn't include the normal "document things" lint, because
there's a load of missing docs
2023-05-23 14:55:37 +02:00
Dom Dwyer adb135d47c
ci: add missing lints to iox_query_influxrpc
This crate was missing some of the common lints we use everywhere else.
2023-05-23 14:55:36 +02:00