Commit Graph

7389 Commits (3aa3ebe0e8369aa964b8151c3d2db1883c663859)

Author SHA1 Message Date
Paul Dix 3aa3ebe0e8
chore: add compactor logging (#4207) 2022-04-01 18:51:01 -04:00
kodiakhq[bot] 403ae51099
Merge pull request #4138 from influxdata/cn/sort-key
feat: Compute a sort key in the ingester
2022-04-01 19:34:36 +00:00
kodiakhq[bot] b561f06c9e
Merge branch 'main' into cn/sort-key 2022-04-01 19:26:58 +00:00
Nga Tran 77ad4a7dad
feat: replace a compactor constant with an CLI config param (#4204) 2022-04-01 17:50:43 +00:00
Carol (Nichols || Goulding) d41adf074f
test: Add assertions for sort keys 2022-04-01 13:13:04 -04:00
Nga Tran a6eb83d47d
feat: compact small contiguous files of the same partition even if they do not overlap (#4197)
* feat: compact small contiguous files of the same partition even if they do not overlap

* test: more tests

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-04-01 15:26:43 +00:00
Luke Bond ea865b63f4
fix: create_or_get_multi for column in catalog now enforces limits (#4179)
* fix: create_or_get_multi for column in catalog now enforces limits

fix: create_or_get_multi for column in catalog now enforces limits
chore: reorder catalog column create fns to be next to each other
test: add failing test for multi col insert w/ limits

test: bend catalog mem impl to match postgres for tests

fix: postgres column insert many column type error checks

chore: clippy

* test: assert column counts in partial column insert test

* chore: add some sql comments to the monster multicolumn insert query; s/RIGHT/INNER/ join

* chore: adding comments to clarify partial failure behaviour of multi col insert

* test: add tests for create_or_get_many columns in catalog

* test: forgot how macros work for a moment

* test: service limit test handles partial update of cols

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-01 10:59:43 +00:00
Paul Dix 6479e1fc8e
fix: add indexes to parquet_file (#4198)
Add indexes so compactor can find candidate partitions and specific partition files quickly.
Limit number of level 0 files returned for determining candidates. This should ensure that if comapction is very backed up, it will be able to work through the backlog without evaluating the entire world.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-01 09:59:39 +00:00
dependabot[bot] e8b0655ac8
chore(deps): Bump clap from 3.1.6 to 3.1.7 (#4199)
Bumps [clap](https://github.com/clap-rs/clap) from 3.1.6 to 3.1.7.
- [Release notes](https://github.com/clap-rs/clap/releases)
- [Changelog](https://github.com/clap-rs/clap/blob/master/CHANGELOG.md)
- [Commits](https://github.com/clap-rs/clap/compare/v3.1.6...v3.1.7)

---
updated-dependencies:
- dependency-name: clap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-01 09:49:30 +00:00
Carol (Nichols || Goulding) f4b5fa1b5e
feat: Implement distinct counts in terms of distinct values
For one record batch.

Connects to #4194.
2022-03-31 16:46:27 -04:00
Carol (Nichols || Goulding) 832495a7c9
feat: Implement ingester compute_sort_key similarly to query compute_sort_key
And add a test that currently fails because this implementation doesn't
include actually computing the cardinalities.

Connects to #4194.
2022-03-31 16:35:16 -04:00
Carol (Nichols || Goulding) 9d83554f20
feat: Get the sort key from the schema and data in the QueryableBatch
Connects to #4194.
2022-03-31 16:34:48 -04:00
Carol (Nichols || Goulding) 9043966443
docs: Fix some typos in comments as I noticed them 2022-03-31 16:34:47 -04:00
Andrew Lamb d37af1a7f5
fix: include git sha (again) in release build (#4193)
* fix: error if git-sha can not be found

* refactor: move main to influxdb_iox

* fix: fmt
2022-03-31 19:14:21 +00:00
Andrew Lamb 532d227d11
refactor: extract router2 into ioxd_router2 (#4183)
* refactor: extract router2 from ioxd

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-03-31 17:39:05 +00:00
Andrew Lamb 367e926d35
refactor: extract querier into ioxd_querier (#4182)
* refactor: extract querier into ioxd_querier

* fix: dep
2022-03-31 16:03:31 +00:00
Andrew Lamb a384448b92
refactor: rename Sequence::id and Sequence::number field names (#4190)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-31 15:17:58 +00:00
Marco Neumann 5bebc73e3f
fix: consider "in-between" tombstones as processed (#4187)
Abstract
========
We need to be careful w/ tombstone that fall exactly in sequence number range of a parquet file.

Current Bug
===========
Imagine the following order of events:

1. Router creates write at sequence number 1:
   - `table,selector=1 payload=1 1`
   - `table,selector=2 payload=2 2`
2. Ingester pulls write, waits a bit and persists it to parquet file 1:
   - `table,selector=1 payload=1 1`
   - `table,selector=2 payload=2 2`
4. Router creates write at sequence number 2:
   - `table,selector=1 payload=3 3`
   - `table,selector=2 payload=4 4`
5. Ingester pulls write
6. Router create delete at sequencer number 3: full time range, `selector=1`
7. Ingeser pulls delete and creates tombstone 1
8. Router creates write at sequence number 4:
   - `table,selector=1 payload=5 5`
   - `table,selector=2 payload=6 6`
9. Ingester pulls write
10. Ingester persists parquet file 2:
    - `table,selector=2 payload=4 4`
    - `table,selector=1 payload=5 5`
    - `table,selector=2 payload=6 6`

When reading parquet file 2, the tombstone MUST NOT be applied. Otherwise `table,selector=1 payload=5 5` will be
deleted.

Notes
=====
Technically this issue also applies to files created by the compactor, however the compactor marks tombstones as
processed that fall into the sequence number range. It even does that in a single transaction:

fc4635a334/compactor/src/compact.rs (L821-L861)

Alternative
===========
An alternative solution would be if the ingester would mark tombstones that it materialized during persistence as
"processed" (tombstone 1 for parquet file 2 in the example above). However "processed" markers are currently a mere
optimization and don't affect correctness, which is nice for caching on the querier side as well as reasoning.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-31 15:09:58 +00:00
Nga Tran 9c50a4c9fb
test: replace find_and_compact with compact_partition in tests (#4185)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-31 13:51:22 +00:00
Andrew Lamb de6505c801
fix: retry catalog list operation (#4188) 2022-03-31 13:07:00 +00:00
Andrew Lamb a1df864283
feat: Support 'SHOW NAMESPACES' in sql repl (#4164)
* feat: Support `SHOW NAMESPACES` in sql repl

* feat: add basic support to clients

* fix: add get_namespaces service test

* fix: proper error handling

* test: end to end test for namespace client

* refactor: Use QuerierDatabase rather than Catalog

* refactor: remove unused function
2022-03-31 12:57:33 +00:00
dependabot[bot] fc4635a334
chore(deps): Bump lock_api from 0.4.6 to 0.4.7 (#4186)
Bumps [lock_api](https://github.com/Amanieu/parking_lot) from 0.4.6 to 0.4.7.
- [Release notes](https://github.com/Amanieu/parking_lot/releases)
- [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md)
- [Commits](https://github.com/Amanieu/parking_lot/compare/lock_api-0.4.6...lock_api-0.4.7)

---
updated-dependencies:
- dependency-name: lock_api
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-31 07:45:03 +00:00
Nga Tran ddc2c8304f
fix: have the compaction level set correctly (#4184)
* fix: have the compaction level set correctly, especially for compacted file from the compactor

* fix: typo
2022-03-30 21:23:40 +00:00
Andrew Lamb d1940c0c47
fix: flaky `error_converting_data_from_write_buffer_to_sequenced_entry_is_reported` (#4181) 2022-03-30 19:00:35 +00:00
Paul Dix 04d961e70d
feat: wire up compactor scheduler and config (#4139)
Add configuration options for compactor for the max size of level 0 files and split percentage.
Add metrics for compaction to track the number of candidates, compactions, and durations.
Add functions to separate identifying partitions to compact from running compaction.
Make compaction run in smaller chunks, specifically per partition.
Update compaction to automatically promote level 0 files that are non-overlapping without waiting some period of time.

Closes #4120

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 17:45:24 +00:00
Raphael Taylor-Davies bfe24b3418
feat: add protobuf-compiler to CI image (#4180)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 17:36:27 +00:00
Andrew Lamb 92da65a065
feat: Add end to end tests for querier and schema client (#4178)
* refactor: split up ingester schema test, add mini cluster

* feat: add schema cli test

* feat: add end to end test for querier

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 17:07:32 +00:00
Andrew Lamb 22b24bdab3
chore: Update datafusion again (#4148)
* chore: update datafusoon

* refactor: Update for DataFusion API changes

* chore: TEMP TEMP change df to local copy

* chore: Update to datafusion again

* fix: Update Cargo.lock

* fix: logical conflict
2022-03-30 16:51:48 +00:00
Marko Mikulicic 2c47d77a5b
fix: Backfill namespace_id in schema migration (#4177)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 16:31:26 +00:00
Edd Robinson 7a437387d9
feat: add KeySortCapability capability (#4176) 2022-03-30 15:57:03 +00:00
Marco Neumann b1af5b3f44
feat: query log system table for querier (#4157)
* feat: query log system table for querier

Closes #4084.

* fix: typo

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: extend

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 15:38:11 +00:00
Marco Neumann 5c6583e09a
feat: `CacheBackend::is_empty` (#4174)
* test: run generic tests for dual cache backend

* feat: `CacheBackend::is_empty`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 15:28:59 +00:00
kodiakhq[bot] a1339aa194
Merge pull request #4175 from influxdata/dom/schema-metrics
feat: schema validation metrics
2022-03-30 14:17:42 +00:00
Andrew Lamb f2a3dd58b2
refactor: rename `influxdb_ioxd` to `ioxd` (#4162)
* refactor: rename influxdb_ioxd to ioxd

* fix: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 13:31:27 +00:00
Marco Neumann 20bbb88dc5
refactor: remove table name from `TableSummary` (#4170)
This allows us to remove the table name from the low-level chunk
representations (like `ParquetFile`, RUB, ...) since table names are
already tracked by the higher-level data structures (e.g. catalog,
catalog chunk) that manage the low-level chunk representations.

This is similar to #4167.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 13:24:00 +00:00
Dom Dwyer bc01121c51 refactor(router2): no Arc for instrumentation
Removes needless Arc for the InstrumentationDecorator.
2022-03-30 13:02:21 +00:00
Dom Dwyer 91730d6a1c feat: schema validation conflict/limit metrics
Emit metric counters tracking the number of schema conflicts, and number
of service limit errors observed.
2022-03-30 11:42:58 +00:00
Marko Mikulicic e7ab1908e5
fix: Use a bigger circleci machine for build_dev (#4171) 2022-03-30 09:59:07 +00:00
Marco Neumann 036626a576
refactor: remove partition key from `ParquetChunk` (#4167)
The parquet chunk is always wrapped into some higher-level data
structure (e.g. a catalog chunk, a partition, ...) that knows exactly
"where" the chunk is located. There is no need for the parquet chunk to
back-reference container-level attributes. In the contrary:
double-bookkeeping makes the code more complex and costs additional
memory.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 09:24:56 +00:00
Marko Mikulicic f3aad7a731
feat: Capture more extreme tail latency buckets for dml handler (#4168) 2022-03-30 09:16:33 +00:00
dependabot[bot] 9950bcee27
chore(deps): Bump indexmap from 1.8.0 to 1.8.1 (#4166)
Bumps [indexmap](https://github.com/bluss/indexmap) from 1.8.0 to 1.8.1.
- [Release notes](https://github.com/bluss/indexmap/releases)
- [Changelog](https://github.com/bluss/indexmap/blob/master/RELEASES.md)
- [Commits](https://github.com/bluss/indexmap/compare/1.8.0...1.8.1)

---
updated-dependencies:
- dependency-name: indexmap
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-30 08:49:07 +00:00
Nga Tran bfd5568acf
fix: make sure the QueryableParquetChunks are always sorted correctly (#4163)
* fix: make sure the chunks are always sorted correctly

* fix: output

* chore: Apply suggestions from code review

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: make new function for new chunk id

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-29 21:36:45 +00:00
Marco Neumann 2b76c31157
refactor: make statistics null counts optional (#4160)
Min/max values and distinct counts are already optional, so let's make
the null counts optional as well. This will be helpful for NG to deal w/
partial statistics (e.g. we only populate stats for the time column).

Note that the total count is still mandatory, but we normally have the
chunk/file-level row count at hand.
2022-03-29 17:47:57 +00:00
Marco Neumann 7d947c79d5
refactor: small query tests clean up (#4156)
* refactor: make NG query test generation more flexible

* refactor: rename OG-specfic query tests

* docs: explain chunk stage generation in NG query tests

* fix: typo
2022-03-29 14:00:34 +00:00
Andrew Lamb 4ca52e5ae0
refactor: Extract common, OG database and router out of influxdb_ioxd (#4149)
* refactor: Extract common, OG database and router out of influxdb_ioxd

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-03-29 13:07:19 +00:00
kodiakhq[bot] 99b6be9c2b
Merge pull request #4131 from influxdata/cn/delete-old
feat: Clean up old parquet files in the catalog and object storage
2022-03-29 12:37:06 +00:00
Carol (Nichols || Goulding) db5cd70c77
fix: logical merge conflict, unused import 2022-03-29 08:29:23 -04:00
Carol (Nichols || Goulding) 79447aed33
fix: Logical merge conflict, missing namespace_id in test setup 2022-03-29 08:28:51 -04:00
Carol (Nichols || Goulding) 5c8a80dca6
fix: Add an index to parquet_file to_delete 2022-03-29 08:15:26 -04:00
Carol (Nichols || Goulding) 4a51c9eda6
feat: Add a garbage collector to be called in a background loop
Fixes #3954.
2022-03-29 08:15:26 -04:00