Commit Graph

126 Commits (5ac4785e197f0f35676a0723d5e9e5c2757e2756)

Author SHA1 Message Date
Carol (Nichols || Goulding) 94dcde4996
fix: Do fewer queries for metadata
By adding another _with_metadata catalog function. Also introduce a new
type rather than passing around tuples everywhere.
2022-04-13 10:43:20 -04:00
Carol (Nichols || Goulding) bba4251363
fix: Remove duplication in metric name 2022-04-13 10:43:19 -04:00
Carol (Nichols || Goulding) 02fee3b84f
feat: Request parquet metadata from the catalog when needed only 2022-04-13 10:43:19 -04:00
Carol (Nichols || Goulding) ec25620b73
feat: Add a catalog method for requesting a parquet file's metadata 2022-04-13 10:43:19 -04:00
Carol (Nichols || Goulding) ee56ebf0e3
feat: Store metadata in catalog, but don't fetch by default 2022-04-13 10:43:19 -04:00
Dom Dwyer 02f87e8484 refactor: reduce level_0 query limit
Reduce the query limit from 10,000 to 1,000 to help reduce query
execution time.
2022-04-05 15:14:56 +01:00
Paul Dix 81d41f81a1
fix: ingester replay logic (#4212)
Fix the ingester to track the max persisted sequence number per partition.
Ensure replay takes in data from unpersisted partitions.
Simplify the table persist info to not return a max persisted sequence number for the table as that information isn't needed.
2022-04-04 18:04:34 +00:00
kodiakhq[bot] e2439c0a4f
Merge branch 'main' into cn/sort-key-catalog 2022-04-04 16:54:48 +00:00
Dom Dwyer 61bc9c83ad refactor: add table_id index on column_name
After checking the postgres workload for the catalog in prod, this
missing index was noted as the cause of unexpectedly expensive plans for
simple queries.
2022-04-04 13:04:25 +01:00
dependabot[bot] dc9632114c
chore(deps): Bump pretty_assertions from 1.2.0 to 1.2.1 (#4213)
Bumps [pretty_assertions](https://github.com/colin-kiegel/rust-pretty-assertions) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/colin-kiegel/rust-pretty-assertions/releases)
- [Changelog](https://github.com/colin-kiegel/rust-pretty-assertions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/colin-kiegel/rust-pretty-assertions/compare/v1.2.0...v1.2.1)

---
updated-dependencies:
- dependency-name: pretty_assertions
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-04 10:53:31 +00:00
Carol (Nichols || Goulding) cbf7888435
feat: Add Partition update_sort_key method to catalog 2022-04-01 15:45:51 -04:00
Carol (Nichols || Goulding) c9bc70f03a
feat: Add optional sort_key column to partition table
Connects to #4195.
2022-04-01 15:45:51 -04:00
Luke Bond ea865b63f4
fix: create_or_get_multi for column in catalog now enforces limits (#4179)
* fix: create_or_get_multi for column in catalog now enforces limits

fix: create_or_get_multi for column in catalog now enforces limits
chore: reorder catalog column create fns to be next to each other
test: add failing test for multi col insert w/ limits

test: bend catalog mem impl to match postgres for tests

fix: postgres column insert many column type error checks

chore: clippy

* test: assert column counts in partial column insert test

* chore: add some sql comments to the monster multicolumn insert query; s/RIGHT/INNER/ join

* chore: adding comments to clarify partial failure behaviour of multi col insert

* test: add tests for create_or_get_many columns in catalog

* test: forgot how macros work for a moment

* test: service limit test handles partial update of cols

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-01 10:59:43 +00:00
Paul Dix 6479e1fc8e
fix: add indexes to parquet_file (#4198)
Add indexes so compactor can find candidate partitions and specific partition files quickly.
Limit number of level 0 files returned for determining candidates. This should ensure that if comapction is very backed up, it will be able to work through the backlog without evaluating the entire world.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-01 09:59:39 +00:00
Nga Tran ddc2c8304f
fix: have the compaction level set correctly (#4184)
* fix: have the compaction level set correctly, especially for compacted file from the compactor

* fix: typo
2022-03-30 21:23:40 +00:00
Paul Dix 04d961e70d
feat: wire up compactor scheduler and config (#4139)
Add configuration options for compactor for the max size of level 0 files and split percentage.
Add metrics for compaction to track the number of candidates, compactions, and durations.
Add functions to separate identifying partitions to compact from running compaction.
Make compaction run in smaller chunks, specifically per partition.
Update compaction to automatically promote level 0 files that are non-overlapping without waiting some period of time.

Closes #4120

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 17:45:24 +00:00
Marko Mikulicic 2c47d77a5b
fix: Backfill namespace_id in schema migration (#4177)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 16:31:26 +00:00
Carol (Nichols || Goulding) 79447aed33
fix: Logical merge conflict, missing namespace_id in test setup 2022-03-29 08:28:51 -04:00
Carol (Nichols || Goulding) 5c8a80dca6
fix: Add an index to parquet_file to_delete 2022-03-29 08:15:26 -04:00
Carol (Nichols || Goulding) f3f792fd08
feat: Add namespace_id to the parquet_files table; object store paths need it 2022-03-29 08:15:26 -04:00
Carol (Nichols || Goulding) 39a1d1b26f
feat: Delete parquet files marked to be deleted before a specified time
Connects to #3954.
2022-03-29 08:13:06 -04:00
Nga Tran 80b7e9cce1
feat: delete fully processed tombstones & integration tests for find_and_compact (#4116)
* feat: remove fully processed tombstones

* test: first few tests

* fix: delete SQL

* fix: test how IN (...) works in PG

* fix: test how IN (?) works in PG

* fix: test how IN (?) works in PG

* fix: dynamically add  IN (?, ?, ...)

* fix: dynamically add  IN (?, ?, ...) & its dynamic values

* fix: add argument directly in the SQL

* test: more tests for catalog read and update functions

* chore: move a subfunction to make it easier to read)

* test: first test for find_can_compact but disabled due to bug

* test: integration tests and a bug fix for find_and_compact

* chore: cleanup

* refactor: address review comments

* fix: put 2 delete processed  tombstones and tombstones in a transaction
2022-03-28 18:35:54 +00:00
dependabot[bot] 4f9515ffba
chore(deps): Bump async-trait from 0.1.52 to 0.1.53 (#4141)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.52 to 0.1.53.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.52...0.1.53)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-28 08:55:24 +00:00
dependabot[bot] e5bbc74f7a
chore(deps): Bump paste from 1.0.6 to 1.0.7 (#4140)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.6 to 1.0.7.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.6...1.0.7)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-28 08:44:01 +00:00
Dom Dwyer 8e85846db6 refactor: lowercase error messages
Lowercases the error messages in the big iox_catalog Error enum for
better composition of messages (no random capitalisation in
glued-together strings, which is common with wrapped errors).
2022-03-25 11:33:27 +00:00
Carol (Nichols || Goulding) 67e13a7c34
fix: Change to_delete column on parquet_files to be a time (#4117)
Set to_delete to the time the file was marked as deleted rather than
true.

Fixes #4059.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:47:27 +00:00
Carol (Nichols || Goulding) 2749c37d02
fix: Query for tombstones in a time range, not for a particular parquet file
The compactor at this point is still querying for each file; this is an
intermediate step
2022-03-23 09:52:00 -04:00
Carol (Nichols || Goulding) 87dc2981f6
feat: Query for tombstones relevant to a parquet file
Connects to #3948.
2022-03-23 09:52:00 -04:00
Marco Neumann 55643945a1
refactor: `querier` w/o `db` (#4063)
* feat: `TombstoneRepo::list_by_table`

* feat: `ParquetFileRepo::list_by_table_not_to_delete`

* refactor: `querier` w/o `db`

Get the `querier` to work w/o relying on `db`. A few notes:

- Testing is kinda shallow, we really need to get `query_tests` working
  w/ `querier` (see #3934).
- We still run a sync loop for namespaces, tables and schemas. This will
  be a replaced by "update namespace incl. tables and schemas on demand".
  Note however that we cannot fetch single tables and schemas on demand
  at the moment, because DataFusion doesn't implement async schema
  inspection (only `scan` / "give me all the chunks" is async). I think
  that's OK for now and we can address this later.
- There is NO cache for parquet files and tombstones at the moment. For
  correctness, they need to be fetched in a single transaction (or we
  need a kinda tricky sequence number / logical clock tracking) and I am
  not sure yet how this makes sense when we have the ingester data wired
  up and predicates pushed down to the catalog (see next point). So
  let's measure first and then decide on a caching strategy for this.
- Predicates are currently NOT pushed down to the catalog. I'll need to
  figure out how to extract time range from generic DataFusion
  expressions to make that work (it's easier for InfluxRPC queries, but
  they are not tested at the moment, see first point).

Sorry that this commit is kinda huge. I initially planned to only
migrate the chunks away from `db` and leave the tables and schemas for a
follow-up PR, but the DataFusion trait structure (chunks are bound to
their tables) makes this kinda pointless.

Closes #3974.

* docs: explain what we're doing

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: mention tracking issues

* docs: explain what we're doing

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-03-21 16:58:00 +00:00
Carol (Nichols || Goulding) 8fd3d85634
refactor: Move add_parquet_file_with_tombstones from ingester to compactor 2022-03-21 10:16:57 -04:00
Marco Neumann 0779f81b6b
refactor: rework `TableCache (#4054)
* feat: `TableRepo::get_by_namespace_and_name`

* refactor: rework `TableCache`

- dual cache that can also map table names to IDs
- deal w/ missing tables w/o panics
- set proper timeouts to missing data

For #3974.

* test: extend table cache tests
2022-03-21 13:40:06 +00:00
Luke Bond da517bd8e2
feat: impl table & column limits in catalog (#3832)
fix: refactor table & col limit enforcement in catalog into single SQL statement

fix: borked rebase

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-18 13:54:07 +00:00
Dom Dwyer 0d4949cd1b refactor: lower pg idle connection timeout
Configure the postgres catalog to close unused connections after 1
minute, rather than 500s to introduce a bit of fluidity to pool of
connection acquires.
2022-03-17 13:44:59 +00:00
dependabot[bot] 3f0f090c4e
chore(deps): Bump pretty_assertions from 1.1.0 to 1.2.0 (#4024)
Bumps [pretty_assertions](https://github.com/colin-kiegel/rust-pretty-assertions) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/colin-kiegel/rust-pretty-assertions/releases)
- [Changelog](https://github.com/colin-kiegel/rust-pretty-assertions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/colin-kiegel/rust-pretty-assertions/commits)

---
updated-dependencies:
- dependency-name: pretty_assertions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-14 10:33:27 +00:00
Carol (Nichols || Goulding) 268138ceef
fix: Make SQL queries more consistent
- Use "SELECT *" when possible
- Left align
- Wrap at 100 chars
- Include semicolon
2022-03-13 20:28:12 -04:00
Carol (Nichols || Goulding) 8888e4c3a2
fix: Remove MAX_COMPACT_SIZE from the compaction queries 2022-03-13 20:09:30 -04:00
Carol (Nichols || Goulding) 1dacf567d9
feat: Add a function to the catalog to fetch level 1 parquet files
Fixes #3946.
2022-03-11 15:40:34 -05:00
Carol (Nichols || Goulding) f184b7023c
feat: Update specified parquet file records to compaction level 1
Fixes #3950.
2022-03-11 15:34:40 -05:00
Carol (Nichols || Goulding) fabd262442
feat: Add a function to the catalog to fetch level 0 parquet files
Connects to #3946.
2022-03-11 15:34:05 -05:00
Carol (Nichols || Goulding) ecd06c6ec3
fix: ParquetFileRepo create should be responsible for setting INITIAL_COMPACTION_LEVEL
When created in the catalog, parquet files should always have compaction
level 0. Updating the compaction level should always happen in the
compactor.

Only the catalog should need to know about the initial compaction level
value.
2022-03-10 13:51:18 -05:00
Carol (Nichols || Goulding) ff31407dce
refactor: Extract a ParquetFileParams type for create
This has the advantages of:

- Not needing to create fake parquet file IDs or fake deleted_at
  values that aren't used by create before insertion
- Not needing too many arguments for create
- Naming the arguments so it's easier to see what value is what
  argument, especially in tests
- Easier to reuse arguments or parts of arguments by using copies of
  params, which makes it easier to see differences, especially in tests
2022-03-10 13:51:18 -05:00
Paul Dix 27999ff72f
feat: add compaction_level and created_at to parquet_file (#3972) 2022-03-10 15:56:57 +00:00
Carol (Nichols || Goulding) 1f474bfbf0
test: Create the test database before running postgres iox_catalog tests 2022-03-09 10:43:30 -05:00
Carol (Nichols || Goulding) 8af2f60b59
fix: Run catalog setup as part of end-to-end test setup 2022-03-09 09:55:43 -05:00
Carol (Nichols || Goulding) 93b0cdbcc4
fix: Create the test database as part of ng server fixture startup 2022-03-09 09:55:43 -05:00
Carol (Nichols || Goulding) 880344494a
fix: Remove reference to AWS from postgres test comment 2022-03-09 09:55:42 -05:00
kodiakhq[bot] caba70f871
Merge branch 'main' into cn/not-database-url 2022-03-09 13:32:02 +00:00
Dom Dwyer d31576b90c perf: get_table_persist_info indexes for joins
Adds indexes to the JOINed fields to reduce execution cost, as the
TableRepo::get_table_persist_info() is currently by far the most
expensive catalog operation.
2022-03-08 12:12:47 +00:00
Marco Neumann db3f1e8db7
feat: wire up tombstones into querier (#3962)
* feat: `TombstoneRepo::list_by_namespace`

* test: model sequencer properly

* feat: wire up tombstones into querier

Closes #3932.

* refactor: `override_delete_predicates` => `set_delete_predicates`
2022-03-08 10:06:22 +00:00
Carol (Nichols || Goulding) 4765e447e3
chore: Wrap markdown at 100 columns 2022-03-07 11:02:58 -05:00