425 Commits (71625043e2b393eecc803e7c30fb3554c7a7881c)

Author SHA1 Message Date
Carol (Nichols || Goulding) d7e75d43ea
fix: Make shard ID optional for compactor queries in RPC write mode 2022-12-16 17:28:53 -05:00
Luke Bond f419e2c378
feat: warm compaction (#6192)
* feat: warm compaction

chore: add missing warm compaction config

chore: tests for warm compaction

chore: modify count usage in warm compaction sql

chore: catalog test for warm compaction; sql fixes

feat: settable target level for compact w/ budget

chore: tests for warm compaction

chore: clarifying comments in warm compaction test

chore: fixed erroneous comment in catalog test

chore: improve warm compactor test by checking file exists

chore: tests for warm compaction

chore: warm compactor test tidy-ups

* chore: improve test for warm compaction

* chore: fix erroneous comment in warm compaction code
2022-12-16 15:59:45 +00:00
Luke Bond 1bc2003cf4 chore: simplify delete namespace sql query 2022-12-16 10:23:50 +00:00
Luke Bond a6036631ad chore: comment typo in catalog 2022-12-16 10:23:50 +00:00
Luke Bond 6263ca234a chore: delete ns postgres impl, test improvements, fix to mem impl 2022-12-16 10:23:50 +00:00
Luke Bond 3659be59c7 feat: delete namespace api mem impl
chore: tests for delete namespace; use unique ptn names in tests
2022-12-16 10:23:50 +00:00
dependabot[bot] e108a8b6c9
chore(deps): Bump paste from 1.0.9 to 1.0.10 (#6384)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.9 to 1.0.10.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.9...1.0.10)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-12-13 06:03:05 +00:00
dependabot[bot] a9db7581cd
chore(deps): Bump tokio from 1.21.2 to 1.22.0 (#6183)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.21.2 to 1.22.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.21.2...tokio-1.22.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-21 10:21:24 +00:00
Luke Bond 7c813c170a
feat: reintroduce compactor first file in partition exception (#6176)
* feat: compactor ignores max file count for first file

chore: typo in comment in compactor

* feat: restore special first file in partition compaction logic; add limit

* fix: calculation in compaction max file count

chore: clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 15:58:59 +00:00
Nga Tran 49a9565240
feat: gRPC that creates namespace (#6103)
* feat: create namespace API call in router

Co-authored-by: Nga Tran <nga-tran@live.com>

* chore: treat retention as ns except in CLI

* fix: overflow in nanosecond calc

* fix: retention test after changing it from hours to ns

* chore: comment clarification in cli; better response type for error in ns API

* fix: correct some rebase mistakes

* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const

* fix: ns autocreation test was wrong after rebase

* fix: mem catalog has default 1hr retention, accidentally removed in rebase

* chore: remove mem catalogs default 1hr retention; make it settable in sets & router

Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-18 13:02:12 +00:00
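The "overflow in nanosecond calc" fix in the commit above (49a9565240) concerns converting a retention period from hours to nanoseconds. The following is only a minimal sketch of a checked conversion, assuming a signed 64-bit nanosecond count; the name `retention_hours_to_ns` is hypothetical and is not taken from the repository.

```rust
/// Nanoseconds in one hour.
const NS_PER_HOUR: i64 = 3_600 * 1_000_000_000;

/// Hypothetical helper (not from the repository): converts a retention
/// period in hours to nanoseconds. Returns `None` for negative input or
/// when the multiplication would overflow an `i64`, instead of wrapping.
fn retention_hours_to_ns(hours: i64) -> Option<i64> {
    if hours < 0 {
        return None;
    }
    hours.checked_mul(NS_PER_HOUR)
}
```

With this shape the caller decides how to report an out-of-range value, rather than silently storing a wrapped number.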
Nga Tran 6f7b1e2e26
feat: reject writes that are outside the retention period (#6148)
* feat: reject writes that are outside the retention period

* feat: add retention validator into handler stack

* chore: Apply suggestions from code review

Co-authored-by: Dom <dom@itsallbroken.com>

* refactor: address review comments

* test: unit tests for retention validation

* chore: address review comments

* test: more unit tests and integration tests

* refactor: make time inside retention period for emphemeral_mode test

* fix: 2 hours

Co-authored-by: Dom <dom@itsallbroken.com>
2022-11-17 20:55:58 +00:00
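The commit above (6f7b1e2e26) rejects writes that fall outside the retention period. A rough sketch of such a check follows. It assumes timestamps and the retention period are expressed in nanoseconds and that a missing period means "keep forever"; the names `validate_retention` and `OutsideRetention` are hypothetical, and the real validator in the handler stack may differ.

```rust
/// Hypothetical error describing the first offending timestamp.
#[derive(Debug, PartialEq)]
struct OutsideRetention {
    timestamp_ns: i64,
    cutoff_ns: i64,
}

/// Rejects the write if any row timestamp is older than `now - retention`.
/// `None` for the retention period is treated as infinite retention
/// (an assumption made for this sketch).
fn validate_retention(
    now_ns: i64,
    retention_period_ns: Option<i64>,
    timestamps_ns: &[i64],
) -> Result<(), OutsideRetention> {
    let period = match retention_period_ns {
        Some(p) => p,
        None => return Ok(()),
    };
    // Saturate rather than overflow for very large periods.
    let cutoff_ns = now_ns.saturating_sub(period);
    match timestamps_ns.iter().copied().find(|&t| t < cutoff_ns) {
        Some(timestamp_ns) => Err(OutsideRetention { timestamp_ns, cutoff_ns }),
        None => Ok(()),
    }
}
```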
Carol (Nichols || Goulding) bdff4e8848
fix: Consistently use 'namespace' instead of 'database' in comments and other internal text 2022-11-11 15:46:04 -05:00
Nga Tran a3f2fe489c
refactor: remove retention_duration field from namespace catalog table (#6124) 2022-11-11 20:30:42 +00:00
Nga Tran 9c4266c503
refactor: first step to remove unused retention_duration (#6113)
* refactor: first step to remove unused retention_duration

* refactor: remove retention_duration from update catalog

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-11-11 15:21:06 +00:00
Nga Tran 93e11d4c91
chore: Revert "feat: flag partitions for delete (#6075)" (#6111)
This reverts commit 77a2541172.
2022-11-10 17:01:39 +00:00
Nga Tran e81ff1f4d5
chore: Revert "feat: catalog delete old partitions (#6099)" (#6109)
This reverts commit 664b0578e9.
2022-11-10 15:31:16 +00:00
Luke Bond 664b0578e9
feat: catalog delete old partitions (#6099)
* feat: catalog delete old partitions

* chore: remove debug println

* chore: remove debug println

* chore: clippy

* chore: sql statement refactor for deleting partitions

* chore: improve delete partition test

* chore: clippy
2022-11-10 10:51:22 +00:00
Carol (Nichols || Goulding) 43687a86d2
fix: Remove lots of needless borrows that Clippy can now identify
Except for in generated code that we don't control.
2022-11-09 10:54:18 -05:00
Carol (Nichols || Goulding) 07505c8f72
fix: Remove needless borrows, thanks Clippy! 2022-11-09 10:54:18 -05:00
Nga Tran 77a2541172
feat: flag partitions for delete (#6075)
* feat: flag partition for delete

* fix: compare the right date and time

* chore: Run cargo hakari tasks

* chore: cleanup

* fix: typos

* chore: rust style tidy ups in catalog

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
2022-11-09 12:06:23 +00:00
Luke Bond dfb820615c
feat: deletion flagging in GC based on retention policy (#6073)
* feat: deletion flagging in GC based on retention policy

* chore: typo in comment

* fix: only soft delete parquet files that aren't yet soft deleted

* fix: guard against flakiness in catalog test

* chore: some better tests for parquet file delete flagging

Co-authored-by: Nga Tran <nga-tran@live.com>
2022-11-08 20:22:35 +00:00
kodiakhq[bot] 369937d68f
Merge branch 'main' into cn/order-by-insert 2022-11-07 19:18:13 +00:00
Carol (Nichols || Goulding) 74a40cc9bd
fix: Assert that there aren't two columns with the same name in the same batch
This shouldn't be possible; let's make sure we know if it happens!
2022-11-07 14:10:12 -05:00
Luke Bond 5e05fa52cf
feat: soft delete parquet files based on retention period (#6070) 2022-11-07 17:31:29 +00:00
Nga Tran 9356f2a1b9
feat: grpc for updating namespace retention period (#6041)
* refactor: make namespace folder for all namespace's commands

* feat: WIP for add command to set retention period

* feat: more on updating retention period

* feat: grpc for update namespace retention period

* test: end-to-end test for namespace retention

* fix: lint proto

* chore: cleanup

* chore: kick CI run again

* fix: command hierarchy

* chore: fix comments
2022-11-04 20:58:11 +00:00
Carol (Nichols || Goulding) 9b0af96927
docs: Add a link to the idpe issue
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2022-11-04 13:39:25 -04:00
Carol (Nichols || Goulding) d454c66b4b
fix: Use a HashMap for column lookup instead of Vec ordering
The checks for whether a column already exists with a different type
were relying on ordering of the input matching the ordering of the
columns returned from inserting the columns in Postgres.

Rather than trying to match the new ordering that is required to avoid
Postgres deadlocks, switch from a Vec to a HashMap and look up the
column type from the name.

This also reduces some allocations that weren't really needed.
2022-11-04 11:52:37 -04:00
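The commit above (d454c66b4b) replaces positional matching with a lookup by column name. A minimal sketch of that idea is shown below, under the assumption that a request is a map from column name to type and the database returns `(name, type)` pairs in arbitrary order; `ColumnType` here is a simplified stand-in and `find_type_conflict` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the catalog's column type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Tag,
    Field,
}

/// Returns the name of the first returned column whose stored type differs
/// from the requested type. The lookup is by name, so the order of
/// `returned` does not need to match the order of the request.
fn find_type_conflict(
    requested: &HashMap<&str, ColumnType>,
    returned: &[(String, ColumnType)],
) -> Option<String> {
    returned
        .iter()
        .find(|(name, ty)| {
            requested
                .get(name.as_str())
                .map_or(false, |want| want != ty)
        })
        .map(|(name, _)| name.clone())
}
```

Because the check no longer depends on ordering, the insert itself is free to use whatever row order avoids deadlocks.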
Carol (Nichols || Goulding) a6634ada19
fix: Add an ORDER BY to the insert to prevent Postgres deadlocks
Fixes influxdata/idpe#16298.

Without this ORDER BY, concurrent writes that add many column records to
this table can deadlock because they grab locks to rows/index entries in
an arbitrary order to check the unique index.

By switching to a consistent order across all requests, inserts won't
get in a deadlock loop waiting for each other.

More info:

- <https://rcoh.svbtle.com/postgres-unique-constraints-can-cause-deadlock>
- <https://dba.stackexchange.com/a/195220/27897>
2022-11-04 11:52:37 -04:00
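The commit above (a6634ada19) fixes the deadlock by adding an ORDER BY to the SQL insert. The sketch below is not that SQL; it only illustrates the underlying idea in plain Rust, namely that every request should touch rows in the same canonical order. The function name `canonical_insert_order` is hypothetical.

```rust
/// Returns the column names in a canonical order (sorted by name, with
/// duplicates removed). If every concurrent request acquires row or index
/// locks in this same order, two requests cannot each hold a lock the
/// other is waiting for.
fn canonical_insert_order<'a>(columns: &[&'a str]) -> Vec<&'a str> {
    let mut ordered = columns.to_vec();
    ordered.sort_unstable();
    ordered.dedup();
    ordered
}
```

Two writers that submit `["b", "a"]` and `["a", "b"]` both end up processing `a` before `b`, which removes the lock-ordering cycle described in the commit message.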
NGA-TRAN 498851eaf5 feat: add catalog columns needed for retention policy 2022-11-01 15:35:15 -04:00
Carol (Nichols || Goulding) 2e83e04eab
feat: Use workspace package metadata to reduce differences and repetition 2022-10-24 13:04:09 -04:00
Jake Goulding df2ba85661 feat: add the ability to delete a skipped compaction 2022-10-21 15:12:20 -04:00
dependabot[bot] b5574c07b7
chore(deps): Bump async-trait from 0.1.57 to 0.1.58 (#5904)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.57 to 0.1.58.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.57...0.1.58)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-19 09:40:26 +00:00
dependabot[bot] f3c27c5c71
chore(deps): Bump dotenvy from 0.15.5 to 0.15.6 (#5881)
Bumps [dotenvy](https://github.com/allan2/dotenvy) from 0.15.5 to 0.15.6.
- [Release notes](https://github.com/allan2/dotenvy/releases)
- [Changelog](https://github.com/allan2/dotenvy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/allan2/dotenvy/compare/v0.15.5...v0.15.6)

---
updated-dependencies:
- dependency-name: dotenvy
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-10-18 07:06:40 +00:00
Dom Dwyer afdc008855 fix: correct default limits 2022-10-14 16:05:56 +02:00
Dom Dwyer e4179605df test: assert default service limit values
Adds tests that assert the default service limit values.
2022-10-14 14:46:34 +02:00
Dom Dwyer 46bbee5423 refactor: reduce default column limit
Reduces the default number of columns allowed per-table, from 1,000 to
200.
2022-10-14 14:45:48 +02:00
Carol (Nichols || Goulding) efb964c390
feat: Enforce table column limits from the schema cache (#5819)
* fix: Avoid some allocations by collecting instead of inserting into a vec

* refactor: Encode that adding columns is for one table at a time

* test: Add another test of column limits

* test: Add below/above limit tests for create_or_get_many

* fix: Explicitly DO NOT check column limits when inserting many columns

* feat: Cache the max_columns_per_table on the NamespaceSchema

* feat: Add a function to validate column limits in-memory

* fix: Provide more useful information when over column limits

* fix: Swap types to remove intermediate allocation

* docs: Explain the interactions of the cache and the column limits

* test: Actually set up test that showcases column limit race condition

* fix: Allow writing to existing columns even if table is over column limit

Co-authored-by: Dom <dom@itsallbroken.com>
2022-10-14 11:34:17 +00:00
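The commit above (efb964c390) validates column limits in memory and allows writes to existing columns even when a table is already over its limit. A minimal sketch of that rule follows, assuming the limit is an inclusive maximum on the total column count; the function name and the error text are hypothetical.

```rust
use std::collections::HashSet;

/// Checks a write against a per-table column limit using only in-memory
/// state. Writes that touch only existing columns always pass, even if the
/// table is already over the limit; writes that would add columns pass only
/// if the resulting total stays within `max_columns_per_table`.
fn validate_column_limit(
    existing: &HashSet<&str>,
    write_columns: &[&str],
    max_columns_per_table: usize,
) -> Result<(), String> {
    let new_columns: HashSet<&str> = write_columns
        .iter()
        .copied()
        .filter(|c| !existing.contains(c))
        .collect();
    if new_columns.is_empty() {
        return Ok(());
    }
    let total = existing.len() + new_columns.len();
    if total > max_columns_per_table {
        return Err(format!(
            "table would have {} columns ({} existing, {} new), limit is {}",
            total,
            existing.len(),
            new_columns.len(),
            max_columns_per_table
        ));
    }
    Ok(())
}
```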
Dom Dwyer 3e70dc44a0 refactor(catalog): remove partition_info_by_id()
This method used to return a subset of partition metadata, and was used
exclusively for persistence in the ingester. It is now no longer
necessary.
2022-10-13 15:26:36 +02:00
Dom Dwyer 3e1e4c1f0b refactor: remove Table::get_table_persist_info()
Remove the now-redundant get_table_persist_info() implementations.
2022-10-13 13:44:50 +02:00
Dom Dwyer c4f542bbe2 refactor(ingester): remove tombstone support
This commit removes tombstone support from the ingester, and deletes
associated code/helpers/tests. This commit does NOT remove tombstone
support from any other service, but MAY include removing overlapping
test coverage.

This also removes the tombstone support from the Ingester -> Querier RPC
response message.

This has the nice side effect of removing a whole lot of thread spawning
in the ingester tests for the Executor, speeding everything up!
2022-10-11 13:10:04 +02:00
Dom Dwyer afcb96ae47 perf(ingester): deferred sort key lookup queries
This commit carries the SortKey in the PartitionData, and configures the
ingester to use deferred sort key lookups, smearing the lookups across a
fixed period of time after initialising the PartitionData, instead of
querying for the sort key at persist time.

This allows large numbers of PartitionData to be initialised without
causing an equally large spike in catalog load to resolve the sort key -
instead this load is spread out randomly to reduce peak query rps.
2022-10-06 16:39:54 +02:00
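The commit above (afcb96ae47) smears sort key lookups across a fixed period instead of issuing them all at once. The sketch below shows one way to pick a per-partition delay inside such a window. It is an illustration only: it derives the jitter from a hash of the partition ID so that it stays dependency-free, whereas the commit message describes the load as spread "randomly", and the name `deferred_lookup_delay` is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::time::Duration;

/// Picks a delay in `[0, max_smear)` for the given partition. Hashing the
/// ID stands in for a random number generator in this sketch; the point is
/// only that different partitions wait different amounts of time, so the
/// catalog sees a spread of queries rather than a single spike.
fn deferred_lookup_delay(partition_id: i64, max_smear: Duration) -> Duration {
    let window_ms = max_smear.as_millis() as u64;
    if window_ms == 0 {
        return Duration::ZERO;
    }
    let mut hasher = DefaultHasher::new();
    partition_id.hash(&mut hasher);
    Duration::from_millis(hasher.finish() % window_ms)
}
```

A caller would sleep for the returned duration after initialising the partition state and only then query the catalog for the sort key.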
Nga Tran d171697fd7
feat: always pick cold partitions in next cycle even if it has been pa… (#5772)
* fix: always pick cold partitions in next cycle even if it has been partially compacted recently

* fix: comment

* fix: test output

* refactor: using var instead of literal

* fix: consider deleted L0s for recent writes

* chore: cleanup

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-30 15:54:00 +00:00
Dom Dwyer cd4087e00d style: add no todo!() or dbg!() lints
Some crates had them, some not - let's be consistent and have the
compiler spot dbg!() and todo!() macro calls - they should never be in
prod code!
2022-09-29 13:10:07 +02:00
dependabot[bot] 227dde1dfc
chore(deps): Bump thiserror from 1.0.36 to 1.0.37 (#5753)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.36 to 1.0.37.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.36...1.0.37)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-29 10:37:14 +00:00
Dom Dwyer e19b88cae9 feat(catalog): most recent N partitions
Adds a Partition::most_recent_n() method to the catalog interface,
returning the N most recent partitions for a given set of shards.

The most recently created partitions are likely to be currently "hot"
for writes, and are cheap to list.
2022-09-27 16:22:00 +02:00
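The commit above (e19b88cae9) adds a catalog method that returns the N most recent partitions for a set of shards. An in-memory sketch of the same query follows. It assumes that a larger partition ID means a more recently created partition, which the commit message does not state; the `Partition` struct here is a simplified stand-in.

```rust
/// Simplified stand-in for a catalog partition row.
#[derive(Debug, Clone, PartialEq)]
struct Partition {
    id: i64,
    shard_id: i64,
}

/// Returns up to `n` partitions belonging to any of `shards`, newest first.
/// "Newest" is taken to mean "highest ID" (an assumption of this sketch).
fn most_recent_n(partitions: &[Partition], shards: &[i64], n: usize) -> Vec<Partition> {
    let mut matching: Vec<Partition> = partitions
        .iter()
        .filter(|p| shards.contains(&p.shard_id))
        .cloned()
        .collect();
    matching.sort_unstable_by(|a, b| b.id.cmp(&a.id));
    matching.truncate(n);
    matching
}
```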
Nga Tran 75ff805ee2
feat: instead of adding num_files and memory budget into the reason text column, let us create different columns for them. We will be able to filter them easily (#5742)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-26 20:14:04 +00:00
dependabot[bot] b1740f45d6
chore(deps): Bump thiserror from 1.0.35 to 1.0.36 (#5737)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.35 to 1.0.36.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.35...1.0.36)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-26 14:44:36 +00:00
Carol (Nichols || Goulding) c8108f01e7
chore: Upgrade to Rust 1.64 (#5727)
* chore: Upgrade to Rust 1.64

* fix: Use iter find instead of a for loop, thanks clippy

* fix: Remove some needless borrows, thanks clippy

* fix: Use then_some rather than then with a closure, thanks clippy

* fix: Use iter retain rather than filter collect, thanks clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-22 18:04:00 +00:00
Marco Neumann 84e8a4ac41
fix: GC `parquet_file` table in batches (#5691)
* fix: GC `parquet_file` table in batches

Otherwise this transaction will never finish in prod.

* fix: GC shutdown

* refactor: use constant
2022-09-20 11:14:39 +00:00
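The commit above (84e8a4ac41) deletes from the `parquet_file` table in batches so that no single transaction has to cover every eligible row. The sketch below simulates that loop against an in-memory `Vec` of deletion timestamps. The function names and the batch-size parameter are hypothetical, and each call to `delete_old_batch` stands in for one bounded database transaction.

```rust
/// Removes at most `batch_size` entries older than `cutoff` and returns how
/// many were removed. Stands in for one bounded DELETE transaction.
fn delete_old_batch(table: &mut Vec<i64>, cutoff: i64, batch_size: usize) -> usize {
    let mut removed = 0;
    table.retain(|&to_delete_at| {
        if removed < batch_size && to_delete_at < cutoff {
            removed += 1;
            false
        } else {
            true
        }
    });
    removed
}

/// Repeats bounded deletes until a batch comes back short, which means no
/// eligible rows remain. Returns the total number of rows removed.
fn gc_in_batches(table: &mut Vec<i64>, cutoff: i64, batch_size: usize) -> usize {
    assert!(batch_size > 0, "batch size must be positive");
    let mut total = 0;
    loop {
        let removed = delete_old_batch(table, cutoff, batch_size);
        total += removed;
        if removed < batch_size {
            break;
        }
    }
    total
}
```

Between batches a real garbage collector could also check for a shutdown signal, which keeps shutdown latency bounded by one batch.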
dependabot[bot] b6fb481b0f
chore(deps): Bump dotenvy from 0.15.3 to 0.15.5 (#5689)
Bumps [dotenvy](https://github.com/allan2/dotenvy) from 0.15.3 to 0.15.5.
- [Release notes](https://github.com/allan2/dotenvy/releases)
- [Changelog](https://github.com/allan2/dotenvy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/allan2/dotenvy/compare/v0.15.3...v0.15.5)

---
updated-dependencies:
- dependency-name: dotenvy
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-20 05:28:47 +00:00
Marko Mikulicic 758649296b fix: Close old connection pool after swap 2022-09-20 01:56:42 +02:00
Marko Mikulicic 14a6b437d8
chore: Lower sqlx logging verbosity (#5681)
The default statement logging verbosity of the `sqlx` crate is INFO, which
is frankly surprising.

The reason we didn't bother with lowering this before is that the `sqlx` crate
emits logs using the `log` crate, and we're using the `tracing` crate for logging too.

We did bridge the two logging ecosystems with https://docs.rs/tracing-log/latest/tracing_log/
but until https://github.com/influxdata/influxdb_iox/pull/5680 the bridge wasn't really working
so we didn't notice the *very* verbose logs of sqlx statement logging (which logs our whole multiline SQL statements as INFO logs...)
2022-09-19 22:56:05 +00:00
Carol (Nichols || Goulding) 414b0f02ca
fix: Use time helper methods in more places 2022-09-19 13:24:08 -04:00
Carol (Nichols || Goulding) c0c0349bc5
fix: Use typed Time values rather than ns 2022-09-19 12:59:20 -04:00
Carol (Nichols || Goulding) 0e23360da1
refactor: Add helper methods for computing times to TimeProvider 2022-09-19 11:34:43 -04:00
Dom Dwyer 66bf0ff272 refactor(db): NULLable persisted_sequence_number
Makes the partition.persisted_sequence_number column in the catalog DB
NULLable. 0 is a valid persisted sequence number.
2022-09-15 18:19:39 +02:00
Dom Dwyer 234d460fcb chore: rename update_persisted_sequence_number fn 2022-09-15 16:10:35 +02:00
Dom Dwyer d199a83355 feat(catalog): per-partition persist mark API
Adds the "persisted_sequence_number" field to the Partition model, and
updates the catalog API to read & update it.
2022-09-15 16:10:35 +02:00
Dom Dwyer c5ac17399a refactor(db): persist marker for partition table
Adds a migration to add a column "persisted_sequence_number" that
defines the inclusive upper-bound on sequencer writes materialised and
uploaded to object store for the partition.
2022-09-15 16:10:35 +02:00
Luke Bond b52865e018
feat: garbage collector now cleans up old parquet files (#5588)
* feat: garbage collector now cleans up old parquet files

* chore: clarifying comment in GC

* chore: typos in GC

* chore: typos in GC

* fix: cmdline arg in GC test needs updating after refactor

* fix: use select! on shutdown rx in GC

* fix: recalc cutoff in GC each loop

* chore: add delete_old that returns IDs only, for GC

* chore: use duration in GC args instead of usize days

* chore: GC lister runs forever w/ sleep; tests updated accordingly

* docs: fix link in GC comments to automatic link

* chore: test for delete_old_ids_only; refactor mem impl thereof

* chore: make GC test less flakey

* chore: make GC test less flakey

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-14 14:09:28 +00:00
dependabot[bot] b4a25fdb0e
chore(deps): Bump thiserror from 1.0.34 to 1.0.35 (#5629)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.34 to 1.0.35.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.34...1.0.35)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-14 12:54:12 +00:00
kodiakhq[bot] 85641efa6f
Merge branch 'main' into cn/infallible-estimated-bytes 2022-09-14 01:00:10 +00:00
Luke Bond ee3f172d45 chore: renamed DB migration for billing trigger 2022-09-13 16:29:14 +01:00
Luke Bond c8b545134e chore: add index to speed up billing_summary upsert 2022-09-13 16:22:44 +01:00
Luke Bond feae712881 fix: parquet_file billing trigger respects to_delete 2022-09-13 16:22:44 +01:00
Luke Bond 80661a5d1c chore: clippy 2022-09-13 16:22:44 +01:00
Luke Bond 10acaf4567 chore: added test for parquet file delete trigger 2022-09-13 16:22:44 +01:00
Luke Bond cc93b2c275 chore: add catalog trigger for billing 2022-09-13 16:22:44 +01:00
Carol (Nichols || Goulding) 224e3cec10
fix: Use ColumnType in errors rather than strings 2022-09-12 17:35:27 -04:00
Carol (Nichols || Goulding) 20e6d26aa9
refactor: Have sqlx decode ColumnTypes in the catalog 2022-09-12 16:50:25 -04:00
Carol (Nichols || Goulding) aba9759268
test: Correct expectations with skipped compactions 2022-09-12 13:13:29 -04:00
Carol (Nichols || Goulding) ef35f2e236
fix: Always parameterize the compaction level to postgres
So that we don't have hardcoded values in SQL that could get out of sync
2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) da201ba87f
fix: Select by num of both l0 and l1 files for cold compaction
Now that we're going to compact level 1 files into level 2 files as
well.
2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) 6bba3fafaa
fix: If full compaction group has only 1 file, upgrade level
As opposed to running full compaction.

Makes the catalog function general and take the level as a parameter
rather than only upgrade to level 1.
2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) 327446f0cd
fix: Change default cold hours threshold from 24 hours to 8
As requested in https://github.com/influxdata/influxdb_iox/issues/5330#issuecomment-1212468682
2022-09-12 13:13:26 -04:00
Carol (Nichols || Goulding) 20e7c4f4e5
test: Check all returned partitions to make sure the skipped one isn't there 2022-09-09 17:24:09 -04:00
Carol (Nichols || Goulding) c92aebd595
feat: Exclude skipped partitions from compaction candidacy
Connects to #5458.
2022-09-09 15:31:07 -04:00
Carol (Nichols || Goulding) fbe3e360d2
feat: Record skipped compactions in memory
Connects to #5458.
2022-09-09 15:31:07 -04:00
YIXIAO SHI 52ae60bf2e
chore: fix comment typo (#5551)
Co-authored-by: Dom <dom@itsallbroken.com>
2022-09-07 08:49:29 +00:00
Marco Neumann adeacf416c
ci: fix (#5569)
* ci: use same feature set in `build_dev` and `build_release`

* ci: also enable unstable tokio for `build_dev`

* chore: update tokio to 1.21 (to fix console-subscriber 0.1.8)

* fix: "must use"
2022-09-06 14:13:28 +00:00
dependabot[bot] 9f0b0328f7
chore(deps): Bump thiserror from 1.0.33 to 1.0.34 (#5556)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.33 to 1.0.34.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.33...1.0.34)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-06 09:18:41 +00:00
Nga Tran dde65fa7ef
fix: remove timestamp functions from SQLs to be able to use index for improving performance (#5547) 2022-09-02 19:43:52 +00:00
Nga Tran cbfd37540a
feat: add index on parquet_file(shard_id, compaction_level, to_delete, created_at) (#5544) 2022-09-02 14:27:29 +00:00
Nga Tran c8cbc5299b
feat: make compactors to select candidates based on the last n minutes (#5535)
* feat: make compactors to select candidates based on the last n minutes to reduce workload for postgres catalog query

* refactor: remove 1-minute case per review comment
2022-09-01 20:07:26 +00:00
Carol (Nichols || Goulding) 8a0fa616cf
fix: Rename columns, tables, indexes and constraints in postgres catalog 2022-09-01 10:00:54 -04:00
dependabot[bot] 7c61bdcf35
chore(deps): Bump paste from 1.0.8 to 1.0.9 (#5526)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.8 to 1.0.9.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.8...1.0.9)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-01 12:07:53 +00:00
dependabot[bot] 9af93ca9ba
chore(deps): Bump pretty_assertions from 1.2.1 to 1.3.0 (#5517)
Bumps [pretty_assertions](https://github.com/rust-pretty-assertions/rust-pretty-assertions) from 1.2.1 to 1.3.0.
- [Release notes](https://github.com/rust-pretty-assertions/rust-pretty-assertions/releases)
- [Changelog](https://github.com/rust-pretty-assertions/rust-pretty-assertions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/rust-pretty-assertions/rust-pretty-assertions/compare/v1.2.1...v1.3.0)

---
updated-dependencies:
- dependency-name: pretty_assertions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-01 10:20:26 +00:00
dependabot[bot] 00ed79ff1b
chore(deps): Bump thiserror from 1.0.32 to 1.0.33 (#5524)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.32 to 1.0.33.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.32...1.0.33)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-01 09:11:31 +00:00
Nga Tran cb10a7c6d8
feat: More accurate memory estimate for compaction (#5471)
* feat: initial implementation of memory estimation for a compaction

* feat: estimate size of files and have the right actions for the needed budget

* feat: run candidates in parallel

* fix: have the right name for the column field of the output struct

* feat: add metrics for estimated budgets

* chore: cleanup

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* fix: fix syntax after applying review's suggestions

* refactor: Convert a Vec to VecDeque to go well with pop and push

* chore: remove max_concurrent_size_bytes and input_size_threshold_bytes

* chore: remove input_file_count_threshold

* test: tests for estimate_arrow_bytes_for_file

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-30 13:44:44 +00:00
Carol (Nichols || Goulding) dbd27f648f
refactor: Rename more mentions of Kafka to their other name where appropriate 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 1b49ad25f7
refactor: Rename KafkaTopicId to TopicId 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 58f0b63cdc
refactor: Rename KafkaTopic to Topic or TopicMetadata or topic name as appropriate 2022-08-29 14:27:02 -04:00
Carol (Nichols || Goulding) 74c9529062
fix: Rename KafkaPartition to ShardIndex 2022-08-29 14:07:18 -04:00
Carol (Nichols || Goulding) ab20828c2f
fix: Rename some more comments and test values from sequencer to shard 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) fe9c474620
fix: rustfmt 2022-08-29 14:06:45 -04:00
Jake Goulding 4abf21c724
refactor: Rename Sequencer (and its entourage) to Shard 2022-08-29 14:06:43 -04:00
Marko Mikulicic 4beb721a9a
fix: Revert Bump dotenvy from 0.15.1 to 0.15.2 (#5450) (#5455)
This reverts commit 84acbd2fad.

Closes #5454

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-24 09:10:09 +00:00
dependabot[bot] 84acbd2fad
chore(deps): Bump dotenvy from 0.15.1 to 0.15.2 (#5450)
Bumps [dotenvy](https://github.com/allan2/dotenvy) from 0.15.1 to 0.15.2.
- [Release notes](https://github.com/allan2/dotenvy/releases)
- [Changelog](https://github.com/allan2/dotenvy/blob/master/CHANGELOG.md)
- [Commits](https://github.com/allan2/dotenvy/commits/v0.15.2)

---
updated-dependencies:
- dependency-name: dotenvy
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-23 11:24:42 +00:00
pierwill 51141f2c78
docs: Edit catalog API docs (#5409)
* docs: Edit Catalog docs string

* docs: Edit top-level catalog module doc

* docs: Mark `sealed` trait w/ `doc(hidden)`

* docs: Edit catalog transaction docs

* docs: Edit Catalog trait docs

* docs: Edit `RepoCollection` docs

Clarify concept of repository.

Add links.

* docs: Add link to `Transaction`

Co-authored-by: pierwill <pierwill@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-16 21:36:29 +00:00
Dom Dwyer 180ff9f681 feat: table name in schema validation errors
Scopes all schema validation errors to include the table name in the
error output.
2022-08-16 19:00:44 +02:00
Carol (Nichols || Goulding) fc62c82722
feat: Select cold partitions 2022-08-04 16:55:47 -04:00
Marco Neumann 273b3cc165
chore: replace `dotenv` with `dotenvy` (#5285)
The latter one is a maintained fork. This avoids having both crates
after #5282.
2022-08-03 12:41:38 +00:00
dependabot[bot] 94fe5b4c10
chore(deps): Bump paste from 1.0.7 to 1.0.8 (#5280)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.7 to 1.0.8.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.7...1.0.8)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 09:03:25 +00:00
dependabot[bot] fbd39844d8
chore(deps): Bump async-trait from 0.1.56 to 0.1.57 (#5247)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.56 to 0.1.57.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.56...0.1.57)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-08-01 08:30:33 +00:00
Nga Tran a2c82a6f1c
chore: remove min sequence number from the catalog table as we no longer use it (#5178)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 20:47:55 +00:00
Nga Tran 69cb3f2b19
refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151)
* refactor: remove min_sequence_number

* fix: typos

* fix: remove min_sequence_number from new files from merging main

* fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier

* test: add test_compactor_collision back but modify the input to make it work with the new changes

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 13:51:54 +00:00
Nga Tran c8f4000f04
feat: Select compaction candidates (#5131)
* feat: initial implementation for selecting compaction candidates

* feat: 2 catalog functions to choose the highest-throughput partitions to compact and the candidate selection function itself

* test: tests for the new 2 queries

* feat: more tests and metrics for choosing compaction candidates

* chore: Apply self suggestions from self review

* chore: cleanup

* chore: fix doc comment

* chore: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

* refactor: address review comments

* fix: get the right time provider for the tests

* refactor: remove the left over compaction_

* fix: typos

* fix: make the param name and env name consistent

* refactor: make relevant iSomething to uSomething

* fix: typo

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-07-18 18:05:13 +00:00
Jake Goulding 635f535e0e refactor: replace level_2 with level_1 2022-07-16 21:49:45 -04:00
dependabot[bot] 9b67de2f43
chore(deps): Bump tokio from 1.19.2 to 1.20.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-14 01:21:43 +00:00
Carol (Nichols || Goulding) 61c023139b
refactor: Switch compaction levels to an enum with values rather than separate consts
Bonuses:

- Type checking
- Validation
- Less casting
- Exhaustiveness checking
- Less use of the numerical value
2022-07-13 11:30:36 -04:00
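The bonuses listed for the enum refactor in 61c023139b above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the type name `CompactionLevel`, its variants, the `i16` representation, and the `next` helper are all assumptions made for the example.

```rust
use std::convert::TryFrom;

/// Hypothetical compaction levels; names and values are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
#[repr(i16)]
pub enum CompactionLevel {
    /// A freshly persisted file that has not been compacted yet.
    Initial = 0,
    /// A file that has been through at least one compaction.
    Compacted = 1,
}

impl TryFrom<i16> for CompactionLevel {
    type Error = String;

    /// Validation: reject raw values (for example, read from a database
    /// column) that do not correspond to a known level.
    fn try_from(value: i16) -> Result<Self, Self::Error> {
        match value {
            0 => Ok(Self::Initial),
            1 => Ok(Self::Compacted),
            other => Err(format!("invalid compaction level: {}", other)),
        }
    }
}

impl CompactionLevel {
    /// Exhaustiveness checking: adding a variant to the enum makes this
    /// `match` fail to compile until the new case is handled.
    pub fn next(self) -> Option<Self> {
        match self {
            Self::Initial => Some(Self::Compacted),
            Self::Compacted => None,
        }
    }
}
```

With separate integer constants, any `i16` can be passed where a level is expected; with the enum, the numeric value is only needed at the storage boundary (`CompactionLevel::Compacted as i16`).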
Carol (Nichols || Goulding) 80b6c5c82f
fix: Correct typo in constant name so searching for COMPACTION_LEVEL returns all (#5077) 2022-07-08 16:31:52 +00:00
Carol (Nichols || Goulding) a96976db46
fix: Start Kafka Partition IDs for default records at 0, not 1
In the all-in-one command, only one write buffer partition is supported,
and it's specified using Kafka Partition ID 0:

```
        // All-in-one mode only supports one write buffer partition.
        let write_buffer_partition_range_start = 0;
        let write_buffer_partition_range_end = 0;
```

When using all-in-one mode with an ephemeral, in-memory catalog,
`create_or_get_default_records` is what puts records into the catalog
that need to match the write buffer configuration.
2022-07-06 11:00:55 -04:00
Marco Neumann be53716e4d
refactor: use IDs for `parquet_file.column_set` (#4965)
* feat: `ColumnRepo::list_by_table_id`

* refactor: use IDs for `parquet_file.column_set`

Closes #4959.

* refactor: introduce `TableSchema::column_id_map`
2022-06-30 15:08:41 +00:00
Marko Mikulicic 16a8d29b9f
fix: Fix typo in const name (#4993) 2022-06-30 07:51:39 +00:00
Nga Tran cfcc4b8426
refactor: change level 1 to level 2 preparing for next design changes (#4954)
* refactor: change level 1 to level 2 preparing for next design changes

* fix: make level-2 consistent everywhere

* chore: remove unused comments

* refactor: change all the names from level_1 to level_2 to completely replace 1 with 2 and make everything consistent

* chore: add corresponding constants for the compaction levels in the comments

Co-authored-by: Dom <dom@itsallbroken.com>
2022-06-29 14:08:58 +00:00
Marco Neumann 215f297162
refactor: parquet file metadata from catalog (#4949)
* refactor: remove `ParquetFileWithMetadata`

* refactor: remove `ParquetFileRepo::parquet_metadata`

* refactor: parquet file metadata from catalog

Closes #4124.
2022-06-27 15:38:39 +00:00
Marco Neumann b9cbb3dfca
refactor: do not use in-parquet IOx metadata in compactor (*) (#4935)
* refactor: avoid feeding sort key from struct into same struct

* feat: allow namespace schema query by ID

* refactor: do not use binary parquet file MD in compactor tests

* refactor: do not use in-parquet IOx metadata

* refactor: reduce number of catalog queries
2022-06-27 08:06:11 +00:00
Nga Tran 92eeb5b232
chore: remove unused sort_key_old from catalog partition (#4944)
* chore: remove unused sort_key_old from catalog partition

* chore: add new line at the end of the SQL file
2022-06-24 15:02:38 +00:00
Marco Neumann 994bc5fefd
refactor: ensure that SQL parquet file column sets are not NULL (#4937)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-24 14:26:18 +00:00
Marco Neumann c3912e34e9
refactor: store per-file column set in catalog (#4908)
* refactor: store per-file column set in catalog

Together with the table-wide schema and the partition-wide sort key, this should
be everything we need to read a parquet file directly into memory
without peeking any file-level metadata.

The querier will use this to directly load parquet files into the read
buffer.

**WARNING: This requires a catalog wipe!**

Ref #4124.

* refactor: use proper `ColumnSet` type
2022-06-21 10:26:12 +00:00
Marco Neumann 0fbff981ec
chore(deps): Bump sqlx to 0.6.0 and uuid to 1 (#4894)
Closes #4889.
Closes #4890.

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-17 10:28:28 +00:00
Andrew Lamb 005610b172
refactor: remove some `&` use in iox_catalog (#4862)
* refactor: remove some `&` use in iox_catalog

* fix: Update data_types/src/lib.rs
2022-06-15 11:31:49 +00:00
Nga Tran b682dbbc2e
chore: Add debug info of sort_key for ingester (#4859)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 20:39:17 +00:00
Carol (Nichols || Goulding) e875a92cf8
feat: Log time spent requesting ingester partitions (#4806)
* feat: Log time spent requesting ingester partitions

Fixes #4558.

* feat: Record a metric for the duration queriers wait on ingesters

* fix: Use DurationHistogram instead of U64 Histogram

* test: Add a test for the ingester ms metric

* feat: Add back the logging to provide both logging and metrics for ingester duration

* refactor: Use sample_count method on metrics

* feat: Record ingester duration separately for success or failure

* fix: Create a separate test for the ingester metrics

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-14 17:58:19 +00:00
Dom Dwyer b41ea1d718 refactor: PartitionKey type
This commit changes the code base to use a new reference-counted
PartitionKey type wrapper, instead of passing a bare String around.

This allows the compiler to type check & verify usage of the partition
key, instead of passing a bare string around. By reference counting the
underlying string, we reduce memory usage for some use cases.
2022-06-14 14:47:56 +01:00
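A reference-counted key wrapper of the kind described in b41ea1d718 above might look like the sketch below. It assumes the wrapper holds an `Arc<str>`; the actual field layout and methods in the repository may differ, and `shares_storage_with` is a hypothetical helper added only to make the sharing observable.

```rust
use std::fmt;
use std::sync::Arc;

/// Hypothetical newtype around a shared, immutable string.
/// Cloning bumps a reference count instead of copying the bytes.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PartitionKey(Arc<str>);

impl PartitionKey {
    /// Borrow the key as a plain string slice.
    pub fn as_str(&self) -> &str {
        &self.0
    }

    /// True if both keys point at the same heap allocation.
    pub fn shares_storage_with(&self, other: &Self) -> bool {
        Arc::ptr_eq(&self.0, &other.0)
    }
}

impl From<&str> for PartitionKey {
    fn from(s: &str) -> Self {
        Self(Arc::from(s))
    }
}

impl fmt::Display for PartitionKey {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(&self.0)
    }
}
```

A function that takes `PartitionKey` can no longer be handed an arbitrary `String` such as a table name by mistake, which is the type-checking benefit the commit message refers to.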
kodiakhq[bot] dd8d44e24f
Merge branch 'main' into cn/duration 2022-06-10 14:23:09 +00:00
Nga Tran 13c57d524a
feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801)
* feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string

* test: add column with comma

* fix: use new protobuf field to avoid incompatibility

* fix: ensure sort_key is an empty array rather than NULL

* refactor: address review comments

* refactor: address more comments

* chore: clearer comments

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql

* fix: Rename migration so it will be applied after

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2022-06-10 13:31:31 +00:00
Marko Mikulicic c09f6f6bc9
chore: Incrementally migrate sort_key to array type (#4826)
This PR is the first step where we add a new column sort_key_arr whose content we'll manually migrate from sort_key.

When we're done with this, we'll merge https://github.com/influxdata/influxdb_iox/pull/4801/ (whose migration script must be adapted slightly to rename the `sort_key_arr` column back to `sort_key`).

All this must be done while we shut down the ingesters and the compactors.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-10 11:35:43 +00:00
Andrew Lamb 50697906b1
refactor: Make `DMLWrite::sequence_number` a `SequenceNumber` (#4817) 2022-06-09 19:36:37 +00:00
Carol (Nichols || Goulding) 1c7cbaf5ae
refactor: Use DurationHistogram in more places 2022-06-09 14:20:51 -04:00
dependabot[bot] e03bf94420
chore(deps): Bump tokio from 1.18.2 to 1.19.1 (#4783)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.18.2 to 1.19.1.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.18.2...tokio-1.19.1)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-06 14:15:12 +00:00
dependabot[bot] 9a21292db8
chore(deps): Bump async-trait from 0.1.53 to 0.1.56 (#4774)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.53 to 0.1.56.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.53...0.1.56)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-06-03 09:10:40 +00:00
Ryan Russell d279deddad
docs(various): Improve Readability (#4768)
Signed-off-by: Ryan Russell <git@ryanrussell.org>

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-02 18:01:06 +00:00
Andrew Lamb dde3c3922c
refactor: use consistent spelling of serialize (#4717) 2022-05-27 14:42:59 +00:00
Carol (Nichols || Goulding) 077884c925
fix: Remove allow dead_code annotations from undead code 2022-05-06 16:58:02 -04:00
Carol (Nichols || Goulding) 6681298a93
fix: Remove unused dependencies found with cargo-udeps 2022-05-06 14:51:54 -04:00
Carol (Nichols || Goulding) 068096e7e1
fix: Rename data_types2 to data_types 2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding) 12793bffbf
fix: Move Partition Template types to data_types2 2022-05-06 14:45:36 -04:00
Andrew Lamb 7c7d3fafe9
Merge branch 'main' into dom/schema-cache-warm 2022-04-29 09:11:53 -04:00
Marco Neumann 0a20086a58
feat: expose catalog timeouts via CLI/env (#4472)
This is useful for local instances that run against a prod system,
because port forwarding can lead to long connection delays.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-29 11:14:15 +00:00
Paul Dix 8e48fcd620
feat: add remote pull partition (#4433)
Add lookup of partitions by table id to catalog.
Add API to catalog to return partitions by table id.
Add to client to return partitions by table id.
Add CLI to pull remote schema, partition, and parquet files into a local catalog and object store.
2022-04-28 21:04:27 +00:00
dependabot[bot] 420c306caa
chore(deps): Bump tokio from 1.17.0 to 1.18.0 (#4453)
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.17.0 to 1.18.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.17.0...tokio-1.18.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-04-28 08:21:17 +00:00
Dom Dwyer bb8a19b571 feat(iox_catalog): list_schemas()
Adds a function to resolve an atomic snapshot of all NamespaceSchema in
the catalog with minimal query overhead.
2022-04-27 17:23:28 +01:00
Dom Dwyer 874521da8a feat(iox_catalog): ColumnRepo::list()
Allow all columns in the catalog to be fetched.
2022-04-27 17:21:00 +01:00
Dom Dwyer eb5abce99e feat(iox_catalog): TableRepo::list()
Allow all tables in the catalog to be fetched.
2022-04-27 17:20:53 +01:00
二手掉包工程师 4b47d723b1
refactor: Rename time to iox_time (#4416)
Signed-off-by: hi-rustin <rustin.liu@gmail.com>

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-26 00:19:59 +00:00
Marco Neumann 86e8f05ed1
fix: make all catalog IDs 64bit (#4418)
Closes #4365.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-25 16:49:34 +00:00
Dom Dwyer 320f1073e0 fix: revert column service limits (#4179)
This reverts commit ea865b63f4.
2022-04-19 16:08:56 +01:00
Paul Dix 5bf4550259
feat: add object store service to router (#4338)
Add method to catalog to get parquet file by object store id.
Add gRPC service for object store to get a file by its UUID.
Add the object store service to router2 with object store config.
2022-04-16 17:58:31 +00:00
Carol (Nichols || Goulding) 94dcde4996
fix: Do fewer queries for metadata
By adding another _with_metadata catalog function. Also introduce a new
type rather than passing around tuples everywhere.
2022-04-13 10:43:20 -04:00
Carol (Nichols || Goulding) bba4251363
fix: Remove duplication in metric name 2022-04-13 10:43:19 -04:00
Carol (Nichols || Goulding) 02fee3b84f
feat: Request parquet metadata from the catalog when needed only 2022-04-13 10:43:19 -04:00
Carol (Nichols || Goulding) ec25620b73
feat: Add a catalog method for requesting a parquet file's metadata 2022-04-13 10:43:19 -04:00
Carol (Nichols || Goulding) ee56ebf0e3
feat: Store metadata in catalog, but don't fetch by default 2022-04-13 10:43:19 -04:00
Dom Dwyer 02f87e8484 refactor: reduce level_0 query limit
Reduce the query limit from 10,000 to 1,000 to help reduce query
execution time.
2022-04-05 15:14:56 +01:00
Paul Dix 81d41f81a1
fix: ingester replay logic (#4212)
Fix the ingester to track the max persisted sequence number per partition.
Ensure replay takes in data from unpersisted partitions.
Simplify the table persist info to not return a max persisted sequence number for the table as that information isn't needed.
2022-04-04 18:04:34 +00:00
kodiakhq[bot] e2439c0a4f
Merge branch 'main' into cn/sort-key-catalog 2022-04-04 16:54:48 +00:00
Dom Dwyer 61bc9c83ad refactor: add table_id index on column_name
After checking the postgres workload for the catalog in prod, this
missing index was noted as the cause of unexpectedly expensive plans for
simple queries.
2022-04-04 13:04:25 +01:00
dependabot[bot] dc9632114c
chore(deps): Bump pretty_assertions from 1.2.0 to 1.2.1 (#4213)
Bumps [pretty_assertions](https://github.com/colin-kiegel/rust-pretty-assertions) from 1.2.0 to 1.2.1.
- [Release notes](https://github.com/colin-kiegel/rust-pretty-assertions/releases)
- [Changelog](https://github.com/colin-kiegel/rust-pretty-assertions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/colin-kiegel/rust-pretty-assertions/compare/v1.2.0...v1.2.1)

---
updated-dependencies:
- dependency-name: pretty_assertions
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-04 10:53:31 +00:00
Carol (Nichols || Goulding) cbf7888435
feat: Add Partition update_sort_key method to catalog 2022-04-01 15:45:51 -04:00
Carol (Nichols || Goulding) c9bc70f03a
feat: Add optional sort_key column to partition table
Connects to #4195.
2022-04-01 15:45:51 -04:00
Luke Bond ea865b63f4
fix: create_or_get_multi for column in catalog now enforces limits (#4179)
* fix: create_or_get_multi for column in catalog now enforces limits

fix: create_or_get_multi for column in catalog now enforces limits
chore: reorder catalog column create fns to be next to each other
test: add failing test for multi col insert w/ limits

test: bend catalog mem impl to match postgres for tests

fix: postgres column insert many column type error checks

chore: clippy

* test: assert column counts in partial column insert test

* chore: add some sql comments to the monster multicolumn insert query; s/RIGHT/INNER/ join

* chore: adding comments to clarify partial failure behaviour of multi col insert

* test: add tests for create_or_get_many columns in catalog

* test: forgot how macros work for a moment

* test: service limit test handles partial update of cols

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-01 10:59:43 +00:00
Paul Dix 6479e1fc8e
fix: add indexes to parquet_file (#4198)
Add indexes so compactor can find candidate partitions and specific partition files quickly.
Limit number of level 0 files returned for determining candidates. This should ensure that if compaction is very backed up, it will be able to work through the backlog without evaluating the entire world.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-04-01 09:59:39 +00:00
Nga Tran ddc2c8304f
fix: have the compaction level set correctly (#4184)
* fix: have the compaction level set correctly, especially for compacted file from the compactor

* fix: typo
2022-03-30 21:23:40 +00:00
Paul Dix 04d961e70d
feat: wire up compactor scheduler and config (#4139)
Add configuration options for compactor for the max size of level 0 files and split percentage.
Add metrics for compaction to track the number of candidates, compactions, and durations.
Add functions to separate identifying partitions to compact from running compaction.
Make compaction run in smaller chunks, specifically per partition.
Update compaction to automatically promote level 0 files that are non-overlapping without waiting some period of time.

Closes #4120

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 17:45:24 +00:00
Marko Mikulicic 2c47d77a5b
fix: Backfill namespace_id in schema migration (#4177)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-30 16:31:26 +00:00
Carol (Nichols || Goulding) 79447aed33
fix: Logical merge conflict, missing namespace_id in test setup 2022-03-29 08:28:51 -04:00
Carol (Nichols || Goulding) 5c8a80dca6
fix: Add an index to parquet_file to_delete 2022-03-29 08:15:26 -04:00
Carol (Nichols || Goulding) f3f792fd08
feat: Add namespace_id to the parquet_files table; object store paths need it 2022-03-29 08:15:26 -04:00
Carol (Nichols || Goulding) 39a1d1b26f
feat: Delete parquet files marked to be deleted before a specified time
Connects to #3954.
2022-03-29 08:13:06 -04:00
Nga Tran 80b7e9cce1
feat: delete fully processed tombstones & integration tests for find_and_compact (#4116)
* feat: remove fully processed tombstones

* test: first few tests

* fix: delete SQL

* fix: test how IN (...) works in PG

* fix: test how IN (?) works in PG

* fix: test how IN (?) works in PG

* fix: dynamically add  IN (?, ?, ...)

* fix: dynamically add  IN (?, ?, ...) & its dynamic values

* fix: add argument directly in the SQL

* test: more tests for catalog read and update functions

* chore: move a subfunction to make it easier to read

* test: first test for find_can_compact but disabled due to bug

* test: integration tests and a bug fix for find_and_compact

* chore: cleanup

* refactor: address review comments

* fix: put the 2 deletes (processed tombstones and tombstones) in a transaction
2022-03-28 18:35:54 +00:00
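The `IN (?, ?, ...)` steps in 80b7e9cce1 above deal with a query whose number of values is only known at run time. One common way to handle that with PostgreSQL's numbered parameters is to generate the placeholder list, as in the sketch below. This is not the repository's query: the function name, the `id` column, and the table name passed in are hypothetical, and the commit message suggests the author ultimately put the arguments directly into the SQL instead.

```rust
/// Build `DELETE FROM <table> WHERE id IN ($1, $2, ..., $n);`.
///
/// `table` must be a trusted constant: it is interpolated into the SQL
/// text, unlike the ids, which are meant to be bound as parameters.
/// Returns `None` for `n == 0`, because `IN ()` is not valid SQL.
pub fn delete_by_ids_sql(table: &str, n: usize) -> Option<String> {
    if n == 0 {
        return None;
    }
    // One numbered placeholder per id: "$1", "$2", ...
    let placeholders: Vec<String> = (1..=n).map(|i| format!("${}", i)).collect();
    Some(format!(
        "DELETE FROM {} WHERE id IN ({});",
        table,
        placeholders.join(", ")
    ))
}
```

The caller would then bind exactly `n` values, in order, to the generated statement.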
dependabot[bot] 4f9515ffba
chore(deps): Bump async-trait from 0.1.52 to 0.1.53 (#4141)
Bumps [async-trait](https://github.com/dtolnay/async-trait) from 0.1.52 to 0.1.53.
- [Release notes](https://github.com/dtolnay/async-trait/releases)
- [Commits](https://github.com/dtolnay/async-trait/compare/0.1.52...0.1.53)

---
updated-dependencies:
- dependency-name: async-trait
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-28 08:55:24 +00:00
dependabot[bot] e5bbc74f7a
chore(deps): Bump paste from 1.0.6 to 1.0.7 (#4140)
Bumps [paste](https://github.com/dtolnay/paste) from 1.0.6 to 1.0.7.
- [Release notes](https://github.com/dtolnay/paste/releases)
- [Commits](https://github.com/dtolnay/paste/compare/1.0.6...1.0.7)

---
updated-dependencies:
- dependency-name: paste
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-03-28 08:44:01 +00:00
Dom Dwyer 8e85846db6 refactor: lowercase error messages
Lowercases the error messages in the big iox_catalog Error enum for
better composition of messages (no random capitalisation in
glued-together strings, which is common with wrapped errors).
2022-03-25 11:33:27 +00:00
Carol (Nichols || Goulding) 67e13a7c34
fix: Change to_delete column on parquet_files to be a time (#4117)
Set to_delete to the time the file was marked as deleted rather than
true.

Fixes #4059.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-23 18:47:27 +00:00
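Storing a deletion time rather than a boolean, as 67e13a7c34 above does, is what makes it possible to later remove only the files that were marked before some cutoff (the behaviour described in 39a1d1b26f). A minimal in-memory sketch, assuming timestamps are plain `i64` values and using hypothetical names throughout:

```rust
/// Hypothetical in-memory stand-in for a catalog row.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFile {
    pub id: i64,
    /// `None` while the file is live; `Some(t)` records when it was
    /// marked for deletion.
    pub to_delete: Option<i64>,
}

/// Mark the file with `id` as deleted at time `now`.
/// A file that is already marked keeps its original timestamp.
pub fn flag_for_delete(files: &mut [ParquetFile], id: i64, now: i64) {
    for f in files.iter_mut().filter(|f| f.id == id) {
        if f.to_delete.is_none() {
            f.to_delete = Some(now);
        }
    }
}

/// Remove every file marked for deletion strictly before `cutoff`
/// and return the removed rows; live files are never removed.
pub fn delete_old(files: &mut Vec<ParquetFile>, cutoff: i64) -> Vec<ParquetFile> {
    let (old, keep): (Vec<_>, Vec<_>) = files
        .drain(..)
        .partition(|f| matches!(f.to_delete, Some(t) if t < cutoff));
    *files = keep;
    old
}
```

With a boolean flag, `delete_old` would have no way to tell a file marked a second ago from one marked last week.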
Carol (Nichols || Goulding) 2749c37d02
fix: Query for tombstones in a time range, not for a particular parquet file
The compactor at this point is still querying for each file; this is an
intermediate step
2022-03-23 09:52:00 -04:00
Carol (Nichols || Goulding) 87dc2981f6
feat: Query for tombstones relevant to a parquet file
Connects to #3948.
2022-03-23 09:52:00 -04:00
Marco Neumann 55643945a1
refactor: `querier` w/o `db` (#4063)
* feat: `TombstoneRepo::list_by_table`

* feat: `ParquetFileRepo::list_by_table_not_to_delete`

* refactor: `querier` w/o `db`

Get the `querier` to work w/o relying on `db`. A few notes:

- Testing is kinda shallow, we really need to get `query_tests` working
  w/ `querier` (see #3934).
- We still run a sync loop for namespaces, tables and schemas. This will
  be replaced by "update namespace incl. tables and schemas on demand".
  Note however that we cannot fetch single tables and schemas on demand
  at the moment, because DataFusion doesn't implement async schema
  inspection (only `scan` / "give me all the chunks" is async). I think
  that's OK for now and we can address this later.
- There is NO cache for parquet files and tombstones at the moment. For
  correctness, they need to be fetched in a single transaction (or we
  need a kinda tricky sequence number / logical clock tracking) and I am
  not sure yet how this makes sense when we have the ingester data wired
  up and predicates pushed down to the catalog (see next point). So
  let's measure first and then decide on a caching strategy for this.
- Predicates are currently NOT pushed down to the catalog. I'll need to
  figure out how to extract time range from generic DataFusion
  expressions to make that work (it's easier for InfluxRPC queries, but
  they are not tested at the moment, see first point).

Sorry that this commit is kinda huge. I initially planned to only
migrate the chunks away from `db` and leave the tables and schemas for a
follow-up PR, but the DataFusion trait structure (chunks are bound to
their tables) makes this kinda pointless.

Closes #3974.

* docs: explain what we're doing

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* docs: mention tracking issues

* docs: explain what we're doing

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-03-21 16:58:00 +00:00
Carol (Nichols || Goulding) 8fd3d85634
refactor: Move add_parquet_file_with_tombstones from ingester to compactor 2022-03-21 10:16:57 -04:00
Marco Neumann 0779f81b6b
refactor: rework `TableCache` (#4054)
* feat: `TableRepo::get_by_namespace_and_name`

* refactor: rework `TableCache`

- dual cache that can also map table names to IDs
- deal w/ missing tables w/o panics
- set proper timeouts to missing data

For #3974.

* test: extend table cache tests
2022-03-21 13:40:06 +00:00
Luke Bond da517bd8e2
feat: impl table & column limits in catalog (#3832)
fix: refactor table & col limit enforcement in catalog into single SQL statement

fix: borked rebase

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-18 13:54:07 +00:00
Dom Dwyer 0d4949cd1b refactor: lower pg idle connection timeout
Configure the postgres catalog to close unused connections after 1
minute, rather than 500s to introduce a bit of fluidity to pool of
connection acquires.
2022-03-17 13:44:59 +00:00
dependabot[bot] 3f0f090c4e
chore(deps): Bump pretty_assertions from 1.1.0 to 1.2.0 (#4024)
Bumps [pretty_assertions](https://github.com/colin-kiegel/rust-pretty-assertions) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/colin-kiegel/rust-pretty-assertions/releases)
- [Changelog](https://github.com/colin-kiegel/rust-pretty-assertions/blob/main/CHANGELOG.md)
- [Commits](https://github.com/colin-kiegel/rust-pretty-assertions/commits)

---
updated-dependencies:
- dependency-name: pretty_assertions
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-03-14 10:33:27 +00:00
Carol (Nichols || Goulding) 268138ceef
fix: Make SQL queries more consistent
- Use "SELECT *" when possible
- Left align
- Wrap at 100 chars
- Include semicolon
2022-03-13 20:28:12 -04:00
Carol (Nichols || Goulding) 8888e4c3a2
fix: Remove MAX_COMPACT_SIZE from the compaction queries 2022-03-13 20:09:30 -04:00
Carol (Nichols || Goulding) 1dacf567d9
feat: Add a function to the catalog to fetch level 1 parquet files
Fixes #3946.
2022-03-11 15:40:34 -05:00
Carol (Nichols || Goulding) f184b7023c
feat: Update specified parquet file records to compaction level 1
Fixes #3950.
2022-03-11 15:34:40 -05:00
Carol (Nichols || Goulding) fabd262442
feat: Add a function to the catalog to fetch level 0 parquet files
Connects to #3946.
2022-03-11 15:34:05 -05:00
Carol (Nichols || Goulding) ecd06c6ec3
fix: ParquetFileRepo create should be responsible for setting INITIAL_COMPACTION_LEVEL
When created in the catalog, parquet files should always have compaction
level 0. Updating the compaction level should always happen in the
compactor.

Only the catalog should need to know about the initial compaction level
value.
2022-03-10 13:51:18 -05:00
Carol (Nichols || Goulding) ff31407dce
refactor: Extract a ParquetFileParams type for create
This has the advantages of:

- Not needing to create fake parquet file IDs or fake deleted_at
  values that aren't used by create before insertion
- Not needing too many arguments for create
- Naming the arguments so it's easier to see what value is what
  argument, especially in tests
- Easier to reuse arguments or parts of arguments by using copies of
  params, which makes it easier to see differences, especially in tests
2022-03-10 13:51:18 -05:00
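The two commits above (ecd06c6ec3 and ff31407dce) describe a params struct for `create` and a catalog that alone decides the ID and the initial compaction level. A rough sketch of that shape follows; the field set is a small hypothetical subset, and `MemCatalog` is an illustrative stand-in rather than the repository's in-memory catalog.

```rust
/// Hypothetical subset of what a caller supplies when creating a file.
/// Note the absence of `id` and `compaction_level`.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFileParams {
    pub partition_id: i64,
    pub min_sequence_number: i64,
    pub max_sequence_number: i64,
}

/// The record as stored, including catalog-assigned fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ParquetFile {
    pub id: i64,
    pub partition_id: i64,
    pub min_sequence_number: i64,
    pub max_sequence_number: i64,
    pub compaction_level: i16,
}

/// Only the catalog needs to know this value.
const INITIAL_COMPACTION_LEVEL: i16 = 0;

/// Illustrative in-memory catalog.
#[derive(Debug, Default)]
pub struct MemCatalog {
    files: Vec<ParquetFile>,
}

impl MemCatalog {
    /// Assigns the ID and the initial compaction level itself, so callers
    /// never have to invent placeholder values for them.
    pub fn create(&mut self, params: ParquetFileParams) -> ParquetFile {
        let file = ParquetFile {
            id: self.files.len() as i64 + 1,
            partition_id: params.partition_id,
            min_sequence_number: params.min_sequence_number,
            max_sequence_number: params.max_sequence_number,
            compaction_level: INITIAL_COMPACTION_LEVEL,
        };
        self.files.push(file.clone());
        file
    }
}
```

In tests, struct update syntax such as `ParquetFileParams { max_sequence_number: 30, ..params.clone() }` makes the one differing field stand out, which matches the last advantage listed in the commit message.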
Paul Dix 27999ff72f
feat: add compaction_level and created_at to parquet_file (#3972) 2022-03-10 15:56:57 +00:00
Carol (Nichols || Goulding) 1f474bfbf0
test: Create the test database before running postgres iox_catalog tests 2022-03-09 10:43:30 -05:00
Carol (Nichols || Goulding) 8af2f60b59
fix: Run catalog setup as part of end-to-end test setup 2022-03-09 09:55:43 -05:00
Carol (Nichols || Goulding) 93b0cdbcc4
fix: Create the test database as part of ng server fixture startup 2022-03-09 09:55:43 -05:00
Carol (Nichols || Goulding) 880344494a
fix: Remove reference to AWS from postgres test comment 2022-03-09 09:55:42 -05:00
kodiakhq[bot] caba70f871
Merge branch 'main' into cn/not-database-url 2022-03-09 13:32:02 +00:00
Dom Dwyer d31576b90c perf: get_table_persist_info indexes for joins
Adds indexes to the JOINed fields to reduce execution cost, as the
TableRepo::get_table_persist_info() is currently by far the most
expensive catalog operation.
2022-03-08 12:12:47 +00:00
Marco Neumann db3f1e8db7
feat: wire up tombstones into querier (#3962)
* feat: `TombstoneRepo::list_by_namespace`

* test: model sequencer properly

* feat: wire up tombstones into querier

Closes #3932.

* refactor: `override_delete_predicates` => `set_delete_predicates`
2022-03-08 10:06:22 +00:00
Carol (Nichols || Goulding) 4765e447e3
chore: Wrap markdown at 100 columns 2022-03-07 11:02:58 -05:00
Carol (Nichols || Goulding) 4dacf0d68f
fix: Instead of using DATABASE_URL, use INFLUXDB_IOX_CATALOG_DSN and TEST_INFLUXDB_IOX_CATALOG_DSN 2022-03-07 11:02:58 -05:00