This commit fixes loads of crates (47!) had unused dependencies, or
mis-configured dependencies (test deps as normal deps).
I added the "unused_crate_dependencies" to all crates to help prevent
this mess from growing again!
https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html
This has the minor downside of false-positives when specifying
dev-dependencies for test/bench binaries - these are files in /test or
/benches (not normal tests). This commit includes a workaround,
importing them in lib.rs (gated by a feature flag). I think the
trade-off of better dependency management is worth it!
* refactor: Change catalog configuration so it is entirely dsn based / support end to end testing without postgres
Restores code from https://github.com/influxdata/influxdb_iox/pull/7708
Revert "revert: PR #7708"
This reverts commit c9cfe05f8d.
* fix: merge
* fix: Update new test
This PR increases the batch/page size of list operations in the gc 10x
to 10,000; it introduces a new cli config for the sleep interval between
batches. Previously a single sleep config was used between batches and
between entirely new list operations. This PR also moves to debug some
noisy logging.
* tag to #7689
Still insert them into the database and associate them with namespaces,
but don't ever query them back out.
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
GC object store lister adjustments: the previous take wasn't being
respected because of where it was; a chunked list is now used instead.
The lister specific errors will now display the source error too.
* follow up to #7562
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* chore(garbage collector): backoff first connect to objectstore
This pr replaces the initial sleep added in #7562 with expential backoff
retry of connecting to objectstore. This avoids the issue of waiting the
configured sleep which can be quite long in a world where the service is
getting redeployed often.
* for #7562
* chore: rewrite lister::perform based on pr feedback
This commit redoes perform as a do..while loop, putting the call to list
from the object store at the top of the loop so the infinite backoff
retry and be used for each loop iteration - it might fail on more than
the first time!
There are 3 selects as there are 3 wait stages and each needs to check
for shutdown: os list, processing the list, and sleeping on the poll
interval.
* chore: hoist cancellation check higher; limit listing to 1000 files
Responding to PR feedback.
* chore: add error info message
* chore: make build. :|
* chore: linter
* Don't print an info message for each deleted file. This can be 1000s
at a time and many more in total.
* Even if there are more files to delete, sleep the interval to decrease
catalog load.
* part of influxdata/idpe#17451
* fix: Garbage collector hangs indefinitely on shutdown
* style(garbage_collector): conform to linter and fmt
---------
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
`time` 0.1 suffers from [RUSTSEC-2020-0071] and many upstream crates
have tried to remove it for years. The last dependency is
1. `chrono-english`
2. `chrono` (default features)
3. `chrono` (oldtime)
4. `time` 0.1
`chrono-english` doesn't seem to be super well maintained, but I
couldn't find a nice replacement for it. Luckily the master branch of
`chrono-english` is already fixed, so let's just directly use that.
[RUSTSEC-2020-0071]: https://rustsec.org/advisories/RUSTSEC-2020-0071
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
This PR makes 3 improvements.
* It adds the configured sleep interval at the start of the object store
checker to avoid issues with making a remote list immediately at
startup. We see issues with the s3 api.
* the --dry-run flag was stopping deletes of objects from object store,
but the retention flagger was still making updates to the catalog.
These writes to the catalog are surprising when the --dry-run flag is
provided. Now, with --dry-run the catalog is not updated. The logging
instead says how many records would be updated because of retention.
* It decreases logging in should_delete of the checker as it will be
extremely noisey when reporting files it skips. An internal
environment has 3.8 million parquet files, most of which would be
skipped.
* related to #7363
* fixesinfluxdata/idpe#17451
* feat: introduce a new way of max_sequence_number for ingester, compactor and querier
* chore: cleanup
* feat: new column max_l0_created_at to order files for deduplication
* chore: cleanup
* chore: debug info for chnaging cpu.parquet
* fix: update test parquet file
Co-authored-by: Marco Neumann <marco@crepererum.net>
* feat: create namespace API call in router
Co-authored-by: Nga Tran <nga-tran@live.com>
* chore: treat retention as ns except in CLI
* fix: overflow in nanosecond calc
* fix: retention test after changing it from hours to ns
* chore: comment clarification in cli; better response type for error in ns API
* fix: correct some rebase mistakes
* chore: merge namespace create & create_with_retention; renamed ns create test helper fn & const
* fix: ns autocreation test was wrong after rebase
* fix: mem catalog has default 1hr retention, accidently removed in rebase
* chore: remove mem catalogs default 1hr retention; make it settable in sets & router
Co-authored-by: Luke Bond <luke.n.bond@gmail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
* feat: deletion flagging in GC based on retention policy
* chore: typo in comment
* fix: only soft delete parquet files that aren't yet soft deleted
* fix: guard against flakiness in catalog test
* chore: some better tests for parquet file delete flagging
Co-authored-by: Nga Tran <nga-tran@live.com>