Commit Graph

500 Commits (7202dddab6d9ede46c74664c0675fe349da2fd13)

Author SHA1 Message Date
dependabot[bot] 933493fab3
chore(deps): Bump object_store from 0.5.0 to 0.5.1
Bumps [object_store](https://github.com/apache/arrow-rs) from 0.5.0 to 0.5.1.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG-old.md)
- [Commits](https://github.com/apache/arrow-rs/compare/object_store_0.5.0...object_store_0.5.1)

---
updated-dependencies:
- dependency-name: object_store
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-10-11 01:19:10 +00:00
Jake Goulding 7389fbe528
feat: Write data to Parquet files from the data generator 2022-09-29 16:03:33 -04:00
dependabot[bot] 227dde1dfc
chore(deps): Bump thiserror from 1.0.36 to 1.0.37 (#5753)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.36 to 1.0.37.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.36...1.0.37)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-29 10:37:14 +00:00
Andrew Lamb 66dbb9541f
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to 23.0.0, `thrift` to 0.16.0 (#5694)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight`  to 23.0.0

* chore: Update thrift / remove parquet_format

* fix: Update APIs

* chore: Update lock + Run cargo hakari tasks

* fix: use patched version of arrow-rs to work around https://github.com/apache/arrow-rs/issues/2779

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-27 12:50:54 +00:00
dependabot[bot] b1740f45d6
chore(deps): Bump thiserror from 1.0.35 to 1.0.36 (#5737)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.35 to 1.0.36.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.35...1.0.36)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-26 14:44:36 +00:00
Andrew Lamb 65f1550126
feat: Implement `debug parquet_to_lp` command to convert parquet to line protocol (#5734)
* feat: add `influxdb_iox debug parquet_to_lp` command

* chore: Run cargo hakari tasks

* fix: update command description

* fix: remove unecessary Result import

* fix: Apply suggestions from code review

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
2022-09-26 14:17:27 +00:00
dependabot[bot] 0cc29300ce
chore(deps): Bump pbjson-types from 0.4.0 to 0.5.0 (#5703)
Bumps [pbjson-types](https://github.com/influxdata/pbjson) from 0.4.0 to 0.5.0.
- [Release notes](https://github.com/influxdata/pbjson/releases)
- [Commits](https://github.com/influxdata/pbjson/commits)

---
updated-dependencies:
- dependency-name: pbjson-types
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-21 09:44:26 +00:00
Andrew Lamb ea51feadf4
chore: Improve debug logging when parquet files are created (#5699)
* chore: Improve debug logging when parquet files are created

* fix: add duration for encoding parqut

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-20 20:43:34 +00:00
Carol (Nichols || Goulding) 414b0f02ca
fix: Use time helper methods in more places 2022-09-19 13:24:08 -04:00
dependabot[bot] b4a25fdb0e
chore(deps): Bump thiserror from 1.0.34 to 1.0.35 (#5629)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.34 to 1.0.35.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.34...1.0.35)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-14 12:54:12 +00:00
Andrew Lamb 9d63edb299
chore: remove unecessary batch check (#5620)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-13 12:41:59 +00:00
Andrew Lamb f86d3e31da
chore: Update datafusion + object_store (#5619)
* chore: Update datafusion pin

* chore: update object_store to 0.5.0

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-13 12:34:54 +00:00
Andrew Lamb 1fd31ee3bf
chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0 (#5591)
* chore: Update datafusion / `arrow` / `arrow-flight` / `parquet` to version 22.0.0

* fix: enable dynamic comparison flag

* chore: derive Eq for clippy

* chore: update explain plans

* chore: Update sizes for ReadBuffer encoding

* chore: update more tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-12 17:45:03 +00:00
Nga Tran e56ebb5e31
fix: remove unnecessary warning (#5594)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-09 14:15:54 +00:00
Marco Neumann adeacf416c
ci: fix (#5569)
* ci: use same feature set in `build_dev` and `build_release`

* ci: also enable unstable tokio for `build_dev`

* chore: update tokio to 1.21 (to fix console-subscriber 0.1.8

* fix: "must use"
2022-09-06 14:13:28 +00:00
dependabot[bot] 9f0b0328f7
chore(deps): Bump thiserror from 1.0.33 to 1.0.34 (#5556)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.33 to 1.0.34.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.33...1.0.34)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-09-06 09:18:41 +00:00
Marco Neumann 064f0e9b29
refactor: use DataFusion to read parquet files (#5531)
Remove our own hand-rolled logic and let DataFusion read the parquet
files.

As a bonus, this now supports predicate pushdown to the deserialization
step, so we can use parquets as in in-mem buffer.

Note that this currently uses some "nested" DataFusion hack due to the
way the `QueryChunk` interface works. Midterm I'll change the interface
so that the `ParquetExec` nodes are directly visible to DataFusion
instead of some opaque `SendableRecordBatchStream`.
2022-09-05 09:25:04 +00:00
dependabot[bot] 00ed79ff1b
chore(deps): Bump thiserror from 1.0.32 to 1.0.33 (#5524)
Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.32 to 1.0.33.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.32...1.0.33)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-09-01 09:11:31 +00:00
Andrew Lamb 6669d85fb4
chore: Update datafusion + arrow/parquet to `21.0.0` (#5519)
* chore: Update arrow/arrow-flight/parquet to 21.0.0

* chore: Update datafusion pin

* chore: Fix arrow update script

* chore: Update Cargo.lock

* chore: Update for new API
2022-08-31 13:30:47 +00:00
Dom Dwyer 7698264768 refactor: raise error for no rows in parquet file
Previously when attempting to serialise a stream of one or more
RecordBatch containing no rows (resulting in an empty file), the parquet
serialisation code would panic.

This changes the code path to raise an error instead, to support the
compactor making multiple splits at once, which may overlap a single
chunk:

                  ────────────── Time ────────────▶

                          │                │
                  ┌█████──────────────────────█████┐
                  │█████  │    Chunk 1     │  █████│
                  └█████──────────────────────█████┘
                          │                │

                          │                │

                      Split T1         Split T2

In the example above, the chunk has an unusual distribution of write
timestamps over the time range it covers, with all data having a
timestamp before T1, or after T2. When a running a SplitExec to slice
this chunk at T1 and T2, the middle of the resulting 3 subsets will
contain no rows. Because we store only the min/max timestamps in the
chunk statistics, it is unfortunately impossible to prune one of these
split points from the plan ahead of time.
2022-08-30 14:52:31 +02:00
Carol (Nichols || Goulding) fe9c474620
fix: rustfmt 2022-08-29 14:06:45 -04:00
Carol (Nichols || Goulding) 240946d8f5
fix: Deprecate proto sequencer_id fields; add shard_id fields 2022-08-29 14:06:44 -04:00
Jake Goulding 4abf21c724
refactor: Rename Sequencer (and its entourage) to Shard 2022-08-29 14:06:43 -04:00
Andrew Lamb 7f0ae53d6f
chore: Update to (almost) released object_store 0.4.0 (#5419)
* chore: update object_store

* chore: update hakari config

* chore: Run cargo hakari tasks

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-17 13:44:48 +00:00
Carol (Nichols || Goulding) b982bdaf2f
fix: Derive Eq when we derive PartialEq and members can derive Eq
Allow this in generated code that we don't control, though.

Recommended by clippy now. https://rust-lang.github.io/rust-clippy/master/index.html#derive_partial_eq_without_eq
2022-08-11 15:04:06 -04:00
Andrew Lamb 16ddc5efc6
chore: Update datafusion / arrow/parquet/arrow-flight and prost/tonic ecosystem (#5360)
* chore: Update datafusion and arrow

* chore: Update Cargo.lock

* chore: update to Decimal128

* chore: Update tonic/prost/pbjson/etc

* chore: Run cargo hakari tasks

* fix: doctest in generated types

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
2022-08-09 17:30:44 +00:00
Marco Neumann e24cecd926
fix: buffer allocation while reading parquet files (#5312)
Work around https://github.com/apache/arrow-rs/issues/2321 by limiting
reader batch size to number of rows (based on file-level metadata).

Fixes https://github.com/influxdata/conductor/issues/1103 .
2022-08-04 14:21:05 +00:00
dependabot[bot] 55e1e2ec2b
chore(deps): Bump thiserror from 1.0.31 to 1.0.32 (#5294)
* chore(deps): Bump thiserror from 1.0.31 to 1.0.32

Bumps [thiserror](https://github.com/dtolnay/thiserror) from 1.0.31 to 1.0.32.
- [Release notes](https://github.com/dtolnay/thiserror/releases)
- [Commits](https://github.com/dtolnay/thiserror/compare/1.0.31...1.0.32)

---
updated-dependencies:
- dependency-name: thiserror
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: Run cargo hakari tasks

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 16:20:36 +00:00
Nga Tran ee151c8b41
fix: make batch size back to 1024 to see if the OOM in the compactor go away (#5289)
* fix: make batch size back to 1024 to see if the OOM in the compactor go away

* fix: address review comments

* chore: Apply suggestions from code review

Co-authored-by: Marco Neumann <marco@crepererum.net>

* fix: import needed constant

Co-authored-by: Marco Neumann <marco@crepererum.net>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-03 15:13:04 +00:00
Nga Tran 8f1b6f2465
chore: reduce log info (#5254)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 16:00:34 +00:00
Marco Neumann 3ae89de324
refactor: increase batch size when reading parquet (#5250)
* refactor: increase batch size when reading parquet

This reduced our overhead when reading parquet files quite a lot.

In some internal benchmark, this reduces the size to perform a single
series aggregation of a rather large series with cold caches from 58s to
48s for cold caches. No real difference could be measured for warm
caches (~21ms for both).

This should also help the compactor since the record batches should be
larger.

* refactor: ensure that parquet row group size is in-sync

Ensure that we use the same row group size for reading and writing
parquet files. This is the same value as upstream currently uses as a
default, but let's make sure we don't diverge from that:

3032a521c9/parquet/src/file/properties.rs (L65)

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-08-01 10:31:26 +00:00
Andrew Lamb 9215a534d0
chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0` (#5229)
* chore: Update datafusion and `arrow`/`parquet`/`arrow-flight` to `19.0.0`

* chore: Run cargo hakari tasks

* fix: Update for API changes

* fix: clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-28 08:10:47 +00:00
Nga Tran d05f383a98
refactor: reduce compacting size and compacted file size to prevent compactor from waiting for reading a large file forever (#5206)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-25 20:08:11 +00:00
Nga Tran 69cb3f2b19
refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151)
* refactor: remove min_sequnce_number

* fix: typos

* fix: remove min_sequencer_number from new files from merging main

* fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier

* test: add test_compactor_collision back but modify the input to make it work woth new changes

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-21 13:51:54 +00:00
dependabot[bot] 278a7f91af
chore(deps): Bump bytes from 1.1.0 to 1.2.0 (#5156)
Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.1.0 to 1.2.0.
- [Release notes](https://github.com/tokio-rs/bytes/releases)
- [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md)
- [Commits](https://github.com/tokio-rs/bytes/compare/v1.1.0...v1.2.0)

---
updated-dependencies:
- dependency-name: bytes
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-20 10:00:08 +00:00
Andrew Lamb e2d871b00b
chore: Update datafusion and arrow/parquet/arrow-flight to `18.0.0` (#5079)
* chore: Update datafusion to 10.0.0, arrow/parquet/arrow-flight to 18

* chore: Run cargo hakari tasks

* fix: update cargo pin

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-18 15:01:03 +00:00
Jake Goulding 635f535e0e refactor: replace level_2 with level_1 2022-07-16 21:49:45 -04:00
dependabot[bot] 9b67de2f43
chore(deps): Bump tokio from 1.19.2 to 1.20.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.19.2 to 1.20.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.19.2...tokio-1.20.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-07-14 01:21:43 +00:00
Carol (Nichols || Goulding) 61c023139b
refactor: Switch compaction levels to an enum with values rather than separate consts
Bonuses:

- Type checking
- Validation
- Less casting
- Exhaustiveness checking
- Less use of the numerical value
2022-07-13 11:30:36 -04:00
Marco Neumann b1b2cb5d4a
feat: load read buffer on demand (#5091)
* refactor: extract `select_schema`

* refactor: improve `InternalLostInputField` error message

* test: improve SQL runner output

* feat: load read buffer on demand

Closes #5032.

* refactor: move `[Half]OwnedSelection` to `schema` crate`
2022-07-13 08:51:40 +00:00
Carol (Nichols || Goulding) 80b6c5c82f
fix: Correct typo in constant name so searching for COMPACTION_LEVEL returns all (#5077) 2022-07-08 16:31:52 +00:00
Andrew Lamb c46e1c6347
chore: Update datafusion + arrow/parquet/arrow-flight to `17.0.0` (#5021)
* fix: correct nullability declaration of system tables

* chore: Update datafusion and arrow/parquet/arrow-flight

* chore: Run cargo hakari tasks

* fix: Update tests

* fix: Update tests

* fix: predicate pruning

* fix: add some tests

* fix: query_functions

* fix: fix read_buffer test

* fix: fix clippy

Co-authored-by: CircleCI[bot] <circleci@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-07 19:22:15 +00:00
Marco Neumann 16bd3e67c0
refactor: unify `apply_predicate_to_metadata` (#5030)
Instead of using some hand-rolled timestamp-based logic (or just
"unknown") all over the place, just use logic introduced in #5017.

This requires slightly improved table summaries within the querier that
at least has min/max for the timestamp column. For that, the former
`IngesterChunk`-specific `calculate_summary` method was extended to
`create_basic_summary` to include that data and is now also used by
`QuerierParquetChunk`.

Note: `QuerierRBChunk` already has detailled metrics that are provided
by the read buffer implementation.

Should we ever need even better pruning for `QuerierParquetChunk` (or
`IngesterChunk`) then we _only_ need add extra data to the table
summaries.

Closes #4976.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-07-05 12:51:59 +00:00
Marco Neumann 016dd93d9c
feat: filter chunks before requesting read buffers (#4996)
Fixes #4976.
2022-07-01 08:59:07 +00:00
Marco Neumann be53716e4d
refactor: use IDs for `parquet_file.column_set` (#4965)
* feat: `ColumnRepo::list_by_table_id`

* refactor: use IDs for `parquet_file.column_set`

Closes #4959.

* refactor: introduce `TableSchema::column_id_map`
2022-06-30 15:08:41 +00:00
Marco Neumann 05189fdb00
fix: ensure panic is propagated throw `AdapterStream` (#4980)
Prior to this change background tasks that we feed into `AdapterStream`
can panic but that would just end the stream without any user-visible
error (except for the panic message on stdout/stderr).

This was found while developing #4964. I have proposed another fix in #4966
but found that I actually developed an existing solution a 2nd time:
`watch_task`. But I also see a major issue with the existing API: one
can create `AdapterStream` with ordinary tokio tasks that are not
watched at all, leaving the burden to the implementor to check for that
(and actually we forgot that in `parquet_file`).

So this change takes a slightly different approach:
The `AdapterStream` does NOT accept ordinary join handles any longer but
requires that you pass a "watched task". The newly introduced
`WatchedTask` does the same as we did manually before: wrapping a future
into a tokio task, watch it and wrap the watcher into a task.

It is now way more difficult to do anything stupid (sure you can still
mix up the tasks and the channels, but we need at least some flexibility
here to allow for "split" and potential future fan-in/out constructs).

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-30 07:57:58 +00:00
Raphael Taylor-Davies 835e1c91c7
chore: update object_store to 0.3.0 (#4707)
* chore: update object_store to 0.3.0

* chore: review feedback

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-06-29 21:44:03 +00:00
Nga Tran cfcc4b8426
refactor: change level 1 to level 2 preparing for next design changes (#4954)
* refactor: change level 1 to level 2 preparing for next design changes

* fix: make level-2 consistent everywhere

* chore: remove unused comments

* refactor: change all the name level_1 to level_2 to completely replace 1 with 2 to amke everything consistent

* chore: add correspinding constants for the comapction levels in the comments

Co-authored-by: Dom <dom@itsallbroken.com>
2022-06-29 14:08:58 +00:00
Marco Neumann 9b66d02229
fix: parquet reader sort order (#4964) 2022-06-28 15:28:38 +00:00
Marco Neumann 215f297162
refactor: parquet file metadata from catalog (#4949)
* refactor: remove `ParquetFileWithMetadata`

* refactor: remove `ParquetFileRepo::parquet_metadata`

* refactor: parquet file metadata from catalog

Closes #4124.
2022-06-27 15:38:39 +00:00