Commit Graph

405 Commits (40a69c6e29cf63f4e558af980b095a0a0c1f1c63)

Author SHA1 Message Date
dependabot[bot] b63f920d4c
chore(deps): Bump parquet from 9.0.2 to 9.1.0 (#3828)
* chore(deps): Bump parquet from 9.0.2 to 9.1.0

Bumps [parquet](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0)

---
updated-dependencies:
- dependency-name: parquet
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* chore: update chunk size test

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Raphael Taylor-Davies <r.taylordavies@googlemail.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-23 11:25:15 +00:00
dependabot[bot] 3b7d31c88a
chore(deps): Bump arrow from 9.0.2 to 9.1.0 (#3826)
Bumps [arrow](https://github.com/apache/arrow-rs) from 9.0.2 to 9.1.0.
- [Release notes](https://github.com/apache/arrow-rs/releases)
- [Changelog](https://github.com/apache/arrow-rs/blob/master/CHANGELOG.md)
- [Commits](https://github.com/apache/arrow-rs/compare/9.0.2...9.1.0)

---
updated-dependencies:
- dependency-name: arrow
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-02-23 09:25:46 +00:00
dependabot[bot] ad3868ed7c
chore(deps): Bump tokio from 1.16.1 to 1.17.0 (#3814)
* chore(deps): Bump tokio from 1.16.1 to 1.17.0

Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.16.1 to 1.17.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.16.1...tokio-1.17.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>

* build: update workspace-hack

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Dom Dwyer <dom@itsallbroken.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-22 16:27:43 +00:00
Andrew Lamb a30803e692
chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 (#3733)
* chore: Update datafusion

* chore: Update arrow

* fix: missing updates

* chore: Update cargo.lock

* fix: update for smaller parquet size

* fix: update test for smaller parquet files

* test: ensure parquet_file tests write multiple row groups

* fix: update callsite

* fix: Update for tests

* fix: harkari

* fix: use IoxObjectStore::existing

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-15 12:10:24 +00:00
Carol (Nichols || Goulding) 73828323ac
feat: Ingester Flight gRPC API (#3623)
* feat: Add a way to run ingester with an in-memory catalog from the CLI

If you set the --catalog-dsn string to "mem", rather than using that as
a Postgres connection URL, create an in-memory catalog.

Planning on using this in tests, so not documenting.

* fix: Set default topic to the same value as SHARED_KAFKA_TOPIC

Namely, both should use an underscore. I don't think there's a way to
directly share these values between a constant and an annotation.

* feat: Add a flight API (handshake only) to ingester

* fix: Create partitions if using file-based write buffer

* fix: Change the server fixture to handle ingester server type

For now, the ingester doesn't implement the deployment API. Not sure if
it should or not.

* feat: Start implementing ingester do_get, namely decoding the query

Skip serialization of the predicate for the moment.

* refactor: Rename ingest protos to ingester to match crate name

* refactor: Rename QueryResults to QueryData

* feat: Move ingester flight client to new querier crate

* fix: Off by one error, different starting indexes in sequencers

* fix: Create new CLI argument to pick the catalog type

* fix: Create a CLI option to set the number of topics to auto-create in the write buffer

* fix: Check the arrow flight service's health to tell that the ingester gRPC is up

* fix: Set postgres as the default catalog type

* fix: Return an error rather than panicking if CLI args aren't right
2022-02-09 19:07:44 +00:00
Carol (Nichols || Goulding) 2e30483f1f
refactor: Remove predicate module from predicate crate (#3648)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:54:07 +00:00
Nga Tran 17fbeaaade
feat: insert the persisted info into the catalog in one transaction (#3636)
* feat: add ProcessedTombstoneRepo

* feat: add function add_parquet_file_with_tombstones

* fix: remove unecessary use

* feat: handling transaction when adding parquet file and its processed tombstones

* feat: tests update catalog for parquet file and processed tombstones

* fix: make add parquet file & its processed tombstones fully transactional

* chore: cleanup

* test: add integration tests for new catalog update functions

* chore: remove catalog_update.rs

* chore: cleanup

* fix: assert the right values

* fix: create unique namespace

* fix: support non transaction create_many

* test: remove tests that do not work in a transaction

* fix: one more case with unique namespace

* chore: more verification around for better understanding why certain tests fail

* fix: compare difference rather than absolute becasue the DB already has data

* fix: fix the argument provided to SQL

* fix: return non-empty processed tombstones

* fix: insert the right parquet file

* chore: remove unsed file

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:44:15 +00:00
Carol (Nichols || Goulding) 62a2ad289b
feat: Implement deserializing IoxMetadata from protobuf (#3589)
Fixes #3587.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 16:05:21 +00:00
Marco Neumann 22778a3a80
chore: upgrade rskafka and parking_lot (#3592) 2022-02-01 11:50:42 +00:00
Carol (Nichols || Goulding) 093d5acfd4
fix: Unify temporary multiple definitions of IoxMetadata 2022-01-31 10:48:29 -05:00
Carol (Nichols || Goulding) 8f81ce5501
refactor: Share parquet_file::storage code between new and old metadata 2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding) bf89162fa5
refactor: Move IoxMetadata to parquet_file 2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding) 0f72a881ef
refactor: Rename Rust struct parquet_file::IoxMetadata to be IoxMetadataOld 2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding) 1b298bb5bd
refactor: Alias the old proto definitions to make clearer the new ones coming in 2022-01-31 10:36:33 -05:00
Dom 32d7c4cbfe
refactor: remove InfluxColumnType::IOx (#3565)
* refactor: remove InfluxColumnType::IOx

Remove unused column variant - see #3554 for context.

* refactor: reserve SEMANTIC_TYPE_IOX name in proto

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 21:15:36 +00:00
Andrew Lamb 5488c257d1
chore: Update datafusion, upgrade to arrow/parqet/arrow-flight 8.0.0 (#3517)
* chore: Update datafusion

* chore: update to arrow 8

* fix: update to use new DataFusion APIs

* fix: update case for sortedness

* fix: cargo hakari
2022-01-27 13:33:27 +00:00
Andrew Lamb dd23056efd
chore: update datafusion, arrow, prost, tonic, pbjson, etc (#3455)
* chore: update datafusion, arrow, prost, tonic, etc

* fix: update pprof as well

* chore: update hakari

* fix: update pbjson

* chore: update heappy

* fix: hakari

* fix: workaround https://github.com/influxdata/influxdb_iox/issues/3458

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-13 17:07:15 +00:00
Andrew Lamb cdf5c21cd4
fix: Fix max timestamp value comparison in chunk metadata (#3453)
* fix: Fix max timestamp value comparison in chunk metadata

* refactor: rename contains to overlaps

Co-authored-by: Edd Robinson <me@edd.io>
2022-01-13 16:58:30 +00:00
Raphael Taylor-Davies c5cf03511c
fix: parquet column count statistics (#2124) (#3444)
* fix: parquet metadata total_count (#2124)

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-11 21:56:24 +00:00
Marco Neumann f3f6f335a9
chore: upgrade to snafu 0.7 (#3440) 2022-01-11 19:22:36 +00:00
Marco Neumann 37bb7f2120 chore: `cargo update`
dependabot currently doesn't work due to
https://github.com/dependabot/dependabot-core/issues/4574

Excluded `quote` due to
https://github.com/dtolnay/quote/issues/204
2022-01-11 14:57:51 +01:00
Nga Tran ec8644a39a refactor: return clearer error message 2021-12-07 12:24:28 -05:00
Nga Tran 561c5ed8e7 refactor: make checking no data happen during reading inout stream 2021-12-07 12:03:41 -05:00
Nga Tran c992c82582 chore: Merge branch 'main' into ntran/compact_os_tests 2021-12-07 11:08:12 -05:00
Raphael Taylor-Davies 5fdaa5b4ab
chore: don't panic with invalid parquet (#3309)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-12-06 21:15:35 +00:00
Carol (Nichols || Goulding) 7499eac067
fix: Disable uuid serde feature; we're not actually serializing any UUIDs
Connects to #3117.
2021-12-06 09:37:31 -05:00
Carol (Nichols || Goulding) 02c297e850
fix: Always specify the parking_lot feature of tokio to get potential perf boost 2021-12-06 09:37:15 -05:00
Carol (Nichols || Goulding) 0b24b3c227
fix: Use a consistent version specifier when depending on the futures crate 2021-12-06 09:37:12 -05:00
Raphael Taylor-Davies bca561366b
feat: don't copy parquet files out of disk object store (#3282) (#3293)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-12-05 16:31:40 +00:00
Raphael Taylor-Davies 11067bfe3f
feat: simplify parquet reader (#3282) (#3291)
* feat: simplify parquet reader (#3282)

* chore: add back log line
2021-12-03 23:21:58 +00:00
Nga Tran 86f9fe0bcb refactor: no longer need to create and test no-row-groups parquet files 2021-12-03 15:14:04 -05:00
Nga Tran 152281e428 fix: Capture the right 'no data' while parquet has no data 2021-12-03 12:19:48 -05:00
kodiakhq[bot] 2857b6a990
Merge branch 'main' into er/feat/load_chunk_cli 2021-12-02 20:20:56 +00:00
Edd Robinson b4ea9887ba refactor: error name 2021-12-02 20:14:02 +00:00
Carol (Nichols || Goulding) 5d0fd1c603
fix: Allow dead code on fields that are now detected as never read 2021-12-02 11:52:01 -05:00
Edd Robinson 88aedc556e feat: add FromStr implementation 2021-12-02 12:59:52 +00:00
Nga Tran bf74608dc8 docs: not persist of the input stream is empty 2021-12-01 17:53:19 -05:00
Nga Tran f085af034e
refactor: not persist empty chunk resulting from deleting & deduplicating (#3274)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-12-01 20:57:30 +00:00
Nga Tran f53cdca010 feat: handling empty compacted stream 2021-11-30 18:13:36 -05:00
Raphael Taylor-Davies 197634ed50
feat: reload chunk back into read buffer (#3209) (#3216)
* feat: reload chunk back into read buffer (#3209)

* chore: fix logical conflict

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-29 11:34:55 +00:00
kodiakhq[bot] d16a7759ca
Merge branch 'main' into cn/workspace-hack 2021-11-22 17:05:31 +00:00
Raphael Taylor-Davies 73d60539ad
refactor: use ChunkGenerator in parquet_catalog (#2209) (#3167)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-22 10:29:33 +00:00
Carol (Nichols || Goulding) 9fd4a560f5
feat: Results of running cargo hakari manage-deps 2021-11-19 09:21:57 -05:00
Raphael Taylor-Davies ca4e0ad13b
refactor: add parquet chunk generator (#2209) (#3163)
* refactor: add parquet chunk generator (#2209)

* fix: tests

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-19 12:35:18 +00:00
Carol (Nichols || Goulding) c8d80e5c28
fix: Change database paths to be under /dbs/ instead of under /[server id]/ 2021-11-05 10:14:06 -04:00
Andrew Lamb 1902c4f8a9
chore: Update DataFusion (#3012)
* chore: Update DataFusion

* fix: restore Cargo.log

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-11-02 18:06:21 +00:00
Marco Neumann 4c9570b519 refactor: move `catalog` protobuf to `preserved_catalog`
This makes it clearer what's going since the contained messages are
only for the preserved part, not the in-mem catalog and its management.
2021-11-01 18:07:25 +01:00
dependabot[bot] c540b40f05
chore(deps): bump tokio from 1.12.0 to 1.13.0
Bumps [tokio](https://github.com/tokio-rs/tokio) from 1.12.0 to 1.13.0.
- [Release notes](https://github.com/tokio-rs/tokio/releases)
- [Commits](https://github.com/tokio-rs/tokio/compare/tokio-1.12.0...tokio-1.13.0)

---
updated-dependencies:
- dependency-name: tokio
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2021-11-01 11:21:59 +00:00
Carol (Nichols || Goulding) 990f768cda
fix: Assign a UUID when creating a database 2021-10-28 13:20:28 -04:00
Carol (Nichols || Goulding) 8198c1ff2a
refactor: Rename IoxObjectStore constructors to better match what server does with Databases 2021-10-28 13:20:27 -04:00
Marco Neumann bc7244c48e chore: use Rust edition 2021 2021-10-25 10:58:20 +02:00
Andrew Lamb a82dc6f5f0
chore: Update datafusion + arrow (#2903)
* chore: Update datafusion to latest, arrow to 6.0.0

* fix: Update tests

* fix: bubble internal error

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-19 17:14:08 +00:00
Marco Neumann d8f35d8ee9 chore: remove unused `parquet_file` => `chrono` dep 2021-10-19 14:45:56 +02:00
Marco Neumann 28195b9c0c chore: new `parquet_catalog` crate 2021-10-14 14:34:59 +02:00
Andrew Lamb 0568452a0c
chore: Update datafusion (#2838)
* chore: update datafusion version

* refactor: Update to use new datafusion apis

* fix: do not upgrade other packages
2021-10-13 20:51:19 +00:00
Marco Neumann 1523e0edcd refactor: clean up preserved catalog interface
1. Remove `new_empty` logic. It's a leftover from the time when the
   `PreservedCatalog` owned the in-memory catalog.
2. Make `db_name` a part of the `PreservedCatalogConfig`.
2021-10-13 13:58:11 +02:00
Raphael Taylor-Davies 8414e6edbb
feat: migrate preserved catalog to TimeProvider (#2722) (#2808)
* feat: migrate preserved catalog to TimeProvider (#2722)

* fix: deterministic catalog prune tests

* fix: failing test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-12 14:43:05 +00:00
Raphael Taylor-Davies 3dfe400e6b
feat: migrate write path to TimeProvider (#2722) (#2807) 2021-10-12 12:09:08 +00:00
Raphael Taylor-Davies b39e01f7ba
feat: migrate PersistenceWindows to TimeProvider (#2722) (#2798) 2021-10-11 20:40:00 +00:00
Raphael Taylor-Davies 06c2c23322
refactor: create PreservedCatalogConfig struct (#2793)
* refactor: create PreservedCatalogConfig struct

* chore: fmt

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-10-11 15:43:05 +00:00
Carol (Nichols || Goulding) 5da2f7b1b0
Merge branch 'main' into cn/less-database-name 2021-10-11 10:35:42 -04:00
Raphael Taylor-Davies afe34751e7
refactor: split out schema crate (#2781)
* refactor: split out schema crate

* chore: fix doc
2021-10-11 09:45:08 +00:00
Carol (Nichols || Goulding) 8407735e00 fix: Pass the database name into PreservedCatalog 2021-10-08 15:25:10 -04:00
Carol (Nichols || Goulding) 276aef69c9 refactor: Move PreservedCatalog test helper functions to test helpers and use them more 2021-10-08 15:25:10 -04:00
Carol (Nichols || Goulding) 3aff4fcb07 refactor: Extract test helper functions for common catalog operations
This will make the next change easier, and I think it makes the tests
easier to read.
2021-10-08 15:25:10 -04:00
kodiakhq[bot] 559a7e0221
Merge branch 'main' into cn/chunk-addr-smaller 2021-10-08 17:26:20 +00:00
Carol (Nichols || Goulding) fbe76935f4 fix: Remove some calls to iox_object_store.database_name 2021-10-08 09:50:14 -04:00
Marco Neumann 64bda1fc08 feat: improve `Debug`/`Display` for test `ChunkId`s 2021-10-08 13:55:56 +02:00
Marco Neumann d3de6bb6e4 refactor: `max_persisted_timestamp` => `flush_timestamp`
There might be data left before this timestamp that wasn't persisted
(e.g. incoming data while the persistence was running).
2021-10-08 12:36:23 +02:00
Marco Neumann 63a932fa37 refactor: "min unpersisted ts" => "max persisted ts"
Store the "maximum persisted timestamp" instead of the "minimum
unpersisted timestamp". This avoids the need to calculate the next
timestamp from the current one (which was done via "max TS + 1ns").

The old calculation was prone to overflow panics. Since the
timestamps in this calculation originate from user-provided data (and
not the wall clock), this was an easy DoS vector that could be triggered
via the following line protocol:

```text
table_1 foo=1 <i64::MAX>
```

which is

```text
table_1 foo=1 9223372036854775807
```

Bonus points: the timestamp persisted in the partition
checkpoints is now the very same that was used by the split query during
persistence. Consistence FTW!

Fixes #2225.
2021-10-08 11:52:49 +02:00
kodiakhq[bot] 7d6be3f500
Merge branch 'main' into crepererum/issue2748 2021-10-07 09:04:18 +00:00
Marco Neumann 63d74be490 refactor: make `ChunkId` a UUID 2021-10-07 10:23:27 +02:00
Marco Neumann 2a52fd90d9 fix: transaction pruning logic for "nothing to do" 2021-10-07 10:14:42 +02:00
kodiakhq[bot] d72a494198
Merge branch 'main' into crepererum/in_mem_expr_part5 2021-10-05 16:20:24 +00:00
Marco Neumann b8aa4c33ce refactor: use protobuf bytes for transaction UUIDs 2021-10-05 12:27:48 +02:00
Marco Neumann bb7a27e5ed refactor: use proper sets during delete predicate collection
We no longer need hacky pointer tricks to de-duplicate delete predicates
when collecting them for catalog checkpoints. This was once required
when the delete predicates didn't implement `Eq` and `Hash` but now it's
all way easier.
2021-10-05 10:37:34 +02:00
Marco Neumann 28ccf2a8c3 refactor: `TransactionHandle::delete_predicate` cannot fail 2021-10-05 09:41:46 +02:00
Marco Neumann 10c1a72402 refactor: remove unused fields from `DeletePredicate` 2021-10-05 09:29:24 +02:00
Marco Neumann 97881079e8 refactor: make `ChunkOrder` non-zero
This will make it easier to handle missing values.

Helps with #2633.
2021-10-04 17:49:12 +02:00
Marco Neumann 75ac6e8646 refactor: make `DeletePredicate::range` non-optional 2021-10-04 16:36:20 +02:00
Marco Neumann d1835a3eee fix: doc links 2021-10-04 16:36:20 +02:00
Marco Neumann 5a5a929b9e refactor: introduce `DeletePredicate`
`DeletePredicate` is a simpler version of `Predicate` that is based on
IOx `DeleteExpr` instead of the full-blown DataFusion `Expr`. This will
allow us to do a couple of things (in follow-up changes):

- Order and de-duplicate delete predicates
- Normalize predicates
- Infallible serialization
- Smaller memory footprint

Note that this change only affects delete expressions. Query expressions
that are supported via the API are not changed. The query subsystem also
still uses the full-featured expressions/predicates (delete
expressions/predicates are converted to the more powerful DataFusion
version on-the-fly).
2021-10-04 16:36:20 +02:00
Edd Robinson e72f7e958c test: update expected results 2021-10-04 12:20:21 +01:00
Andrew Lamb 7316f3407a
fix: Reduce log noise when no files are deleted (#2671)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-09-30 08:55:30 +00:00
Carol (Nichols || Goulding) 92583aee82 fix: Remove streaming API since we're not streaming anyway 2021-09-29 08:19:32 -04:00
Carol (Nichols || Goulding) d05528bcfd refactor: Use s3_request for put requests
Which meant we also needed to change the byte stream to be a closure
that can generate a byte stream
2021-09-29 08:19:32 -04:00
Raphael Taylor-Davies 86cee568d5
feat: use upstream pbjson (#2650)
* feat: use upstream pbjson

* chore: fmt
2021-09-28 16:29:26 +00:00
kodiakhq[bot] b16e7ea91a
Merge branch 'main' into crepererum/issue2518c 2021-09-22 16:09:04 +00:00
Marco Neumann d7b697dfe9 chore: remove unused `object_store` => `tracker` dep 2021-09-22 11:13:40 +02:00
Marco Neumann 981ee0c6df refactor: accept unknown chunks in persisted delete predicates
Due to the timing of the "persist" lifecycle action and that delete
predicates might arrive at any time + the fact that we don't wanna hold
transaction locks for too long, we should accept delete predicates for
chunks that are currently "persisting" even though that lifecycle action
might fail.
2021-09-22 09:29:50 +02:00
Marco Neumann 6682178d6f feat: teach preserved catalog to handle delete predicates 2021-09-20 15:51:14 +02:00
Marco Neumann cef5aeee52 refactor: introduce `ChunkId` type 2021-09-20 13:10:41 +02:00
Marco Neumann acf698c366 fix: delete predicate sorting 2021-09-20 10:48:32 +02:00
Marco Neumann 0c5ba3786b refactor: rename closure to make syntax a bit clearer 2021-09-20 10:48:32 +02:00
Marco Neumann 4c4fd59724 docs: extend comment about (not) cleanup up delete predicates 2021-09-20 10:48:32 +02:00
Marco Neumann 492d991f49 feat: delete catalog pres. catalog <=> in-mem catalog API
First step towards #2518. Creates the Rust API to communicate delete
predicates between the preserved catalog and the in-memory catalog and
adds tests ensuring that the in-mem catalog produces the wanted errors
as well as correct checkpoints (similar to how this is done for the
parquet file tracking already).

**This does NOT contain the actual preservation!**
2021-09-20 10:48:32 +02:00
Marco Neumann 831e55d79e refactor: make error messages more precise 2021-09-20 09:42:55 +02:00
Marco Neumann 9c80d32af5 refactor: use normal google timestamps in parquet metadata again
We changed from Google timestamp (which use variable-sized integers) to
our own fixed-sized integer timestamps so that the size of the parquet
metadata does not depend on the timestamp. However with the introduction
of compression this is the case anyways (since slightly different
timestamps lead to different compression results) and we need now
derministic timestamps for tests. So there is now point in using our own
timestamp type. Switching back to the variable-sized type also shrinks
the post-compression results a bit.
2021-09-20 09:34:03 +02:00
Marco Neumann afc507ae14 feat: compress encoded parquet metadata
Depending on the number of columns, this should safe between 60% and
75%.
2021-09-20 09:33:18 +02:00
Marco Neumann 2820db5583 refactor: split preserved catalog `api` into `core` and `interface`
This makes it clearer which traits and functions users of the preserved
catalog must implement. This also splits the error types into smaller
enums that are easier to understand.

This change should make it easier to implement new functionality (like
capturing delete predicates).
2021-09-16 10:30:11 +02:00