Commit Graph

193 Commits (a4704dd165a816aa5beb9c78a1cb5ea6a3684981)

Author SHA1 Message Date
Andrew Lamb 4da8a16c18
chore: update to arrow 5.0 and master datafusion (#2049)
* chore: update to arrow 5.0 and master datafusion

* fix: Update test for change in object size
2021-07-19 12:49:51 +00:00
Jake Goulding 42b56ad657 refactor: Use SNAFU's context instead of `ok_or_else` 2021-07-16 09:59:54 -04:00
Jake Goulding 939d15a21f perf: Avoid clone when an error doesn't occur 2021-07-16 09:59:54 -04:00
Marco Neumann f57ba6afdb
fix: use fixed-size timestamps for parquet metadata (#2032)
This fixes flaky tests that rely on predictable files sizes.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-16 13:14:02 +00:00
Andrew Lamb 0c86d1dccf
feat: Record parquet bytes size in catalog / parquet_file (#2006)
* feat: Store object store size in parquet_file

* fix: update TRANSACTION_VERSION to 8

* refactor: rename os_bytes --> file_size_bytes
2021-07-15 12:07:11 +00:00
Marco Neumann 40047a76bc refactor: `remove_parquet` cannot fail 2021-07-15 12:07:56 +02:00
Raphael Taylor-Davies 1d00fa2fd8
refactor: track memory metrics in catalog (#1995)
* refactor: track memory metrics in catalog

* chore: update comment
2021-07-14 16:23:00 +00:00
Andrew Lamb d35b74c226
fix: Fix doc build warnings (#1945)
* fix: Fix doc build warnings

* refactor: add deny bare_urls to crates

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-13 08:03:42 +00:00
Andrew Lamb 670826daf9
refactor: make object_store construction interface consistent (#1944)
* refactor: make object_store construction interface consistent

* fix: benchmarks

* fix: doc build

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-12 12:56:36 +00:00
Marco Neumann 18893e76e0 refactor: convert some table name and part. key String to Arcs
This has the (somewhat nice) side effect that it shrinks the in-mem
catalog a bit as well because nw `ParquetChunk` is a bit smaller making
the chunk stage enum smaller as well.
2021-07-08 14:34:28 +02:00
Marco Neumann b528ac2b55 feat: store schemas per table
This way we can:

- check for schema matches even for writes going into different
  partitions
- solve #1768 and #1884 in some future PR

Closes #1897.
2021-07-08 09:18:09 +02:00
Andrew Lamb e6d995cbd8
chore: Update to Rust 1.53.0 (#1922)
* chore: Update to Rust 1.53.0

* fix: Update to latest clippy standards

* fix: bad refactor

* fix: Update escaping

* test: update test output

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-07 18:02:03 +00:00
Marco Neumann 4ca2d3e148 chore: move persistence windows related code into own crate
The entire persistence windows data structures (including the
checkpoints) have nothing to do with the mutable buffer per se. So lets
move them into their own crate. This also makes `parquet_file` not
longer depend on `mutable_buffer`.
2021-07-05 10:23:58 +02:00
Marco Neumann d96e15c3f7 docs: explain why we store checkpoints in parquet files 2021-07-05 09:42:46 +02:00
Marco Neumann cdab1bed05 feat: persist part+db checkpoint in parquets and catalog
This will be required for replay on server startup.
2021-07-05 09:42:46 +02:00
Jacob Marble 0779b0d9bd
feat: add gRPC listener for new write protocol (#1842)
* feat: add gRPC listener for new write protocol

* chore: clippy happy

* chore: lint

* chore: cargo fmt --all

* chore: cargo clippy

* chore: protobuf-lint

* chore: more formatting

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-07-01 16:15:12 +00:00
Marco Neumann 4204127b05 refactor: use protobuf for in-parquet metadata 2021-06-30 16:51:37 +02:00
Marco Neumann ddc9cd49ca chore: bump preserved catalog version 2021-06-29 14:23:06 +02:00
Marco Neumann 3ebb6a3037 refactor: do not capture txn-specific information in parquet files
This helps with #1821.
2021-06-29 14:22:36 +02:00
kodiakhq[bot] eda9532eb2
Merge branch 'main' into crepererum/issue1821-cleanup-lock 2021-06-29 10:48:43 +00:00
Marco Neumann 48df13de05 refactor: use parking lot for catalog cleanup 2021-06-29 12:47:29 +02:00
Marco Neumann f824f235b4
fix: fix info log message
Co-authored-by: Andrew Lamb <alamb@influxdata.com>
2021-06-29 12:35:05 +02:00
Marco Neumann 778a611fb8 docs: add clarifying comment for rebuild test 2021-06-29 11:58:19 +02:00
Marco Neumann 17f89ea8d0 docs: fix comment about lock downgrade 2021-06-29 11:53:55 +02:00
Marco Neumann 2cd5ce98be refactor: do not pass locks around for catalog cleanup 2021-06-29 10:21:41 +02:00
Marco Neumann 730a23faa3 refactor: improve locking around the parquet file cleanup
Instead of (ab)using the transaction lock to prevent the cleanup job
from removing just-written parquet files, use a dedicated lock. This
will later allow us to write parquet files before starting a transaction
(i.e. w/o holding the transaction lock).

This will help with #1821.
2021-06-29 10:20:03 +02:00
Marco Neumann 6ec24353bf refactor: only rebuild a single txn for pres. catalogs
Stop relying on in-parquet transaction information during catalog
rebuilds. This has some downsides (no fork detection, only a single
transaction hence no time travel) but will allow that we remove
transaction information from parquet files, so that we can finally move
the actual parquet file storage out of the transaction lock.

This will help with #1821.
2021-06-28 15:10:44 +02:00
Andrew Lamb 0a03605bbc
refactor: pull Channel --> Stream adapater into its own module (#1793)
* refactor: pull Channel --> Stream adapater into its own module

* docs: Update query/src/exec/stream.rs

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>

Co-authored-by: Marko Mikulicic <mkm@influxdata.com>
2021-06-24 10:35:45 +00:00
kodiakhq[bot] 59993e8b8f
Merge branch 'main' into crepererum/issue1623 2021-06-23 12:40:05 +00:00
Marco Neumann c395409b51 feat: include UUIDv4 into parquet file names
Change schema from

```text
<server_id>/<db_name>/data/<part_key>/<chunk_id>/<table_name>.parquet
```

to

```text
<server_id>/<db_name>/data/<table_name>/<part_key>/<chunk_id>.<uuid>.parquet
```

So parquet files will NEVER be overwritten. This is especially helpful
when dealing with old catalog leftovers (i.e. a parquet file that
belonged to an old but wiped catalog). It also simplifies the reasoning
about file references in the future and follows what other dataset
formats are usually doing (i.e. never replace files).

Also use `ChunkAddr` where it makes sense.
2021-06-23 14:30:28 +02:00
kodiakhq[bot] 70817a474c
Merge branch 'main' into crepererum/issue1740-d 2021-06-23 12:29:54 +00:00
Raphael Taylor-Davies 5cd911c74a
fix: correct row count for object store chunks (#1789) 2021-06-23 12:06:49 +00:00
Marco Neumann 1636f47565 refactor: remove dead code 2021-06-23 10:51:22 +02:00
Marco Neumann cf55df68b5 refactor: remove some `Arc`s around the in-mem catalog
This is for #1740.
2021-06-23 10:51:22 +02:00
Marco Neumann e36b6f9c7a docs: fix intra-doc link 2021-06-23 10:25:05 +02:00
Marco Neumann 67508094b4 fix: double ref
Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
2021-06-23 10:25:05 +02:00
Marco Neumann d2be641864 refactor: make checkpointing easier to use
Don't mix commit+checkpoint in a single call so that the caller has to
reason about the error type and which of the two operations has failed.
Splitting it also makes it easier to create the correct checkpoint data.
2021-06-23 10:25:05 +02:00
Marco Neumann 4a961694ec refactor: make caller sync mem<>OS view during catalog transactions
This is for #1740. Greatly simplifies the integration of the persisted
catalog into the DB.
2021-06-23 10:25:05 +02:00
Marco Neumann d1db0dfaeb refactor: remove type parameter from preserved catalog
For #1740.
2021-06-22 10:53:10 +02:00
Marco Neumann ff60627500 refactor: make preserved catalog NOT own the in-mem catalog
Works towards #1740.
2021-06-21 18:39:43 +02:00
Marco Neumann 881729bd23 refactor: make caller responsible to create checkpoint data
This decouples the in-mem and preserved catalog a bit and works
towards #1740.
2021-06-21 18:33:23 +02:00
Marco Neumann aba973a6e1 refactor: make catalog `wipe` a freestanding function
It does not interact with the `CatalogState` so users can call this
function without that type.
2021-06-21 09:31:23 +02:00
Andrew Lamb 258a6b1956
chore: remove more dead code (#1760)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-18 21:28:22 +00:00
Andrew Lamb de67bd3efe
refactor: Remove PartitionChunk::table_schema (#1756)
* refactor: Remove PartitionChunk::table_schema

* docs: update comments
2021-06-18 16:13:16 +00:00
Raphael Taylor-Davies f6dbc8d6f2
refactor: add ChunkAddr to describe location of chunk in catalog (#1745)
* refactor: add ChunkPath to describe location of chunk in catalog

* refactor: rename ChunkPath to ChunkAddr

* chore: further renames

* chore: even more renames

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-17 12:04:37 +00:00
Marco Neumann e056d97cf6 test: always test transaction aborts 2021-06-16 11:01:14 +02:00
Marco Neumann caaf95c6ec refactor: remove lock from `TestCatalogState` 2021-06-16 10:51:15 +02:00
Marco Neumann c8c412f6fe refactor: rework catalog state interface
This now allows not only for copy-based transaction handling but also
for eager exec and rollbacks. This will be useful to properly implement
transaction aborts for the "real" catalog.
2021-06-16 10:51:15 +02:00
Marco Neumann e064a6bbba test: add test suite for `CatalogState` impls
This makes it easier to check if `CatalogState` correctly implement all
features, including transaction aborting.
2021-06-16 10:50:47 +02:00
Andrew Lamb b756e09904
refactor: Rename parquet_file::Chunk --> ParquetChunk (#1722)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2021-06-15 11:21:49 +00:00