867 Commits (0bd7941a187588299f4141d7f6e47fff67da4643)

Author SHA1 Message Date
Markus Westerlind 0bd7941a18
fix(REPL): Don't buffer lines until a trailing semicolon is found and add history hinting (#3630)
* fix(REPL): Don't buffer lines until a trailing semicolon is found

The repl would silently buffer all lines until a trailing semicolon was found, which
resulted in some very confusing error messages: I would input invalid commands followed
by a command I thought was valid, yet I'd still get an error due to the previous command being buffered.

This uses rustyline's helper feature to detect incomplete input (no trailing semicolon) and makes
it accept multiline input until the input is completed.

I also included some of rustyline's default hint and highlighting while I was at it.

* chore: cargo clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 17:11:01 +00:00
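The completeness check described in the commit above (input counts as finished only once it ends in a semicolon) can be sketched as follows. This is a minimal illustration using only the standard library: `InputState` and `check_input` are hypothetical names, and the sketch does not use rustyline's actual helper API.

```rust
/// Outcome of checking the text typed so far (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum InputState {
    /// The input ends with a semicolon and can be executed.
    Complete,
    /// More lines are needed before the command can run.
    Incomplete,
}

/// Assumption: a command is complete once its last non-whitespace
/// character is a semicolon. Anything else asks for another line.
fn check_input(input: &str) -> InputState {
    if input.trim_end().ends_with(';') {
        InputState::Complete
    } else {
        InputState::Incomplete
    }
}
```

A line editor would call a check like this after every Enter key press and keep the prompt open in multiline mode while the result is `Incomplete`, so the user can see that the command has not run yet.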
Marco Neumann 50cff27b01
chore: remove rdkafka dependency (#3625)
All features are now covered by rskafka. This also removes the need to
specify a server ID for write buffer consumers. This was only used for
rdkafka since there we needed to specify a consumer group, even though
we did not use any transactions.
2022-02-03 13:33:56 +00:00
Marco Neumann bc4b7f8a5b
test: ensure that rskafka and rdkafka work together (#3624)
* chore: upgrade rskafka + enable snappy support

* test: ensure that rskafka and rdkafka work together

Before removing rdkafka ensure that:

- rskafka can consume existing messages produced by rdkafka so we do not
  need to drain existing topics
- rdkafka can consume new messages produced by rskafka so we can roll
  back

I ran the whole `write_buffer` test suite (including the newly added
tests) using Apache Kafka as well as Redpanda.

* test: ensure we handle consumer offset in error case correctly

* docs: explain test setup

Co-authored-by: Andrew Lamb <alamb@influxdata.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 12:52:42 +00:00
Paul Dix ce46bbaada
feat: wire up the write buffer to the ingester process (#3533)
This adds the scaffolding for the ingester server to consume data from Kafka. This ingests data into an in-memory structure while creating records in the catalog for any partitions that don't yet exist.

I've removed catalog_update.rs in ingester for now. That was mostly a placeholder and will be going in a combination of handler.rs and data.rs on my next PR which will have some primitive lifecycle wired up.

There's one ugly bit here where the DML write is cloned because it's getting borrowed to output spans and metrics. I'll need to follow up with a refactor to make it so that the DML write's tables can be consumed without it gumming up the metrics stuff.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 11:47:28 +00:00
kodiakhq[bot] 9079baf19a
Merge branch 'main' into dom/schema-validation 2022-02-02 16:27:35 +00:00
Andrew Lamb 030a2cb4c1
chore: Update datafusion (#3613)
* chore: Update datafusion

* fix: update for latest DF API

* fix: another API change

* fix: clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 16:27:11 +00:00
Dom Dwyer 4744c5804e refactor: remove Dashmap
Swap Dashmap for a regular RwLock<HashMap<..,>> due to soundness issues:

    https://rustsec.org/advisories/RUSTSEC-2022-0002
2022-02-02 14:04:53 +00:00
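As a rough illustration of the swap described in the commit above, a concurrent map can be replaced by a plain `HashMap` behind a standard-library `RwLock`. The `SharedMap` type and its key and value types are hypothetical; the actual types in the refactored code are not shown in this log.

```rust
use std::collections::HashMap;
use std::sync::RwLock;

/// Hypothetical wrapper: one lock guarding one ordinary HashMap.
#[derive(Debug, Default)]
struct SharedMap {
    inner: RwLock<HashMap<String, u64>>,
}

impl SharedMap {
    /// Readers take the shared lock, so many can run in parallel.
    fn get(&self, key: &str) -> Option<u64> {
        self.inner.read().expect("lock poisoned").get(key).copied()
    }

    /// Writers take the exclusive lock. Returns the previous value, if any.
    fn insert(&self, key: &str, value: u64) -> Option<u64> {
        self.inner
            .write()
            .expect("lock poisoned")
            .insert(key.to_string(), value)
    }
}
```

The trade-off is coarser locking than a sharded concurrent map, in exchange for relying only on well-understood standard-library primitives.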
Dom Dwyer 39d489d9e7 refactor: enable schema validation
Adds the SchemaValidator to the DML handler stack - this adds it into
the request path in router2.
2022-02-02 14:04:14 +00:00
Dom Dwyer 6598023726 feat: cache NamespaceSchema in validator
Adds an in-memory cache of table schemas to the SchemaValidator DML
handler.

The cache pulls from the global catalog when observing a column for the
first time, and pushes the column type to set it for subsequent requests
if it does not exist (this pull & push is done atomically by the
catalog in an "upsert" call).

The in-memory cache is sharded by namespace, with each shard guarded by
an individual lock to minimise contention between readers (the expected
average case) and writers (only when adding new columns/tables).

Relies on the catalog to serialise new column creation and validate
parallel creation requests.
2022-02-02 13:04:53 +00:00
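The cache layout described in the commit above (one shard per namespace, each behind its own lock, with the catalog consulted only on a miss) might look roughly like this. All names here (`SchemaCache`, `ColumnType`, `get_or_upsert`) are hypothetical, and the catalog "upsert" is modelled as a caller-supplied closure rather than a real catalog client.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical column types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Tag,
    Field,
}

/// (table name, column name) -> column type, for one namespace.
type TableColumns = HashMap<(String, String), ColumnType>;

/// One shard per namespace, each guarded by its own lock.
#[derive(Default)]
struct SchemaCache {
    shards: RwLock<HashMap<String, Arc<RwLock<TableColumns>>>>,
}

impl SchemaCache {
    /// Returns the shard for `namespace`, creating it on first use.
    fn shard(&self, namespace: &str) -> Arc<RwLock<TableColumns>> {
        // Fast path: the shard exists, only a read lock is needed.
        let existing = self.shards.read().unwrap().get(namespace).cloned();
        if let Some(shard) = existing {
            return shard;
        }
        // Slow path: create the shard under the write lock.
        let mut shards = self.shards.write().unwrap();
        let shard = shards.entry(namespace.to_string()).or_default().clone();
        shard
    }

    /// Returns the cached type, calling `catalog_upsert` only on a miss.
    /// The closure stands in for the catalog's atomic upsert call.
    fn get_or_upsert<F>(&self, namespace: &str, table: &str, column: &str, catalog_upsert: F) -> ColumnType
    where
        F: FnOnce() -> ColumnType,
    {
        let shard = self.shard(namespace);
        let key = (table.to_string(), column.to_string());
        let cached = shard.read().unwrap().get(&key).copied();
        if let Some(ty) = cached {
            return ty;
        }
        let ty = catalog_upsert();
        // If another writer raced us, keep whatever was stored first.
        let stored = *shard.write().unwrap().entry(key).or_insert(ty);
        stored
    }
}
```

In this sketch the catalog remains the source of truth: two callers racing on a new column may both call the closure, and the commit states that the catalog is what serialises such parallel creation requests.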
Dom Dwyer c81f207298 feat: schema validation
Implements a write schema validation DML handler, denying requests that
conflict with the schema within the global catalog. Additive schema
changes are accepted, incrementally updating the global catalog schema.

Deletes are passed through unchanged and unvalidated.
2022-02-02 13:04:53 +00:00
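The validation rule stated in the commit above (conflicting writes are denied, additive changes are accepted and recorded) can be sketched as below. The types and the `validate_and_extend` function are hypothetical, the schema is a plain in-memory map rather than the global catalog, and the choice to check the whole write before recording anything is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Hypothetical column types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ColumnType {
    Tag,
    Float,
    Integer,
}

/// Returned when a write disagrees with the stored schema.
#[derive(Debug, PartialEq, Eq)]
struct SchemaConflict {
    column: String,
    existing: ColumnType,
    requested: ColumnType,
}

/// Rejects the write if any column conflicts with `schema`.
/// Otherwise adds the previously unseen columns to `schema`.
fn validate_and_extend(
    schema: &mut HashMap<String, ColumnType>,
    write: &[(&str, ColumnType)],
) -> Result<(), SchemaConflict> {
    // First pass: look for conflicts without mutating anything.
    for (name, requested) in write {
        if let Some(existing) = schema.get(*name) {
            if existing != requested {
                return Err(SchemaConflict {
                    column: name.to_string(),
                    existing: *existing,
                    requested: *requested,
                });
            }
        }
    }
    // Second pass: the write is valid, so record additive changes.
    for (name, requested) in write {
        schema.entry(name.to_string()).or_insert(*requested);
    }
    Ok(())
}
```

Because conflicts are checked first, a rejected write in this sketch leaves the schema untouched, including any new columns it carried.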
Edd Robinson 5441682207 feat: add support for parsing predicate 2022-02-02 11:02:33 +00:00
Edd Robinson c1f5994660
refactor: move sql parsing -> RPC predicate into own crate (#3604)
* chore: create crate

* refactor: move module to new crate
2022-02-02 10:41:57 +00:00
Edd Robinson fa546047fb
refactor: update dep with API change (#3596) 2022-02-01 14:53:10 +00:00
Edd Robinson 443fd00c1b
Merge branch 'main' into er/feat/parse_sql_rpc 2022-02-01 13:55:08 +00:00
Marco Neumann 0a2cb36ddf
fix: workspace-hack (#3593) 2022-02-01 12:32:58 +00:00
Marco Neumann 22778a3a80
chore: upgrade rskafka and parking_lot (#3592) 2022-02-01 11:50:42 +00:00
Edd Robinson 175732c3ca feat: basic sqlparser -> RPCNode 2022-02-01 10:26:03 +00:00
Marco Neumann b326b62b44
feat: buffer writes when writing to RSKafka (#3520) 2022-02-01 10:07:52 +00:00
kodiakhq[bot] 8bef2c105c
Merge branch 'main' into cn/persist 2022-01-31 18:50:45 +00:00
Andrew Lamb 7b96a37165
chore: Update datafusion (#3586)
* chore: update DataFusion to f849968057ddddccc9aa19915ef3ea56bf14d80d

* fix: reduce overhead of creating physical expressions

* chore: use MemTrackingMetrics

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-31 18:15:28 +00:00
Carol (Nichols || Goulding) bf89162fa5
refactor: Move IoxMetadata to parquet_file 2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding) dd9620da0c
feat: Create a new proto definition for the new design's IoxMetadata 2022-01-31 10:36:32 -05:00
Carol (Nichols || Goulding) 5e0e0d8aa7
feat: Write parquet to object storage in a similar way as parquet_file::Storage 2022-01-31 10:36:32 -05:00
Carol (Nichols || Goulding) c633c9bc5c
feat: Wire object store into ingester persistence 2022-01-31 10:36:30 -05:00
Raphael Taylor-Davies 4101d16f71
chore: feature flag consistency (#3574)
* chore: feature flag consistency

* chore: add aarch64-apple-darwin to hakari

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-28 16:38:59 +00:00
Nga Tran 8735ede74f
feat: IoxMetadata for parquet file (#3547)
* feat: IoxMetadata for parquet file

* fix: typos

* refactor: address review comments

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-28 14:41:59 +00:00
Dom 1669acc0c2
build: update tokio (#3566)
Release notes:
    https://github.com/tokio-rs/tokio/releases/tag/tokio-1.16.0
2022-01-28 10:36:12 +00:00
Andrew Lamb b486258dfb
chore: run cargo update (#3559)
Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 21:05:31 +00:00
Dom 9201023ea4
feat: schema validation for MutableBuffer instances (#3554)
* refactor: Debug bounds on Catalog trait

* feat: validate MutableBatch schema

Changes the schema validation code to validate MutableBatch instances
(coming from a pre-parsed LP write, and non-LP-based writes) instead of
parsed LP lines.

* refactor: Send bound on boxed errors

* refactor: clippy

Allow assert_eq!(bool, bool) for readability.

* refactor: no PartialEq<MB Column> for ColumnSchema

Remove the PartialEq<mutable_buffer::Column> for ColumnSchema - it's
definitely more readable as a method call.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 20:55:18 +00:00
Luke Bond 4a96e52290
feat: router2 sharder benchmarking (#3558)
* feat: benchmarking the router2 sharder

* chore: added throughput to sharder benchmarks; vary num buckets
2022-01-27 18:09:16 +00:00
Andrew Lamb 2062267d0f
chore: Update hashbrown (#3551)
* chore: Update hashbrown

* fix: hakari

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 15:34:10 +00:00
Dom ce568ab447
build(iox_catalog): remove unused dependencies (#3552)
* build: don't pull in all of tokio

We already specify the tokio features we need so "full" (all features)
is not necessary.

* build: remove chrono dependency

Appears unused.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 15:14:42 +00:00
Raphael Taylor-Davies 21c1824a7a
refactor: remove table_names from Predicate (#3545)
* refactor: remove table_names from Predicate

* chore: fix benchmarks

* chore: review feedback

Co-authored-by: Edd Robinson <me@edd.io>

* chore: review feedback

* chore: replace Default::default with InfluxRpcPredicate::default()

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 14:44:49 +00:00
Andrew Lamb 5488c257d1
chore: Update datafusion, upgrade to arrow/parquet/arrow-flight 8.0.0 (#3517)
* chore: Update datafusion

* chore: update to arrow 8

* fix: update to use new DataFusion APIs

* fix: update case for sortedness

* fix: cargo hakari
2022-01-27 13:33:27 +00:00
Andrew Lamb 7261571abf
fix: Revert "chore: temporarily hack around datafusion tempfiles" (#3525)
This reverts commit ae5763c1cb6bb4a98ffe0779a3a35f6daaf10971.

Co-authored-by: Edd Robinson <me@edd.io>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-27 12:08:49 +00:00
Carol (Nichols || Goulding) bc44d33108
feat: Implement a snapshot method on DataBuffer (#3518)
* feat: Implement a snapshot method on DataBuffer

Fixes #3510.

* test: Add a test snapshotting batches with different but compatible schemas

* fix: Simplify min/max sequencer number collection

The first batch should always have the min sequencer number. The last
batch should always have the max sequencer number. The min should always
be less than (or equal to, in case there's only one batch) the max.
2022-01-26 15:22:51 +00:00
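The simplification described in the commit above relies on batches arriving in order, so the range can be read from the first and last batch instead of scanning all of them. A minimal sketch follows; the `Batch` type and its `sequencer_number` field are hypothetical stand-ins for whatever the real batches carry.

```rust
/// Hypothetical batch carrying only the number of interest.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Batch {
    sequencer_number: u64,
}

/// Returns (min, max), assuming `batches` is ordered by sequencer number.
/// Returns None when there are no batches.
fn sequencer_range(batches: &[Batch]) -> Option<(u64, u64)> {
    let min = batches.first()?.sequencer_number;
    let max = batches.last()?.sequencer_number;
    // With a single batch, min and max are the same value.
    debug_assert!(min <= max);
    Some((min, max))
}
```

The `debug_assert!` documents the ordering invariant that makes this shortcut valid. If batches could arrive out of order, a full scan would be needed instead.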
Marco Neumann 2928254c0f
fix: test logging (#3536)
- Use a more standard way to set up the tracing subsystem (as described
  in tracing-subscriber docs)
- Also capture content from `log` crate
- Play nice w/ Rust's libtest message capture

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-26 10:28:51 +00:00
Dom b846ead320
feat(router2): shard writes/deletes into write buffer (#3499)
* feat: Sequencer wrapper

This type wraps an underlying WriteBufferWriter implementation, tagging
it with a sequencer ID it should use when enqueuing operations to the
buffer.

* feat: mock sharder

Implements a mock Sharder impl that returns pre-configured responses to
shard(), and captures the input to the call.

* feat: sharded write buffer

Implements sharding of ops into an underlying WriteBuffer.

Writes are sharded by some abstract Sharder impl, collated per shard to
maximise the size of each op (and therefore compression efficiency),
converted into a DML operation and then enqueued in parallel to the
underlying WriteBuffer implementation.

Deletes are modelled as being mapped to a single write buffer shard,
which is the case while we support sharding based on the table &
namespace only. Deletes will be extended to support (potentially)
multiple shards when column overrides are implemented.

* refactor: runtime write buffers

Switch from using static dispatch, to using a runtime specified
WriteBufferWriting implementation.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-25 15:19:48 +00:00
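The collation step described in the commit above (writes grouped per shard so each shard receives one larger operation) could be sketched like this. `TableWrite`, `collate_by_shard`, and the `shard_for` closure are hypothetical; the conversion to a DML operation and the parallel enqueue are left out.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one table's rows within a write:
/// (table name, rows).
type TableWrite = (String, Vec<String>);

/// Groups the tables of one write by the shard chosen for each table.
/// `shard_for(table, namespace)` stands in for a Sharder implementation.
fn collate_by_shard<F>(
    namespace: &str,
    tables: Vec<TableWrite>,
    shard_for: F,
) -> BTreeMap<usize, Vec<TableWrite>>
where
    F: Fn(&str, &str) -> usize,
{
    let mut per_shard: BTreeMap<usize, Vec<TableWrite>> = BTreeMap::new();
    for (table, rows) in tables {
        let shard = shard_for(table.as_str(), namespace);
        // All tables mapped to the same shard end up in one group.
        per_shard.entry(shard).or_default().push((table, rows));
    }
    per_shard
}
```

Each entry of the returned map would then become a single operation for its shard, which is what lets larger payloads compress more efficiently.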
Andrew Lamb 51a0f8a56a
chore: temporarily hack around datafusion tempfiles (#3524) 2022-01-25 12:30:29 +00:00
NGA-TRAN f9c1e80a7f chore: update thread_local
2022-01-24 13:37:52 -05:00
NGA-TRAN 797ba459b9 chore: merge main to branch 2022-01-24 12:06:23 -05:00
NGA-TRAN 5f98a07b7f chore: add Cargo.lock 2022-01-24 12:03:02 -05:00
Paul Dix bb893510a0 feat: Add scaffolding for ingester server
* Adds a new ingester command to start an ingester server
* Moves previous ingester server over to handler
* Skeleton for gRPC and HTTP handlers
2022-01-21 18:02:19 -05:00
Andrew Lamb 9615feacb3
fix(InfluxQL): Support RegEx with escape sequences not supported by Rust regex (#3502)
* fix(InfluxQL): Translate unsupported meta characters

* fix: remove debugging

* fix: clippy sacrifice

* docs: Add additional background and rationale for rewriting

* fix: doc link
2022-01-21 14:40:10 +00:00
kodiakhq[bot] 5b22abf6a2
Merge branch 'main' into crepererum/wb_rskafka 2022-01-20 16:48:10 +00:00
Andrew Lamb 9b6e626626
chore: Update datafusion (and get fix for influxql test failure) (#3484)
* test: add tests for comparing dictionary arrays

* chore: update datafusion deps

* refactor: Update code for DataFusion API changes

* fix: update test

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-01-20 14:01:47 +00:00
Marco Neumann 76dd62a6c2 feat: RSKafka-driven write buffer 2022-01-20 12:36:10 +01:00
Dom 36d50d083f feat: sharder trait & impl
This commit defines the Sharder trait that should allow us to implement
multiple sharding strategies over a defined set of input types (such as
a MutableBatch for writes, DeletePredicate for deletes, etc).

This commit also includes a jump hash implementation that consistently
shards (table name, namespace) tuples to a given shard for all input
types.
2022-01-20 11:10:37 +00:00
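For readers unfamiliar with jump hashing, the sketch below shows the published jump consistent hash algorithm (Lamping and Veach) applied to a (table name, namespace) pair. It is not the code from this commit: `jump_hash` and `shard_for` are hypothetical names, and hashing the pair with the standard library's `DefaultHasher` is an assumption made to keep the example self-contained.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Jump consistent hash: maps `key` to a bucket in `0..num_buckets`.
fn jump_hash(mut key: u64, num_buckets: u32) -> u32 {
    assert!(num_buckets > 0, "at least one bucket is required");
    let mut bucket: i64 = -1;
    let mut next: i64 = 0;
    while next < i64::from(num_buckets) {
        bucket = next;
        // Linear congruential step used as a cheap pseudo-random generator.
        key = key.wrapping_mul(2862933555777941757).wrapping_add(1);
        // Jump forward to the next candidate bucket.
        let ratio = (1u64 << 31) as f64 / ((key >> 33) + 1) as f64;
        next = ((bucket + 1) as f64 * ratio) as i64;
    }
    bucket as u32
}

/// Picks a shard for a (table name, namespace) pair.
/// Assumption: the pair is reduced to a u64 with DefaultHasher.
fn shard_for(table: &str, namespace: &str, num_shards: u32) -> u32 {
    let mut hasher = DefaultHasher::new();
    (table, namespace).hash(&mut hasher);
    jump_hash(hasher.finish(), num_shards)
}
```

A useful property of this algorithm is that growing from `n` to `n + 1` buckets either leaves a key where it was or moves it to the new bucket `n`, so only a small fraction of keys are remapped when a shard is added.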
Paul Dix d825dab8e2 fix: hakari workspace hack 2022-01-19 14:48:00 -05:00
Paul Dix 41038721e1 feat: Add parquet file records to iox_catalog
* Adds ParquetFile and scaffolding to IOx catalog
* Changed the file_location in parquet_file to object_store_id which is a uuid
2022-01-19 14:14:54 -05:00