Commit Graph

317 Commits (092268fc8755acd1aa67e320b18a9c75a88f62f1)

Author SHA1 Message Date
Edd Robinson 2334e779eb feat: implement read_window_aggregate sub-command 2022-02-09 12:32:48 +00:00
Edd Robinson 0774e1d328 feat: add read_window_aggregate request builder 2022-02-09 12:32:48 +00:00
Marco Neumann 4bddab56e2
feat: create new sequencers in ingester on demand (#3671)
There is no need to introduce yet another admin action to do that. If
the sequencer does not exist yet, we can just create it and set the
`min_unpersisted_sequence_number` to 0 (which is done be `create_or_get`).
2022-02-09 12:26:30 +00:00
Edd Robinson dfa6fd8579 feat: add quiet option to storage 2022-02-08 21:27:29 +00:00
Edd Robinson 11855a5eff feat: add format flag 2022-02-08 21:15:07 +00:00
Edd Robinson c175ccd1b4
feat: make stop/stop/predicate global (#3681) 2022-02-08 20:06:47 +00:00
kodiakhq[bot] ace76cef14
Merge branch 'main' into dom/sharded-cache 2022-02-08 16:09:48 +00:00
Paul Dix 59b2141c0b
feat: Add lifecycle manager to ingester (#3645)
This adds the lifecycle manager to the ingester. It will trigger based on a threshold for max partition size or age or based on keeping total memory under a certain threshold.

It defines a new interface for a persister, which is stubbed out for IngesterData. I'm not sure yet how persistence errors should be handled. The assumption here is that the persister continues to retry persistence forever until it succeeds.

There is one scenario I can think of that may cause this lifecycle manager problems. If a single partition is very high throughput, it could cause things to back up as persistence is not parallelized within a single partition. Any given partition can currently only run one persistence operation at a time. We can address this later.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 15:23:40 +00:00
Marco Neumann 5de4d6203f
refactor: catalog transaction (#3660)
* refactor: catalog Unit of Work (= transaction)

Setup an inteface to handle Units of Work within our catalog. Previously
both the Postgres and the in-mem backend used "mini-transactions on
demand". Now the caller has a clear way to establish boundaries and
gets read and write isolation. A single `Arc<dyn Catalog>` can create as
many `Box<dyn UnitOfWork>` as you like, but note that depending on the
backend you may not scale infinitely (postgres will likely impose
certain limits and the in-mem backend limits concurrency to 1 to keep
things simple).

* docs: improve wording

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>

* refactor: rename Unit of Work to Transaction

* test: improve `test_txn_isolation`

* feat: clearify transaction drop semantics

Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:38:33 +00:00
kodiakhq[bot] 4567800901
Merge branch 'main' into er/feat/tag_values_cli 2022-02-08 13:07:59 +00:00
Raphael Taylor-Davies be662ec731
feat: lazy query log! (#3654)
* feat: lazy query log

* chore: fmt

* chore: review feedback

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-08 13:07:28 +00:00
Edd Robinson 6c10e1e901 feat: support _measurement/_field tag keys 2022-02-08 11:32:28 +00:00
Edd Robinson eb733042ca feat: add support for tag_values cli 2022-02-07 22:02:29 +00:00
Edd Robinson 38a889ecf6 refactor: remove unnecessary struct 2022-02-07 22:02:29 +00:00
Marco Neumann d9cc9f5a2a
feat: expose write buffer connection config via CLI (#3651)
* feat: improve rskafka config error messages

* feat: expose write buffer connection config via CLI
2022-02-07 16:24:28 +00:00
Marco Neumann 977ccc1989
fix: use a single metric registry for ingester (#3652)
With this change write buffer ingestion metrics are showing up under
`/metrics`

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 15:56:54 +00:00
Edd Robinson 87ac926e06
feat: add queries system table (#3655)
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
2022-02-07 15:26:06 +00:00
Carol (Nichols || Goulding) 2e30483f1f
refactor: Remove predicate module from predicate crate (#3648)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 14:54:07 +00:00
Marco Neumann e2db1df11f
refactor: improve writer buffer consumer interface (#3631)
* refactor: improve writer buffer consumer interface

The change looks huge but is actually rather simple. To
understand the interface change, let me first explain what we want:

- be able to fetch watermarks for any sequencer
- have streams:
  - each streams tracks a sequencer and has an offset state (no read
    multiplexing)
  - we can seek a stream
  - seeking and streaming cannot be done at the same time (that would be
    weird and likely leads to many bugs both in write buffer and in the
    user code)
- ideally we don't need to create streams of all sequencers but can
  choose a subset

Before this change we had one mutable consumer struct where you can get
all streams and watermark functions (this mutable-borrows the consumer)
or you can seek a single stream (this also mutable-borrows the
consumer). This is a bit weird for multiple reasons:

- you cannot seek a single stream without dropping all of them
- the mutable-borrow construct makes it really difficult to pass the
  streams into separate threads
- the consumer is boxed (because its mutable) which makes it more
  difficult to handle in a large-scale application

What this change does is the following:

- you have an immutable consumer (similar to the producer)
- the consumer offers the following methods:
  - get the set of sequencer IDs
  - get watermark for any sequencer
  - get a stream handler (see next point) for any sequencer
- the stream handler captures the stream state (offset) and provides you
  a standard `Stream<_>` interface as well as a seek function.
  Mutable-borrows ensure that you cannot use both at the same time.

The stream handler provides you the stream via `handler.stream()`. It
doesn't implement `Stream<_>` itself because the way boxing, dynamic
dispatch work, and pinning interact (i.e. I couldn't get it to work
without the indirection).

As a bonus point (which we don't use however) you can now create
multiple streams for the same sequencer and they all have their own
offset.

* fix: review comments

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>

Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-07 12:24:17 +00:00
Edd Robinson a52c0a26e6 feat: print read filter results 2022-02-04 22:14:22 +00:00
Edd Robinson d328b37803 feat: teach IOx to convert RPC frames into Recordbatches 2022-02-04 18:34:54 +00:00
Edd Robinson 4cdaaf96bf refactor: clean up errors 2022-02-04 18:34:54 +00:00
Edd Robinson ea0ece8b4b feat: issue read_filter request 2022-02-04 18:34:54 +00:00
Dom Dwyer 0b044b95fb perf: use sharded namespace cache
Enables the ShardedCache for the namespace schema cache.
2022-02-04 16:12:51 +00:00
Dom Dwyer 026a557c0b refactor: rename TableNamespaceSharder
Rename to JumpHash and expose the hashing internals for reuse (outside
of only table & namespace sharding).
2022-02-04 15:56:09 +00:00
Dom Dwyer 0fd122e365 refactor: "inf" retention const
Adds the iox_catalog::INFINITE_RETENTION_POLICY constant.
2022-02-04 15:35:33 +00:00
Dom Dwyer f1ba50f40b feat: resolve query pool ID at startup
This commit adds a --query-pool flag to router2, used to upsert a
catalog record at startup. Auto-created namespaces will reference this
query pool.

This is for testing only and will be removed in a future commit.
2022-02-04 15:35:30 +00:00
Dom Dwyer aefc70a9ea feat(router2): namespace auto-creation
Decorate the existing request handler pipeline with a layer that
implicitly creates the namespace when a write request is received.
2022-02-04 15:34:15 +00:00
Marco Neumann 0c01044677
fix: partition range in ingester CLI has INCLUSIVE end (#3641) 2022-02-04 13:41:57 +00:00
Marco Neumann d2ccf23263
fix: use standard DSN argument for router2 CLI (#3632)
- support long-form (instead of relying on positional arguments)
- use same code as everying else

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 17:20:52 +00:00
Markus Westerlind 0bd7941a18
fix(REPL): Don't buffer lines until a trailing semicolon is found and add history hinting (#3630)
* fix(REPL): Don't buffer lines until a trailing semicolon is found

The repl would silently buffer all lines until a trailing semicolon were found which
resulted in some very confusing error messages as I would input invalid commands followed
by a command I thought were valid, except I'd still get an error due to the previous command being buffered.

This uses rustyline's helper feature to detect incomplete input (no trailing semicolon) and makes
it accept multiline input until the input is completed.

I also included some of rustyline's default hint and highlighting while I was at it.

* chore: cargo clippy

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 17:11:01 +00:00
Marco Neumann b3b2d9b623
feat: catalog setup CLI command (#3627)
Closes #3509.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 14:16:21 +00:00
Andrew Lamb ab3c7573f5
test: add end to end for read_filter and empty string predicates (#3619)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 14:05:32 +00:00
Marco Neumann 50cff27b01
chore: remove rdkafka dependency (#3625)
All features are now covered by rskafka. This also removes the need to
specify a server ID for write buffer consumers. This was only used for
rdkafka since there we needed to specify a consumer group, even though
we did not use any transactions.
2022-02-03 13:33:56 +00:00
kodiakhq[bot] 3197ea945b
Merge branch 'main' into dom/extract-ns-cache 2022-02-03 12:30:37 +00:00
Andrew Lamb 77b80e7618
fix(InfluxQL): treat null tags as `''` rather than `null` in storagerpc queries (#3557)
* fix(InfluxQL): treat null tags as `''` rather than `null` in storage rpc queries

* test: add one more case

* fix: Update comment

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>

Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 12:14:43 +00:00
Paul Dix ce46bbaada
feat: wire up the write buffer to the ingester process (#3533)
This adds the scaffolding for the ingester server to consume data from Kafka. This ingests data in an in memory structure while creating records in the catalog for any partitions that don't yet exist.

I've removed catalog_update.rs in ingester for now. That was mostly a placeholder and will be going in a combination of handler.rs and data.rs on my next PR which will have some primitive lifecycle wired up.

There's one ugly bit here where the DML write is cloned because it's getting borrowed to output spans and metrics. I'll need to follow up with a refactor to make it so that the DML write's tables can be consumed without it gumming up the metrics stuff.

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-03 11:47:28 +00:00
Dom Dwyer 3cc4481616 refactor: extract NamespaceSchema cache
Breaks the in-memory cache of NamespaceSchema out into a decoupled type
that can be shared across multiple DML handlers.
2022-02-03 10:01:07 +00:00
Carol (Nichols || Goulding) a534136ccc
fix: Correct a 'long' argument name so the ingester command can run (#3621)
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 20:09:41 +00:00
Andrew Lamb 429d59f1b6
feat: Simplify predicates in the `InfluxRpcFrontend` before using them (#3588)
* feat: normalize + simplify RPC predicates before using them

* docs: Update predicate/src/rpc_predicate.rs

Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
2022-02-02 19:46:57 +00:00
kodiakhq[bot] 6de8ed4adc
Merge branch 'main' into dom/schema-validation 2022-02-02 16:05:41 +00:00
Luke Bond 6da15d9690 chore: cleanup catalog CLI output & args
Co-authored-by: Marco Neumann <marco@crepererum.net>
2022-02-02 15:30:19 +00:00
Luke Bond 15827a534b feat: catalog CLI command with update subcommand 2022-02-02 15:30:19 +00:00
Dom Dwyer 39d489d9e7 refactor: enable schema validation
Adds the SchemaValidator to the DML handler stack - this adds it into
the request path in router2.
2022-02-02 14:04:14 +00:00
Edd Robinson 5441682207 feat: add support for parsing predicate 2022-02-02 11:02:33 +00:00
Edd Robinson 08901c13cd feat: support parsing timerange 2022-02-02 11:02:33 +00:00
Edd Robinson a424d1c912 feat: shell command read_filter 2022-02-02 11:02:33 +00:00
Marco Neumann 59a2c74352
refactor: reusable ingester/router2 CLI pieces (#3590)
* refactor: use a single CLI parser for ingester/router2 WB

* refactor: reusable catalog DSN CLI handling

We are going to need DSN handling for the router as well as for the some
admin tools.

* fix: DNS -> DSN
2022-02-01 12:57:58 +00:00
Marco Neumann 22778a3a80
chore: upgrade rskafka and parking_lot (#3592) 2022-02-01 11:50:42 +00:00
Marco Neumann b326b62b44
feat: buffer writes when writing to RSKafka (#3520) 2022-02-01 10:07:52 +00:00