influxdb

Commit Graph

Author	SHA1	Message	Date
Dom Dwyer	b07f15bec7	refactor: parallel column resolution A quick change to perform the ColumnRepo::create_or_get() calls in parallel (up to a maximum of 3 in-flight at any one time) in order to mitigate the latency of the call and reduce the overall schema validation call duration. The in-flight limit is enforced to avoid starving the DB connection pool of connections.	2022-02-24 21:04:25 +00:00
Carol (Nichols || Goulding)	723a0c659f	fix: Remove greater_than_sequence_number from IngesterQueryRequest (#3856)	2022-02-24 19:23:44 +00:00
Marco Neumann	49d1be30e7	feat: wire up `ParquetFilePath` for NG (#3853) It's a bit of a duck-type hack, but if we want to just use `ParquetFileChunk` in the new architecture, we somehow need it to accept new-gen paths. Also, path handling should be somewhat centralized since ingester/compactor/querier all need to construct them. So having a `ParquetFilePath` that supports both path styles seems to be a not-too-bad solution. This should obviously be cleaned up in the not-too-distant future.	2022-02-24 16:05:38 +00:00
Carol (Nichols || Goulding)	252ced7adf	feat: Add row count to the parquet_file record in the catalog (#3847) Fixes #3842.	2022-02-24 15:20:50 +00:00
Marco Neumann	d62a052394	feat: extend catalog so we can recover `ParquetChunk`s from it (#3852) * refactor: less parquet data copying * feat: `PartitionRepo::get_by_id` * feat: `TableRepo::get_by_id` * feat: `ParquetFile::file_size_bytes` * feat: `ParquetFile::parquet_metadata` Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-24 13:16:15 +00:00
Marco Neumann	9079e6ddb0	feat: backoff retries in ingester (#3841) * feat: add `backoff` crate * feat: backoff retries in ingester Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-23 17:58:16 +00:00
Carol (Nichols || Goulding)	71f62eee68	fix: Remove min_time and max_time from IngesterQueryRequest (#3839) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-23 15:46:31 +00:00
Marco Neumann	657ac249e9	feat: track ingester jobs (#3836)	2022-02-23 15:33:47 +00:00
Paul Dix	276d9b123a	feat: Add min_sequence_number tracking for sequencers in ingester (#3785) Fixes #3702. This pulls the min sequence tracking into the LifecycleManager. Because the number requires looking at all other partitions in memory, this was the most efficient place to put it. The manager updates the sequencer state after it calls persist. The number is meant to be a lower bound on the sequence number. Issue #3783 will add functionality for the ingester to ignore replayed data that has already been persisted.	2022-02-22 21:53:33 +00:00
Nga Tran	a91e2eadc7	feat: apply tombstones to the batches of the ingest life-cycle (#3770) * feat: changes needed to apply tombstones correctly on the life-cycle ingest batches * refactor: adjust the design after discussing with Paul * feat: apply the incoming tombstone on all data but the persisting one * chore: fmt * fix: build on buffer tombstone * test: delete & write tests for a partition and some cleanup * feat: No need to add processed tombstones for a newly created parquet file in the ingester because all deletes issued before that parquet file was created have already been applied * chore: cleanup Co-authored-by: Paul Dix <paul@pauldix.net> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-22 18:54:21 +00:00
Carol (Nichols || Goulding)	1b9212540b	feat: Send IngesterQueryResponse data back as response of doGet Flight request (#3772) * fix: Adjust fields of IngesterQueryResponse * feat: Adjust IngestHandler query method to call prepare_data_to_querier * feat: Send ingest query result data back through Flight doGet * feat: Send delete predicates and max sequencer number in metadata * fix: greater_than_sequence_number should be of type SequenceNumber * fix: Remove DeletePredicates from IngesterQueryResponse Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-18 17:42:49 +00:00
Marco Neumann	f54ef92b77	fix: supervise and shutdown ingester background tasks (#3769) * fix: supervise and shutdown ingester background tasks Closes #3761. Closes #3762. * docs: improve wording Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com> * test: join/shutdown handling for ingester Co-authored-by: Raphael Taylor-Davies <1781103+tustvold@users.noreply.github.com>	2022-02-18 09:35:29 +00:00
Paul Dix	23b3942306	fix: compact persisting panics on single row (#3784)	2022-02-17 18:33:04 +00:00
Carol (Nichols || Goulding)	90da060156	feat: Add namespace and sequencer id fields to IngesterQueryRequest protobuf (#3766) Fixes #3753. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-16 19:21:15 +00:00
Nga Tran	ea814e9aa4	feat: API and steps to prepare data to send back to the Querier per its request (#3756) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-16 02:45:58 +00:00
Paul Dix	f542045485	feat: wire up persistence in ingester (#3685) This adds persistence into the ingester with a lifecycle manager. The persist operation must still be updated to keep track of the min_unpersisted_sequence_number for each sequencer.	2022-02-16 00:13:40 +00:00
Nga Tran	0b3f76462d	feat: build Query Plan that queries QueryableBatch with filters (#3742) * feat: initial implementation of the Query Plan that queries QueryableBatch with filters * fix: read_filter of QueryableBatch should provide the schema of the columns/projection it needs * chore: Apply suggestions from code review Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * chore: address review comment Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-02-15 16:06:26 +00:00
Marco Neumann	44ee0166a0	fix: start Kafka write buffer stream at "earliest" offset, not at "0" (#3748)	2022-02-15 13:36:59 +00:00
Andrew Lamb	a30803e692	chore: Update datafusion, update `arrow`/`parquet`/`arrow-flight` to 9.0 (#3733) * chore: Update datafusion * chore: Update arrow * fix: missing updates * chore: Update cargo.lock * fix: update for smaller parquet size * fix: update test for smaller parquet files * test: ensure parquet_file tests write multiple row groups * fix: update callsite * fix: Update for tests * fix: hakari * fix: use IoxObjectStore::existing Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-15 12:10:24 +00:00
Marco Neumann	c6e374a025	feat: allow catalog access w/o a transaction (#3735) * feat: allow catalog access w/o a transaction Now the caller has full control over whether to use a transaction or not. * fix: remove non-transaction-safe `create_many` * fix: remove unnecessary transactions	2022-02-15 10:15:36 +00:00
Carol (Nichols || Goulding)	85aa019f50	feat: Turn protobuf predicates into predicate::Predicate (#3707) * feat: Turn protobuf predicates into predicate::Predicate * fix: Take buf lint's suggestions Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-14 17:56:56 +00:00
Nga Tran	d1c71ba5d8	feat: predicate pushdown for Ingester's QueryableBatch (#3728) * feat: predicate pushdown for Ingester's QueryableBatch * chore: comment cleanup * chore: Apply suggestions from code review Co-authored-by: Edd Robinson <me@edd.io> * refactor: address review comments Co-authored-by: Edd Robinson <me@edd.io> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-14 17:28:52 +00:00
Nga Tran	d3bd03e37a	feat: Support Projection Pushdown for a QueryableBatch (#3712) * feat: projection pushdown for QueryableBatch * chore: clean up and remove unwrap * fix: Add Sync to a Snafu source to have the code compile * chore: cleanup and add comments for tests * refactor: Add tests for scanning non-existent columns and fix related bugs * chore: modify comment to trigger auto check in github work	2022-02-10 19:29:21 +00:00
Carol (Nichols || Goulding)	73828323ac	feat: Ingester Flight gRPC API (#3623) * feat: Add a way to run ingester with an in-memory catalog from the CLI If you set the --catalog-dsn string to "mem", rather than using that as a Postgres connection URL, create an in-memory catalog. Planning on using this in tests, so not documenting. * fix: Set default topic to the same value as SHARED_KAFKA_TOPIC Namely, both should use an underscore. I don't think there's a way to directly share these values between a constant and an annotation. * feat: Add a flight API (handshake only) to ingester * fix: Create partitions if using file-based write buffer * fix: Change the server fixture to handle ingester server type For now, the ingester doesn't implement the deployment API. Not sure if it should or not. * feat: Start implementing ingester do_get, namely decoding the query Skip serialization of the predicate for the moment. * refactor: Rename ingest protos to ingester to match crate name * refactor: Rename QueryResults to QueryData * feat: Move ingester flight client to new querier crate * fix: Off-by-one error, different starting indexes in sequencers * fix: Create new CLI argument to pick the catalog type * fix: Create a CLI option to set the number of topics to auto-create in the write buffer * fix: Check the arrow flight service's health to tell that the ingester gRPC is up * fix: Set postgres as the default catalog type * fix: Return an error rather than panicking if CLI args aren't right	2022-02-09 19:07:44 +00:00
Paul Dix	59b2141c0b	feat: Add lifecycle manager to ingester (#3645) This adds the lifecycle manager to the ingester. It will trigger based on a threshold for max partition size or age or based on keeping total memory under a certain threshold. It defines a new interface for a persister, which is stubbed out for IngesterData. I'm not sure yet how persistence errors should be handled. The assumption here is that the persister continues to retry persistence forever until it succeeds. There is one scenario I can think of that may cause this lifecycle manager problems. If a single partition is very high throughput, it could cause things to back up as persistence is not parallelized within a single partition. Any given partition can currently only run one persistence operation at a time. We can address this later. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-08 15:23:40 +00:00
Marco Neumann	5de4d6203f	refactor: catalog transaction (#3660) * refactor: catalog Unit of Work (= transaction) Set up an interface to handle Units of Work within our catalog. Previously both the Postgres and the in-mem backend used "mini-transactions on demand". Now the caller has a clear way to establish boundaries and gets read and write isolation. A single `Arc<dyn Catalog>` can create as many `Box<dyn UnitOfWork>` as you like, but note that depending on the backend you may not scale infinitely (postgres will likely impose certain limits and the in-mem backend limits concurrency to 1 to keep things simple). * docs: improve wording Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: rename Unit of Work to Transaction * test: improve `test_txn_isolation` * feat: clarify transaction drop semantics Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-08 13:38:33 +00:00
Marco Neumann	977ccc1989	fix: use a single metric registry for ingester (#3652) With this change write buffer ingestion metrics are showing up under `/metrics` Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-07 15:56:54 +00:00
Carol (Nichols || Goulding)	2e30483f1f	refactor: Remove predicate module from predicate crate (#3648) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-07 14:54:07 +00:00
Marco Neumann	e2db1df11f	refactor: improve write buffer consumer interface (#3631) * refactor: improve write buffer consumer interface The change looks huge but is actually rather simple. To understand the interface change, let me first explain what we want: - be able to fetch watermarks for any sequencer - have streams: - each stream tracks a sequencer and has an offset state (no read multiplexing) - we can seek a stream - seeking and streaming cannot be done at the same time (that would be weird and likely lead to many bugs both in write buffer and in the user code) - ideally we don't need to create streams of all sequencers but can choose a subset Before this change we had one mutable consumer struct where you can get all streams and watermark functions (this mutable-borrows the consumer) or you can seek a single stream (this also mutable-borrows the consumer). This is a bit weird for multiple reasons: - you cannot seek a single stream without dropping all of them - the mutable-borrow construct makes it really difficult to pass the streams into separate threads - the consumer is boxed (because it's mutable) which makes it more difficult to handle in a large-scale application What this change does is the following: - you have an immutable consumer (similar to the producer) - the consumer offers the following methods: - get the set of sequencer IDs - get watermark for any sequencer - get a stream handler (see next point) for any sequencer - the stream handler captures the stream state (offset) and provides you a standard `Stream<_>` interface as well as a seek function. Mutable-borrows ensure that you cannot use both at the same time. The stream handler provides you the stream via `handler.stream()`. It doesn't implement `Stream<_>` itself because of the way boxing, dynamic dispatch, and pinning interact (i.e. I couldn't get it to work without the indirection). As a bonus (which we don't use, however) you can now create multiple streams for the same sequencer and they all have their own offset. * fix: review comments Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-07 12:24:17 +00:00
Paul Dix	ce46bbaada	feat: wire up the write buffer to the ingester process (#3533) This adds the scaffolding for the ingester server to consume data from Kafka. This ingests data in an in-memory structure while creating records in the catalog for any partitions that don't yet exist. I've removed catalog_update.rs in ingester for now. That was mostly a placeholder and will be going in a combination of handler.rs and data.rs on my next PR which will have some primitive lifecycle wired up. There's one ugly bit here where the DML write is cloned because it's getting borrowed to output spans and metrics. I'll need to follow up with a refactor to make it so that the DML write's tables can be consumed without it gumming up the metrics stuff. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-02-03 11:47:28 +00:00
kodiakhq[bot]	8bef2c105c	Merge branch 'main' into cn/persist	2022-01-31 18:50:45 +00:00
Andrew Lamb	7b96a37165	chore: Update datafusion (#3586) * chore: update DataFusion to f849968057ddddccc9aa19915ef3ea56bf14d80d * fix: reduce overhead of creating physical expressions * chore: use MemTrackingMetrics Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-31 18:15:28 +00:00
Carol (Nichols || Goulding)	4006dc14b3	fix: Correct typo in function name	2022-01-31 10:48:30 -05:00
Carol (Nichols || Goulding)	749989a937	refactor: Simplify type, eliminating empty vec creation If there aren't any record batches, there isn't any metadata, and vice versa. Make this relationship clearer by putting the Option around both the vec of recordbatches and the metadata.	2022-01-31 10:48:30 -05:00
Carol (Nichols || Goulding)	093d5acfd4	fix: Unify temporary multiple definitions of IoxMetadata	2022-01-31 10:48:29 -05:00
Carol (Nichols || Goulding)	8f81ce5501	refactor: Share parquet_file::storage code between new and old metadata	2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)	bf89162fa5	refactor: Move IoxMetadata to parquet_file	2022-01-31 10:36:33 -05:00
Carol (Nichols || Goulding)	dd9620da0c	feat: Create a new proto definition for the new design's IoxMetadata	2022-01-31 10:36:32 -05:00
Carol (Nichols || Goulding)	81647f253c	feat: Use IoxMetadata and a list of RecordBatches	2022-01-31 10:36:32 -05:00
Carol (Nichols || Goulding)	fef968f75c	fix: Remove catalog insertion; will be handled elsewhere	2022-01-31 10:36:32 -05:00
Carol (Nichols || Goulding)	8b47ad6885	test: Add more tests	2022-01-31 10:36:32 -05:00
Carol (Nichols || Goulding)	d413157b99	feat: Extract a fn for creating the parquet file paths and test it	2022-01-31 10:36:32 -05:00
Carol (Nichols || Goulding)	5e0e0d8aa7	feat: Write parquet to object storage in a similar way as parquet_file::Storage	2022-01-31 10:36:32 -05:00
Carol (Nichols || Goulding)	ea18c71e6d	feat: Create an object store path for a new parquet file	2022-01-31 10:36:32 -05:00
Carol (Nichols || Goulding)	c633c9bc5c	feat: Wire object store into ingester persistence	2022-01-31 10:36:30 -05:00
Nga Tran	ac247e4de5	feat: update catalog after persistence (#3581) * feat: update catalog after persistence * test: add a negative test for the update catalog * chore: add IDs into the messages Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * chore: Apply suggestions from code review Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> * refactor: address review comments Co-authored-by: Carol (Nichols || Goulding) <193874+carols10cents@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-31 15:23:16 +00:00
Nga Tran	8735ede74f	feat: IoxMetadata for parquet file (#3547) * feat: IoxMetadata for parquet file * fix: typos * refactor: address review comments Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-28 14:41:59 +00:00
Nga Tran	fb33a88dc8	test: Delete application during Ingester's compaction (#3542) * test: Delete application during Ingester's compaction * fix: typos Co-authored-by: Andrew Lamb <alamb@influxdata.com> * chore: remove comments Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: Andrew Lamb <alamb@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-01-27 16:53:17 +00:00
Andrew Lamb	5488c257d1	chore: Update datafusion, upgrade to arrow/parquet/arrow-flight 8.0.0 (#3517) * chore: Update datafusion * chore: update to arrow 8 * fix: update to use new DataFusion APIs * fix: update case for sortedness * fix: cargo hakari	2022-01-27 13:33:27 +00:00
Carol (Nichols || Goulding)	bc44d33108	feat: Implement a snapshot method on DataBuffer (#3518) * feat: Implement a snapshot method on DataBuffer Fixes #3510. * test: Add a test snapshotting batches with different but compatible schemas * fix: Simplify min/max sequencer number collection The first batch should always have the min sequencer number. The last batch should always have the max sequencer number. The min should always be less than (or equal to, in case there's only one batch) the max.	2022-01-26 15:22:51 +00:00
Nga Tran	52866fe6a9	fix: merge record batches into one batch (#3535) * fix: merge record batches into one batch refactor: address review comments * chore: update test output	2022-01-25 23:29:16 +00:00
Nga Tran	d559561fd7	refactor: have the deduplicate work without chunk statistics (#3519) * refactor: have the deduplicate work without chunk statistics * test: more tests for duplicates data on different combinations of record batches * refactor: address review comments	2022-01-25 17:00:25 +00:00
NGA-TRAN	c6a195b0e6	refactor: address review comments	2022-01-24 13:05:44 -05:00
NGA-TRAN	797ba459b9	chore: merge main to branch	2022-01-24 12:06:23 -05:00
NGA-TRAN	939ea536d4	feat: add but ignore a few compaction tests	2022-01-24 12:00:23 -05:00
NGA-TRAN	ee0a468b4d	feat: a few tests for compaction	2022-01-21 18:15:23 -05:00
Paul Dix	bb893510a0	feat: Add scaffolding for ingester server * Adds a new ingester command to start an ingester server * Moves previous ingester server over to handler * Skeleton for gRPC and HTTP handlers	2022-01-21 18:02:19 -05:00
NGA-TRAN	fa41067e3d	refactor: for paul	2022-01-21 16:50:49 -05:00
NGA-TRAN	cd01b141f3	refactor: for paul	2022-01-21 16:49:02 -05:00
Paul Dix	bfa54033bd	refactor: Clean up the Catalog API This updates the catalog API to make it easier to work with for consumers. I also found a bug in the MemCatalog implementation while refactoring the tests to work with the new API definition. Consumers will now be able to Arc wrap the catalog and use it across awaits.	2022-01-21 16:01:13 -05:00
NGA-TRAN	191adc9fc7	feat: initial implementation for ingester's compaction	2022-01-20 18:22:41 -05:00
NGA-TRAN	029f4bb41e	fix: comment	2022-01-19 18:11:00 -05:00
NGA-TRAN	dcf952bb27	chore: merge main to branch	2022-01-19 17:59:05 -05:00
NGA-TRAN	4ede10b3a0	refactor: add new fields and comments in ingest data buffer	2022-01-19 17:53:58 -05:00
Paul Dix	860e5a30ca	refactor: update ingester to get sequencer record and not attempt to create	2022-01-19 17:15:10 -05:00
NGA-TRAN	be3e523312	fix: use PersistingBatch	2022-01-19 13:25:03 -05:00
NGA-TRAN	9977f174b7	refactor: use wrapper ID	2022-01-19 12:51:04 -05:00
NGA-TRAN	edb97f51cf	refactor: add persisting struct	2022-01-19 12:36:18 -05:00
NGA-TRAN	8a17e1c132	refactor: address review comments	2022-01-19 11:20:20 -05:00
NGA-TRAN	b89c250ccc	refactor: use RepoColection instead of MemCatalog	2022-01-18 21:39:22 -05:00
NGA-TRAN	b57f027e35	refactor: address review comments	2022-01-18 20:57:13 -05:00
NGA-TRAN	1c970a2064	fix: format	2022-01-18 18:01:47 -05:00
NGA-TRAN	667ec5bfc5	fix: the code now compiles without warnings	2022-01-18 18:01:06 -05:00
NGA-TRAN	b20d1757d0	feat: initialize ingester data	2022-01-18 17:43:03 -05:00
NGA-TRAN	125285ae9a	feat: commit in order to pull and merge new commit from main	2022-01-18 16:11:25 -05:00
NGA-TRAN	23290fd2ff	fix: new data structures suggested by reviewers	2022-01-18 14:04:07 -05:00
NGA-TRAN	ef336b4659	feat: add ingester crate and a few basic data structures for its data lifecycle	2022-01-17 15:38:03 -05:00
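
Commit b07f15bec7 runs the `ColumnRepo::create_or_get()` calls in parallel but caps them at three in flight so the DB connection pool is not starved. The sketch below illustrates that bounded-concurrency idea under stated assumptions: it uses OS threads and a small counting semaphore from `std` rather than the async catalog code the commit actually changed, and `Semaphore`, `resolve_columns`, and the closure standing in for `create_or_get` are hypothetical names, not IOx APIs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical helper: a minimal counting semaphore built from std primitives.
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), cv: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            free = self.cv.wait(free).unwrap();
        }
        *free -= 1;
    }

    /// Returns a permit and wakes one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

/// Hypothetical: calls `create_or_get` once per column name in parallel, with
/// at most `max_in_flight` calls running at once. Returns the IDs in input
/// order plus the peak number of concurrent calls observed.
fn resolve_columns<F>(names: &[&str], max_in_flight: usize, create_or_get: F) -> (Vec<u64>, usize)
where
    F: Fn(&str) -> u64 + Sync,
{
    let sem = Semaphore::new(max_in_flight);
    let in_flight = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);

    let ids = thread::scope(|s| {
        let handles: Vec<_> = names
            .iter()
            .map(|&name| {
                let (sem, in_flight, peak, f) = (&sem, &in_flight, &peak, &create_or_get);
                s.spawn(move || {
                    sem.acquire(); // wait for one of the limited "connections"
                    let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                    peak.fetch_max(now, Ordering::SeqCst);
                    let id = f(name);
                    in_flight.fetch_sub(1, Ordering::SeqCst);
                    sem.release(); // hand the permit to the next waiting call
                    id
                })
            })
            .collect();
        // Joining in spawn order keeps the results aligned with `names`.
        handles.into_iter().map(|h| h.join().unwrap()).collect::<Vec<u64>>()
    });

    (ids, peak.into_inner())
}

fn main() {
    let names = ["time", "host", "region", "cpu", "mem", "disk", "net"];
    // Stand-in for the catalog round trip: sleep to simulate latency.
    let (ids, peak) = resolve_columns(&names, 3, |name| {
        thread::sleep(Duration::from_millis(20));
        name.len() as u64
    });
    println!("ids = {:?}, peak in-flight = {}", ids, peak);
}
```

Here the IDs come back in input order ([4, 4, 6, 3, 3, 4, 3]), and because the in-flight counter is only raised while a permit is held, the reported peak can never exceed 3. In an async code base the same cap would more likely come from a stream combinator or an async semaphore; the thread version is only meant to make the limit visible.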


177 Commits (fabfbada6012bd5ddf6a61d301aa1d9f92f8a457)