influxdb

Commit Graph

Author	SHA1	Message	Date
Marko Mikulicic	5a0af921c8	chore: Roll forward: Sync ReadWindowAggregate API: TagKeyMetaNames (#5186 ) This reverts commit 5d02c755687ef041f5f45dbfc3e633a833284edb.	2022-07-22 10:44:06 +00:00
Marko Mikulicic	07cdb99192	chore: Revert "Sync ReadWindowAggregate API: TagKeyMetaNames" (#5184 ) We're noticing a possible regression (OOMs) in our testing cluster that roughly correlates with this.	2022-07-22 09:26:42 +00:00
Nga Tran	69cb3f2b19	refactor: remove min_sequence_number from Compactor and Querier, add `count_by_overlaps_with_level_0` and `count_by_overlaps_with_level_1` to catalog (#5151 ) * refactor: remove min_sequence_number * fix: typos * fix: remove min_sequence_number from new files from merging main * fix: add back throwing error if the compactor compacts files persisted by the ingester after the ingester sends max seq_num back to querier * test: add test_compactor_collision back but modify the input to make it work with new changes Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-21 13:51:54 +00:00
Marko Mikulicic	21d033eafd	fix: Sync ReadWindowAggregate API: TagKeyMetaNames The storage API has been updated in https://github.com/influxdata/idpe/pull/12868 in January, but since we forked the `.proto` files we never noticed.	2022-07-21 15:07:04 +02:00
dependabot[bot]	278a7f91af	chore(deps): Bump bytes from 1.1.0 to 1.2.0 (#5156 ) Bumps [bytes](https://github.com/tokio-rs/bytes) from 1.1.0 to 1.2.0. - [Release notes](https://github.com/tokio-rs/bytes/releases) - [Changelog](https://github.com/tokio-rs/bytes/blob/master/CHANGELOG.md) - [Commits](https://github.com/tokio-rs/bytes/compare/v1.1.0...v1.2.0) --- updated-dependencies: - dependency-name: bytes dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-20 10:00:08 +00:00
Marco Neumann	1993448abf	refactor: remove `Predicat::partition_key` (#5016 ) There is no way a user can filter for partition keys (neither via InfluxRPC nor via SQL) and the query engine doesn't use this field at all. So let's remove it. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-07-01 17:17:29 +00:00
Marco Neumann	be53716e4d	refactor: use IDs for `parquet_file.column_set` (#4965 ) * feat: `ColumnRepo::list_by_table_id` * refactor: use IDs for `parquet_file.column_set` Closes #4959. * refactor: introduce `TableSchema::column_id_map`	2022-06-30 15:08:41 +00:00
Dom Dwyer	75c425f375	refactor(schema-api): column data type enum Previously the column data type was exposed using an internal i32 value. This commit changes the Schema API to use a self-descriptive proto enum for the column data type.	2022-06-27 16:14:49 +01:00
Marco Neumann	c3912e34e9	refactor: store per-file column set in catalog (#4908 ) * refactor: store per-file column set in catalog Together with the table-wide schema and the partition-wide sort key, this should be everything we need to read a parquet file directly into memory without peeking any file-level metadata. The querier will use this to directly load parquet files into the read buffer. WARNING: This requires a catalog wipe! Ref #4124. * refactor: use proper `ColumnSet` type	2022-06-21 10:26:12 +00:00
Dom Dwyer	c1f7154031	feat: propagate partition key through kafka Changes the kafka message wire format to include the partition key for serialised DML writes on the wire. After this commit, the kafka messages will contain the partition key for each op, but this information will go unused in the ingester - this enables us to roll out the producer side, before making the value's presence necessary on the consumer side. A follow-up PR will change the ingester to utilise this embedded partition key. This has the unfortunate side effect of making the partition key part of the public gRPC write API: https://github.com/influxdata/influxdb_iox/issues/4866	2022-06-20 13:42:51 +01:00
Marco Neumann	66c7d95312	refactor: use new ingester<>querier wire protocol (#4867 ) * refactor: use new ingester<>querier wire protocol Use and document the new and more flexible ingester<>querier wire protocol. Note that the ingester does NOT stream the response data yet, but the internal data structures would allow that. A follow-up change will adjust the ingester code to stream the data. Ref #4849. * fix: typos Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: clarify naming and public interface * test: add schema assertion to `ingester_response_to_record_batches` Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-06-16 08:02:28 +00:00
Marco Neumann	7c60edd38c	refactor: prepare new ingester<>querier protocol on the querier side (#4863 ) * refactor: prepare new ingester<>querier protocol on the querier side This changes the querier internals to work with the new protocol. The wire protocol stays the same (for now). There's a (somewhat hackish) adapter in place on the querier side that converts the old to the new protocol on-the-fly. This is an intermediate step before we actually change the wire protocol (and in a step after that also take advantage of the new possibilities on the ingester side). Ref #4849. * docs: explain adapter	2022-06-15 14:32:24 +00:00
Nga Tran	13c57d524a	feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801 ) * feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string * test: add column with comma * fix: use new protobuf field to avoid incompatibility * fix: ensure sort_key is an empty array rather than NULL * refactor: address review comments * refactor: address more comments * chore: clearer comments * chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql * chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql * fix: Rename migration so it will be applied after Co-authored-by: Marko Mikulicic <mkm@influxdata.com>	2022-06-10 13:31:31 +00:00
Andrew Lamb	2ec7764fdd	refactor: rename builder like predicate methods to be `with_` (#4808 ) * refactor: rename builder like predicate methods to be `with_` * fix: merge conflict Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-09 11:26:03 +00:00
Andrew Lamb	afc1c12062	refactor: consolidate `PredicateBuilder` into `Predicate` (#4799 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-08 12:21:24 +00:00
Andrew Lamb	dde3c3922c	refactor: use consistent spelling of serialize (#4717 )	2022-05-27 14:42:59 +00:00
Dom	9cd1286051	Merge branch 'main' into dom/meta-remove-row-count	2022-05-23 16:39:38 +01:00
Marco Neumann	2029bd16ba	feat: enable debugging of failed querier->ingester requests (#4659 ) * feat: enable debugging of failed querier->ingester requests - extend `query-ingester` CLI to allow usage of predicates - on failed requests: log all information that required for the CLI - test the "ingester fails" scenario * test: explain Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * docs: improve Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: move b64 pred. serde into a single crate Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-05-23 15:37:31 +00:00
Dom Dwyer	2e6c49be83	refactor: remove IoxMetadata min & max timestamp Removes the min/max timestamp fields from the IoxMetadata proto structure embedded within a Parquet file's metadata. These values are redundant as they already exist within the Parquet column statistics, and precluded streaming serialisation as these removed min/max values were needed before serialising the file.	2022-05-23 16:27:08 +01:00
Dom Dwyer	a142a9eb57	refactor: remove row_count from IoxMetadata Remove the redundant row_count from the IoxMetadata structure that is serialised into the Parquet file. The reasoning is twofold: * The Parquet file's native metadata already contains a row count * Needing to know the number of rows up-front precludes streaming	2022-05-23 16:18:35 +01:00
Carol (Nichols || Goulding)	2ee4a6669a	refactor: Move the code merging write infos to generated_types to share	2022-05-11 14:07:42 -04:00
Carol (Nichols || Goulding)	26170b7a07	refactor: Move gRPC conversion code to generated_types to share	2022-05-11 14:07:12 -04:00
Andrew Lamb	84fd883688	feat: Add query_ingester CLI command (#4554 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-10 18:18:07 +00:00
Jake Goulding	e07bcd40c2	refactor: Remove unused dependencies These were found by iterating over all of the dependencies of each Cargo.toml, then grepping that crate for the dependency's name. If it didn't show up, I attempted to remove it. I left a few dependencies that this process flagged: * generated_types - `pbjson`,`serde`. Apparently used by the generated code. * grpc-router-test-gen - `prost`. Apparently used by the generated code. * influxdb_iox - `heappy`. Doesn't appear used, but is behind enough feature flags that I don't care to reason about and it's already optional. - `tikv_jemalloc_sys`. Appears to be setting a feature flag of an indirect dependency. * iox_gitops_adapter - `k8s_openapi`. Appears to be setting a feature flag of an indirect dependency.	2022-05-06 15:57:58 -04:00
Carol (Nichols || Goulding)	6681298a93	fix: Remove unused dependencies found with cargo-udeps	2022-05-06 14:51:54 -04:00
Carol (Nichols || Goulding)	068096e7e1	fix: Rename data_types2 to data_types	2022-05-06 14:45:39 -04:00
Carol (Nichols || Goulding)	fb8f8d22c0	fix: Remove now-unused ServerId. Fixes #4451	2022-05-06 14:45:38 -04:00
Carol (Nichols || Goulding)	485d6edb8f	refactor: Move IngesterQueryRequest to generated_types	2022-05-06 14:45:37 -04:00
Carol (Nichols || Goulding)	e9a42c418a	fix: Only use data_types2 in generated_types	2022-05-06 14:45:36 -04:00
Carol (Nichols || Goulding)	91961273c2	fix: Remove unused Rust code in generated_types	2022-05-06 11:50:03 -04:00
Carol (Nichols || Goulding)	e6e0655b31	fix: Remove OG proto definitions Fixes #4475.	2022-05-06 11:50:03 -04:00
Carol (Nichols || Goulding)	b422eac064	fix: Add go_package to protos (#4505 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-05-04 15:20:09 +00:00
Andrew Lamb	6381ea60bb	chore: port remaining read_filter influxrpc tests to NG (#4383 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-29 14:06:50 +00:00
Paul Dix	8e48fcd620	feat: add remote pull partition (#4433 ) Add lookup of partitions by table id to catalog. Add API to catalog to return partitions by table id. Add to client to return partitions by table id. Add CLI to pull remote schema, partition, and parquet files into a local catalog and object store.	2022-04-28 21:04:27 +00:00
Andrew Lamb	e13d3433ae	feat: Use datafusion serialization code rather than our own copy of it (#4421 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-28 13:03:34 +00:00
Andrew Lamb	115f007317	refactor: Use DataFusion `Expr` instead of our own custom wrapper for `ValueExpr` (#4440 ) * refactor: Use DataFusion `Expr` instead of custom wrapper for BinaryExprs * fix: apply code review suggestions * fix: more code review suggestions	2022-04-27 19:20:15 +00:00
二手掉包工程师	4b47d723b1	refactor: Rename time to iox_time (#4416 ) Signed-off-by: hi-rustin <rustin.liu@gmail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-26 00:19:59 +00:00
Marco Neumann	86e8f05ed1	fix: make all catalog IDs 64bit (#4418 ) Closes #4365. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-25 16:49:34 +00:00
Andrew Lamb	73bed810da	chore: Update arrow, arrow-flight, parquet, tonic, prost, etc (#4357 ) * chore: Update datafusion * chore: Update arrow/arrow-flight/parquet to 12 * chore: update datafusion correctly * chore: Update prost, tonic, and dependents * fix: Fixup some api changes * fix: Update test output in db * fix: Update test output in parquet_file * fix: remove old pbjson types * fix: Add "--experimental_allow_proto3_optional" flag * chore: Run cargo hakari tasks * fix: compile error * chore: Update heappy Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-20 11:12:17 +00:00
Andrew Lamb	0642ec0b82	docs: add note about write_info API being internal (#4356 ) * docs: add note about write_info API being internal * fix: update doc urls Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-20 09:25:14 +00:00
Andrew Lamb	5ea676d3f7	feat: add per kafka partition durability reporting to write info response (#4341 ) * feat: add per kafka partition durability reporting to write info response * fix: buf lint + test cleanup * fix: clean up protobuf * refactor: pull out conversion of KafkaPartitionStatus into a function * fix: fmt * fix: typo Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-19 16:46:20 +00:00
Andrew Lamb	e3d83fe757	chore: update datafusion (#4342 ) * chore: update datafusion * fix: Update imports for change in datafusion organization	2022-04-19 13:38:12 +00:00
Marco Neumann	5b48675435	fix: actually transmit record-batch metadata from querier (#4347 ) Attaching the "batch => partition" mapping via per-batch schema KV metadata does NOT work because flight will transmit the schema once for all batches (even though on the Rust side we have a schema ref attached to every batch, probably for convenience). Instead we now use the same global protobuf metadata that we also use for the "partition => max sequence number" information. This somewhat limits our ability to create record batches lazily on the ingester side (since the global metadata is sent before any actual payload) but I think we should not modify the usage of the flight protocol too much right now (e.g. by sending more schema messages). If this becomes an issue, we can always find a more complex solution in the future.	2022-04-19 10:54:23 +00:00
Paul Dix	5bf4550259	feat: add object store service to router (#4338 ) Add method to catalog to get parquet file by object store id. Add gRPC service for object store to get a file from by its uuid. Add the object store service to router2 with object store config.	2022-04-16 17:58:31 +00:00
Paul Dix	99cbb28a89	feat: add initial catalog service to router (#4316 ) Create new crate for iox_catalog_service. Add rpc to return parquet_file records by partition id. Add CatalogService to router2. The catalog service will be added to over time to provide access to the catalog over gRPC.	2022-04-14 17:39:18 +00:00
Marco Neumann	83f77712b1	refactor: querier<>ingester flight protocol adjustments (#4286 ) * refactor: querier<>ingester flight protocol adjustments This makes a few adjustments to the querier<>ingester flight protocol. Query Scope =========== The querier will request data for ALL sequencer IDs for now. There is no reason to have a request per sequencer ID. We can add a range/set filter later if we want, but this is not required for now. Partition-level =============== The only time when the querier cares about sequencer IDs (i.e. sharding) at all is when it selects which ingesters to ask for unpersisted data (this is currently not implemented, it just asks all ingesters). Afterwards the querier only cares about partitions (which are bound to specific sequencers anyways) because this is the level where parquet file persistence and compaction as well as deduplication happen. So we make partitions a first-class citizen in the ingester response. Metadata VS RecordBatches ========================= The global app-metadata will list all partitions and their max persisted parquet files and tombstones (theoretically tombstones are at table-level, but the ingester could in the future break them down to the partition-level). Then it receives a stream of record batches. Each record batch is tagged (via key-value metadata in its schema) so it can be assigned to a partition. At the moment the ingester returns 0 or 1 batches per unpersisted partition (0 in case we've filtered out all the data via the predicate), but in the future it is free to return multiple batches. This setup gives the ingester more freedom over memory management and (potentially parallel) query processing, while at the same time keeps the set of duplicated information minimal and allows easy extensions (since the global metadata is a full-blown protobuf message). Querier ======= At the moment the querier ignores all the metadata. Follow-up PRs will change that. * docs: improve Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * refactor: make code clearer Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-04-12 16:48:40 +00:00
Marco Neumann	380cd9bbff	refactor: use a single flight client implementation (#4273 ) "end-user -> querier" and "querier -> ingester" should use a single Flight client implementation. The difference is just the request and response metadata. This changes our default Flight client to use protobuf instead of JSON for the ticket format.	2022-04-12 09:08:25 +00:00
Andrew Lamb	a30a85e62c	feat: Add get_write_info service (#4227 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-07 19:24:58 +00:00
Andrew Lamb	5d66cd0a81	feat: Add WriteSummary serialization and deserialization to protobuf (#4232 ) * feat: Add WriteSummary serialization and deserialization to protobuf * fix: clippy Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-05 09:57:32 +00:00
dependabot[bot]	276449ee09	chore(deps): Bump pbjson from 0.2.3 to 0.3.0 (#4215 ) Bumps [pbjson](https://github.com/influxdata/pbjson) from 0.2.3 to 0.3.0. - [Release notes](https://github.com/influxdata/pbjson/releases) - [Commits](https://github.com/influxdata/pbjson/compare/0.2.3...0.3.0) --- updated-dependencies: - dependency-name: pbjson dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-04-04 12:05:46 +00:00


449 Commits (34ccc9c7f57ea40eee94f17280fa27ddbbd80770)