influxdb

Commit Graph

Author	SHA1	Message	Date
Stuart Carnie	377e108152	chore: No need to be exported from current module	2023-05-26 12:12:10 +10:00
kodiakhq[bot]	928731767e	Merge pull request #7826 from influxdata/cn/table-create-grpc-api feat: table creation gRPC API with optional custom table template	2023-05-25 18:59:00 +00:00
kodiakhq[bot]	c4eca5fecf	Merge branch 'main' into cn/table-create-grpc-api	2023-05-25 18:53:12 +00:00
Carol (Nichols || Goulding)	27e700f54c	docs: Flag race condition possibility as a known issue	2023-05-25 14:15:18 -04:00
Carol (Nichols || Goulding)	c2e19b3826	docs: Mention tag column creation in the table creation service description Co-authored-by: Dom <dom@itsallbroken.com>	2023-05-25 14:02:37 -04:00
Andrew Lamb	d68a399a7b	fix: fix span name (#7868 ) Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-25 17:40:43 +00:00
Andrew Lamb	138b14e0db	chore: Update DataFusion and arrow to `40.0.0` (#7864 ) * chore: Update DataFusion and arrow to `40.0.0` * chore: Run cargo hakari tasks * fix: update for API --------- Co-authored-by: CircleCI[bot] <circleci@influxdata.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-25 17:34:51 +00:00
Carol (Nichols || Goulding)	5d457212d4	feat: Log the table's partition template on successful creation	2023-05-25 13:19:33 -04:00
Carol (Nichols || Goulding)	7662b12dd9	fix: Return 'invalid argument' error if a column already exists and it's not a tag No test because I think this is only possible with a race condition.	2023-05-25 13:19:33 -04:00
Carol (Nichols || Goulding)	c3117e7eb8	fix: Return 'already exists' errors from namespace and table gRPC APIs When appropriate, rather than internal errors.	2023-05-25 13:19:33 -04:00
Carol (Nichols || Goulding)	de243ad823	test: Verify default template usage	2023-05-25 10:55:51 -04:00
Carol (Nichols || Goulding)	fe07e34714	test: Add router tests that set templates and verify writes	2023-05-25 10:44:57 -04:00
Carol (Nichols || Goulding)	17219d71fe	feat: Use the table service in the router	2023-05-25 10:44:57 -04:00
Carol (Nichols || Goulding)	e1a93252c5	feat: Add a new table service crate	2023-05-25 10:44:57 -04:00
Carol (Nichols || Goulding)	32195748a3	feat: Add proto definitions for a table create gRPC API	2023-05-25 10:44:57 -04:00
Andrew Lamb	cdd519424d	feat(cli): Automatically send influx-trace-id, and improve help text (#7830 ) * feat(cli): Automatically send influx-trace-id, and improve tracing CLI help * fix: remove random ra * fix: clippy * fix: Update influxdb_iox/src/main.rs Co-authored-by: Chunchun Ye <14298407+appletreeisyellow@users.noreply.github.com> * refactor: Use Vec<String> * fix: clippy --------- Co-authored-by: Chunchun Ye <14298407+appletreeisyellow@users.noreply.github.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-25 09:43:55 +00:00
dependabot[bot]	89d8207784	chore(deps): Bump io-lifetimes from 1.0.10 to 1.0.11 (#7865 ) Bumps [io-lifetimes](https://github.com/sunfishcode/io-lifetimes) from 1.0.10 to 1.0.11. - [Commits](https://github.com/sunfishcode/io-lifetimes/compare/v1.0.10...v1.0.11) --- updated-dependencies: - dependency-name: io-lifetimes dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-05-25 07:49:53 +00:00
Dom	deb0c52fed	Merge pull request #7862 from influxdata/dom/remove-catalog-metric-accessor refactor(catalog): mark metrics() as test only	2023-05-24 16:48:47 +01:00
Dom Dwyer	2094b45c10	refactor(catalog): mark metrics() as test only This method is used to enable tests - it's never intended to be used in production code to access the underlying metric registry. The Catalog trait is responsible for Catalog things, not acting as a dependency injection for metrics. The only current use of this is in test code, so no changes needed.	2023-05-24 17:38:10 +02:00
kodiakhq[bot]	2eebf2a0a7	Merge pull request #7790 from influxdata/cn/store feat: Set, store, and use custom namespace and table partitions on write	2023-05-24 14:43:10 +00:00
Carol (Nichols || Goulding)	d91b75526f	fix: Clarify that the expect is on the Option, not the Result	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	e67e336a88	docs: Explain why the partition template types are implemented the way they are	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	efc817c2a8	fix: Remove From impl, leaving TablePartitionTemplateOverride::new as only creation mechanism This makes it clearer that you do or do not have a custom table override (in the first argument to `new`).	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	46f7e3e48a	fix: Handle potential for data race in catalog table insertion by re-fetching if detected	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	90cb4b6ed9	refactor: Extract a function for handling a table missing from the namespace cache	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	73b09d895f	feat: Store and handle NULL partition_template database values Treat them as the default partition template in the application, but save space and avoid having to backfill the tables by having the database values be NULL when no custom template has been specified.	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	c8712bbc90	fix: Add a fixture test encoding and documenting default partition template assumptions	2023-05-24 10:36:52 -04:00
Carol (Nichols || Goulding)	fb53faaa2f	refactor: Only use Partitioner::default and derive it	2023-05-24 10:34:31 -04:00
Carol (Nichols || Goulding)	aab0acc16a	fix: Panic if attempting to partition on a non-tag column	2023-05-24 10:34:31 -04:00
Carol (Nichols || Goulding)	42804a20bc	fix: Switch to using Sqlite when encoding so there's no extra 1 in the JSON	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	d713ba935a	refactor: Reduce duplication of encode/decode implementations This is much less gobbledygook.	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	c479ed184d	refactor: Rearrange definitions in the partition_template module Move the application types to the top, which puts all the sqlx conversion gobbledygook at the end because it's an internal implementation detail I'm about to refactor Git probably isn't going to display this in a super obvious way, but this commit is only moving code around, not changing any of it	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	a22d809cdf	test: Create an overridden namespace, and create a table from it (no override), read it back and assert the expected partitioning scheme is derived	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	2ab3ea03b8	test: Create a default (not overridden) namespace, read it back, assert the expected partitioning scheme is derived	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	9c0faa66f0	feat: Set a table partition template explicitly or from the namespace And use the table partition template when partitioning writes to that table.	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	604bab9508	fix: Make Table create_or_get be only create	2023-05-24 10:34:30 -04:00
Carol (Nichols || Goulding)	afb3838437	feat: Optionally supply the namespace partition template when creating a namespace	2023-05-24 10:10:34 -04:00
Carol (Nichols || Goulding)	47157015d9	feat: Add columns to store the partition templates	2023-05-24 10:10:34 -04:00
Carol (Nichols || Goulding)	6f92bccc99	feat: Use protobuf for PartitionTemplate in CreateNamespace gRPC API The service implementation doesn't use this field yet.	2023-05-24 10:10:34 -04:00
Marco Neumann	29dccdc61a	Merge pull request #7859 from influxdata/crepererum/clean_up_parquet_indices refactor: remove ununused `parquet_file` indices	2023-05-24 13:51:28 +02:00
Marco Neumann	b71564f455	refactor: remove ununused `parquet_file` indices

Remove unused Postgres indices. This lower database load but also gives us room to install actually useful indices (see #7842).

To detect which indices are used, I've used the following query (on the actual write/master replicate in eu-central-1):

```sql
SELECT
    n.nspname AS namespace_name,
    t.relname AS table_name,
    pg_size_pretty(pg_relation_size(t.oid)) AS table_size,
    t.reltuples::bigint AS num_rows,
    psai.indexrelname AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END AS "unique",
    psai.idx_scan AS number_of_scans,
    psai.idx_tup_read AS tuples_read,
    psai.idx_tup_fetch AS tuples_fetched
FROM
    pg_index i
    INNER JOIN pg_class t ON t.oid = i.indrelid
    INNER JOIN pg_namespace n ON n.oid = t.relnamespace
    INNER JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
    n.nspname = 'iox_catalog'
    AND t.relname = 'parquet_file'
ORDER BY 1, 2, 5;
```

At `2023-05-23T16:00:00Z`:

```text
 namespace_name | table_name   | table_size | num_rows  | index_name                                       | index_size | unique | number_of_scans | tuples_read    | tuples_fetched
----------------+--------------+------------+-----------+--------------------------------------------------+------------+--------+-----------------+----------------+----------------
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_deleted_at_idx                      | 5398 MB    | N      | 1693383413      | 21036174283392 | 21336337964
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_partition_created_idx               | 11 GB      | N      | 34190874        | 4749070532     | 61934212
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_partition_idx                       | 2032 MB    | N      | 1612961601      | 9935669905489  | 8611676799872
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_pkey                                | 7135 MB    | Y      | 453927041       | 454181262      | 453894565
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_shard_compaction_delete_created_idx | 14 GB      | N      | 0               | 0              | 0
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_shard_compaction_delete_idx         | 8767 MB    | N      | 2               | 30717          | 4860
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_file_table_idx                           | 1602 MB    | N      | 9136844         | 341839537275   | 27551
 iox_catalog    | parquet_file | 31 GB      | 120985000 | parquet_location_unique                          | 4989 MB    | Y      | 332341872       | 3123           | 3123
```

At `2023-05-24T09:50:00Z` (i.e. nearly 18h later):

```text
 namespace_name | table_name   | table_size | num_rows  | index_name                                       | index_size | unique | number_of_scans | tuples_read    | tuples_fetched
----------------+--------------+------------+-----------+--------------------------------------------------+------------+--------+-----------------+----------------+----------------
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_deleted_at_idx                      | 5448 MB    | N      | 1693485804      | 21409285169862 | 21364369704
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_partition_created_idx               | 11 GB      | N      | 34190874        | 4749070532     | 61934212
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_partition_idx                       | 2044 MB    | N      | 1615214409      | 10159380553599 | 8811036969123
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_pkey                                | 7189 MB    | Y      | 455128165       | 455382386      | 455095624
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_shard_compaction_delete_created_idx | 14 GB      | N      | 0               | 0              | 0
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_shard_compaction_delete_idx         | 8849 MB    | N      | 2               | 30717          | 4860
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_file_table_idx                           | 1618 MB    | N      | 9239071         | 348304417343   | 27551
 iox_catalog    | parquet_file | 31 GB      | 123869328 | parquet_location_unique                          | 5043 MB    | Y      | 343484617       | 3123           | 3123
```

The cluster currently is under load and all components are running.

Conclusion:

- `parquet_file_deleted_at_idx`: Used, likely by the GC. We could probably shrink this index by binning `deleted_at` (within the index, not within the actual database table), but let's do this in a later PR.
- `parquet_file_partition_created_idx`: Unused and huge (`created_at` is NOT binned). So let's remove it.
- `parquet_file_partition_idx`: Used, likely by the compactor and querier because we currently don't have a better index (see #7842 as well). This includes deleted files as well which is somewhat pointless. May become obsolete after #7842, not touching for now.
- `parquet_file_pkey`: Primary key. We should probably use the object store UUID as a primary key BTW, which would also make the GC faster. Not touching for now.
- `parquet_file_shard_compaction_delete_created_idx`: Huge unused index. Shards don't exist anymore. Delete it.
- `parquet_file_shard_compaction_delete_idx`: Same as `parquet_file_shard_compaction_delete_created_idx`.
- `parquet_file_table_idx`: Used but is somewhat too large because it contains deleted files. Might become obsolete after #7842, don't touch for now.
- `parquet_location_unique`: See note `parquet_file_pkey`, it's pointless to have two IDs here. Not touching for now but this is a potential future improvement.

So we remove:

- `parquet_file_partition_created_idx`
- `parquet_file_shard_compaction_delete_created_idx`
- `parquet_file_shard_compaction_delete_idx`	2023-05-24 12:10:22 +02:00
Marco Neumann	bc18c6dc5f	refactor: re-land #7815 . (#7852 ) * refactor: consolidate pruning code Let's have a single chunk pruning implementation in our code, not two. Also removes a bit of crust from `QueryChunk` since it is technically no longer responsible for pruning (this part has been pushed into the querier for early pruning and bits for the `iox_query_influxrpc` for some RPC shenanigans). * test: regression test for incident * fix: chunk pruning * docs: add some test notes	2023-05-24 09:46:49 +00:00
dependabot[bot]	24a4f36d24	chore(deps): Bump proptest from 1.1.0 to 1.2.0 (#7857 ) Bumps [proptest](https://github.com/proptest-rs/proptest) from 1.1.0 to 1.2.0. - [Release notes](https://github.com/proptest-rs/proptest/releases) - [Changelog](https://github.com/proptest-rs/proptest/blob/master/CHANGELOG.md) - [Commits](https://github.com/proptest-rs/proptest/compare/v1.1.0...v1.2.0) --- updated-dependencies: - dependency-name: proptest dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Dom <dom@itsallbroken.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-24 09:21:32 +00:00
Marco Neumann	103e814f22	refactor: clean up catalog `parquet_files` interface (#7853 ) * feat: `ParquetFileRepo::list_all` * refactor: remove `ParquetFileRepo::list_by_table` * refactor: simlify `ParquetFileRepo::list_by_table` * refactor: remove `ParquetFileRepo::count` * refactor: remove `ParquetFileRepo::update_compaction_level` * refactor: remove `ParquetFileRepo::exists` * fix: test --------- Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2023-05-24 09:15:03 +00:00
dependabot[bot]	b7fbfa6fb2	chore(deps): Bump criterion from 0.4.0 to 0.5.0 (#7856 ) Bumps [criterion](https://github.com/bheisler/criterion.rs) from 0.4.0 to 0.5.0. - [Changelog](https://github.com/bheisler/criterion.rs/blob/master/CHANGELOG.md) - [Commits](https://github.com/bheisler/criterion.rs/compare/0.4.0...0.5.0) --- updated-dependencies: - dependency-name: criterion dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-05-24 09:08:37 +00:00
Marco Neumann	6729b5681a	fix(ingester): re-transmit schema over flight if it changes (#7812 ) * fix(ingester): re-transmit schema over flight if it changes Fixes https://github.com/influxdata/idpe/issues/17408 . So a `[Sendable]RecordBatchStream` contains `RecordBatch`es of the SAME schema. When the ingester crafts a response for a specific partition, this is also almost always the case however when there's a persist job running (I think) it may have multiple snapshots for a partition. These snapshots may have different schemas (since the ingester only creates columns if the contain any data). Now the current implementation munches all these snapshots into a single stream, and hands them over to arrow flight which has a high-perf encode routine (i.e. it does not re-check every single schema) so it sends the schema once and then sends the data for every batch (the data only, schema data is NOT repeated). On the receiver side (= querier) we decode that data and get confused why on earth some batches have a different column count compared to the schema. For the OG ingester I carefully crafted the response to ensure that we do not run into this problem, but apparently a number of rewrites and refactors broke that. So here is the fix: - remove the stream that isn't really as stream (and cannot error) - for each partition go over the `RecordBatch`es and chunk them according to the schema (because this check is likely cheaper than re-transmitting the schema for every `RecordBatch`) - adjust a bunch of testing code to cope with this * refactor: nicify code * test: adjust test	2023-05-23 14:27:11 +00:00
kodiakhq[bot]	43078576b8	Merge pull request #7839 from influxdata/dom/cleanup-deps build: remove unused dependencies from loads of crates	2023-05-23 13:01:47 +00:00
Dom Dwyer	94203287f0	test: fix line number test This test failed because it references line numbers that changed.	2023-05-23 14:55:44 +02:00
Dom Dwyer	e61fb3a78c	test: remove line numbers from asserts I don't think the tests are that specific that they need to assert the line.	2023-05-23 14:55:43 +02:00
Dom Dwyer	928a4d163e	build: remove unused dependencies from crates This commit fixes loads of crates (47!) had unused dependencies, or mis-configured dependencies (test deps as normal deps). I added the "unused_crate_dependencies" to all crates to help prevent this mess from growing again! https://doc.rust-lang.org/beta/nightly-rustc/rustc_lint_defs/builtin/static.UNUSED_CRATE_DEPENDENCIES.html This has the minor downside of false-positives when specifying dev-dependencies for test/bench binaries - these are files in /test or /benches (not normal tests). This commit includes a workaround, importing them in lib.rs (gated by a feature flag). I think the trade-off of better dependency management is worth it!	2023-05-23 14:55:43 +02:00
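
Commit 6729b5681a above describes its fix as going over each partition's `RecordBatch`es and chunking them according to schema, so that the schema is re-sent only when it changes. The following is a minimal sketch of that grouping idea, not the ingester's actual code. `Schema`, `Batch`, `batch`, and `chunk_by_schema` are hypothetical stand-ins invented for illustration; the real implementation works on Arrow types.

```rust
/// Hypothetical stand-in for an Arrow schema: an ordered list of column names.
type Schema = Vec<String>;

/// Hypothetical stand-in for a `RecordBatch`: a schema plus a row count.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    schema: Schema,
    rows: usize,
}

/// Convenience constructor used only for this illustration.
fn batch(columns: &[&str], rows: usize) -> Batch {
    Batch {
        schema: columns.iter().map(|c| c.to_string()).collect(),
        rows,
    }
}

/// Split `batches` into runs of consecutive batches that share a schema.
///
/// Each returned run could then be sent as its own stream, with the schema
/// transmitted once at the start of the run. The original batch order is
/// preserved.
fn chunk_by_schema(batches: Vec<Batch>) -> Vec<(Schema, Vec<Batch>)> {
    let mut chunks: Vec<(Schema, Vec<Batch>)> = Vec::new();
    for b in batches {
        // Compare against the schema of the run currently being built.
        let same_schema = chunks
            .last()
            .map_or(false, |(schema, _)| *schema == b.schema);
        if same_schema {
            // Safe: `same_schema` is only true when a last run exists.
            chunks.last_mut().unwrap().1.push(b);
        } else {
            // Schema changed (or this is the first batch): start a new run.
            chunks.push((b.schema.clone(), vec![b]));
        }
    }
    chunks
}
```

Comparing one schema per batch is the trade-off the commit message names: the check is likely cheaper than re-transmitting the schema with every batch. Note that this sketch only merges consecutive batches, so a schema that reappears later starts a new run instead of joining an earlier one.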

12423 Commits (377e10815230c604f84020696cc05e8bf890af9a), All Branches