influxdb


Author	SHA1	Message	Date
Carol (Nichols || Goulding)	e875a92cf8	feat: Log time spent requesting ingester partitions (#4806) * feat: Log time spent requesting ingester partitions Fixes #4558. * feat: Record a metric for the duration queriers wait on ingesters * fix: Use DurationHistogram instead of U64 Histogram * test: Add a test for the ingester ms metric * feat: Add back the logging to provide both logging and metrics for ingester duration * refactor: Use sample_count method on metrics * feat: Record ingester duration separately for success or failure * fix: Create a separate test for the ingester metrics Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-06-14 17:58:19 +00:00
Dom Dwyer	b41ea1d718	refactor: PartitionKey type This commit changes the code base to use a new reference-counted PartitionKey type wrapper, instead of passing a bare String around. This allows the compiler to type check & verify usage of the partition key, instead of passing a bare string around. By reference counting the underlying string, we reduce memory usage for some use cases.	2022-06-14 14:47:56 +01:00
kodiakhq[bot]	dd8d44e24f	Merge branch 'main' into cn/duration	2022-06-10 14:23:09 +00:00
Nga Tran	13c57d524a	feat: Change data type of catalog partition's sort_key from a string to an array of string (#4801) * feat: Change data type of catalog Postgres partition's sort_key from a string to an array of string * test: add column with comma * fix: use new protobuf field to avoid incompatibility * fix: ensure sort_key is an empty array rather than NULL * refactor: address review comments * refactor: address more comments * chore: clearer comments * chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql * chore: Update iox_catalog/migrations/20220607102200_change_sort_key_type_to_array.sql * fix: Rename migration so it will be applied after Co-authored-by: Marko Mikulicic <mkm@influxdata.com>	2022-06-10 13:31:31 +00:00
Andrew Lamb	50697906b1	refactor: Make `DMLWrite::sequence_number` a `SequenceNumber` (#4817)	2022-06-09 19:36:37 +00:00
Carol (Nichols || Goulding)	1c7cbaf5ae	refactor: Use DurationHistogram in more places	2022-06-09 14:20:51 -04:00
Carol (Nichols || Goulding)	068096e7e1	fix: Rename data_types2 to data_types	2022-05-06 14:45:39 -04:00
Andrew Lamb	7c7d3fafe9	Merge branch 'main' into dom/schema-cache-warm	2022-04-29 09:11:53 -04:00
Paul Dix	8e48fcd620	feat: add remote pull partition (#4433) Add lookup of partitions by table id to catalog. Add API to catalog to return partitions by table id. Add to client to return partitions by table id. Add CLI to pull remote schema, partition, and parquet files into a local catalog and object store.	2022-04-28 21:04:27 +00:00
Dom Dwyer	bb8a19b571	feat(iox_catalog): list_schemas() Adds a function to resolve an atomic snapshot of all NamespaceSchema in the catalog with minimal query overhead.	2022-04-27 17:23:28 +01:00
Dom Dwyer	874521da8a	feat(iox_catalog): ColumnRepo::list() Allow all columns in the catalog to be fetched.	2022-04-27 17:21:00 +01:00
Dom Dwyer	eb5abce99e	feat(iox_catalog): TableRepo::list() Allow all tables in the catalog to be fetched.	2022-04-27 17:20:53 +01:00
二手掉包工程师	4b47d723b1	refactor: Rename time to iox_time (#4416) Signed-off-by: hi-rustin <rustin.liu@gmail.com> Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-26 00:19:59 +00:00
Marco Neumann	86e8f05ed1	fix: make all catalog IDs 64bit (#4418) Closes #4365. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-25 16:49:34 +00:00
Dom Dwyer	320f1073e0	fix: revert column service limits (#4179) This reverts commit `ea865b63f4`.	2022-04-19 16:08:56 +01:00
Paul Dix	5bf4550259	feat: add object store service to router (#4338) Add method to catalog to get parquet file by object store id. Add gRPC service for object store to get a file by its uuid. Add the object store service to router2 with object store config.	2022-04-16 17:58:31 +00:00
Carol (Nichols || Goulding)	94dcde4996	fix: Do fewer queries for metadata By adding another _with_metadata catalog function. Also introduce a new type rather than passing around tuples everywhere.	2022-04-13 10:43:20 -04:00
Carol (Nichols || Goulding)	02fee3b84f	feat: Request parquet metadata from the catalog when needed only	2022-04-13 10:43:19 -04:00
Carol (Nichols || Goulding)	ec25620b73	feat: Add a catalog method for requesting a parquet file's metadata	2022-04-13 10:43:19 -04:00
Carol (Nichols || Goulding)	ee56ebf0e3	feat: Store metadata in catalog, but don't fetch by default	2022-04-13 10:43:19 -04:00
Paul Dix	81d41f81a1	fix: ingester replay logic (#4212) Fix the ingester to track the max persisted sequence number per partition. Ensure replay takes in data from unpersisted partitions. Simplify the table persist info to not return a max persisted sequence number for the table as that information isn't needed.	2022-04-04 18:04:34 +00:00
Carol (Nichols || Goulding)	cbf7888435	feat: Add Partition update_sort_key method to catalog	2022-04-01 15:45:51 -04:00
Luke Bond	ea865b63f4	fix: create_or_get_multi for column in catalog now enforces limits (#4179) * fix: create_or_get_multi for column in catalog now enforces limits fix: create_or_get_multi for column in catalog now enforces limits chore: reorder catalog column create fns to be next to each other test: add failing test for multi col insert w/ limits test: bend catalog mem impl to match postgres for tests fix: postgres column insert many column type error checks chore: clippy * test: assert column counts in partial column insert test * chore: add some sql comments to the monster multicolumn insert query; s/RIGHT/INNER/ join * chore: adding comments to clarify partial failure behaviour of multi col insert * test: add tests for create_or_get_many columns in catalog * test: forgot how macros work for a moment * test: service limit test handles partial update of cols Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-04-01 10:59:43 +00:00
Nga Tran	ddc2c8304f	fix: have the compaction level set correctly (#4184) * fix: have the compaction level set correctly, especially for compacted file from the compactor * fix: typo	2022-03-30 21:23:40 +00:00
Paul Dix	04d961e70d	feat: wire up compactor scheduler and config (#4139) Add configuration options for compactor for the max size of level 0 files and split percentage. Add metrics for compaction to track the number of candidates, compactions, and durations. Add functions to separate identifying partitions to compact from running compaction. Make compaction run in smaller chunks, specifically per partition. Update compaction to automatically promote level 0 files that are non-overlapping without waiting some period of time. Closes #4120 Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-30 17:45:24 +00:00
Carol (Nichols || Goulding)	79447aed33	fix: Logical merge conflict, missing namespace_id in test setup	2022-03-29 08:28:51 -04:00
Carol (Nichols || Goulding)	f3f792fd08	feat: Add namespace_id to the parquet_files table; object store paths need it	2022-03-29 08:15:26 -04:00
Carol (Nichols || Goulding)	39a1d1b26f	feat: Delete parquet files marked to be deleted before a specified time Connects to #3954.	2022-03-29 08:13:06 -04:00
Nga Tran	80b7e9cce1	feat: delete fully processed tombstones & integration tests for find_and_compact (#4116) * feat: remove fully processed tombstones * test: first few tests * fix: delete SQL * fix: test how IN (...) works in PG * fix: test how IN (?) works in PG * fix: test how IN (?) works in PG * fix: dynamically add IN (?, ?, ...) * fix: dynamically add IN (?, ?, ...) & its dynamic values * fix: add argument directly in the SQL * test: more tests for catalog read and update functions * chore: move a subfunction to make it easier to read * test: first test for find_can_compact but disabled due to bug * test: integration tests and a bug fix for find_and_compact * chore: cleanup * refactor: address review comments * fix: put 2 delete processed tombstones and tombstones in a transaction	2022-03-28 18:35:54 +00:00
Dom Dwyer	8e85846db6	refactor: lowercase error messages Lowercases the error messages in the big iox_catalog Error enum for better composition of messages (no random capitalisation in glued-together strings, which is common with wrapped errors).	2022-03-25 11:33:27 +00:00
Carol (Nichols || Goulding)	67e13a7c34	fix: Change to_delete column on parquet_files to be a time (#4117) Set to_delete to the time the file was marked as deleted rather than true. Fixes #4059. Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-23 18:47:27 +00:00
Carol (Nichols || Goulding)	2749c37d02	fix: Query for tombstones in a time range, not for a particular parquet file The compactor at this point is still querying for each file; this is an intermediate step	2022-03-23 09:52:00 -04:00
Carol (Nichols || Goulding)	87dc2981f6	feat: Query for tombstones relevant to a parquet file Connects to #3948.	2022-03-23 09:52:00 -04:00
Marco Neumann	55643945a1	refactor: `querier` w/o `db` (#4063) * feat: `TombstoneRepo::list_by_table` * feat: `ParquetFileRepo::list_by_table_not_to_delete` * refactor: `querier` w/o `db` Get the `querier` to work w/o relying on `db`. A few notes: - Testing is kinda shallow, we really need to get `query_tests` working w/ `querier` (see #3934). - We still run a sync loop for namespaces, tables and schemas. This will be replaced by "update namespace incl. tables and schemas on demand". Note however that we cannot fetch single tables and schemas on demand at the moment, because DataFusion doesn't implement async schema inspection (only `scan` / "give me all the chunks" is async). I think that's OK for now and we can address this later. - There is NO cache for parquet files and tombstones at the moment. For correctness, they need to be fetched in a single transaction (or we need a kinda tricky sequence number / logical clock tracking) and I am not sure yet how this makes sense when we have the ingester data wired up and predicates pushed down to the catalog (see next point). So let's measure first and then decide on a caching strategy for this. - Predicates are currently NOT pushed down to the catalog. I'll need to figure out how to extract time range from generic DataFusion expressions to make that work (it's easier for InfluxRPC queries, but they are not tested at the moment, see first point). Sorry that this commit is kinda huge. I initially planned to only migrate the chunks away from `db` and leave the tables and schemas for a follow-up PR, but the DataFusion trait structure (chunks are bound to their tables) makes this kinda pointless. Closes #3974. * docs: explain what we're doing Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> * docs: mention tracking issues * docs: explain what we're doing Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org> Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>	2022-03-21 16:58:00 +00:00
Carol (Nichols || Goulding)	8fd3d85634	refactor: Move add_parquet_file_with_tombstones from ingester to compactor	2022-03-21 10:16:57 -04:00
Marco Neumann	0779f81b6b	refactor: rework `TableCache` (#4054) * feat: `TableRepo::get_by_namespace_and_name` * refactor: rework `TableCache` - dual cache that can also map table names to IDs - deal w/ missing tables w/o panics - set proper timeouts to missing data For #3974. * test: extend table cache tests	2022-03-21 13:40:06 +00:00
Luke Bond	da517bd8e2	feat: impl table & column limits in catalog (#3832) fix: refactor table & col limit enforcement in catalog into single SQL statement fix: borked rebase Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>	2022-03-18 13:54:07 +00:00
Carol (Nichols || Goulding)	8888e4c3a2	fix: Remove MAX_COMPACT_SIZE from the compaction queries	2022-03-13 20:09:30 -04:00
Carol (Nichols || Goulding)	1dacf567d9	feat: Add a function to the catalog to fetch level 1 parquet files Fixes #3946.	2022-03-11 15:40:34 -05:00
Carol (Nichols || Goulding)	f184b7023c	feat: Update specified parquet file records to compaction level 1 Fixes #3950.	2022-03-11 15:34:40 -05:00
Carol (Nichols || Goulding)	fabd262442	feat: Add a function to the catalog to fetch level 0 parquet files Connects to #3946.	2022-03-11 15:34:05 -05:00
Carol (Nichols || Goulding)	ecd06c6ec3	fix: ParquetFileRepo create should be responsible for setting INITIAL_COMPACTION_LEVEL When created in the catalog, parquet files should always have compaction level 0. Updating the compaction level should always happen in the compactor. Only the catalog should need to know about the initial compaction level value.	2022-03-10 13:51:18 -05:00
Carol (Nichols || Goulding)	ff31407dce	refactor: Extract a ParquetFileParams type for create This has the advantages of: - Not needing to create fake parquet file IDs or fake deleted_at values that aren't used by create before insertion - Not needing too many arguments for create - Naming the arguments so it's easier to see what value is what argument, especially in tests - Easier to reuse arguments or parts of arguments by using copies of params, which makes it easier to see differences, especially in tests	2022-03-10 13:51:18 -05:00
Paul Dix	27999ff72f	feat: add compaction_level and created_at to parquet_file (#3972)	2022-03-10 15:56:57 +00:00
Marco Neumann	db3f1e8db7	feat: wire up tombstones into querier (#3962) * feat: `TombstoneRepo::list_by_namespace` * test: model sequencer properly * feat: wire up tombstones into querier Closes #3932. * refactor: `override_delete_predicates` => `set_delete_predicates`	2022-03-08 10:06:22 +00:00
Marco Neumann	8d00aaba90	feat: sync chunks in querier (#3911) * feat: `ParquetFileRepo::list_by_namespace_not_to_delete` * feat: `ChunkAddr: Clone` * test: ensure that querier keeps same partition objects * test: improve `create_parquet_file` flexibility * feat: sync chunks in querier * test: improve `test_parquet_file`	2022-03-04 08:53:39 +00:00
Paul Dix	6ba5e51897	feat: update max_persisted_sequence_number in the buffered table on persist (#3868) This includes a bit of a refactor in the locking structure of the buffer data. Locking at the partition collection and within the partition data was making things more complex than they needed to be. The partitions in the buffer are there only temporarily until they get persisted. Locking on the table simplifies things a bit and makes it more clear when the table state is being modified since it no longer has any interior mutability. Having access to separate partitions without the same lock isn't something we need because queries will hit all partitions and data is brought in sequentially, regardless of which partition it is hitting in a sequencer. Fixes #3850	2022-03-03 23:52:31 +00:00
Dom Dwyer	da145ffbe4	feat: batch upsert of columns to catalog Adds ColumnRepo::create_or_get_many() to upsert multiple columns in one round trip to Postgres.	2022-03-03 11:17:30 +00:00
Carol (Nichols || Goulding)	8f3e44bf76	refactor: Extract a crate for shared data types in the new design	2022-03-02 12:16:15 -05:00
Marco Neumann	2fd68ea75f	feat: sync tables and schemas in querier (#3895) * feat: convert `iox_catalog` schema to `schema::Schema` * fix: remove leftover println statements * feat: sync tables and schemas in querier * feat: `PartitionRepo::list_by_namespace` * docs: explain `QuerierNamespace` data structs a bit * refactor: improve variable naming * test: extend `test_sync_schemas` * fix: do not block forever when namespace is gone	2022-03-02 15:32:03 +00:00


83 Commits (e875a92cf8d8ede8fac3aa0e5cc8e2e787e7cd41)