influxdb

Commit Graph

Author	SHA1	Message	Date
Trevor Hilton	0e814f5d52	feat: SerdeVecMap type for serializing ID maps (#25492) This PR introduces a new type `SerdeVecHashMap` that can be used in places where we need a HashMap with the following properties: 1. When serialized, it is serialized as a list of key-value pairs, instead of a map 2. When deserialized, it assumes the serialization format from (1.) and deserializes from a list of key-value pairs to a map 3. Does not allow for duplicate keys on deserialization This is useful in places where we need to create map types that map from an identifier (integer) to some value, and need to serialize that data. For example: in the WAL when serializing write batches, and in the catalog when serializing the database/table schema. This PR refactors the code in `influxdb3_wal` and `influxdb3_catalog` to use the new type for maps that use `DbId` and `TableId` as the key. Follow on work can give the same treatment to `ColumnId` based maps once that is fully worked out. ## Explanation If we have a `HashMap<u32, String>`, `serde_json` will serialize it in the following way: ```json {"0": "foo", "1": "bar"} ``` i.e., the integer keys are serialized as strings, since JSON doesn't support any other type of key in maps. `SerdeVecHashMap<u32, String>` will be serialized by `serde_json` in the following way: ```json [[0, "foo"], [1, "bar"]] ``` and will deserialize from that vector-based structure back to the map. This allows serialization/deserialization to run directly off of the `HashMap`'s `Iterator`/`FromIterator` implementations. ## The Controversial Part One thing I also did in this PR was switch the catalog from using a `BTreeMap` for tables to using the new `HashMap` type. This breaks the deterministic ordering of the database schema's `tables` map and therefore wrecks the snapshot tests we were using. I had to comment those parts of their respective tests out, because there isn't an easy way to make the underlying hashmap have a deterministic ordering just in tests that I am aware of. If we think that using `BTreeMap` in the catalog is okay over a `HashMap`, then I think it would be okay to roll a similar `SerdeVecBTreeMap` type specifically for the catalog. Coincidentally, this may actually be a good use case for [`indexmap`](https://docs.rs/indexmap/latest/indexmap/), since it holds supposedly similar lookup performance characteristics to hashmap, while preserving order and _having faster iteration_ which could be a win for WAL serialization speed. It also accepts different hashing algorithms so could be swapped in with FNV like `HashMap` can. ## Follow-up work Use the `SerdeVecHashMap` for column data in the WAL following https://github.com/influxdata/influxdb/issues/25461	2024-10-25 13:49:02 -04:00
Trevor Hilton	ce9276d96d	refactor: changes needed for IDs in pro (#25479) * refactor: roll back addition of DatabaseSchemaProvider trait * refactor: make parquet metrics optional in telemetry for pro * refactor: make ParquetFileId Hash * refactor: test harness logging	2024-10-21 15:17:02 -04:00
praveen-influx	5473a9489b	feat: allow telemetry endpoint to be passed in (#25475) Allow the endpoint for telemetry to be passed in via the CLI args, e.g. ``` --telemetry-endpoint "https://somehost/test/" ``` and the actual endpoint always appends `v3` to it. So the URL above becomes "https://somehost/test/v3"	2024-10-18 14:34:27 +01:00
Jamie Strandboge	0835093c78	feat(circleci): add inclusivity checks (#25437) * feat(circleci): add inclusivity checks * chore(circleci): adjust package-validation for inclusive language * chore: update tests for inclusive language	2024-10-09 08:01:31 -05:00
praveen-influx	1f1125c767	refactor: update docs and tests for the telemetry crate (#25432) - Introduced traits, `ParquetMetrics` and `SystemInfoProvider` to enable writing easier tests - Uses mockito for code that depends on reqwest::Client and also uses mockall to generally mock any traits like `SystemInfoProvider` - Minor updates to docs	2024-10-08 15:45:13 +01:00
praveen-influx	bd20f80ce2	chore: update docs for the telemetry crate (#25431) * chore: update docs for the telemetry crate * chore: Update influxdb3_telemetry/src/stats.rs Co-authored-by: Michael Gattozzi <mgattozzi@influxdata.com> * chore: Update influxdb3_telemetry/src/sender.rs Co-authored-by: Michael Gattozzi <mgattozzi@influxdata.com> --------- Co-authored-by: Michael Gattozzi <mgattozzi@influxdata.com>	2024-10-08 09:24:49 +01:00
praveen-influx	8ccb580162	feat: telemetry report for parquet metrics (#25425) - added mechanism within PersistedFile to expose parquet file related metrics. The details are updated when new snapshot is generated and also when all snapshots are loaded when the process starts up - at the point of creating the telemetry payload these parquet metrics are looked up before sending it to the server. Closes: https://github.com/influxdata/influxdb/issues/25418	2024-10-03 15:11:40 +01:00
praveen-influx	72dcd1866f	feat(telemetry): adds reads and writes (#25409) - instrumented code to get read and write measurement - introduced EventsBucket for collection of reads/writes - sampler now samples every minute for all metrics (including reads/writes) - other tidy ups closes: https://github.com/influxdata/influxdb/issues/25372	2024-10-01 18:34:00 +01:00
praveen-influx	c4514bf401	feat(telemetry): tidy ups for stats calcs (#25391) - moved num samples for cpu/mem to be usize from u64 - generic float function to round to 2 decimal places added - removed unnecessary cast of f32 to f64	2024-09-25 14:13:00 +01:00
praveen-influx	29daa8332d	feat(telemetry): static values, cpu and mem metrics gathering (#25380) - basic setup to initialise the static values for telemetry store added. - cpu and memory used by influxdb3 is sampled at 1min interval - some minor tidyups Closes: https://github.com/influxdata/influxdb/issues/25370, https://github.com/influxdata/influxdb/issues/25371	2024-09-24 20:19:21 +01:00
praveen-influx	c1a5e1b5fd	feat(telemetry): added basic types (#25374) - `TelemetryStore` is exposed for holding telemetry samples - added influxdb3_telemetry dependency to influxdb3 crate	2024-09-20 19:20:54 +01:00

11 Commits (aa70a73487d7fd11ec7161cf84dc561311f5d700)
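
The `SerdeVecHashMap` commit (0e814f5d52) describes a map that round-trips through a list of key-value pairs and rejects duplicate keys when read back. The following is only a rough, std-only sketch of that idea, not the code from the PR: it does not use `serde` at all, and the helper names `to_pairs` and `from_pairs` are hypothetical, chosen here for illustration.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical helper: flatten a map into a list of key-value pairs,
/// mirroring the list-shaped serialized form described in the commit.
fn to_pairs<K: Copy, V: Clone>(map: &HashMap<K, V>) -> Vec<(K, V)> {
    map.iter().map(|(k, v)| (*k, v.clone())).collect()
}

/// Hypothetical helper: rebuild a map from a list of pairs.
/// Returns `Err(key)` for the first key that appears more than once,
/// so duplicate keys are rejected rather than silently overwritten.
fn from_pairs<K: Eq + Hash + Copy, V>(pairs: Vec<(K, V)>) -> Result<HashMap<K, V>, K> {
    let mut map = HashMap::with_capacity(pairs.len());
    for (k, v) in pairs {
        if map.insert(k, v).is_some() {
            return Err(k);
        }
    }
    Ok(map)
}

fn main() {
    let mut map: HashMap<u32, String> = HashMap::new();
    map.insert(0, "foo".to_string());
    map.insert(1, "bar".to_string());

    // HashMap iteration order is unspecified, so sort before printing.
    let mut pairs = to_pairs(&map);
    pairs.sort_by_key(|(k, _)| *k);
    println!("{:?}", pairs);

    // A clean list round-trips back to an equal map.
    let restored = from_pairs(pairs).expect("no duplicate keys");
    assert_eq!(restored, map);

    // A list with a repeated key is rejected.
    let dup = vec![(0u32, "a"), (0u32, "b")];
    assert_eq!(from_pairs(dup), Err(0));
}
```

The explicit sort in `main` reflects the ordering concern raised in the commit message: a plain `HashMap` gives no deterministic iteration order, which is why the snapshot tests broke and why `BTreeMap` or `indexmap` were floated as alternatives.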