Commit Graph

11 Commits (aa70a73487d7fd11ec7161cf84dc561311f5d700)

Author SHA1 Message Date
Trevor Hilton 0e814f5d52
feat: SerdeVecMap type for serializing ID maps (#25492)
This PR introduces a new type `SerdeVecHashMap` that can be used in places where we need a HashMap with the following properties:
1. When serialized, it is serialized as a list of key-value pairs, instead of a map
2. When deserialized, it assumes the serialization format from (1.) and deserializes from a list of key-value pairs to a map
3. Does not allow for duplicate keys on deserialization
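A std-only sketch of these three properties follows. It is hypothetical: the real type plugs into serde's `Serialize`/`Deserialize`, while this version only shows the conversion to and from a list of key-value pairs and the duplicate-key check. `to_pairs` and `try_from_pairs` are assumed helper names.

```rust
use std::collections::hash_map::{Entry, HashMap};
use std::fmt::Debug;
use std::hash::Hash;

/// Hypothetical stand-in for `SerdeVecHashMap`: a thin newtype over `HashMap`.
/// The bounds on `K` let `#[derive(PartialEq)]` compare the inner maps.
#[derive(Debug, PartialEq)]
pub struct SerdeVecHashMap<K: Eq + Hash, V>(HashMap<K, V>);

impl<K: Eq + Hash, V> From<HashMap<K, V>> for SerdeVecHashMap<K, V> {
    fn from(map: HashMap<K, V>) -> Self {
        Self(map)
    }
}

impl<K: Eq + Hash + Clone, V: Clone> SerdeVecHashMap<K, V> {
    /// Property 1: flatten the map into the list of key-value pairs that would be
    /// serialized. Order follows `HashMap` iteration and is therefore unspecified.
    pub fn to_pairs(&self) -> Vec<(K, V)> {
        self.0.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
    }
}

impl<K: Eq + Hash + Debug, V> SerdeVecHashMap<K, V> {
    /// Properties 2 and 3: rebuild the map from a list of pairs, failing on a
    /// repeated key instead of silently keeping one of the values.
    pub fn try_from_pairs(pairs: Vec<(K, V)>) -> Result<Self, String> {
        let mut map = HashMap::with_capacity(pairs.len());
        for (k, v) in pairs {
            match map.entry(k) {
                Entry::Occupied(e) => return Err(format!("duplicate key: {:?}", e.key())),
                Entry::Vacant(e) => {
                    e.insert(v);
                }
            }
        }
        Ok(Self(map))
    }
}
```

Rebuilding through an explicit `entry` loop rather than a bare `collect()` is what makes property 3 hold, since `HashMap`'s `FromIterator` implementation overwrites earlier values when a key repeats.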

This is useful in places where we need to create map types that map from an identifier (integer) to some value, and need to serialize that data. For example: in the WAL when serializing write batches, and in the catalog when serializing the database/table schema.

This PR refactors the code in `influxdb3_wal` and `influxdb3_catalog` to use the new type for maps that use `DbId` and `TableId` as the key. Follow-on work can give the same treatment to `ColumnId`-based maps once that is fully worked out.

## Explanation

If we have a `HashMap<u32, String>`, `serde_json` will serialize it in the following way:
```json
{"0": "foo", "1": "bar"}
```
i.e., the integer keys are serialized as strings, since JSON doesn't support any other type of key in maps.

`SerdeVecHashMap<u32, String>` will be serialized by `serde_json` in the following way:
```json
[[0, "foo"], [1, "bar"]]
```
and will deserialize from that vector-based structure back to the map. This allows serialization/deserialization to run directly off of the `HashMap`'s `Iterator`/`FromIterator` implementations.

## The Controversial Part

One thing I also did in this PR was switch the catalog from using a `BTreeMap` for tables to using the new `SerdeVecHashMap` type. This breaks the deterministic ordering of the database schema's `tables` map and therefore breaks the snapshot tests we were using. I had to comment out those parts of the respective tests because, as far as I am aware, there is no easy way to give the underlying hashmap a deterministic ordering only in tests.

If we think that using a `BTreeMap` over a `HashMap` in the catalog is okay, then I think it would be okay to roll a similar `SerdeVecBTreeMap` type specifically for the catalog. Incidentally, this may be a good use case for [`indexmap`](https://docs.rs/indexmap/latest/indexmap/), which reportedly has lookup performance similar to a hashmap while preserving insertion order and _iterating faster_, which could be a win for WAL serialization speed. It also accepts different hashing algorithms, so it could be swapped in with FNV just as `HashMap` can.
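To illustrate why a `BTreeMap`-backed variant would keep the snapshot tests stable, here is a hypothetical `SerdeVecBTreeMap`-style helper (`btree_to_pairs` is an assumed name). `BTreeMap` iterates in ascending key order, so the serialized pairs do not depend on insertion order or on hashing.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper for a `SerdeVecBTreeMap`-style type: flatten the map into
/// key-value pairs. `BTreeMap::iter` yields entries sorted by key, so two maps
/// with the same contents always produce the same list, whatever the insertion order.
pub fn btree_to_pairs<K: Ord + Clone, V: Clone>(map: &BTreeMap<K, V>) -> Vec<(K, V)> {
    map.iter().map(|(k, v)| (k.clone(), v.clone())).collect()
}
```

With a `HashMap`, the same two maps could produce the pairs in different orders, which is what broke the snapshots.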

## Follow-up work

Use the `SerdeVecHashMap` for column data in the WAL following https://github.com/influxdata/influxdb/issues/25461
2024-10-25 13:49:02 -04:00
Trevor Hilton ce9276d96d
refactor: changes needed for IDs in pro (#25479)
* refactor: roll back addition of DatabaseSchemaProvider trait

* refactor: make parquet metrics optional in telemetry for pro

* refactor: make ParquetFileId Hash

* refactor: test harness logging
2024-10-21 15:17:02 -04:00
praveen-influx 5473a9489b
feat: allow telemetry endpoint to be passed in (#25475)
Allow the endpoint for telemetry to be passed in via the CLI args, e.g.

```
--telemetry-endpoint "https://somehost/test/"
```

The actual endpoint always has `v3` appended to it, so the URL above becomes
`https://somehost/test/v3`.
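A sketch of that rule as a hypothetical helper (`telemetry_url` is not necessarily the name used in the code). The commit only shows an endpoint with a trailing slash; inserting a slash when one is missing is an assumption.

```rust
/// Hypothetical helper: build the telemetry URL from the `--telemetry-endpoint`
/// value by appending `v3`. Assumption: if the value has no trailing slash, one is
/// added first so that `v3` becomes its own path segment.
pub fn telemetry_url(endpoint: &str) -> String {
    if endpoint.ends_with('/') {
        format!("{endpoint}v3")
    } else {
        format!("{endpoint}/v3")
    }
}
```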
2024-10-18 14:34:27 +01:00
Jamie Strandboge 0835093c78
feat(circleci): add inclusivity checks (#25437)
* feat(circleci): add inclusivity checks

* chore(circleci): adjust package-validation for inclusive language

* chore: update tests for inclusive language
2024-10-09 08:01:31 -05:00
praveen-influx 1f1125c767
refactor: update docs and tests for the telemetry crate (#25432)
- Introduced traits `ParquetMetrics` and `SystemInfoProvider` to make tests
  easier to write
- Uses mockito for code that depends on `reqwest::Client`, and mockall to
  mock traits such as `SystemInfoProvider`
- Minor updates to docs
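A rough sketch of the pattern: code depends on a trait, and tests supply a fixed implementation. The trait method, its return type and `describe_usage` are hypothetical, and the test double is hand-written to stay std-only; mockall's `#[automock]` can generate a comparable mock from the trait.

```rust
/// Hypothetical shape of `SystemInfoProvider`; the real method names and types may differ.
pub trait SystemInfoProvider {
    /// Returns (CPU usage in percent, memory used in MB), if available.
    fn cpu_and_mem_used(&self) -> Option<(f32, u64)>;
}

/// Code under test depends only on the trait, not on a concrete system probe.
pub fn describe_usage(provider: &impl SystemInfoProvider) -> String {
    match provider.cpu_and_mem_used() {
        Some((cpu, mem)) => format!("cpu={cpu:.1}% mem={mem}MB"),
        None => "unavailable".to_string(),
    }
}

/// Hand-written test double that always returns a fixed reading.
pub struct FixedSystemInfo(pub Option<(f32, u64)>);

impl SystemInfoProvider for FixedSystemInfo {
    fn cpu_and_mem_used(&self) -> Option<(f32, u64)> {
        self.0
    }
}
```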
2024-10-08 15:45:13 +01:00
praveen-influx bd20f80ce2
chore: update docs for the telemetry crate (#25431)
* chore: update docs for the telemetry crate

* chore: Update influxdb3_telemetry/src/stats.rs

Co-authored-by: Michael Gattozzi <mgattozzi@influxdata.com>

* chore: Update influxdb3_telemetry/src/sender.rs

Co-authored-by: Michael Gattozzi <mgattozzi@influxdata.com>

---------

Co-authored-by: Michael Gattozzi <mgattozzi@influxdata.com>
2024-10-08 09:24:49 +01:00
praveen-influx 8ccb580162
feat: telemetry report for parquet metrics (#25425)
- added a mechanism within `PersistedFile` to expose parquet file related
  metrics. The details are updated when a new snapshot is generated and
  also when all snapshots are loaded at process startup
- when the telemetry payload is created, these parquet metrics are looked
  up before the payload is sent to the server.

Closes: https://github.com/influxdata/influxdb/issues/25418
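A minimal sketch of the flow described above, assuming the metrics are a file count, total size and row count (the commit does not list them). All type and method names here are hypothetical.

```rust
use std::sync::RwLock;

/// Hypothetical per-file stats carried in a snapshot.
pub struct ParquetFileStats {
    pub size_bytes: u64,
    pub row_count: u64,
}

/// Hypothetical running totals exposed for telemetry.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub struct ParquetTotals {
    pub file_count: u64,
    pub size_bytes: u64,
    pub row_count: u64,
}

#[derive(Default)]
pub struct PersistedFilesMetrics {
    totals: RwLock<ParquetTotals>,
}

impl PersistedFilesMetrics {
    /// Called for each new snapshot, and for every snapshot loaded at startup.
    pub fn add_snapshot(&self, files: &[ParquetFileStats]) {
        let mut totals = self.totals.write().unwrap();
        for file in files {
            totals.file_count += 1;
            totals.size_bytes += file.size_bytes;
            totals.row_count += file.row_count;
        }
    }

    /// Looked up when the telemetry payload is built; returns a copy of the totals.
    pub fn totals(&self) -> ParquetTotals {
        *self.totals.read().unwrap()
    }
}
```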
2024-10-03 15:11:40 +01:00
praveen-influx 72dcd1866f
feat(telemetry): adds reads and writes (#25409)
- instrumented code to get read and write measurements
- introduced `EventsBucket` for collecting reads/writes
- sampler now samples every minute for all metrics (including
  reads/writes)
- other tidy-ups

closes: https://github.com/influxdata/influxdb/issues/25372
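A hypothetical sketch of an events bucket: the instrumented read and write paths bump counters, and the sampler drains them once per interval. Field and method names are assumptions, not the crate's API.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical `EventsBucket`: counts reads and writes between two sampler ticks.
#[derive(Default)]
pub struct EventsBucket {
    reads: AtomicU64,
    writes: AtomicU64,
}

impl EventsBucket {
    /// Called from the instrumented query path.
    pub fn record_read(&self) {
        self.reads.fetch_add(1, Ordering::Relaxed);
    }

    /// Called from the instrumented write path.
    pub fn record_write(&self) {
        self.writes.fetch_add(1, Ordering::Relaxed);
    }

    /// Called by the sampler: returns (reads, writes) for the elapsed interval and
    /// resets both to zero. The two swaps are separate, so an event landing between
    /// them is simply counted in the next interval.
    pub fn take(&self) -> (u64, u64) {
        (
            self.reads.swap(0, Ordering::Relaxed),
            self.writes.swap(0, Ordering::Relaxed),
        )
    }
}
```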
2024-10-01 18:34:00 +01:00
praveen-influx c4514bf401
feat(telemetry): tidy ups for stats calcs (#25391)
- changed the number of samples for CPU/memory from `u64` to `usize`
- added a generic float function to round to 2 decimal places
- removed an unnecessary cast of `f32` to `f64`
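A sketch of these calculations, assuming running min/max/mean stats per metric. The `RoundTo2` trait stands in for the generic rounding function (the real one may rely on a crate-provided float trait), and `Stats` is a hypothetical name.

```rust
/// Hypothetical generic rounding helper, implemented for both float types.
pub trait RoundTo2 {
    fn round_to_2_decimal_places(self) -> Self;
}

macro_rules! impl_round_to_2 {
    ($($t:ty),*) => {
        $(
            impl RoundTo2 for $t {
                fn round_to_2_decimal_places(self) -> Self {
                    (self * 100.0).round() / 100.0
                }
            }
        )*
    };
}

impl_round_to_2!(f32, f64);

/// Hypothetical running stats for one metric (e.g. CPU %), with a `usize` sample count.
#[derive(Debug, Default)]
pub struct Stats {
    pub min: f32,
    pub max: f32,
    pub avg: f32,
    pub num_samples: usize,
}

impl Stats {
    pub fn update(&mut self, value: f32) {
        if self.num_samples == 0 {
            self.min = value;
            self.max = value;
            self.avg = value;
        } else {
            self.min = self.min.min(value);
            self.max = self.max.max(value);
            // Incremental mean: avg_n = avg_(n-1) + (x - avg_(n-1)) / n, kept in f32.
            let n = (self.num_samples + 1) as f32;
            self.avg += (value - self.avg) / n;
        }
        self.num_samples += 1;
    }

    /// Rounds only when reporting, so rounding error does not accumulate.
    pub fn rounded_avg(&self) -> f32 {
        self.avg.round_to_2_decimal_places()
    }
}
```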
2024-09-25 14:13:00 +01:00
praveen-influx 29daa8332d
feat(telemetry): static values, cpu and mem metrics gathering (#25380)
- added basic setup to initialise the static values for the telemetry store
- CPU and memory used by influxdb3 are sampled at a 1-minute interval
- some minor tidy-ups

Closes: https://github.com/influxdata/influxdb/issues/25370, https://github.com/influxdata/influxdb/issues/25371
2024-09-24 20:19:21 +01:00
praveen-influx c1a5e1b5fd
feat(telemetry): added basic types (#25374)
- `TelemetryStore` is exposed for holding telemetry samples
- added the `influxdb3_telemetry` dependency to the `influxdb3` crate
2024-09-20 19:20:54 +01:00