influxdb/influxdb3_catalog
Michael Gattozzi 4e2cb630b3
fix: Prevent Catalog UUID races for new nodes (#26160)
When starting up a new cluster in Enterprise, multiple nodes might
start at the same time. This can leave us with multiple catalogs whose
in-memory representations have different UUIDs.

For example:
- Let's say we have node0 and node1
- node0 and node1 start at the same time and both check object storage
  to see if there is a catalog to load
- They both see there is no catalog
- They both create a new one by generating a UUID and persisting it to
  object storage
- Whichever write lands second overwrites the first, so that node holds
  the correct UUID in its in-memory representation, while the other
  node likely keeps the wrong one until it is restarted

In practice this isn't an issue today, as Trevor notes in
https://github.com/influxdata/influxdb_pro/issues/600, but it could be
once we start using `--cluster-id` for licensing purposes. To prevent
this, the write to object storage now uses a Put mode under which the
write fails if the catalog already exists. The node that lost the race
then just loads the other node's catalog instead.

For example, if node1 wins the race, node0 will load the catalog
created by node1 and use that UUID instead.
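
The create-or-load behavior described above can be sketched roughly as
follows. This is a minimal illustration using only the standard
library, not the actual change: `FakeObjectStore`, `put_if_absent`,
`load_or_create_catalog`, and the `catalog` key are hypothetical names,
and a plain `u128` stands in for the catalog UUID.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical in-memory stand-in for an object store (not a real API).
#[derive(Default)]
struct FakeObjectStore {
    objects: Mutex<HashMap<String, u128>>,
}

/// Error returned when a create-only put finds an existing object.
#[derive(Debug, PartialEq)]
struct AlreadyExists;

impl FakeObjectStore {
    /// Create-only put: writes `value` only if `key` is absent.
    /// The check and the insert happen under one lock, so they are atomic.
    fn put_if_absent(&self, key: &str, value: u128) -> Result<(), AlreadyExists> {
        let mut objects = self.objects.lock().unwrap();
        if objects.contains_key(key) {
            return Err(AlreadyExists);
        }
        objects.insert(key.to_string(), value);
        Ok(())
    }

    /// Reads the stored value for `key`, if any.
    fn get(&self, key: &str) -> Option<u128> {
        self.objects.lock().unwrap().get(key).copied()
    }
}

/// Hypothetical object key for the persisted catalog.
const CATALOG_KEY: &str = "catalog";

/// Returns the catalog id this node ends up holding in memory.
fn load_or_create_catalog(store: &FakeObjectStore, candidate_id: u128) -> u128 {
    // Startup check: load the catalog if one already exists.
    if let Some(existing) = store.get(CATALOG_KEY) {
        return existing;
    }
    // No catalog seen: try to create one, failing if another node got there first.
    match store.put_if_absent(CATALOG_KEY, candidate_id) {
        Ok(()) => candidate_id,
        // Lost the race: load the winner's catalog instead of keeping our own id.
        Err(AlreadyExists) => store
            .get(CATALOG_KEY)
            .expect("catalog exists after a failed create"),
    }
}

fn main() {
    let store = Arc::new(FakeObjectStore::default());
    // Two "nodes" start at the same time, each with its own candidate id.
    let handles: Vec<_> = vec![0xA0_u128, 0xA1]
        .into_iter()
        .map(|candidate| {
            let store = Arc::clone(&store);
            thread::spawn(move || load_or_create_catalog(&store, candidate))
        })
        .collect();
    let ids: Vec<u128> = handles.into_iter().map(|h| h.join().unwrap()).collect();
    // Whichever node won, both now hold the same id.
    assert_eq!(ids[0], ids[1]);
    println!("both nodes agree on catalog id {:#x}", ids[0]);
}
```

The point of the sketch is that the existence check alone is not
enough: both nodes can pass it before either writes. Agreement comes
from the store rejecting the second create, which is the guarantee this
change relies on the real object store to provide.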

I have not included a test. A race condition like this is hard to
reproduce reliably, so a test could never really show that it was taken
care of, and we rely on the underlying object store we are writing to
to handle this for us. The race is also unlikely to happen, since it
can only occur when a new cluster is initiated for the first time.
2025-03-18 11:25:08 -04:00
..
src fix: Prevent Catalog UUID races for new nodes (#26160) 2025-03-18 11:25:08 -04:00
Cargo.toml feat: catalog checkpoints (#26126) 2025-03-11 18:20:36 -04:00