* feat: log reason for triggering persistence (#1957)
Don't panic on no eligible chunks to persist (#1966)
* fix: fix conditions and logging
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: kodiakhq[bot] <49736102+kodiakhq[bot]@users.noreply.github.com>
The interplay between mutable_linger_seconds, late_arrive_window and persist_age_threshold_seconds can be tricky to reason about. I realized that the lifecycle rules can be simplified by removing mutable_linger_seconds and instead using late_arrive_window_seconds for the same purpose. Semantically, they basically mean the same thing. We want to give data around this amount of time to arrive before the system persists it, which gives it more of an opportunity to persist non-overlapping data.
When a partition goes cold for writes, after we've waiting past this window, we should compact and persist that partition. This removes one unnecessary knob from the lifecycle configuration and also removes the potential for conflicting configuration options.
* feat: add split and persist operation
* docs: Improve doc strings
* refactor: use for loop rather than map
* refactor: Make it clear that the lifecycle policy picks the split timestamp
* fix: race condition
* docs: improve comments
* fix: logical merge conflict
* fix: clippy
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
This will be followed by a cleanup PR that will no longer make mutable_linger_seconds an option
and move the defaulting to the configuration layer.
Closes#1899
This can possibly help us better understand #1899.
The practical reason for increasing the log level for those
events is that we cannot currently ingest debug logs in our log aggregation system,
but it currently takes 7h to reproduce #1899 which means we don't have access to debug logs
from `kubectl logs` to help us troubleshoot.
That said, I do think that these logs are not debug logs but legit lifecycle information.
In particular if a log says "unexpected ..." I think it may be better regarded as an error/warning.
(If it becomes too verbose, it means its either a bug or not that unexpected).
`persist_row_threshold` should limit the rows of the post-compaction
output chunk (and hence the sum of rows over the input chunks), not
the number of rows of each individual input chunk.
Fixes#1874.