Merge pull request #6324 from influxdata/dom/drop-wal-segments
feat(ingester2): drop WAL segments after persist
pull/24376/head
commit 533a6581be
@@ -29,6 +29,49 @@ pub(crate) async fn periodic_rotation(
             "rotated wal"
         );
 
+        // TEMPORARY HACK: wait 5 seconds for in-flight writes to the old WAL
+        // segment to complete before draining the partitions.
+        //
+        // This can occur because writes to the WAL & buffer tree are not atomic
+        // (avoiding a serialising mutex in the write path).
+        //
+        // A flawed solution would be to have this code read the current
+        // SequenceNumber after rotation, and then wait until at least that
+        // sequence number has been buffered in the BufferTree. This may work in
+        // most cases, but is racy / not deterministic - writes are not ordered,
+        // so sequence number 5 might be buffered before sequence number 1.
+        //
+        // As a temporary hack, wait 5 seconds for in-flight writes to complete
+        // (which should be more than enough time) before proceeding under the
+        // assumption that they have indeed completed, and all writes from the
+        // previous WAL segment are now buffered. Because they're buffered, the
+        // persist operation performed next will persist all the writes that
+        // were in the previous WAL segment, and therefore at the end of the
+        // persist operation the WAL segment can be dropped.
+        //
+        // The potential downside of this hack is that in the very unlikely
+        // situation that an in-flight write has not completed before the
+        // persist operation starts (after the 5 second sleep) and the WAL entry
+        // for it is dropped - we then reduce the durability of that write until
+        // it is persisted next time, or it is lost after an ingester crash
+        // before the next rotation.
+        //
+        // In the future, a proper fix will be to keep the set of sequence
+        // numbers written to each partition buffer, and each WAL segment as a
+        // bitmap, and after persistence submit the partition's bitmap to the
+        // WAL for it to do a set difference to derive the remaining sequence
+        // IDs, and therefore number of references to the WAL segment. Once the
+        // set of remaining IDs is empty (all data is persisted), the segment is
+        // safe to delete. This content-addressed reference counting technique
+        // has the added advantage of working even with parallel / out-of-order
+        // / hot partition persists that span WAL segments, and means there's no
+        // special code path between "hot partition persist" and "wal rotation
+        // persist" - it all works the same way!
+        //
+        // TODO: do this properly as described above.
+
+        tokio::time::sleep(Duration::from_secs(5)).await;
+
         // Drain the BufferTree of partition data and persist each one.
         //
         // Writes that landed into the partition buffer after the rotation but
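The code comment in this hunk rejects waiting on a sequence-number watermark because writes are buffered out of order. A minimal sketch of that race follows; `Buffered`, `watermark_reached` and `all_buffered` are hypothetical names for illustration and are not part of the ingester code.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for the sequence numbers buffered so far.
struct Buffered(BTreeSet<u64>);

impl Buffered {
    /// Naive watermark check: has any sequence number >= `target` been buffered?
    fn watermark_reached(&self, target: u64) -> bool {
        self.0.iter().next_back().map_or(false, |max| *max >= target)
    }

    /// Exact check: has every sequence number in 1..=target been buffered?
    fn all_buffered(&self, target: u64) -> bool {
        (1..=target).all(|n| self.0.contains(&n))
    }
}

fn main() {
    let mut buffered = Buffered(BTreeSet::new());

    // Sequence number 5 lands in the buffer before 1..=4 (writes are unordered).
    buffered.0.insert(5);

    // The watermark check passes even though earlier writes are still in flight.
    assert!(buffered.watermark_reached(5));
    assert!(!buffered.all_buffered(5));

    // Only once the stragglers arrive is it actually safe to proceed.
    for n in 1..5 {
        buffered.0.insert(n);
    }
    assert!(buffered.all_buffered(5));
}
```

The exact check needs the full set of sequence numbers rather than a single high-water mark, which is what the bitmap proposal in the comment tracks.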
@@ -86,6 +129,16 @@ pub(crate) async fn periodic_rotation(
             closed_id = %stats.id(),
             "partitions persisted"
         );
+
+        handle
+            .delete(stats.id())
+            .await
+            .expect("failed to drop wal segment");
+
+        info!(
+            closed_id = %stats.id(),
+            "dropped persisted wal segment"
+        );
     }
 }
 
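The "proper fix" described in the TODO comment (track the sequence numbers referencing each WAL segment, subtract each partition's persisted set, and drop a segment once nothing remains) could look roughly like the sketch below. It is not the author's implementation: `SegmentRefs`, `record_write` and `mark_persisted` are hypothetical names, and a `BTreeSet<u64>` stands in for the bitmap the comment mentions. The sketch also assumes only closed segments are tracked, so it ignores the segment that is still open for writes.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical identifier types, for illustration only.
type SegmentId = u64;
type SequenceNumber = u64;

/// Tracks, per WAL segment, the sequence numbers not yet persisted.
#[derive(Default)]
struct SegmentRefs {
    outstanding: HashMap<SegmentId, BTreeSet<SequenceNumber>>,
}

impl SegmentRefs {
    /// Record that `seq` was written to `segment`.
    fn record_write(&mut self, segment: SegmentId, seq: SequenceNumber) {
        self.outstanding.entry(segment).or_default().insert(seq);
    }

    /// Subtract a partition's persisted set from every segment (set
    /// difference) and return the segments with no remaining references,
    /// in ascending ID order. Those segments are removed from tracking.
    fn mark_persisted(&mut self, persisted: &BTreeSet<SequenceNumber>) -> Vec<SegmentId> {
        let mut droppable = Vec::new();
        self.outstanding.retain(|id, remaining| {
            remaining.retain(|seq| !persisted.contains(seq));
            if remaining.is_empty() {
                droppable.push(*id);
                false // stop tracking: safe to delete
            } else {
                true // still referenced by unpersisted writes
            }
        });
        droppable.sort_unstable();
        droppable
    }
}

fn main() {
    let mut refs = SegmentRefs::default();
    refs.record_write(1, 1);
    refs.record_write(1, 2);
    refs.record_write(2, 3);

    // An out-of-order persist covering writes 1 and 3 spans both segments:
    // segment 2 is fully persisted, segment 1 still holds write 2.
    let first: BTreeSet<SequenceNumber> = BTreeSet::from([1, 3]);
    assert_eq!(refs.mark_persisted(&first), vec![2]);

    // Persisting write 2 releases the last reference to segment 1.
    let second: BTreeSet<SequenceNumber> = BTreeSet::from([2]);
    assert_eq!(refs.mark_persisted(&second), vec![1]);
}
```

Because deletion is driven only by the remaining set, the same path would serve both a rotation-triggered persist and a hot-partition persist, which is the property the comment highlights.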