mirror of https://github.com/milvus-io/milvus.git
[skip ci]Add dropcollection design doc (#11450)
See also: #11426
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
pull/11695/head
parent 54b40da4c5
commit eb41afc661
@ -0,0 +1,89 @@
# DropCollection release resources

## Before this enhancement

**When dropping a collection**

1. DataNode releases the flowgraph of this collection and drops all the data in its buffer.
2. DataCoord has no idea whether a collection has been dropped or not.
   - DataCoord will make DataNode watch the DmChannels of dropped collections.
   - Blob files will never be removed, even if the collection is dropped.

**Unused binlogs on blob storage: why do such binlogs exist?**

- A failed flush.
- A failed compaction.
- Binlogs of collections that have been dropped and are out of the time-travel range.

This enhancement focuses on solving these two problems.

## Object1: DropCollection

DataNode initiates Flush&Drop:
receive drop collection msg ->
cancel compaction ->
flush all insert buffers and delete buffers ->
release the flowgraph

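The Flush&Drop sequence above can be sketched in Go as follows. This is only an illustration of the ordering: `channelState`, its fields, and `flushAndDrop` are hypothetical names invented for this sketch and are not taken from the DataNode code.

```go
package main

// channelState is a hypothetical, simplified stand-in for the per-vchannel
// state a DataNode holds. It is not the real DataNode type.
type channelState struct {
	compactionRunning bool
	insertBuffer      []string // buffered inserts, simplified to strings
	deleteBuffer      []string // buffered deletes, simplified to strings
	flushed           []string // everything persisted so far
	flowgraphReleased bool
	steps             []string // records the order in which steps ran
}

// flushAndDrop runs the four steps in the order listed in the design:
// receive the drop message, cancel compaction, flush both buffers,
// and only then release the flowgraph.
func (c *channelState) flushAndDrop() {
	// Step 1: the drop collection message has been received (this call).
	c.steps = append(c.steps, "receive drop msg")

	// Step 2: cancel compaction so it cannot produce new binlogs.
	c.compactionRunning = false
	c.steps = append(c.steps, "cancel compaction")

	// Step 3: flush the insert buffer and the delete buffer instead of
	// discarding them.
	c.flushed = append(c.flushed, c.insertBuffer...)
	c.flushed = append(c.flushed, c.deleteBuffer...)
	c.insertBuffer, c.deleteBuffer = nil, nil
	c.steps = append(c.steps, "flush buffers")

	// Step 4: release the flowgraph last, once nothing is left in memory.
	c.flowgraphReleased = true
	c.steps = append(c.steps, "release flowgraph")
}
```

The point of the ordering is that buffered data is flushed rather than dropped, so every binlog ends up recorded before the flowgraph goes away.
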
**Plan 1: Picked**

Add a `dropped` flag to the `SaveBinlogPathRequest` proto.

DN

- Flush all segments in this vChannel. On Flush&Drop, set the `dropped` flag to true.
- If the flush fails, retry at most 10 times, then restart.

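A minimal Go sketch of the DN side of Plan 1, under stated assumptions: `saveBinlogPathRequest` is a hypothetical two-field mirror of the proto (only the fields this sketch needs), `send` stands in for the RPC to DataCoord, and "at most 10 times" is read here as at most 10 attempts per segment.

```go
package main

import "errors"

// saveBinlogPathRequest is a hypothetical, reduced mirror of the proto
// request; the real message carries more fields.
type saveBinlogPathRequest struct {
	SegmentID int64
	Dropped   bool // the new flag proposed by Plan 1
}

// maxAttempts assumes "retry at most 10 times" means 10 attempts in total.
const maxAttempts = 10

var errRetriesExhausted = errors.New("flush failed after max attempts")

// flushSegments sends one request per segment (the cost noted under Cons),
// setting Dropped when the flush is part of Flush&Drop. Each request is
// attempted up to maxAttempts times; if all attempts fail, the error is
// returned so the caller can restart.
func flushSegments(segmentIDs []int64, dropping bool, send func(saveBinlogPathRequest) error) error {
	for _, id := range segmentIDs {
		req := saveBinlogPathRequest{SegmentID: id, Dropped: dropping}
		ok := false
		for attempt := 0; attempt < maxAttempts; attempt++ {
			if err := send(req); err == nil {
				ok = true
				break
			}
		}
		if !ok {
			return errRetriesExhausted
		}
	}
	return nil
}
```
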
DC

- DataCoord marks the segmentInfo as `dropped` and doesn't remove segmentInfos from etcd.
- On recovery, check whether the segments in the vchannel are all dropped:
  - if not, recover from the state before the drop
  - if so, there is no need to recover the vchannel

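The recovery check on the DC side might look like the Go sketch below. `segmentInfo` and `needsRecovery` are hypothetical names for illustration. The design does not say what to do for a vchannel with no recorded segments; this sketch assumes such a channel still needs recovery, since nothing indicates it was dropped.

```go
package main

// segmentInfo is a hypothetical, reduced view of the segment meta that
// DataCoord keeps in etcd.
type segmentInfo struct {
	ID      int64
	Channel string
	Dropped bool
}

// needsRecovery reports whether a vchannel should be recovered.
// It returns false only when the channel has at least one segment and
// every one of its segments is marked dropped.
func needsRecovery(segments []segmentInfo, channel string) bool {
	seen := false
	for _, s := range segments {
		if s.Channel != channel {
			continue
		}
		if !s.Dropped {
			// A live segment remains, so the drop did not complete.
			return true
		}
		seen = true
	}
	// Assumption: a channel with no recorded segments is not treated
	// as dropped.
	return !seen
}
```
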
Pros:
1. Easiest approach in both DN and DC.
2. DN can reuse the current flush manager procedure.

Cons:
1. The number of RPC calls equals the number of segments in the collection, which is expensive.

---

**Plan 2: Enhance later**

Add a new RPC, `FlushAndDrop`, scoped to a vchannel.

Pros:
1. Far fewer RPC calls: equal to the number of shards.
2. A clearer flush procedure in DN.

Cons:
1. More effort in DN and DC.

```
message FlushAndDropRequest {
    common.MsgBase base = 1;
    string channelID = 2;
    int64 collectionID = 3;
    repeated SegmentBinlogPaths segment_binlog_paths = 6;
}

message SegmentBinlogPaths {
    int64 segmentID = 1;
    CheckPoint checkPoint = 2;
    // Field numbers must be unique within a message, so the fields below
    // are renumbered to avoid reusing 2.
    repeated FieldBinlog field2BinlogPaths = 3;
    repeated FieldBinlog field2StatslogPaths = 4;
    repeated DeltaLogInfo deltalogs = 5;
}
```

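To illustrate why Plan 2 needs fewer calls, the Go sketch below groups segments by vchannel so that one request is built per channel. The struct names loosely follow the proto above but are hypothetical Go types written for this sketch, not generated code, and the binlog path fields are omitted.

```go
package main

// segmentBinlogPaths is a hypothetical, reduced form of the proto message;
// only the segment ID is kept.
type segmentBinlogPaths struct {
	SegmentID int64
}

// flushAndDropRequest is a hypothetical, reduced form of the proto request.
type flushAndDropRequest struct {
	ChannelID          string
	CollectionID       int64
	SegmentBinlogPaths []segmentBinlogPaths
}

// buildRequests groups segments by their vchannel and returns one request
// per channel, so the number of requests equals the number of channels
// rather than the number of segments.
func buildRequests(collectionID int64, segmentChannel map[int64]string) map[string]*flushAndDropRequest {
	reqs := make(map[string]*flushAndDropRequest)
	for segID, ch := range segmentChannel {
		req, ok := reqs[ch]
		if !ok {
			req = &flushAndDropRequest{ChannelID: ch, CollectionID: collectionID}
			reqs[ch] = req
		}
		req.SegmentBinlogPaths = append(req.SegmentBinlogPaths, segmentBinlogPaths{SegmentID: segID})
	}
	return reqs
}
```

With six segments spread over two vchannels, Plan 1 would issue six calls while this grouping yields two requests.
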
---

## Object2: DataCoord GC for unused binlogs

### How to clear unknown binlogs?

DataCoord runs a background GC goroutine that triggers once a day:
1. Get all MinIO/S3 paths (keys).
2. Select the keys that are not referenced by any segmentInfo.
3. Using the blob metadata from MinIO/S3, remove those binlogs that have existed for more than 1 day.
   - **Why 1 day:** there may be newly uploaded binlogs from a flush or compaction that have not yet been recorded in segmentInfo.

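The three GC steps above can be sketched as a pure selection function in Go. This is an assumption-laden illustration: `blobMeta`, `orphanedKeys`, and the use of Unix seconds for timestamps are all hypothetical, and listing objects or deleting them from MinIO/S3 is left out.

```go
package main

import "sort"

// secondsPerDay is the 1-day grace period from the design, in seconds.
const secondsPerDay int64 = 24 * 60 * 60

// blobMeta is a hypothetical view of one object listed from MinIO/S3.
type blobMeta struct {
	Key              string
	LastModifiedUnix int64 // last-modified time in Unix seconds
}

// orphanedKeys returns the keys that should be removed: those not
// referenced by any segmentInfo (step 2) whose age exceeds the grace
// period (step 3). Younger unreferenced blobs are kept because they may
// belong to a flush or compaction that has not been recorded yet.
func orphanedKeys(blobs []blobMeta, known map[string]struct{}, nowUnix, graceSeconds int64) []string {
	var out []string
	for _, b := range blobs {
		if _, ok := known[b.Key]; ok {
			continue // referenced by a segmentInfo, never collected here
		}
		if nowUnix-b.LastModifiedUnix > graceSeconds {
			out = append(out, b.Key)
		}
	}
	sort.Strings(out) // deterministic order for logging and testing
	return out
}
```
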
### How to clear a dropped collection's binlogs?

- DataCoord checks all dropped segments and removes their recorded binlogs once the segments have been dropped for 1 day.
- DataCoord keeps the segmentInfo meta in etcd.
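
A minimal Go sketch of this selection, assuming each segment meta records when it was dropped. The `droppedSegment` type and its `DroppedAtUnix` field are hypothetical; the design does not specify how the drop time is stored.

```go
package main

// binlogGraceSeconds is the 1-day wait from the design, in seconds.
const binlogGraceSeconds int64 = 24 * 60 * 60

// droppedSegment is a hypothetical, reduced view of a segmentInfo.
type droppedSegment struct {
	ID            int64
	Dropped       bool
	DroppedAtUnix int64 // assumed field: drop time in Unix seconds
	Binlogs       []string
}

// binlogsToRemove collects the binlog paths of segments that have been
// dropped for at least one day. It does not modify the segment metas,
// matching the rule that DataCoord keeps them in etcd.
func binlogsToRemove(segments []droppedSegment, nowUnix int64) []string {
	var out []string
	for _, s := range segments {
		if !s.Dropped {
			continue
		}
		if nowUnix-s.DroppedAtUnix >= binlogGraceSeconds {
			out = append(out, s.Binlogs...)
		}
	}
	return out
}
```
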