# DataNode Flowgraph Recovery Design

update: 6.4.2021, by [Goose](https://github.com/XuanYang-cn)
update: 6.21.2021, by [Goose](https://github.com/XuanYang-cn)

## 1. Common Sense

A. Each message stream maps to exactly one vchannel, so every message pack has exactly one start position and one end position.

B. DataNode updates segment positions only when it flushes.
As an optimization, it updates only the positions of:

1. the segment currently being flushed;
2. segments that have never been flushed (their start positions).

C. An auto-flush triggered by DataNode counts as a valid flush.

D. DDL messages are now carried in the DML vchannels.

## 2. Segments in Flowgraph

![segments](graphs/segments.png)

## 3. Flowgraph Recovery

### A. Save checkpoints

When a flowgraph flushes a segment, it must save:

- the current segment's binlog paths;
- the current segment's positions;
- the current positions of all other segments, taken from the replica (for a segment that has never been flushed, save the position at which DataNode first encountered it).
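A minimal Go sketch of assembling the checkpoint listed above. The types `Position`, `ReplicaSegment`, and `Checkpoint` and the function `buildCheckpoint` are hypothetical, not Milvus's actual code, and a position is reduced to a single timestamp for illustration.

```go
package main

// Hypothetical types for illustration; Milvus's real structures differ.
type Position struct {
	Timestamp uint64
}

type ReplicaSegment struct {
	ID        int64
	FirstSeen Position // position when DataNode first met the segment
	Current   Position // position saved for the segment at its last flush
	Flushed   bool     // whether the segment has ever been flushed
}

type Checkpoint struct {
	SegmentID   int64
	BinlogPaths []string
	Position    Position           // the flushed segment's own position
	Others      map[int64]Position // positions of all other segments
}

// buildCheckpoint assembles what must be saved when segment flushedID is
// flushed at position pos: its binlog paths, its position, and the position
// of every other segment in the replica. A segment that has never been
// flushed contributes the position at which DataNode first met it.
func buildCheckpoint(flushedID int64, pos Position, binlogs []string, replica []ReplicaSegment) Checkpoint {
	cp := Checkpoint{
		SegmentID:   flushedID,
		BinlogPaths: binlogs,
		Position:    pos,
		Others:      make(map[int64]Position),
	}
	for _, s := range replica {
		if s.ID == flushedID {
			continue // already recorded as cp.Position
		}
		if s.Flushed {
			cp.Others[s.ID] = s.Current
		} else {
			cp.Others[s.ID] = s.FirstSeen
		}
	}
	return cp
}
```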

Depending on whether the save succeeds:

- If it succeeds, the flowgraph updates all segments' positions in the replica.
- If it fails:
  - For a gRPC failure (which surfaces only after many internal retries), crash immediately.
  - For any other failure, retry the save up to 10 times; if it still fails, crash.

### B. Recovery from a set of checkpoints

1. Collect the positions of all segments in this vchannel: `p1, p2, ..., pn`.

Proto design for `WatchDmChannelsRequest`:

```proto
message VchannelInfo {
  int64 collectionID = 1;
  string channelName = 2;
  msgpb.MsgPosition seek_position = 3;
  repeated SegmentInfo unflushedSegments = 4;
  repeated int64 flushedSegments = 5;
}

message WatchDmChannelsRequest {
  common.MsgBase base = 1;
  repeated VchannelInfo vchannels = 2;
}
```
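To make step 1 concrete, here is a Go sketch that collects `p1, ..., pn` from a `VchannelInfo`. The Go structs are hypothetical, hand-written mirrors of the proto messages above, reduced to the fields the sketch uses; the `Position` field on `SegmentInfo` and ordering positions by a bare timestamp are assumptions, not the generated Milvus types.

```go
package main

// Hypothetical mirrors of the proto messages, reduced to what this sketch uses.
type MsgPosition struct {
	ChannelName string
	Timestamp   uint64
}

type SegmentInfo struct {
	ID       int64
	Position MsgPosition // hypothetical: the segment's saved checkpoint
}

type VchannelInfo struct {
	CollectionID      int64
	ChannelName       string
	SeekPosition      MsgPosition
	UnflushedSegments []SegmentInfo
	FlushedSegments   []int64
}

// collectPositions returns the checkpoint position of every unflushed segment,
// keyed by segment ID, plus the earliest of them (the position to seek from).
// ok is false when the vchannel has no unflushed segments.
func collectPositions(info VchannelInfo) (positions map[int64]MsgPosition, earliest MsgPosition, ok bool) {
	positions = make(map[int64]MsgPosition, len(info.UnflushedSegments))
	for i, seg := range info.UnflushedSegments {
		positions[seg.ID] = seg.Position
		if i == 0 || seg.Position.Timestamp < earliest.Timestamp {
			earliest = seg.Position
		}
	}
	return positions, earliest, len(info.UnflushedSegments) > 0
}
```

Only unflushed segments are considered because, in this message, flushed segments are listed by ID only and carry no position.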

2. Filter msgPacks based on these positions.

![recovery](graphs/flowgraph_recovery_design.png)

Suppose we have segments `s1, s2, s3` with corresponding positions `p1, p2, p3`, where `p1` is the earliest and `p3` the latest:

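- Sort the positions in descending order: `p3, p2, p1`.
- Compute each segment's duplicate range, i.e. the msgPack positions `mp_px` that are replayed after seeking but are already covered by that segment's checkpoint: `s3 (p3 > mp_px > p1)`, `s2 (p2 > mp_px > p1)`, `s1 (empty)`.
- Seek from the earliest position, `p1` in this example.
- Then, for every msgPack consumed after seeking to `p1`, apply the following filter (pseudocode):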

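```go
// Pseudocode; mp means msgPack.
filterThreshold := recoveryTime
for mp := range seek(p1) {
    if mp.Position.EndTime < filterThreshold {
        if mp.Position < p3 {
            filter(s3) // drop s3's messages from mp
        }
        if mp.Position < p2 {
            filter(s2) // drop s2's messages from mp
        }
        // s1 needs no check: its duplicate range is empty.
    }
}
```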
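A runnable Go sketch of the same filter, generalized from `s2`/`s3` to any number of segments by keying checkpoints by segment ID. `Msg`, `MsgPack`, and `filterReplayed` are hypothetical, and a position is simplified to the pack's end timestamp, used both for the threshold check and for the comparison against each segment's checkpoint.

```go
package main

// Hypothetical, simplified types: real positions are MsgPosition values,
// not bare timestamps.
type Msg struct {
	SegmentID int64
	Ts        uint64
}

type MsgPack struct {
	EndTs uint64 // end position of the pack (one per pack, see A)
	Msgs  []Msg
}

// filterReplayed drops, from a msgPack read after seeking to the earliest
// checkpoint, the messages of every segment whose own checkpoint already
// covers this pack. checkpoints maps segment ID to its saved position p_i.
// recoveryTime is the filter threshold: packs ending at or after it pass
// through unfiltered.
func filterReplayed(mp MsgPack, checkpoints map[int64]uint64, recoveryTime uint64) []Msg {
	if mp.EndTs >= recoveryTime {
		return mp.Msgs
	}
	kept := make([]Msg, 0, len(mp.Msgs))
	for _, m := range mp.Msgs {
		if p, ok := checkpoints[m.SegmentID]; ok && mp.EndTs < p {
			continue // inside this segment's duplicate range: already persisted
		}
		kept = append(kept, m) // unknown segments are always kept
	}
	return kept
}
```

Assuming positions grow monotonically, `s1` needs no special case: no pack read after seeking to `p1` ends before `p1`, so its duplicate range is empty, matching the list above.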