Update DataNode recovery design (#5578)

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2021-06-03 19:42:34 +08:00 · 2021-06-03 19:42:34 +08:00 · 03a2052343
parent aa8a038305
commit 03a2052343
1 changed files with 33 additions and 52 deletions
--- a/docs/design_docs/datanode_recovery_design_0513_2021.md
+++ b/docs/design_docs/datanode_recovery_design_0513_2021.md
@@ -1,49 +1,53 @@
 # DataNode Recovery Design

-update: 5.21.2021, by [Goose](https://github.com/XuanYang-cn)
+update: 5.21.2021, by [Goose](https://github.com/XuanYang-cn)  
+update: 6.03.2021, by [Goose](https://github.com/XuanYang-cn)
+
+## What's DataNode?
+
+DataNode processes insert data and persists them.
+
+DataNode is based on flowgraphs, and each flowgraph handles exactly one vchannel. A vchannel is a FIFO log
+stream that carries ddl messages, dml messages, and timetick messages.
+
+One vchannel only contains dml messages of one collection. A collection consists of many segments, hence
+a vchannel contains dml messages of many segments. **Most importantly, the dml messages of the same segment 
+can appear anywhere in the vchannel.**
+
+## What does DataNode recovery really mean?
+
+DataNode is stateless, but a vchannel has state. DataNode's statelessness is guaranteed by DataService, which
+means the vchannel's state is maintained by DataService. So recovering a DataNode is no different from starting one.
+
+So what's DataNode's starting procedure?

 ## Objectives

-DataNode is stateless. It does whatever DataService tells, so recovery is not a difficult thing for datanode.
-Once datanode subscribes certain vchannels, it starts working till crash. So the key to recovery is consuming
-vchannels at the right position. What's processed no longer need to be processed again, what's not processed is
-the key.
-
-What's the line between processed or not for DataNode? Whether the data is flushed into persistent storage, which is
- the only job of DataNode. So recovering a DataNode needs the last positions of flushed data in every vchannel.
-Luckily, this information will be told by DataService, DataNode only worries about updating positions after flushing.
-
-There's more to fully recover a DataNode. DataNode replicates collection schema in memory to decode and encode
-data. Once it recovers to an older position of insert channels, it needs the collection schema snapshots from
-that exact position. Luckily again, the snapshots will be provided via MasterService.
-
-So DataNode needs to achieve the following 3 objectives.
-
-### 1. Service Registration
+### 1. Service Registration

 DataNode registers itself to Etcd after grpc server started, in *INITIALIZING* state.

-### 2. Service discovery
+### 2. Service Discovery

-DataNode discovers DataService and MasterService, in *HEALTHY* state.
+DataNode discovers DataService and MasterService, in *HEALTHY* and *IDLE* state.

-### 3. Recovery state
+### 3. Flowgraph Recovery

-After stage 1&2, DataNode is healthy but IDLE. DataNode starts to work until the following happens.
+The detailed design can be found at [datanode flowgraph recovery design](datanode_flowgraph_recovery_design_0604_2021.md).

- DataService info the vchannels and positions.
+After DataNode subscribes to a stateful vchannel, DataNode starts to work, or more specifically, flowgraph starts to work. 

- DataNode replicates the snapshots of collection schema at the positions to which these vchannel belongs.
+A vchannel is stateful because we don't want to process twice what's already processed. A "processed" message is one
+that is already persistent. In DataNode's terminology, a message is processed if it's been flushed.

- DataNode initializes flowgraphs and subscribes to these vchannels
+DataService passes the stateful vchannel info to DataNode through the RPC `WatchDmChannels`, so that DataNode won't
+process the same messages over and over again. So a flowgraph needs the ability to consume messages starting from the middle of a vchannel.

-There're some problems I haven't thought of.
+DataNode reports the vchannel states to DataService after each flush through the RPC `SaveBinlogPaths`, so that
+DataService keeps the vchannel states up to date.
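
Editor's sketch (not part of the commit): the position handshake described above can be illustrated with a small in-memory simulation. Everything here is hypothetical: `Message`, `DataService`, `WatchPosition`, `SavePosition`, and `RunFlowgraph` are illustrative names, the position is assumed to be a plain integer offset, and the real `WatchDmChannels` and `SaveBinlogPaths` RPCs carry richer payloads than this. It only shows that restarting from the last reported position skips flushed messages and replays unflushed ones.

```go
package main

import "fmt"

// Message is a hypothetical stand-in for one entry in a vchannel.
type Message struct {
	Position  int64 // offset of the message inside the vchannel (assumed to be an integer)
	SegmentID int64 // segment the dml message belongs to
}

// DataService is a hypothetical in-memory stand-in that keeps vchannel states.
type DataService struct {
	next map[string]int64 // vchannel name -> first position that is not flushed yet
}

func NewDataService() *DataService {
	return &DataService{next: map[string]int64{}}
}

// WatchPosition plays the role of the state handed out via WatchDmChannels.
func (s *DataService) WatchPosition(vchannel string) int64 {
	return s.next[vchannel]
}

// SavePosition plays the role of the state reported via SaveBinlogPaths.
func (s *DataService) SavePosition(vchannel string, next int64) {
	s.next[vchannel] = next
}

// RunFlowgraph consumes the stream from the position DataService hands out,
// "flushes" every batchSize messages, and stops after maxMessages to simulate
// a crash. It returns the positions it processed during this run.
func RunFlowgraph(s *DataService, vchannel string, stream []Message, batchSize, maxMessages int) []int64 {
	start := s.WatchPosition(vchannel)
	var processed []int64
	buffered := 0
	for _, msg := range stream {
		if msg.Position < start {
			continue // already flushed before the restart, skip it
		}
		if len(processed) == maxMessages {
			break // simulated crash: buffered but unflushed messages are lost
		}
		processed = append(processed, msg.Position)
		buffered++
		if buffered == batchSize {
			// Flush, then report the new position so a restart resumes here.
			s.SavePosition(vchannel, msg.Position+1)
			buffered = 0
		}
	}
	return processed
}

func main() {
	ds := NewDataService()
	// Messages of segments 1 and 2 are interleaved inside one vchannel.
	stream := []Message{{0, 1}, {1, 2}, {2, 1}, {3, 2}, {4, 1}, {5, 2}}

	first := RunFlowgraph(ds, "vchan-0", stream, 2, 3)   // crashes after 3 messages
	second := RunFlowgraph(ds, "vchan-0", stream, 2, 10) // restart, runs to the end

	fmt.Println(first, second, ds.WatchPosition("vchan-0"))
}
```

In this sketch the first run processes positions 0 to 2 but only flushes 0 and 1, so the restarted run begins at position 2 and the unflushed message is consumed again rather than lost.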

- What if DataService is unavailable, by network failure, DataService crashing, etc.
- What if MasterService is unavailable, by network failure, MasterService crashing, etc.
- What if MinIO is unavailable, by network failure.

-## TODO
+## Some of the following interface/proto designs are outdated and will be updated soon

 ### 1. DataNode no longer interacts with Etcd except service registering

@@ -51,15 +55,6 @@ There're some problems I haven't thought of.
    
   ![datanode_design](graphs/datanode_design_01.jpg)

-##### Auto-flush with manual-flush
-
-Manual-flush means that the segment is sealed, and DataNode is told to flush by DataService. The completion of
-manual-flush requires ddl and insert data both flushed, and a flush completed message will be published to
-msgstream by DataService. In this case, not only do binlog paths need to be stored, but also msg-positions.
-
-Auto-flush means that the segment isn't sealed, but the buffer of insert/ddl data in DataNode is full,
-DataNode automatically flushes these data. Those flushed binlogs' paths are buffered in DataNode, waiting for the next
-manual-flush and uploaded to DataService together.

 ##### DataService RPC Design

@@ -100,20 +95,6 @@ message DDLBinlogMeta {
 }
 ```
    
-#### **O1-2** DataNode registers itself to Etcd when started
-    
-### 2. DataNode gets start and end MsgPositions of all channels, and report to DataService after flushing
-
-   **O2-1**. Set start and end positions while publishing ddl messages. 0.5 Day
-
-   **O2-2**. [after **O4-1**] Get message positions in flowgraph and pass through nodes, report to DataService along with binlog paths. 1 Day
-
-   **O2-3**. [with **O1-1**] DataNode is no longer aware of whether a segment is flushed, so SegmentFlushed messages should be sent by DataService. 1 Day
-
-### 3. DataNode recovery
-
-   **O3-1**. Flowgraph is initialized after DataService called WatchDmChannels, flowgraph is healthy if MasterService is available. 2 Day
-
 ### 4. DataNode with collection with flowgraph with vchannel designs

 #### The winner