mirror of https://github.com/milvus-io/milvus.git
Update DataNode recovery design (#5578)
Signed-off-by: yangxuan <xuan.yang@zilliz.com>pull/5779/head
parent
aa8a038305
commit
03a2052343
|
@ -1,49 +1,53 @@
|
|||
# DataNode Recovery Design
|
||||
|
||||
update: 5.21.2021, by [Goose](https://github.com/XuanYang-cn)
|
||||
update: 5.21.2021, by [Goose](https://github.com/XuanYang-cn)
|
||||
update: 6.03.2021, by [Goose](https://github.com/XuanYang-cn)
|
||||
|
||||
## What's DataNode?
|
||||
|
||||
DataNode processes insert data and persists them.
|
||||
|
||||
DataNode is based on flowgraph, each flowgraph cares about only one vchannel. There're ddl messages, dml
|
||||
messages, and timetick messages inside one vchannel, FIFO log stream.
|
||||
|
||||
One vchannel only contains dml messages of one collection. A collection consists of many segments, hence
|
||||
a vchannel contains dml messsages of many segments. **Most importantly, the dml messages of the same segment
|
||||
can appear in anywhere in vchannel.**
|
||||
|
||||
## What does DataNode recovery really mean?
|
||||
|
||||
DataNode is stateless, but vchannel has states. DataNode's statelessness is guranteed by DataService, which
|
||||
means the vchannel's states is maintained by DataService. So DataNode recovery has no different as starting.
|
||||
|
||||
So what's DataNode's starting procedure?
|
||||
|
||||
## Objectives
|
||||
|
||||
DataNode is stateless. It does whatever DataService tells, so recovery is not a difficult thing for datanode.
|
||||
Once datanode subscribes certain vchannels, it starts working till crash. So the key to recovery is consuming
|
||||
vchannels at the right position. What's processed no longer need to be processed again, what's not processed is
|
||||
the key.
|
||||
|
||||
What's the line between processed or not for DataNode? Wether the data is flushed into persistent storage, which's
|
||||
the only job of DataNode. So recovering a DataNode needs the last positions of flushed data in every vchannels.
|
||||
Luckily, this information will be told by DataService, DataNode only worries about updating positions after flushing.
|
||||
|
||||
There's more to fully recover a DataNode. DataNode replicates collection schema in memory to decode and encode
|
||||
data. Once it recovers to an older position of insert channels, it needs the collection schema snapshots from
|
||||
that exactly position. Luckily again, the snapshots will be provided via MasterService.
|
||||
|
||||
So DataNode needs to achieve the following 3 objectives.
|
||||
|
||||
### 1. Service Registration
|
||||
### 1. Serveice Registration
|
||||
|
||||
DataNode registers itself to Etcd after grpc server started, in *INITIALIZING* state.
|
||||
|
||||
### 2. Service discovery
|
||||
### 2. Service Discovery
|
||||
|
||||
DataNode discovers DataService and MasterService, in *HEALTHY* state.
|
||||
DataNode discovers DataService and MasterService, in *HEALTHY* and *IDLE* state.
|
||||
|
||||
### 3. Recovery state
|
||||
### 3. Flowgraph Recovery
|
||||
|
||||
After stage 1&2, DataNode is healthy but IDLE. DataNode starts to work until the following happens.
|
||||
The detailed design can be found at [datanode flowgraph recovery design](datanode_flowgraph_recovery_design_0604_2021.md).
|
||||
|
||||
- DataService info the vchannels and positions.
|
||||
After DataNode subscribes to a stateful vchannel, DataNode starts to work, or more specifically, flowgraph starts to work.
|
||||
|
||||
- DataNode replicates the snapshots of collection schema at the positions to which these vchannel belongs.
|
||||
Vchannel is stateful because we don't want to process twice what's already processed. And a "processed" message means its
|
||||
already persistant. In DataNode's terminology, a message is processed if it's been flushed.
|
||||
|
||||
- DataNode initializes flowgraphs and subscribes to these vchannels
|
||||
DataService tells DataNode stateful vchannel infos through RPC `WatchDmChannels`, so that DataNode won't process
|
||||
the same messages over and over again. So flowgraph needs ability to comsume messages in the middle of a vchannel.
|
||||
|
||||
There're some problems I haven't thought of.
|
||||
DataNode tells DataService vchannel states after each flush through RPC `SaveBinlogPaths`, so that DataService
|
||||
keep the vchannel states update.
|
||||
|
||||
- What if DataService is unavaliable, by network failure, DataService crashing, etc.
|
||||
- What if MasterService is unavaliable, by network failure, MasterService crashing, etc.
|
||||
- What if MinIO is unavaliable, by network failure.
|
||||
|
||||
## TODO
|
||||
## Some of the following interface/proto designs are outdate, will be updated soon
|
||||
|
||||
### 1. DataNode no longer interacts with Etcd except service registering
|
||||
|
||||
|
@ -51,15 +55,6 @@ There're some problems I haven't thought of.
|
|||
|
||||

|
||||
|
||||
##### Auto-flush with manul-flush
|
||||
|
||||
Manul-flush means that the segment is sealed, and DataNode is told to flush by DataService. The completion of
|
||||
manul-flush requires ddl and insert data both flushed, and a flush completed message will be published to
|
||||
msgstream by DataService. In this case, not only do binlog paths need to be stored, but also msg-positions.
|
||||
|
||||
Auto-flush means that the segment isn't sealed, but the buffer of insert/ddl data in DataNode is full,
|
||||
DataNode automatically flushs these data. Those flushed binlogs' paths are buffered in DataNode, waiting for the next
|
||||
manul-flush and upload to DataServce together.
|
||||
|
||||
##### DataService RPC Design
|
||||
|
||||
|
@ -100,20 +95,6 @@ message DDLBinlogMeta {
|
|||
}
|
||||
```
|
||||
|
||||
#### **O1-2** DataNode registers itself to Etcd when started
|
||||
|
||||
### 2. DataNode gets start and end MsgPositions of all channels, and report to DataService after flushing
|
||||
|
||||
**O2-1**. Set start and end positions while publishing ddl messages. 0.5 Day
|
||||
|
||||
**O2-2**. [after **O4-1**] Get message positions in flowgraph and pass through nodes, report to DataService along with binlog paths. 1 Day
|
||||
|
||||
**O2-3**. [with **O1-1**] DataNode is no longer aware of whether if segment flushed, so SegmentFlushed messages should be sent by DataService. 1 Day
|
||||
|
||||
### 3. DataNode recovery
|
||||
|
||||
**O3-1**. Flowgraph is initialized after DataService called WatchDmChannels, flowgraph is healthy if MasterService is available. 2 Day
|
||||
|
||||
### 4. DataNode with collection with flowgraph with vchannel designs
|
||||
|
||||
#### The winner
|
||||
|
|
Loading…
Reference in New Issue