[skip ci]Improve milvus_timesync_en.md (#11623)

Signed-off-by: tumao <yan.wang@zilliz.com>
pull/11637/head
Tumao 2021-11-11 13:19:14 +08:00 committed by GitHub
parent 64232f1737
commit 5227db3fb4
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 19 additions and 13 deletions

View File

@ -3,6 +3,7 @@
`Time Synchronization` is the kernel part of Milvus 2.0; it affects all components of the system. This document describes the detailed design of `Time Synchronization`.
There are 2 kinds of events in Milvus 2.0:
- DDL events
- create collection
- drop collection
@ -17,20 +18,20 @@ All events have a `Timestamp` to indicate when this event occurs.
Suppose there are two users, `u1` and `u2`. They connect to Milvus and do the following operations at the respective timestamps.
| ts | u1 | u2 |
|-----------|----------------------|--------------|
| t0 | create Collection C0 | - |
| t2 | - | search on C0 |
| t5 | insert A1 into C0 | - |
| t7 | - | search on C0 |
| t10 | insert A2 | - |
| t12 | - | search on C0 |
| t15 | delete A1 from C0 | - |
| t17 | - | search on C0 |
| ts | u1 | u2 |
| --- | -------------------- | ------------ |
| t0 | create Collection C0 | - |
| t2 | - | search on C0 |
| t5 | insert A1 into C0 | - |
| t7 | - | search on C0 |
| t10 | insert A2 | - |
| t12 | - | search on C0 |
| t15 | delete A1 from C0 | - |
| t17 | - | search on C0 |
Ideally, `u2` expects `C0` to be empty at `t2`, and could only see `A1` at `t7`; while `u2` could see both `A1` and `A2` at `t12`, but only see `A2` at `t17`.
It's easy to achieve this in a `single-node` database. But for a `Distributed System`, such like `Milvus`, it's a little difficult; the following problems need to be solved.
It's easy to achieve this in a `single-node` database. But for a `Distributed System`, such as `Milvus`, it's a little difficult; the following problems need to be solved.
1. If `u1` and `u2` are on different nodes, and their time clock is not synchronized. To give an extreme example, suppose that the time of `u2` is 24 hours later than `u1`, then all the operations of `u1` can't been seen by `u2` until next day.
2. Network latency. If `u2` starts the `Search on C0` at `t17`, then how can it be guaranteed that all the `events` before `t17` have been processed? If the events of `delete A1 from C0` has been delayed due to the network latency, then it would lead to incorrect state: `u2` would see both `A1` and `A2` at `t17`.
@ -47,7 +48,7 @@ Like [TiKV](https://github.com/tikv/tikv), Milvus 2.0 provides `TSO` service. Al
service RootCoord {
...
rpc AllocTimestamp(AllocTimestampRequest) returns (AllocTimestampResponse) {}
...
...
}
message AllocTimestampRequest {
@ -61,6 +62,7 @@ message AllocTimestampResponse {
uint32 count = 3;
}
```
`Timestamp` is of type `uint64`, containing physical and logical parts.
This is the format of `Timestamp`
@ -70,8 +72,10 @@ This is the format of `Timestamp`
In an `AllocTimestamp` request, if `AllocTimestampRequest.count` is greater than `1`, `AllocTimestampResponse.timestamp` indicates the first available timestamp in the response.
## Time Synchronization
To understand the `Time Synchronization` better, let's introduce the data operation of Milvus 2.0 briefly.
Taking `Insert Operation` as an example.
- User can configure lots of `Proxy` to achieve load balancing, in `Milvus 2.0`
- User can use `SDK` to connect to any `Proxy`
- When `Proxy` receives `Insert` Request from `SDK`, it splits `InsertMsg` into different `MsgStream` according to the hash value of `Primary Key`
@ -82,6 +86,7 @@ Taking `Insert Operation` as an example.
![proxy insert](./graphs/timesync_proxy_insert_msg.png)
Based on the above information, we know that the `MsgStream` has the following characteristics:
- In `MsgStream`, `InsertMsg` from the same `Proxy` must be incremented in timestamp
- In `MsgStream`, `InsertMsg` from different `Proxy` have no relationship in timestamp
@ -96,6 +101,7 @@ So the second problem has turned into this: after reading a message from `MsgStr
For example, when reading a message with timestamp `110` produced by `Proxy2`, but the message with timestamp `80` produced by `Proxy1`, is still in the `MsgStream`. How can this situation be handled?
The following graph shows the core logic of `Time Synchronization System` in `Milvus 2.0`; it should solve the second problem.
- Each `Proxy` will periodically report its latest timestamp of every `MsgStream` to `RootCoord`; the default interval is `200ms`
- For each `Msgstream`, `Rootcoord` finds the minimum timestamp of all `Proxy` on this `Msgstream`, and inserts this minimum timestamp into the `Msgstream`
- When the consumer reads the timestamp inserted by the `RootCoord` on the `MsgStream`, it indicates that all messages with smaller timestamp have been consumed, so all actions that depend on this timestamp can be executed safely
@ -109,7 +115,7 @@ This is the `Proto` that used by `Proxy` to report timestamp to `RootCoord`:
service RootCoord {
...
rpc UpdateChannelTimeTick(internal.ChannelTimeTickMsg) returns (common.Status) {}
...
...
}
message ChannelTimeTickMsg {