[skip ci]Fix typo & grammar in design doc of drop collection (#8358)

Signed-off-by: Jael Gu <mengjia.gu@zilliz.com>
pull/8409/head
Jael Gu 2021-09-23 16:40:25 +08:00 committed by GitHub
parent 11de20c8e5
commit d284f8dbc8
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23
1 changed files with 24 additions and 28 deletions

View File

@ -1,5 +1,5 @@
# Drop Collection
`Milvus 2.0` use `Collection` to represent a set of data, like `Table` in traditional database. Users can create or drop `Collection`. Altering the `Schema` of `Collection` is not supported yet. This article introduces the execution path of `Drop Collection`, at the end of this article, you should know which components are involved in `Drop Collection`.
`Milvus 2.0` uses `Collection` to represent a set of data, like `Table` in traditional database. Users can create or drop `Collection`. Altering the `Schema` of `Collection` is not supported yet. This article introduces the execution path of `Drop Collection`. At the end of this article, you should know which components are involved in `Drop Collection`.
The execution flow of `Drop Collection` is shown in the following figure:
@ -22,7 +22,7 @@ message DropCollectionRequest {
}
```
2. When received the `DropCollection` request, the `Proxy` would wraps this request into `DropCollectionTask`, and pushs this task into `DdTaskQueue` queue. After that, `Proxy` would call method of `WatiToFinish` to wait until the task finished.
2. Once the `DropCollection` request is received, the `Proxy` would wrap this request into `DropCollectionTask`, and push this task into `DdTaskQueue` queue. After that, `Proxy` would call method of `WatiToFinish` to wait until the task is finished.
```go
type task interface {
TraceCtx() context.Context
@ -52,9 +52,9 @@ type DropCollectionTask struct {
}
```
3. There is a backgroud service in `Proxy`, this service would get the `DropCollectionTask` from `DdTaskQueue`, and executes it in three phases.
3. There is a background service in `Proxy`, this service would get the `DropCollectionTask` from `DdTaskQueue`, and execute it in three phases:
- `PreExecute`, do some static checking at this phase, such as check if `Collection Name` is legal etc.
- `Execute`, at thie phase, `Proxy` would send `DropCollection` request to `RootCoord` via `Grpc`,and wait the reponse, the `proto` is defined as follow:
- `Execute`, at this phase, `Proxy` would send `DropCollection` request to `RootCoord` via `Grpc`, and wait the response, the `proto` is defined as below:
```proto
service RootCoord {
...
@ -64,9 +64,9 @@ type DropCollectionTask struct {
...
}
```
- `PostExecute`, `Proxy` would delete the `Collection`'s meta from global meta table at this phase.
- `PostExecute`, `Proxy` would delete `Collection`'s meta from global meta table at this phase.
4. `RootCoord` would wraps the `DropCollection` request into `DropCollectionReqTask`, and then call function `executeTask`. `executeTask` would return until the `context` is done or `DropCollectionReqTask.Execute` returned.
4. `RootCoord` would wrap the `DropCollection` request into `DropCollectionReqTask`, and then call function `executeTask`. `executeTask` would return until the `context` is done or `DropCollectionReqTask.Execute` is returned.
```go
type reqTask interface {
Ctx() context.Context
@ -81,13 +81,13 @@ type DropCollectionReqTask struct {
}
```
5. Firstly, `RootCoord` would delete `Collection`'s meta from `metaTable`, including `schema`,`partition`, `segment`,`index`, all of these delete operations are committed in one transaction.
5. Firstly, `RootCoord` would delete `Collection`'s meta from `metaTable`, including `schema`,`partition`, `segment`,`index`. All of these delete operations are committed in one transaction.
6. After `Collection`'s meta has been deleted from `metaTable`, `Milvus` would consider this collection has been deleted successfully.
7. `RootCoord` would alloc a timestamp from `TSO` before deleting `Collection`'s meta from `metaTable`, and this timestamp is considered as the point when the collection was deleted.
7. `RootCoord` would alloc a timestamp from `TSO` before deleting `Collection`'s meta from `metaTable`. This timestamp is considered as the point when the collection was deleted.
8. `RoooCoord` would send a message of `DropCollectionRequest` into `MsgStream`, and other components, who has subscribe to the `MsgStream`, would be notified. The `Proto` of `DropCollectionRequest` is defined as follow:
8. `RootCoord` would send a message of `DropCollectionRequest` into `MsgStream`. Thus other components, who have subscribed to the `MsgStream`, would be notified. The `Proto` of `DropCollectionRequest` is defined as below:
```proto
message DropCollectionRequest {
common.MsgBase base = 1;
@ -101,7 +101,7 @@ message DropCollectionRequest {
9. After these operations, `RootCoord` would update internal timestamp.
10. Then `RootCoord` would start a `ReleaseCollection` request to `QueryCoord` via `Grpc` , notify `QueryCoord` to release all the resouces that related to this `Collection`. This `Grpc` request is done in another `goroutine`, so it would not block the main thread. The `proto` is defined as follow:
10. Then `RootCoord` would start a `ReleaseCollection` request to `QueryCoord` via `Grpc` , notify `QueryCoord` to release all resources that related to this `Collection`. This `Grpc` request is done in another `goroutine`, so it would not block the main thread. The `proto` is defined as follows:
```proto
service QueryCoord {
...
@ -119,7 +119,7 @@ message ReleaseCollectionRequest {
}
```
11. At last, `RootCoord` would send `InvalidateCollectionMetaCache` request to each `Proxy`, notify `Proxy` to remove `Collection`'s meta, the `proto` is defined as follow:
11. At last, `RootCoord` would send `InvalidateCollectionMetaCache` request to each `Proxy`, notify `Proxy` to remove `Collection`'s meta. The `proto` is defined as follows:
```proto
service Proxy {
...
@ -136,16 +136,16 @@ message InvalidateCollMetaCacheRequest {
}
```
12. The execution flow of `QueryCoord.ReleaseCollection` is shown in the follwing figure.
12. The execution flow of `QueryCoord.ReleaseCollection` is shown in the following figure:
![releae_collection](./graphs/dml_release_collection.png)
![release_collection](./graphs/dml_release_collection.png)
13. `QueryCoord` would wraps `ReleaseCollection` into `ReleaseCollectionTask`, and push the task into `TaskScheduler`
13. `QueryCoord` would wrap `ReleaseCollection` into `ReleaseCollectionTask`, and push the task into `TaskScheduler`
14. There is a backgroud service in `QueryCoord`, this service would get the `ReleaseCollectionTask` from `TaskScheduler`, and executes it in three phases.
14. There is a background service in `QueryCoord`. This service would get the `ReleaseCollectionTask` from `TaskScheduler`, and execute it in three phases:
- `PreExecute`, `ReleaseCollectionTask` would only print debug log at this phase.
- `Execute`, there are two jobs at this phase:
- send a `ReleaseDQLMessageStream` request to `RootCoord` via `Grpc`, `RootCoord` would redirect the `ReleaseDQLMessageStream` request to each `Proxy`, notify the `Proxy` that not processing any message of this `Collection` anymore. The `proto` is defined as follow:
- send a `ReleaseDQLMessageStream` request to `RootCoord` via `Grpc`, `RootCoord` would redirect the `ReleaseDQLMessageStream` request to each `Proxy`, and notify the `Proxy` that stop processing any message of this `Collection` anymore. The `proto` is defined as follows:
```proto
message ReleaseDQLMessageStreamRequest {
common.MsgBase base = 1;
@ -153,7 +153,7 @@ message InvalidateCollMetaCacheRequest {
int64 collectionID = 3;
}
```
- send a `ReleaseCollection` request to each `QueryNode` via `Grpc`, notify the `QueryNode` to release all the resources related to this `Collection`, including `Index`, `Segment`, `FlowGraph`, etc. `QueryNode` would no longer read any message from this `Collection`'s `MsgStream` anymore
- send a `ReleaseCollection` request to each `QueryNode` via `Grpc`, and notify the `QueryNode` to release all the resources related to this `Collection`, including `Index`, `Segment`, `FlowGraph`, etc. `QueryNode` would no longer read any message from this `Collection`'s `MsgStream` anymore
```proto
service QueryNode {
...
@ -172,18 +172,14 @@ message InvalidateCollMetaCacheRequest {
```
- `PostExecute`, `ReleaseCollectionTask` would only print debug log at this phase.
15. After these operations, `QueryCoord` would send `ReleaseCollection`'s reponse to `RootCoord
15. After these operations, `QueryCoord` would send `ReleaseCollection`'s response to `RootCoord`.
16. At `Step 8`, `RoooCoord` has sent a message of `DropCollectionRequest` into `MsgStream`, `DataNode` would subscribe this `MsgStream`, so `DataNode` would be notified to released the resources. The execution flow is shown in the following figure.
16. At `Step 8`, `RootCoord` has sent a message of `DropCollectionRequest` into `MsgStream`. `DataNode` would subscribe this `MsgStream`, so that it would be notified to release related resources. The execution flow is shown in the following figure.
![releae_collection](./graphs/dml_release_flow_graph_on_data_node.png)
![release_collection](./graphs/dml_release_flow_graph_on_data_node.png)
17. In `DataNode`, each `MsgStream` will have a `FlowGraph`, all the messages are processed by that `FlowGraph`. When the `DataNode` receives the message of `DropCollectionRequest`, `DataNode` would notify `BackouGroundGC`, which is a background service on `DataNode`, to release the resouces.
17. In `DataNode`, each `MsgStream` will have a `FlowGraph`, which processes all messages. When the `DataNode` receives the message of `DropCollectionRequest`, `DataNode` would notify `BackGroundGC`, which is a background service on `DataNode`, to release resources.
*Notes*:
1. Currently, the `DataCoord` has not response to the `DropCollection`, so the `Collection`'s `segment meta` are still exist in the `DataCoord`'s `metaTable`, and the `Binlog` files belongs to this `Collection` are still exist on the persistent storage.
2. Currently, the `IndexCoord` has not response to the `DropCollection`, so the `Collection`'s `index file` are still exist on the persistent storage.
1. Currently, the `DataCoord` doesn't have response to the `DropCollection`. So the `Collection`'s `segment meta` still exists in the `DataCoord`'s `metaTable`, and the `Binlog` files belonging to this `Collection` still exist in the persistent storage.
2. Currently, the `IndexCoord` doesn't have response to the `DropCollection`. So the `Collection`'s `index file` still exists in the persistent storage.