# Create Collection

`Milvus 2.0` uses `Collection` to represent a set of data, like `Table` in a traditional database. Users can create or drop a `Collection`.

This article introduces the execution path of `CreateCollection`. By the end of this article, you should know which components are involved in `CreateCollection`.

The execution flow of `CreateCollection` is shown in the following figure:

![create_collection](./graphs/dml_create_collection.png)

1. Firstly, the `SDK` sends a `CreateCollection` request to `Proxy` via `Grpc`; the `proto` is defined as follows (a hedged client-side sketch follows the definition):

```proto
service MilvusService {
  ...

  rpc CreateCollection(CreateCollectionRequest) returns (common.Status) {}

  ...
}

message CreateCollectionRequest {
  // Not useful for now
  common.MsgBase base = 1;
  // Not useful for now
  string db_name = 2;
  // The unique collection name in milvus. (Required)
  string collection_name = 3;
  // The serialized `schema.CollectionSchema`. (Required)
  bytes schema = 4;
  // Once set, no modification is allowed. (Optional)
  // https://github.com/milvus-io/milvus/issues/6690
  int32 shards_num = 5;
}

message CollectionSchema {
  string name = 1;
  string description = 2;
  bool autoID = 3; // deprecated later, keep compatible with c++ part now
  repeated FieldSchema fields = 4;
}
```
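
To make the request shape concrete, here is a minimal client-side sketch in Go of how such a request could be assembled: it fills a `schemapb.CollectionSchema`, serializes it with `proto.Marshal`, and puts the bytes into the `schema` field. The import paths and the field values (`book`, `embedding`, `ShardsNum: 2`) are assumptions for illustration, not part of this document, and a real schema would also set the vector dimension via the field's type params.

```go
// A minimal sketch, not official SDK code. It assumes the generated protobuf
// packages below match the proto definitions shown above; import paths and
// field values are illustrative only.
package example

import (
	"github.com/golang/protobuf/proto"

	"github.com/milvus-io/milvus/internal/proto/milvuspb"
	"github.com/milvus-io/milvus/internal/proto/schemapb"
)

func buildCreateCollectionRequest() (*milvuspb.CreateCollectionRequest, error) {
	// Describe the collection: a primary key field plus a vector field.
	schema := &schemapb.CollectionSchema{
		Name:        "book",
		Description: "demo collection",
		Fields: []*schemapb.FieldSchema{
			{Name: "book_id", IsPrimaryKey: true, DataType: schemapb.DataType_Int64},
			{Name: "embedding", DataType: schemapb.DataType_FloatVector},
		},
	}

	// The request carries the schema as serialized bytes (field 4 above).
	schemaBytes, err := proto.Marshal(schema)
	if err != nil {
		return nil, err
	}
	return &milvuspb.CreateCollectionRequest{
		DbName:         "default",
		CollectionName: schema.Name,
		Schema:         schemaBytes,
		ShardsNum:      2,
	}, nil
}
```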

2. When receiving the `CreateCollection` request, `Proxy` would wrap this request into a `CreateCollectionTask` and push the task into the `DdTaskQueue`. After that, `Proxy` would call the task's `WaitToFinish` method and block until the task is finished (a sketch of how `WaitToFinish`/`Notify` can be wired together follows the code below).

```go
type task interface {
	TraceCtx() context.Context
	ID() UniqueID       // return ReqID
	SetID(uid UniqueID) // set ReqID
	Name() string
	Type() commonpb.MsgType
	BeginTs() Timestamp
	EndTs() Timestamp
	SetTs(ts Timestamp)
	OnEnqueue() error
	PreExecute(ctx context.Context) error
	Execute(ctx context.Context) error
	PostExecute(ctx context.Context) error
	WaitToFinish() error
	Notify(err error)
}

type createCollectionTask struct {
	Condition
	*milvuspb.CreateCollectionRequest
	ctx       context.Context
	rootCoord types.RootCoord
	result    *commonpb.Status
	schema    *schemapb.CollectionSchema
}
```
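
The embedded `Condition` is what connects `Notify` (called by the background service) to `WaitToFinish` (called by the request handler). Below is a hedged sketch of one possible channel-based implementation; the actual `Condition` in `Proxy` may differ in its details.

```go
// Hedged sketch of a channel-based Condition; the real implementation in
// Proxy may differ in details.
type taskCondition struct {
	done chan error      // carries the task's final error (or nil)
	ctx  context.Context // lets the waiter give up early
}

func newTaskCondition(ctx context.Context) *taskCondition {
	return &taskCondition{done: make(chan error, 1), ctx: ctx}
}

// WaitToFinish blocks until Notify is called or the context is cancelled.
func (c *taskCondition) WaitToFinish() error {
	select {
	case <-c.ctx.Done():
		return c.ctx.Err()
	case err := <-c.done:
		return err
	}
}

// Notify hands the task's final error back to the goroutine blocked in WaitToFinish.
func (c *taskCondition) Notify(err error) {
	c.done <- err
}
```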

3. There is a background service in `Proxy`; this service gets the `CreateCollectionTask` from the `DdTaskQueue` and executes it in three phases (a simplified sketch of this loop follows the list below):

- `PreExecute`, do some static checking at this phase, such as checking whether `Collection Name` and `Field Name` are legal, whether there are duplicate fields, etc.

- `Execute`, at this phase, `Proxy` would send the `CreateCollection` request to `RootCoord` via `Grpc` and wait for the response; the `proto` is defined as follows:

```proto
service RootCoord {
  ...

  rpc CreateCollection(milvus.CreateCollectionRequest) returns (common.Status) {}

  ...
}
```

- `PostExecute`, `CreateCollectionTask` does nothing at this phase and returns directly.
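
The loop below is a simplified sketch of how such a background service can drive the three phases and then wake up the caller. It builds on the `task` interface shown earlier; the `taskQueue` interface and its `PopTask` method are hypothetical placeholders standing in for `DdTaskQueue`, and the real scheduler also handles timestamps, tracing, and metrics.

```go
// Simplified sketch of the DDL background loop; taskQueue/PopTask are
// hypothetical placeholders standing in for Proxy's DdTaskQueue.
type taskQueue interface {
	PopTask() task // blocks until a task has been enqueued
}

func ddLoop(ctx context.Context, queue taskQueue) {
	for {
		select {
		case <-ctx.Done():
			return
		default:
		}

		t := queue.PopTask()

		// Run the three phases in order, stopping at the first error.
		err := t.PreExecute(ctx)
		if err == nil {
			err = t.Execute(ctx)
		}
		if err == nil {
			err = t.PostExecute(ctx)
		}

		// Wake up the goroutine blocked in WaitToFinish.
		t.Notify(err)
	}
}
```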

4. `RootCoord` would wrap the `CreateCollection` request into a `CreateCollectionReqTask` and then call the function `executeTask`. `executeTask` would not return until the `context` is done or `CreateCollectionReqTask.Execute` returns (see the sketch after the code below).

```go
type reqTask interface {
	Ctx() context.Context
	Type() commonpb.MsgType
	Execute(ctx context.Context) error
	Core() *Core
}

type CreateCollectionReqTask struct {
	baseReqTask
	Req *milvuspb.CreateCollectionRequest
}
```
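
A hedged sketch of how `executeTask` can satisfy both exit conditions is to run `Execute` in its own goroutine and select on the task's context; the actual `RootCoord` code may structure this differently.

```go
// Hedged sketch of executeTask using the reqTask interface above: return as
// soon as either the task finishes or its context is done.
func executeTask(t reqTask) error {
	errChan := make(chan error, 1)
	go func() {
		errChan <- t.Execute(t.Ctx())
	}()

	select {
	case <-t.Ctx().Done():
		// The caller gave up (timeout or cancellation) before Execute finished.
		return t.Ctx().Err()
	case err := <-errChan:
		return err
	}
}
```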

5. `CreateCollectionReqTask.Execute` would allocate the `CollectionID` and the default `PartitionID`, set the `Virtual Channel` and `Physical Channel` used by `MsgStream`, and then write the `Collection`'s meta into `metaTable`.
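
As a very rough illustration of this step, the sketch below allocates the two IDs, derives per-shard virtual/physical channel names, and writes the meta. Every name in it (`idAllocator`, `collectionMeta`, the channel-name format) is a hypothetical placeholder, not the actual `RootCoord` API.

```go
// Hypothetical sketch of step 5; all helper types and the channel-name
// format are placeholders for illustration only.
type idAllocator interface {
	AllocOne(ctx context.Context) (int64, error)
}

type collectionMeta interface {
	AddCollection(collID, partID int64, req *milvuspb.CreateCollectionRequest,
		vchans, pchans []string) error
}

func createCollectionSketch(ctx context.Context, alloc idAllocator,
	meta collectionMeta, req *milvuspb.CreateCollectionRequest) error {
	collID, err := alloc.AllocOne(ctx) // CollectionID
	if err != nil {
		return err
	}
	partID, err := alloc.AllocOne(ctx) // default PartitionID
	if err != nil {
		return err
	}

	// One virtual channel per shard, each bound to a physical channel.
	shards := int(req.ShardsNum)
	vchans := make([]string, 0, shards)
	pchans := make([]string, 0, shards)
	for i := 0; i < shards; i++ {
		pchan := fmt.Sprintf("rootcoord-dml_%d", i) // hypothetical naming scheme
		pchans = append(pchans, pchan)
		vchans = append(vchans, fmt.Sprintf("%s_%dv%d", pchan, collID, i))
	}

	// Writing the meta is what makes the collection exist (see step 6).
	return meta.AddCollection(collID, partID, req, vchans, pchans)
}
```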

6. After the `Collection`'s meta has been written into `metaTable`, `Milvus` considers the collection to have been created successfully.

7. `RootCoord` would allocate a timestamp from `TSO` before writing the `Collection`'s meta into `metaTable`, and this timestamp is considered the point at which the collection was created.
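
In code terms the ordering is roughly the following; the `tsoAllocator` interface and the `save` callback are placeholders for illustration only, not the actual `RootCoord` API.

```go
// Hypothetical sketch of step 7: take the timestamp from TSO first, then
// persist it together with the collection meta, so the same timestamp also
// records when the collection was created.
type tsoAllocator interface {
	AllocOne(ctx context.Context) (uint64, error)
}

func writeMetaWithCreateTime(ctx context.Context, tso tsoAllocator,
	save func(createTS uint64) error) error {
	createTS, err := tso.AllocOne(ctx) // allocated before the meta write
	if err != nil {
		return err
	}
	return save(createTS) // persist the meta with createTS as the creation time
}
```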

8. At last, `RootCoord` would send a `CreateCollectionRequest` message into the `MsgStream`, and other components that have subscribed to the `MsgStream` would be notified. The `proto` of this `CreateCollectionRequest` is defined as follows (a hedged broadcast sketch follows the definition):

```proto
message CreateCollectionRequest {
  common.MsgBase base = 1;
  string db_name = 2;
  string collectionName = 3;
  string partitionName = 4;
  int64 dbID = 5;
  int64 collectionID = 6;
  int64 partitionID = 7;
  // `schema` is the serialized `schema.CollectionSchema`
  bytes schema = 8;
  repeated string virtualChannelNames = 9;
  repeated string physicalChannelNames = 10;
}
```
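
A hedged sketch of this broadcast step: serialize the internal `CreateCollectionRequest` and publish it on the DML stream so that subscribers are notified. The `broadcaster` interface is a simplified placeholder, not the actual `msgstream` package API, and the `internalpb` package name is an assumption about where the generated message lives.

```go
// Hedged sketch of step 8; broadcaster is a placeholder for the MsgStream
// producer, and internalpb is assumed to hold the message defined above.
type broadcaster interface {
	Broadcast(ctx context.Context, body []byte) error
}

func notifyCreateCollection(ctx context.Context, stream broadcaster,
	req *internalpb.CreateCollectionRequest) error {
	body, err := proto.Marshal(req) // serialize the CreateCollectionRequest message
	if err != nil {
		return err
	}
	// Subscribers of the MsgStream will consume this message.
	return stream.Broadcast(ctx, body)
}
```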

9. After the above operations, `RootCoord` would update its internal timestamp and return, so `Proxy` would get the response.

_Notes:_

1. In `Proxy`, all `DDL` requests will be wrapped into a `task` and pushed into the `DdTaskQueue`.
   A background service will read a new `task` from the `DdTaskQueue` only when the previous one is finished,
   so all the `DDL` requests are executed serially on `Proxy`.

2. In `RootCoord`, all `DDL` requests will be wrapped into a `reqTask`, but there is no task queue, so the `DDL` requests will be executed in parallel on `RootCoord`.