5.0 KiB
Create Collection
Milvus 2.0
uses Collection
to represent a set of data, like Table
in a traditional database. User can create or drop Collection
.
This article introduces the execution path of CreateCollection
, at the end of this article, you should know which components are involved in CreateCollection
.
The execution flow of CreateCollection
is shown in the following figure:
- Firstly,
SDK
starts aCreateCollection
request toProxy
viaGrpc
, theproto
is defined as follows:
service MilvusService {
...
rpc CreateCollection(CreateCollectionRequest) returns (common.Status) {}
...
}
message CreateCollectionRequest {
// Not useful for now
common.MsgBase base = 1;
// Not useful for now
string db_name = 2;
// The unique collection name in milvus.(Required)
string collection_name = 3;
// The serialized `schema.CollectionSchema`(Required)
bytes schema = 4;
// Once set, no modification is allowed (Optional)
// https://github.com/milvus-io/milvus/issues/6690
int32 shards_num = 5;
}
message CollectionSchema {
string name = 1;
string description = 2;
bool autoID = 3; // deprecated later, keep compatible with c++ part now
repeated FieldSchema fields = 4;
}
- When receiving the
CreateCollection
request,Proxy
would wrap this request intoCreateCollectionTask
, and pushes this task intoDdTaskQueue
queue. After that,Proxy
would callWaitToFinish
method to wait until the task is finished.
type task interface {
TraceCtx() context.Context
ID() UniqueID // return ReqID
SetID(uid UniqueID) // set ReqID
Name() string
Type() commonpb.MsgType
BeginTs() Timestamp
EndTs() Timestamp
SetTs(ts Timestamp)
OnEnqueue() error
PreExecute(ctx context.Context) error
Execute(ctx context.Context) error
PostExecute(ctx context.Context) error
WaitToFinish() error
Notify(err error)
}
type createCollectionTask struct {
Condition
*milvuspb.CreateCollectionRequest
ctx context.Context
rootCoord types.RootCoord
result *commonpb.Status
schema *schemapb.CollectionSchema
}
-
There is a background service in
Proxy
, this service would get theCreateCollectionTask
fromDdTaskQueue
, and execute it in three phases.PreExecute
, do some static checking at this phase, such as check ifCollection Name
andField Name
are legal, if there are duplicate columns, etc.Execute
, at this phase,Proxy
would sendCreateCollection
request toRootCoord
viaGrpc
, and wait for response, theproto
is defined as follows:
service RootCoord { ... rpc CreateCollection(milvus.CreateCollectionRequest) returns (common.Status){} ... }
PostExecute
,CreateCollectonTask
does nothing at this phase, and return directly.
-
RootCoord
would wrap theCreateCollection
request intoCreateCollectionReqTask
, and then call functionexecuteTask
.executeTask
would return until thecontext
is done orCreateCollectionReqTask.Execute
is returned.
type reqTask interface {
Ctx() context.Context
Type() commonpb.MsgType
Execute(ctx context.Context) error
Core() *Core
}
type CreateCollectionReqTask struct {
baseReqTask
Req *milvuspb.CreateCollectionRequest
}
-
CreateCollectionReqTask.Execute
would allocCollecitonID
and defaultPartitionID
, and setVirtual Channel
andPhysical Channel
, which are used byMsgStream
, then write theCollection
's meta intometaTable
-
After
Collection
's meta written intometaTable
,Milvus
would consider this collection has been created successfully. -
RootCoord
would alloc a timestamp fromTSO
before writingCollection
's meta intometaTable
, and this timestamp is considered as the point when the collection was created -
At last
RootCoord
will send a message ofCreateCollectionRequest
intoMsgStream
, and other components, who have subscribed to theMsgStream
, would be notified. TheProto
ofCreateCollectionRequest
is defined as follows:
message CreateCollectionRequest {
common.MsgBase base = 1;
string db_name = 2;
string collectionName = 3;
string partitionName = 4;
int64 dbID = 5;
int64 collectionID = 6;
int64 partitionID = 7;
// `schema` is the serialized `schema.CollectionSchema`
bytes schema = 8;
repeated string virtualChannelNames = 9;
repeated string physicalChannelNames = 10;
}
- After the above operations,
RootCoord
would update the internal timestamp and return, soProxy
would get the response.
Notes:
-
In
Proxy
, allDDL
requests will be wrapped intotask
, and push thetask
intoDdTaskQueue
. A background service will read a newtask
fromDdTaskQueue
only when the previous one is finished. So all theDDL
requests are executed serially onProxy
. -
In
RootCoord
, allDDL
requests will be wrapped intoreqTask
, but there is no task queue, so theDDL
requests will be executed in parallel onRootCoord
.