OSPP 2024 project:
https://summer-ospp.ac.cn/org/prodetail/247410235?list=org&navpage=org
Solutions:
- parser (planparserv2)
- add CallExpr in planparserv2/Plan.g4
- update parser_visitor and show_visitor
- grpc protobuf
- add CallExpr in plan.proto
- execution (`core/src/exec`)
- add `CallExpr` `ValueExpr` and `ColumnExpr` (both logical and
physical) for function call and function parameters
- function factory (`core/src/exec/expression/function`)
- create a global hashmap when starting milvus (see server.go)
- the global hashmap stores function signatures and their function
pointers, the CallExpr in execution engine can get the function pointer
by function signature.
- custom functions
- empty(string)
- starts_with(string, string)
- add cpp/go unittests and E2E tests
closes: #36559
Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
Relatedt #36887
DirectFoward streaming delete will cause memory usage explode if the
segments number was large. This PR add batching delete API and using it
for direct forward implementation.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #35576
This pr is to cover those cases when queryHook optimize search params
and make the result size insufficient, add retry search mechanism and
add related metrics for alarming.
---------
Signed-off-by: chasingegg <chao.gao@zilliz.com>
issue: #36686
This pr will remove pre-marking segments as L2 during clustering
compaction in version 2.5, and ensure compatibility with version 2.4.
The core of this change is to **ensure that the many-to-many lineage
derivation logic is correct, making sure that both the parent and child
cannot simultaneously exist in the target segment view.**
feature:
- Clustering compaction no longer marks the input segments as L2.
- Add a new field `is_invisible` to `segmentInfo`, and mark segments
that have completed clustering but have not yet built indexes as
`is_invisible` to prevent them from being loaded prematurely."
- Do not mark the input segment as `Dropped` before the clustering
compaction is completed.
- After compaction fails, only the result segment needs to be marked as
Dropped.
compatibility:
- If the upgraded task has not failed, there are no compatibility
issues.
- If the status after the upgrade is `MetaSaved`, then skip the stats
task based on whether TmpSegments is empty.
- If the failure occurs before `MetaSaved`:
- there are no ResultSegments, and InputSegments have not been marked as
dropped yet.
- the level of input segments need to revert to LastLevel
- If the failure occurs after `MetaSaved`:
- ResultSegments have already been generated, and InputSegments have
been marked as Dropped. At this point, simply make the ResultSegments
visible.
- the level of ResultSegments needs to be set to L1(in order to
participate in mixCompaction)
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: #36464
This PR enable balance on querynode with different mem capacity, for
query node which has more mem capactity will be assigned more records,
and query node with the largest difference between assignedScore and
currentScore will have a higher priority to carry the new segment.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Native support for Google cloud storage using the Google Cloud Storage
libraries. Authentication is performed using GCS service account
credentials JSON.
Currently, Milvus supports Google Cloud Storage using S3-compatible APIs
via the AWS SDK. This approach has the following limitations:
1. Overhead: Translating requests between S3-compatible APIs and GCS can
introduce additional overhead.
2. Compatibility Limitations: Some features of the original S3 API may
not fully translate or work as expected with GCS.
To address these limitations, This enhancement is needed.
Related Issue: #36212
issue: #35821
After collection loaded, if we need to increase/decrease collection's
replica, we need to release and load it again.
milvus offers 4 solution to update loaded collection's replica, this PR
aims to dynamic change the replica number without release, and after
replica number changed, milvus will execute load replica or release
replica in async, and the replica loaded status can be checked by
getReplicas API.
Notice that if set too much replicas than querynode can afford,the new
replica won't be loaded successfully until enough querynode joins.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
This PR introduce stats task for import:
1. Define new `Stats` and `IndexBuilding` states for importJob
2. Add new stats step to the import process: trigger the stats task and
wait for its completion
3. Abort stats task if import job failed
issue: https://github.com/milvus-io/milvus/issues/33744
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #33744
This PR includes the following changes:
1. Added a new task type to the task scheduler in datacoord: stats task,
which sorts segments by primary key.
2. Implemented segment sorting in indexnode.
3. Added a new field `FieldStatsLog` to SegmentInfo to store token index
information.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Prevent segments to be written larger than maxSize * expansionRate
See also: #35584
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
---------
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #33285
- move streaming related proto into pkg.
- add v2 message type and change flush message into v2 message.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- implement streaming service client.
- implement producing and consuming service client by streaming coord
client and streaming node client.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #35087
after qc restarts, and target is not ready yet, if dist_handler try to
update segment dist, it will set legacy level to l0 segment, which may
cause l0 segment be moved to other node, cause search/query failed.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #34746
This PR add segment level field in response of
`GetPersistentSegmentInfo` and `GetQuerySegmentInfo`
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
1. Move the common modules of streamingNode and dataNode to flushcommon
2. Add new GetVChannels interface for rootcoord
issue: https://github.com/milvus-io/milvus/issues/33285
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
This PR removes the dependency of compaction on the ID allocator by
pre-allocating the logID and segmentID.
issue: https://github.com/milvus-io/milvus/issues/33957
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #33285
- add idAlloc interface
- fix binary unsafe bug for message
- fix service discovery lost when repeated address with different server
id
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #33285
- add two grpc resolver (by session and by streaming coord assignment
service)
- add one grpc balancer (by serverID and roundrobin)
- add lazy conn to avoid block by first service discovery
- add some utility function for streaming service
Signed-off-by: chyezh <chyezh@outlook.com>
related: #33544
mainly changes in three aspects:
1. enable setting group_size for group by function
2. separate normal reduce and group by reduce
3. eleminate uncessary padding in search result for reducing
Signed-off-by: MrPresent-Han <chun.han@gmail.com>
Co-authored-by: MrPresent-Han <chun.han@gmail.com>
issue: #33285
- implement producing and consuming server of message
- implement management operation for streaming node server
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #34304
cosine is more widely used in float vectors, and cosine and hamming
distance are 'metrics' which have good geometric properties
Signed-off-by: chasingegg <chao.gao@zilliz.com>