Commit Graph

20660 Commits (10kcp)

Author SHA1 Message Date
congqixia c8ba682aaf
enhance: [2.4] Use cancel label for ctx canceled storage op (#37468) (#37491)
Cherry-pick from master
pr: #37468

Previously, the failed label was used for canceled storage operations,
which could trigger false alarms when a user cancels a load operation or
similar. This PR uses the cancel label in such cases.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-07 12:38:26 +08:00
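
Note on c8ba682aaf: the labeling rule described in this commit can be sketched as below. This is a minimal illustration, not the Milvus implementation; the label values and the names `statusLabel` and `demoLabels` are assumptions made for the example. It relies only on the standard `errors.Is` check against `context.Canceled`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// Hypothetical metric label values; the real label names are not shown in the log.
const (
	labelSuccess = "success"
	labelCancel  = "cancel"
	labelFail    = "fail"
)

// statusLabel picks the metric label for a finished storage operation.
// A canceled context is reported separately so it is not counted as a failure.
func statusLabel(err error) string {
	switch {
	case err == nil:
		return labelSuccess
	case errors.Is(err, context.Canceled):
		return labelCancel
	default:
		return labelFail
	}
}

// demoLabels returns the labels for a successful, a canceled and a failed operation.
func demoLabels() []string {
	canceled := fmt.Errorf("read object: %w", context.Canceled)
	failed := errors.New("bucket not found")
	return []string{statusLabel(nil), statusLabel(canceled), statusLabel(failed)}
}
```

Because `errors.Is` unwraps the error chain, a cancellation wrapped by intermediate layers still maps to the cancel label.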
cai.zhang 651a56e3dd
enhance: [2.4]Update the template expression proto to improve transmission efficiency (#37485)
issue: #36672 

master pr: #37484

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-07 12:12:25 +08:00
Zhen Ye cea8c756d4
fix: repeated error code in milvus and segcore (#37449)
issue: #37357
pr: #37359

Signed-off-by: chyezh <chyezh@outlook.com>
2024-11-07 10:46:25 +08:00
cai.zhang 4ae5337343
enhance: [2.4] Refine error message for contains array (#37443)
issue: #36221 

master pr: #37383

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-07 10:40:25 +08:00
XuanYang-cn 20534a3f7b
fix: [cp24]Separate L0 and Mix trigger interval (#37319)
See also: #37108
pr: #37190

- Add MixCompactionTriggerInterval, default 60s
- Add L0CompactionTriggerInterval, default 10s
- Export Single related compaction configs
- Raise SingleCompactionDeltaLogMaxSize from 2MB to 16MB

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-06 11:10:26 +08:00
yellow-shine af5e32d00b
enhance: refine the pipeline (#37456)
https://github.com/milvus-io/milvus/pull/37412

---------

Signed-off-by: Yellow Shine <sammy.huang@zilliz.com>
2024-11-06 10:24:30 +08:00
sre-ci-robot 28cb357de3
[automated] Bump milvus version to v2.4.15 (#37457)
Bump milvus version to v2.4.15
Signed-off-by: sre-ci-robot <sre-ci-robot@users.noreply.github.com>

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-05 21:18:32 +08:00
congqixia b7c80f9b83
enhance: Bump milvus & proto version to v2.4.15 (#37435)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-05 14:46:24 +08:00
congqixia c195f9f76a
enhance: [2.4] Pass rpc stats via gin.Context (#37440)
Cherry pick from master
pr: #37439
Related #37223

RPC stats worked in middleware but failed to get method & collection info

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-05 14:24:24 +08:00
wei liu 6b69170a64
fix: proxy retry to get shard leader on unloaded collection (#37326)
issue: #37115

PR #37116 made the proxy retry getting the shard leader when an error
occurs. As a result, a search/query on an unloaded collection keeps
retrying until the ctx is done.

This PR adds an error type check to skip the retry on ErrCollectionLoaded.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-05 11:02:25 +08:00
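
Note on 6b69170a64: a retry loop that gives up early on a specific error type might look like the sketch below. It is an illustration under assumptions, not the proxy code: `errNonRetriable` is a hypothetical sentinel standing in for the error the commit checks, and `retry` and `demoRetry` are made-up names. A real loop would also watch the ctx and back off between attempts.

```go
package main

import (
	"errors"
	"fmt"
)

// errNonRetriable is a hypothetical sentinel standing in for the error
// type that the commit excludes from retrying.
var errNonRetriable = errors.New("collection state makes retrying pointless")

// retry calls fn up to attempts times. It stops early on success or on an
// error that wraps errNonRetriable, and returns the number of calls made.
func retry(attempts int, fn func() error) (int, error) {
	var err error
	for i := 1; i <= attempts; i++ {
		err = fn()
		if err == nil || errors.Is(err, errNonRetriable) {
			return i, err
		}
	}
	return attempts, err
}

// demoRetry reports how many calls are made for a transient error and for
// a non-retriable one, with a budget of three attempts each.
func demoRetry() (transientCalls, fatalCalls int) {
	transientCalls, _ = retry(3, func() error {
		return errors.New("leader not ready")
	})
	fatalCalls, _ = retry(3, func() error {
		return fmt.Errorf("get shard leader: %w", errNonRetriable)
	})
	return transientCalls, fatalCalls
}
```

The transient error uses the whole attempt budget, while the non-retriable one returns after the first call.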
yihao.dai 380662153f
fix: [2.4] Revert "enhance: Support db for bulkinsert (#37012) (#37017)" (#37421)
This reverts commit d6adc62765.

issue: https://github.com/milvus-io/milvus/issues/31273

pr: https://github.com/milvus-io/milvus/pull/37420

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-11-05 10:48:24 +08:00
wei liu eb712f0db9
fix: deadlock if query node crashes during shard client init (#37354)
issue: #37115

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-05 10:46:32 +08:00
XuanYang-cn 28fd217e27
fix: [cp24]l0RowCount metrics value always empty (#37307)
See also: #36953
pr: #37306

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-11-04 15:34:24 +08:00
cai.zhang 4fb86eb17d
fix: [2.4] Fix the bug where some expressions do not correctly parse the value (#37342)
issue: #37274

master pr: #37341

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-11-03 18:42:23 +08:00
congqixia ce7d4090f1
enhance: [2.4] Move forward l0 logic out of delta lock (#37340)
Cherry pick from master
pr: #37337
Related to #35303

`deleteMut` is meant to protect the streaming delete buffer, so
forwarding l0 can be moved out of the rlock section to reduce the tsafe
impact from loading segments.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-11-01 14:42:22 +08:00
wei liu 3c09d42bfc
fix: [skip e2e] TestNodeDownOnSingleReplica has unstable result (#37288) (#37350)
issue: #37289
pr: #37288
These test cases use search to verify the replica's status, but with a
1s gap between searches, the effect of the node going down may already
be repaired by balancing.

This PR removes the 1-second gap between search operations.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-11-01 13:48:22 +08:00
SimFG d0e78cef06
enhance: [2.4] update the expr version to fix the method call error (#37260)
/kind improvement
- pr: #37259

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-31 15:00:23 +08:00
XuanYang-cn 6109e9d69e
fix: Skip mark compaction timeout for mix and l0 compaction (#37118) (#37194)
A timeout is a poor fit for long-running tasks, especially with a
static timeout config. We should monitor execution progress and fail the
task if the progress has been stale for a long time.

This PR is a small patch that stops DC from marking compaction tasks as
timed out while it is still waiting for DN to finish, a design that is
self-contradictory. After this PR, mix and L0 compaction are no longer
controlled by the DC timeout, but clustering is still under timeout
control.

The compaction queue capacity has grown for priority calculation, so
compaction timeouts appear more often. When a task times out, the queued
tasks time out too, and no compaction succeeds afterwards.

See also: #37108, #37015
pr: #37118

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-31 10:36:21 +08:00
congqixia 1a09d6385e
enhance: [2.4] Release compacted growing segment if in dropped list (#37245) (#37266)
Cherry-pick from master
pr: #37245
See also #37205

Previously releasing growing segments could be triggered by two
conditions:

- Sealed Segment with same id is loaded
- Segment start position is before target checkpoint ts

In the worst case, the corresponding sealed segment has been compacted
and the checkpoint is pinned by a growing l0 segment, so neither
condition is ever met.

This PR introduces a new rule: a growing segment can be released if its
segment id appears in the current target's dropped segment id list.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-31 10:14:22 +08:00
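
Note on 1a09d6385e: the two earlier release conditions plus the new rule can be written as a single predicate. The sketch below is only an illustration of the rules as stated in the message; the types `growingSegment` and `target` and all field names are assumptions, not Milvus types.

```go
package main

// growingSegment holds the fields the release rules look at.
// All names in this sketch are hypothetical.
type growingSegment struct {
	ID      int64
	StartTs uint64
}

// target is a simplified view of the current target.
type target struct {
	SealedLoaded map[int64]bool // sealed segments already loaded, by id
	Dropped      map[int64]bool // dropped segment id list
	CheckpointTs uint64
}

// canRelease applies the two earlier conditions and then the new rule.
func canRelease(seg growingSegment, t target) bool {
	if t.SealedLoaded[seg.ID] {
		return true // a sealed segment with the same id is loaded
	}
	if seg.StartTs < t.CheckpointTs {
		return true // start position is before the target checkpoint ts
	}
	return t.Dropped[seg.ID] // new rule: id is in the dropped list
}

// demoRelease models the worst case: the sealed segment was compacted away
// and the checkpoint is pinned behind the segment's start position.
func demoRelease() (before, after bool) {
	seg := growingSegment{ID: 7, StartTs: 100}
	t := target{
		SealedLoaded: map[int64]bool{},
		Dropped:      map[int64]bool{},
		CheckpointTs: 50,
	}
	before = canRelease(seg, t)
	t.Dropped[7] = true
	after = canRelease(seg, t)
	return before, after
}
```

In the demo the segment cannot be released under the first two conditions and becomes releasable once its id is in the dropped list.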
nico 771fad51b3
test: update pymilvus version and test cases (#37301)
Signed-off-by: nico <cheng.yuan@zilliz.com>
2024-10-31 09:40:22 +08:00
congqixia 37d691f458
fix: [2.4] Rectify `OffsetOrderedArray` contain logic (#37309)
Cherry pick from master
pr: #37305 
Related to #36887

The logic that removes delete records for non-hit pks does not work
because `insert_record_.contain` is broken by a logic problem.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-30 21:16:22 +08:00
congqixia a2a51c489e
fix: [2.4] Check resource when loading deltalogs (#37195) (#37263)
Cherry pick from master
pr: #37195
Related to #36887

The `LoadDeltaLogs` API did not check memory usage. When the system is
under heavy delete load, this could result in an OOM quit.

This PR adds a resource check for `LoadDeltaLogs` actions and separates
the internal deltalog loading function from the public one.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-30 11:54:41 +08:00
yellow-shine ce7fbb9439
Bump milvus version to v2.4.14 (#37252)
Signed-off-by: Yellow Shine <sammy.huang@zilliz.com>
2024-10-29 21:34:29 +08:00
aoiasd 8370caa4a6
enhance: [Cherry-pick]Add collection name label for some metric (#36951) (#37159)
pr: https://github.com/milvus-io/milvus/pull/36951

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-10-29 17:38:22 +08:00
cai.zhang 05c40522ce
enhance: [cherry-pick ]Enhance the expression template to support AND and OR operations (#37217)
issue: #36672

master pr: #37033

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-29 15:38:40 +08:00
congqixia 3d1e81fb31
fix: [2.4] Use singleton delete pool and avoid goroutine leakage (#37225)
Cherry-pick from master
pr: #37220
Related to #36887

Previously, creating a new pool per request caused goroutine leakage.
This PR changes this behavior by using a singleton delete pool. This
change can also provide better concurrency control over delete memory
usage.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 14:44:23 +08:00
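
Note on 3d1e81fb31: a process-wide pool created once and shared by all requests can be sketched with `sync.Once` and a buffered channel used as a semaphore. This is a minimal illustration, not the Milvus pool; the `pool` type, the slot count of 4, and the names `getDeletePool` and `demoPool` are assumptions.

```go
package main

import "sync"

// pool is a minimal bounded pool: the buffered channel limits how many
// tasks run at the same time. Hypothetical type for illustration.
type pool struct {
	slots chan struct{}
}

// run executes all tasks with bounded concurrency and waits for them.
func (p *pool) run(tasks []func()) {
	var wg sync.WaitGroup
	for _, task := range tasks {
		p.slots <- struct{}{} // blocks while every slot is in use
		wg.Add(1)
		go func(f func()) {
			defer wg.Done()
			defer func() { <-p.slots }() // free the slot when the task ends
			f()
		}(task)
	}
	wg.Wait()
}

var (
	deletePool     *pool
	deletePoolOnce sync.Once
)

// getDeletePool returns the shared pool, creating it on first use only.
func getDeletePool() *pool {
	deletePoolOnce.Do(func() {
		deletePool = &pool{slots: make(chan struct{}, 4)} // 4 is an assumed limit
	})
	return deletePool
}

// demoPool runs ten tasks on the shared pool and reports whether repeated
// lookups return the same instance and how many tasks completed.
func demoPool() (same bool, done int) {
	var mu sync.Mutex
	tasks := make([]func(), 10)
	for i := range tasks {
		tasks[i] = func() {
			mu.Lock()
			done++
			mu.Unlock()
		}
	}
	getDeletePool().run(tasks)
	return getDeletePool() == getDeletePool(), done
}
```

Since every caller shares the same slots, the concurrency limit applies across requests rather than per request.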
congqixia 0b284ccc23
enhance: Bump milvus & proto version to v2.4.14 (#37198)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:44:25 +08:00
congqixia 49147524be
enhance: [2.4] Use middleware to observe restful v2 in/out rpc stats (#37224)
Cherry pick from master
pr: #37223
Related to #36102

The previous PR #36107 added a grpc interceptor to observe rpc stats.
Using the same strategy, this PR adds gin middleware to observe restful
v2 rpc stats.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-29 10:26:24 +08:00
congqixia b44ef8207e
fix: [2.4] Check whether new collection name is alias (#36981) (#37208)
Cherry pick from master
pr: #36981

Related to #36963

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 22:46:24 +08:00
wei liu 79e6ef2617
fix: Search/Query may fail during updating delegator cache (#37174)
issue: #37115
pr: #37116
Because initializing the query node client is too heavy, we removed
updateShardClient from the leader mutex, which caused many more
concurrency corner cases.

This PR delays the query node client's init operation until `getClient`
is called, then uses the leader mutex to protect the shard client update
process to avoid concurrency issues.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-28 20:08:25 +08:00
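
Note on 79e6ef2617: deferring an expensive client init until first use, while keeping updates cheap under one mutex, can be sketched as below. This is a simplified illustration and not the proxy code; `shardClientMgr`, `updateAddr`, the `dials` counter and the address string are assumptions, and only the name `getClient` comes from the message.

```go
package main

import "sync"

// client stands in for a query node client. Hypothetical type.
type client struct{ addr string }

// shardClientMgr keeps node addresses and lazily created clients
// behind a single mutex.
type shardClientMgr struct {
	mu      sync.Mutex
	addrs   map[int64]string  // nodeID -> address, cheap to update
	clients map[int64]*client // created on first getClient call
	dials   int               // counts simulated expensive inits
}

// updateAddr records the address only; no client is created here.
func (m *shardClientMgr) updateAddr(nodeID int64, addr string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.addrs[nodeID] != addr {
		delete(m.clients, nodeID) // drop a client built for an old address
	}
	m.addrs[nodeID] = addr
}

// getClient creates the client on first use and reuses it afterwards.
func (m *shardClientMgr) getClient(nodeID int64) (*client, bool) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if c, ok := m.clients[nodeID]; ok {
		return c, true
	}
	addr, ok := m.addrs[nodeID]
	if !ok {
		return nil, false
	}
	m.dials++ // stands in for the heavy connection setup
	c := &client{addr: addr}
	m.clients[nodeID] = c
	return c, true
}

// demoLazyClient shows that updating does not init a client and that
// two lookups share one init.
func demoLazyClient() (dialsAfterUpdate, dialsAfterGets int, same bool) {
	m := &shardClientMgr{addrs: map[int64]string{}, clients: map[int64]*client{}}
	m.updateAddr(1, "qn-1:21123")
	dialsAfterUpdate = m.dials
	a, _ := m.getClient(1)
	b, _ := m.getClient(1)
	return dialsAfterUpdate, m.dials, a == b
}
```

The update path holds the lock only for map writes, so the heavy work no longer happens while shard leaders are being refreshed.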
cai.zhang 9c0f59488a
feat: [cherry-pick]The expression supports filling elements through templates (#37058)
issue: #36672 

master pr: #37033 

milvus-proto pr: https://github.com/milvus-io/milvus-proto/pull/332

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-10-28 15:17:30 +08:00
XuanYang-cn 4cb5b2c3b5
fix: [cp24]Exclude L0 compaction when clustering is executing (#37142)
Also remove the conflict check when executing L0. Exclusivity is already
guaranteed by the scheduler.

See also: #37140
pr: #37141

---------

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-10-28 15:01:30 +08:00
congqixia 223badc482
fix: [2.4] Ref collection meta when load l0 segment meta only (#37179)
Cherry pick from master
pr: #37178
Related to #37177

Previous PR #37160

Collection meta is not ref-ed when loading an l0 segment under the
`RemoteLoad` policy, which causes the collection meta to be released
when many l0 segments are released.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-28 14:07:30 +08:00
congqixia 9d37ade24f
enhance: [2.4] Make skip load work for all branches (#37161)
Cherry-pick from master
pr: #37160
Related to #37112

The skip load logic used to work only when there were multiple segment
load info entries in the load request. In the continuous delete case,
the delegator still loaded l0 segments, which occupied a lot of memory.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 22:11:30 +08:00
yihao.dai d30e27e6f9
enhance: Make dataNode.import.maxConcurrentTaskNum dynamic (#37102) (#37103)
Resize the import execution pool when the config
`dataNode.import.maxConcurrentTaskNum` is updated.

issue: https://github.com/milvus-io/milvus/issues/37095

pr: https://github.com/milvus-io/milvus/pull/37102

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 18:21:29 +08:00
yihao.dai da897e41f4
fix: Fix collection leak in querynode (#37061) (#37079)
Unref the removed L0 segment count.

issue: https://github.com/milvus-io/milvus/issues/36918

pr: https://github.com/milvus-io/milvus/pull/37061

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 18:19:39 +08:00
SimFG ae4ce9bbba
enhance: [2.4] allow to delete data when disk quota exhausted (#37139)
- issue: #37133
- pr: #37134

Signed-off-by: SimFG <bang.fu@zilliz.com>
2024-10-25 16:07:32 +08:00
Xiaofan 2dc89b1cad
enhance: upgrade minio dependency (#37089)
fix #34910
upgrade minio dependency

Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
2024-10-25 15:05:30 +08:00
wei liu 057bfbe678
fix: Delegator may becomes unserviceable after querycoord restart (#37055) (#37100)
issue: #37054
pr: #37055
After querycoord restarts, segment_checker may release segments by
mistake because the next target isn't ready yet.

This PR requires that releasing a segment happens only after the next
target is ready.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-25 14:55:31 +08:00
foxspy ba8328727f
enhance: Update Knowhere version (#37132)
/kind branch-feature

release note:
https://github.com/zilliztech/knowhere/releases/tag/v2.3.12

Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
2024-10-25 14:47:30 +08:00
yihao.dai ca2057c57d
enhance: Tidy import options (#37077) (#37078)
1. Tidy import options.
2. Tidy common import util functions.

issue: https://github.com/milvus-io/milvus/issues/34150

pr: https://github.com/milvus-io/milvus/pull/37077

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-25 14:35:45 +08:00
congqixia 6bc8aba17f
enhance: [2.4] Batch forward delete when using DirectForward (#37076) (#37107)
Cherry pick from master
pr: #37076
Related #36887

DirectForward streaming delete causes memory usage to explode when the
number of segments is large. This PR adds a batching delete API and uses
it for the direct forward implementation.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 11:53:29 +08:00
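
Note on 6bc8aba17f: the log does not say how the batching delete API groups its work. The sketch below only illustrates the general idea of forwarding deletes in bounded-size batches so that memory use per call stays limited. The function names and the batch size are assumptions.

```go
package main

// splitBatches cuts pks into consecutive batches of at most size elements.
// The batches share the backing array of pks; nothing is copied.
func splitBatches(pks []int64, size int) [][]int64 {
	if size <= 0 {
		size = 1 // guard against a zero or negative batch size
	}
	var batches [][]int64
	for start := 0; start < len(pks); start += size {
		end := start + size
		if end > len(pks) {
			end = len(pks)
		}
		batches = append(batches, pks[start:end])
	}
	return batches
}

// demoBatchSizes splits ten primary keys into batches of four and
// returns the size of each batch.
func demoBatchSizes() []int {
	pks := make([]int64, 10)
	var sizes []int
	for _, b := range splitBatches(pks, 4) {
		sizes = append(sizes, len(b))
	}
	return sizes
}
```

Each batch would then be forwarded in one call, which bounds the per-call buffer regardless of how many deletes are pending.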
congqixia 79891f047d
enhance: [2.4]Skip load delta data in delegator when using RemoteLoad (#37082) (#37112)
Cherry-pick from master
pr: #37082
Related to #35303

Delta data is not needed when using `RemoteLoad` l0 forward policy. By
skipping load delta data, memory pressure could be eased if l0 segment
size/number is large.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-25 11:41:30 +08:00
wei liu 59b2563029
fix: Dynamic release partition may fail search/query. (#37049) (#37099)
issue: #33550
pr: #37049
Due to a wrong impl of UpdateCollectionNextTarget, if ReleaseCollection
and UpdateCollectionNextTarget happen at the same time, the released
partition's segment list may be added to the target again, and the
delegator will be marked as unserviceable due to the missing segments.

This PR fixes the impl of UpdateCollectionNextTarget.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-10-24 18:01:30 +08:00
congqixia 3db137f4ad
enhance: [2.4] Add metrics for querynode delete buffer info (#37081) (#37097)
Cherry pick from master
pr: #37081
Related to #35303

This PR adds metrics for querynode delegator delete buffer information,
which is related to the dml quota logic.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-24 16:21:37 +08:00
yellow-shine 2abc28ec4f
enhance: [2.4]update nightly ci (#36423)
https://github.com/milvus-io/milvus/pull/37080

1. enhance: send email to qa when nightly ci fails

Signed-off-by: Yellow Shine <sammy.huang@zilliz.com>
2024-10-23 17:45:34 +08:00
presburger 27a4fe002a
enhance:change gpu default mem pool size (#36969)
Signed-off-by: yusheng.ma <yusheng.ma@zilliz.com>
2024-10-23 17:17:28 +08:00
yihao.dai d6adc62765
enhance: Support db for bulkinsert (#37012) (#37017)
issue: https://github.com/milvus-io/milvus/issues/31273

pr: https://github.com/milvus-io/milvus/pull/37012

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-10-23 16:31:29 +08:00
congqixia b958c56f48
fix: [2.4] Rectify delete buffer row count quota value (#37060) (#37068)
Cherry pick from master
pr: #37060

Related to #37057

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-23 14:13:28 +08:00
congqixia 7eba3aa67e
fix: [2.4] Pass full field list when partial load enabled (#37053) (#37063)
Cherry-pick from master
pr: #37053

Related to #37038

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-10-23 11:03:28 +08:00