Commit Graph

18863 Commits (578c38a2f7cc5451b87fb54509c52b181e7c6e97)

Author SHA1 Message Date
aoiasd 7c234f23c3
fix: double buffer was invalid when put entry which size larger than max size (#31549)
relate: https://github.com/milvus-io/milvus/issues/31548

---------

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-03-23 21:09:07 +08:00
congqixia 368180bce4
fix: [2.3] Check nodeID before update channel checkpoint (#31473) (#31508)
Cherry-pick from master
pr: #31473
See also #31470 #31506

This PR adds nodeID assignment verification before updating channel
checkpoints.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-23 07:07:07 +08:00
Jiquan Long ab059bb064
enhance: add more metrics (#31271) (#31511)
/kind improvement
pr: #31271 
fix: https://github.com/milvus-io/milvus/issues/31272

This PR adds more metrics:

Slow query count, where the duration considered slow is
configurable;
Number of deleted entities;
Number of entities per collection;
Number of loaded entities per collection;
Number of indexed entities;
Number of indexed entities per collection, per index, and whether it is a
vector index;
Quota states (LongTimeTickDelay, MemoryExhuasted, DiskQuotaExhuasted)
per database;

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-22 16:11:07 +08:00
wei liu ef523bfef3
fix: Unstable ut TestGetClientFailed (#31296) (#31472)
issue: #31295
pr: #31296

This PR fixes the unstable unit test TestGetClientFailed.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-22 11:01:07 +08:00
wei liu 0bf595a513
enhance: Speed up target recovery after query coord restart (#31240) (#31449)
issue: #28491
pr: #31240

After querycoord restarts, it pulls a new target, which includes the
channel and segment lists. Once the segments loaded on the querynodes
reach the target, the collection can serve search/query. But if the
segment list changes over time, then after querycoord pulls a new target
it takes a few minutes to catch up with the target's segment
distribution, and until then query/search fails for lack of segments.

This PR saves the currently loaded target to the meta store while
querycoord is stopping and recovers it when querycoord starts, to shorten
the target recovery time.

---------

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-22 10:27:17 +08:00
wei liu f8496dbc73
fix: Balance channel stuck forever due to logic dead lock (#31202) (#31455)
issue: #30816
pr: #31202

Balance channel blocks until the leader view catches up with the current
target, and only then starts to unsubscribe the old delegator. This
ensures that the new delegator can serve search before the old delegator
is released. But separate logic in segment_checker skips loading segments
during balance channel. So if a query node crashes during balance
channel, the new delegator can never catch up with the target and the
operation is stuck forever.

This PR removes the rule that skips loading segments during balance
channel, to avoid this logical deadlock.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-21 18:11:07 +08:00
wei liu c8658d17f8
fix: Grpcclient return unrecoverable error (#31256) (#31452)
issue: #31222
pr: #31256

grpcclient's `call` func returns an unrecoverable error, so the caller's
retry policy also breaks because of this unrecoverable error.

This PR introduces `retry.Handle`. The new func takes a `func() (bool,
error)` as its input parameter, which returns `shouldRetry` directly, so
that grpcclient no longer returns an unrecoverable error.
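
The idea of letting the callee report `shouldRetry` can be sketched as follows. This is a minimal illustration, not the Milvus implementation: the `Handle` signature here (attempt count and wait as positional parameters) and the `errNoAttempts` sentinel are assumptions made for the example.

```go
package main

import (
	"errors"
	"time"
)

// errNoAttempts is a hypothetical sentinel for an invalid attempt count.
var errNoAttempts = errors.New("retry: attempts must be positive")

// Handle is a hypothetical stand-in for the helper named in the commit
// message; the real signature and options may differ.
// fn returns shouldRetry alongside its error, so the callee (not the retry
// loop) decides whether a failure is final.
func Handle(attempts int, wait time.Duration, fn func() (shouldRetry bool, err error)) error {
	if attempts <= 0 {
		return errNoAttempts
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		shouldRetry, err := fn()
		if err == nil {
			return nil
		}
		lastErr = err
		if !shouldRetry {
			// Final failure: stop without consuming the remaining attempts.
			return lastErr
		}
		if i < attempts-1 {
			time.Sleep(wait)
		}
	}
	return lastErr
}
```

With this shape, an inner call never has to wrap its error in a special "unrecoverable" type to stop the loop, so an outer retry policy still sees the plain error and can apply its own rules.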

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-21 11:59:12 +08:00
wei liu 6b761204ce
fix: Set node unreachable when get shard client failed (#31277) (#31451)
issue: #30531
pr: #31277

Failing to get a client from `shardClientMgr` doesn't mean the query node
is unavailable: the ref counter policy in `shardClientMgr` cleans up the
client when no collection uses the query node as shard leader.

This PR fixes the node being set unreachable when getting the shard
client fails.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-21 11:57:08 +08:00
wei liu 5994b6a7b0
fix: Search doesn't expire shard leader cache (#31380) (#31450)
issue: #31351
pr: #31380
This PR fixes search not expiring the shard leader cache when sending a
request to a query node fails, which made every request keep trying to
connect to an offline query node.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-21 11:55:07 +08:00
groot 1ca7cba222
enhance: Support MinIO TLS connection (#31292)
issue: https://github.com/milvus-io/milvus/issues/30709
master pr: #31311

Signed-off-by: yhmo <yihua.mo@zilliz.com>
Co-authored-by: Chen Rao <chenrao317328@163.com>
2024-03-21 11:15:20 +08:00
congqixia 94f3aec80a
enhance: [Cherry-pick] Add metrics for querycoord current target cp lag (#31391) (#31463)
Cherry-pick from master
pr: #31391 #31399
See also #31390

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-21 10:17:07 +08:00
wei liu fef430daed
fix: Wrong behavior of CurrentTargetFirst/NextTargetFirst in target manager(#31379) (#31419)
issue: #31162
pr: #31379

When the scope CurrentTargetFirst/NextTargetFirst is given, both the
current and next targets are expected to be scanned.

This PR fixes the wrong behavior of CurrentTargetFirst/NextTargetFirst in
the target manager, which could generate unexpected tasks and leave load
collection stuck forever due to a dirty leader view.
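
The expected scan order can be sketched like this. The scope names come from the commit message; the `target` map type, the `lookup` function, and the plain `CurrentTarget`/`NextTarget` scopes are assumptions made for illustration and do not reflect the actual target manager code.

```go
package main

// TargetScope selects which target(s) a lookup consults.
// The numeric values are illustrative only.
type TargetScope int

const (
	CurrentTarget TargetScope = iota
	NextTarget
	CurrentTargetFirst
	NextTargetFirst
)

// target maps a segment ID to the node expected to hold it (hypothetical).
type target map[int64]int64

// lookup resolves a segment under the given scope. The two "First" scopes
// fall back to the other target when the preferred one has no entry, so
// both targets are scanned.
func lookup(scope TargetScope, current, next target, segmentID int64) (int64, bool) {
	var order []target
	switch scope {
	case CurrentTarget:
		order = []target{current}
	case NextTarget:
		order = []target{next}
	case CurrentTargetFirst:
		order = []target{current, next}
	case NextTargetFirst:
		order = []target{next, current}
	}
	for _, t := range order {
		if node, ok := t[segmentID]; ok {
			return node, true
		}
	}
	return 0, false
}
```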

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-20 23:39:07 +08:00
cai.zhang 52a7eb9548
fix: Fix bug for get segment index state (#31429)
issue: #31361 
master pr: #31427 
2.4 pr: #31428

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-03-20 15:05:06 +08:00
congqixia 86e347a1a4
enhance: [2.3] Cache formatted key for param item (#31388) (#31402)
Cherry-pick from master
pr: #31388 
See also #30806

`formatKey` may cost a lot of CPU on string processing in high-QPS
scenarios; this PR adds a formattedKeys cache to avoid the string
operation on every param value lookup.
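
A cache of this kind might look like the sketch below. The normalization rule inside `formatKey` (lowercasing and stripping `/`, `_`, `.`) and the `keyCache` type are assumptions for the example; only the idea of memoizing the formatted key comes from the commit message.

```go
package main

import (
	"strings"
	"sync"
)

// formatKey is a hypothetical normalizer: it lowercases the key and strips
// separator characters. The real rule may differ.
func formatKey(key string) string {
	replacer := strings.NewReplacer("/", "", "_", "", ".", "")
	return replacer.Replace(strings.ToLower(key))
}

// keyCache memoizes formatted keys so repeated lookups skip string work.
type keyCache struct {
	mu        sync.RWMutex
	formatted map[string]string
}

func newKeyCache() *keyCache {
	return &keyCache{formatted: make(map[string]string)}
}

// get returns the formatted form of key, computing it at most once per
// distinct key in the common case.
func (c *keyCache) get(key string) string {
	c.mu.RLock()
	v, ok := c.formatted[key]
	c.mu.RUnlock()
	if ok {
		return v
	}
	v = formatKey(key)
	c.mu.Lock()
	c.formatted[key] = v
	c.mu.Unlock()
	return v
}
```

Two goroutines may both miss and format the same key once; that is harmless here because the result is deterministic, and it keeps the hot path on a read lock.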

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-19 19:25:10 +08:00
cai.zhang ef530a2324
enhance: When describing an index, fetch the index info in batches (#31239)
issue: #29313 
master pr: #31238

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-03-15 16:37:09 +08:00
sre-ci-robot e77afcb5d5
[automated] Bump milvus version to v2.3.12 (#31303)
Bump milvus version to v2.3.12
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-03-15 16:19:05 +08:00
nico 75a86bc2d3
test: update test cases (#31253)
Signed-off-by: nico <cheng.yuan@zilliz.com>
2024-03-15 15:23:10 +08:00
Jiquan Long 50bfde92f2
fix: wrong num_entities used when mmap variable length data (#30848) (#31274)
https://github.com/milvus-io/milvus/issues/30728
pr: #30848

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-14 20:33:03 +08:00
congqixia 4e48a4de0e
enhance: Bump milvus & proto version to v2.3.12 (#31193)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-14 19:09:04 +08:00
jaime 5ddb0b435f
fix: revoke session may be ignored due to server context cancellation in advance (#31213)
issue: #31219
pr: #31220

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-14 19:05:04 +08:00
sre-ci-robot a33751a2d7
[automated] Update Pytest image changes (#31235)
Update Pytest image changes
See changes:
645cc0bdc3
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-03-14 09:59:11 +08:00
nico 645cc0bdc3
test: update test cases (#31161)
Signed-off-by: nico <cheng.yuan@zilliz.com>
2024-03-13 19:05:11 +08:00
sre-ci-robot 5386a2c43e
[automated] Update Pytest image changes (#31108)
Update Pytest image changes
See changes:
005dbf2b24
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-03-13 11:21:19 +08:00
chyezh 7105e0b261
fix: lost dbname when only passing collection id to describeCollection (#31177)
issue: #30931
pr: #31167

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-11 19:51:03 +08:00
aoiasd e747f15c80
fix: flush insert data with nil buffer (#31159)
relate: https://github.com/milvus-io/milvus/issues/31165

Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>
2024-03-11 17:43:03 +08:00
wei liu 9d712f4dd4
fix: Balance param use duplicated key (#31112) (#31141)
pr: #31112
issue: #31115
This PR fixes the balance check interval param using a duplicated key.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-11 15:03:02 +08:00
wei liu 855f71ac89
fix: Dirty sealed segment won't release after channel balance (#31095) (#31126)
issue: #31074
pr: #31095
This PR fixes dirty sealed segments not being released after channel
balance. A dirty sealed segment is a segment that doesn't exist in the
targets.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-11 15:01:11 +08:00
congqixia 3e7f2e8e7d
enhance: [Cherry-Pick] Use `ListIndexes` instead of `DescribeIndex` for qc broker (#31163)
Cherry pick from master 
pr: #31122

See also #31103

Since querycoord needs only index meta information from datacoord, the
broker shall use `ListIndexes` to skip the segment index building check
logic in datacoord.

This PR is also related to #30538, in which DescribeIndex caused heavy
memory usage and eventually led to OOM.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-11 14:41:02 +08:00
pingliu 1dd4f4b4dc
enhance: jemalloc aarch64 platform use 64k pagesize. (#31114)
pr: https://github.com/milvus-io/milvus/pull/29522
enhance: jemalloc aarch64 platform use 64k pagesize.

Signed-off-by: ping.liu <ping.liu@zilliz.com>
2024-03-11 12:03:02 +08:00
congqixia 3c90475d55
enhance: [Cherry-pick] Add `ListIndexes` API from datacoord (#31104) (#31150)
Cherry-pick from master
pr: #31104
See also #31103

This PR adds the `listIndexes` API to the datacoord server to list all
indexes for the provided collection.
Compared to the existing `DescribeIndex` API, the new one does NOT
check the segment index building progress, which eases the burden of
invoking it.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-11 10:47:02 +08:00
sre-ci-robot 8211af3a95
[automated] Bump milvus version to v2.3.11 (#31148)
Bump milvus version to v2.3.11
Signed-off-by: sre-ci-robot sre-ci-robot@users.noreply.github.com

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-03-08 18:05:01 +08:00
Jiquan Long c37b7792f4
enhance: purge client infos periodically (#31037) (#31092)
https://github.com/milvus-io/milvus/issues/31007
pr: #31037 

---------

Signed-off-by: longjiquan <jiquan.long@zilliz.com>
2024-03-08 10:17:01 +08:00
zhuwenxing 542b46fb1e
test: add json and array datatype check in restful v1 (#31096)
pr: https://github.com/milvus-io/milvus/pull/31097

* When the collection is created using an SDK and includes array and
JSON datatypes in the schema, data can be inserted using the RESTful
API.
* When the collection is created using the RESTful API and includes JSON
and array datatypes in dynamic fields, data can also be inserted using
the RESTful API.

Signed-off-by: zhuwenxing <wenxing.zhu@zilliz.com>
2024-03-07 19:25:01 +08:00
nico 005dbf2b24
enhance: update pymilvus version (#31004)
Signed-off-by: nico <cheng.yuan@zilliz.com>
2024-03-07 15:17:02 +08:00
congqixia 383ff8b0b1
enhance: [2.3] Add flush trigger for channel cp updater (#31082)
See also #31024  #31058

Flush cost rose from 2 seconds to 5 or more after the channel updater
change. This PR adds a manual trigger method to accelerate the flush
procedure.

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-07 15:15:01 +08:00
yihao.dai 3eeeae8519
fix: Fix errors in the Index service APIs are ignored (#31077) (#31086)
In the Index service APIs, return the error if one occurs instead of
always returning nil. Additionally, add more tests to cover this scenario.

issue: https://github.com/milvus-io/milvus/issues/31069,
https://github.com/milvus-io/milvus/issues/31027

pr: https://github.com/milvus-io/milvus/pull/31077

---------

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-06 22:55:00 +08:00
congqixia 53f5a67112
enhance: [Cherry-pick] Fix misleading log content & possible nil panic (#31021) (#31054)
Cherry pick from master
pr: #31021 

- Change load field log from "dy pool" to "load pool"
- Also defer delete when there is no error

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-06 16:09:01 +08:00
congqixia 6b5e19f6b7
enhance: Bump milvus & proto version to v2.3.11 (#31035)
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 17:15:00 +08:00
zhagnlu 095c94305c
fix: add GetSegments optimization to avoid meta mutex competition (#31026)
pr: #31025

Signed-off-by: luzhang <luzhang@zilliz.com>
Co-authored-by: luzhang <luzhang@zilliz.com>
2024-03-05 14:49:01 +08:00
yihao.dai 91d17870d6
enhance: Prevent the backlog of channelCP update tasks, perform batch updates of channelCPs (#30941) (#31024)
This PR includes the following adjustments:

1. To prevent channelCP update task backlog, only one task with the same
vchannel is retained in the updater. Additionally, the lastUpdateTime is
refreshed after the flowgraph submits the update task, rather than in
the callBack function.
2. Batch updates of multiple vchannel checkpoints are performed in the
UpdateChannelCheckpoint RPC (default batch size is 128). Additionally,
the lock for channelCPs in DataCoord meta has been switched from key
lock to global lock.
3. The concurrency of UpdateChannelCheckpoint RPCs in the datanode has
been reduced from 1000 to 10.
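
Adjustments 1 and 2 above can be sketched together as a small in-memory updater. The type and method names are hypothetical, and keeping the checkpoint with the newest timestamp when two tasks share a vchannel is an assumption; the default batch size of 128 is taken from the description.

```go
package main

import (
	"sort"
	"sync"
)

// defaultBatchSize follows the default stated in the commit message.
const defaultBatchSize = 128

// checkpoint is a simplified, hypothetical channel checkpoint.
type checkpoint struct {
	vchannel string
	ts       uint64
}

// cpUpdater retains at most one pending update per vchannel.
type cpUpdater struct {
	mu      sync.Mutex
	pending map[string]checkpoint
}

func newCPUpdater() *cpUpdater {
	return &cpUpdater{pending: make(map[string]checkpoint)}
}

// submit records cp, replacing an older pending entry for the same vchannel.
// Keeping the newest timestamp is an assumption of this sketch.
func (u *cpUpdater) submit(cp checkpoint) {
	u.mu.Lock()
	defer u.mu.Unlock()
	if old, ok := u.pending[cp.vchannel]; ok && old.ts >= cp.ts {
		return
	}
	u.pending[cp.vchannel] = cp
}

// drain empties the pending set and splits it into batches of batchSize,
// sorted by vchannel so the output is deterministic.
func (u *cpUpdater) drain(batchSize int) [][]checkpoint {
	if batchSize <= 0 {
		batchSize = defaultBatchSize
	}
	u.mu.Lock()
	all := make([]checkpoint, 0, len(u.pending))
	for _, cp := range u.pending {
		all = append(all, cp)
	}
	u.pending = make(map[string]checkpoint)
	u.mu.Unlock()

	sort.Slice(all, func(i, j int) bool { return all[i].vchannel < all[j].vchannel })

	var batches [][]checkpoint
	for start := 0; start < len(all); start += batchSize {
		end := start + batchSize
		if end > len(all) {
			end = len(all)
		}
		batches = append(batches, all[start:end])
	}
	return batches
}
```

Each batch returned by `drain` would correspond to one UpdateChannelCheckpoint call in this model, so the number of RPCs grows with the number of distinct vchannels rather than with the number of submitted tasks.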

issue: https://github.com/milvus-io/milvus/issues/30004

pr: https://github.com/milvus-io/milvus/pull/30941

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-05 14:27:01 +08:00
congqixia b7635ed989
enhance: [Cherry-pick] Change proxy connection manager to concurrent safe (#31009)
Cherry-pick from master
pr: #31008 
See also #31007

This PR:
- Add param item for connection manager behavior: TTL & check interval
- Change clientInfo map to concurrent map
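
The two changes listed above can be illustrated with a small sketch. It uses a mutex-guarded map rather than a dedicated concurrent map type, and all names (`connectionManager`, `keepActive`, `purgeExpired`) are hypothetical. Timestamps are passed in as Unix seconds so that the example stays deterministic.

```go
package main

import "sync"

// clientInfo is a simplified, hypothetical record of a connected client.
type clientInfo struct {
	identifier string
	lastActive int64 // Unix seconds
}

// connectionManager guards the client map so that registration and purging
// can run from different goroutines.
type connectionManager struct {
	mu         sync.RWMutex
	clients    map[int64]clientInfo
	ttlSeconds int64
}

func newConnectionManager(ttlSeconds int64) *connectionManager {
	return &connectionManager{
		clients:    make(map[int64]clientInfo),
		ttlSeconds: ttlSeconds,
	}
}

// keepActive registers a client or refreshes its last-active time.
func (m *connectionManager) keepActive(id int64, identifier string, now int64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.clients[id] = clientInfo{identifier: identifier, lastActive: now}
}

// purgeExpired removes clients idle for longer than the TTL and returns
// how many were removed.
func (m *connectionManager) purgeExpired(now int64) int {
	m.mu.Lock()
	defer m.mu.Unlock()
	removed := 0
	for id, info := range m.clients {
		if now-info.lastActive > m.ttlSeconds {
			delete(m.clients, id)
			removed++
		}
	}
	return removed
}

// count reports the number of tracked clients.
func (m *connectionManager) count() int {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return len(m.clients)
}
```

In a running service, a background goroutine driven by a ticker set to the check interval would call `purgeExpired` with the current time.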

---------

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 14:13:00 +08:00
yihao.dai a5350f64a5
enhance: Reduce the memory usage of the timeTickSender (#30968) (#30991)
In the cache of the timeTickSender, retain only the latest stats instead
of storing stats for every time tick.
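
The memory saving comes from overwriting instead of accumulating. A rough sketch, with hypothetical type names and the assumption that stats are keyed by channel:

```go
package main

import "sync"

// segmentStats is a simplified, hypothetical per-segment statistic.
type segmentStats struct {
	segmentID int64
	numRows   int64
}

// channelStats holds the stats reported at one time tick.
type channelStats struct {
	ts    uint64
	stats []segmentStats
}

// statsCache keeps only the most recent stats per channel, so its size is
// bounded by the number of channels rather than the number of time ticks.
type statsCache struct {
	mu     sync.Mutex
	latest map[string]channelStats
}

func newStatsCache() *statsCache {
	return &statsCache{latest: make(map[string]channelStats)}
}

// update overwrites the cached entry unless the incoming tick is older.
func (c *statsCache) update(channel string, ts uint64, stats []segmentStats) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if old, ok := c.latest[channel]; ok && old.ts > ts {
		return
	}
	c.latest[channel] = channelStats{ts: ts, stats: stats}
}

// get returns the latest stats for a channel, if any.
func (c *statsCache) get(channel string) (channelStats, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	s, ok := c.latest[channel]
	return s, ok
}
```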

issue: https://github.com/milvus-io/milvus/issues/30967

pr: https://github.com/milvus-io/milvus/pull/30968

Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
2024-03-05 10:59:01 +08:00
congqixia 81b197267a
enhance: [Cherry-Pick] Add back load memory factor when estimating memory resource (#30999)
Cherry-pick from master
pr: #30994
Segment load memory usage is underestimated due to the removal of the
load memory factor. This PR adds it back to protect querynode from OOM
in some extreme memory cases.

Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
2024-03-05 09:15:00 +08:00
jaime 336e0ae45e
enhance: index meta use independent rather than global meta lock (#30986)
issue: https://github.com/milvus-io/milvus/issues/30837
pr: #30869

Signed-off-by: jaime <yun.zhang@zilliz.com>
2024-03-05 08:48:59 +08:00
chyezh df09222029
fix: starve lock caused by slow GetCompactionTo method when too much segments (#30965)
issue: #30823
pr: #30963

Signed-off-by: chyezh <chyezh@outlook.com>
2024-03-04 20:51:00 +08:00
shaoyue 7014d352b3
fix: permissions on /milvus for OpenShift compatibility (#30937)
Fixes #25565
Cherry-pick 
pr: #30775

Signed-off-by: Guillaume Moutier <guillaume.moutier@gmail.com>
Signed-off-by: shaoyue.chen <shaoyue.chen@zilliz.com>
Co-authored-by: Guillaume Moutier <guimou@users.noreply.github.com>
2024-03-04 16:59:00 +08:00
XuanYang-cn bb2de0d964
fix: [cherry-pick] Clear DN unknown compaction tasks (#30972)
If DC restarts, those unknown compaction tasks never get a callback in
DN, so the segments in those compaction tasks stay locked, unable to
sync or be compacted again, which blocks cp advance and compaction
execution.

See also: #30137
pr: #30850

Signed-off-by: yangxuan <xuan.yang@zilliz.com>
2024-03-04 16:52:59 +08:00
wei liu db49b8524d
fix: Skip generate balance task when target not ready (#30725)
issue: #30723
pr: #30724

This PR skips generating balance tasks when the collection's target
isn't ready. It also refines the stale check logic in query coord's
scheduler: if a channel exists in the current or next target, the task
won't be canceled.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-04 11:38:59 +08:00
wei liu af54c3ba85
fix: Make datacoord client retry on index api (#30656)
pr: #30654

This PR adds retry to all interfaces that belonged to indexcoord in
milvus 2.2 and moved to data coord in milvus 2.3, to avoid hitting
unimplemented errors during a rolling upgrade from milvus 2.2 to 2.3.

Signed-off-by: Wei Liu <wei.liu@zilliz.com>
2024-03-04 11:37:09 +08:00
cai.zhang 38e3d6af3e
enhance: Optimize DescribeIndex to reduce lock contention (#30975)
issue: #29313
issue: #30443
master pr: #30939

Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
2024-03-04 11:30:59 +08:00