milvus

Commit Graph

Author	SHA1	Message	Date
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
Xiaofan	cb6eca8e91	fix: drop partition can not be successful if load failed (#38793) fix #38649 when partition load failed, the partition drop will also fail due to the wrong error message Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>	2024-12-30 19:42:52 +08:00
wei liu	25f0c82ceb	fix: Fix update loading collection's load config doesn't work (#38595) issue: #38594 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-12-25 18:02:51 +08:00
jaime	5afd0c0a2b	fix: Revert "Expose metrics of stanby coordinators (#27698)" (#38620) issue: #38608 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-23 11:46:57 +08:00
jaime	78438ef41e	fix: revert optimize CPU usage for CheckHealth requests (#35589) (#38555) issue: #35563 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-19 00:38:45 +08:00
jaime	28fdbc4e30	enhance: optimize CPU usage for CheckHealth requests (#35589) issue: #35563 1. Use an internal health checker to monitor the cluster's health state, storing the latest state on the coordinator node. The CheckHealth request retrieves the cluster's health from this latest state on the proxy sides, which enhances cluster stability. 2. Each health check will assess all collections and channels, with detailed failure messages temporarily saved in the latest state. 3. Use CheckHealth request instead of the heavy GetMetrics request on the querynode and datanode Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-17 11:02:45 +08:00
SimFG	2afe2eaf3e	feat: support to replicate collection when the services contains the system tt msg (#37559) - issue: #37105 --------- Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-12-17 09:08:46 +08:00
tinswzy	27229f7907	enhance: refine exists log print with ctx (#38080) issue: #35917 Refines exists log print with ctx Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-14 22:36:44 +08:00
congqixia	051bc280dd	enhance: Make dynamic load/release partition follow targets (#38059) Related to #37849 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-12-05 16:24:40 +08:00
tinswzy	e76802f910	enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916) issue: #35917 This PR refine the querycoord meta related interfaces to ensure that each method includes a ctx parameter. Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-11-25 11:14:34 +08:00
jaime	257ecab84b	enhance: remove collection queryable check from health check (#37712) Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-18 10:50:38 +08:00
wei liu	266f8ef1f5	fix: Search may return less result after qn recover (#36549) issue: #36293 #36242 after qn recover, delegator may be loaded in new node, after all segment has been loaded, delegator becomes serviceable. but delegator's target version hasn't been synced, and if search/query comes, delegator will use wrong target version to filter out a empty segment list, which caused empty search result. This pr will block delegator's serviceable status until target version is synced --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-12 16:34:28 +08:00
congqixia	f5b06a3c9f	enhance: Invalidate collection cache when release collection (#37577) Related to #37395 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-12 10:16:29 +08:00
congqixia	dcc1b506dc	enhance: Add context trace for querycoord queryable check (#37524) When check health logic failed to collection not-queryable, the related reason is hard to find in log. This PR add context for log with trace id and print unqueryable collection info log. Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-11-08 14:00:26 +08:00
jaime	9d16b972ea	feat: add tasks page into management WebUI (#37002) issue: #36621 1. Add API to access task runtime metrics, including: - build index task - compaction task - import task - balance (including load/release of segments/channels and some leader tasks on querycoord) - sync task 2. Add a debug model to the webpage by using debug=true or debug=false in the URL query parameters to enable or disable debug mode. Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-28 10:13:29 +08:00
jaime	4746f47282	feat: management WebUI homepage (#36822) issue: #36784 1. Implement an embedded web server for WebUI access. 2. Complete the homepage development. Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-23 11:29:28 +08:00
wei liu	3cd0b26285	enhance: Enable dynamic update loaded collection's replica (#35822) issue: #35821 After collection loaded, if we need to increase/decrease collection's replica, we need to release and load it again. milvus offers 4 solution to update loaded collection's replica, this PR aims to dynamic change the replica number without release, and after replica number changed, milvus will execute load replica or release replica in async, and the replica loaded status can be checked by getReplicas API. Notice that if set too much replicas than querynode can afford, the new replica won't be loaded successfully until enough querynode joins. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-25 10:13:18 +08:00
congqixia	2fbc628994	feat: Support field partial load collection (#35416) Related to #35415 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-20 16:49:02 +08:00
wei liu	22ced010cd	enhance: make configure load param feature be compatible with old sdk (#35520) issue: #31570 #35521 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-08-20 10:30:55 +08:00
wei liu	9b37d3f517	enhance: Enable setting the replica number and resource group during collection creation (#34403) issue: #30040 --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-07-10 10:20:13 +08:00
jaime	0426390f06	enhance: improve check health (#33800) issue: #34264 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-07-01 10:16:06 +08:00
wei liu	935bc1fb71	fix: Fix GetReplicas API return nil status (#33715) issue: #33702 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-06-20 14:40:15 +08:00
wei liu	b13932bb55	enhance: Enable database level replica num and resource groups for loading collection (#33052) issue: #30040 This PR introduce two database level props: 1. database.replica.number 2. database.resource_groups User can set those two database props by AlterDatabase API, then can load collection without specified replica_num and resource groups. then it will use database level load param when try to load collections. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-29 10:59:43 +08:00
wei liu	a7f6193bfc	fix: query node may stuck at stopping progress (#33104) issue: #33103 when try to do stopping balance for stopping query node, balancer will try to get node list from replica.GetNodes, then check whether node is stopping, if so, stopping balance will be triggered for this replica. after the replica refactor, replica.GetNodes only return rwNodes, and the stopping node maintains in roNodes, so balancer couldn't find replica which contains stopping node, and stopping balance for replica won't be triggered, then query node will stuck forever due to segment/channel doesn't move out. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-20 10:21:38 +08:00
chyezh	293f14a8b9	fix: remove redundant replica recover (#32985) issue: #22288 - replica recover should be only triggered by replica recover Signed-off-by: chyezh <chyezh@outlook.com>	2024-05-13 15:25:32 +08:00
wei liu	ba02d54a30	enhance: update shard leader cache when leader location changed (#32470) issue: #32466 this PR enhance that when shard location changed, update proxy's shard leader cache. in case of query node failover case, proxy can find replica recover --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:05:29 +08:00
wei liu	d900e68440	fix: fix GetShardLeaders return empty node list (#32685) issue: #32449 to avoid GetShardLeaders return empty node list, this PR add node list check in both client side and server side. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-29 14:19:26 +08:00
chyezh	f06509bf97	fix: get replica should not report error when no querynode serve (#32536) issue: #30647 - Remove error report if there's no query node serve. It's hard for programer to use it to do resource management. - Change resource group `transferNode` logic to keep compatible with old version sdk. --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-25 19:25:24 +08:00
chyezh	b287fbaa2e	fix: return collection on recovering but not collection not loaded when target is not recovered (#32447) issue: #32398 Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-25 11:21:26 +08:00
smellthemoon	96d95e7743	enhance: fix pass error msg as channel name (#32511) Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2024-04-23 16:45:22 +08:00
congqixia	d7ff1bbe5c	enhance: Make querycoordv2 collection observer task driven (#32441) See also #32440 - Add loadTask in collection observer - For load collection/partitions, load task shall timeout as a whole - Change related constructor to load jobs --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-22 10:39:22 +08:00
chyezh	a8c8a6bb0f	fix: parameter check of TransferReplica and TransferNode (#32297) issue: #30647 - Same dst and src resource group should not be allowed in `TransferReplica` and `TransferNode`. - Remove redundant parameter check. Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-17 15:27:19 +08:00
chyezh	48fe977a9d	enhance: declarative resource group api (#31930) issue: #30647 - Add declarative resource group api - Add config for resource group management - Resource group recovery enhancement --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 08:13:19 +08:00
wei liu	c4806b69c4	enhance: Refactor leader view manager interface (#31133) issue: #31091 This PR add GetByFilter interface in leader view manager, instead of all kind of get func --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-10 15:13:36 +08:00
chyezh	a2502bde75	enhance: replica manager enhancement (#31496) issue: #30647 - ReplicaManager manage read only node now, and always do persistent of node distribution of replica. - All segment/channel checker using ReplicaManager to get read-only node or read-write node, but not ResourceManager. - ReplicaManager promise that only apply unique querynode to one replica in same collection now (replicas in same collection never hold same querynode at same time). - ReplicaManager promise that fairly node count assignment policy if multi replicas of collection is assigned to one resource group. - Move some parameters check into ReplicaManager to avoid data race. - Allow transfer replica to resource group that already load replica of same collection - Allow transfer node between resource groups that load replica of same collection --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-05 04:57:16 +08:00
congqixia	c2aad513c0	fix: Check collection nil before check load status (#31850) See also #31849 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-03 10:07:13 +08:00
wei liu	7471a8005f	fix: querycoord panic after node down (#31831) issue: #30519 Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-04-03 10:03:22 +08:00
wei liu	92971707de	enhance: Add restful api for devops to execute rolling upgrade (#29998) issue: #29261 This PR Add restful api for devops to execute rolling upgrade, including suspend/resume balance and manual transfer segments/channels. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-27 16:15:19 +08:00
wei liu	ddd918ba04	enhance: change frequency log to rated level (#31084) This PR change frequency log of check shard leader to rated level --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:39:02 +08:00
wei liu	efe8cecc88	enhance: refactor segment dist manager interface (#31073) issue: #31091 This PR add `GetByFilter` interface in segment dist manager, instead of all kind of get func Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-08 16:29:01 +08:00
zhenshan.cao	bb93b22c84	fix: should return collectionName in response of ListAliases (#30532) issue : https://github.com/milvus-io/milvus/issues/30369 Signed-off-by: zhenshan.cao <zhenshan.cao@zilliz.com>	2024-02-12 08:30:55 +08:00
aoiasd	f84d9a589a	fix: channel checker reduce balancing channels. (#30087) Ignore leader unavailable when channel checker judge repeat channel to avoid channel checker remove channels balancing. relate: https://github.com/milvus-io/milvus/issues/29841 https://github.com/milvus-io/milvus/issues/29838 Signed-off-by: aoiasd <zhicheng.yue@zilliz.com>	2024-01-26 10:59:00 +08:00
wei liu	5474bce9d2	fix: Choose wrong shard leader during balance channel (#29529) issue: #29523 readable shard leader should still be the old one during channel balance, if the new shard leader is not ready. This PR fixed that query coord choose wrong shard leader during balance channel Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-12-28 15:22:51 +08:00
wei liu	d081fd5481	enhance: Change some frequency log to rated level (#28897) This pr change some frequency log's level to rated. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-12-04 10:38:35 +08:00
wei liu	e0222b2ce3	refine target manager code style (#27883) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-10-25 00:44:12 +08:00
wayblink	e3f2122618	Expose metrics of stanby coordinators (#27698) Signed-off-by: wayblink <anyang.wang@zilliz.com>	2023-10-16 15:04:09 +08:00
yah01	be980fbc38	Refine state check (#27541) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-10-11 21:01:35 +08:00
yah01	63ac43a3b8	Refine errors for import (#27379) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-09-30 10:31:28 +08:00
yah01	6539a5ae2c	Refine DataCoord status (#27262) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-09-26 17:15:27 +08:00

111 Commits (eb046863485fdf3e130fc60484485c901b81276b)