milvus

Author	SHA1	Message	Date
congqixia	cb7f2fa6fd	enhance: Use v2 package name for pkg module (#39990) Related to #39095 https://go.dev/doc/modules/version-numbers Update pkg version according to golang dep version convention --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2025-02-22 23:15:58 +08:00
Zhen Ye	c84a0748c4	enhance: add rw/ro streaming query node replica management (#38677) issue: #38399 - Embed the query node into streaming node to make delegator available at streaming node. - The embedded query node has a special server label `QUERYNODE_STREAMING-EMBEDDED`. - Change the balance strategy to make the channel assigned to streaming node as much as possible. Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-24 16:55:07 +08:00
wei liu	d2834a1812	enhance: Add logs for check health failed (#39208) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2025-01-15 17:31:00 +08:00
Zhen Ye	bb8d1ab3bf	enhance: make new go package to manage proto (#39114) issue: #39095 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2025-01-10 10:49:01 +08:00
jaime	f03a85725a	enhance: add db name in replica (#38672) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2025-01-09 19:40:59 +08:00
jaime	78438ef41e	fix: revert optimize CPU usage for CheckHealth requests (#35589) (#38555) issue: #35563 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-19 00:38:45 +08:00
jaime	28fdbc4e30	enhance: optimize CPU usage for CheckHealth requests (#35589) issue: #35563 1. Use an internal health checker to monitor the cluster's health state, storing the latest state on the coordinator node. The CheckHealth request retrieves the cluster's health from this latest state on the proxy sides, which enhances cluster stability. 2. Each health check will assess all collections and channels, with detailed failure messages temporarily saved in the latest state. 3. Use CheckHealth request instead of the heavy GetMetrics request on the querynode and datanode Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-12-17 11:02:45 +08:00
tinswzy	27229f7907	enhance: refine exists log print with ctx (#38080) issue: #35917 Refines exists log print with ctx Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-14 22:36:44 +08:00
Zhen Ye	d3ae8e9232	fix: delay the wait other coord logic in query coord after query coord change into standby state (#38259) issue: https://github.com/milvus-io/milvus/issues/37764 - After removing rpc layer from mixcoord, the querycoord at standby mode will be blocked forever of deployment rolling --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-12-11 15:48:42 +08:00
tinswzy	7944538ade	enhance: Add ctx param to KV operation interfaces (#38154) issue: #35917 Refine KV operation interfaces by adding a ctx param Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-12-05 15:16:41 +08:00
tinswzy	e76802f910	enhance: refine querycoord meta/catalog related interfaces to ensure that each method includes a ctx parameter (#37916) issue: #35917 This PR refine the querycoord meta related interfaces to ensure that each method includes a ctx parameter. Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>	2024-11-25 11:14:34 +08:00
jaime	7bbfe86bcd	enhance: add list index and segment index retrieval API for WebUI (#37861) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-22 16:58:34 +08:00
jaime	1e8ea4a7e7	feat: add segment/channel/task/slow query render (#37561) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-12 17:44:29 +08:00
wei liu	266f8ef1f5	fix: Search may return less result after qn recover (#36549) issue: #36293 #36242 after qn recover, delegator may be loaded in new node, after all segment has been loaded, delegator becomes serviceable. but delegator's target version hasn't been synced, and if search/query comes, delegator will use wrong target version to filter out an empty segment list, which caused empty search result. This pr will block delegator's serviceable status until target version is synced --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-12 16:34:28 +08:00
wei liu	a03157838b	enhance: Enable node assign policy on resource group (#36968) issue: #36977 with node_label_filter on resource group, user can add label on querynode with env `MILVUS_COMPONENT_LABEL`, then resource group will prefer to accept node which match its node_label_filter. then querynode's can't be group by labels, and put querynodes with same label to same resource groups. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-11-08 11:18:27 +08:00
jaime	f348bd9441	feat: add segment,pipeline, replica and resourcegroup api for WebUI (#37344) issue: #36621 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-11-07 11:52:25 +08:00
jaime	9d16b972ea	feat: add tasks page into management WebUI (#37002) issue: #36621 1. Add API to access task runtime metrics, including: - build index task - compaction task - import task - balance (including load/release of segments/channels and some leader tasks on querycoord) - sync task 2. Add a debug model to the webpage by using debug=true or debug=false in the URL query parameters to enable or disable debug mode. Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-28 10:13:29 +08:00
jaime	4746f47282	feat: management WebUI homepage (#36822) issue: #36784 1. Implement an embedded web server for WebUI access. 2. Complete the homepage development. Home page demo: [screenshot] Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-10-23 11:29:28 +08:00
wei liu	3cd0b26285	enhance: Enable dynamic update loaded collection's replica (#35822) issue: #35821 After collection loaded, if we need to increase/decrease collection's replica, we need to release and load it again. milvus offers 4 solution to update loaded collection's replica, this PR aims to dynamic change the replica number without release, and after replica number changed, milvus will execute load replica or release replica in async, and the replica loaded status can be checked by getReplicas API. Notice that if set more replicas than querynode can afford, the new replica won't be loaded successfully until enough querynode joins. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-09-25 10:13:18 +08:00
congqixia	2fbc628994	feat: Support field partial load collection (#35416) Related to #35415 --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-08-20 16:49:02 +08:00
jaime	fcec4c21b9	fix: check collection health(queryable) fail for releasing collection (#34947) issue: #34946 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-08-02 17:20:15 +08:00
jaime	9630974fbb	enhance: move rocksmq from internal to pkg module (#33881) issue: #33956 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-06-25 21:18:15 +08:00
wayblink	5fac2fa1d2	fix: Panic if ProcessActiveStandBy returns error (#33369) #33368 Signed-off-by: wayblink <anyang.wang@zilliz.com>	2024-06-19 11:16:00 +08:00
wei liu	303470fc35	fix: Clean offline node from resource group after qc restart (#33232) issue: #33200 #33207 pr#33104 causes the offline node will be kept in resource group after qc recover, and offline node will be assigned to new replica as rwNode, then requests sent to those nodes will fail by NodeNotFound. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-22 10:03:40 +08:00
wei liu	33bd6eed28	fix: Clean offline node from replica after qc recover (#33213) issue: #33200 #33207 pr#33104 remove this logic by mistake, which cause the offline node will be kept in replica after qc recover, and requests sent to the offline qn will get a NodeNotFound error. Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-21 15:41:39 +08:00
wei liu	2013d97243	enhance: Enable to dynamic update balancer policy in querycoord (#33037) issue: #33036 This PR enable to dynamic update balancer policy without restart querycoord. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-21 14:29:39 +08:00
wei liu	a7f6193bfc	fix: query node may stuck at stopping progress (#33104) issue: #33103 when try to do stopping balance for stopping query node, balancer will try to get node list from replica.GetNodes, then check whether node is stopping, if so, stopping balance will be triggered for this replica. after the replica refactor, replica.GetNodes only return rwNodes, and the stopping node maintains in roNodes, so balancer couldn't find replica which contains stopping node, and stopping balance for replica won't be triggered, then query node will stuck forever due to segment/channel doesn't move out. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-20 10:21:38 +08:00
congqixia	861977ab60	fix: Start `LeaderCacheObserver` before `SyncAll` (#33035) Related to #33033 Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-05-14 13:25:32 +08:00
wei liu	e2332bdc17	enhance: Enable channel exclusive balance policy (#32911) issue: #32910 * split replica's node list to channels when create replicas * balance nodes among channels when node change happens * implement channel level balance, let balance happens in channel level Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-10 17:27:31 +08:00
wei liu	ba02d54a30	enhance: update shard leader cache when leader location changed (#32470) issue: #32466 this PR enhance that when shard location changed, update proxy's shard leader cache. in case of query node failover case, proxy can find replica recover --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-05-08 10:05:29 +08:00
chyezh	48fe977a9d	enhance: declarative resource group api (#31930) issue: #30647 - Add declarative resource group api - Add config for resource group management - Resource group recovery enhancement --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-15 08:13:19 +08:00
congqixia	25a1c9ecf0	fix: Make coordinator `Register` not blocked on ProcessActiveStandby (#32069) See also #32066 This PR make coordinator register successful and let `ProcessActiveStandBy` run async. And roles may receive stop signal and notify servers. --------- Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2024-04-10 18:49:18 +08:00
chyezh	a2502bde75	enhance: replica manager enhancement (#31496) issue: #30647 - ReplicaManager manage read only node now, and always do persistent of node distribution of replica. - All segment/channel checker using ReplicaManager to get read-only node or read-write node, but not ResourceManager. - ReplicaManager promise that only apply unique querynode to one replica in same collection now (replicas in same collection never hold same querynode at same time). - ReplicaManager promise that fairly node count assignment policy if multi replicas of collection is assigned to one resource group. - Move some parameters check into ReplicaManager to avoid data race. - Allow transfer replica to resource group that already load replica of same collection - Allow transfer node between resource groups that load replica of same collection --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-04-05 04:57:16 +08:00
wei liu	4dfdb1a443	fix: save current target after target observer stop (#31315) issue: #28491 should save target to meta store after target observer stop, in case of target changed Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-18 12:27:04 +08:00
wei liu	d79aa58b37	enhance: Speed up target recovery after query coord restart (#31240) issue: #28491 after querycoord restart, it will pull a new target, which include channel and segment list. when segments loaded on querynode has reached the target, the collection could provide search/query. but if segment list changes by time, after querycoord pull a new target, it will take a few minutes to catch up the target's segment distribution. and before that, query/search will fail due to lack of segments. This PR save the current loaded target to meta store in querycoord's stop progress, and recover it when query coord starts, to speed up the target recovery time. --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-03-15 14:19:03 +08:00
chyezh	ff4237bb90	enhance: add hostname into node info (#30673) issue: https://github.com/milvus-io/milvus/issues/30647 - Address may be reused in k8s environment. Using hostname can be better. Signed-off-by: chyezh <chyezh@outlook.com>	2024-03-15 10:45:06 +08:00
jaime	db79be3ae0	fix: ctx cancel should be the last step while stopping server (#31220) issue: #31219 Signed-off-by: jaime <yun.zhang@zilliz.com>	2024-03-15 10:33:05 +08:00
SimFG	ee8d6f236c	enhance: make the watch dm channel request better compatibility (#30952) issue: #30938 Signed-off-by: SimFG <bang.fu@zilliz.com>	2024-03-01 16:07:37 +08:00
chyezh	0c7474d7e8	enhance: add graceful stop timeout to avoid node stop hang under extreme cases (#30317) 1. add coordinator graceful stop timeout to 5s 2. change the order of datacoord component while stop 3. change querynode grace stop timeout to 900s, and we should potentially change this to 600s when graceful stop is smooth issue: #30310 also see pr: #30306 --------- Signed-off-by: chyezh <chyezh@outlook.com>	2024-02-29 17:01:50 +08:00
Bingyi Sun	564b12c661	enhance: make balance cost threshold configurable (#30636) Signed-off-by: sunby <sunbingyi1992@gmail.com>	2024-02-19 15:24:50 +08:00
wei liu	e98c62abbb	enhance: refactor leader_observer to leader_checker (#29454) issue: #29453 sync distribution by rpc will also call loadSegment/releaseSegment, which may cause all kinds of concurrent case on same segment, such as concurrent load and release on one segment. This PR add leader_checker which generate load/release task to correct the leader view, instead of calling sync distribution by rpc --------- Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2024-01-05 15:54:55 +08:00
wei liu	839a72129e	fix: Auto balance param can't be updated by dynamic (#29501) This PR fixed that auto balance param can't be updated by dynamic Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-12-27 14:30:53 +08:00
SimFG	dd9c61831d	enhance: Support to get the param value in the runtime (#29297) /kind improvement issue: #29299 Signed-off-by: SimFG <bang.fu@zilliz.com>	2023-12-22 18:36:44 +08:00
jaime	b1e0a27f31	enhance: Add logs for each step during service initialization (#28624) /kind improvement Signed-off-by: jaime <yun.zhang@zilliz.com>	2023-11-27 16:30:26 +08:00
congqixia	a2fe9dad49	enhance: Make etcd kv request timeout configurable (#28661) See also #28660 This pr add request timeout config item for etcd kv request timeout Sync the default timeout value to same value for etcdKV & tikv config Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>	2023-11-23 19:34:23 +08:00
smellthemoon	73f2bab454	enhance:add some log when create client and get component states (#28160) /kind improvement Signed-off-by: lixinguo <xinguo.li@zilliz.com> Co-authored-by: lixinguo <xinguo.li@zilliz.com>	2023-11-22 09:12:22 +08:00
wei liu	b9bf910039	fix unstable auto balance config ut (#28288) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-11-09 10:00:22 +08:00
yah01	1b90630633	Fix the target updated before version updated to cause data missing (#28250) Signed-off-by: yah01 <yah2er0ne@outlook.com>	2023-11-08 11:36:22 +08:00
wei liu	5b45a138b1	disable auto balance when old node exists (#28191) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-11-07 14:02:20 +08:00
wei liu	7485eeb689	fix sync distribution with wrong version (#28130) Signed-off-by: Wei Liu <wei.liu@zilliz.com>	2023-11-03 19:02:18 +08:00

125 Commits (eb046863485fdf3e130fc60484485c901b81276b)