1. A collection should observe the channel only once.
2. A collection should check the CollectionLoadPercent for updates only
once.
3. Skip saving coll/partition meta if there are no changes, primarily to
accelerate collection observation after recovery.
issue: https://github.com/milvus-io/milvus/issues/37630
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #38399
- move the lifetime implementation of common code out of the server
level lifetime implementation
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #35563
1. Use an internal health checker to monitor the cluster's health state,
storing the latest state on the coordinator node. The CheckHealth
request retrieves the cluster's health from this latest state on the
proxy sides, which enhances cluster stability.
2. Each health check will assess all collections and channels, with
detailed failure messages temporarily saved in the latest state.
3. Use CheckHealth request instead of the heavy GetMetrics request on
the querynode and datanode
Signed-off-by: jaime <yun.zhang@zilliz.com>
the task limit in assignSegment/assignChannel will works for both load
task and balance task.
this PR remove the load task limit, only limit balance task num in one
round.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
The level zero mutex could be remove since all operations are guarded by
segment manager mutex
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #37630
TSafe manager is too complex for current implementation and each
delegator need one goroutine waiting for tsafe update event.
Tsafe updating could be executed in pipeline. This PR remove tsafe
manager and simplify the entire logic of tsafe updating.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
sparse vectors may have arbitrary number of non zeros and it is hard to
optimize without knowing the actual distribution of nnz. this PR adds a
metric for analyzing that.
issue: https://github.com/milvus-io/milvus/issues/35853
comparing with https://github.com/milvus-io/milvus/pull/38328, this
includes also metric for FTS in query node delegator
also fixed a bug of sparse when searching by pk
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
Related to #38372
This PR make drop partition only check target partition load states only
in case of concurrent releasing other partition in same collection.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #38142
current balance channel policy only consider current collection's
distribution, so if all collections has 1 channel, and all channels has
been loaded on same querynode, after querynode num increase, balance
channel won't be triggered.
This PR enable score based balance channel policy, to achieve:
1. distribute all channels evenly across multiple querynodes
2. distribute each collection's channel evenly across multiple
querynodes.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: https://github.com/milvus-io/milvus/issues/37764
- After removing rpc layer from mixcoord, the querycoord at standby mode
will be blocked forever of deployment rolling
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #37630
Previously the loaded collection metrics was calculated via scanning all
loaded segment in segment manager, which is slow and buggy
implementation.
This PR:
- Move collection num metrics to collection manager
- Remove deprecated loaded partition metrics update logic
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #38325
the old impl only to check grant in default db before drop role, which
may cause role be dropped when grant still exist.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
enhance :
1. alterindex delete properties
We have introduced a new parameter deleteKeys to the alterindex
functionality, which allows for the deletion of properties within an
index. This enhancement provides users with the flexibility to manage
index properties more effectively by removing specific keys as needed.
2. altercollection delete properties
We have introduced a new parameter deleteKeys to the altercollection
functionality, which allows for the deletion of properties within an
collection. This enhancement provides users with the flexibility to
manage collection properties more effectively by removing specific keys
as needed.
3.support altercollectionfield
We currently support modifying the fieldparams of a field in a
collection using altercollectionfield, which only allows changes to the
max-length attribute.
Key Points:
- New Parameter - deleteKeys: This new parameter enables the deletion of
specified properties from an index. By passing a list of keys to
deleteKeys, users can remove the corresponding properties from the
index.
- Mutual Exclusivity: The deleteKeys parameter cannot be used in
conjunction with the extraParams parameter. Users must choose one
parameter to pass based on their requirement. If deleteKeys is provided,
it indicates an intent to delete properties; if extraParams is provided,
it signifies the addition or update of properties.
issue: https://github.com/milvus-io/milvus/issues/37436
---------
Signed-off-by: Xianhui.Lin <xianhui.lin@zilliz.com>
issue: #38237
this PR only use better compression level for proto msg which is larger
than 1MB, and use a lighter compression level for smaller proto msg,
which could get a better latency in most case.
this PR could reduce the latency from 22.7s to 4.7s with 10000
collctions and each collections has 1000 segments.
before this PR:
BenchmarkTargetManager-8 1 22781536357 ns/op 566407275088 B/op 11188282
allocs/op
after this PR:
BenchmarkTargetManager-8 1 4729566944 ns/op 36713248864 B/op 10963615
allocs/op
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #38201
leader task require to update delegator's distribution, and only success
after the distribution change has been applyed to delegator. but the
delegator will reject the distribution change if it's version is older
than current version in delegator. which cause the leader task stuck and
retry forever.
this PR remove the leader task finish check.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #37630
Add secondary index with vchannel to reduce `GetBy` rlock holding time
when segment number is large.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Related to #38275
This PR move sync created partition step to proxy to avoid potential
logic deadlock when create partition happens with target segment change.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #38305
after we disable balance segment and balance channel happens at same
time, the constriant which require release segment must happens on
serviceable shard leader is unnessary.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #36456
Unify collection/partition number metrics to collection manager in case
of unwant missing modification
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #36621
- add filters to most of lists
- add more options to the existing list filter
- fix the error list column in channels and resource group list
Signed-off-by: mimo.oyn <ning.ouyang@zilliz.com>
The partition number has already been incremented in
`ChangePartitionState`, so there is no need to increment it again in
`AddPartition`.
issue: https://github.com/milvus-io/milvus/issues/37630
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
When there are too many key-value pairs, the etcd list operation may
times out. This PR replaces `LoadWithPrefix` in list operations, which
could involve many keys, with `WalkWithPrefix`.
issue: https://github.com/milvus-io/milvus/issues/37917
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
`File.Write` and `File.WriteInt` use `write`, which may be just direct
syscall in some systems. When mappding field data and write line by
line, this could cost lost of CPU time when the row number is large.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #38086
cause ManualCompact api pass collection id in request, but RBAC requires
to check collection name, so grant ManualCompact api doesn't work.
This PR refine the ManualCompact api to accpet collection name in
request.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
related issue: https://github.com/milvus-io/milvus/issues/37031
fixed issues #38042: The interface "grant_v2" does not support empty
collectionName while the error says it does
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
#38077
remove the check for two reason
1. server will do the same to make sure use the correct database;
2. each req has an additional overhead of calling the proxy to check
database.
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
related issue: https://github.com/milvus-io/milvus/issues/37031
fixed issues:
#37974: better error messages for grant v2 interface
#37903: fix meta built-in privilege group object name
#37843: better error messages for custom privilege group interface
#38002: fix built-in privilege group meta to pass proxy interceptor
check
#38008: fix revoke v2 to support revoking v1 granted privileges
Signed-off-by: shaoting-huang <shaoting.huang@zilliz.com>
issue: #33285
- move most cgo opeartions related to search/query into segcore package
for reusing for streamingnode.
- add go unittest for segcore operations.
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #35917
This PR refines the meta-related APIs in datacoord to allow the ctx to
be passed down to the catalog operation interfaces
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
Related to #37999
This PR add `SetThreadName` API for marking cgo thread and utilize it
when initializing cgo worker.
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #35917
This PR refine the querycoord meta related interfaces to ensure that
each method includes a ctx parameter.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
RESTful API. The influenced API are as follows:
- Handler. insert
- HandlerV1. insert/upsert
- HandlerV2. insert/upsert/search
We do not modify search API in Handler/HandlerV1 because they do not
support fp16/bf16 vectors.
module github.com/milvus-io/milvus/pkg:
Add `Float32ArrayToBFloat16Bytes()`, `Float32ArrayToFloat16Bytes()` and
`Float32ArrayToBytes()`. These method will be used in GoSDK in the
future.
issue: #37448
Signed-off-by: Yinzuo Jiang <yinzuo.jiang@zilliz.com>
Signed-off-by: Yinzuo Jiang <jiangyinzuo@foxmail.com>
issue: #37764
- add a local client to call local server directly for
querycoord/rootcoord/datacoord.
- enable local client if milvus is running mixcoord or standalone mode.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #37630
This PR add a new util coll2Replicas secondary index to reduce map
access & iteration while get replicas by collection
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
/kind improvement
Here you only need to filter out the system fields, and you don’t need
to recreate a response, because recreating the response will cause this
part to be easily missed when adding fields later.
Signed-off-by: SimFG <bang.fu@zilliz.com>
issue: #37908
cause paramtable is global single instance, which cause
paramtable.GetNodeID may return wrong server id in integration test.
This PR use node.GetNodeID to replace paramtable.GetNodeID
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36621
- fix the empty segment source
- remove http request timeout limit
- support more filters in seg and channel table
Signed-off-by: mimo.oyn <ning.ouyang@zilliz.com>
issue: #35917
Before enhancing log trace information, it's necessary to pass the
context to the method entry point.
This PR first refine the rootcoord/metatable interfaces to ensure that
each method includes a ctx parameter.
Signed-off-by: tinswzy <zhenyuan.wei@zilliz.com>
1. taskQueueCapacity 256 is too small for production when we want to
re-write the entire collection
2. tasks should be cleaned when unable to recover, or the meta will
remain in etcd forever later.
Signed-off-by: yangxuan <xuan.yang@zilliz.com>
issue: #33550
balance segment and balance segment execute at same time, which will
cause bounch of corner case.
This PR disable simultaneous balance of segments and channels
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37830
casue dist handler doesn't set channel's version, so if channel checker
try to dedup channel, it may release the new delegator after balance
finished.
this PR fix the way to set proper version for channel.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37851
- make a global thread pool at tantivy temporally.
- set 1 but not 4 threads for inverted text index.
Signed-off-by: chyezh <chyezh@outlook.com>
issue : https://github.com/milvus-io/milvus/issues/36864
I have a few questions regarding my approach.I will consolidate them
here for feedback and review.Thanks
---------
Signed-off-by: Nischay Yadav <nischay.yadav@ibm.com>
Signed-off-by: Nischay <Nischay.Yadav@ibm.com>
Custom `jsonRender` that encodes JSON data directly into the response
stream, it uses less memory since it does not buffer the entire JSON
structure before sending it, unlike `c.JSON` in `HTTPReturn`, which
serializes the JSON fully in memory before writing it to the response.
Benchmark testing shows that, using the custom render incurs no
performance loss and reduces memory consumption by nearly 50%.
issue: https://github.com/milvus-io/milvus/issues/37671
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #37718
This PR refine the shard client ref counter, dec ref counter won't
release client anymore, and only permit shard client manager to remove
client.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Limit the maximum concurrency of channel tasks for each DataNode to
prevent excessive subscriptions from causing DataNode OOM.
issue: https://github.com/milvus-io/milvus/issues/37665
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
When there are a large number of segments, the metrics consume a lot of
memory. This PR Remove segment-level tag from monitoring metrics.
issue: https://github.com/milvus-io/milvus/issues/37636
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #36621
* The front-end code of the management interface has been refactored
using React.
* The interaction and display of the front-end interface have been
optimized.
Signed-off-by: mimo.oyn <ning.ouyang@zilliz.com>
Co-authored-by: zilliz <zilliz@zillizdeMacBook-Pro.local>
issue: #37679
pr #36549 introduce the logic error which update current target when
only parts of channel is ready.
This PR fix the logic error and let dist handler keep pull distribution
on querynode until all delegator becomes serviceable.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37579
If the schema includes large varchar fields, a few thousand rows can
reach hundreds of MB in size. Therefore, if the batch size of the
segment writer is large, it will produce relatively large `binlogs`,
which can cause datanode to run out of memory (OOM) during compaction.
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
issue: #37665#37631#37620#37587#36906
knowhere has add default nlist value, so some invalid param test ut with
no nlist param will be valid.
Signed-off-by: xianliang.li <xianliang.li@zilliz.com>
issue: ##36621
- For simple types in a struct, add "string" to the JSON tag for
automatic string conversion during JSON encoding.
- For complex types in a struct, replace "int64" with "string."
Signed-off-by: jaime <yun.zhang@zilliz.com>
issue: #37640
fix the pr #36549
cause balance channel will wait until new delegator becomes serviceable,
but new delegator need to sync target version then becomes serviceable,
and sync target version need to be wait all replica load done. so if
increasing replica number and balance channel happens at same time,
logic dead lock occurs.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #36293#36242
after qn recover, delegator may be loaded in new node, after all segment
has been loaded, delegator becomes serviceable. but delegator's target
version hasn't been synced, and if search/query comes, delegator will
use wrong target version to filter out a empty segment list, which
caused empty search result.
This pr will block delegator's serviceable status until target version
is synced
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37115
the old implementation update shard cache and shard client manager at
same time, which causes lots of conor case due to concurrent issue
without lock.
This PR decouple shard client manager from shard cache, so only shard
cache will be updated if delegator changes. and make sure shard client
manager will always return the right client, and create a new client if
not exist. in case of client leak, shard client manager will purge
client in async for every 10 minutes.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33285
- Modify the proto of consumer of streaming service.
- Make VChannel as a required option for streaming
---------
Signed-off-by: chyezh <chyezh@outlook.com>
Related to #35415
In rolling upgrade, legacy proxy may dispatch load request wit empty
load field list. The upgraded querycoord may report error by mistake
that load field list is changed.
This PR:
- Auto field empty load field list with all user field ids
- Refine the error messag when load field list updates
- Refine load job unit test with service cases
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
When there're a lot of loaded collections, they would occupy the target
observer scheduler’s pool. This prevents loading collections from
updating the current target in time, slowing down the load process.
This PR adds a separate target dispatcher for loading collections.
issue: https://github.com/milvus-io/milvus/issues/37166
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Close (unsubscribe) the msg stream after completing the
`PreCreatedTopic` check to prevent backlog issue.
issue: https://github.com/milvus-io/milvus/issues/36021
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Since master branch run cpp unittest in jenken, the cpp unit test
building could be skipped since the output is not needed in github
runner.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
When check health logic failed to collection not-queryable, the related
reason is hard to find in log.
This PR add context for log with trace id and print unqueryable
collection info log.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #36977
with node_label_filter on resource group, user can add label on
querynode with env `MILVUS_COMPONENT_LABEL`, then resource group will
prefer to accept node which match it's node_label_filter.
then querynode's can't be group by labels, and put querynodes with same
label to same resource groups.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #37172
- add redo interceptor to implement append context refresh. (make new
timetick)
- add create segment handler for flusher.
- make empty segment flushable and directly change it into dropped.
- add create segment message into wal when creating new growing segment.
- make the insert operation into following seq: createSegment -> insert
-> insert -> flushSegment.
- make manual flush into following seq: flushTs -> flushsegment ->
flushsegment -> manualflush.
---------
Signed-off-by: chyezh <chyezh@outlook.com>