issue: #32901
pr #32814 introduce the compatible issue, when upgrade to milvus latest,
the query coord may skip update dist due to the lastModifyTs doesn't
changes. but for old version querynode, the lastModifyTs in
GetDataDistritbuionResponse is always 0, which makes qc skip update
dist. then qc will keep retry the task to watch channel again and again.
this PR add compatible with old version querynode, when lastModifyTs is
0, qc will update it's data distribution.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #33099#32837#32419
1. len(search result) may be nq * topk, we need return all rather than
topk
2. the in restful response payload keep the same with milvus error code
Signed-off-by: PowderLi <min.li@zilliz.com>
1. use a small warmup pool to reduce the impact of warmup
2. change the warmup pool to nonblocking mode
3. disable warmup by default
4. remove the maximum size limit of 16 for the load pool
issue: https://github.com/milvus-io/milvus/issues/32772
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Co-authored-by: xiaofanluan <xiaofan.luan@zilliz.com>
issue: #29419
also re-enabled an e2e test using restful api, which is previously
disabled due to https://github.com/milvus-io/milvus/issues/32214.
In restful api, the accepted json formats of sparse float vector are:
* `{"indices": [1, 100, 1000], "values": [0.1, 0.2, 0.3]}`
* {"1": 0.1, "100": 0.2, "1000": 0.3}
for accepted indice and value range, see
https://milvus.io/docs/sparse_vector.md#FAQ
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
If the request is limited by rate limiter, limiter should not "Cancel".
This is because, if limited, tokens are not deducted; instead, "Cancel"
operation would increase the token count.
issue: https://github.com/milvus-io/milvus/issues/31705
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
See also #33247
Introduced in PR #32865
Remove task after task done to keep checkpoint sound and safe
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Before executing the import, partition IDs should be reordered according
to partition names. Otherwise, the data might be hashed to the wrong
partition during import. This PR corrects this error.
issue: https://github.com/milvus-io/milvus/issues/33237
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
issue: #33220
use dbName as part of privilege entity, so
1. grant / revoke a privilege need dbName
2. we can describe the privileges of the role which belong to one
special database
Signed-off-by: PowderLi <min.li@zilliz.com>
See also #33266
Each `WriteBuffer` shall have same channel/collection id attribute, so
use same logger will do and reduce logger allocation & frequent name
composition
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32530
cause ProcessDelete need to check whether pk exist in bloom filter, and
ProcessInsert need to update pk to bloom filter, when execute
ProcessInsert and ProcessDelete in parallel, it will cause race
condition in segment's bloom filter
This PR execute ProcessInsert and ProcessDelete in serial to avoid block
each other
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
When InsertData is too large for cpp proto unmarshalling, the error
message is confusing since the length is overflowed
This PR adds assertion for insert data length.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33200#33207
pr#33104 causes the offline node will be kept in resource group after qc
recover, and offline node will be assign to new replica as rwNode, then
request send to those node will fail by NodeNotFound.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
fix#32979
remove l0 cache and build delete pk and ts everytime. this reduce the
memory and also increase the code readability
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
issue: #33200#33207
pr#33104 remove this logic by mistake, which cause the offline node will
be kept in replica after qc recover, and request send to offline qn will
go a NodeNotFound error.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #25309
- Remove ctx from struct
- Add ctx parameters for internal check logic methods
- Add Waitgroup to make sure worker goroutine quit before close returns
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33103
when try to do stopping balance for stopping query node, balancer will
try to get node list from replica.GetNodes, then check whether node is
stopping, if so, stopping balance will be triggered for this replica.
after the replica refactor, replica.GetNodes only return rwNodes, and
the stopping node maintains in roNodes, so balancer couldn't find
replica which contains stopping node, and stopping balance for replica
won't be triggered, then query node will stuck forever due to
segment/channel doesn't move out.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Query slot of compaction in datanode, and transfer the control logic for
limiting compaction tasks from datacoord to the datanode.
issue: https://github.com/milvus-io/milvus/issues/32809
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
To decouple compaction from shard, loading BF from storage instead of
memory during L0 compaction in datanode.
issue: https://github.com/milvus-io/milvus/issues/32809
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Avoid passing datanode around preparing datanode code directory
refactory.
Also refine unit test code for same component. The `Await` shall return
first before checking the counter number since when lock cost is heavy
(using deadlock.RWMutex See PR #33069.) case may fail due to long
running time submitting tasks.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
1. add nullable in model.Field
help to read nullable accurately.
2. check valid_data
a. if user pass default_value or the field is nullable, the length of
valid_data must be num_rows.
b. if passed valid_data, the length of passed field data must equal to
the number of 'true' in valid_data.
c. after fill default_value, only nullable field will still has
valid_data.
3. fill data in two situation
a. has no default_value, if nullable,
will fill nullValue when passed num_rows not equal to expected num_rows.
b. has default_value,
will fill default_value when passed num_rows not equal to expected
num_rows.
c. after fill data, the length of all field will equal to passed
num_rows.
#31728
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
See also #33062
This PR:
- Add `lock.RWMutex` & `lock.Mutex` alias to switch implementation based
on build flags
- When build flags has `test` in it, use `go-deadlock` to detect
possible deadlocks
- Replace all `sync.RWMutex` & `sync.Mutex` in datacoord pkg
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33005
1. add `MemorySize` field for insert binlog.
2. `LogSize` means the file size in the storage object.
3. `MemorySize` means the size of the data in the memory.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
issue: #32910
see also: #32911
when channel exclusive mode is enabled, replica will record channel node
info in meta store, and if the balance policy changes, which means
channel exclusive mode is disabled, we should clean up the channel node
info in meta store, and stop to balance node between channels.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #32642
This PR reuses hash locations for bloom filter prediction utilizing
`storage.Location`, like enhancement #32642.
Also adds a utility struct in storage: `LocationCache` to storage
locations for variable K (numbers of hash functions)
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32970
cause InvalidateShardLeaderCache use wrong lock, which may cause data
race in meta cache, then proxy may crash
This PR fixed that use leaderMut when try to access shard leader cache.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
fix#32868
plan parser takes too much cpu on high qps,this pr try to avoid create
lexer and parser too freequent
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
Set `UseDefaultConsistency` to true so that restv2 read API shall use
collection consistency level setting correctly.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32206, #32801
- search failure with some assertion, segment not loaded and resource
insufficient.
- segment leak when query segments
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue:#32899
This PR fix the wrong metric value of load index, which introduced by
pr#32567, use wrong time unit for load index metrics
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32910
* split replica's node list to channels when create replicas
* balance nodes among channels when node change happens
* implement channel level balance, let balance happens in channel level
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Issue: #22837
including following changes:
1. Add API CreateInsertData() and BuildArrayData() in
internal/util/testutil
2. Remove duplicated test APIs from importutilv2 unittest and bulk
insert integration test
Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
issue: #32901
Cause release segment request need be send to delegator, but it need
replica to info find segment's delegator. but the stopping query node
will be marked as read only in replica, then `replica.Contains()` just
return true for rwNode in replica. then it can't get replica info by
stopping query node and release segment will be blocked.
This PR make `replica.Contains()` return true for both roNode and
rwNode.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #32165
In old `GetResourceGroupByCollection` implementation, it iterates all
replicas to match collection id, which is slow and CPU time consuming.
This PR make it utilize the coll2Replicas mapping by calling
`GetByCollection` and mapping replicas into resource group.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #32165
Cache channel name to channel info to avoid iteration over channel
results when updating leader view version.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #32860
SyncMgr did not ensure task key is locked before `SyncData` returning
which may cause concurrent problem during sync wich multiple policies.
This PR change sync mgr implementation to make sure the key is locked
before returning task result `*conc.Future`
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #32165
This PR modify the `SelectSegments` interface to utilizing collection id
information when selecting segment with provided collection
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #31516
background: the server id field in data node is redundant. session id
already provides the source of truth.
Signed-off-by: yiwangdr <yiwangdr@gmail.com>
issue: #32466
this PR enhance that when shard location changed, update proxy's shard
leader cache. in case of query node failover case, proxy can find
replica recover
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32530
when try to match segment bloom filter with pk, we can reuse the hash
locations. This PR maintain the max hash Func, and compute hash location
once for all segment, reuse hash location can speed up bf access
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #32748
This PR:
- Add `metautil.Channel` utiltiy which convert virtual name to physical
channel name, collectionID and shard idx
- Add channel mapper interface & implementation to convert limited
physical channel name into int index
- Apply `metautil.Channel` filter in querynode segment manager logic
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32663
- Use new param to control request resource timeout for lazy load.
- Remove the timeout parameter of `Do`, remove `DoWait`. use `context`
to control the timeout.
- Use `VersionedNotifier` to avoid notify event lost and broadcast,
remove the redundant goroutine in cache.
related dev pr: #32684
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #32449
to avoid GetShardLeaders return empty node list, this PR add node list
check in both client side and server side.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #27675
Use `struct{}` instead `error` for sync task future result type to
reduce result size and preventing logci error.
Also change some unused parameter to `_` to suppress lint warning
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
1. Make the dynamic field file optional during numpy import
2. Add integration importing test with dynamic
3. Disallow file of pk when autoID=true during numpy import
issue: https://github.com/milvus-io/milvus/issues/32542
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Adding a collection id to the index node log allows you to associate an
index building task with a specific collection.
If the host CPU usage is too high due to index build, you can use the
collection id to quickly locate a specific collection, improving fault
locating efficiency.
Signed-off-by: dengxiaohai <rolkdengxiaohai@didiglobal.com>
Co-authored-by: dengxiaohai <rolkdengxiaohai@didiglobal.com>
related to #32165
1. for all the manager, support collection level index
2. remove collection level filter to avoid extra cpu usage when
collection number increases
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
issue: #19095,#29655,#31718
- Change `ListWithPrefix` to `WalkWithPrefix` of OOS into a pipeline
mode.
- File garbage collection is performed in other goroutine.
- Segment Index Recycle clean index file too.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #30647
- Remove error report if there's no query node serve. It's hard for
programer to use it to do resource management.
- Change resource group `transferNode` logic to keep compatible with old
version sdk.
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue: #29419
added helper functions to parse JSON representation of sparse float
vectors, will be used by both the restful server and the import utils.
Signed-off-by: Buqian Zheng <zhengbuqian@gmail.com>
See also #32165
Add segment dist and leader view filter criterion struct to store
frequent filter conditions.
Add collection/channel filter results for these two meta
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>