issue: #33200#33207
pr#33104 causes the offline node will be kept in resource group after qc
recover, and offline node will be assign to new replica as rwNode, then
request send to those node will fail by NodeNotFound.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
fix#32979
remove l0 cache and build delete pk and ts everytime. this reduce the
memory and also increase the code readability
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
issue: #33200#33207
pr#33104 remove this logic by mistake, which cause the offline node will
be kept in replica after qc recover, and request send to offline qn will
go a NodeNotFound error.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Related to #25309
- Remove ctx from struct
- Add ctx parameters for internal check logic methods
- Add Waitgroup to make sure worker goroutine quit before close returns
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33103
when try to do stopping balance for stopping query node, balancer will
try to get node list from replica.GetNodes, then check whether node is
stopping, if so, stopping balance will be triggered for this replica.
after the replica refactor, replica.GetNodes only return rwNodes, and
the stopping node maintains in roNodes, so balancer couldn't find
replica which contains stopping node, and stopping balance for replica
won't be triggered, then query node will stuck forever due to
segment/channel doesn't move out.
---------
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Query slot of compaction in datanode, and transfer the control logic for
limiting compaction tasks from datacoord to the datanode.
issue: https://github.com/milvus-io/milvus/issues/32809
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
To decouple compaction from shard, loading BF from storage instead of
memory during L0 compaction in datanode.
issue: https://github.com/milvus-io/milvus/issues/32809
---------
Signed-off-by: bigsheeper <yihao.dai@zilliz.com>
Avoid passing datanode around preparing datanode code directory
refactory.
Also refine unit test code for same component. The `Await` shall return
first before checking the counter number since when lock cost is heavy
(using deadlock.RWMutex See PR #33069.) case may fail due to long
running time submitting tasks.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
1. add nullable in model.Field
help to read nullable accurately.
2. check valid_data
a. if user pass default_value or the field is nullable, the length of
valid_data must be num_rows.
b. if passed valid_data, the length of passed field data must equal to
the number of 'true' in valid_data.
c. after fill default_value, only nullable field will still has
valid_data.
3. fill data in two situation
a. has no default_value, if nullable,
will fill nullValue when passed num_rows not equal to expected num_rows.
b. has default_value,
will fill default_value when passed num_rows not equal to expected
num_rows.
c. after fill data, the length of all field will equal to passed
num_rows.
#31728
---------
Signed-off-by: lixinguo <xinguo.li@zilliz.com>
Co-authored-by: lixinguo <xinguo.li@zilliz.com>
See also #33062
This PR:
- Add `lock.RWMutex` & `lock.Mutex` alias to switch implementation based
on build flags
- When build flags has `test` in it, use `go-deadlock` to detect
possible deadlocks
- Replace all `sync.RWMutex` & `sync.Mutex` in datacoord pkg
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #33005
1. add `MemorySize` field for insert binlog.
2. `LogSize` means the file size in the storage object.
3. `MemorySize` means the size of the data in the memory.
---------
Signed-off-by: Cai Zhang <cai.zhang@zilliz.com>
Signed-off-by: cai.zhang <cai.zhang@zilliz.com>
issue: #32910
see also: #32911
when channel exclusive mode is enabled, replica will record channel node
info in meta store, and if the balance policy changes, which means
channel exclusive mode is disabled, we should clean up the channel node
info in meta store, and stop to balance node between channels.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #32642
This PR reuses hash locations for bloom filter prediction utilizing
`storage.Location`, like enhancement #32642.
Also adds a utility struct in storage: `LocationCache` to storage
locations for variable K (numbers of hash functions)
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32970
cause InvalidateShardLeaderCache use wrong lock, which may cause data
race in meta cache, then proxy may crash
This PR fixed that use leaderMut when try to access shard leader cache.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
fix#32868
plan parser takes too much cpu on high qps,this pr try to avoid create
lexer and parser too freequent
Signed-off-by: xiaofanluan <xiaofan.luan@zilliz.com>
Set `UseDefaultConsistency` to true so that restv2 read API shall use
collection consistency level setting correctly.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #32206, #32801
- search failure with some assertion, segment not loaded and resource
insufficient.
- segment leak when query segments
---------
Signed-off-by: chyezh <chyezh@outlook.com>
issue:#32899
This PR fix the wrong metric value of load index, which introduced by
pr#32567, use wrong time unit for load index metrics
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
issue: #32910
* split replica's node list to channels when create replicas
* balance nodes among channels when node change happens
* implement channel level balance, let balance happens in channel level
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
Issue: #22837
including following changes:
1. Add API CreateInsertData() and BuildArrayData() in
internal/util/testutil
2. Remove duplicated test APIs from importutilv2 unittest and bulk
insert integration test
Signed-off-by: Cai Yudong <yudong.cai@zilliz.com>
issue: #32901
Cause release segment request need be send to delegator, but it need
replica to info find segment's delegator. but the stopping query node
will be marked as read only in replica, then `replica.Contains()` just
return true for rwNode in replica. then it can't get replica info by
stopping query node and release segment will be blocked.
This PR make `replica.Contains()` return true for both roNode and
rwNode.
Signed-off-by: Wei Liu <wei.liu@zilliz.com>
See also #32165
In old `GetResourceGroupByCollection` implementation, it iterates all
replicas to match collection id, which is slow and CPU time consuming.
This PR make it utilize the coll2Replicas mapping by calling
`GetByCollection` and mapping replicas into resource group.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #32165
Cache channel name to channel info to avoid iteration over channel
results when updating leader view version.
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #32860
SyncMgr did not ensure task key is locked before `SyncData` returning
which may cause concurrent problem during sync wich multiple policies.
This PR change sync mgr implementation to make sure the key is locked
before returning task result `*conc.Future`
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
See also #32165
This PR modify the `SelectSegments` interface to utilizing collection id
information when selecting segment with provided collection
---------
Signed-off-by: Congqi Xia <congqi.xia@zilliz.com>
issue: #31516
background: the server id field in data node is redundant. session id
already provides the source of truth.
Signed-off-by: yiwangdr <yiwangdr@gmail.com>