Merge pull request #5917 from milvus-io/1.1

Merge branch 1.1 to branch 1.x
1.x
shengjun.li 2021-06-21 20:12:44 +08:00 committed by GitHub
commit 0c2198faa8
134 changed files with 1201 additions and 1949 deletions

View File

@ -2,7 +2,28 @@
Please mark all change in change log and use the issue from GitHub
# Milvus 1.1.0 (2021-04-29)
# Milvus 1.1.1 (2021-06-16)
## Bug
- \#4897 Query results contain some deleted ids
- \#5164 Exception should be raised when inserting or deleting entities on a non-existent partition
- \#5191 Mishards throws an "index out of range" error after continuous search/insert over a period of time
- \#5398 Random crash after request is executed
- \#5537 Failed to load bloom filter after a sudden power-off
- \#5574 IVF_SQ8 and IVF_PQ cannot be built on multiple GPUs
- \#5747 Search with big nq and topk crashes Milvus
## Feature
- \#1434 Storage: enabling s3 storage support (implemented by Unisinsight)
- \#5142 Support keeping index in GPU memory
## Improvement
- \#5115 Relax the topk limit from 16384 to 1M for CPU search
- \#5204 Improve IVF query on GPU when no entity deleted
- \#5544 Relax the index_file_size limit from 4GB to 128GB
## Task
# Milvus 1.1.0 (2021-05-07)
## Bug
- \#4778 Failed to open vector index in mishards
- \#4797 Wrong results returned when merging queries with different 'topk' values
@ -17,7 +38,7 @@ Please mark all change in change log and use the issue from GitHub
- \#5010 IVF_PQ failed to query on GPU if 'nbits' doesn't equal 8
- \#5050 Index type returned by get_collection_stats() is incorrect
- \#5063 Empty segment is serialized and crash milvus
- \#5078 Server crashed when creating a GPU IVF index whose dimension is 2048/4086/8192
- \#5078 Server crashed when creating a GPU IVF index whose dimension is 2048/4096/8192
## Feature
- \#4564 Allow get_entity_by_id() in a specified partition

View File

@ -60,12 +60,12 @@ For GPU-enabled version, you will also need:
#### Step 1 Download Milvus source code and specify version
Download Milvus source code, change directory and specify version (for example, 0.10.3):
Download Milvus source code, change directory and specify version (for example, 1.1):
```shell
$ git clone https://github.com/milvus-io/milvus
$ cd ./milvus/core
$ git checkout 0.10.3
$ git checkout 1.1
```
#### Step 2 Install dependencies
@ -215,10 +215,10 @@ To enter its core directory:
$ cd ./milvus/core
```
Specify version (for example, 0.10.3):
Specify version (for example, 1.1):
```shell
$ git checkout 0.10.3
$ git checkout 1.1
```
### Step 4 Compile Milvus in the container

View File

@ -30,15 +30,15 @@ See [Milvus install guide](https://www.milvus.io/docs/install_milvus.md) to inst
### Try example programs
Try an example program with Milvus using [Python](https://www.milvus.io/docs/example_code.md), [Java](https://github.com/milvus-io/milvus-sdk-java/tree/master/examples), [Go](https://github.com/milvus-io/milvus-sdk-go/tree/master/examples), or [C++ example code](https://github.com/milvus-io/milvus/tree/master/sdk/examples).
Try an example program with Milvus using [Python](https://www.milvus.io/docs/example_code.md), [Java](https://github.com/milvus-io/milvus-sdk-java/tree/master/examples), [Go](https://github.com/milvus-io/milvus-sdk-go/tree/master/examples), or [C++ example code](https://github.com/milvus-io/milvus/tree/1.1/sdk/examples).
## Supported clients
- [Go](https://github.com/milvus-io/milvus-sdk-go)
- [Python](https://github.com/milvus-io/pymilvus)
- [Java](https://github.com/milvus-io/milvus-sdk-java)
- [C++](https://github.com/milvus-io/milvus/tree/master/sdk)
- [RESTful API](https://github.com/milvus-io/milvus/tree/master/core/src/server/web_impl)
- [C++](https://github.com/milvus-io/milvus/tree/1.1/sdk)
- [RESTful API](https://github.com/milvus-io/milvus/tree/1.1/core/src/server/web_impl)
- [Node.js](https://www.npmjs.com/package/@arkie-ai/milvus-client) (Contributed by [arkie](https://www.arkie.cn/))
## Application scenarios
@ -47,7 +47,7 @@ You can use Milvus to build intelligent systems in a variety of AI application s
## Benchmark
See our [test reports](https://github.com/milvus-io/milvus/tree/master/docs) for more information about performance benchmarking of different indexes in Milvus.
See our [test reports](https://github.com/milvus-io/milvus/tree/1.1/docs) for more information about performance benchmarking of different indexes in Milvus.
## Roadmap

View File

@ -10,9 +10,9 @@ timeout(time: 180, unit: 'MINUTES') {
try {
dir ('charts/milvus') {
if ("${BINARY_VERSION}" == "CPU") {
sh "helm install --wait --timeout 300s --set cluster.enabled=true --set persistence.enabled=true --set image.repository=registry.zilliz.com/milvus/engine --set mishards.image.tag=test --set mishards.image.pullPolicy=Always --set image.tag=${DOCKER_VERSION} --set image.pullPolicy=Always --set service.type=ClusterIP --set image.resources.requests.memory=8Gi --set image.resources.requests.cpu=2.0 --set image.resources.limits.memory=12Gi --set image.resources.limits.cpu=4.0 -f ci/db_backend/mysql_${BINARY_VERSION}_values.yaml -f ci/filebeat/values.yaml --namespace milvus ${env.SHARDS_HELM_RELEASE_NAME} ."
sh "helm install --wait --timeout 300s --set cluster.enabled=true --set persistence.enabled=true --set mishards.image.repository=milvusdb/mishards --set mishards.image.tag=1.1.0 --set mishards.image.pullPolicy=Always --set image.repository=registry.zilliz.com/milvus/engine --set image.tag=${DOCKER_VERSION} --set image.pullPolicy=Always --set service.type=ClusterIP --set image.resources.requests.memory=8Gi --set image.resources.requests.cpu=2.0 --set image.resources.limits.memory=12Gi --set image.resources.limits.cpu=4.0 -f ci/db_backend/mysql_${BINARY_VERSION}_values.yaml -f ci/filebeat/values.yaml --namespace milvus ${env.SHARDS_HELM_RELEASE_NAME} ."
} else {
sh "helm install --wait --timeout 300s --set cluster.enabled=true --set persistence.enabled=true --set image.repository=registry.zilliz.com/milvus/engine --set mishards.image.tag=test --set mishards.image.pullPolicy=Always --set gpu.enabled=true --set readonly.gpu.enabled=true --set image.tag=${DOCKER_VERSION} --set image.pullPolicy=Always --set service.type=ClusterIP -f ci/db_backend/mysql_${BINARY_VERSION}_values.yaml -f ci/filebeat/values.yaml --namespace milvus ${env.SHARDS_HELM_RELEASE_NAME} ."
sh "helm install --wait --timeout 300s --set cluster.enabled=true --set persistence.enabled=true --set mishards.image.repository=milvusdb/mishards --set mishards.image.tag=1.1.0 --set mishards.image.pullPolicy=Always --set gpu.enabled=true --set readonly.gpu.enabled=true --set image.repository=registry.zilliz.com/milvus/engine --set image.tag=${DOCKER_VERSION} --set image.pullPolicy=Always --set service.type=ClusterIP -f ci/db_backend/mysql_${BINARY_VERSION}_values.yaml -f ci/filebeat/values.yaml --namespace milvus ${env.SHARDS_HELM_RELEASE_NAME} ."
}
}
} catch (exc) {
@ -30,6 +30,6 @@ timeout(time: 180, unit: 'MINUTES') {
dir ("tests/milvus_python_test") {
sh 'python3 -m pip install -r requirements.txt'
sh "pytest . --level=2 --alluredir=\"test_out/dev/shards/\" --ip ${env.SHARDS_HELM_RELEASE_NAME}.milvus.svc.cluster.local >> ${WORKSPACE}/${env.DEV_TEST_ARTIFACTS}/milvus_${BINARY_VERSION}_shards_dev_test.log"
sh "pytest . --level=1 --alluredir=\"test_out/dev/shards/\" --ip ${env.SHARDS_HELM_RELEASE_NAME}.milvus.svc.cluster.local >> ${WORKSPACE}/${env.DEV_TEST_ARTIFACTS}/milvus_${BINARY_VERSION}_shards_dev_test.log"
}
}

View File

@ -90,7 +90,7 @@ if (MILVUS_VERSION_MAJOR STREQUAL ""
OR MILVUS_VERSION_MINOR STREQUAL ""
OR MILVUS_VERSION_PATCH STREQUAL "")
message(WARNING "Failed to determine Milvus version from git branch name")
set(MILVUS_VERSION "1.1.0")
set(MILVUS_VERSION "1.1.1")
endif ()
message(STATUS "Build version = ${MILVUS_VERSION}")

View File

@ -16,9 +16,8 @@ WITH_MKL="OFF"
WITH_PROMETHEUS="ON"
FIU_ENABLE="OFF"
BUILD_OPENBLAS="ON"
WITH_AWS="OFF"
while getopts "p:d:t:f:ulrcgahzmeis" arg; do
while getopts "p:d:t:f:ulrcgahzmei" arg; do
case $arg in
p)
INSTALL_PREFIX=$OPTARG
@ -63,9 +62,6 @@ while getopts "p:d:t:f:ulrcgahzmeis" arg; do
a)
FPGA_VERSION="ON"
;;
s)
WITH_AWS="ON"
;;
h) # help
echo "
@ -83,11 +79,10 @@ parameter:
-e: build without prometheus(default: OFF)
-i: build FIU_ENABLE(default: OFF)
-a: build FPGA(default: OFF)
-s: build with AWS S3(default: OFF)
-h: help
usage:
./build.sh -p \${INSTALL_PREFIX} -t \${BUILD_TYPE} [-u] [-l] [-r] [-c] [-z] [-g] [-a] [-m] [-e] [-s] [-h]
./build.sh -p \${INSTALL_PREFIX} -t \${BUILD_TYPE} [-u] [-l] [-r] [-c] [-z] [-g] [-a] [-m] [-e] [-h]
"
exit 0
;;
@ -122,7 +117,6 @@ CMAKE_CMD="cmake \
-DFAISS_WITH_MKL=${WITH_MKL} \
-DMILVUS_WITH_PROMETHEUS=${WITH_PROMETHEUS} \
-DMILVUS_WITH_FIU=${FIU_ENABLE} \
-DMILVUS_WITH_AWS=${WITH_AWS} \
../"
echo ${CMAKE_CMD}
${CMAKE_CMD}
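For reference, a minimal invocation of this script might look like the sketch below. It is illustrative only, using just the -p (INSTALL_PREFIX) and -t (BUILD_TYPE) options documented in the usage text above; the path and build type are assumptions, not values taken from this diff.
```shell
$ cd ./milvus/core
# Release build installed into ./milvus-install (illustrative values)
$ ./build.sh -p $(pwd)/milvus-install -t Release
```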

View File

@ -86,7 +86,7 @@ define_option(MILVUS_WITH_OPENTRACING "Build with Opentracing" ON)
define_option(MILVUS_WITH_FIU "Build with fiu" OFF)
define_option(MILVUS_WITH_AWS "Build with aws" OFF)
define_option(MILVUS_WITH_AWS "Build with aws" ON)
define_option(MILVUS_WITH_OATPP "Build with oatpp" ON)

View File

@ -64,10 +64,25 @@ network:
# | flushes data to disk. | | |
# | 0 means disable the regular flush. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
# s3_enabled | If using s3 storage backend. | Boolean | false |
#----------------------+------------------------------------------------------------+------------+-----------------+
# s3_address | The s3 server address; supports domain name, hostname, or IP. | String | 127.0.0.1 |
#----------------------+------------------------------------------------------------+------------+-----------------+
# s3_port | The s3 server port. | Integer | 80 |
#----------------------+------------------------------------------------------------+------------+-----------------+
# s3_access_key | The access key for accessing s3 service. | String | s3_access_key |
#----------------------+------------------------------------------------------------+------------+-----------------+
# s3_secret_key | The secret key for accessing s3 service. | String | s3_secret_key |
#----------------------+------------------------------------------------------------+------------+-----------------+
# s3_bucket | The s3 bucket name for storing milvus's data. | String | s3_bucket |
# | Note: please use a different bucket for each milvus | | |
# | cluster. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
storage:
path: /var/lib/milvus
auto_flush_interval: 1
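As an illustrative sketch (not part of this diff), a storage section with the S3 backend enabled might look like the following, assuming the keys documented in the table above sit alongside path and auto_flush_interval; the address, bucket name, and credentials are placeholders.
```yaml
storage:
  path: /var/lib/milvus
  auto_flush_interval: 1
  s3_enabled: true              # switch the storage backend to S3
  s3_address: 127.0.0.1         # domain name, hostname, or IP of the S3 service
  s3_port: 80
  s3_access_key: s3_access_key
  s3_secret_key: s3_secret_key
  s3_bucket: milvus-bucket-01   # use a different bucket per Milvus cluster
```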
#----------------------+------------------------------------------------------------+------------+-----------------+
# WAL Config | Description | Type | Default |
#----------------------+------------------------------------------------------------+------------+-----------------+

View File

@ -124,6 +124,8 @@ cache:
#----------------------+------------------------------------------------------------+------------+-----------------+
# enable | Use GPU devices or not. | Boolean | false |
#----------------------+------------------------------------------------------------+------------+-----------------+
# cache.enable | Whether to cache loaded indexes in GPU memory. | Boolean | false |
#----------------------+------------------------------------------------------------+------------+-----------------+
# cache_size | The size of GPU memory per card used for cache. | String | 1GB |
#----------------------+------------------------------------------------------------+------------+-----------------+
# gpu_search_threshold | A Milvus performance tuning parameter. This value will be | Integer | 1000 |
@ -144,6 +146,7 @@ cache:
#----------------------+------------------------------------------------------------+------------+-----------------+
gpu:
enable: @GPU_ENABLE@
cache.enable: false
cache_size: 1GB
gpu_search_threshold: 1000
search_devices:
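The new cache.enable flag backs the "keep index in GPU memory" feature (#5142). A hedged example of turning it on is sketched below; the cache size and the gpu0-style device entries are illustrative, assuming the usual search_devices/build_index_devices list format of this file.
```yaml
gpu:
  enable: true
  cache.enable: true        # keep loaded indexes resident in GPU memory (#5142)
  cache_size: 4GB           # per-card GPU cache size (illustrative value)
  gpu_search_threshold: 1000
  search_devices:
    - gpu0
  build_index_devices:
    - gpu0
```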

View File

@ -66,6 +66,9 @@ class Cache {
void
insert(const std::string& key, const ItemObj& item);
void
insert_if_not_exist(const std::string& key, const ItemObj& item);
void
erase(const std::string& key);

View File

@ -64,6 +64,16 @@ Cache<ItemObj>::insert(const std::string& key, const ItemObj& item) {
insert_internal(key, item);
}
template <typename ItemObj>
void
Cache<ItemObj>::insert_if_not_exist(const std::string& key, const ItemObj& item) {
std::lock_guard<std::mutex> lock(mutex_);
if (lru_.exists(key)) {
return;
}
insert_internal(key, item);
}
template <typename ItemObj>
void
Cache<ItemObj>::erase(const std::string& key) {

View File

@ -36,6 +36,9 @@ class CacheMgr {
virtual void
InsertItem(const std::string& key, const ItemObj& data);
virtual void
InsertItemIfNotExist(const std::string& key, const ItemObj& data);
virtual void
EraseItem(const std::string& key);

View File

@ -62,6 +62,17 @@ CacheMgr<ItemObj>::InsertItem(const std::string& key, const ItemObj& data) {
server::Metrics::GetInstance().CacheAccessTotalIncrement();
}
template <typename ItemObj>
void
CacheMgr<ItemObj>::InsertItemIfNotExist(const std::string& key, const ItemObj& data) {
if (cache_ == nullptr) {
LOG_SERVER_ERROR_ << "Cache doesn't exist";
return;
}
cache_->insert_if_not_exist(key, data);
server::Metrics::GetInstance().CacheAccessTotalIncrement();
}
template <typename ItemObj>
void
CacheMgr<ItemObj>::EraseItem(const std::string& key) {
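Taken together, insert_if_not_exist and its CacheMgr wrapper InsertItemIfNotExist let a caller publish an object into the cache only when no other thread has already done so. A minimal usage sketch follows; the caller, key, and object names are hypothetical, while CpuCacheMgr, Blacklist_Suffix, and the DataObjPtr item type come from the surrounding files.
```cpp
// Hypothetical caller: cache a per-segment blacklist without clobbering an
// entry that a concurrent thread may already have inserted.
// 'segment_dir' is an illustrative path and 'blacklist_obj' an illustrative
// cache::DataObjPtr; both are assumptions, not names from this diff.
auto cpu_cache_mgr = milvus::cache::CpuCacheMgr::GetInstance();
std::string key = segment_dir + milvus::cache::Blacklist_Suffix;
cpu_cache_mgr->InsertItemIfNotExist(key, blacklist_obj);  // no-op if the key is already cached
```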

View File

@ -22,6 +22,7 @@ namespace milvus {
namespace cache {
const char* BloomFilter_Suffix = ".bloomfilter";
const char* Blacklist_Suffix = ".blacklist";
CpuCacheMgr::CpuCacheMgr() {
// All config values have been checked in Config::ValidateConfig()

View File

@ -23,6 +23,7 @@ namespace cache {
// Define cache key suffix
extern const char* BloomFilter_Suffix;
extern const char* Blacklist_Suffix;
class CpuCacheMgr : public CacheMgr<DataObjPtr>, public server::CacheConfigHandler {
private:

View File

@ -39,7 +39,7 @@ void
DefaultDeletedDocsFormat::read(const storage::FSHandlerPtr& fs_ptr, segment::DeletedDocsPtr& deleted_docs) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = fs_ptr->operation_ptr_->GetDirectory();
auto& dir_path = fs_ptr->operation_ptr_->GetDirectory();
const std::string del_file_path = dir_path + "/" + deleted_docs_filename_;
fs_ptr->operation_ptr_->CacheGet(del_file_path);
@ -78,60 +78,33 @@ DefaultDeletedDocsFormat::read(const storage::FSHandlerPtr& fs_ptr, segment::Del
void
DefaultDeletedDocsFormat::write(const storage::FSHandlerPtr& fs_ptr, const segment::DeletedDocsPtr& deleted_docs) {
std::string dir_path = fs_ptr->operation_ptr_->GetDirectory();
auto& dir_path = fs_ptr->operation_ptr_->GetDirectory();
const std::string del_file_path = dir_path + "/" + deleted_docs_filename_;
// Create a temporary file from the existing file
const std::string temp_path = dir_path + "/" + "temp_del";
fs_ptr->operation_ptr_->CacheGet(del_file_path);
bool exists = boost::filesystem::exists(del_file_path);
if (exists) {
boost::filesystem::copy_file(del_file_path, temp_path, boost::filesystem::copy_option::fail_if_exists);
}
// Write to the temp file, in order to avoid possible race condition with search (concurrent read and write)
int del_fd = open(temp_path.c_str(), O_RDWR | O_CREAT, 00664);
fs_ptr->operation_ptr_->CacheGet(del_file_path);
// If the file already exists, write to a temp file to avoid a possible race condition with concurrent search
bool exists = boost::filesystem::exists(del_file_path);
const std::string* file_path = exists ? &temp_path : &del_file_path;
int del_fd = open(file_path->c_str(), O_RDWR | O_CREAT, 00664);
if (del_fd == -1) {
std::string err_msg = "Failed to open file: " + temp_path + ", error: " + std::strerror(errno);
LOG_ENGINE_ERROR_ << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
size_t old_num_bytes;
if (exists) {
if (::read(del_fd, &old_num_bytes, sizeof(size_t)) == -1) {
std::string err_msg = "Failed to read from file: " + temp_path + ", error: " + std::strerror(errno);
LOG_ENGINE_ERROR_ << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
} else {
old_num_bytes = 0;
}
auto& deleted_docs_list = deleted_docs->GetDeletedDocs();
size_t new_num_bytes = sizeof(segment::offset_t) * deleted_docs->GetSize();
auto deleted_docs_list = deleted_docs->GetDeletedDocs();
size_t new_num_bytes = old_num_bytes + sizeof(segment::offset_t) * deleted_docs->GetSize();
// rewind and overwrite with the new_num_bytes
int off = lseek(del_fd, 0, SEEK_SET);
if (off == -1) {
std::string err_msg = "Failed to seek file: " + temp_path + ", error: " + std::strerror(errno);
LOG_ENGINE_ERROR_ << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
if (::write(del_fd, &new_num_bytes, sizeof(size_t)) == -1) {
std::string err_msg = "Failed to write to file" + temp_path + ", error: " + std::strerror(errno);
LOG_ENGINE_ERROR_ << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
// Move to the end of file and append
off = lseek(del_fd, 0, SEEK_END);
if (off == -1) {
std::string err_msg = "Failed to seek file: " + temp_path + ", error: " + std::strerror(errno);
LOG_ENGINE_ERROR_ << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
if (::write(del_fd, deleted_docs_list.data(), sizeof(segment::offset_t) * deleted_docs->GetSize()) == -1) {
if (::write(del_fd, deleted_docs_list.data(), new_num_bytes) == -1) {
std::string err_msg = "Failed to write to file" + temp_path + ", error: " + std::strerror(errno);
LOG_ENGINE_ERROR_ << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
@ -144,8 +117,10 @@ DefaultDeletedDocsFormat::write(const storage::FSHandlerPtr& fs_ptr, const segme
}
// Move temp file to delete file
const std::lock_guard<std::mutex> lock(mutex_);
boost::filesystem::rename(temp_path, del_file_path);
if (exists) {
const std::lock_guard<std::mutex> lock(mutex_);
boost::filesystem::rename(temp_path, del_file_path);
}
fs_ptr->operation_ptr_->CachePut(del_file_path);
}
@ -153,7 +128,7 @@ void
DefaultDeletedDocsFormat::readSize(const storage::FSHandlerPtr& fs_ptr, size_t& size) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = fs_ptr->operation_ptr_->GetDirectory();
auto& dir_path = fs_ptr->operation_ptr_->GetDirectory();
const std::string del_file_path = dir_path + "/" + deleted_docs_filename_;
fs_ptr->operation_ptr_->CacheGet(del_file_path);

View File

@ -17,8 +17,13 @@
#include "codecs/default/DefaultIdBloomFilterFormat.h"
#include <fcntl.h>
#include <fiu-local.h>
#define BOOST_NO_CXX11_SCOPED_ENUMS
#include <boost/filesystem.hpp>
#undef BOOST_NO_CXX11_SCOPED_ENUMS
#include <memory>
#include <string>
@ -39,7 +44,7 @@ void
DefaultIdBloomFilterFormat::read(const storage::FSHandlerPtr& fs_ptr, segment::IdBloomFilterPtr& id_bloom_filter_ptr) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = fs_ptr->operation_ptr_->GetDirectory();
auto& dir_path = fs_ptr->operation_ptr_->GetDirectory();
const std::string bloom_filter_file_path = dir_path + "/" + bloom_filter_filename_;
scaling_bloom_t* bloom_filter{nullptr};
do {
@ -94,13 +99,18 @@ DefaultIdBloomFilterFormat::read(const storage::FSHandlerPtr& fs_ptr, segment::I
void
DefaultIdBloomFilterFormat::write(const storage::FSHandlerPtr& fs_ptr,
const segment::IdBloomFilterPtr& id_bloom_filter_ptr) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = fs_ptr->operation_ptr_->GetDirectory();
auto& dir_path = fs_ptr->operation_ptr_->GetDirectory();
const std::string bloom_filter_file_path = dir_path + "/" + bloom_filter_filename_;
if (!fs_ptr->writer_ptr_->open(bloom_filter_file_path)) {
std::string err_msg =
"Failed to write bloom filter to file: " + bloom_filter_file_path + ". " + std::strerror(errno);
const std::string temp_bloom_filter_file_path = dir_path + "/" + "temp_bloom";
fs_ptr->operation_ptr_->CacheGet(bloom_filter_file_path);
bool exists = boost::filesystem::exists(bloom_filter_file_path);
const std::string* file_path = exists ? &temp_bloom_filter_file_path : &bloom_filter_file_path;
int del_fd = open(file_path->c_str(), O_RDWR | O_CREAT, 00664);
if (del_fd == -1) {
std::string err_msg = "Failed to write bloom filter to file: " + *file_path + ". " + std::strerror(errno);
LOG_ENGINE_ERROR_ << err_msg;
throw Exception(SERVER_UNEXPECTED_ERROR, err_msg);
}
@ -108,12 +118,24 @@ DefaultIdBloomFilterFormat::write(const storage::FSHandlerPtr& fs_ptr,
auto bloom_filter = id_bloom_filter_ptr->GetBloomFilter();
int64_t magic_num = BLOOM_FILTER_MAGIC_NUM;
fs_ptr->writer_ptr_->write(&magic_num, sizeof(magic_num));
fs_ptr->writer_ptr_->write(&bloom_filter->capacity, sizeof(bloom_filter->capacity));
fs_ptr->writer_ptr_->write(&bloom_filter->error_rate, sizeof(bloom_filter->error_rate));
fs_ptr->writer_ptr_->write(&bloom_filter->bitmap->bytes, sizeof(bloom_filter->bitmap->bytes));
fs_ptr->writer_ptr_->write(bloom_filter->bitmap->array, bloom_filter->bitmap->bytes);
fs_ptr->writer_ptr_->close();
::write(del_fd, &magic_num, sizeof(magic_num));
::write(del_fd, &bloom_filter->capacity, sizeof(bloom_filter->capacity));
::write(del_fd, &bloom_filter->error_rate, sizeof(bloom_filter->error_rate));
::write(del_fd, &bloom_filter->bitmap->bytes, sizeof(bloom_filter->bitmap->bytes));
::write(del_fd, bloom_filter->bitmap->array, bloom_filter->bitmap->bytes);
if (::close(del_fd) == -1) {
std::string err_msg = "Failed to close file: " + *file_path + ", error: " + std::strerror(errno);
LOG_ENGINE_ERROR_ << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
// Move temp file to bloom filter file
if (exists) {
const std::lock_guard<std::mutex> lock(mutex_);
boost::filesystem::rename(temp_bloom_filter_file_path, bloom_filter_file_path);
}
fs_ptr->operation_ptr_->CachePut(bloom_filter_file_path);
}

View File

@ -107,7 +107,7 @@ DefaultVectorIndexFormat::read(const storage::FSHandlerPtr& fs_ptr, const std::s
segment::VectorIndexPtr& vector_index) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = fs_ptr->operation_ptr_->GetDirectory();
auto& dir_path = fs_ptr->operation_ptr_->GetDirectory();
if (!boost::filesystem::is_directory(dir_path)) {
std::string err_msg = "Directory: " + dir_path + "does not exist";
LOG_ENGINE_ERROR_ << err_msg;

View File

@ -76,7 +76,7 @@ void
DefaultVectorsFormat::read(const storage::FSHandlerPtr& fs_ptr, segment::VectorsPtr& vectors_read) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = fs_ptr->operation_ptr_->GetDirectory();
auto& dir_path = fs_ptr->operation_ptr_->GetDirectory();
if (!boost::filesystem::is_directory(dir_path)) {
std::string err_msg = "Directory: " + dir_path + "does not exist";
LOG_ENGINE_ERROR_ << err_msg;
@ -102,7 +102,7 @@ void
DefaultVectorsFormat::write(const storage::FSHandlerPtr& fs_ptr, const segment::VectorsPtr& vectors) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = fs_ptr->operation_ptr_->GetDirectory();
auto& dir_path = fs_ptr->operation_ptr_->GetDirectory();
const std::string rv_file_path = dir_path + "/" + vectors->GetName() + raw_vector_extension_;
const std::string uid_file_path = dir_path + "/" + vectors->GetName() + user_id_extension_;
@ -139,7 +139,7 @@ void
DefaultVectorsFormat::read_uids(const storage::FSHandlerPtr& fs_ptr, std::vector<segment::doc_id_t>& uids) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = fs_ptr->operation_ptr_->GetDirectory();
auto& dir_path = fs_ptr->operation_ptr_->GetDirectory();
if (!boost::filesystem::is_directory(dir_path)) {
std::string err_msg = "Directory: " + dir_path + "does not exist";
LOG_ENGINE_ERROR_ << err_msg;
@ -161,7 +161,7 @@ DefaultVectorsFormat::read_vectors(const storage::FSHandlerPtr& fs_ptr, off_t of
std::vector<uint8_t>& raw_vectors) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = fs_ptr->operation_ptr_->GetDirectory();
auto& dir_path = fs_ptr->operation_ptr_->GetDirectory();
if (!boost::filesystem::is_directory(dir_path)) {
std::string err_msg = "Directory: " + dir_path + "does not exist";
LOG_ENGINE_ERROR_ << err_msg;

View File

@ -1297,7 +1297,7 @@ Config::CheckStorageConfigS3Enable(const std::string& value) {
Status
Config::CheckStorageConfigS3Address(const std::string& value) {
if (!ValidationUtil::ValidateIpAddress(value).ok()) {
if (!ValidationUtil::ValidateHostname(value).ok()) {
std::string msg = "Invalid s3 address: " + value + ". Possible reason: storage_config.s3_address is invalid.";
return Status(SERVER_INVALID_ARGUMENT, msg);
}

View File

@ -1,38 +0,0 @@
// Copyright (C) 2019-2020 Zilliz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#pragma once
#include <iostream>
#include <memory>
#include <string>
#include <vector>
#include "query/BinaryQuery.h"
namespace milvus {
namespace search {
class Task;
using TaskPtr = std::shared_ptr<Task>;
} // namespace search
namespace context {
struct HybridSearchContext {
query::GeneralQueryPtr general_query_;
std::vector<::milvus::search::TaskPtr> tasks_;
};
using HybridSearchContextPtr = std::shared_ptr<HybridSearchContext>;
} // namespace context
} // namespace milvus

View File

@ -18,7 +18,6 @@
#include "Options.h"
#include "Types.h"
#include "context/HybridSearchContext.h"
#include "meta/Meta.h"
#include "query/GeneralQuery.h"
#include "server/context/Context.h"
@ -159,17 +158,6 @@ class DB {
virtual Status
DropAll() = 0;
virtual Status
CreateHybridCollection(meta::CollectionSchema& collection_schema, meta::hybrid::FieldsSchema& fields_schema) = 0;
virtual Status
DescribeHybridCollection(meta::CollectionSchema& collection_schema, meta::hybrid::FieldsSchema& fields_schema) = 0;
virtual Status
InsertEntities(const std::string& collection_id, const std::string& partition_tag,
const std::vector<std::string>& field_names, Entity& entity,
std::unordered_map<std::string, meta::hybrid::DataType>& field_types) = 0;
}; // DB
using DBPtr = std::shared_ptr<DB>;

View File

@ -31,11 +31,11 @@
#include "Utils.h"
#include "cache/CpuCacheMgr.h"
#include "cache/GpuCacheMgr.h"
#include "codecs/default/DefaultCodec.h"
#include "db/IDGenerator.h"
#include "db/merge/MergeManagerFactory.h"
#include "engine/EngineFactory.h"
#include "index/knowhere/knowhere/index/vector_index/helpers/BuilderSuspend.h"
#include "index/knowhere/knowhere/index/vector_index/helpers/FaissIO.h"
#include "index/thirdparty/faiss/utils/distances.h"
#include "insert/MemManagerFactory.h"
#include "meta/MetaConsts.h"
@ -93,7 +93,6 @@ DBImpl::DBImpl(const DBOptions& options)
SetIdentity("DBImpl");
AddCacheInsertDataListener();
AddUseBlasThresholdListener();
knowhere::enable_faiss_logging();
Start();
}
@ -234,33 +233,15 @@ DBImpl::CreateCollection(meta::CollectionSchema& collection_schema) {
meta::CollectionSchema temp_schema = collection_schema;
temp_schema.index_file_size_ *= MB; // store as MB
if (options_.wal_enable_) {
temp_schema.flush_lsn_ = wal_mgr_->CreateCollection(collection_schema.collection_id_);
temp_schema.flush_lsn_ = wal_mgr_->GetLastAppliedLsn();
}
return meta_ptr_->CreateCollection(temp_schema);
}
Status
DBImpl::CreateHybridCollection(meta::CollectionSchema& collection_schema, meta::hybrid::FieldsSchema& fields_schema) {
if (!initialized_.load(std::memory_order_acquire)) {
return SHUTDOWN_ERROR;
auto status = meta_ptr_->CreateCollection(temp_schema);
if (options_.wal_enable_ && status.ok()) {
wal_mgr_->CreateCollection(collection_schema.collection_id_);
}
meta::CollectionSchema temp_schema = collection_schema;
temp_schema.index_file_size_ *= MB;
return meta_ptr_->CreateHybridCollection(temp_schema, fields_schema);
}
Status
DBImpl::DescribeHybridCollection(meta::CollectionSchema& collection_schema,
milvus::engine::meta::hybrid::FieldsSchema& fields_schema) {
if (!initialized_.load(std::memory_order_acquire)) {
return SHUTDOWN_ERROR;
}
auto stat = meta_ptr_->DescribeHybridCollection(collection_schema, fields_schema);
return stat;
return status;
}
Status
@ -479,8 +460,8 @@ DBImpl::PreloadCollection(const std::shared_ptr<server::Context>& context, const
}
auto json = milvus::json::parse(file.index_params_);
ExecutionEnginePtr engine =
EngineFactory::Build(file.dimension_, file.location_, engine_type, (MetricType)file.metric_type_, json);
ExecutionEnginePtr engine = EngineFactory::Build(file.dimension_, file.location_, engine_type,
(MetricType)file.metric_type_, json, file.updated_time_);
fiu_do_on("DBImpl.PreloadCollection.null_engine", engine = nullptr);
if (engine == nullptr) {
LOG_ENGINE_ERROR_ << "Invalid engine type";
@ -493,7 +474,7 @@ DBImpl::PreloadCollection(const std::shared_ptr<server::Context>& context, const
fiu_do_on("DBImpl.PreloadCollection.engine_throw_exception", throw std::exception());
std::string msg = "Pre-loaded file: " + file.file_id_ + " size: " + std::to_string(file.file_size_);
TimeRecorderAuto rc_1(msg);
status = engine->Load(true);
status = engine->Load(false, true);
if (!status.ok()) {
return status;
}
@ -552,8 +533,8 @@ DBImpl::ReleaseCollection(const std::shared_ptr<server::Context>& context, const
}
auto json = milvus::json::parse(file.index_params_);
ExecutionEnginePtr engine =
EngineFactory::Build(file.dimension_, file.location_, engine_type, (MetricType)file.metric_type_, json);
ExecutionEnginePtr engine = EngineFactory::Build(file.dimension_, file.location_, engine_type,
(MetricType)file.metric_type_, json, file.updated_time_);
if (engine == nullptr) {
LOG_ENGINE_ERROR_ << "Invalid engine type";
@ -574,7 +555,7 @@ DBImpl::ReLoadSegmentsDeletedDocs(const std::string& collection_id, const std::v
if (!initialized_.load(std::memory_order_acquire)) {
return SHUTDOWN_ERROR;
}
#if 0 // todo
meta::FilesHolder files_holder;
std::vector<size_t> file_ids;
for (auto& id : segment_ids) {
@ -624,7 +605,7 @@ DBImpl::ReLoadSegmentsDeletedDocs(const std::string& collection_id, const std::v
blacklist->set(i);
}
}
#endif
return Status::OK();
}
@ -655,11 +636,16 @@ DBImpl::CreatePartition(const std::string& collection_id, const std::string& par
uint64_t lsn = 0;
if (options_.wal_enable_) {
lsn = wal_mgr_->CreatePartition(collection_id, partition_tag);
lsn = wal_mgr_->GetLastAppliedLsn();
} else {
meta_ptr_->GetCollectionFlushLSN(collection_id, lsn);
}
return meta_ptr_->CreatePartition(collection_id, partition_name, partition_tag, lsn);
auto status = meta_ptr_->CreatePartition(collection_id, partition_name, partition_tag, lsn);
if (options_.wal_enable_ && status.ok()) {
wal_mgr_->CreatePartition(collection_id, partition_tag);
}
return status;
}
Status
@ -914,101 +900,6 @@ CopyToAttr(std::vector<uint8_t>& record, uint64_t row_num, const std::vector<std
return Status::OK();
}
Status
DBImpl::InsertEntities(const std::string& collection_id, const std::string& partition_tag,
const std::vector<std::string>& field_names, Entity& entity,
std::unordered_map<std::string, meta::hybrid::DataType>& attr_types) {
if (!initialized_.load(std::memory_order_acquire)) {
return SHUTDOWN_ERROR;
}
// Generate id
if (entity.id_array_.empty()) {
SafeIDGenerator& id_generator = SafeIDGenerator::GetInstance();
Status status = id_generator.GetNextIDNumbers(entity.entity_count_, entity.id_array_);
if (!status.ok()) {
return status;
}
}
Status status;
std::unordered_map<std::string, std::vector<uint8_t>> attr_data;
std::unordered_map<std::string, uint64_t> attr_nbytes;
std::unordered_map<std::string, uint64_t> attr_data_size;
status = CopyToAttr(entity.attr_value_, entity.entity_count_, field_names, attr_types, attr_data, attr_nbytes,
attr_data_size);
if (!status.ok()) {
return status;
}
wal::MXLogRecord record;
record.lsn = 0;
record.collection_id = collection_id;
record.partition_tag = partition_tag;
record.ids = entity.id_array_.data();
record.length = entity.entity_count_;
auto vector_it = entity.vector_data_.begin();
if (vector_it->second.binary_data_.empty()) {
record.type = wal::MXLogType::Entity;
record.data = vector_it->second.float_data_.data();
record.data_size = vector_it->second.float_data_.size() * sizeof(float);
} else {
// record.type = wal::MXLogType::InsertBinary;
// record.data = entities.vector_data_[0].binary_data_.data();
// record.length = entities.vector_data_[0].binary_data_.size() * sizeof(uint8_t);
}
status = ExecWalRecord(record);
#if 0
if (options_.wal_enable_) {
std::string target_collection_name;
status = GetPartitionByTag(collection_id, partition_tag, target_collection_name);
if (!status.ok()) {
LOG_ENGINE_ERROR_ << LogOut("[%s][%ld] Get partition fail: %s", "insert", 0, status.message().c_str());
return status;
}
auto vector_it = entity.vector_data_.begin();
if (!vector_it->second.binary_data_.empty()) {
wal_mgr_->InsertEntities(collection_id, partition_tag, entity.id_array_, vector_it->second.binary_data_,
attr_nbytes, attr_data);
} else if (!vector_it->second.float_data_.empty()) {
wal_mgr_->InsertEntities(collection_id, partition_tag, entity.id_array_, vector_it->second.float_data_,
attr_nbytes, attr_data);
}
swn_wal_.Notify();
} else {
// insert entities: collection_name is field id
wal::MXLogRecord record;
record.lsn = 0;
record.collection_id = collection_id;
record.partition_tag = partition_tag;
record.ids = entity.id_array_.data();
record.length = entity.entity_count_;
auto vector_it = entity.vector_data_.begin();
if (vector_it->second.binary_data_.empty()) {
record.type = wal::MXLogType::Entity;
record.data = vector_it->second.float_data_.data();
record.data_size = vector_it->second.float_data_.size() * sizeof(float);
record.attr_data = attr_data;
record.attr_nbytes = attr_nbytes;
record.attr_data_size = attr_data_size;
} else {
// record.type = wal::MXLogType::InsertBinary;
// record.data = entities.vector_data_[0].binary_data_.data();
// record.length = entities.vector_data_[0].binary_data_.size() * sizeof(uint8_t);
}
status = ExecWalRecord(record);
}
#endif
return status;
}
Status
DBImpl::DeleteVectors(const std::string& collection_id, const std::string& partition_tag, IDNumbers vector_ids) {
if (!initialized_.load(std::memory_order_acquire)) {
@ -1497,18 +1388,51 @@ DBImpl::GetVectorsByIdHelper(const IDNumbers& id_array, std::vector<engine::Vect
if (temp_ids.empty()) {
break; // all vectors found, no need to continue
}
// Load bloom filter
// SegmentReader
std::string segment_dir;
engine::utils::GetParentPath(file.location_, segment_dir);
segment::SegmentReader segment_reader(segment_dir);
segment::IdBloomFilterPtr id_bloom_filter_ptr;
auto status = segment_reader.LoadBloomFilter(id_bloom_filter_ptr);
if (!status.ok()) {
return status;
}
std::shared_ptr<std::vector<segment::doc_id_t>> uids_ptr = nullptr;
// uids_ptr
segment::UidsPtr uids_ptr = nullptr;
auto LoadUid = [&]() {
auto index = cache::CpuCacheMgr::GetInstance()->GetItem(file.location_);
if (index != nullptr) {
uids_ptr = std::static_pointer_cast<knowhere::VecIndex>(index)->GetUids();
return Status::OK();
}
return segment_reader.LoadUids(uids_ptr);
};
// deleted_docs_ptr
segment::DeletedDocsPtr deleted_docs_ptr = nullptr;
auto LoadDeleteDoc = [&]() { return segment_reader.LoadDeletedDocs(deleted_docs_ptr); };
// id_bloom_filter_ptr
segment::IdBloomFilterPtr id_bloom_filter_ptr;
auto status = segment_reader.LoadBloomFilter(id_bloom_filter_ptr, false);
fiu_do_on("DBImpl.GetVectorsByIdHelper.FailedToLoadBloomFilter",
(status = Status(DB_ERROR, ""), id_bloom_filter_ptr = nullptr));
if (!status.ok()) {
// Some accidents may leave the bloom filter file corrupted or destroyed.
// If the bloom filter fails to load, just create a new one.
if (!(status = LoadUid()).ok()) {
return status;
}
if (!(status = LoadDeleteDoc()).ok()) {
return status;
}
codec::DefaultCodec default_codec;
default_codec.GetIdBloomFilterFormat()->create(uids_ptr->size(), id_bloom_filter_ptr);
id_bloom_filter_ptr->Add(*uids_ptr, deleted_docs_ptr->GetMutableDeletedDocs());
LOG_ENGINE_DEBUG_ << "A new bloom filter is created";
segment::SegmentWriter segment_writer(segment_dir);
segment_writer.WriteBloomFilter(id_bloom_filter_ptr);
}
for (size_t i = 0; i < temp_ids.size();) {
// each id must have a VectorsData
@ -1519,16 +1443,8 @@ DBImpl::GetVectorsByIdHelper(const IDNumbers& id_array, std::vector<engine::Vect
// Check if the id is present in bloom filter.
if (id_bloom_filter_ptr->Check(vector_id)) {
// Load uids and check if the id is indeed present. If yes, find its offset.
if (uids_ptr == nullptr) {
auto index = cache::CpuCacheMgr::GetInstance()->GetItem(file.location_);
if (index != nullptr) {
uids_ptr = std::static_pointer_cast<knowhere::VecIndex>(index)->GetUids();
} else {
status = segment_reader.LoadUids(uids_ptr);
if (!status.ok()) {
return status;
}
}
if (uids_ptr == nullptr && !(status = LoadUid()).ok()) {
return status;
}
auto found = std::find(uids_ptr->begin(), uids_ptr->end(), vector_id);
@ -1536,8 +1452,7 @@ DBImpl::GetVectorsByIdHelper(const IDNumbers& id_array, std::vector<engine::Vect
auto offset = std::distance(uids_ptr->begin(), found);
// Check whether the id has been deleted
if (!deleted_docs_ptr && !(status = segment_reader.LoadDeletedDocs(deleted_docs_ptr)).ok()) {
LOG_ENGINE_ERROR_ << status.message();
if (!deleted_docs_ptr && !(status = LoadDeleteDoc()).ok()) {
return status;
}
auto& deleted_docs = deleted_docs_ptr->GetDeletedDocs();
@ -1546,8 +1461,8 @@ DBImpl::GetVectorsByIdHelper(const IDNumbers& id_array, std::vector<engine::Vect
if (deleted == deleted_docs.end()) {
// Load raw vector
std::vector<uint8_t> raw_vector;
status =
segment_reader.LoadVectors(offset * single_vector_bytes, single_vector_bytes, raw_vector);
status = segment_reader.LoadsSingleVector(offset * single_vector_bytes, single_vector_bytes,
raw_vector);
if (!status.ok()) {
LOG_ENGINE_ERROR_ << status.message();
return status;
@ -2016,102 +1931,6 @@ DBImpl::StartMergeTask(const std::set<std::string>& merge_collection_ids, bool f
// LOG_ENGINE_DEBUG_ << "End StartMergeTask";
}
// Status
// DBImpl::MergeHybridFiles(const std::string& collection_id, meta::FilesHolder& files_holder) {
// // const std::lock_guard<std::mutex> lock(flush_merge_compact_mutex_);
//
// LOG_ENGINE_DEBUG_ << "Merge files for collection: " << collection_id;
//
// // step 1: create table file
// meta::SegmentSchema table_file;
// table_file.collection_id_ = collection_id;
// table_file.file_type_ = meta::SegmentSchema::NEW_MERGE;
// Status status = meta_ptr_->CreateHybridCollectionFile(table_file);
//
// if (!status.ok()) {
// LOG_ENGINE_ERROR_ << "Failed to create collection: " << status.ToString();
// return status;
// }
//
// // step 2: merge files
// /*
// ExecutionEnginePtr index =
// EngineFactory::Build(table_file.dimension_, table_file.location_, (EngineType)table_file.engine_type_,
// (MetricType)table_file.metric_type_, table_file.nlist_);
//*/
// meta::SegmentsSchema updated;
//
// std::string new_segment_dir;
// utils::GetParentPath(table_file.location_, new_segment_dir);
// auto segment_writer_ptr = std::make_shared<segment::SegmentWriter>(new_segment_dir);
//
// // attention: here is a copy, not reference, since files_holder.UnmarkFile will change the array internal
// milvus::engine::meta::SegmentsSchema files = files_holder.HoldFiles();
// for (auto& file : files) {
// server::CollectMergeFilesMetrics metrics;
// std::string segment_dir_to_merge;
// utils::GetParentPath(file.location_, segment_dir_to_merge);
// segment_writer_ptr->Merge(segment_dir_to_merge, table_file.file_id_);
//
// files_holder.UnmarkFile(file);
//
// auto file_schema = file;
// file_schema.file_type_ = meta::SegmentSchema::TO_DELETE;
// updated.push_back(file_schema);
// int64_t size = segment_writer_ptr->Size();
// if (size >= file_schema.index_file_size_) {
// break;
// }
// }
//
// // step 3: serialize to disk
// try {
// status = segment_writer_ptr->Serialize();
// fiu_do_on("DBImpl.MergeFiles.Serialize_ThrowException", throw std::exception());
// fiu_do_on("DBImpl.MergeFiles.Serialize_ErrorStatus", status = Status(DB_ERROR, ""));
// } catch (std::exception& ex) {
// std::string msg = "Serialize merged index encounter exception: " + std::string(ex.what());
// LOG_ENGINE_ERROR_ << msg;
// status = Status(DB_ERROR, msg);
// }
//
// if (!status.ok()) {
// LOG_ENGINE_ERROR_ << "Failed to persist merged segment: " << new_segment_dir << ". Error: " <<
// status.message();
//
// // if failed to serialize merge file to disk
// // typical error: out of disk space, out of memory or permission denied
// table_file.file_type_ = meta::SegmentSchema::TO_DELETE;
// status = meta_ptr_->UpdateCollectionFile(table_file);
// LOG_ENGINE_DEBUG_ << "Failed to update file to index, mark file: " << table_file.file_id_ << " to to_delete";
//
// return status;
// }
//
// // step 4: update table files state
// // if index type isn't IDMAP, set file type to TO_INDEX if file size exceed index_file_size
// // else set file type to RAW, no need to build index
// if (!utils::IsRawIndexType(table_file.engine_type_)) {
// table_file.file_type_ = (segment_writer_ptr->Size() >= (size_t)(table_file.index_file_size_))
// ? meta::SegmentSchema::TO_INDEX
// : meta::SegmentSchema::RAW;
// } else {
// table_file.file_type_ = meta::SegmentSchema::RAW;
// }
// table_file.file_size_ = segment_writer_ptr->Size();
// table_file.row_count_ = segment_writer_ptr->VectorCount();
// updated.push_back(table_file);
// status = meta_ptr_->UpdateCollectionFiles(updated);
// LOG_ENGINE_DEBUG_ << "New merged segment " << table_file.segment_id_ << " of size " << segment_writer_ptr->Size()
// << " bytes";
//
// if (options_.insert_cache_immediately_) {
// segment_writer_ptr->Cache();
// }
//
// return status;
//}
void
DBImpl::BackgroundMerge(std::set<std::string> collection_ids, bool force_merge_all) {
// LOG_ENGINE_TRACE_ << " Background merge thread start";

View File

@ -150,19 +150,6 @@ class DBImpl : public DB, public server::CacheConfigHandler, public server::Engi
Status
DropIndex(const std::string& collection_id) override;
Status
CreateHybridCollection(meta::CollectionSchema& collection_schema,
meta::hybrid::FieldsSchema& fields_schema) override;
Status
DescribeHybridCollection(meta::CollectionSchema& collection_schema,
meta::hybrid::FieldsSchema& fields_schema) override;
Status
InsertEntities(const std::string& collection_name, const std::string& partition_tag,
const std::vector<std::string>& field_names, engine::Entity& entity,
std::unordered_map<std::string, meta::hybrid::DataType>& field_types) override;
Status
QueryByIDs(const std::shared_ptr<server::Context>& context, const std::string& collection_id,
const std::vector<std::string>& partition_tags, uint64_t k, const milvus::json& extra_params,
@ -228,9 +215,6 @@ class DBImpl : public DB, public server::CacheConfigHandler, public server::Engi
void
BackgroundMerge(std::set<std::string> collection_ids, bool force_merge_all);
// Status
// MergeHybridFiles(const std::string& table_id, meta::FilesHolder& files_holder);
void
StartBuildIndexTask();

View File

@ -20,7 +20,7 @@ namespace engine {
ExecutionEnginePtr
EngineFactory::Build(uint16_t dimension, const std::string& location, EngineType index_type, MetricType metric_type,
const milvus::json& index_params) {
const milvus::json& index_params, int64_t time_stamp) {
if (index_type == EngineType::INVALID) {
LOG_ENGINE_ERROR_ << "Unsupported engine type";
return nullptr;
@ -28,32 +28,11 @@ EngineFactory::Build(uint16_t dimension, const std::string& location, EngineType
LOG_ENGINE_DEBUG_ << "EngineFactory index type: " << (int)index_type;
ExecutionEnginePtr execution_engine_ptr =
std::make_shared<ExecutionEngineImpl>(dimension, location, index_type, metric_type, index_params);
std::make_shared<ExecutionEngineImpl>(dimension, location, index_type, metric_type, index_params, time_stamp);
execution_engine_ptr->Init();
return execution_engine_ptr;
}
// ExecutionEnginePtr
// EngineFactory::Build(uint16_t dimension,
// const std::string& location,
// EngineType index_type,
// MetricType metric_type,
// std::unordered_map<std::string, DataType>& attr_type,
// const milvus::json& index_params) {
//
// if (index_type == EngineType::INVALID) {
// ENGINE_LOG_ERROR << "Unsupported engine type";
// return nullptr;
// }
//
// ENGINE_LOG_DEBUG << "EngineFactory index type: " << (int)index_type;
// ExecutionEnginePtr execution_engine_ptr =
// std::make_shared<ExecutionEngineImpl>(dimension, location, index_type, metric_type, attr_type, index_params);
//
// execution_engine_ptr->Init();
// return execution_engine_ptr;
//}
} // namespace engine
} // namespace milvus

View File

@ -24,15 +24,7 @@ class EngineFactory {
public:
static ExecutionEnginePtr
Build(uint16_t dimension, const std::string& location, EngineType index_type, MetricType metric_type,
const milvus::json& index_params);
// static ExecutionEnginePtr
// Build(uint16_t dimension,
// const std::string& location,
// EngineType index_type,
// MetricType metric_type,
// std::unordered_map<std::string, DataType>& attr_type,
// const milvus::json& index_params);
const milvus::json& index_params, int64_t time_stamp);
};
} // namespace engine

View File

@ -86,7 +86,7 @@ class ExecutionEngine {
Serialize() = 0;
virtual Status
Load(bool to_cache = true) = 0;
Load(bool load_blacklist, bool to_cache = true) = 0;
virtual Status
CopyToFpga() = 0;

View File

@ -153,22 +153,24 @@ class CachedQuantizer : public cache::DataObj {
#endif
ExecutionEngineImpl::ExecutionEngineImpl(uint16_t dimension, const std::string& location, EngineType index_type,
MetricType metric_type, const milvus::json& index_params)
MetricType metric_type, const milvus::json& index_params, int64_t time_stamp)
: location_(location),
dim_(dimension),
index_type_(index_type),
metric_type_(metric_type),
index_params_(index_params) {
index_params_(index_params),
time_stamp_(time_stamp) {
}
ExecutionEngineImpl::ExecutionEngineImpl(knowhere::VecIndexPtr index, const std::string& location,
EngineType index_type, MetricType metric_type,
const milvus::json& index_params)
const milvus::json& index_params, int64_t time_stamp)
: index_(std::move(index)),
location_(location),
index_type_(index_type),
metric_type_(metric_type),
index_params_(index_params) {
index_params_(index_params),
time_stamp_(time_stamp) {
}
knowhere::IndexMode
@ -381,13 +383,16 @@ ExecutionEngineImpl::Serialize() {
}
Status
ExecutionEngineImpl::Load(bool to_cache) {
index_ = std::static_pointer_cast<knowhere::VecIndex>(cache::CpuCacheMgr::GetInstance()->GetItem(location_));
ExecutionEngineImpl::Load(bool load_blacklist, bool to_cache) {
std::string segment_dir;
utils::GetParentPath(location_, segment_dir);
auto segment_reader_ptr = std::make_shared<segment::SegmentReader>(segment_dir);
auto cpu_cache_mgr = cache::CpuCacheMgr::GetInstance();
// step 1: Load index
index_ = std::static_pointer_cast<knowhere::VecIndex>(cpu_cache_mgr->GetItem(location_));
if (!index_) {
// not in the cache
std::string segment_dir;
utils::GetParentPath(location_, segment_dir);
auto segment_reader_ptr = std::make_shared<segment::SegmentReader>(segment_dir);
knowhere::VecIndexFactory& vec_index_factory = knowhere::VecIndexFactory::GetInstance();
if (utils::IsRawIndexType((int32_t)index_type_)) {
@ -405,18 +410,14 @@ ExecutionEngineImpl::Load(bool to_cache) {
throw Exception(DB_ERROR, "Illegal index params");
}
auto status = segment_reader_ptr->Load();
segment::VectorsPtr vectors = nullptr;
auto status = segment_reader_ptr->LoadsVectors(vectors);
if (!status.ok()) {
std::string msg = "Failed to load segment from " + location_;
std::string msg = "Failed to load vectors from " + location_;
LOG_ENGINE_ERROR_ << msg;
return Status(DB_ERROR, msg);
}
segment::SegmentPtr segment_ptr;
segment_reader_ptr->GetSegment(segment_ptr);
auto& vectors = segment_ptr->vectors_ptr_;
auto& deleted_docs = segment_ptr->deleted_docs_ptr_->GetDeletedDocs();
auto& vectors_uids = vectors->GetMutableUids();
std::shared_ptr<std::vector<int64_t>> vector_uids_ptr = std::make_shared<std::vector<int64_t>>();
vector_uids_ptr->swap(vectors_uids);
@ -424,28 +425,16 @@ ExecutionEngineImpl::Load(bool to_cache) {
LOG_ENGINE_DEBUG_ << "set uids " << vector_uids_ptr->size() << " for index " << location_;
auto& vectors_data = vectors->GetData();
auto count = vector_uids_ptr->size();
faiss::ConcurrentBitsetPtr concurrent_bitset_ptr = nullptr;
if (!deleted_docs.empty()) {
concurrent_bitset_ptr = std::make_shared<faiss::ConcurrentBitset>(count);
for (auto& offset : deleted_docs) {
concurrent_bitset_ptr->set(offset);
}
}
auto dataset = knowhere::GenDataset(count, this->dim_, vectors_data.data());
if (index_type_ == EngineType::FAISS_IDMAP) {
auto bf_index = std::static_pointer_cast<knowhere::IDMAP>(index_);
bf_index->Train(knowhere::DatasetPtr(), conf);
bf_index->AddWithoutIds(dataset, conf);
bf_index->SetBlacklist(concurrent_bitset_ptr);
} else if (index_type_ == EngineType::FAISS_BIN_IDMAP) {
auto bin_bf_index = std::static_pointer_cast<knowhere::BinaryIDMAP>(index_);
bin_bf_index->Train(knowhere::DatasetPtr(), conf);
bin_bf_index->AddWithoutIds(dataset, conf);
bin_bf_index->SetBlacklist(concurrent_bitset_ptr);
}
LOG_ENGINE_DEBUG_ << "Finished loading raw data from segment " << segment_dir;
@ -455,38 +444,18 @@ ExecutionEngineImpl::Load(bool to_cache) {
segment_reader_ptr->GetSegment(segment_ptr);
auto status = segment_reader_ptr->LoadVectorIndex(location_, segment_ptr->vector_index_ptr_);
index_ = segment_ptr->vector_index_ptr_->GetVectorIndex();
if (index_ == nullptr) {
std::string msg = "Failed to load index from " + location_;
LOG_ENGINE_ERROR_ << msg;
return Status(DB_ERROR, msg);
} else {
segment::DeletedDocsPtr deleted_docs_ptr;
auto status = segment_reader_ptr->LoadDeletedDocs(deleted_docs_ptr);
if (!status.ok()) {
std::string msg = "Failed to load deleted docs from " + location_;
LOG_ENGINE_ERROR_ << msg;
return Status(DB_ERROR, msg);
}
auto& deleted_docs = deleted_docs_ptr->GetDeletedDocs();
faiss::ConcurrentBitsetPtr concurrent_bitset_ptr = nullptr;
if (!deleted_docs.empty()) {
concurrent_bitset_ptr = std::make_shared<faiss::ConcurrentBitset>(index_->Count());
for (auto& offset : deleted_docs) {
if (!concurrent_bitset_ptr->test(offset)) {
concurrent_bitset_ptr->set(offset);
}
}
}
index_->SetBlacklist(concurrent_bitset_ptr);
segment::UidsPtr uids_ptr = nullptr;
segment_reader_ptr->LoadUids(uids_ptr);
index_->SetUids(uids_ptr);
LOG_ENGINE_DEBUG_ << "set uids " << index_->GetUids()->size() << " for index " << location_;
LOG_ENGINE_DEBUG_ << "Finished loading index file from segment " << segment_dir;
}
segment::UidsPtr uids_ptr = nullptr;
segment_reader_ptr->LoadUids(uids_ptr);
index_->SetUids(uids_ptr);
LOG_ENGINE_DEBUG_ << "set uids " << index_->GetUids()->size() << " for index " << location_;
LOG_ENGINE_DEBUG_ << "Finished loading index file from segment " << segment_dir;
} catch (std::exception& e) {
LOG_ENGINE_ERROR_ << e.what();
return Status(DB_ERROR, e.what());
@ -494,12 +463,54 @@ ExecutionEngineImpl::Load(bool to_cache) {
}
if (to_cache) {
Cache();
cpu_cache_mgr->InsertItem(location_, index_);
}
}
// step 2: Load blacklist
if (load_blacklist) {
auto blacklist_cache_key = segment_dir + cache::Blacklist_Suffix;
blacklist_ = std::static_pointer_cast<knowhere::Blacklist>(cpu_cache_mgr->GetItem(blacklist_cache_key));
bool cache_miss = true;
if (blacklist_ != nullptr) {
if (blacklist_->time_stamp_ == time_stamp_) {
cache_miss = false;
} else {
LOG_ENGINE_DEBUG_ << "Mismatched time stamp " << blacklist_->time_stamp_ << " < " << time_stamp_;
}
}
if (cache_miss) {
segment::DeletedDocsPtr deleted_docs_ptr;
auto status = segment_reader_ptr->LoadDeletedDocs(deleted_docs_ptr);
if (!status.ok()) {
std::string msg = "Failed to load deleted docs from " + location_;
LOG_ENGINE_ERROR_ << msg;
return Status(DB_ERROR, msg);
}
auto& deleted_docs = deleted_docs_ptr->GetDeletedDocs();
blacklist_ = std::make_shared<knowhere::Blacklist>();
blacklist_->time_stamp_ = time_stamp_;
if (!deleted_docs.empty()) {
auto concurrent_bitset_ptr = std::make_shared<faiss::ConcurrentBitset>(index_->Count());
for (auto& offset : deleted_docs) {
concurrent_bitset_ptr->set(offset);
}
blacklist_->bitset_ = concurrent_bitset_ptr;
}
LOG_ENGINE_DEBUG_ << "Finished loading blacklist_ deleted docs size " << deleted_docs.size();
if (to_cache) {
cpu_cache_mgr->InsertItem(blacklist_cache_key, blacklist_);
}
}
}
return Status::OK();
} // namespace engine
}
Status
ExecutionEngineImpl::CopyToGpu(uint64_t device_id, bool hybrid) {
@ -508,6 +519,7 @@ ExecutionEngineImpl::CopyToGpu(uint64_t device_id, bool hybrid) {
auto index = std::static_pointer_cast<knowhere::VecIndex>(data_obj_ptr);
bool already_in_cache = (index != nullptr);
if (already_in_cache) {
LOG_ENGINE_DEBUG_ << "ExecutionEngineImpl::CopyToGpu: already_in_cache in gpu" << device_id;
index_ = index;
} else {
if (index_ == nullptr) {
@ -541,6 +553,7 @@ ExecutionEngineImpl::CopyToGpu(uint64_t device_id, bool hybrid) {
} else {
if (gpu_cache_enable) {
gpu_cache_mgr->InsertItem(location_, std::static_pointer_cast<cache::DataObj>(index_));
LOG_ENGINE_DEBUG_ << "ExecutionEngineImpl::CopyToGpu: Gpu cache in device " << device_id;
}
LOG_ENGINE_DEBUG_ << "CPU to GPU" << device_id << " finished";
}
@ -655,12 +668,10 @@ ExecutionEngineImpl::BuildIndex(const std::string& location, EngineType engine_t
auto dataset = knowhere::GenDataset(Count(), Dimension(), from_index->GetRawVectors());
to_index->BuildAll(dataset, conf);
uids = from_index->GetUids();
blacklist = from_index->GetBlacklist();
} else if (bin_from_index) {
auto dataset = knowhere::GenDataset(Count(), Dimension(), bin_from_index->GetRawVectors());
to_index->BuildAll(dataset, conf);
uids = bin_from_index->GetUids();
blacklist = bin_from_index->GetBlacklist();
}
#ifdef MILVUS_GPU_VERSION
@ -673,12 +684,10 @@ ExecutionEngineImpl::BuildIndex(const std::string& location, EngineType engine_t
to_index->SetUids(uids);
LOG_ENGINE_DEBUG_ << "Set " << to_index->UidsSize() << "uids for " << location;
if (blacklist != nullptr) {
to_index->SetBlacklist(blacklist);
LOG_ENGINE_DEBUG_ << "Set blacklist for index " << location;
}
LOG_ENGINE_DEBUG_ << "Finish build index: " << location;
return std::make_shared<ExecutionEngineImpl>(to_index, location, engine_type, metric_type_, index_params_);
return std::make_shared<ExecutionEngineImpl>(to_index, location, engine_type, metric_type_, index_params_,
time_stamp_);
}
void
@ -719,7 +728,7 @@ ExecutionEngineImpl::Search(int64_t n, const float* data, int64_t k, const milvu
rc.RecordSection("query prepare");
auto dataset = knowhere::GenDataset(n, index_->Dim(), data);
auto result = index_->Query(dataset, conf);
auto result = index_->Query(dataset, conf, (blacklist_ ? blacklist_->bitset_ : nullptr));
rc.RecordSection("query done");
LOG_ENGINE_DEBUG_ << LogOut("[%s][%ld] get %ld uids from index %s", "search", 0, index_->GetUids()->size(),
@ -760,7 +769,7 @@ ExecutionEngineImpl::Search(int64_t n, const uint8_t* data, int64_t k, const mil
rc.RecordSection("query prepare");
auto dataset = knowhere::GenDataset(n, index_->Dim(), data);
auto result = index_->Query(dataset, conf);
auto result = index_->Query(dataset, conf, (blacklist_ ? blacklist_->bitset_ : nullptr));
rc.RecordSection("query done");
LOG_ENGINE_DEBUG_ << LogOut("[%s][%ld] get %ld uids from index %s", "search", 0, index_->GetUids()->size(),
@ -778,8 +787,15 @@ ExecutionEngineImpl::Search(int64_t n, const uint8_t* data, int64_t k, const mil
Status
ExecutionEngineImpl::Cache() {
auto cpu_cache_mgr = milvus::cache::CpuCacheMgr::GetInstance();
cache::DataObjPtr obj = std::static_pointer_cast<cache::DataObj>(index_);
cpu_cache_mgr->InsertItem(location_, obj);
if (index_) {
cpu_cache_mgr->InsertItem(location_, index_);
}
if (blacklist_) {
std::string segment_dir;
utils::GetParentPath(location_, segment_dir);
cpu_cache_mgr->InsertItem(segment_dir + cache::Blacklist_Suffix, blacklist_);
}
return Status::OK();
}

View File

@ -28,10 +28,10 @@ namespace engine {
class ExecutionEngineImpl : public ExecutionEngine {
public:
ExecutionEngineImpl(uint16_t dimension, const std::string& location, EngineType index_type, MetricType metric_type,
const milvus::json& index_params);
const milvus::json& index_params, int64_t time_stamp);
ExecutionEngineImpl(knowhere::VecIndexPtr index, const std::string& location, EngineType index_type,
MetricType metric_type, const milvus::json& index_params);
MetricType metric_type, const milvus::json& index_params, int64_t time_stamp);
size_t
Count() const override;
@ -46,7 +46,7 @@ class ExecutionEngineImpl : public ExecutionEngine {
Serialize() override;
Status
Load(bool to_cache) override;
Load(bool load_blacklist, bool to_cache) override;
Status
CopyToGpu(uint64_t device_id, bool hybrid = false) override;
@ -125,6 +125,7 @@ class ExecutionEngineImpl : public ExecutionEngine {
HybridUnset() const;
protected:
knowhere::BlacklistPtr blacklist_ = nullptr;
knowhere::VecIndexPtr index_ = nullptr;
#ifdef MILVUS_GPU_VERSION
knowhere::VecIndexPtr index_reserve_ = nullptr; // reserve the cpu index before copying it to gpu
@ -134,12 +135,10 @@ class ExecutionEngineImpl : public ExecutionEngine {
EngineType index_type_;
MetricType metric_type_;
int64_t vector_count_;
milvus::json index_params_;
int64_t gpu_num_ = 0;
bool gpu_cache_enable_ = false;
int64_t time_stamp_;
};
} // namespace engine

View File

@ -16,6 +16,7 @@
#include <utility>
#include "cache/CpuCacheMgr.h"
#include "codecs/default/DefaultCodec.h"
#include "db/Utils.h"
#include "db/insert/MemTable.h"
#include "db/meta/FilesHolder.h"
@ -185,19 +186,18 @@ MemTable::GetCurrentMem() {
Status
MemTable::ApplyDeletes() {
// Applying deletes to other segments on disk and their corresponding cache:
// Applying deletes to other segments on disk:
// For each segment in collection:
// Load its bloom filter
// For each id in delete list:
// If present, add the uid to segment's uid list
// For each segment
// Get its cache if exists
// Load its uids file.
// Scan the uids, if any uid in segment's uid list exists:
// If present, add the uid to segment's delete list
// if segment delete list is empty
// continue
// Load its uids and deleted docs file
// Scan the uids; for any un-deleted uid found in segment's delete list
// add its offset to deletedDoc
// remove the id from bloom filter
// set black list in cache
// Serialize segment's deletedDoc TODO(zhiru): append directly to previous file for now, may have duplicates
// Serialize segment's deletedDoc
// Serialize bloom filter
LOG_ENGINE_DEBUG_ << "Applying " << doc_ids_to_delete_.size() << " deletes in collection: " << collection_id_;
@ -217,155 +217,143 @@ MemTable::ApplyDeletes() {
// attention: here is a copy, not reference, since files_holder.UnmarkFile will change the array internal
milvus::engine::meta::SegmentsSchema files = files_holder.HoldFiles();
// which file need to be apply delete
std::vector<std::pair<segment::IdBloomFilterPtr, std::vector<segment::doc_id_t>>> ids_check_pair;
ids_check_pair.resize(files.size());
meta::SegmentsSchema files_to_update;
size_t unmark_file_cnt = 0;
for (size_t file_i = 0; file_i < files.size(); file_i++) {
auto& file = files[file_i];
auto& id_bloom_filter_ptr = ids_check_pair[file_i].first;
auto& ids_to_check = ids_check_pair[file_i].second;
ids_to_check.reserve(doc_ids_to_delete_.size());
for (auto& file : files) {
LOG_ENGINE_DEBUG_ << "Applying deletes in segment: " << file.segment_id_;
segment::IdBloomFilterPtr id_bloom_filter_ptr = nullptr;
segment::UidsPtr uids_ptr = nullptr;
segment::DeletedDocsPtr deleted_docs_ptr = nullptr;
std::vector<segment::doc_id_t> ids_to_check;
TimeRecorder rec("handle segment " + file.segment_id_);
// segment reader
std::string segment_dir;
utils::GetParentPath(file.location_, segment_dir);
segment::SegmentReader segment_reader(segment_dir);
segment_reader.LoadBloomFilter(id_bloom_filter_ptr);
// prepare segment_files
meta::FilesHolder segment_holder;
status = meta_->GetCollectionFilesBySegmentId(file.segment_id_, segment_holder);
if (!status.ok()) {
break;
}
milvus::engine::meta::SegmentsSchema& segment_files = segment_holder.HoldFiles();
// Lambda: LoadUid
auto LoadUid = [&]() {
for (auto& segment_file : segment_files) {
auto data_obj_ptr = cache::CpuCacheMgr::GetInstance()->GetItem(segment_file.location_);
auto index = std::static_pointer_cast<knowhere::VecIndex>(data_obj_ptr);
if (index != nullptr) {
uids_ptr = index->GetUids();
return Status::OK();
}
}
return segment_reader.LoadUids(uids_ptr);
};
// Lambda: LoadDeleteDoc
auto LoadDeleteDoc = [&]() { return segment_reader.LoadDeletedDocs(deleted_docs_ptr); };
// load bloom filter
status = segment_reader.LoadBloomFilter(id_bloom_filter_ptr, true);
if (!status.ok()) {
// Some accidents may leave the bloom filter file corrupted or destroyed.
// If the bloom filter fails to load, just create a new one.
if (!(status = LoadUid()).ok()) {
return status;
}
if (!(status = LoadDeleteDoc()).ok()) {
return status;
}
codec::DefaultCodec default_codec;
default_codec.GetIdBloomFilterFormat()->create(uids_ptr->size(), id_bloom_filter_ptr);
id_bloom_filter_ptr->Add(*uids_ptr, deleted_docs_ptr->GetMutableDeletedDocs());
LOG_ENGINE_DEBUG_ << "A new bloom filter is created";
segment::SegmentWriter segment_writer(segment_dir);
segment_writer.WriteBloomFilter(id_bloom_filter_ptr);
}
// check ids by bloom filter
for (auto& id : doc_ids_to_delete_) {
if (id_bloom_filter_ptr->Check(id)) {
ids_to_check.emplace_back(id);
}
}
rec.RecordSection("bloom filter check end, segment delete list cnt " + std::to_string(ids_to_check.size()));
// release unused files
if (ids_to_check.empty()) {
id_bloom_filter_ptr = nullptr;
files_holder.UnmarkFile(file);
++unmark_file_cnt;
}
}
recorder.RecordSection("Found " + std::to_string(files.size() - unmark_file_cnt) + " segment to apply deletes");
meta::SegmentsSchema files_to_update;
for (size_t file_i = 0; file_i < files.size(); file_i++) {
auto& file = files[file_i];
auto& id_bloom_filter_ptr = ids_check_pair[file_i].first;
auto& ids_to_check = ids_check_pair[file_i].second;
if (id_bloom_filter_ptr == nullptr) {
continue;
}
LOG_ENGINE_DEBUG_ << "Applying deletes in segment: " << file.segment_id_;
// Load its uids and deleted docs file
if (uids_ptr == nullptr && !(status = LoadUid()).ok()) {
return status;
}
if (deleted_docs_ptr == nullptr && !(status = LoadDeleteDoc()).ok()) {
return status;
}
auto& deleted_docs = deleted_docs_ptr->GetMutableDeletedDocs();
TimeRecorder rec("handle segment " + file.segment_id_);
rec.RecordSection("load uids and deleted docs");
auto& segment_id = file.segment_id_;
meta::FilesHolder segment_holder;
status = meta_->GetCollectionFilesBySegmentId(segment_id, segment_holder);
if (!status.ok()) {
break;
// sort ids_to_check
bool ids_sorted = false;
if (ids_to_check.size() >= 64) {
std::sort(ids_to_check.begin(), ids_to_check.end());
ids_sorted = true;
rec.RecordSection("Sorting " + std::to_string(ids_to_check.size()) + " ids");
}
segment::UidsPtr uids_ptr = nullptr;
// Get all index that contains blacklist in cache
std::vector<knowhere::VecIndexPtr> indexes;
std::vector<faiss::ConcurrentBitsetPtr> blacklists;
milvus::engine::meta::SegmentsSchema& segment_files = segment_holder.HoldFiles();
for (auto& segment_file : segment_files) {
auto data_obj_ptr = cache::CpuCacheMgr::GetInstance()->GetItem(segment_file.location_);
auto index = std::static_pointer_cast<knowhere::VecIndex>(data_obj_ptr);
if (index != nullptr) {
faiss::ConcurrentBitsetPtr blacklist = index->GetBlacklist();
if (blacklist == nullptr) {
// to update and set the blacklist
blacklist = std::make_shared<faiss::ConcurrentBitset>(index->Count());
indexes.emplace_back(index);
blacklists.emplace_back(blacklist);
} else {
// just to update the blacklist
indexes.emplace_back(nullptr);
blacklists.emplace_back(blacklist);
}
// load uids from cache
uids_ptr = index->GetUids();
}
}
std::string segment_dir;
utils::GetParentPath(file.location_, segment_dir);
if (uids_ptr == nullptr) {
// load uids from disk
segment::SegmentReader segment_reader(segment_dir);
status = segment_reader.LoadUids(uids_ptr);
if (!status.ok()) {
return status;
}
}
segment::DeletedDocsPtr deleted_docs = std::make_shared<segment::DeletedDocs>();
rec.RecordSection("Loading uids and deleted docs");
std::sort(ids_to_check.begin(), ids_to_check.end());
rec.RecordSection("Sorting " + std::to_string(ids_to_check.size()) + " ids");
auto find_diff = std::chrono::duration<double>::zero();
auto set_diff = std::chrono::duration<double>::zero();
// for each id
int64_t segment_deleted_count = 0;
for (size_t i = 0; i < uids_ptr->size(); ++i) {
auto find_start = std::chrono::high_resolution_clock::now();
auto found = std::binary_search(ids_to_check.begin(), ids_to_check.end(), (*uids_ptr)[i]);
auto find_end = std::chrono::high_resolution_clock::now();
find_diff += (find_end - find_start);
if (found) {
auto set_start = std::chrono::high_resolution_clock::now();
deleted_docs->AddDeletedDoc(i);
id_bloom_filter_ptr->Remove((*uids_ptr)[i]);
for (auto& blacklist : blacklists) {
blacklist->set(i);
if (std::find(deleted_docs.begin(), deleted_docs.end(), i) != deleted_docs.end()) {
continue;
}
if (ids_sorted) {
if (!std::binary_search(ids_to_check.begin(), ids_to_check.end(), (*uids_ptr)[i])) {
continue;
}
} else {
if (std::find(ids_to_check.begin(), ids_to_check.end(), (*uids_ptr)[i]) == ids_to_check.end()) {
continue;
}
auto set_end = std::chrono::high_resolution_clock::now();
set_diff += (set_end - set_start);
}
// delete
id_bloom_filter_ptr->Remove((*uids_ptr)[i]);
deleted_docs.push_back(i);
segment_deleted_count++;
}
LOG_ENGINE_DEBUG_ << "Finding " << ids_to_check.size() << " uids in " << uids_ptr->size() << " uids took "
<< find_diff.count() << " s in total";
LOG_ENGINE_DEBUG_ << "Setting deleted docs and bloom filter took " << set_diff.count() << " s in total";
rec.RecordSection("Find uids and set deleted docs and bloom filter, append " +
std::to_string(segment_deleted_count) + " offsets");
rec.RecordSection("Find uids and set deleted docs and bloom filter");
if (deleted_docs->GetSize() == 0) {
if (segment_deleted_count == 0) {
LOG_ENGINE_DEBUG_ << "deleted_docs does not need to be updated";
files_holder.UnmarkFile(file);
continue;
}
for (size_t i = 0; i < indexes.size(); ++i) {
if (indexes[i]) {
indexes[i]->SetBlacklist(blacklists[i]);
}
}
segment::Segment tmp_segment;
segment::SegmentWriter segment_writer(segment_dir);
status = segment_writer.WriteDeletedDocs(deleted_docs);
status = segment_writer.WriteDeletedDocs(deleted_docs_ptr);
if (!status.ok()) {
break;
}
rec.RecordSection("Appended " + std::to_string(deleted_docs->GetSize()) + " offsets to deleted docs");
rec.RecordSection("Updated deleted docs");
status = segment_writer.WriteBloomFilter(id_bloom_filter_ptr);
if (!status.ok()) {
@ -380,14 +368,14 @@ MemTable::ApplyDeletes() {
segment_file.file_type_ == meta::SegmentSchema::TO_INDEX ||
segment_file.file_type_ == meta::SegmentSchema::INDEX ||
segment_file.file_type_ == meta::SegmentSchema::BACKUP) {
segment_file.row_count_ -= deleted_docs->GetSize();
segment_file.row_count_ -= segment_deleted_count;
files_to_update.emplace_back(segment_file);
}
}
rec.RecordSection("Update collection file row count in vector");
}
recorder.RecordSection("Finished " + std::to_string(files.size() - unmark_file_cnt) + " segment to apply deletes");
recorder.RecordSection("Finished " + std::to_string(files.size()) + " segment to apply deletes");
status = meta_->UpdateCollectionFilesRowCount(files_to_update);
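
The rewritten ApplyDeletes() handles each segment in a single pass: load the bloom filter (or rebuild it from uids and deleted docs if the file is corrupted), use it to pre-filter the global delete list, skip segments with no candidates, then scan the uids once, appending offsets to the deleted-docs list and removing the ids from the bloom filter before both files are rewritten. One small design choice worth calling out is the membership test; a sketch of the idea, mirroring the code above (`uid` stands in for `(*uids_ptr)[i]`):

```cpp
// Illustrative: candidate lists of 64 ids or more are sorted once so each uid can be
// checked with binary search; shorter lists just use a linear scan.
bool found = ids_sorted
                 ? std::binary_search(ids_to_check.begin(), ids_to_check.end(), uid)
                 : std::find(ids_to_check.begin(), ids_to_check.end(), uid) != ids_to_check.end();
```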

View File

@ -21,7 +21,7 @@
namespace milvus {
namespace engine {
const int64_t FORCE_MERGE_THREASHOLD = 30;   // force merge files older than this time (in seconds)
const int64_t FORCE_MERGE_THREASHOLD = 300;  // force merge files older than this time (in seconds)
Status
MergeLayeredStrategy::RegroupFiles(meta::FilesHolder& files_holder, MergeFilesGroups& files_groups) {
@ -33,6 +33,9 @@ MergeLayeredStrategy::RegroupFiles(meta::FilesHolder& files_holder, MergeFilesGr
{1UL << 26, meta::SegmentsSchema()}, // 64MB
{1UL << 28, meta::SegmentsSchema()}, // 256MB
{1UL << 30, meta::SegmentsSchema()}, // 1GB
{1UL << 32, meta::SegmentsSchema()}, // 4GB
{1UL << 34, meta::SegmentsSchema()}, // 16GB
{1UL << 36, meta::SegmentsSchema()}, // 64GB
};
meta::SegmentsSchema sort_files = files_holder.HoldFiles();
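
The added entries extend the layered-merge buckets from 1 GB up to 64 GB, so much larger segment files can still be grouped for merging, and the force-merge age is raised from 30 s to 300 s. A minimal sketch of how a file might be bucketed, assuming each file is placed in the smallest layer whose key is not below its size (the grouping rule here is an assumption, not quoted from the strategy code):

```cpp
// Illustrative only: suppose the thresholds above are the keys of an ordered map
// (smaller layers omitted for brevity).
std::map<uint64_t, meta::SegmentsSchema> layers = {
    {1UL << 26, {}}, {1UL << 28, {}}, {1UL << 30, {}},
    {1UL << 32, {}}, {1UL << 34, {}}, {1UL << 36, {}},
};

// A file would then land in the smallest layer whose upper bound covers it.
auto pick_layer = [&](uint64_t file_size) {
    auto it = layers.lower_bound(file_size);  // first key >= file_size
    return (it != layers.end()) ? it : std::prev(layers.end());
};
```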

View File

@ -171,12 +171,6 @@ class Meta {
virtual Status
GetGlobalLastLSN(uint64_t& lsn) = 0;
virtual Status
CreateHybridCollection(CollectionSchema& collection_schema, hybrid::FieldsSchema& fields_schema) = 0;
virtual Status
DescribeHybridCollection(CollectionSchema& collection_schema, hybrid::FieldsSchema& fields_schema) = 0;
}; // MetaData
using MetaPtr = std::shared_ptr<Meta>;

View File

@ -1744,7 +1744,7 @@ MySQLMetaImpl::FilesToSearchEx(const std::string& root_collection, const std::se
<< " OR file_type = " << std::to_string(SegmentSchema::TO_INDEX)
<< " OR file_type = " << std::to_string(SegmentSchema::INDEX) << ");";
LOG_ENGINE_DEBUG_ << "FilesToSearch: " << statement.str();
LOG_ENGINE_DEBUG_ << "FilesToSearchEx: " << statement.str();
res = statement.store();
} // Scoped Connection
@ -1900,7 +1900,7 @@ MySQLMetaImpl::FilesToIndex(FilesHolder& files_holder) {
<< " FROM " << META_TABLEFILES << " WHERE file_type = " << std::to_string(SegmentSchema::TO_INDEX)
<< ";";
// LOG_ENGINE_DEBUG_ << "FilesToIndex: " << statement.str();
// LOG_ENGINE_DEBUG_ << "FilesToIndex: " << statement.str();
res = statement.store();
} // Scoped Connection
@ -2162,7 +2162,7 @@ MySQLMetaImpl::FilesByTypeEx(const std::vector<meta::CollectionSchema>& collecti
}
statement << ") AND file_type in (" << types << ");";
LOG_ENGINE_DEBUG_ << "FilesByType: " << statement.str();
LOG_ENGINE_DEBUG_ << "FilesByTypeEx: " << statement.str();
res = statement.store();
} // Scoped Connection
@ -2295,7 +2295,7 @@ MySQLMetaImpl::FilesByID(const std::vector<size_t>& ids, FilesHolder& files_hold
statement << " WHERE (" << idStr << ")";
LOG_ENGINE_DEBUG_ << "FilesToSearch: " << statement.str();
LOG_ENGINE_DEBUG_ << "FilesByID: " << statement.str();
res = statement.store();
} // Scoped Connection
@ -2473,7 +2473,7 @@ MySQLMetaImpl::CleanUpShadowFiles() {
<< " WHERE table_schema = " << mysqlpp::quote << mysql_connection_pool_->db_name()
<< " AND table_name = " << mysqlpp::quote << META_TABLEFILES << ";";
LOG_ENGINE_DEBUG_ << "CleanUp: " << statement.str();
LOG_ENGINE_DEBUG_ << "CleanUpShadowFiles: " << statement.str();
mysqlpp::StoreQueryResult res = statement.store();
@ -2965,183 +2965,6 @@ MySQLMetaImpl::GetGlobalLastLSN(uint64_t& lsn) {
return Status::OK();
}
Status
MySQLMetaImpl::CreateHybridCollection(CollectionSchema& collection_schema, hybrid::FieldsSchema& fields_schema) {
try {
server::MetricCollector metric;
{
mysqlpp::ScopedConnection connectionPtr(*mysql_connection_pool_, safe_grab_);
bool is_null_connection = (connectionPtr == nullptr);
fiu_do_on("MySQLMetaImpl.CreateCollection.null_connection", is_null_connection = true);
fiu_do_on("MySQLMetaImpl.CreateCollection.throw_exception", throw std::exception(););
if (is_null_connection) {
return Status(DB_ERROR, "Failed to connect to meta server(mysql)");
}
mysqlpp::Query statement = connectionPtr->query();
if (collection_schema.collection_id_.empty()) {
NextCollectionId(collection_schema.collection_id_);
} else {
statement << "SELECT state FROM " << META_TABLES << " WHERE table_id = " << mysqlpp::quote
<< collection_schema.collection_id_ << ";";
LOG_ENGINE_DEBUG_ << "CreateCollection: " << statement.str();
mysqlpp::StoreQueryResult res = statement.store();
if (res.num_rows() == 1) {
int state = res[0]["state"];
fiu_do_on("MySQLMetaImpl.CreateCollection.schema_TO_DELETE", state = CollectionSchema::TO_DELETE);
if (CollectionSchema::TO_DELETE == state) {
return Status(DB_ERROR,
"Collection already exists and it is in delete state, please wait a second");
} else {
return Status(DB_ALREADY_EXIST, "Collection already exists");
}
}
}
collection_schema.id_ = -1;
collection_schema.created_on_ = utils::GetMicroSecTimeStamp();
std::string id = "NULL"; // auto-increment
std::string& collection_id = collection_schema.collection_id_;
std::string state = std::to_string(collection_schema.state_);
std::string dimension = std::to_string(collection_schema.dimension_);
std::string created_on = std::to_string(collection_schema.created_on_);
std::string flag = std::to_string(collection_schema.flag_);
std::string index_file_size = std::to_string(collection_schema.index_file_size_);
std::string engine_type = std::to_string(collection_schema.engine_type_);
std::string& index_params = collection_schema.index_params_;
std::string metric_type = std::to_string(collection_schema.metric_type_);
std::string& owner_collection = collection_schema.owner_collection_;
std::string& partition_tag = collection_schema.partition_tag_;
std::string& version = collection_schema.version_;
std::string flush_lsn = std::to_string(collection_schema.flush_lsn_);
statement << "INSERT INTO " << META_TABLES << " VALUES(" << id << ", " << mysqlpp::quote << collection_id
<< ", " << state << ", " << dimension << ", " << created_on << ", " << flag << ", "
<< index_file_size << ", " << engine_type << ", " << mysqlpp::quote << index_params << ", "
<< metric_type << ", " << mysqlpp::quote << owner_collection << ", " << mysqlpp::quote
<< partition_tag << ", " << mysqlpp::quote << version << ", " << flush_lsn << ");";
LOG_ENGINE_DEBUG_ << "CreateHybridCollection: " << statement.str();
if (mysqlpp::SimpleResult res = statement.execute()) {
collection_schema.id_ = res.insert_id(); // Might need to use SELECT LAST_INSERT_ID()?
// Consume all results to avoid "Commands out of sync" error
} else {
return HandleException("Failed to create collection", statement.error());
}
for (auto schema : fields_schema.fields_schema_) {
std::string id = "NULL";
std::string collection_id = schema.collection_id_;
std::string field_name = schema.field_name_;
std::string field_type = std::to_string(schema.field_type_);
std::string field_params = schema.field_params_;
statement << "INSERT INTO " << META_FIELDS << " VALUES(" << mysqlpp::quote << collection_id << ", "
<< mysqlpp::quote << field_name << ", " << field_type << ", " << mysqlpp::quote << ", "
<< field_params << ");";
LOG_ENGINE_DEBUG_ << "Create field: " << statement.str();
if (mysqlpp::SimpleResult field_res = statement.execute()) {
// TODO(yukun): need field id?
} else {
return HandleException("Failed to create field table", statement.error());
}
}
} // Scoped Connection
LOG_ENGINE_DEBUG_ << "Successfully create hybrid collection: " << collection_schema.collection_id_;
std::cout << collection_schema.collection_id_;
return utils::CreateCollectionPath(options_, collection_schema.collection_id_);
} catch (std::exception& e) {
return HandleException("Failed to create collection", e.what());
}
}
Status
MySQLMetaImpl::DescribeHybridCollection(CollectionSchema& collection_schema, hybrid::FieldsSchema& fields_schema) {
try {
server::MetricCollector metric;
mysqlpp::StoreQueryResult res, field_res;
{
mysqlpp::ScopedConnection connectionPtr(*mysql_connection_pool_, safe_grab_);
bool is_null_connection = (connectionPtr == nullptr);
fiu_do_on("MySQLMetaImpl.DescribeCollection.null_connection", is_null_connection = true);
fiu_do_on("MySQLMetaImpl.DescribeCollection.throw_exception", throw std::exception(););
if (is_null_connection) {
return Status(DB_ERROR, "Failed to connect to meta server(mysql)");
}
mysqlpp::Query statement = connectionPtr->query();
statement << "SELECT id, state, dimension, created_on, flag, index_file_size, engine_type, index_params"
<< " , metric_type ,owner_table, partition_tag, version, flush_lsn"
<< " FROM " << META_TABLES << " WHERE table_id = " << mysqlpp::quote
<< collection_schema.collection_id_ << " AND state <> "
<< std::to_string(CollectionSchema::TO_DELETE) << ";";
LOG_ENGINE_DEBUG_ << "DescribeHybridCollection: " << statement.str();
res = statement.store();
mysqlpp::Query field_statement = connectionPtr->query();
field_statement << "SELECT collection_id, field_name, field_type, field_params"
<< " FROM " << META_FIELDS << " WHERE collection_id = " << mysqlpp::quote
<< collection_schema.collection_id_ << ";";
LOG_ENGINE_DEBUG_ << "Describe Collection Fields: " << field_statement.str();
field_res = field_statement.store();
} // Scoped Connection
if (res.num_rows() == 1) {
const mysqlpp::Row& resRow = res[0];
collection_schema.id_ = resRow["id"]; // implicit conversion
collection_schema.state_ = resRow["state"];
collection_schema.dimension_ = resRow["dimension"];
collection_schema.created_on_ = resRow["created_on"];
collection_schema.flag_ = resRow["flag"];
collection_schema.index_file_size_ = resRow["index_file_size"];
collection_schema.engine_type_ = resRow["engine_type"];
resRow["index_params"].to_string(collection_schema.index_params_);
collection_schema.metric_type_ = resRow["metric_type"];
resRow["owner_table"].to_string(collection_schema.owner_collection_);
resRow["partition_tag"].to_string(collection_schema.partition_tag_);
resRow["version"].to_string(collection_schema.version_);
collection_schema.flush_lsn_ = resRow["flush_lsn"];
} else {
return Status(DB_NOT_FOUND, "Collection " + collection_schema.collection_id_ + " not found");
}
auto num_row = field_res.num_rows();
if (num_row >= 1) {
fields_schema.fields_schema_.resize(num_row);
for (uint64_t i = 0; i < num_row; ++i) {
const mysqlpp::Row& resRow = field_res[i];
resRow["collection_id"].to_string(fields_schema.fields_schema_[i].collection_id_);
resRow["field_name"].to_string(fields_schema.fields_schema_[i].field_name_);
fields_schema.fields_schema_[i].field_type_ = resRow["field_type"];
resRow["field_params"].to_string(fields_schema.fields_schema_[i].field_params_);
}
} else {
return Status(DB_NOT_FOUND, "Fields of " + collection_schema.collection_id_ + " not found");
}
} catch (std::exception& e) {
return HandleException("Failed to describe collection", e.what());
}
return Status::OK();
}
} // namespace meta
} // namespace engine
} // namespace milvus

View File

@ -158,12 +158,6 @@ class MySQLMetaImpl : public Meta {
Status
GetGlobalLastLSN(uint64_t& lsn) override;
Status
CreateHybridCollection(CollectionSchema& collection_schema, hybrid::FieldsSchema& fields_schema) override;
Status
DescribeHybridCollection(CollectionSchema& collection_schema, hybrid::FieldsSchema& fields_schema) override;
private:
Status
NextFileId(std::string& file_id);

View File

@ -35,10 +35,6 @@
#include "utils/StringHelpFunctions.h"
#include "utils/ValidationUtil.h"
#define USING_SQLITE_WARNING \
LOG_ENGINE_WARNING_ << "You are using SQLite as the meta data management, which can't be used in production. " \
"Please change it to MySQL!";
namespace milvus {
namespace engine {
namespace meta {
@ -252,8 +248,6 @@ SqliteMetaImpl::ValidateMetaSchema() {
Status
SqliteMetaImpl::SqlQuery(const std::string& sql, AttrsMapList* res) {
try {
LOG_ENGINE_DEBUG_ << sql;
std::lock_guard<std::mutex> meta_lock(sqlite_mutex_);
int (* call_back)(void*, int, char**, char**) = nullptr;
@ -289,8 +283,6 @@ SqliteMetaImpl::SqlTransaction(const std::vector<std::string>& sql_statements) {
int rc = SQLITE_OK;
for (auto& sql : sql_statements) {
LOG_ENGINE_DEBUG_ << sql;
rc = sqlite3_exec(db_, sql.c_str(), nullptr, nullptr, nullptr);
if (rc != SQLITE_OK) {
break;
@ -345,6 +337,7 @@ SqliteMetaImpl::Initialize() {
// create meta tables
auto create_schema = [&](const MetaSchema& schema) {
std::string create_table_str = "CREATE TABLE IF NOT EXISTS " + schema.name() + "(" + schema.ToString() + ");";
LOG_ENGINE_DEBUG_ << "Initialize: " << create_table_str;
std::vector<std::string> statements = {create_table_str};
auto status = SqlTransaction(statements);
if (!status.ok()) {
@ -366,8 +359,6 @@ SqliteMetaImpl::Initialize() {
Status
SqliteMetaImpl::CreateCollection(CollectionSchema& collection_schema) {
USING_SQLITE_WARNING
try {
server::MetricCollector metric;
@ -377,6 +368,7 @@ SqliteMetaImpl::CreateCollection(CollectionSchema& collection_schema) {
fiu_do_on("SqliteMetaImpl.CreateCollection.throw_exception", throw std::exception());
std::string statement = "SELECT state FROM " + std::string(META_TABLES) + " WHERE table_id = "
+ Quote(collection_schema.collection_id_) + ";";
LOG_ENGINE_DEBUG_ << "CreateCollection: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
if (res.size() == 1) {
@ -415,7 +407,7 @@ SqliteMetaImpl::CreateCollection(CollectionSchema& collection_schema) {
+ ", " + Quote(owner_collection) + ", " + Quote(partition_tag) + ", " + Quote(version)
+ ", " + flush_lsn + ");";
LOG_ENGINE_DEBUG_ << statement;
LOG_ENGINE_DEBUG_ << "CreateCollection: " << statement;
fiu_do_on("SqliteMetaImpl.CreateCollection.insert_throw_exception", throw std::exception());
auto status = SqlTransaction({statement});
@ -442,6 +434,7 @@ SqliteMetaImpl::DescribeCollection(CollectionSchema& collection_schema) {
+ std::string(META_TABLES) + " WHERE table_id = "
+ Quote(collection_schema.collection_id_) + " AND state <> "
+ std::to_string(CollectionSchema::TO_DELETE) + ";";
LOG_ENGINE_DEBUG_ << "DescribeCollection: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -488,6 +481,7 @@ SqliteMetaImpl::HasCollection(const std::string& collection_id, bool& has_or_not
statement = "SELECT id FROM " + std::string(META_TABLES) + " WHERE table_id = " + Quote(collection_id)
+ " AND state <> " + std::to_string(CollectionSchema::TO_DELETE) + ";";
}
LOG_ENGINE_DEBUG_ << "HasCollection: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -517,6 +511,7 @@ SqliteMetaImpl::AllCollections(std::vector<CollectionSchema>& collection_schema_
} else {
statement += ";";
}
LOG_ENGINE_DEBUG_ << "AllCollections: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -570,6 +565,7 @@ SqliteMetaImpl::DropCollections(const std::vector<std::string>& collection_id_ar
}
statement += ");";
statements.emplace_back(statement);
LOG_ENGINE_DEBUG_ << "DropCollections: " << statement;
}
{
@ -617,6 +613,7 @@ SqliteMetaImpl::DeleteCollectionFiles(const std::vector<std::string>& collection
}
statement += (") AND file_type <> " + std::to_string(SegmentSchema::TO_DELETE) + ";");
statements.emplace_back(statement);
LOG_ENGINE_DEBUG_ << "DeleteCollectionFiles: " << statement;
}
auto status = SqlTransaction(statements);
@ -634,7 +631,6 @@ SqliteMetaImpl::DeleteCollectionFiles(const std::vector<std::string>& collection
Status
SqliteMetaImpl::CreateCollectionFile(SegmentSchema& file_schema) {
USING_SQLITE_WARNING
if (file_schema.date_ == EmptyDate) {
file_schema.date_ = utils::GetDate();
}
@ -680,6 +676,7 @@ SqliteMetaImpl::CreateCollectionFile(SegmentSchema& file_schema) {
+ Quote(collection_id) + ", " + Quote(segment_id) + ", " + engine_type + ", "
+ Quote(file_id) + ", " + file_type + ", " + file_size + ", " + row_count
+ ", " + updated_time + ", " + created_on + ", " + date + ", " + flush_lsn + ");";
LOG_ENGINE_DEBUG_ << "CreateCollectionFile: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -715,6 +712,7 @@ SqliteMetaImpl::GetCollectionFiles(const std::string& collection_id, const std::
" row_count, date, created_on FROM " + std::string(META_TABLEFILES)
+ " WHERE table_id = " + Quote(collection_id) + " AND (" + idStr + ")"
+ " AND file_type <> " + std::to_string(SegmentSchema::TO_DELETE) + ";";
LOG_ENGINE_DEBUG_ << "GetCollectionFiles: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -768,6 +766,7 @@ SqliteMetaImpl::GetCollectionFilesBySegmentId(const std::string& segment_id, Fil
" row_count, date, created_on FROM " + std::string(META_TABLEFILES)
+ " WHERE segment_id = " + Quote(segment_id) + " AND file_type <> "
+ std::to_string(SegmentSchema::TO_DELETE) + ";";
LOG_ENGINE_DEBUG_ << "GetCollectionFilesBySegmentId: " << statement;
AttrsMapList res;
{
@ -826,6 +825,7 @@ SqliteMetaImpl::UpdateCollectionFlag(const std::string& collection_id, int64_t f
std::string statement = "UPDATE " + std::string(META_TABLES) + " SET flag = " + std::to_string(flag)
+ " WHERE table_id = " + Quote(collection_id) + ";";
LOG_ENGINE_DEBUG_ << "UpdateCollectionFlag: " << statement;
auto status = SqlTransaction({statement});
if (!status.ok()) {
@ -848,6 +848,7 @@ SqliteMetaImpl::UpdateCollectionFlushLSN(const std::string& collection_id, uint6
std::string statement = "UPDATE " + std::string(META_TABLES) + " SET flush_lsn = "
+ std::to_string(flush_lsn) + " WHERE table_id = " + Quote(collection_id) + ";";
LOG_ENGINE_DEBUG_ << "UpdateCollectionFlushLSN: " << statement;
auto status = SqlTransaction({statement});
if (!status.ok()) {
@ -871,6 +872,7 @@ SqliteMetaImpl::GetCollectionFlushLSN(const std::string& collection_id, uint64_t
std::string statement = "SELECT flush_lsn FROM " + std::string(META_TABLES) + " WHERE table_id = "
+ Quote(collection_id) + ";";
LOG_ENGINE_DEBUG_ << "GetCollectionFlushLSN: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -900,6 +902,7 @@ SqliteMetaImpl::UpdateCollectionFile(SegmentSchema& file_schema) {
std::string statement = "SELECT state FROM " + std::string(META_TABLES) + " WHERE table_id = "
+ Quote(file_schema.collection_id_) + ";";
LOG_ENGINE_DEBUG_ << "UpdateCollectionFile: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -932,6 +935,7 @@ SqliteMetaImpl::UpdateCollectionFile(SegmentSchema& file_schema) {
+ " ,file_type = " + file_type + " ,file_size = " + file_size + " ,row_count = " + row_count
+ " ,updated_time = " + updated_time + " ,created_on = " + created_on + " ,date = " + date
+ " WHERE id = " + id + ";";
LOG_ENGINE_DEBUG_ << "UpdateCollectionFile: " << statement;
status = SqlTransaction({statement});
if (!status.ok()) {
@ -966,6 +970,7 @@ SqliteMetaImpl::UpdateCollectionFiles(SegmentsSchema& files) {
std::string statement = "SELECT id FROM " + std::string(META_TABLES)
+ " WHERE table_id = " + Quote(file.collection_id_) + " AND state <> "
+ std::to_string(CollectionSchema::TO_DELETE) + ";";
LOG_ENGINE_DEBUG_ << "UpdateCollectionFiles: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -1006,6 +1011,7 @@ SqliteMetaImpl::UpdateCollectionFiles(SegmentsSchema& files) {
+ " ,row_count = " + row_count + " ,updated_time = " + updated_time
+ " ,created_on = " + created_on + " ,date = " + date + " WHERE id = " + id + ";";
statements.emplace_back(statement);
LOG_ENGINE_DEBUG_ << "UpdateCollectionFiles: " << statement;
}
auto status = SqlTransaction(statements);
@ -1033,6 +1039,7 @@ SqliteMetaImpl::UpdateCollectionFilesRowCount(SegmentsSchema& files) {
std::string statement = "UPDATE " + std::string(META_TABLEFILES) + " SET row_count = " + row_count
+ " , updated_time = " + updated_time + " WHERE file_id = " + file.file_id_ + ";";
LOG_ENGINE_DEBUG_ << "UpdateCollectionFilesRowCount: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -1060,6 +1067,7 @@ SqliteMetaImpl::UpdateCollectionIndex(const std::string& collection_id, const Co
std::string statement = "SELECT id, state, dimension, created_on FROM " + std::string(META_TABLES)
+ " WHERE table_id = " + Quote(collection_id)
+ " AND state <> " + std::to_string(CollectionSchema::TO_DELETE) + ";";
LOG_ENGINE_DEBUG_ << "UpdateCollectionIndex: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -1080,6 +1088,7 @@ SqliteMetaImpl::UpdateCollectionIndex(const std::string& collection_id, const Co
+ " ,engine_type = " + std::to_string(index.engine_type_) + " ,index_params = "
+ Quote(index.extra_params_.dump()) + " ,metric_type = " + std::to_string(index.metric_type_)
+ " WHERE table_id = " + Quote(collection_id) + ";";
LOG_ENGINE_DEBUG_ << "UpdateCollectionIndex: " << statement;
auto status = SqlTransaction({statement});
if (!status.ok()) {
@ -1108,6 +1117,7 @@ SqliteMetaImpl::UpdateCollectionFilesToIndex(const std::string& collection_id) {
+ std::to_string(SegmentSchema::TO_INDEX) + " WHERE table_id = " + Quote(collection_id)
+ " AND row_count >= " + std::to_string(meta::BUILD_INDEX_THRESHOLD)
+ " AND file_type = " + std::to_string(SegmentSchema::RAW) + ";";
LOG_ENGINE_DEBUG_ << "UpdateCollectionFilesToIndex: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -1134,6 +1144,7 @@ SqliteMetaImpl::DescribeCollectionIndex(const std::string& collection_id, Collec
std::string statement = "SELECT engine_type, index_params, metric_type FROM "
+ std::string(META_TABLES) + " WHERE table_id = " + Quote(collection_id)
+ " AND state <> " + std::to_string(CollectionSchema::TO_DELETE) + ";";
LOG_ENGINE_DEBUG_ << "DescribeCollectionIndex: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -1172,6 +1183,7 @@ SqliteMetaImpl::DropCollectionIndex(const std::string& collection_id) {
+ Quote(collection_id) + " AND file_type = " + std::to_string(SegmentSchema::INDEX)
+ ";";
statements.emplace_back(statement);
LOG_ENGINE_DEBUG_ << "DropCollectionIndex: " << statement;
// set all backup file to raw
statement = "UPDATE " + std::string(META_TABLEFILES) + " SET file_type = "
@ -1179,6 +1191,7 @@ SqliteMetaImpl::DropCollectionIndex(const std::string& collection_id) {
+ std::to_string(utils::GetMicroSecTimeStamp()) + " WHERE table_id = "
+ Quote(collection_id) + " AND file_type = " + std::to_string(SegmentSchema::BACKUP) + ";";
statements.emplace_back(statement);
LOG_ENGINE_DEBUG_ << "DropCollectionIndex: " << statement;
// set collection index type to raw
statement = "UPDATE " + std::string(META_TABLES) + " SET engine_type = (CASE WHEN metric_type in ("
@ -1189,6 +1202,7 @@ SqliteMetaImpl::DropCollectionIndex(const std::string& collection_id) {
+ " ELSE " + std::to_string((int32_t)EngineType::FAISS_IDMAP) + " END)"
+ " , index_params = '{}' WHERE table_id = " + Quote(collection_id) + ";";
statements.emplace_back(statement);
LOG_ENGINE_DEBUG_ << "DropCollectionIndex: " << statement;
auto status = SqlTransaction(statements);
if (!status.ok()) {
@ -1206,7 +1220,6 @@ SqliteMetaImpl::DropCollectionIndex(const std::string& collection_id) {
Status
SqliteMetaImpl::CreatePartition(const std::string& collection_id, const std::string& partition_name,
const std::string& tag, uint64_t lsn) {
USING_SQLITE_WARNING
server::MetricCollector metric;
CollectionSchema collection_schema;
@ -1269,6 +1282,7 @@ SqliteMetaImpl::HasPartition(const std::string& collection_id, const std::string
+ " WHERE owner_table = " + Quote(collection_id)
+ " AND partition_tag = " + Quote(valid_tag)
+ " AND state <> " + std::to_string(CollectionSchema::TO_DELETE) + ";";
LOG_ENGINE_DEBUG_ << "HasPartition: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -1305,6 +1319,7 @@ SqliteMetaImpl::ShowPartitions(const std::string& collection_id,
+ std::string(META_TABLES) + " WHERE owner_table = "
+ Quote(collection_id) + " AND state <> "
+ std::to_string(CollectionSchema::TO_DELETE) + ";";
LOG_ENGINE_DEBUG_ << "ShowPartitions: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -1346,6 +1361,7 @@ SqliteMetaImpl::CountPartitions(const std::string& collection_id, int64_t& parti
std::string statement = "SELECT count(*) FROM " + std::string(META_TABLES)
+ " WHERE owner_table = " + Quote(collection_id)
+ " AND state <> " + std::to_string(CollectionSchema::TO_DELETE) + ";";
LOG_ENGINE_DEBUG_ << "CountPartitions: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -1381,6 +1397,7 @@ SqliteMetaImpl::GetPartitionName(const std::string& collection_id, const std::st
+ " WHERE owner_table = " + Quote(collection_id)
+ " AND partition_tag = " + Quote(valid_tag)
+ " AND state <> " + std::to_string(CollectionSchema::TO_DELETE) + ";";
LOG_ENGINE_DEBUG_ << "GetPartitionName: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -1412,6 +1429,7 @@ SqliteMetaImpl::FilesToSearch(const std::string& collection_id, FilesHolder& fil
+ " AND (file_type = " + std::to_string(SegmentSchema::RAW)
+ " OR file_type = " + std::to_string(SegmentSchema::TO_INDEX)
+ " OR file_type = " + std::to_string(SegmentSchema::INDEX) + ");";
LOG_ENGINE_DEBUG_ << "FilesToSearch: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -1519,6 +1537,7 @@ SqliteMetaImpl::FilesToSearchEx(const std::string& root_collection, const std::s
statement += (" AND (file_type = " + std::to_string(SegmentSchema::RAW));
statement += (" OR file_type = " + std::to_string(SegmentSchema::TO_INDEX));
statement += (" OR file_type = " + std::to_string(SegmentSchema::INDEX) + ");");
LOG_ENGINE_DEBUG_ << "FilesToSearchEx: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -1592,6 +1611,7 @@ SqliteMetaImpl::FilesToMerge(const std::string& collection_id, FilesHolder& file
+ " WHERE table_id = " + Quote(collection_id)
+ " AND file_type = " + std::to_string(SegmentSchema::RAW)
+ " ORDER BY row_count DESC;";
LOG_ENGINE_DEBUG_ << "FilesToMerge: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -1659,6 +1679,7 @@ SqliteMetaImpl::FilesToIndex(FilesHolder& files_holder) {
std::string statement = "SELECT id, table_id, segment_id, file_id, file_type, file_size, row_count, "
"date, engine_type, created_on, updated_time FROM " + Quote(META_TABLEFILES)
+ " WHERE file_type = " + std::to_string(SegmentSchema::TO_INDEX) + ";";
// LOG_ENGINE_DEBUG_ << "FilesToIndex: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -1746,6 +1767,7 @@ SqliteMetaImpl::FilesByType(const std::string& collection_id, const std::vector<
"date, engine_type, created_on, updated_time FROM " + Quote(META_TABLEFILES)
+ " WHERE table_id = " + Quote(collection_id)
+ " AND file_type in (" + types + ");";
LOG_ENGINE_DEBUG_ << "FilesByType: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -1897,6 +1919,7 @@ SqliteMetaImpl::FilesByTypeEx(const std::vector<meta::CollectionSchema>& collect
}
}
statement += (") AND file_type in (" + types + ");");
LOG_ENGINE_DEBUG_ << "FilesByTypeEx: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -2005,6 +2028,7 @@ SqliteMetaImpl::FilesByID(const std::vector<size_t>& ids, FilesHolder& files_hol
idStr = idStr.substr(0, idStr.size() - 4); // remove the last " OR "
statement += (" WHERE (" + idStr + ")");
LOG_ENGINE_DEBUG_ << "FilesByID: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -2160,6 +2184,7 @@ SqliteMetaImpl::CleanUpShadowFiles() {
+ std::to_string(SegmentSchema::NEW) + ","
+ std::to_string(SegmentSchema::NEW_MERGE) + ","
+ std::to_string(SegmentSchema::NEW_INDEX) + ");";
LOG_ENGINE_DEBUG_ << "CleanUpShadowFiles: " << statement;
auto status = SqlTransaction({statement});
fiu_do_on("SqliteMetaImpl.CleanUpShadowFiles.fail_commited", status = Status(DB_ERROR, ""));
@ -2271,6 +2296,7 @@ SqliteMetaImpl::CleanUpFilesWithTTL(uint64_t seconds /*, CleanUpFilter* filter*/
idsToDeleteStr = idsToDeleteStr.substr(0, idsToDeleteStr.size() - 4); // remove the last " OR "
statement = "DELETE FROM " + std::string(META_TABLEFILES) + " WHERE " + idsToDeleteStr + ";";
statements.emplace_back(statement);
LOG_ENGINE_DEBUG_ << "CleanUpFilesWithTTL: " << statement;
}
}
@ -2307,6 +2333,7 @@ SqliteMetaImpl::CleanUpFilesWithTTL(uint64_t seconds /*, CleanUpFilter* filter*/
++remove_collections;
statement = "DELETE FROM " + std::string(META_TABLES) + " WHERE id = " + resRow["id"] + ";";
LOG_ENGINE_DEBUG_ << "CleanUpFilesWithTTL: " << statement;
status = SqlTransaction({statement});
if (!status.ok()) {
return HandleException("Failed to clean up with ttl", status.message().c_str());
@ -2363,6 +2390,7 @@ SqliteMetaImpl::CleanUpFilesWithTTL(uint64_t seconds /*, CleanUpFilter* filter*/
for (auto& segment_id : segment_ids) {
std::string statement = "SELECT id FROM " + std::string(META_TABLEFILES)
+ " WHERE segment_id = " + Quote(segment_id.first) + ";";
LOG_ENGINE_DEBUG_ << "CleanUpFilesWithTTL: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -2400,6 +2428,7 @@ SqliteMetaImpl::Count(const std::string& collection_id, uint64_t& result) {
+ " AND (file_type = " + std::to_string(SegmentSchema::RAW)
+ " OR file_type = " + std::to_string(SegmentSchema::TO_INDEX)
+ " OR file_type = " + std::to_string(SegmentSchema::INDEX) + ");";
LOG_ENGINE_DEBUG_ << "Count: " << statement;
// to ensure UpdateCollectionFiles to be a atomic operation
std::lock_guard<std::mutex> meta_lock(operation_mutex_);
@ -2435,6 +2464,10 @@ SqliteMetaImpl::DropAll() {
statement + FIELDS_SCHEMA.name() + ";",
};
for (auto& sql : statements) {
LOG_ENGINE_DEBUG_ << "DropAll: " << sql;
}
auto status = SqlTransaction(statements);
if (!status.ok()) {
return HandleException("Failed to drop all", status.message().c_str());
@ -2461,6 +2494,7 @@ SqliteMetaImpl::DiscardFiles(int64_t to_discard_size) {
std::string statement = "SELECT id, file_size FROM " + std::string(META_TABLEFILES)
+ " WHERE file_type <> " + std::to_string(SegmentSchema::TO_DELETE)
+ " ORDER BY id ASC LIMIT 10;";
LOG_ENGINE_DEBUG_ << "DiscardFiles: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -2493,6 +2527,7 @@ SqliteMetaImpl::DiscardFiles(int64_t to_discard_size) {
+ " SET file_type = " + std::to_string(SegmentSchema::TO_DELETE)
+ " ,updated_time = " + std::to_string(utils::GetMicroSecTimeStamp())
+ " WHERE " + idsToDiscardStr + ";";
LOG_ENGINE_DEBUG_ << "DiscardFiles: " << statement;
status = SqlTransaction({statement});
fiu_do_on("SqliteMetaImpl.DiscardFiles.fail_commited", status = Status(DB_ERROR, ""));
@ -2529,6 +2564,7 @@ SqliteMetaImpl::SetGlobalLastLSN(uint64_t lsn) {
if (first_create) { // first time to get global lsn
statement = "INSERT INTO " + std::string(META_ENVIRONMENT) + " VALUES(" + std::to_string(lsn) + ");";
LOG_ENGINE_DEBUG_ << "SetGlobalLastLSN: " << statement;
status = SqlTransaction({statement});
if (!status.ok()) {
@ -2536,6 +2572,7 @@ SqliteMetaImpl::SetGlobalLastLSN(uint64_t lsn) {
}
} else if (lsn > last_lsn) {
statement = "UPDATE " + std::string(META_ENVIRONMENT) + " SET global_lsn = " + std::to_string(lsn) + ";";
LOG_ENGINE_DEBUG_ << "SetGlobalLastLSN: " << statement;
status = SqlTransaction({statement});
if (!status.ok()) {
@ -2543,7 +2580,6 @@ SqliteMetaImpl::SetGlobalLastLSN(uint64_t lsn) {
}
}
LOG_ENGINE_DEBUG_ << "Update global lsn = " << lsn;
} catch (std::exception& e) {
std::string msg = "Exception update global lsn = " + lsn;
return HandleException(msg, e.what());
@ -2558,6 +2594,7 @@ SqliteMetaImpl::GetGlobalLastLSN(uint64_t& lsn) {
server::MetricCollector metric;
std::string statement = "SELECT global_lsn FROM " + std::string(META_ENVIRONMENT) + ";";
LOG_ENGINE_DEBUG_ << "GetGlobalLastLSN: " << statement;
AttrsMapList res;
auto status = SqlQuery(statement, &res);
@ -2577,163 +2614,6 @@ SqliteMetaImpl::GetGlobalLastLSN(uint64_t& lsn) {
return Status::OK();
}
Status
SqliteMetaImpl::CreateHybridCollection(meta::CollectionSchema& collection_schema,
meta::hybrid::FieldsSchema& fields_schema) {
USING_SQLITE_WARNING
try {
server::MetricCollector metric;
if (collection_schema.collection_id_ == "") {
NextCollectionId(collection_schema.collection_id_);
} else {
fiu_do_on("SqliteMetaImpl.CreateCollection.throw_exception", throw std::exception());
std::string statement = "SELECT state FROM " + std::string(META_TABLES)
+ " WHERE table_id = " + Quote(collection_schema.collection_id_) + ";";
AttrsMapList res;
auto status = SqlQuery(statement, &res);
if (!status.ok()) {
return status;
}
if (res.size() == 1) {
int state = std::stoi(res[0]["state"]);
fiu_do_on("MySQLMetaImpl.CreateCollection.schema_TO_DELETE", state = CollectionSchema::TO_DELETE);
if (CollectionSchema::TO_DELETE == state) {
return Status(DB_ERROR,
"Collection already exists and it is in delete state, please wait a second");
} else {
return Status(DB_ALREADY_EXIST, "Collection already exists");
}
}
}
collection_schema.id_ = -1;
collection_schema.created_on_ = utils::GetMicroSecTimeStamp();
std::string id = "NULL"; // auto-increment
std::string& collection_id = collection_schema.collection_id_;
std::string state = std::to_string(collection_schema.state_);
std::string dimension = std::to_string(collection_schema.dimension_);
std::string created_on = std::to_string(collection_schema.created_on_);
std::string flag = std::to_string(collection_schema.flag_);
std::string index_file_size = std::to_string(collection_schema.index_file_size_);
std::string engine_type = std::to_string(collection_schema.engine_type_);
std::string& index_params = collection_schema.index_params_;
std::string metric_type = std::to_string(collection_schema.metric_type_);
std::string& owner_collection = collection_schema.owner_collection_;
std::string& partition_tag = collection_schema.partition_tag_;
std::string& version = collection_schema.version_;
std::string flush_lsn = std::to_string(collection_schema.flush_lsn_);
std::string statement = "INSERT INTO " + std::string(META_TABLES)
+ " VALUES(" + id + ", " + Quote(collection_id) + ", " + state + ", " + dimension + ", "
+ created_on + ", " + flag + ", " + index_file_size + ", " + engine_type + ", "
+ Quote(index_params) + ", " + metric_type + ", " + Quote(owner_collection) + ", "
+ Quote(partition_tag) + ", " + Quote(version) + ", " + flush_lsn + ");";
auto status = SqlTransaction({statement});
if (!status.ok()) {
return HandleException("Encounter exception when create collection", status.message().c_str());
}
collection_schema.id_ = sqlite3_last_insert_rowid(db_);
LOG_ENGINE_DEBUG_ << "Successfully create collection collection: " << collection_schema.collection_id_;
for (auto schema : fields_schema.fields_schema_) {
std::string id = "NULL";
std::string collection_id = schema.collection_id_;
std::string field_name = schema.field_name_;
std::string field_type = std::to_string(schema.field_type_);
std::string field_params = schema.field_params_;
statement = "INSERT INTO " + std::string(META_FIELDS) + " VALUES(" + Quote(collection_id) + ", "
+ Quote(field_name) + ", " + field_type + ", " + Quote(field_params) + ");";
status = SqlTransaction({statement});
if (!status.ok()) {
return HandleException("Failed to create field table", status.message().c_str());
}
}
LOG_ENGINE_DEBUG_ << "Successfully create hybrid collection: " << collection_schema.collection_id_;
return utils::CreateCollectionPath(options_, collection_schema.collection_id_);
} catch (std::exception& e) {
return HandleException("Encounter exception when create collection", e.what());
}
return Status::OK();
}
Status
SqliteMetaImpl::DescribeHybridCollection(milvus::engine::meta::CollectionSchema& collection_schema,
milvus::engine::meta::hybrid::FieldsSchema& fields_schema) {
try {
server::MetricCollector metric;
fiu_do_on("SqliteMetaImpl.DescriCollection.throw_exception", throw std::exception());
std::string statement = "SELECT id, state, dimension, created_on, flag, index_file_size, engine_type,"
" index_params, metric_type ,owner_table, partition_tag, version, flush_lsn"
" FROM " + std::string(META_TABLES)
+ " WHERE table_id = " + Quote(collection_schema.collection_id_)
+ " AND state <> " + std::to_string(CollectionSchema::TO_DELETE) + ";";
AttrsMapList res;
auto status = SqlQuery(statement, &res);
if (!status.ok()) {
return status;
}
if (res.size() == 1) {
auto& resRow = res[0];
collection_schema.id_ = std::stoul(resRow["id"]);
collection_schema.state_ = std::stoi(resRow["state"]);
collection_schema.dimension_ = std::stoi(resRow["dimension"]);
collection_schema.created_on_ = std::stol(resRow["created_on"]);
collection_schema.flag_ = std::stol(resRow["flag"]);
collection_schema.index_file_size_ = std::stol(resRow["index_file_size"]);
collection_schema.engine_type_ = std::stoi(resRow["engine_type"]);
collection_schema.index_params_ = resRow["index_params"];
collection_schema.metric_type_ = std::stoi(resRow["metric_type"]);
collection_schema.owner_collection_ = resRow["owner_table"];
collection_schema.partition_tag_ = resRow["partition_tag"];
collection_schema.version_ = resRow["version"];
collection_schema.flush_lsn_ = std::stoul(resRow["flush_lsn"]);
} else {
return Status(DB_NOT_FOUND, "Collection " + collection_schema.collection_id_ + " not found");
}
statement = "SELECT collection_id, field_name, field_type, field_params FROM " + std::string(META_FIELDS)
+ " WHERE collection_id = " + Quote(collection_schema.collection_id_) + ";";
AttrsMapList field_res;
status = SqlQuery(statement, &field_res);
if (!status.ok()) {
return status;
}
auto num_row = field_res.size();
if (num_row >= 1) {
fields_schema.fields_schema_.resize(num_row);
for (uint64_t i = 0; i < num_row; ++i) {
auto& resRow = field_res[i];
fields_schema.fields_schema_[i].collection_id_ = resRow["collection_id"];
fields_schema.fields_schema_[i].field_name_ = resRow["field_name"];
fields_schema.fields_schema_[i].field_type_ = std::stoi(resRow["field_type"]);
fields_schema.fields_schema_[i].field_params_ = resRow["field_params"];
}
} else {
return Status(DB_NOT_FOUND, "Fields of " + collection_schema.collection_id_ + " not found");
}
} catch (std::exception& e) {
return HandleException("Encounter exception when describe collection", e.what());
}
return Status::OK();
}
} // namespace meta
} // namespace engine
} // namespace milvus

View File

@ -160,12 +160,6 @@ class SqliteMetaImpl : public Meta {
Status
GetGlobalLastLSN(uint64_t& lsn) override;
Status
CreateHybridCollection(CollectionSchema& collection_schema, hybrid::FieldsSchema& fields_schema) override;
Status
DescribeHybridCollection(CollectionSchema& collection_schema, hybrid::FieldsSchema& fields_schema) override;
private:
Status
NextFileId(std::string& file_id);

View File

@ -164,8 +164,10 @@ WalManager::GetNextRecovery(MXLogRecord& record) {
auto it_col = collections_.find(record.collection_id);
if (it_col != collections_.end()) {
auto it_part = it_col->second.find(record.partition_tag);
if (it_part->second.flush_lsn < record.lsn) {
break;
if (it_part != it_col->second.end()) {
if (it_part->second.flush_lsn < record.lsn) {
break;
}
}
}
}
@ -215,8 +217,10 @@ WalManager::GetNextRecord(MXLogRecord& record) {
auto it_col = collections_.find(record.collection_id);
if (it_col != collections_.end()) {
auto it_part = it_col->second.find(record.partition_tag);
if (it_part->second.flush_lsn < record.lsn) {
break;
if (it_part != it_col->second.end()) {
if (it_part->second.flush_lsn < record.lsn) {
break;
}
}
}
}
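
The recovery paths previously dereferenced `it_part` without checking whether the partition lookup succeeded, so a WAL record that referenced a dropped or unknown partition would read through an end() iterator. The fix guards the access; a minimal sketch of the guarded pattern (illustrative types):

```cpp
// Illustrative sketch of the guarded lookup now used in both recovery paths.
auto it_col = collections_.find(record.collection_id);
if (it_col != collections_.end()) {
    auto it_part = it_col->second.find(record.partition_tag);
    // Only compare flush_lsn when the partition is actually known;
    // dereferencing an end() iterator is undefined behavior.
    if (it_part != it_col->second.end() && it_part->second.flush_lsn < record.lsn) {
        // the record still needs to be replayed
    }
}
```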

View File

@ -159,6 +159,15 @@ class WalManager {
void
RemoveOldFiles(uint64_t flushed_lsn);
/*
* Get the LSN of the last insert or delete operation
* @retval lsn
*/
uint64_t
GetLastAppliedLsn() {
return last_applied_lsn_;
}
private:
WalManager
operator=(WalManager&);

View File

@ -31,15 +31,25 @@ class Index : public milvus::cache::DataObj {
using IndexPtr = std::shared_ptr<Index>;
// todo: remove from knowhere
class ToIndexData : public milvus::cache::DataObj {
class Blacklist : public milvus::cache::DataObj {
public:
explicit ToIndexData(int64_t size) : size_(size) {
Blacklist() {
}
private:
int64_t size_ = 0;
int64_t
Size() override {
int64_t sz = sizeof(Blacklist);
if (bitset_) {
sz += bitset_->size();
}
return sz;
}
int64_t time_stamp_ = -1;
faiss::ConcurrentBitsetPtr bitset_ = nullptr;
};
using BlacklistPtr = std::shared_ptr<Blacklist>;
} // namespace knowhere
} // namespace milvus
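
`Blacklist` replaces the old `ToIndexData` placeholder and turns the deletion bitset into a first-class cache object: `Size()` reports the struct plus the bitset so cache accounting and eviction see the real footprint, and `time_stamp_` can be compared against the segment to detect a stale bitset. A hedged sketch of how one might be built and cached (the cache-key suffix comes from the engine-side changes, while `row_count` and `segment_timestamp` are illustrative placeholders):

```cpp
// Hedged sketch, not the exact engine code.
auto blacklist = std::make_shared<milvus::knowhere::Blacklist>();
blacklist->bitset_ = std::make_shared<faiss::ConcurrentBitset>(row_count);  // row_count: segment entity count
blacklist->time_stamp_ = segment_timestamp;                                 // assumed freshness marker

// Size() now reports sizeof(Blacklist) + bitset size, so cache eviction sees the real footprint.
milvus::cache::CpuCacheMgr::GetInstance()->InsertItem(segment_dir + milvus::cache::Blacklist_Suffix, blacklist);
```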

View File

@ -25,7 +25,7 @@ namespace milvus {
namespace knowhere {
static const int64_t MIN_K = 0;
static const int64_t MAX_K = 16384;
static const int64_t MAX_K = 1024 * 1024;
static const int64_t MIN_NBITS = 1;
static const int64_t MAX_NBITS = 16;
static const int64_t DEFAULT_NBITS = 8;
@ -175,7 +175,7 @@ IVFPQConfAdapter::CheckTrain(Config& oricfg, IndexMode& mode) {
return true;
}
// else try CPU Mode
mode == IndexMode::MODE_CPU;
mode = IndexMode::MODE_CPU;
}
#endif
return IsValidForCPU(dimension, m);

View File

@ -108,7 +108,7 @@ IndexAnnoy::BuildAll(const DatasetPtr& dataset_ptr, const Config& config) {
}
DatasetPtr
IndexAnnoy::Query(const DatasetPtr& dataset_ptr, const Config& config) {
IndexAnnoy::Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) {
if (!index_) {
KNOWHERE_THROW_MSG("index not initialize or trained");
}
@ -119,7 +119,6 @@ IndexAnnoy::Query(const DatasetPtr& dataset_ptr, const Config& config) {
auto all_num = rows * k;
auto p_id = (int64_t*)malloc(all_num * sizeof(int64_t));
auto p_dist = (float*)malloc(all_num * sizeof(float));
faiss::ConcurrentBitsetPtr blacklist = GetBlacklist();
#pragma omp parallel for
for (unsigned int i = 0; i < rows; ++i) {

View File

@ -48,7 +48,7 @@ class IndexAnnoy : public VecIndex {
}
DatasetPtr
Query(const DatasetPtr& dataset_ptr, const Config& config) override;
Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) override;
int64_t
Count() override;

View File

@ -37,7 +37,7 @@ BinaryIDMAP::Load(const BinarySet& index_binary) {
}
DatasetPtr
BinaryIDMAP::Query(const DatasetPtr& dataset_ptr, const Config& config) {
BinaryIDMAP::Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) {
if (!index_) {
KNOWHERE_THROW_MSG("index not initialize");
}
@ -50,7 +50,7 @@ BinaryIDMAP::Query(const DatasetPtr& dataset_ptr, const Config& config) {
auto p_id = (int64_t*)malloc(p_id_size);
auto p_dist = (float*)malloc(p_dist_size);
QueryImpl(rows, (uint8_t*)p_data, k, p_dist, p_id, config);
QueryImpl(rows, (uint8_t*)p_data, k, p_dist, p_id, config, blacklist);
MapOffsetToUid(p_id, static_cast<size_t>(elems));
auto ret_ds = std::make_shared<Dataset>();
@ -107,13 +107,13 @@ BinaryIDMAP::GetRawVectors() {
void
BinaryIDMAP::QueryImpl(int64_t n, const uint8_t* data, int64_t k, float* distances, int64_t* labels,
const Config& config) {
const Config& config, faiss::ConcurrentBitsetPtr blacklist) {
auto default_type = index_->metric_type;
if (config.contains(Metric::TYPE))
index_->metric_type = GetMetricType(config[Metric::TYPE].get<std::string>());
int32_t* i_distances = reinterpret_cast<int32_t*>(distances);
index_->search(n, (uint8_t*)data, k, i_distances, labels, GetBlacklist());
index_->search(n, (uint8_t*)data, k, i_distances, labels, blacklist);
// if hamming, it need transform int32 to float
if (index_->metric_type == faiss::METRIC_Hamming) {

View File

@ -44,7 +44,7 @@ class BinaryIDMAP : public VecIndex, public FaissBaseBinaryIndex {
AddWithoutIds(const DatasetPtr&, const Config&) override;
DatasetPtr
Query(const DatasetPtr&, const Config&) override;
Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) override;
int64_t
Count() override;
@ -62,7 +62,8 @@ class BinaryIDMAP : public VecIndex, public FaissBaseBinaryIndex {
protected:
virtual void
QueryImpl(int64_t n, const uint8_t* data, int64_t k, float* distances, int64_t* labels, const Config& config);
QueryImpl(int64_t n, const uint8_t* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist);
};
using BinaryIDMAPPtr = std::shared_ptr<BinaryIDMAP>;

View File

@ -41,7 +41,7 @@ BinaryIVF::Load(const BinarySet& index_binary) {
}
DatasetPtr
BinaryIVF::Query(const DatasetPtr& dataset_ptr, const Config& config) {
BinaryIVF::Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) {
if (!index_ || !index_->is_trained) {
KNOWHERE_THROW_MSG("index not initialize or trained");
}
@ -57,7 +57,7 @@ BinaryIVF::Query(const DatasetPtr& dataset_ptr, const Config& config) {
auto p_id = (int64_t*)malloc(p_id_size);
auto p_dist = (float*)malloc(p_dist_size);
QueryImpl(rows, (uint8_t*)p_data, k, p_dist, p_id, config);
QueryImpl(rows, (uint8_t*)p_data, k, p_dist, p_id, config, blacklist);
MapOffsetToUid(p_id, static_cast<size_t>(elems));
auto ret_ds = std::make_shared<Dataset>();
@ -163,15 +163,15 @@ BinaryIVF::GenParams(const Config& config) {
}
void
BinaryIVF::QueryImpl(int64_t n, const uint8_t* data, int64_t k, float* distances, int64_t* labels,
const Config& config) {
BinaryIVF::QueryImpl(int64_t n, const uint8_t* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist) {
auto params = GenParams(config);
auto ivf_index = dynamic_cast<faiss::IndexBinaryIVF*>(index_.get());
ivf_index->nprobe = params->nprobe;
stdclock::time_point before = stdclock::now();
int32_t* i_distances = reinterpret_cast<int32_t*>(distances);
index_->search(n, (uint8_t*)data, k, i_distances, labels, GetBlacklist());
index_->search(n, (uint8_t*)data, k, i_distances, labels, blacklist);
stdclock::time_point after = stdclock::now();
double search_cost = (std::chrono::duration<double, std::micro>(after - before)).count();

View File

@ -47,7 +47,7 @@ class BinaryIVF : public VecIndex, public FaissBaseBinaryIndex {
AddWithoutIds(const DatasetPtr&, const Config&) override;
DatasetPtr
Query(const DatasetPtr& dataset_ptr, const Config& config) override;
Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) override;
int64_t
Count() override;
@ -63,7 +63,8 @@ class BinaryIVF : public VecIndex, public FaissBaseBinaryIndex {
GenParams(const Config& config);
virtual void
QueryImpl(int64_t n, const uint8_t* data, int64_t k, float* distances, int64_t* labels, const Config& config);
QueryImpl(int64_t n, const uint8_t* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist);
};
using BinaryIVFIndexPtr = std::shared_ptr<BinaryIVF>;

View File

@ -110,7 +110,7 @@ IndexHNSW::AddWithoutIds(const DatasetPtr& dataset_ptr, const Config& config) {
}
DatasetPtr
IndexHNSW::Query(const DatasetPtr& dataset_ptr, const Config& config) {
IndexHNSW::Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) {
if (!index_) {
KNOWHERE_THROW_MSG("index not initialize or trained");
}
@ -124,7 +124,6 @@ IndexHNSW::Query(const DatasetPtr& dataset_ptr, const Config& config) {
index_->setEf(config[IndexParams::ef]);
faiss::ConcurrentBitsetPtr blacklist = GetBlacklist();
bool transform = (index_->metric_type_ == 1); // InnerProduct: 1
#pragma omp parallel for

View File

@ -40,7 +40,7 @@ class IndexHNSW : public VecIndex {
AddWithoutIds(const DatasetPtr&, const Config&) override;
DatasetPtr
Query(const DatasetPtr& dataset_ptr, const Config& config) override;
Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) override;
int64_t
Count() override;

View File

@ -68,7 +68,7 @@ IDMAP::AddWithoutIds(const DatasetPtr& dataset_ptr, const Config& config) {
}
DatasetPtr
IDMAP::Query(const DatasetPtr& dataset_ptr, const Config& config) {
IDMAP::Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) {
if (!index_) {
KNOWHERE_THROW_MSG("index not initialize");
}
@ -81,7 +81,7 @@ IDMAP::Query(const DatasetPtr& dataset_ptr, const Config& config) {
auto p_id = (int64_t*)malloc(p_id_size);
auto p_dist = (float*)malloc(p_dist_size);
QueryImpl(rows, (float*)p_data, k, p_dist, p_id, config);
QueryImpl(rows, (float*)p_data, k, p_dist, p_id, config, blacklist);
MapOffsetToUid(p_id, static_cast<size_t>(elems));
auto ret_ds = std::make_shared<Dataset>();
@ -135,11 +135,12 @@ IDMAP::GetRawVectors() {
}
void
IDMAP::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config) {
IDMAP::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist) {
auto default_type = index_->metric_type;
if (config.contains(Metric::TYPE))
index_->metric_type = GetMetricType(config[Metric::TYPE].get<std::string>());
index_->search(n, (float*)data, k, distances, labels, GetBlacklist());
index_->search(n, (float*)data, k, distances, labels, blacklist);
index_->metric_type = default_type;
}

View File

@ -43,7 +43,7 @@ class IDMAP : public VecIndex, public FaissBaseIndex {
AddWithoutIds(const DatasetPtr&, const Config&) override;
DatasetPtr
Query(const DatasetPtr&, const Config&) override;
Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) override;
int64_t
Count() override;
@ -64,7 +64,8 @@ class IDMAP : public VecIndex, public FaissBaseIndex {
protected:
virtual void
QueryImpl(int64_t, const float*, int64_t, float*, int64_t*, const Config&);
QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist);
};
using IDMAPPtr = std::shared_ptr<IDMAP>;

View File

@ -85,7 +85,7 @@ IVF::AddWithoutIds(const DatasetPtr& dataset_ptr, const Config& config) {
}
DatasetPtr
IVF::Query(const DatasetPtr& dataset_ptr, const Config& config) {
IVF::Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) {
if (!index_ || !index_->is_trained) {
KNOWHERE_THROW_MSG("index not initialize or trained");
}
@ -103,7 +103,7 @@ IVF::Query(const DatasetPtr& dataset_ptr, const Config& config) {
auto p_id = (int64_t*)malloc(p_id_size);
auto p_dist = (float*)malloc(p_dist_size);
QueryImpl(rows, (float*)p_data, k, p_dist, p_id, config);
QueryImpl(rows, (float*)p_data, k, p_dist, p_id, config, blacklist);
MapOffsetToUid(p_id, static_cast<size_t>(elems));
auto ret_ds = std::make_shared<Dataset>();
@ -272,7 +272,7 @@ IVF::GenGraph(const float* data, const int64_t k, GraphType& graph, const Config
res.resize(K * b_size);
auto xq = data + batch_size * dim * i;
QueryImpl(b_size, (float*)xq, K, res_dis.data(), res.data(), config);
QueryImpl(b_size, (float*)xq, K, res_dis.data(), res.data(), config, nullptr);
for (int j = 0; j < b_size; ++j) {
auto& node = graph[batch_size * i + j];
@ -294,7 +294,8 @@ IVF::GenParams(const Config& config) {
}
void
IVF::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config) {
IVF::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist) {
auto params = GenParams(config);
auto ivf_index = dynamic_cast<faiss::IndexIVF*>(index_.get());
ivf_index->nprobe = std::min(params->nprobe, ivf_index->invlists->nlist);
@ -304,7 +305,7 @@ IVF::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_
} else {
ivf_index->parallel_mode = 0;
}
ivf_index->search(n, (float*)data, k, distances, labels, GetBlacklist());
ivf_index->search(n, (float*)data, k, distances, labels, blacklist);
stdclock::time_point after = stdclock::now();
double search_cost = (std::chrono::duration<double, std::micro>(after - before)).count();
LOG_KNOWHERE_DEBUG_ << "IVF search cost: " << search_cost

View File

@ -47,12 +47,7 @@ class IVF : public VecIndex, public FaissBaseIndex {
AddWithoutIds(const DatasetPtr&, const Config&) override;
DatasetPtr
Query(const DatasetPtr&, const Config&) override;
#if 0
DatasetPtr
QueryById(const DatasetPtr& dataset, const Config& config) override;
#endif
Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) override;
int64_t
Count() override;
@ -82,7 +77,8 @@ class IVF : public VecIndex, public FaissBaseIndex {
GenParams(const Config&);
virtual void
QueryImpl(int64_t, const float*, int64_t, float*, int64_t*, const Config&);
QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist);
void
SealImpl() override;

View File

@ -71,7 +71,7 @@ NSG::Load(const BinarySet& index_binary) {
}
DatasetPtr
NSG::Query(const DatasetPtr& dataset_ptr, const Config& config) {
NSG::Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) {
if (!index_ || !index_->is_trained) {
KNOWHERE_THROW_MSG("index not initialize or trained");
}
@ -85,8 +85,6 @@ NSG::Query(const DatasetPtr& dataset_ptr, const Config& config) {
auto p_id = (int64_t*)malloc(p_id_size);
auto p_dist = (float*)malloc(p_dist_size);
faiss::ConcurrentBitsetPtr blacklist = GetBlacklist();
impl::SearchParams s_params;
s_params.search_length = config[IndexParams::search_length];
s_params.k = config[meta::TOPK];

View File

@ -54,7 +54,7 @@ class NSG : public VecIndex {
}
DatasetPtr
Query(const DatasetPtr&, const Config&) override;
Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) override;
int64_t
Count() override;

View File

@ -176,7 +176,7 @@ CPUSPTAGRNG::SetParameters(const Config& config) {
}
DatasetPtr
CPUSPTAGRNG::Query(const DatasetPtr& dataset_ptr, const Config& config) {
CPUSPTAGRNG::Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) {
SetParameters(config);
float* p_data = (float*)dataset_ptr->Get<const void*>(meta::TENSOR);

View File

@ -47,7 +47,7 @@ class CPUSPTAGRNG : public VecIndex {
}
DatasetPtr
Query(const DatasetPtr& dataset_ptr, const Config& config) override;
Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) override;
int64_t
Count() override;

View File

@ -41,7 +41,7 @@ class VecIndex : public Index {
AddWithoutIds(const DatasetPtr& dataset, const Config& config) = 0;
virtual DatasetPtr
Query(const DatasetPtr& dataset, const Config& config) = 0;
Query(const DatasetPtr& dataset_ptr, const Config& config, faiss::ConcurrentBitsetPtr blacklist) = 0;
virtual int64_t
Dim() = 0;
@ -59,18 +59,6 @@ class VecIndex : public Index {
return index_mode_;
}
faiss::ConcurrentBitsetPtr
GetBlacklist() {
std::unique_lock<std::mutex> lck(bitset_mutex_);
return bitset_;
}
void
SetBlacklist(faiss::ConcurrentBitsetPtr bitset_ptr) {
std::unique_lock<std::mutex> lck(bitset_mutex_);
bitset_ = std::move(bitset_ptr);
}
std::shared_ptr<std::vector<IDType>>
GetUids() const {
return uids_;
@ -92,12 +80,6 @@ class VecIndex : public Index {
}
}
size_t
BlacklistSize() {
std::unique_lock<std::mutex> lck(bitset_mutex_);
return bitset_ ? bitset_->size() : 0;
}
size_t
UidsSize() {
return (uids_ == nullptr) ? 0 : (uids_->size() * sizeof(IDType));
@ -122,7 +104,7 @@ class VecIndex : public Index {
int64_t
Size() override {
return BlacklistSize() + UidsSize() + IndexSize();
return UidsSize() + IndexSize();
}
protected:
@ -130,11 +112,6 @@ class VecIndex : public Index {
IndexMode index_mode_ = IndexMode::MODE_CPU;
std::shared_ptr<std::vector<IDType>> uids_ = nullptr;
int64_t index_size_ = -1;
private:
// multi thread may access bitset_
std::mutex bitset_mutex_;
faiss::ConcurrentBitsetPtr bitset_ = nullptr;
};
using VecIndexPtr = std::shared_ptr<VecIndex>;
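The hunk above is the heart of this change: `VecIndex` no longer stores a deletion bitset (`SetBlacklist`/`GetBlacklist`/`BlacklistSize` are removed), so callers now pass the bitset with each query. A minimal caller-side sketch (not part of the diff), assuming the usual knowhere headers and that `faiss::ConcurrentBitset` is constructed with the segment's row count, as the updated unit tests below do:

```cpp
// Sketch only; the include paths and the ConcurrentBitset constructor are assumptions.
#include <memory>
#include <vector>
#include <faiss/utils/ConcurrentBitset.h>          // assumed path
#include "knowhere/index/vector_index/VecIndex.h"  // assumed path

milvus::knowhere::DatasetPtr
SearchWithDeletions(const milvus::knowhere::VecIndexPtr& index,
                    const milvus::knowhere::DatasetPtr& query_dataset,
                    const milvus::knowhere::Config& conf,
                    int64_t row_count,
                    const std::vector<int64_t>& deleted_offsets) {
    // Previously: index->SetBlacklist(bitset); index->Query(query_dataset, conf);
    // Now the bitset travels with the call; pass nullptr when nothing is deleted.
    faiss::ConcurrentBitsetPtr blacklist = nullptr;
    if (!deleted_offsets.empty()) {
        blacklist = std::make_shared<faiss::ConcurrentBitset>(row_count);
        for (auto offset : deleted_offsets) {
            blacklist->set(offset);  // a set bit marks the vector as deleted
        }
    }
    return index->Query(query_dataset, conf, blacklist);
}
```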

View File

@ -97,13 +97,14 @@ GPUIDMAP::GetRawVectors() {
}
void
GPUIDMAP::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config) {
GPUIDMAP::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist) {
ResScope rs(res_, gpu_id_);
auto default_type = index_->metric_type;
if (config.contains(Metric::TYPE))
index_->metric_type = GetMetricType(config[Metric::TYPE].get<std::string>());
index_->search(n, (float*)data, k, distances, labels, GetBlacklist());
index_->search(n, (float*)data, k, distances, labels, blacklist);
index_->metric_type = default_type;
}
@ -128,7 +129,7 @@ GPUIDMAP::GenGraph(const float* data, const int64_t k, GraphType& graph, const C
res.resize(K * b_size);
auto xq = data + batch_size * dim * i;
QueryImpl(b_size, (float*)xq, K, res_dis.data(), res.data(), config);
QueryImpl(b_size, (float*)xq, K, res_dis.data(), res.data(), config, nullptr);
for (int j = 0; j < b_size; ++j) {
auto& node = graph[batch_size * i + j];

View File

@ -50,7 +50,8 @@ class GPUIDMAP : public IDMAP, public GPUIndex {
LoadImpl(const BinarySet&, const IndexType&) override;
void
QueryImpl(int64_t, const float*, int64_t, float*, int64_t*, const Config&) override;
QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist);
};
using GPUIDMAPPtr = std::shared_ptr<GPUIDMAP>;

View File

@ -38,14 +38,12 @@ GPUIVF::Train(const DatasetPtr& dataset_ptr, const Config& config) {
if (gpu_res != nullptr) {
ResScope rs(gpu_res, gpu_id_, true);
faiss::gpu::GpuIndexIVFFlatConfig idx_config;
idx_config.device = gpu_id_;
idx_config.device = static_cast<int32_t>(gpu_id_);
int32_t nlist = config[IndexParams::nlist];
faiss::MetricType metric_type = GetMetricType(config[Metric::TYPE].get<std::string>());
auto device_index =
new faiss::gpu::GpuIndexIVFFlat(gpu_res->faiss_res.get(), dim, nlist, metric_type, idx_config);
device_index->train(rows, (float*)p_data);
index_.reset(device_index);
index_ = std::make_shared<faiss::gpu::GpuIndexIVFFlat>(gpu_res->faiss_res.get(), dim, nlist, metric_type,
idx_config);
index_->train(rows, (float*)p_data);
res_ = gpu_res;
} else {
KNOWHERE_THROW_MSG("Build IVF can't get gpu resource");
@ -133,7 +131,8 @@ GPUIVF::LoadImpl(const BinarySet& binary_set, const IndexType& type) {
}
void
GPUIVF::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config) {
GPUIVF::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist) {
auto device_index = std::dynamic_pointer_cast<faiss::gpu::GpuIndexIVF>(index_);
fiu_do_on("GPUIVF.search_impl.invald_index", device_index = nullptr);
if (device_index) {
@ -145,8 +144,7 @@ GPUIVF::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int
int64_t dim = device_index->d;
for (int64_t i = 0; i < n; i += block_size) {
int64_t search_size = (n - i > block_size) ? block_size : (n - i);
device_index->search(search_size, (float*)data + i * dim, k, distances + i * k, labels + i * k,
GetBlacklist());
device_index->search(search_size, (float*)data + i * dim, k, distances + i * k, labels + i * k, blacklist);
}
} else {
KNOWHERE_THROW_MSG("Not a GpuIndexIVF type.");

View File

@ -51,7 +51,8 @@ class GPUIVF : public IVF, public GPUIndex {
LoadImpl(const BinarySet&, const IndexType&) override;
void
QueryImpl(int64_t, const float*, int64_t, float*, int64_t*, const Config&) override;
QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist);
};
using GPUIVFPtr = std::shared_ptr<GPUIVF>;

View File

@ -32,11 +32,15 @@ GPUIVFPQ::Train(const DatasetPtr& dataset_ptr, const Config& config) {
auto gpu_res = FaissGpuResourceMgr::GetInstance().GetRes(gpu_id_);
if (gpu_res != nullptr) {
ResScope rs(gpu_res, gpu_id_, true);
auto device_index = new faiss::gpu::GpuIndexIVFPQ(
gpu_res->faiss_res.get(), dim, config[IndexParams::nlist].get<int64_t>(), config[IndexParams::m],
config[IndexParams::nbits], GetMetricType(config[Metric::TYPE].get<std::string>()));
device_index->train(rows, (float*)p_data);
index_.reset(device_index);
faiss::gpu::GpuIndexIVFPQConfig idx_config;
idx_config.device = static_cast<int32_t>(gpu_id_);
int32_t nlist = config[IndexParams::nlist];
int32_t m = config[IndexParams::m];
int32_t nbits = config[IndexParams::nbits];
faiss::MetricType metric_type = GetMetricType(config[Metric::TYPE].get<std::string>());
index_ = std::make_shared<faiss::gpu::GpuIndexIVFPQ>(gpu_res->faiss_res.get(), dim, nlist, m, nbits,
metric_type, idx_config);
index_->train(rows, (float*)p_data);
res_ = gpu_res;
} else {
KNOWHERE_THROW_MSG("Build IVFPQ can't get gpu resource");

View File

@ -32,11 +32,13 @@ GPUIVFSQ::Train(const DatasetPtr& dataset_ptr, const Config& config) {
auto gpu_res = FaissGpuResourceMgr::GetInstance().GetRes(gpu_id_);
if (gpu_res != nullptr) {
ResScope rs(gpu_res, gpu_id_, true);
auto device_index = new faiss::gpu::GpuIndexIVFScalarQuantizer(
gpu_res->faiss_res.get(), dim, config[IndexParams::nlist].get<int64_t>(), faiss::QuantizerType::QT_8bit,
GetMetricType(config[Metric::TYPE].get<std::string>()));
device_index->train(rows, (float*)p_data);
index_.reset(device_index);
faiss::gpu::GpuIndexIVFScalarQuantizerConfig idx_config;
idx_config.device = static_cast<int32_t>(gpu_id_);
int32_t nlist = config[IndexParams::nlist];
faiss::MetricType metric_type = GetMetricType(config[Metric::TYPE].get<std::string>());
index_ = std::make_shared<faiss::gpu::GpuIndexIVFScalarQuantizer>(
gpu_res->faiss_res.get(), dim, nlist, faiss::QuantizerType::QT_8bit, metric_type, true, idx_config);
index_->train(rows, (float*)p_data);
res_ = gpu_res;
} else {
KNOWHERE_THROW_MSG("Build IVFSQ can't get gpu resource");

View File

@ -12,8 +12,7 @@
#include <faiss/IndexSQHybrid.h>
#include <faiss/gpu/GpuCloner.h>
#include <faiss/gpu/GpuIndexIVF.h>
#include <faiss/index_factory.h>
#include <faiss/gpu/GpuIndexIVFSQHybrid.h>
#include <fiu-local.h>
#include <string>
#include <utility>
@ -34,28 +33,22 @@ IVFSQHybrid::Train(const DatasetPtr& dataset_ptr, const Config& config) {
GETTENSOR(dataset_ptr)
gpu_id_ = config[knowhere::meta::DEVICEID];
std::stringstream index_type;
index_type << "IVF" << config[IndexParams::nlist] << ","
<< "SQ8Hybrid";
auto build_index =
faiss::index_factory(dim, index_type.str().c_str(), GetMetricType(config[Metric::TYPE].get<std::string>()));
auto gpu_res = FaissGpuResourceMgr::GetInstance().GetRes(gpu_id_);
if (gpu_res != nullptr) {
ResScope rs(gpu_res, gpu_id_, true);
auto device_index = faiss::gpu::index_cpu_to_gpu(gpu_res->faiss_res.get(), gpu_id_, build_index);
device_index->train(rows, (float*)p_data);
index_.reset(device_index);
faiss::gpu::GpuIndexIVFSQHybridConfig idx_config;
idx_config.device = static_cast<int32_t>(gpu_id_);
int32_t nlist = config[IndexParams::nlist];
faiss::MetricType metric_type = GetMetricType(config[Metric::TYPE].get<std::string>());
index_ = std::make_shared<faiss::gpu::GpuIndexIVFSQHybrid>(
gpu_res->faiss_res.get(), dim, nlist, faiss::QuantizerType::QT_8bit, metric_type, true, idx_config);
index_->train(rows, reinterpret_cast<const float*>(p_data));
res_ = gpu_res;
gpu_mode_ = 2;
index_mode_ = IndexMode::MODE_GPU;
} else {
delete build_index;
KNOWHERE_THROW_MSG("Build IVFSQHybrid can't get gpu resource");
}
delete build_index;
}
VecIndexPtr
@ -243,21 +236,21 @@ IVFSQHybrid::LoadImpl(const BinarySet& binary_set, const IndexType& type) {
}
void
IVFSQHybrid::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels,
const Config& config) {
IVFSQHybrid::QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist) {
if (gpu_mode_ == 2) {
GPUIVF::QueryImpl(n, data, k, distances, labels, config);
GPUIVF::QueryImpl(n, data, k, distances, labels, config, blacklist);
// index_->search(n, (float*)data, k, distances, labels);
} else if (gpu_mode_ == 1) { // hybrid
auto gpu_id = quantizer_->gpu_id;
if (auto res = FaissGpuResourceMgr::GetInstance().GetRes(gpu_id)) {
ResScope rs(res, gpu_id, true);
IVF::QueryImpl(n, data, k, distances, labels, config);
IVF::QueryImpl(n, data, k, distances, labels, config, blacklist);
} else {
KNOWHERE_THROW_MSG("Hybrid Search Error, can't get gpu: " + std::to_string(gpu_id) + "resource");
}
} else if (gpu_mode_ == 0) {
IVF::QueryImpl(n, data, k, distances, labels, config);
IVF::QueryImpl(n, data, k, distances, labels, config, blacklist);
}
}

View File

@ -90,7 +90,8 @@ class IVFSQHybrid : public GPUIVFSQ {
LoadImpl(const BinarySet&, const IndexType&) override;
void
QueryImpl(int64_t, const float*, int64_t, float*, int64_t*, const Config&) override;
QueryImpl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& config,
faiss::ConcurrentBitsetPtr blacklist);
protected:
int64_t gpu_mode_ = 0; // 0: CPU, 1: Hybrid, 2: GPU

View File

@ -27,8 +27,6 @@ namespace cloner {
void
CopyIndexData(const VecIndexPtr& dst_index, const VecIndexPtr& src_index) {
dst_index->SetUids(src_index->GetUids());
dst_index->SetBlacklist(src_index->GetBlacklist());
dst_index->SetIndexSize(src_index->IndexSize());
}

View File

@ -63,7 +63,7 @@ MemoryIOReader::operator()(void* ptr, size_t size, size_t nitems) {
void
enable_faiss_logging() {
faiss::LOG_DEBUG_ = &log_debug_;
faiss::LOG_TRACE_ = &log_trace_;
}
} // namespace knowhere

View File

@ -317,7 +317,7 @@ void IndexIVF::search (idx_t n, const float *x, idx_t k,
indexIVF_stats.search_time += getmillisecs() - t0;
// string
if (LOG_DEBUG_) {
if (LOG_TRACE_) {
auto ids = idx.get();
for (size_t i = 0; i < n; i++) {
std::stringstream ss;
@ -328,7 +328,7 @@ void IndexIVF::search (idx_t n, const float *x, idx_t k,
}
ss << ids[i * nprobe + j];
}
(*LOG_DEBUG_)(ss.str());
(*LOG_TRACE_)(ss.str());
}
}
}

View File

@ -68,29 +68,39 @@ pass1SelectLists(void** listIndices,
// BlockSelect add cannot be used in a warp divergent circumstance; we
// handle the remainder warp below
for (; i < limit; i += blockDim.x) {
index = getListIndex(queryId,
start + i,
listIndices,
prefixSumOffsets,
topQueryToCentroid,
opt);
if (bitsetEmpty || (!(bitset[index >> 3] & (0x1 << (index & 0x7))))) {
do {
if (!bitsetEmpty) {
index = getListIndex(queryId,
start + i,
listIndices,
prefixSumOffsets,
topQueryToCentroid,
opt);
if (bitset[index >> 3] & (0x1 << (index & 0x7))) {
break;
}
}
heap.addThreadQ(distanceStart[i], start + i);
}
} while(0);
heap.checkThreadQ();
}
// Handle warp divergence separately
if (i < num) {
index = getListIndex(queryId,
start + i,
listIndices,
prefixSumOffsets,
topQueryToCentroid,
opt);
if (bitsetEmpty || (!(bitset[index >> 3] & (0x1 << (index & 0x7))))) {
do {
if (!bitsetEmpty) {
index = getListIndex(queryId,
start + i,
listIndices,
prefixSumOffsets,
topQueryToCentroid,
opt);
if (bitset[index >> 3] & (0x1 << (index & 0x7))) {
break;
}
}
heap.addThreadQ(distanceStart[i], start + i);
}
} while(0);
}
// Merge all final results
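For readers skimming the CUDA kernel above: the blacklist is a plain byte array with one bit per candidate id, and a set bit means "skip this candidate". A host-side restatement of the same test, purely illustrative and not part of the PR:

```cpp
#include <cstddef>
#include <cstdint>

// Byte (index >> 3) holds bit (index & 0x7) for entry `index`;
// a set bit marks the entry as deleted, so the kernel does not add it to the heap.
inline bool IsBlacklisted(const uint8_t* bitset, std::size_t index) {
    return bitset != nullptr && (bitset[index >> 3] & (0x1 << (index & 0x7)));
}
```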

View File

@ -53,7 +53,7 @@ TEST_P(AnnoyTest, annoy_basic) {
// null faiss index
{
ASSERT_ANY_THROW(index_->Train(base_dataset, conf));
ASSERT_ANY_THROW(index_->Query(query_dataset, conf));
ASSERT_ANY_THROW(index_->Query(query_dataset, conf, nullptr));
ASSERT_ANY_THROW(index_->Serialize(conf));
ASSERT_ANY_THROW(index_->AddWithoutIds(base_dataset, conf));
ASSERT_ANY_THROW(index_->Count());
@ -64,7 +64,7 @@ TEST_P(AnnoyTest, annoy_basic) {
ASSERT_EQ(index_->Count(), nb);
ASSERT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
ReleaseQueryResult(result);
@ -73,7 +73,7 @@ TEST_P(AnnoyTest, annoy_basic) {
base_dataset->Set(milvus::knowhere::meta::ROWS, rows);
index_ = std::make_shared<milvus::knowhere::IndexAnnoy>();
index_->BuildAll(base_dataset, conf);
auto result2 = index_->Query(query_dataset, conf);
auto result2 = index_->Query(query_dataset, conf, nullptr);
auto res_ids = result2->Get<int64_t*>(milvus::knowhere::meta::IDS);
for (int64_t i = 0; i < nq; i++) {
for (int64_t j = rows; j < k; j++) {
@ -95,12 +95,11 @@ TEST_P(AnnoyTest, annoy_delete) {
bitset->set(i);
}
auto result1 = index_->Query(query_dataset, conf);
auto result1 = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result1, nq, k);
ReleaseQueryResult(result1);
index_->SetBlacklist(bitset);
auto result2 = index_->Query(query_dataset, conf);
auto result2 = index_->Query(query_dataset, conf, bitset);
AssertAnns(result2, nq, k, CheckMode::CHECK_NOT_EQUAL);
ReleaseQueryResult(result2);
@ -193,7 +192,7 @@ TEST_P(AnnoyTest, annoy_serialize) {
index_->Load(binaryset);
ASSERT_EQ(index_->Count(), nb);
ASSERT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, conf[milvus::knowhere::meta::TOPK]);
ReleaseQueryResult(result);
}

View File

@ -52,7 +52,7 @@ TEST_P(BinaryIDMAPTest, binaryidmap_basic) {
// null faiss index
{
ASSERT_ANY_THROW(index_->Serialize());
ASSERT_ANY_THROW(index_->Query(query_dataset, conf));
ASSERT_ANY_THROW(index_->Query(query_dataset, conf, nullptr));
ASSERT_ANY_THROW(index_->AddWithoutIds(nullptr, conf));
}
@ -61,7 +61,7 @@ TEST_P(BinaryIDMAPTest, binaryidmap_basic) {
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
ASSERT_TRUE(index_->GetRawVectors() != nullptr);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -69,7 +69,7 @@ TEST_P(BinaryIDMAPTest, binaryidmap_basic) {
auto binaryset = index_->Serialize();
auto new_index = std::make_shared<milvus::knowhere::BinaryIDMAP>();
new_index->Load(binaryset);
auto result2 = new_index->Query(query_dataset, conf);
auto result2 = new_index->Query(query_dataset, conf, nullptr);
AssertAnns(result2, nq, k);
// PrintResult(re_result, nq, k);
ReleaseQueryResult(result2);
@ -78,9 +78,8 @@ TEST_P(BinaryIDMAPTest, binaryidmap_basic) {
for (int64_t i = 0; i < nq; ++i) {
concurrent_bitset_ptr->set(i);
}
index_->SetBlacklist(concurrent_bitset_ptr);
auto result_bs_1 = index_->Query(query_dataset, conf);
auto result_bs_1 = index_->Query(query_dataset, conf, concurrent_bitset_ptr);
AssertAnns(result_bs_1, nq, k, CheckMode::CHECK_NOT_EQUAL);
ReleaseQueryResult(result_bs_1);
@ -108,7 +107,7 @@ TEST_P(BinaryIDMAPTest, binaryidmap_serialize) {
// serialize index
index_->Train(base_dataset, conf);
index_->AddWithoutIds(base_dataset, milvus::knowhere::Config());
auto re_result = index_->Query(query_dataset, conf);
auto re_result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(re_result, nq, k);
// PrintResult(re_result, nq, k);
ReleaseQueryResult(re_result);
@ -128,7 +127,7 @@ TEST_P(BinaryIDMAPTest, binaryidmap_serialize) {
index_->Load(binaryset);
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);

View File

@ -63,7 +63,7 @@ TEST_P(BinaryIVFTest, binaryivf_basic) {
// null faiss index
{
ASSERT_ANY_THROW(index_->Serialize());
ASSERT_ANY_THROW(index_->Query(query_dataset, conf));
ASSERT_ANY_THROW(index_->Query(query_dataset, conf, nullptr));
ASSERT_ANY_THROW(index_->AddWithoutIds(nullptr, conf));
}
@ -71,7 +71,7 @@ TEST_P(BinaryIVFTest, binaryivf_basic) {
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, conf[milvus::knowhere::meta::TOPK]);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -80,9 +80,8 @@ TEST_P(BinaryIVFTest, binaryivf_basic) {
for (int64_t i = 0; i < nq; ++i) {
concurrent_bitset_ptr->set(i);
}
index_->SetBlacklist(concurrent_bitset_ptr);
auto result2 = index_->Query(query_dataset, conf);
auto result2 = index_->Query(query_dataset, conf, concurrent_bitset_ptr);
AssertAnns(result2, nq, k, CheckMode::CHECK_NOT_EQUAL);
ReleaseQueryResult(result2);
@ -146,7 +145,7 @@ TEST_P(BinaryIVFTest, binaryivf_serialize) {
index_->Load(binaryset);
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, conf[milvus::knowhere::meta::TOPK]);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);

View File

@ -67,7 +67,7 @@ TEST_F(SingleIndexTest, IVFSQHybrid) {
{
for (int i = 0; i < 3; ++i) {
auto gpu_idx = cpu_idx->CopyCpuToGpu(DEVICEID, conf);
auto result = gpu_idx->Query(query_dataset, conf);
auto result = gpu_idx->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, conf[milvus::knowhere::meta::TOPK]);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -84,7 +84,7 @@ TEST_F(SingleIndexTest, IVFSQHybrid) {
auto pair = cpu_idx->CopyCpuToGpuWithQuantizer(DEVICEID, conf);
auto gpu_idx = pair.first;
auto result = gpu_idx->Query(query_dataset, conf);
auto result = gpu_idx->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, conf[milvus::knowhere::meta::TOPK]);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -95,7 +95,7 @@ TEST_F(SingleIndexTest, IVFSQHybrid) {
hybrid_idx->Load(binaryset);
auto quantization = hybrid_idx->LoadQuantizer(quantizer_conf);
auto new_idx = hybrid_idx->LoadData(quantization, quantizer_conf);
auto result = new_idx->Query(query_dataset, conf);
auto result = new_idx->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, conf[milvus::knowhere::meta::TOPK]);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -115,7 +115,7 @@ TEST_F(SingleIndexTest, IVFSQHybrid) {
hybrid_idx->Load(binaryset);
hybrid_idx->SetQuantizer(quantization);
auto result = hybrid_idx->Query(query_dataset, conf);
auto result = hybrid_idx->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, conf[milvus::knowhere::meta::TOPK]);
// PrintResult(result, nq, k);
hybrid_idx->UnsetQuantizer();

View File

@ -74,7 +74,7 @@ TEST_F(GPURESTEST, copyandsearch) {
auto conf = ParamGenerator::GetInstance().Gen(index_type_);
index_->Train(base_dataset, conf);
index_->AddWithoutIds(base_dataset, conf);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
ReleaseQueryResult(result);
@ -89,7 +89,7 @@ TEST_F(GPURESTEST, copyandsearch) {
auto search_func = [&] {
// TimeRecorder tc("search&load");
for (int i = 0; i < search_count; ++i) {
auto result = search_idx->Query(query_dataset, conf);
auto result = search_idx->Query(query_dataset, conf, nullptr);
ReleaseQueryResult(result);
// if (i > search_count - 6 || i == 0)
// tc.RecordSection("search once");
@ -109,7 +109,7 @@ TEST_F(GPURESTEST, copyandsearch) {
milvus::knowhere::TimeRecorder tc("Basic");
milvus::knowhere::cloner::CopyCpuToGpu(cpu_idx, DEVICEID, milvus::knowhere::Config());
tc.RecordSection("Copy to gpu once");
auto result2 = search_idx->Query(query_dataset, conf);
auto result2 = search_idx->Query(query_dataset, conf, nullptr);
ReleaseQueryResult(result2);
tc.RecordSection("Search once");
search_func();
@ -148,7 +148,7 @@ TEST_F(GPURESTEST, trainandsearch) {
};
auto search_stage = [&](milvus::knowhere::VecIndexPtr& search_idx) {
for (int i = 0; i < search_count; ++i) {
auto result = search_idx->Query(query_dataset, conf);
auto result = search_idx->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
ReleaseQueryResult(result);
}

View File

@ -50,7 +50,7 @@ TEST_P(HNSWTest, HNSW_basic) {
// null faiss index
{
ASSERT_ANY_THROW(index_->Serialize());
ASSERT_ANY_THROW(index_->Query(query_dataset, conf));
ASSERT_ANY_THROW(index_->Query(query_dataset, conf, nullptr));
ASSERT_ANY_THROW(index_->AddWithoutIds(nullptr, conf));
ASSERT_ANY_THROW(index_->Count());
ASSERT_ANY_THROW(index_->Dim());
@ -61,7 +61,7 @@ TEST_P(HNSWTest, HNSW_basic) {
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
ReleaseQueryResult(result);
@ -70,7 +70,7 @@ TEST_P(HNSWTest, HNSW_basic) {
base_dataset->Set(milvus::knowhere::meta::ROWS, rows);
index_->Train(base_dataset, conf);
index_->AddWithoutIds(base_dataset, conf);
auto result2 = index_->Query(query_dataset, conf);
auto result2 = index_->Query(query_dataset, conf, nullptr);
auto res_ids = result2->Get<int64_t*>(milvus::knowhere::meta::IDS);
for (int64_t i = 0; i < nq; i++) {
for (int64_t j = rows; j < k; j++) {
@ -92,12 +92,11 @@ TEST_P(HNSWTest, HNSW_delete) {
for (auto i = 0; i < nq; ++i) {
bitset->set(i);
}
auto result1 = index_->Query(query_dataset, conf);
auto result1 = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result1, nq, k);
ReleaseQueryResult(result1);
index_->SetBlacklist(bitset);
auto result2 = index_->Query(query_dataset, conf);
auto result2 = index_->Query(query_dataset, conf, bitset);
AssertAnns(result2, nq, k, CheckMode::CHECK_NOT_EQUAL);
ReleaseQueryResult(result2);
@ -151,7 +150,7 @@ TEST_P(HNSWTest, HNSW_serialize) {
index_->Load(binaryset);
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, conf[milvus::knowhere::meta::TOPK]);
ReleaseQueryResult(result);
}

View File

@ -73,7 +73,7 @@ TEST_P(IDMAPTest, idmap_basic) {
// null faiss index
{
ASSERT_ANY_THROW(index_->Serialize());
ASSERT_ANY_THROW(index_->Query(query_dataset, conf));
ASSERT_ANY_THROW(index_->Query(query_dataset, conf, nullptr));
ASSERT_ANY_THROW(index_->AddWithoutIds(nullptr, conf));
}
@ -82,7 +82,7 @@ TEST_P(IDMAPTest, idmap_basic) {
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
ASSERT_TRUE(index_->GetRawVectors() != nullptr);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -97,7 +97,7 @@ TEST_P(IDMAPTest, idmap_basic) {
auto binaryset = index_->Serialize();
auto new_index = std::make_shared<milvus::knowhere::IDMAP>();
new_index->Load(binaryset);
auto result2 = new_index->Query(query_dataset, conf);
auto result2 = new_index->Query(query_dataset, conf, nullptr);
AssertAnns(result2, nq, k);
// PrintResult(re_result, nq, k);
ReleaseQueryResult(result2);
@ -114,9 +114,8 @@ TEST_P(IDMAPTest, idmap_basic) {
for (int64_t i = 0; i < nq; ++i) {
concurrent_bitset_ptr->set(i);
}
index_->SetBlacklist(concurrent_bitset_ptr);
auto result_bs_1 = index_->Query(query_dataset, conf);
auto result_bs_1 = index_->Query(query_dataset, conf, concurrent_bitset_ptr);
AssertAnns(result_bs_1, nq, k, CheckMode::CHECK_NOT_EQUAL);
ReleaseQueryResult(result_bs_1);
@ -154,7 +153,7 @@ TEST_P(IDMAPTest, idmap_serialize) {
#endif
}
auto re_result = index_->Query(query_dataset, conf);
auto re_result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(re_result, nq, k);
// PrintResult(re_result, nq, k);
ReleaseQueryResult(re_result);
@ -174,7 +173,7 @@ TEST_P(IDMAPTest, idmap_serialize) {
index_->Load(binaryset);
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -194,7 +193,7 @@ TEST_P(IDMAPTest, idmap_copy) {
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
ASSERT_TRUE(index_->GetRawVectors() != nullptr);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -210,7 +209,7 @@ TEST_P(IDMAPTest, idmap_copy) {
// cpu to gpu
ASSERT_ANY_THROW(milvus::knowhere::cloner::CopyCpuToGpu(index_, -1, conf));
auto clone_index = milvus::knowhere::cloner::CopyCpuToGpu(index_, DEVICEID, conf);
auto clone_result = clone_index->Query(query_dataset, conf);
auto clone_result = clone_index->Query(query_dataset, conf, nullptr);
AssertAnns(clone_result, nq, k);
ReleaseQueryResult(clone_result);
ASSERT_THROW({ std::static_pointer_cast<milvus::knowhere::GPUIDMAP>(clone_index)->GetRawVectors(); },
@ -223,7 +222,7 @@ TEST_P(IDMAPTest, idmap_copy) {
auto binary = clone_index->Serialize();
clone_index->Load(binary);
auto new_result = clone_index->Query(query_dataset, conf);
auto new_result = clone_index->Query(query_dataset, conf, nullptr);
AssertAnns(new_result, nq, k);
ReleaseQueryResult(new_result);
@ -233,7 +232,7 @@ TEST_P(IDMAPTest, idmap_copy) {
// gpu to cpu
auto host_index = milvus::knowhere::cloner::CopyGpuToCpu(clone_index, conf);
auto host_result = host_index->Query(query_dataset, conf);
auto host_result = host_index->Query(query_dataset, conf, nullptr);
AssertAnns(host_result, nq, k);
ReleaseQueryResult(host_result);
ASSERT_TRUE(std::static_pointer_cast<milvus::knowhere::IDMAP>(host_index)->GetRawVectors() != nullptr);
@ -242,7 +241,7 @@ TEST_P(IDMAPTest, idmap_copy) {
auto device_index = milvus::knowhere::cloner::CopyCpuToGpu(index_, DEVICEID, conf);
auto new_device_index =
std::static_pointer_cast<milvus::knowhere::GPUIDMAP>(device_index)->CopyGpuToGpu(DEVICEID, conf);
auto device_result = new_device_index->Query(query_dataset, conf);
auto device_result = new_device_index->Query(query_dataset, conf, nullptr);
AssertAnns(device_result, nq, k);
ReleaseQueryResult(device_result);
}

View File

@ -105,7 +105,7 @@ TEST_P(IVFTest, ivf_basic_cpu) {
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf_);
auto result = index_->Query(query_dataset, conf_, nullptr);
AssertAnns(result, nq, k);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -129,9 +129,8 @@ TEST_P(IVFTest, ivf_basic_cpu) {
for (int64_t i = 0; i < nq; ++i) {
concurrent_bitset_ptr->set(i);
}
index_->SetBlacklist(concurrent_bitset_ptr);
auto result_bs_1 = index_->Query(query_dataset, conf_);
auto result_bs_1 = index_->Query(query_dataset, conf_, concurrent_bitset_ptr);
AssertAnns(result_bs_1, nq, k, CheckMode::CHECK_NOT_EQUAL);
// PrintResult(result, nq, k);
ReleaseQueryResult(result_bs_1);
@ -165,7 +164,7 @@ TEST_P(IVFTest, ivf_basic_gpu) {
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf_);
auto result = index_->Query(query_dataset, conf_, nullptr);
AssertAnns(result, nq, k);
// PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -174,9 +173,8 @@ TEST_P(IVFTest, ivf_basic_gpu) {
for (int64_t i = 0; i < nq; ++i) {
concurrent_bitset_ptr->set(i);
}
index_->SetBlacklist(concurrent_bitset_ptr);
auto result_bs_1 = index_->Query(query_dataset, conf_);
auto result_bs_1 = index_->Query(query_dataset, conf_, concurrent_bitset_ptr);
AssertAnns(result_bs_1, nq, k, CheckMode::CHECK_NOT_EQUAL);
// PrintResult(result, nq, k);
ReleaseQueryResult(result_bs_1);
@ -214,7 +212,7 @@ TEST_P(IVFTest, ivf_serialize) {
index_->Load(binaryset);
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->Dim(), dim);
auto result = index_->Query(query_dataset, conf_);
auto result = index_->Query(query_dataset, conf_, nullptr);
AssertAnns(result, nq, conf_[milvus::knowhere::meta::TOPK]);
ReleaseQueryResult(result);
}
@ -233,7 +231,7 @@ TEST_P(IVFTest, clone_test) {
/* set pseudo index size to avoid throwing an exception */
index_->SetIndexSize(nq * dim * sizeof(float));
auto result = index_->Query(query_dataset, conf_);
auto result = index_->Query(query_dataset, conf_, nullptr);
AssertAnns(result, nq, conf_[milvus::knowhere::meta::TOPK]);
// PrintResult(result, nq, k);
@ -273,7 +271,7 @@ TEST_P(IVFTest, clone_test) {
if (index_mode_ == milvus::knowhere::IndexMode::MODE_GPU) {
EXPECT_NO_THROW({
auto clone_index = milvus::knowhere::cloner::CopyGpuToCpu(index_, milvus::knowhere::Config());
auto clone_result = clone_index->Query(query_dataset, conf_);
auto clone_result = clone_index->Query(query_dataset, conf_, nullptr);
AssertEqual(result, clone_result);
ReleaseQueryResult(clone_result);
std::cout << "clone G <=> C [" << index_type_ << "] success" << std::endl;
@ -293,7 +291,7 @@ TEST_P(IVFTest, clone_test) {
if (index_type_ != milvus::knowhere::IndexEnum::INDEX_FAISS_IVFSQ8H) {
EXPECT_NO_THROW({
auto clone_index = milvus::knowhere::cloner::CopyCpuToGpu(index_, DEVICEID, milvus::knowhere::Config());
auto clone_result = clone_index->Query(query_dataset, conf_);
auto clone_result = clone_index->Query(query_dataset, conf_, nullptr);
AssertEqual(result, clone_result);
ReleaseQueryResult(clone_result);
std::cout << "clone C <=> G [" << index_type_ << "] success" << std::endl;
@ -313,7 +311,7 @@ TEST_P(IVFTest, gpu_seal_test) {
}
assert(!xb.empty());
ASSERT_ANY_THROW(index_->Query(query_dataset, conf_));
ASSERT_ANY_THROW(index_->Query(query_dataset, conf_, nullptr));
ASSERT_ANY_THROW(index_->Seal());
index_->Train(base_dataset, conf_);
@ -324,16 +322,16 @@ TEST_P(IVFTest, gpu_seal_test) {
/* set pseudo index size to avoid throwing an exception */
index_->SetIndexSize(nq * dim * sizeof(float));
auto result = index_->Query(query_dataset, conf_);
auto result = index_->Query(query_dataset, conf_, nullptr);
AssertAnns(result, nq, conf_[milvus::knowhere::meta::TOPK]);
ReleaseQueryResult(result);
fiu_init(0);
fiu_enable("IVF.Search.throw_std_exception", 1, nullptr, 0);
ASSERT_ANY_THROW(index_->Query(query_dataset, conf_));
ASSERT_ANY_THROW(index_->Query(query_dataset, conf_, nullptr));
fiu_disable("IVF.Search.throw_std_exception");
fiu_enable("IVF.Search.throw_faiss_exception", 1, nullptr, 0);
ASSERT_ANY_THROW(index_->Query(query_dataset, conf_));
ASSERT_ANY_THROW(index_->Query(query_dataset, conf_, nullptr));
fiu_disable("IVF.Search.throw_faiss_exception");
auto cpu_idx = milvus::knowhere::cloner::CopyGpuToCpu(index_, milvus::knowhere::Config());
@ -374,7 +372,7 @@ TEST_P(IVFTest, invalid_gpu_source) {
fiu_disable("GPUIVF.SerializeImpl.throw_exception");
fiu_enable("GPUIVF.search_impl.invald_index", 1, nullptr, 0);
ASSERT_ANY_THROW(index_->Query(base_dataset, invalid_conf));
ASSERT_ANY_THROW(index_->Query(base_dataset, invalid_conf, nullptr));
fiu_disable("GPUIVF.search_impl.invald_index");
auto ivf_index = std::dynamic_pointer_cast<milvus::knowhere::GPUIVF>(index_);

View File

@ -81,13 +81,13 @@ TEST_F(NSGInterfaceTest, basic_test) {
// untrained index
{
ASSERT_ANY_THROW(index_->Serialize());
ASSERT_ANY_THROW(index_->Query(query_dataset, search_conf));
ASSERT_ANY_THROW(index_->Query(query_dataset, search_conf, nullptr));
ASSERT_ANY_THROW(index_->AddWithoutIds(base_dataset, search_conf));
}
train_conf[milvus::knowhere::meta::DEVICEID] = -1;
index_->BuildAll(base_dataset, train_conf);
auto result = index_->Query(query_dataset, search_conf);
auto result = index_->Query(query_dataset, search_conf, nullptr);
AssertAnns(result, nq, k);
ReleaseQueryResult(result);
@ -102,7 +102,7 @@ TEST_F(NSGInterfaceTest, basic_test) {
auto new_index_1 = std::make_shared<milvus::knowhere::NSG>(DEVICE_GPU0);
train_conf[milvus::knowhere::meta::DEVICEID] = DEVICE_GPU0;
new_index_1->BuildAll(base_dataset, train_conf);
auto new_result_1 = new_index_1->Query(query_dataset, search_conf);
auto new_result_1 = new_index_1->Query(query_dataset, search_conf, nullptr);
AssertAnns(new_result_1, nq, k);
ReleaseQueryResult(new_result_1);
@ -115,7 +115,7 @@ TEST_F(NSGInterfaceTest, basic_test) {
fiu_disable("NSG.Load.throw_exception");
}
auto new_result_2 = new_index_2->Query(query_dataset, search_conf);
auto new_result_2 = new_index_2->Query(query_dataset, search_conf, nullptr);
AssertAnns(new_result_2, nq, k);
ReleaseQueryResult(new_result_2);
@ -144,7 +144,7 @@ TEST_F(NSGInterfaceTest, delete_test) {
train_conf[milvus::knowhere::meta::DEVICEID] = DEVICE_GPU0;
index_->BuildAll(base_dataset, train_conf);
auto result = index_->Query(query_dataset, search_conf);
auto result = index_->Query(query_dataset, search_conf, nullptr);
AssertAnns(result, nq, k);
auto I_before = result->Get<int64_t*>(milvus::knowhere::meta::IDS);
@ -156,8 +156,7 @@ TEST_F(NSGInterfaceTest, delete_test) {
for (int i = 0; i < nq; i++) {
bitset->set(i);
}
index_->SetBlacklist(bitset);
auto result_after = index_->Query(query_dataset, search_conf);
auto result_after = index_->Query(query_dataset, search_conf, bitset);
AssertAnns(result_after, nq, k, CheckMode::CHECK_NOT_EQUAL);
auto I_after = result_after->Get<int64_t*>(milvus::knowhere::meta::IDS);

View File

@ -65,7 +65,7 @@ TEST_P(SPTAGTest, sptag_basic) {
index_->BuildAll(base_dataset, conf);
// index_->Add(base_dataset, conf);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
ReleaseQueryResult(result);
@ -98,7 +98,7 @@ TEST_P(SPTAGTest, sptag_serialize) {
auto binaryset = index_->Serialize();
auto new_index = std::make_shared<milvus::knowhere::CPUSPTAGRNG>(IndexType);
new_index->Load(binaryset);
auto result = new_index->Query(query_dataset, conf);
auto result = new_index->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -135,7 +135,7 @@ TEST_P(SPTAGTest, sptag_serialize) {
auto new_index = std::make_shared<milvus::knowhere::CPUSPTAGRNG>(IndexType);
new_index->Load(load_data_list);
auto result = new_index->Query(query_dataset, conf);
auto result = new_index->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, k);
PrintResult(result, nq, k);
ReleaseQueryResult(result);

View File

@ -82,7 +82,7 @@ TEST_P(VecIndexTest, basic) {
EXPECT_EQ(index_->index_type(), index_type_);
EXPECT_EQ(index_->index_mode(), index_mode_);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, conf[milvus::knowhere::meta::TOPK]);
PrintResult(result, nq, k);
ReleaseQueryResult(result);
@ -94,7 +94,7 @@ TEST_P(VecIndexTest, serialize) {
EXPECT_EQ(index_->Count(), nb);
EXPECT_EQ(index_->index_type(), index_type_);
EXPECT_EQ(index_->index_mode(), index_mode_);
auto result = index_->Query(query_dataset, conf);
auto result = index_->Query(query_dataset, conf, nullptr);
AssertAnns(result, nq, conf[milvus::knowhere::meta::TOPK]);
ReleaseQueryResult(result);
@ -105,7 +105,7 @@ TEST_P(VecIndexTest, serialize) {
EXPECT_EQ(index_->Count(), new_index->Count());
EXPECT_EQ(index_->index_type(), new_index->index_type());
EXPECT_EQ(index_->index_mode(), new_index->index_mode());
auto new_result = new_index_->Query(query_dataset, conf);
auto new_result = new_index_->Query(query_dataset, conf, nullptr);
AssertAnns(new_result, nq, conf[milvus::knowhere::meta::TOPK]);
ReleaseQueryResult(new_result);
}

View File

@ -29,8 +29,6 @@ print_help(const std::string& app_name) {
std::cout << " Options:" << std::endl;
std::cout << " -h --help Print this help." << std::endl;
std::cout << " -c --conf_file filename Read configuration from the file." << std::endl;
std::cout << " -d --daemon Daemonize this application." << std::endl;
std::cout << " -p --pid_file filename PID file used by daemonized app." << std::endl;
std::cout << std::endl;
}
@ -102,16 +100,6 @@ main(int argc, char* argv[]) {
std::cout << "Loading configuration from: " << config_filename << std::endl;
break;
}
case 'p': {
char* pid_filename_ptr = strdup(optarg);
pid_filename = pid_filename_ptr;
free(pid_filename_ptr);
std::cout << pid_filename << std::endl;
break;
}
case 'd':
start_daemonized = 1;
break;
case 'h':
print_help(app_name);
return EXIT_SUCCESS;
@ -132,7 +120,7 @@ main(int argc, char* argv[]) {
signal(SIGUSR2, milvus::server::SignalUtil::HandleSignal);
signal(SIGTERM, milvus::server::SignalUtil::HandleSignal);
server.Init(start_daemonized, pid_filename, config_filename);
server.Init(config_filename);
s = server.Start();
if (s.ok()) {

View File

@ -98,7 +98,7 @@ JobMgr::worker_function() {
engine::utils::GetParentPath(location, segment_dir);
segment::SegmentReader segment_reader(segment_dir);
segment::IdBloomFilterPtr id_bloom_filter_ptr;
segment_reader.LoadBloomFilter(id_bloom_filter_ptr);
segment_reader.LoadBloomFilter(id_bloom_filter_ptr, false);
// Check if the id is present.
bool pass = true;

View File

@ -61,15 +61,12 @@ FaissFlatPass::Run(const TaskPtr& task) {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissFlatPass: nq < gpu_search_threshold, specify cpu to search!",
"search", 0);
res_ptr = ResMgrInst::GetInstance()->GetResource("cpu");
} else if (search_job->topk() > milvus::server::GPU_QUERY_MAX_NPROBE) {
} else if (search_job->topk() > server::GPU_QUERY_MAX_NPROBE) {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissFlatPass: topk > gpu_nprobe_threshold, specify cpu to search!",
"search", 0);
res_ptr = ResMgrInst::GetInstance()->GetResource("cpu");
} else {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissFlatPass: nq >= gpu_search_threshold, specify gpu %d to search!",
"search", 0, search_gpus_[idx_]);
res_ptr = ResMgrInst::GetInstance()->GetResource(ResourceType::GPU, search_gpus_[idx_]);
idx_ = (idx_ + 1) % search_gpus_.size();
res_ptr = PickResource(task, search_gpus_, idx_, "FaissFlatPass");
}
auto label = std::make_shared<SpecResLabel>(res_ptr);
task->label() = label;

View File

@ -86,20 +86,17 @@ FaissIVFPass::Run(const TaskPtr& task) {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissIVFPass: nq < gpu_search_threshold, specify cpu to search!",
"search", 0);
res_ptr = ResMgrInst::GetInstance()->GetResource("cpu");
} else if (search_job->topk() > milvus::server::GPU_QUERY_MAX_TOPK) {
} else if (search_job->topk() > server::GPU_QUERY_MAX_TOPK) {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissIVFPass: topk > gpu_topk_threshold, specify cpu to search!",
"search", 0);
res_ptr = ResMgrInst::GetInstance()->GetResource("cpu");
} else if (search_job->extra_params()[knowhere::IndexParams::nprobe].get<int64_t>() >
milvus::server::GPU_QUERY_MAX_NPROBE) {
server::GPU_QUERY_MAX_NPROBE) {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissIVFPass: nprobe > gpu_nprobe_threshold, specify cpu to search!",
"search", 0);
res_ptr = ResMgrInst::GetInstance()->GetResource("cpu");
} else {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissIVFPass: nq >= gpu_search_threshold, specify gpu %d to search!",
"search", 0, search_gpus_[idx_]);
res_ptr = ResMgrInst::GetInstance()->GetResource(ResourceType::GPU, search_gpus_[idx_]);
idx_ = (idx_ + 1) % search_gpus_.size();
res_ptr = PickResource(task, search_gpus_, idx_, "FaissIVFPass");
}
#else
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissIVFPass: fpga disable, specify cpu to search!", "search", 0);

View File

@ -58,12 +58,12 @@ FaissIVFSQ8HPass::Run(const TaskPtr& task) {
if (!gpu_enable_) {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissIVFSQ8HPass: gpu disable, specify cpu to search!", "search", 0);
res_ptr = ResMgrInst::GetInstance()->GetResource("cpu");
} else if (search_job->topk() > milvus::server::GPU_QUERY_MAX_TOPK) {
} else if (search_job->topk() > server::GPU_QUERY_MAX_TOPK) {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissIVFSQ8HPass: topk > gpu_topk_threshold, specify cpu to search!",
"search", 0);
res_ptr = ResMgrInst::GetInstance()->GetResource("cpu");
} else if (search_job->extra_params()[knowhere::IndexParams::nprobe].get<int64_t>() >
milvus::server::GPU_QUERY_MAX_NPROBE) {
server::GPU_QUERY_MAX_NPROBE) {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissIVFSQ8HPass: nprobe > gpu_nprobe_threshold, specify cpu to search!",
"search", 0);
res_ptr = ResMgrInst::GetInstance()->GetResource("cpu");
@ -73,10 +73,7 @@ FaissIVFSQ8HPass::Run(const TaskPtr& task) {
res_ptr = ResMgrInst::GetInstance()->GetResource("cpu");
hybrid = true;
} else {
LOG_SERVER_DEBUG_ << LogOut("[%s][%d] FaissIVFSQ8HPass: nq >= gpu_search_threshold, specify gpu %d to search!",
"search", 0, search_gpus_[idx_]);
res_ptr = ResMgrInst::GetInstance()->GetResource(ResourceType::GPU, search_gpus_[idx_]);
idx_ = (idx_ + 1) % search_gpus_.size();
res_ptr = PickResource(task, search_gpus_, idx_, "FaissIVFSQ8HPass");
}
auto label = std::make_shared<SpecResLabel>(res_ptr, hybrid);
task->label() = label;

View File

@ -0,0 +1,57 @@
// Copyright (C) 2019-2020 Zilliz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#ifdef MILVUS_GPU_VERSION
#include "src/scheduler/selector/Pass.h"
#include <iostream>
#include <string>
#include <vector>
#include "cache/GpuCacheMgr.h"
#include "scheduler/resource/Resource.h"
#include "scheduler/task/SearchTask.h"
#include "src/scheduler/SchedInst.h"
#include "src/utils/Log.h"
namespace milvus {
namespace scheduler {
int64_t
FindProperDevice(const std::vector<int64_t>& device_ids, const std::string& key) {
for (auto& device_id : device_ids) {
auto gpu_cache = milvus::cache::GpuCacheMgr::GetInstance(device_id);
if (gpu_cache->ItemExists(key))
return device_id;
}
return -1;
}
ResourcePtr
PickResource(const TaskPtr& task, const std::vector<int64_t>& device_ids, int64_t& idx, std::string name) {
auto search_task = std::static_pointer_cast<XSearchTask>(task);
auto did = FindProperDevice(device_ids, search_task->GetLocation());
ResourcePtr res_ptr = nullptr;
if (did < 0) {
LOG_SERVER_DEBUG_ << "No cache hit on gpu devices";
LOG_SERVER_DEBUG_ << LogOut("%s: nq >= gpu_search_threshold, specify gpu %d to search!", name.c_str(),
device_ids[idx]);
res_ptr = scheduler::ResMgrInst::GetInstance()->GetResource(ResourceType::GPU, (uint64_t)device_ids[idx]);
idx = (idx + 1) % device_ids.size();
} else {
LOG_SERVER_DEBUG_ << LogOut("Gpu cache hit on device %d", did);
LOG_SERVER_DEBUG_ << LogOut("%s: nq >= gpu_search_threshold, specify gpu %d to search!", name.c_str(), did);
res_ptr = ResMgrInst::GetInstance()->GetResource(ResourceType::GPU, (uint64_t)did);
}
return res_ptr;
}
} // namespace scheduler
} // namespace milvus
#endif
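The new `Pass.cpp` above centralizes GPU selection for the search passes rewritten below (`FaissFlatPass`, `FaissIVFPass`, `FaissIVFSQ8HPass` all call `PickResource`). A standalone restatement of the policy it implements, with the cache lookup abstracted behind a callback; the names here are illustrative and not part of the PR:

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Prefer a device whose cache already holds the segment key; otherwise fall back to
// round-robin over the configured GPUs, advancing the shared cursor.
int64_t
PickGpuDevice(const std::vector<int64_t>& device_ids, int64_t& rr_idx,
              const std::string& segment_key,
              const std::function<bool(int64_t, const std::string&)>& cache_has) {
    for (auto device_id : device_ids) {
        if (cache_has(device_id, segment_key)) {
            return device_id;  // cache hit: reuse the index already resident on this GPU
        }
    }
    int64_t chosen = device_ids[rr_idx];  // no hit: plain round-robin
    rr_idx = (rr_idx + 1) % static_cast<int64_t>(device_ids.size());
    return chosen;
}
```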

View File

@ -21,6 +21,7 @@
#include <unordered_map>
#include <vector>
#include "scheduler/resource/Resource.h"
#include "scheduler/task/Task.h"
namespace milvus {
@ -36,5 +37,11 @@ class Pass {
};
using PassPtr = std::shared_ptr<Pass>;
#ifdef MILVUS_GPU_VERSION
int64_t
FindProperDevice(const std::vector<int64_t>& device_ids, const std::string& key);
ResourcePtr
PickResource(const TaskPtr& task, const std::vector<int64_t>& device_ids, int64_t& idx, std::string name);
#endif
} // namespace scheduler
} // namespace milvus

View File

@ -46,7 +46,7 @@ XBuildIndexTask::XBuildIndexTask(SegmentSchemaPtr file, TaskLabelPtr label)
auto json = milvus::json::parse(file_->index_params_);
to_index_engine_ = EngineFactory::Build(file_->dimension_, file_->location_, engine_type,
(MetricType)file_->metric_type_, json);
(MetricType)file_->metric_type_, json, file_->updated_time_);
}
}
@ -62,7 +62,7 @@ XBuildIndexTask::Load(milvus::scheduler::LoadType type, uint8_t device_id) {
auto options = build_index_job->options();
try {
if (type == LoadType::DISK2CPU) {
stat = to_index_engine_->Load(options.insert_cache_immediately_);
stat = to_index_engine_->Load(false, options.insert_cache_immediately_);
type_str = "DISK2CPU";
} else if (type == LoadType::CPU2GPU) {
stat = to_index_engine_->CopyToIndexFileToGpu(device_id);

View File

@ -127,7 +127,7 @@ XSearchTask::XSearchTask(const std::shared_ptr<server::Context>& context, Segmen
}
index_engine_ = EngineFactory::Build(file_->dimension_, file_->location_, engine_type,
(MetricType)file_->metric_type_, json_params);
(MetricType)file_->metric_type_, json_params, file_->updated_time_);
}
}
@ -143,7 +143,7 @@ XSearchTask::Load(LoadType type, uint8_t device_id) {
try {
fiu_do_on("XSearchTask.Load.throw_std_exception", throw std::exception());
if (type == LoadType::DISK2CPU) {
stat = index_engine_->Load();
stat = index_engine_->Load(true);
type_str = "DISK2CPU";
} else if (type == LoadType::CPU2GPU) {
bool hybrid = false;

View File

@ -19,6 +19,7 @@
#include "utils/Log.h"
#include "utils/Status.h"
#include <algorithm>
#include <string>
namespace milvus {
@ -66,6 +67,25 @@ IdBloomFilter::Add(const std::vector<doc_id_t>& uids) {
return Status::OK();
}
Status
IdBloomFilter::Add(const std::vector<doc_id_t>& uids, std::vector<offset_t>& delete_docs) {
std::sort(delete_docs.begin(), delete_docs.end());
for (offset_t i = 0, j = 0; i < uids.size();) {
if (j < delete_docs.size() && i >= delete_docs[j]) {
j++;
continue;
}
auto uid = uids[i++];
std::string s = std::to_string(uid);
if (scaling_bloom_add(bloom_filter_, s.c_str(), s.size(), uid) == -1) {
// Counter overflow does not affect bloom filter's normal functionality
LOG_ENGINE_WARNING_ << "Warning adding id=" << s << " to bloom filter: 4 bit counter Overflow";
}
}
return Status::OK();
}
Status
IdBloomFilter::Remove(doc_id_t uid) {
if (bloom_filter_ == nullptr) {

View File

@ -23,6 +23,7 @@
#include "cache/DataObj.h"
#include "dablooms/dablooms.h"
#include "segment/DeletedDocs.h"
#include "utils/Status.h"
namespace milvus {
@ -45,6 +46,9 @@ class IdBloomFilter : public cache::DataObj {
Status
Add(const std::vector<doc_id_t>& uids);
Status
Add(const std::vector<doc_id_t>& uids, std::vector<offset_t>& delete_docs);
Status
Remove(doc_id_t uid);
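The new `Add` overload above rebuilds a filter while skipping offsets that belong to deleted documents. A hypothetical call site (the real caller is outside this excerpt), written as if inside `namespace milvus::segment` so the type names match the header; everything here is an assumption for illustration:

```cpp
#include <vector>

// Hypothetical helper, not part of the PR: regenerate a segment's bloom filter from its
// uid list while skipping the offsets recorded in its deleted-docs list.
Status
RebuildBloomFilter(const IdBloomFilterPtr& filter,
                   const std::vector<doc_id_t>& uids,
                   std::vector<offset_t> deleted_offsets) {
    // The overload sorts deleted_offsets internally and skips those positions while adding.
    return filter->Add(uids, deleted_offsets);
}
```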

View File

@ -55,13 +55,28 @@ SegmentReader::Load() {
}
Status
SegmentReader::LoadVectors(off_t offset, size_t num_bytes, std::vector<uint8_t>& raw_vectors) {
SegmentReader::LoadsVectors(VectorsPtr& vectors_ptr) {
codec::DefaultCodec default_codec;
try {
fs_ptr_->operation_ptr_->CreateDirectory();
vectors_ptr = std::make_shared<Vectors>();
default_codec.GetVectorsFormat()->read(fs_ptr_, vectors_ptr);
} catch (std::exception& e) {
std::string err_msg = "Failed to load raw vectors: " + std::string(e.what());
LOG_ENGINE_ERROR_ << err_msg;
return Status(DB_ERROR, e.what());
}
return Status::OK();
}
Status
SegmentReader::LoadsSingleVector(off_t offset, size_t num_bytes, std::vector<uint8_t>& raw_vectors) {
codec::DefaultCodec default_codec;
try {
fs_ptr_->operation_ptr_->CreateDirectory();
default_codec.GetVectorsFormat()->read_vectors(fs_ptr_, offset, num_bytes, raw_vectors);
} catch (std::exception& e) {
std::string err_msg = "Failed to load raw vectors: " + std::string(e.what());
std::string err_msg = "Failed to load single vector: " + std::string(e.what());
LOG_ENGINE_ERROR_ << err_msg;
return Status(DB_ERROR, err_msg);
}
@ -104,7 +119,7 @@ SegmentReader::LoadVectorIndex(const std::string& location, segment::VectorIndex
}
Status
SegmentReader::LoadBloomFilter(segment::IdBloomFilterPtr& id_bloom_filter_ptr) {
SegmentReader::LoadBloomFilter(segment::IdBloomFilterPtr& id_bloom_filter_ptr, bool cache_force) {
codec::DefaultCodec default_codec;
try {
// load id_bloom_filter from cache
@ -117,7 +132,11 @@ SegmentReader::LoadBloomFilter(segment::IdBloomFilterPtr& id_bloom_filter_ptr) {
default_codec.GetIdBloomFilterFormat()->read(fs_ptr_, id_bloom_filter_ptr);
// add id_bloom_filter into cache
cache::CpuCacheMgr::GetInstance()->InsertItem(cache_key, id_bloom_filter_ptr);
if (cache_force) {
cache::CpuCacheMgr::GetInstance()->InsertItem(cache_key, id_bloom_filter_ptr);
} else {
cache::CpuCacheMgr::GetInstance()->InsertItemIfNotExist(cache_key, id_bloom_filter_ptr);
}
}
} catch (std::exception& e) {
std::string err_msg = "Failed to load bloom filter: " + std::string(e.what());

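`LoadBloomFilter` now takes a `cache_force` flag that selects between `InsertItem` (always refresh the cached filter) and `InsertItemIfNotExist` (keep whatever is already cached). The toy, map-based cache below sketches the difference between the two calls; the real `CpuCacheMgr` has its own eviction and locking, and the `ToyCache` name and `std::string` payload are purely illustrative.

```cpp
#include <iostream>
#include <memory>
#include <string>
#include <unordered_map>

// Toy stand-in for a cache keyed by file location. Only meant to illustrate
// the cache_force flag in LoadBloomFilter above, not the real CpuCacheMgr.
class ToyCache {
 public:
    void InsertItem(const std::string& key, std::shared_ptr<std::string> item) {
        items_[key] = std::move(item);  // always overwrite the cached entry
    }
    void InsertItemIfNotExist(const std::string& key, std::shared_ptr<std::string> item) {
        items_.emplace(key, std::move(item));  // keep the existing entry if present
    }
    std::shared_ptr<std::string> Get(const std::string& key) const {
        auto it = items_.find(key);
        return it == items_.end() ? nullptr : it->second;
    }

 private:
    std::unordered_map<std::string, std::shared_ptr<std::string>> items_;
};

int main() {
    ToyCache cache;
    cache.InsertItem("seg1/bloom", std::make_shared<std::string>("old"));
    cache.InsertItemIfNotExist("seg1/bloom", std::make_shared<std::string>("new"));  // no-op
    std::cout << *cache.Get("seg1/bloom") << "\n";  // prints "old"
    cache.InsertItem("seg1/bloom", std::make_shared<std::string>("new"));            // overwrite
    std::cout << *cache.Get("seg1/bloom") << "\n";  // prints "new"
    return 0;
}
```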
View File

@ -40,7 +40,10 @@ class SegmentReader {
Load();
Status
LoadVectors(off_t offset, size_t num_bytes, std::vector<uint8_t>& raw_vectors);
LoadsVectors(VectorsPtr& vectors_ptr);
Status
LoadsSingleVector(off_t offset, size_t num_bytes, std::vector<uint8_t>& raw_vectors);
Status
LoadUids(UidsPtr& uids);
@ -49,7 +52,7 @@ class SegmentReader {
LoadVectorIndex(const std::string& location, segment::VectorIndexPtr& vector_index_ptr);
Status
LoadBloomFilter(segment::IdBloomFilterPtr& id_bloom_filter_ptr);
LoadBloomFilter(segment::IdBloomFilterPtr& id_bloom_filter_ptr, bool cache_force);
Status
LoadDeletedDocs(segment::DeletedDocsPtr& deleted_docs_ptr);

View File

@ -20,6 +20,7 @@
#include "config/Config.h"
#include "db/DBFactory.h"
#include "index/knowhere/knowhere/index/vector_index/helpers/FaissIO.h"
#include "utils/CommonUtil.h"
#include "utils/Log.h"
#include "utils/StringHelpFunctions.h"
@ -82,7 +83,6 @@ DBWrapper::StartService() {
}
opt.insert_buffer_size_ = insert_buffer_size;
#if 1
bool cluster_enable = false;
std::string cluster_role;
STATUS_CHECK(config.GetClusterConfigEnable(cluster_enable));
@ -98,27 +98,6 @@ DBWrapper::StartService() {
kill(0, SIGUSR1);
}
#else
std::string mode;
s = config.GetServerConfigDeployMode(mode);
if (!s.ok()) {
std::cerr << s.ToString() << std::endl;
return s;
}
if (mode == "single") {
opt.mode_ = engine::DBOptions::MODE::SINGLE;
} else if (mode == "cluster_readonly") {
opt.mode_ = engine::DBOptions::MODE::CLUSTER_READONLY;
} else if (mode == "cluster_writable") {
opt.mode_ = engine::DBOptions::MODE::CLUSTER_WRITABLE;
} else {
std::cerr << "Error: server_config.deploy_mode in server_config.yaml is not one of "
<< "single, cluster_readonly, and cluster_writable." << std::endl;
kill(0, SIGUSR1);
}
#endif
// get wal configurations
s = config.GetWalConfigEnable(opt.wal_enable_);
if (!s.ok()) {
@ -163,7 +142,6 @@ DBWrapper::StartService() {
if (omp_thread > 0) {
omp_set_num_threads(omp_thread);
LOG_SERVER_DEBUG_ << "Specify openmp thread number: " << omp_thread;
} else {
int64_t sys_thread_cnt = 8;
if (CommonUtil::GetSystemAvailableThreads(sys_thread_cnt)) {
@ -171,6 +149,7 @@ DBWrapper::StartService() {
omp_set_num_threads(omp_thread);
}
}
LOG_SERVER_DEBUG_ << "Specify openmp thread number: " << omp_thread;
// init faiss global variable
int64_t use_blas_threshold;
@ -208,29 +187,15 @@ DBWrapper::StartService() {
// create db root folder
s = CommonUtil::CreateDirectory(opt.meta_.path_);
if (!s.ok()) {
std::cerr << "Error: Failed to create database primary path: " << path
<< ". Possible reason: db_config.primary_path is wrong in server_config.yaml or not available."
<< std::endl;
std::cerr << "Error: Failed to create database path: " << path << std::endl;
kill(0, SIGUSR1);
}
for (auto& path : opt.meta_.slave_paths_) {
s = CommonUtil::CreateDirectory(path);
if (!s.ok()) {
std::cerr << "Error: Failed to create database secondary path: " << path
<< ". Possible reason: db_config.secondary_path is wrong in server_config.yaml or not available."
<< std::endl;
kill(0, SIGUSR1);
}
}
// create db instance
try {
db_ = engine::DBFactory::Build(opt);
} catch (std::exception& ex) {
std::cerr << "Error: failed to open database: " << ex.what()
<< ". Possible reason: out of storage, meta schema is damaged "
<< "or created by in-compatible Milvus version." << std::endl;
std::cerr << "Error: Failed to open database: " << ex.what() << std::endl;
kill(0, SIGUSR1);
}
@ -251,6 +216,12 @@ DBWrapper::StartService() {
kill(0, SIGUSR1);
}
bool trace_enable = false;
s = config.GetLogsTraceEnable(trace_enable);
if (s.ok() && trace_enable) {
knowhere::enable_faiss_logging();
}
return Status::OK();
}

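The reshuffled logging above keeps the same OpenMP fallback: a positive configured `omp_thread` value is used as-is, otherwise the server asks the host how many threads are available. A small sketch of that fallback follows; approximating `CommonUtil::GetSystemAvailableThreads` with `std::thread::hardware_concurrency` and building with `-fopenmp` are assumptions made only for this illustration.

```cpp
#include <omp.h>

#include <cstdint>
#include <iostream>
#include <thread>

// Fallback mirroring the flow in DBWrapper::StartService(): a positive
// configured value wins; otherwise the host's thread count is queried.
static void ConfigureOmpThreads(int64_t configured) {
    int64_t threads = configured;
    if (threads <= 0) {
        threads = static_cast<int64_t>(std::thread::hardware_concurrency());
        if (threads == 0) {
            threads = 8;  // same starting default the original code uses
        }
    }
    omp_set_num_threads(static_cast<int>(threads));
    std::cout << "Specify openmp thread number: " << threads << "\n";
}

int main() {
    ConfigureOmpThreads(0);  // no explicit config: auto-detect
    ConfigureOmpThreads(4);  // explicit config value wins
    return 0;
}
```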
View File

@ -52,102 +52,12 @@ Server::GetInstance() {
}
void
Server::Init(int64_t daemonized, const std::string& pid_filename, const std::string& config_filename) {
daemonized_ = daemonized;
pid_filename_ = pid_filename;
Server::Init(const std::string& config_filename) {
config_filename_ = config_filename;
}
void
Server::Daemonize() {
if (daemonized_ == 0) {
return;
}
std::cout << "Milvus server run in daemonize mode";
pid_t pid = 0;
// Fork off the parent process
pid = fork();
// An error occurred
if (pid < 0) {
exit(EXIT_FAILURE);
}
// Success: terminate parent
if (pid > 0) {
exit(EXIT_SUCCESS);
}
// On success: The child process becomes session leader
if (setsid() < 0) {
exit(EXIT_FAILURE);
}
// Ignore signal sent from child to parent process
signal(SIGCHLD, SIG_IGN);
// Fork off for the second time
pid = fork();
// An error occurred
if (pid < 0) {
exit(EXIT_FAILURE);
}
// Terminate the parent
if (pid > 0) {
exit(EXIT_SUCCESS);
}
// Set new file permissions
umask(0);
// Change the working directory to root
int ret = chdir("/");
if (ret != 0) {
return;
}
// Close all open fd
for (int64_t fd = sysconf(_SC_OPEN_MAX); fd > 0; fd--) {
close(fd);
}
std::cout << "Redirect stdin/stdout/stderr to /dev/null";
// Redirect stdin/stdout/stderr to /dev/null
stdin = fopen("/dev/null", "r");
stdout = fopen("/dev/null", "w+");
stderr = fopen("/dev/null", "w+");
// Try to write PID of daemon to lockfile
if (!pid_filename_.empty()) {
pid_fd_ = open(pid_filename_.c_str(), O_RDWR | O_CREAT, 0640);
if (pid_fd_ < 0) {
std::cerr << "Can't open filename: " + pid_filename_ + ", Error: " + strerror(errno);
exit(EXIT_FAILURE);
}
if (lockf(pid_fd_, F_TLOCK, 0) < 0) {
std::cerr << "Can't lock filename: " + pid_filename_ + ", Error: " + strerror(errno);
exit(EXIT_FAILURE);
}
std::string pid_file_context = std::to_string(getpid());
ssize_t res = write(pid_fd_, pid_file_context.c_str(), pid_file_context.size());
if (res != 0) {
return;
}
}
}
Status
Server::Start() {
if (daemonized_ != 0) {
Daemonize();
}
try {
/* Read config file */
Status s = LoadConfig();
@ -161,8 +71,8 @@ Server::Start() {
std::string meta_uri;
STATUS_CHECK(config.GetGeneralConfigMetaURI(meta_uri));
if (meta_uri.length() > 6 && strcasecmp("sqlite", meta_uri.substr(0, 6).c_str()) == 0) {
std::cout << "WARNNING: You are using SQLite as the meta data management, "
"which can't be used in production. Please change it to MySQL!"
std::cout << "NOTICE: You are using SQLite as the meta data management. "
"We recommend change it to MySQL."
<< std::endl;
}
@ -327,29 +237,6 @@ Server::Stop() {
}
#endif
/* Unlock and close lockfile */
if (pid_fd_ != -1) {
int ret = lockf(pid_fd_, F_ULOCK, 0);
if (ret != 0) {
std::cerr << "ERROR: Can't lock file: " << strerror(errno) << std::endl;
exit(0);
}
ret = close(pid_fd_);
if (ret != 0) {
std::cerr << "ERROR: Can't close file: " << strerror(errno) << std::endl;
exit(0);
}
}
/* delete lockfile */
if (!pid_filename_.empty()) {
int ret = unlink(pid_filename_.c_str());
if (ret != 0) {
std::cerr << "ERROR: Can't unlink file: " << strerror(errno) << std::endl;
exit(0);
}
}
StopService();
std::cerr << "Milvus server exit..." << std::endl;

View File

@ -23,7 +23,7 @@ class Server {
GetInstance();
void
Init(int64_t daemonized, const std::string& pid_filename, const std::string& config_filename);
Init(const std::string& config_filename);
Status
Start();
@ -34,9 +34,6 @@ class Server {
Server() = default;
~Server() = default;
void
Daemonize();
Status
LoadConfig();
@ -46,9 +43,6 @@ class Server {
StopService();
private:
int64_t daemonized_ = 0;
int pid_fd_ = -1;
std::string pid_filename_;
std::string config_filename_;
}; // Server

View File

@ -17,7 +17,6 @@
#include <utility>
#include <vector>
#include "context/HybridSearchContext.h"
#include "query/BooleanQuery.h"
#include "server/delivery/request/BaseRequest.h"
#include "utils/Status.h"

View File

@ -39,7 +39,6 @@ RequestGroup(BaseRequest::RequestType type) {
{BaseRequest::kDeleteByID, DDL_DML_REQUEST_GROUP},
{BaseRequest::kGetVectorByID, INFO_REQUEST_GROUP},
{BaseRequest::kGetVectorIDs, INFO_REQUEST_GROUP},
{BaseRequest::kInsertEntity, DDL_DML_REQUEST_GROUP},
// collection operations
{BaseRequest::kShowCollections, INFO_REQUEST_GROUP},
@ -51,8 +50,6 @@ RequestGroup(BaseRequest::RequestType type) {
{BaseRequest::kDropCollection, DDL_DML_REQUEST_GROUP},
{BaseRequest::kPreloadCollection, DQL_REQUEST_GROUP},
{BaseRequest::kReleaseCollection, DQL_REQUEST_GROUP},
{BaseRequest::kCreateHybridCollection, DDL_DML_REQUEST_GROUP},
{BaseRequest::kDescribeHybridCollection, INFO_REQUEST_GROUP},
{BaseRequest::kReloadSegments, DQL_REQUEST_GROUP},
// partition operations
@ -69,7 +66,6 @@ RequestGroup(BaseRequest::RequestType type) {
{BaseRequest::kSearchByID, DQL_REQUEST_GROUP},
{BaseRequest::kSearch, DQL_REQUEST_GROUP},
{BaseRequest::kSearchCombine, DQL_REQUEST_GROUP},
{BaseRequest::kHybridSearch, DQL_REQUEST_GROUP},
};
auto iter = s_map_type_group.find(type);
@ -112,8 +108,8 @@ BaseRequest::Execute() {
Status
BaseRequest::PostExecute() {
status_ = OnPostExecute();
return status_;
// not allow assign status_ here, because PostExecute() and Execute() are running on different threads
return OnPostExecute();
}
Status
@ -148,6 +144,13 @@ BaseRequest::CollectionNotExistMsg(const std::string& collection_name) {
"You also can check whether the collection name exists.";
}
std::string
BaseRequest::PartitionNotExistMsg(const std::string& collection_name, const std::string& partition_tag) {
return "Collection " + collection_name + " partition_tag " + partition_tag +
" does not exist. Use milvus.partition to verify whether the partition exists. "
"You also can check whether the partition name exists.";
}
Status
BaseRequest::WaitToFinish() {
std::unique_lock<std::mutex> lock(finish_mtx_);

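The comment added to `PostExecute()` points at a threading hazard: `Execute()` and `PostExecute()` may run on different threads, so writing both results into the shared `status_` member would race. Below is a minimal illustration of the safer pattern the change adopts (hand the result back by value instead of through a shared member), sketched with `std::async` for brevity; the `Status` struct here is a stand-in, not the Milvus type.

```cpp
#include <future>
#include <iostream>
#include <string>

// Stand-in for the Milvus Status type, just to keep the sketch self-contained.
struct Status {
    bool ok;
    std::string msg;
};

// Pretend post-execution step; in the diff this role is played by OnPostExecute().
Status OnPostExecute() {
    return Status{true, "post-execute done"};
}

int main() {
    // Run the step on another thread and hand its result back by value.
    // Nothing shared is written, so there is no race on a status member.
    std::future<Status> post = std::async(std::launch::async, OnPostExecute);
    Status s = post.get();
    std::cout << (s.ok ? "OK: " : "FAIL: ") << s.msg << "\n";
    return 0;
}
```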
View File

@ -117,7 +117,6 @@ class BaseRequest {
kDeleteByID,
kGetVectorByID,
kGetVectorIDs,
kInsertEntity,
// collection operations
kShowCollections = 300,
@ -128,9 +127,6 @@ class BaseRequest {
kShowCollectionInfo,
kDropCollection,
kPreloadCollection,
kCreateHybridCollection,
kHasHybridCollection,
kDescribeHybridCollection,
kReloadSegments,
kReleaseCollection,
@ -148,7 +144,6 @@ class BaseRequest {
kSearchByID = 600,
kSearch,
kSearchCombine,
kHybridSearch,
};
protected:
@ -209,18 +204,21 @@ class BaseRequest {
std::string
CollectionNotExistMsg(const std::string& collection_name);
std::string
PartitionNotExistMsg(const std::string& collection_name, const std::string& partition_tag);
protected:
const std::shared_ptr<milvus::server::Context> context_;
RequestType type_;
std::string request_group_;
bool async_;
Status status_;
private:
mutable std::mutex finish_mtx_;
std::condition_variable finish_cond_;
bool done_;
Status status_;
public:
const std::shared_ptr<milvus::server::Context>&

View File

@ -55,29 +55,25 @@ DeleteByIDRequest::OnExecute() {
return status;
}
// step 2: check collection existence
engine::meta::CollectionSchema collection_schema;
collection_schema.collection_id_ = collection_name_;
status = DBWrapper::DB()->DescribeCollection(collection_schema);
if (!status.ok()) {
if (status.code() == DB_NOT_FOUND) {
return Status(SERVER_COLLECTION_NOT_EXIST, CollectionNotExistMsg(collection_name_));
} else {
if (!partition_tag_.empty()) {
status = ValidationUtil::ValidatePartitionTags({partition_tag_});
if (!status.ok()) {
return status;
}
} else {
if (!collection_schema.owner_collection_.empty()) {
return Status(SERVER_INVALID_COLLECTION_NAME, CollectionNotExistMsg(collection_name_));
}
}
// Check collection's index type supports delete
if (collection_schema.engine_type_ == (int32_t)engine::EngineType::SPTAG_BKT ||
collection_schema.engine_type_ == (int32_t)engine::EngineType::SPTAG_KDT) {
std::string err_msg =
"Index type " + std::to_string(collection_schema.engine_type_) + " does not support delete operation";
LOG_SERVER_ERROR_ << err_msg;
return Status(SERVER_UNSUPPORTED_ERROR, err_msg);
// step 2: check collection and partition existence
bool has_or_not;
DBWrapper::DB()->HasNativeCollection(collection_name_, has_or_not);
if (!has_or_not) {
return Status(SERVER_COLLECTION_NOT_EXIST, CollectionNotExistMsg(collection_name_));
}
if (!partition_tag_.empty()) {
DBWrapper::DB()->HasPartition(collection_name_, partition_tag_, has_or_not);
if (!has_or_not) {
return Status(SERVER_INVALID_PARTITION_TAG, PartitionNotExistMsg(collection_name_, partition_tag_));
}
}
rc.RecordSection("check validation");

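`DeleteByIDRequest::OnExecute()` now validates in two steps: the collection must exist, and the partition is checked only when a non-empty tag is supplied. A compact sketch of that flow follows, with toy in-memory lookup tables standing in for `DBWrapper::DB()`; every name in it (`kCollections`, `kPartitions`, `ValidateTarget`) is hypothetical.

```cpp
#include <iostream>
#include <set>
#include <string>
#include <utility>

// Toy lookup tables standing in for the database; all names are illustrative.
static const std::set<std::string> kCollections = {"films"};
static const std::set<std::pair<std::string, std::string>> kPartitions = {{"films", "2021"}};

// Mirrors the order of checks in the new OnExecute(): collection first,
// then the partition, but only when a non-empty tag was supplied.
static bool ValidateTarget(const std::string& collection, const std::string& partition_tag,
                           std::string& err) {
    if (kCollections.count(collection) == 0) {
        err = "Collection " + collection + " does not exist";
        return false;
    }
    if (!partition_tag.empty() && kPartitions.count({collection, partition_tag}) == 0) {
        err = "Collection " + collection + " partition_tag " + partition_tag + " does not exist";
        return false;
    }
    return true;
}

int main() {
    std::string err;
    std::cout << ValidateTarget("films", "", err) << "\n";      // 1: no tag, collection exists
    std::cout << ValidateTarget("films", "2020", err) << "\n";  // 0: partition missing
    std::cout << err << "\n";
    return 0;
}
```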
Some files were not shown because too many files have changed in this diff.