Delete and WAL feature branch merge (#1436)

* add read/write lock

* change compact to ddl queue

* add api to get vector data

* add flush / merge / compact lock

* add api to get vector data

* add data size for table info

* add db recovery test

* add data_size check

* change file name to uppercase

Signed-off-by: jinhai <hai.jin@zilliz.com>

* update wal flush_merge_compact_mutex_

* update wal flush_merge_compact_mutex_

* change requirement

* change requirement

* upd requirement

* add logging

* add logging

* add logging

* add logging

* add logging

* add logging

* add logging

* add logging

* add logging

* delete part

* add all size checks

* fix bug

* update faiss get_vector_by_id

* add get_vector case

* update get vector by id

* update server

* fix DBImpl

* attempting to fix #1268

* lint

* update unit test

* fix #1259

* issue 1271 fix wal config

* update

* fix cases

Signed-off-by: del.zhenwu <zhenxiang.li@zilliz.com>

* update read / write error message

* update read / write error message

* [skip ci] get vectors by id from raw files instead of faiss

* [skip ci] update FilesByType meta

* update

* fix ci error

* update

* lint

* Hide partition_name parameter

* Remove douban pip source

Signed-off-by: zhenwu <zw@zilliz.com>

* Update epsilon value in test cases

Signed-off-by: zhenwu <zw@zilliz.com>

* Add default partition

* Caiyd crud (#1313)

* fix clang format

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fix unittest build error

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add faiss_bitset_test

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* avoid user directly operate partition table

* fix has table bug

* Caiyd crud (#1323)

* fix clang format

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fix unittest build error

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* use compile option -O3

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update faiss_bitset_test.cpp

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* change open flags

* change OngoingFileChecker to static instance

* mark ongoing files when applying deletes

* update clean up with ttl

* fix centos ci

* update

* lint

* update partition

Signed-off-by: zhenwu <zw@zilliz.com>

* update delete and flush to include partitions

* update

* Update cases

Signed-off-by: zhenwu <zw@zilliz.com>

* Fix test cases crud (#1350)

* fix order

* add wal case

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix wal case

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix wal case

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix wal case

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix invalid operation issue

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix invalid operation issue

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix bug

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix bug

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* crud fix

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* crud fix

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* add table info test cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>
Signed-off-by: JinHai-CN <hai.jin@zilliz.com>

* merge cases

Signed-off-by: zhenwu <zw@zilliz.com>

* Shengjun (#1349)

* Add GPU sharing solution on native Kubernetes  (#1102)

* run hadolint with reviewdog

* add LICENSE in Dockerfile

* run hadolint with reviewdog

* Reporter of reviewdog command is "github-pr-check"

* format Dockerfile

* ignore DL3007 in hadolint

* clean up old docker images

* Add GPU sharing solution on native Kubernetes

* nightly test mailer

* Fix http server bug (#1096)

* refactoring(create_table done)

* refactoring

* refactor server delivery (insert done)

* refactoring server module (count_table done)

* server refactor done

* cmake pass

* refactor server module done.

* set grpc response status correctly

* format done.

* fix redefine ErrorMap()

* optimize insert reducing ids data copy

* optimize grpc request with reducing data copy

* clang format

* [skip ci] Refactor server module done. update changelog. prepare for PR

* remove explicit and change int32_t to int64_t

* add web server

* [skip ci] add license in web module

* modify header include & comment oatpp environment config

* add port configure & create table in handler

* modify web url

* simple url compilation done & add swagger

* make sure web url

* web functionality done. debugging

* add web unittest

* web test pass

* add web server port

* add web server port in template

* update unittest cmake file

* change web server default port to 19121

* rename method in web module & unittest pass

* add search case in unittest for web module

* rename some variables

* fix bug

* unittest pass

* web prepare

* fix cmd bug(check server status)

* update changelog

* add web port validate & default set

* clang-format pass

* add web port test in unittest

* add CORS & redirect root to swagger ui

* add web status

* web table method func cascade test pass

* add config url in web module

* modify thirdparty cmake to avoid building oatpp test

* clang format

* update changelog

* add constants in web module

* reserve Config.cpp

* fix constants reference bug

* replace web server with async module

* modify component to support async

* format

* developing controller & add test client into unittest

* add web port into demo/server_config

* modify thirdparty cmake to allow build test

* remove  unnecessary comment

* add endpoint info in controller

* finish web test(bug here)

* clang format

* add web test cpp to lint exclusions

* check null field in GetConfig

* add macro RETURN STATUS DTo

* fix cmake conflict

* fix crash when exit server

* remove surplus comments & add http param check

* add uri /docs to direct swagger

* format

* change cmd to system

* add default value & unittest in web module

* add macros to judge if GPU supported

* add macros in unit & add default in index dto & print error message when bind http port fail

* format (fix #788)

* fix cors bug (not completed)

* comment cors

* change web framework to simple api

* comments optimize

* change to simple API

* remove comments in controller.hpp

* remove EP_COMMON_CMAKE_ARGS in oatpp and oatpp-swagger

* add ep cmake args to sqlite

* clang-format

* change a format

* test pass

* change name to

* fix compiler issue(oatpp-swagger depend on oatpp)

* add & in start_server.h

* specify lib location with oatpp and oatpp-swagger

* add comments

* add swagger definition

* [skip ci] change http method options status code

* remove oatpp swagger(fix #970)

* remove comments

* check Start web behavior

* add default to cpu_cache_capacity

* remove swagger component.hpp & /docs url

* remove /docs info

* remove /docs in unittest

* remove space in test rpc

* remove repeated info in CHANGELOG

* change cache_insert_data default value as a constant

* [skip ci] Fix some broken links (#960)

* [skip ci] Fix broken link

* [skip ci] Fix broken link

* [skip ci] Fix broken link

* [skip ci] Fix broken links

* fix issue 373 (#964)

* fix issue 373

* Adjustment format

* Adjustment format

* Adjustment format

* change readme

* #966 update NOTICE.md (#967)

* remove comments

* check Start web behavior

* add default to cpu_cache_capacity

* remove swagger component.hpp & /docs url

* remove /docs info

* remove /docs in unittest

* remove space in test rpc

* remove repeated info in CHANGELOG

* change cache_insert_data default value as a constant

* adjust web port config place

* rename web_port variable

* change gpu resources invoke way to cmd()

* set advanced config name add DEFAULT

* change config setting to cmd

* modify ..

* optimize code

* assign TableDto's count default value 0 (fix #995)

* check if table exists when show partitions (fix #1028)

* check table exists when drop partition (fix #1029)

* check if partition name is legal (fix #1022)

* modify status code when partition tag is illegal

* update changelog

* add info to /system url

* add binary index and add bin uri & handler method(not completed)

* optimize http insert and search time(fix #1066) | add binary vectors support(fix #1067)

* fix test partition bug

* fix test bug when check insert records

* add binary vectors test

* add default for offset and page_size

* fix unittest bug

* [skip ci] remove comments

* optimize web code for PR comments

* add new folder named utils

* check offset and pagesize (fix #1082)

* improve error message if offset or page_size is not legal (fix #1075)

* add log into web module

* update changelog

* check gpu sources setting when assign repeated value (fix #990)

* update changelog

* clang-format pass

* add default handler in http handler

* [skip ci] improve error msg when check gpu resources

* change check offset way

* remove func IsIntStr

* add case

* change int32 to int64 when check number str

* add log in web module (doing)

* update test case

* add log in web controller

Co-authored-by: jielinxu <52057195+jielinxu@users.noreply.github.com>
Co-authored-by: JackLCL <53512883+JackLCL@users.noreply.github.com>
Co-authored-by: Cai Yudong <yudong.cai@zilliz.com>

* Filtering for specific paths in Jenkins CI  (#1107)

* run hadolint with reviewdog

* add LICENSE in Dockerfile

* run hadolint with reviewdog

* Reporter of reviewdog command is "github-pr-check"

* format Dockerfile

* ignore DL3007 in hadolint

* clean up old docker images

* Add GPU sharing solution on native Kubernetes

* nightly test mailer

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Fix Filtering for specific paths in Jenkins CI bug (#1109)

* run hadolint with reviewdog

* add LICENSE in Dockerfile

* run hadolint with reviewdog

* Reporter of reviewdog command is "github-pr-check"

* format Dockerfile

* ignore DL3007 in hadolint

* clean up old docker images

* Add GPU sharing solution on native Kubernetes

* nightly test mailer

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Fix Filtering for specific paths in Jenkins CI bug (#1110)

* run hadolint with reviewdog

* add LICENSE in Dockerfile

* run hadolint with reviewdog

* Reporter of reviewdog command is "github-pr-check"

* format Dockerfile

* ignore DL3007 in hadolint

* clean up old docker images

* Add GPU sharing solution on native Kubernetes

* nightly test mailer

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Don't skip ci when triggered by a timer (#1113)

* run hadolint with reviewdog

* add LICENSE in Dockerfile

* run hadolint with reviewdog

* Reporter of reviewdog command is "github-pr-check"

* format Dockerfile

* ignore DL3007 in hadolint

* clean up old docker images

* Add GPU sharing solution on native Kubernetes

* nightly test mailer

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Don't skip ci when triggered by a timer

* Don't skip ci when triggered by a timer

* Set default sending to Milvus Dev mail group  (#1121)

* run hadolint with reviewdog

* add LICENSE in Dockerfile

* run hadolint with reviewdog

* Reporter of reviewdog command is "github-pr-check"

* format Dockerfile

* ignore DL3007 in hadolint

* clean up old docker images

* Add GPU sharing solution on native Kubernetes

* nightly test mailer

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Test filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Filtering for specific paths in Jenkins CI

* Don't skip ci when triggered by a timer

* Don't skip ci when triggered by a timer

* Set default sending to Milvus Dev

* Support hnsw (#1131)

* add hnsw

* add config

* format...

* format..

* Remove test.template (#1129)

* Update framework

* remove files

* Remove files

* Remove ann-acc cases && Update java-sdk cases

* change cn to en

* [skip ci] remove doc test

* [skip ci] change cn to en

* Case stability

* Add mail notification when test failed

* Add main notification

* Add main notification

* gen milvus instance from utils

* Disable case with multiprocess

* Add mail notification when nightly test failed

* add milvus handler param

* add http handler

* Remove test.template

Co-authored-by: quicksilver <zhifeng.zhang@zilliz.com>

* Add doc for the RESTful API / Update contributor number in Milvus readme (#1100)

* [skip ci] Update contributor number.

* [skip ci] Add RESTful API doc.

* [skip ci] Some updates.

* [skip ci] Change port to 19121.

* [skip ci] Update README.md.

Update the descriptions for OPTIONS.

* Update README.md

Fix a typo.

* #1105 update error message when creating IVFSQ8H index without GPU resources (#1117)

* [skip ci] Update README (#1104)

* remove Nvidia owned files from faiss (#1136)

* #1135 remove Nvidia owned files from faiss

* Revert "#1135 remove Nvidia owned files from faiss"

This reverts commit 3bc007c28c.

* #1135 remove Nvidia API implementation

* #1135 remove Nvidia owned files from faiss

* Update CODE_OF_CONDUCT.md (#1163)

* Improve codecov (#1095)

* Optimize config test. Dir src/config 99% lines covered

* add unittest coverage

* optimize cache&config unittest

* code format

* format

* format code

* fix merge conflict

* cover src/utils unittest

* #831 fix exe_path judge error

* #831 fix exe_path judge error

* add some unittest coverage

* add some unittest coverage

* improve coverage of src/wrapper

* improve src/wrapper coverage

* *test optimize db/meta unittest

* fix bug

* *test optimize mysqlMetaImpl unittest

* *style: format code

* import server& scheduler unittest coverage

* handover next work

* *test: add some test_meta test case

* *format code

* *fix: fix typo

* feat(codecov): improve code coverage for src/db(#872)

* feat(codecov): improve code coverage for src/db/engine(#872)

* feat(codecov): improve code coverage(#872)

* fix config unittest bug

* feat(codecov): improve code coverage core/db/engine(#872)

* feat(codecov): improve code coverage core/knowhere

* feat(codecov): improve code coverage core/knowhere

* feat(codecov): improve code coverage

* feat(codecov): fix cpu test some error

* feat(codecov): improve code coverage

* feat(codecov): rename some fiu

* fix(db/meta): fix switch/case default action

* feat(codecov): improve code coverage(#872)
* fix error caused by merge code
* format code

* feat(codecov): improve code coverage & format code(#872)

* feat(codecov): fix test error(#872)

* feat(codecov): fix unittest test_mem(#872)

* feat(codecov): fix unittest(#872)

* feat(codecov): fix unittest for resource manager(#872)

* feat(codecov): code format (#872)

* feat(codecov): trigger ci(#872)

* fix(RequestScheduler): remove a wrong sleep statement

* test(test_rpc): fix rpc test

* Fix format issue

* Remove unused comments

* Fix unit test error

Co-authored-by: ABNER-1 <ABNER-1@users.noreply.github.com>
Co-authored-by: Jin Hai <hai.jin@zilliz.com>

* Support run dev test with http handler in python SDK (#1116)

* refactoring(create_table done)

* refactoring

* refactor server delivery (insert done)

* refactoring server module (count_table done)

* server refactor done

* cmake pass

* refactor server module done.

* set grpc response status correctly

* format done.

* fix redefine ErrorMap()

* optimize insert reducing ids data copy

* optimize grpc request with reducing data copy

* clang format

* [skip ci] Refactor server module done. update changelog. prepare for PR

* remove explicit and change int32_t to int64_t

* add web server

* [skip ci] add license in web module

* modify header include & comment oatpp environment config

* add port configure & create table in handler

* modify web url

* simple url compilation done & add swagger

* make sure web url

* web functionality done. debugging

* add web unittest

* web test pass

* add web server port

* add web server port in template

* update unittest cmake file

* change web server default port to 19121

* rename method in web module & unittest pass

* add search case in unittest for web module

* rename some variables

* fix bug

* unittest pass

* web prepare

* fix cmd bug(check server status)

* update changelog

* add web port validate & default set

* clang-format pass

* add web port test in unittest

* add CORS & redirect root to swagger ui

* add web status

* web table method func cascade test pass

* add config url in web module

* modify thirdparty cmake to avoid building oatpp test

* clang format

* update changelog

* add constants in web module

* reserve Config.cpp

* fix constants reference bug

* replace web server with async module

* modify component to support async

* format

* developing controller & add test client into unittest

* add web port into demo/server_config

* modify thirdparty cmake to allow build test

* remove  unnecessary comment

* add endpoint info in controller

* finish web test(bug here)

* clang format

* add web test cpp to lint exclusions

* check null field in GetConfig

* add macro RETURN STATUS DTo

* fix cmake conflict

* fix crash when exit server

* remove surplus comments & add http param check

* add uri /docs to direct swagger

* format

* change cmd to system

* add default value & unittest in web module

* add macros to judge if GPU supported

* add macros in unit & add default in index dto & print error message when bind http port fail

* format (fix #788)

* fix cors bug (not completed)

* comment cors

* change web framework to simple api

* comments optimize

* change to simple API

* remove comments in controller.hpp

* remove EP_COMMON_CMAKE_ARGS in oatpp and oatpp-swagger

* add ep cmake args to sqlite

* clang-format

* change a format

* test pass

* change name to

* fix compiler issue(oatpp-swagger depend on oatpp)

* add & in start_server.h

* specify lib location with oatpp and oatpp-swagger

* add comments

* add swagger definition

* [skip ci] change http method options status code

* remove oatpp swagger(fix #970)

* remove comments

* check Start web behavior

* add default to cpu_cache_capacity

* remove swagger component.hpp & /docs url

* remove /docs info

* remove /docs in unittest

* remove space in test rpc

* remove repeated info in CHANGELOG

* change cache_insert_data default value as a constant

* [skip ci] Fix some broken links (#960)

* [skip ci] Fix broken link

* [skip ci] Fix broken link

* [skip ci] Fix broken link

* [skip ci] Fix broken links

* fix issue 373 (#964)

* fix issue 373

* Adjustment format

* Adjustment format

* Adjustment format

* change readme

* #966 update NOTICE.md (#967)

* remove comments

* check Start web behavior

* add default to cpu_cache_capacity

* remove swagger component.hpp & /docs url

* remove /docs info

* remove /docs in unittest

* remove space in test rpc

* remove repeated info in CHANGELOG

* change cache_insert_data default value as a constant

* adjust web port config place

* rename web_port variable

* change gpu resources invoke way to cmd()

* set advanced config name add DEFAULT

* change config setting to cmd

* modify ..

* optimize code

* assign TableDto's count default value 0 (fix #995)

* check if table exists when show partitions (fix #1028)

* check table exists when drop partition (fix #1029)

* check if partition name is legal (fix #1022)

* modify status code when partition tag is illegal

* update changelog

* add info to /system url

* add binary index and add bin uri & handler method(not completed)

* optimize http insert and search time(fix #1066) | add binary vectors support(fix #1067)

* fix test partition bug

* fix test bug when check insert records

* add binary vectors test

* add default for offset and page_size

* fix unittest bug

* [skip ci] remove comments

* optimize web code for PR comments

* add new folder named utils

* check offset and pagesize (fix #1082)

* improve error message if offset or page_size is not legal (fix #1075)

* add log into web module

* update changelog

* check gpu sources setting when assign repeated value (fix #990)

* update changelog

* clang-format pass

* add default handler in http handler

* [skip ci] improve error msg when check gpu resources

* change check offset way

* remove func IsIntStr

* add case

* change int32 to int64 when check number str

* add log in web module (doing)

* update test case

* add log in web controller

* remove surplus dot

* add preload into /system/

* change get_milvus() to get_milvus(args['handler'])

* support load table into memory with http server (fix #1115)

* [skip ci] comment surplus dto in VectorDto

Co-authored-by: jielinxu <52057195+jielinxu@users.noreply.github.com>
Co-authored-by: JackLCL <53512883+JackLCL@users.noreply.github.com>
Co-authored-by: Cai Yudong <yudong.cai@zilliz.com>

* Fix #1140 (#1162)

* fix

Signed-off-by: Nicky <nicky.xj.lin@gmail.com>

* update...

Signed-off-by: Nicky <nicky.xj.lin@gmail.com>

* fix2

Signed-off-by: Nicky <nicky.xj.lin@gmail.com>

* fix3

Signed-off-by: Nicky <nicky.xj.lin@gmail.com>

* update changelog

Signed-off-by: Nicky <nicky.xj.lin@gmail.com>

* Update INSTALL.md (#1175)

* Update INSTALL.md

1. Change image tag and Milvus source code to latest.
2. Fix a typo

Signed-off-by: Lu Wang <yamasite@qq.com>

* Update INSTALL.md

Signed-off-by: lu.wang <yamasite@qq.com>

* add Tanimoto ground truth (#1138)

* add milvus ground truth

* add milvus groundtruth

* [skip ci] add milvus ground truth

* [skip ci] add tanimoto ground truth

* fix mix case bug (#1208)

* fix mix case bug

Signed-off-by: del.zhenwu <zhenxiang.li@zilliz.com>

* Remove case.md

Signed-off-by: del.zhenwu <zhenxiang.li@zilliz.com>

* Update README.md (#1206)

Add LFAI mailing lists.

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Add design.md to store links to design docs (#1219)

* Update README.md

Add link to Milvus design docs

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Create design.md

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Update design.md

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Add troubleshooting info about libmysqlpp.so.3 error (#1225)

* Update INSTALL.md

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Update INSTALL.md

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Update README.md (#1233)

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* #1240 Update license declaration of each file (#1241)

* #1240 Update license declaration of each files

Signed-off-by: jinhai <hai.jin@zilliz.com>

* #1240 Update CHANGELOG

Signed-off-by: jinhai <hai.jin@zilliz.com>

* Update README.md (#1258)

Add Jenkins master badge.

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Update INSTALL.md (#1265)

Fix indentation.

* support CPU profiling (#1251)

* #1250 support CPU profiling

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* #1250 fix code coverage

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* Fix HNSW crash (#1262)

* fix

Signed-off-by: xiaojun.lin <xiaojun.lin@zilliz.com>

* update.

Signed-off-by: xiaojun.lin <xiaojun.lin@zilliz.com>

* Add troubleshooting information for INSTALL.md and enhance readability (#1274)

* Update INSTALL.md

1. Add new troubleshooting message;
2. Enhance readability.

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Update INSTALL.md

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Update INSTALL.md

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Update INSTALL.md

Add CentOS link.

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* Create COMMUNITY.md (#1292)

Signed-off-by: Lutkin Wang <yamasite@qq.com>

* fix gtest

* add copyright

* fix gtest

* MERGE_NOT_YET

* fix lint

Co-authored-by: quicksilver <zhifeng.zhang@zilliz.com>
Co-authored-by: BossZou <40255591+BossZou@users.noreply.github.com>
Co-authored-by: jielinxu <52057195+jielinxu@users.noreply.github.com>
Co-authored-by: JackLCL <53512883+JackLCL@users.noreply.github.com>
Co-authored-by: Cai Yudong <yudong.cai@zilliz.com>
Co-authored-by: Tinkerrr <linxiaojun.cn@outlook.com>
Co-authored-by: del-zhenwu <56623710+del-zhenwu@users.noreply.github.com>
Co-authored-by: Lutkin Wang <yamasite@qq.com>
Co-authored-by: shengjh <46514371+shengjh@users.noreply.github.com>
Co-authored-by: ABNER-1 <ABNER-1@users.noreply.github.com>
Co-authored-by: Jin Hai <hai.jin@zilliz.com>
Co-authored-by: shiyu22 <cshiyu22@gmail.com>

* #1302 Get all record IDs in a segment by a given segment id

* Remove query time ranges

Signed-off-by: zhenwu <zw@zilliz.com>

* #1295 let wal enable by default

* fix cases

Signed-off-by: zhenwu <zw@zilliz.com>

* fix partition cases

Signed-off-by: zhenwu <zw@zilliz.com>

* [skip ci] update test_db

* update

* fix case bug

Signed-off-by: zhenwu <zw@zilliz.com>

* lint

* fix test case failures

* remove some code

* Caiyd crud 1 (#1377)

* fix clang format

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fix unittest build error

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fix build issue when enable profiling

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fix hashtable bug

* update bloom filter

* update

* benchmark

* update benchmark

* update

* update

* remove wal record size

Signed-off-by: shengjun.li <shengjun.li@zilliz.com>

* remove wal record size config

Signed-off-by: shengjun.li <shengjun.li@zilliz.com>

* update apply deletes: switch to binary search

* update sdk_simple

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update apply deletes: switch to binary search

* add test_search_by_id

Signed-off-by: zhenwu <zw@zilliz.com>

* add more log

* flush error with multiple identical ids

Signed-off-by: zhenwu <zw@zilliz.com>

* modify wal config

Signed-off-by: shengjun.li <shengjun.li@zilliz.com>

* update

* add binary search_by_id

* fix case bug

Signed-off-by: zhenwu <zw@zilliz.com>

* update cases

Signed-off-by: zhenwu <zw@zilliz.com>

* fix unit test #1395

* improve merge performance

* add uids_ for VectorIndex to improve search performance

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fix error

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update

* fix search

* fix record num

Signed-off-by: shengjun.li <shengjun.li@zilliz.com>

* refine code

* refine code

* Add get_vector_ids test cases (#1407)

* fix order

* add wal case

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix wal case

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix wal case

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix wal case

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix invalid operation issue

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix invalid operation issue

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix bug

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* fix bug

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* crud fix

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* crud fix

Signed-off-by: sahuang <xiaohaix@student.unimelb.edu.au>

* add table info test cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>
Signed-off-by: JinHai-CN <hai.jin@zilliz.com>

* add to compact case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* add to compact case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* add to compact case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* add case and debug compact

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* test pdb

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* test pdb

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* test pdb

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix cases

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update table_info case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update table_info case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update table_info case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update get vector ids case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update get vector ids case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update get vector ids case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update get vector ids case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* update case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* pdb test

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* pdb test

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* add tests for get_vector_ids

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix case

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* add binary and ip

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix binary index

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* fix pdb

Signed-off-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>

* #1408 fix incorrect search result after DeleteById

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* add one case

* delete failed segment

* update serialize

* update serialize

* fix case

Signed-off-by: zhenwu <zw@zilliz.com>

* update

* update case assertion

Signed-off-by: zhenwu <zw@zilliz.com>

* [skip ci] update config

* change bloom filter msync flag to async

* #1319 add more timing debug info

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* update

* update

* add normalize

Signed-off-by: zhenwu <zw@zilliz.com>

* add normalize

Signed-off-by: zhenwu <zw@zilliz.com>

* add normalize

Signed-off-by: zhenwu <zw@zilliz.com>

* Fix compiling error

Signed-off-by: jinhai <hai.jin@zilliz.com>

* support ip (#1383)

* support ip

Signed-off-by: xiaojun.lin <xiaojun.lin@zilliz.com>

* IP result distance sort by descend

Signed-off-by: Nicky <nicky.xj.lin@gmail.com>

* update

Signed-off-by: Nicky <nicky.xj.lin@gmail.com>

* format

Signed-off-by: xiaojun.lin <xiaojun.lin@zilliz.com>

* get table lsn

* Remove unused third party

Signed-off-by: jinhai <hai.jin@zilliz.com>

* Refine code

Signed-off-by: jinhai <hai.jin@zilliz.com>

* #1319 fix clang format

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* fix wal applied lsn

Signed-off-by: shengjun.li <shengjun.li@zilliz.com>

* validate partition tag

* #1319 improve search performance

Signed-off-by: yudong.cai <yudong.cai@zilliz.com>

* build error

Co-authored-by: Zhiru Zhu <youny626@hotmail.com>
Co-authored-by: groot <yihua.mo@zilliz.com>
Co-authored-by: Xiaohai Xu <xiaohaix@student.unimelb.edu.au>
Co-authored-by: shengjh <46514371+shengjh@users.noreply.github.com>
Co-authored-by: del-zhenwu <56623710+del-zhenwu@users.noreply.github.com>
Co-authored-by: shengjun.li <49774184+shengjun1985@users.noreply.github.com>
Co-authored-by: Cai Yudong <yudong.cai@zilliz.com>
Co-authored-by: quicksilver <zhifeng.zhang@zilliz.com>
Co-authored-by: BossZou <40255591+BossZou@users.noreply.github.com>
Co-authored-by: jielinxu <52057195+jielinxu@users.noreply.github.com>
Co-authored-by: JackLCL <53512883+JackLCL@users.noreply.github.com>
Co-authored-by: Tinkerrr <linxiaojun.cn@outlook.com>
Co-authored-by: Lutkin Wang <yamasite@qq.com>
Co-authored-by: ABNER-1 <ABNER-1@users.noreply.github.com>
Co-authored-by: shiyu22 <cshiyu22@gmail.com>
Authored by Jin Hai, 2020-02-29 16:11:31 +08:00; committed via GitHub (commit dab74700b2, parent 636f5c9cb6). 297 changed files with 35404 additions and 7332 deletions.

CHANGELOG.md:

@@ -22,9 +22,11 @@ Please mark all change in change log and use the issue from GitHub
- \#1075 - improve error message when page size or offset is illegal
- \#1082 - check page_size or offset value to avoid float
- \#1115 - http server support load table into memory
- \#1152 - Error log output continuously after server start
- \#1211 - Server down caused by searching with index_type: HNSW
- \#1240 - Update license declaration
- \#1298 - Unittest failed when on CPU2GPU case
- \#1359 - Negative distance value returned when searching with HNSW index type
## Feature
- \#216 - Add CLI to get server info
@@ -39,6 +41,8 @@ Please mark all change in change log and use the issue from GitHub
- \#823 - Support binary vector tanimoto/jaccard/hamming metric
- \#853 - Support HNSW
- \#910 - Change Milvus c++ standard to c++17
- \#1204 - Add api to get table data information
- \#1302 - Get all record IDs in a segment by a given segment id
## Improvement
- \#738 - Use Openblas / lapack from apt install
@@ -53,11 +57,14 @@ Please mark all change in change log and use the issue from GitHub
- \#1002 - Rename minio to s3 in Storage Config section
- \#1078 - Move 'insert_buffer_size' to Cache Config section
- \#1105 - Error message is not clear when creating IVFSQ8H index without gpu resources
- \#1297 - Hide partition_name parameter to avoid user directly accessing the partition table
- \#1310 - Add default partition tag for a table
- \#740, #849, #878, #972, #1033, #1161, #1173, #1199, #1190, #1223, #1222, #1257, #1264, #1269, #1164, #1303, #1304, #1324, #1388 - Various fixes and improvements for Milvus documentation.
- \#1234 - Do S3 server validation check when Milvus startup
- \#1263 - Allow system conf modifiable and some take effect directly
- \#1320 - Remove debug logging from faiss
## Task
- \#1327 - Exclude third-party code from codebeat
- \#1331 - Exclude third-party code from codacy

NOTICE.md:

@@ -6,7 +6,6 @@
| Name | License |
| ------------- | ------------------------------------------------------------ |
| Apache Arrow | [Apache License 2.0](https://github.com/apache/arrow/blob/master/LICENSE.txt) |
| Boost | [Boost Software License](https://github.com/boostorg/boost/blob/master/LICENSE_1_0.txt) |
| FAISS | [MIT](https://github.com/facebookresearch/faiss/blob/master/LICENSE) |
| Gtest | [BSD 3-Clause](https://github.com/google/googletest/blob/master/LICENSE) |

[Jenkins CI test step script]

@@ -1,6 +1,7 @@
timeout(time: 90, unit: 'MINUTES') {
dir ("tests/milvus_python_test") {
sh 'python3 -m pip install -r requirements.txt -i http://pypi.douban.com/simple --trusted-host pypi.douban.com'
// sh 'python3 -m pip install -r requirements.txt -i http://pypi.douban.com/simple --trusted-host pypi.douban.com'
sh 'python3 -m pip install -r requirements.txt'
sh "pytest . --alluredir=\"test_out/dev/single/sqlite\" --ip ${env.HELM_RELEASE_NAME}.milvus.svc.cluster.local"
}
// mysql database backend test

[Jenkins CI test step script]

@@ -1,6 +1,7 @@
timeout(time: 60, unit: 'MINUTES') {
dir ("tests/milvus_python_test") {
sh 'python3 -m pip install -r requirements.txt -i http://pypi.douban.com/simple --trusted-host pypi.douban.com'
// sh 'python3 -m pip install -r requirements.txt -i http://pypi.douban.com/simple --trusted-host pypi.douban.com'
sh 'python3 -m pip install -r requirements.txt'
sh "pytest . --alluredir=\"test_out/dev/single/sqlite\" --level=1 --ip ${env.HELM_RELEASE_NAME}.milvus.svc.cluster.local"
}

[CMakeLists.txt]

@@ -90,7 +90,7 @@ if (MILVUS_VERSION_MAJOR STREQUAL ""
OR MILVUS_VERSION_MINOR STREQUAL ""
OR MILVUS_VERSION_PATCH STREQUAL "")
message(WARNING "Failed to determine Milvus version from git branch name")
set(MILVUS_VERSION "0.6.0")
set(MILVUS_VERSION "0.7.0")
endif ()
message(STATUS "Build version = ${MILVUS_VERSION}")

[server_config.yaml template]

@@ -106,9 +106,7 @@ metric_config:
# | The sum of 'insert_buffer_size' and 'cpu_cache_capacity' | | |
# | must be less than system memory size. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
# cache_insert_data | Whether to load inserted data into cache immediately for | Boolean | false |
# | hot query. If want to simultaneously insert and query | | |
# | vectors, it's recommended to enable this config. | | |
# cache_insert_data | Whether to load data to cache for hot query | Boolean | false |
#----------------------+------------------------------------------------------------+------------+-----------------+
cache_config:
cpu_cache_capacity: 4

[server_config.yaml template]

@@ -46,9 +46,12 @@ server_config:
# | loaded when Milvus server starts up. | | |
# | '*' means preload all existing tables. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
# auto_flush_interval | Interval of auto flush. Unit is millisecond. | Integer | 1000 |
#----------------------+------------------------------------------------------------+------------+-----------------+
db_config:
backend_url: sqlite://:@:/
preload_table:
auto_flush_interval: 1000
#----------------------+------------------------------------------------------------+------------+-----------------+
# Storage Config | Description | Type | Default |
@@ -106,9 +109,7 @@ metric_config:
# | The sum of 'insert_buffer_size' and 'cpu_cache_capacity' | | |
# | must be less than system memory size. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
# cache_insert_data | Whether to load inserted data into cache immediately for | Boolean | false |
# | hot query. If want to simultaneously insert and query | | |
# | vectors, it's recommended to enable this config. | | |
# cache_insert_data | Whether to load data to cache for hot query | Boolean | false |
#----------------------+------------------------------------------------------------+------------+-----------------+
cache_config:
cpu_cache_capacity: 4
@@ -167,3 +168,24 @@ gpu_resource_config:
#----------------------+------------------------------------------------------------+------------+-----------------+
tracing_config:
json_config_path:
#----------------------+------------------------------------------------------------+------------+-----------------+
# Wal Config | Description | Type | Default |
#----------------------+------------------------------------------------------------+------------+-----------------+
# enable | Switch of function wal. | Boolean | false |
#----------------------+------------------------------------------------------------+------------+-----------------+
# recovery_error_ignore| Whether ignore the error which happens during wal recovery | Boolean | true |
# | stage. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
# buffer_size | The size of the wal buffer. Unit is MB. | Integer | 256 |
# | It should be in range [64, 4096]. If the value set out of | | |
# | the range, the system will use the boundary value. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
# wal_path | The root path of wal relative files, include wal meta | String | NULL |
# | files. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
wal_config:
enable: true
recovery_error_ignore: true
buffer_size: 256 # MB
wal_path: /tmp/milvus/wal
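
The table above says an out-of-range buffer_size falls back to the boundary value. A minimal C++ sketch of that clamping rule (the constant and function names are hypothetical, for illustration only):

#include <algorithm>
#include <cstdint>

// Hypothetical helper illustrating the documented rule: buffer_size must stay
// in [64, 4096] MB, and out-of-range values are replaced by the boundary value.
constexpr int64_t kWalBufferMinMB = 64;
constexpr int64_t kWalBufferMaxMB = 4096;

int64_t ClampWalBufferSizeMB(int64_t requested_mb) {
    // std::clamp (C++17, which Milvus targets per #910) returns the nearest
    // boundary when requested_mb lies outside the range.
    return std::clamp(requested_mb, kWalBufferMinMB, kWalBufferMaxMB);
}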

[server_config.yaml template]

@@ -106,9 +106,7 @@ metric_config:
# | The sum of 'insert_buffer_size' and 'cpu_cache_capacity' | | |
# | must be less than system memory size. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
# cache_insert_data | Whether to load inserted data into cache immediately for | Boolean | false |
# | hot query. If want to simultaneously insert and query | | |
# | vectors, it's recommended to enable this config. | | |
# cache_insert_data | Whether to load data to cache for hot query | Boolean | false |
#----------------------+------------------------------------------------------------+------------+-----------------+
cache_config:
cpu_cache_capacity: 4
@@ -167,3 +165,24 @@ gpu_resource_config:
#----------------------+------------------------------------------------------------+------------+-----------------+
tracing_config:
json_config_path:
#----------------------+------------------------------------------------------------+------------+-----------------+
# Wal Config | Description | Type | Default |
#----------------------+------------------------------------------------------------+------------+-----------------+
# enable | Switch of function wal. | Boolean | false |
#----------------------+------------------------------------------------------------+------------+-----------------+
# recovery_error_ignore| Whether ignore the error which happens during wal recovery | Boolean | true |
# | stage. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
# buffer_size | The size of the wal buffer. Unit is MB. | Integer | 256 |
# | It should be in range [64, 4096]. If the value set out of | | |
# | the range, the system will use the boundary value. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
# wal_path | The root path of wal relative files, include wal meta | String | NULL |
# | files. | | |
#----------------------+------------------------------------------------------------+------------+-----------------+
wal_config:
enable: true
recovery_error_ignore: true
buffer_size: 256 # MB
wal_path: /tmp/milvus/wal

[core/src/CMakeLists.txt]

@@ -36,6 +36,7 @@ aux_source_directory(${MILVUS_ENGINE_SRC}/db db_main_files)
aux_source_directory(${MILVUS_ENGINE_SRC}/db/engine db_engine_files)
aux_source_directory(${MILVUS_ENGINE_SRC}/db/insert db_insert_files)
aux_source_directory(${MILVUS_ENGINE_SRC}/db/meta db_meta_files)
aux_source_directory(${MILVUS_ENGINE_SRC}/db/wal db_wal_files)
set(grpc_service_files
${MILVUS_ENGINE_SRC}/grpc/gen-milvus/milvus.grpc.pb.cc
@@ -65,9 +66,11 @@ set(scheduler_files
aux_source_directory(${MILVUS_THIRDPARTY_SRC}/easyloggingpp thirdparty_easyloggingpp_files)
aux_source_directory(${MILVUS_THIRDPARTY_SRC}/nlohmann thirdparty_nlohmann_files)
aux_source_directory(${MILVUS_THIRDPARTY_SRC}/dablooms thirdparty_dablooms_files)
set(thirdparty_files
${thirdparty_easyloggingpp_files}
${thirdparty_nlohmann_files}
${thirdparty_dablooms_files}
)
aux_source_directory(${MILVUS_ENGINE_SRC}/server server_service_files)
@@ -113,10 +116,18 @@ set(storage_files
)
aux_source_directory(${MILVUS_ENGINE_SRC}/utils utils_files)
aux_source_directory(${MILVUS_ENGINE_SRC}/wrapper wrapper_files)
aux_source_directory(${MILVUS_ENGINE_SRC}/tracing tracing_files)
aux_source_directory(${MILVUS_ENGINE_SRC}/codecs codecs_files)
aux_source_directory(${MILVUS_ENGINE_SRC}/codecs/default codecs_default_files)
aux_source_directory(${MILVUS_ENGINE_SRC}/segment segment_files)
aux_source_directory(${MILVUS_ENGINE_SRC}/store store_files)
set(engine_files
${CMAKE_CURRENT_SOURCE_DIR}/main.cpp
${cache_files}
@@ -124,11 +135,16 @@ set(engine_files
${db_engine_files}
${db_insert_files}
${db_meta_files}
${db_wal_files}
${metrics_files}
${storage_files}
${thirdparty_files}
${utils_files}
${wrapper_files}
${codecs_files}
${codecs_default_files}
${segment_files}
${store_files}
)
if (MILVUS_WITH_PROMETHEUS)

core/src/codecs/AttrsFormat.h (new file):

@@ -0,0 +1,33 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
namespace milvus {
namespace codec {
class AttrsFormat {
// public:
// virtual Attrs
// read() = 0;
//
// virtual void
// write(Attrs attrs) = 0;
};
} // namespace codec
} // namespace milvus

core/src/codecs/AttrsIndexFormat.h (new file):

@@ -0,0 +1,33 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
namespace milvus {
namespace codec {
class AttrsIndexFormat {
// public:
// virtual AttrsIndex
// read() = 0;
//
// virtual void
// write(AttrsIndex attrs_index) = 0;
};
} // namespace codec
} // namespace milvus

core/src/codecs/Codec.h (new file, 60 lines):

@@ -0,0 +1,60 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
#include "AttrsFormat.h"
#include "AttrsIndexFormat.h"
#include "DeletedDocsFormat.h"
#include "IdBloomFilterFormat.h"
#include "IdIndexFormat.h"
#include "VectorsFormat.h"
#include "VectorsIndexFormat.h"
namespace milvus {
namespace codec {
class Codec {
public:
virtual VectorsFormatPtr
GetVectorsFormat() = 0;
virtual DeletedDocsFormatPtr
GetDeletedDocsFormat() = 0;
virtual IdBloomFilterFormatPtr
GetIdBloomFilterFormat() = 0;
// TODO(zhiru)
/*
virtual AttrsFormat
GetAttrsFormat() = 0;
virtual VectorsIndexFormat
GetVectorsIndexFormat() = 0;
virtual AttrsIndexFormat
GetAttrsIndexFormat() = 0;
virtual IdIndexFormat
GetIdIndexFormat() = 0;
*/
};
} // namespace codec
} // namespace milvus

core/src/codecs/DeletedDocsFormat.h (new file):

@@ -0,0 +1,40 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
#include <memory>
#include "segment/DeletedDocs.h"
#include "store/Directory.h"
namespace milvus {
namespace codec {
class DeletedDocsFormat {
public:
virtual void
read(const store::DirectoryPtr& directory_ptr, segment::DeletedDocsPtr& deleted_docs) = 0;
virtual void
write(const store::DirectoryPtr& directory_ptr, const segment::DeletedDocsPtr& deleted_docs) = 0;
};
using DeletedDocsFormatPtr = std::shared_ptr<DeletedDocsFormat>;
} // namespace codec
} // namespace milvus

core/src/codecs/IdBloomFilterFormat.h (new file):

@@ -0,0 +1,43 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
#include <memory>
#include "segment/IdBloomFilter.h"
#include "store/Directory.h"
namespace milvus {
namespace codec {
class IdBloomFilterFormat {
public:
virtual void
read(const store::DirectoryPtr& directory_ptr, segment::IdBloomFilterPtr& id_bloom_filter_ptr) = 0;
virtual void
write(const store::DirectoryPtr& directory_ptr, const segment::IdBloomFilterPtr& id_bloom_filter_ptr) = 0;
virtual void
create(const store::DirectoryPtr& directory_ptr, segment::IdBloomFilterPtr& id_bloom_filter_ptr) = 0;
};
using IdBloomFilterFormatPtr = std::shared_ptr<IdBloomFilterFormat>;
} // namespace codec
} // namespace milvus
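
Several commits above ("update bloom filter", "update apply deletes: switch to binary search") describe how deletes are resolved per segment. A self-contained sketch of that two-step check, with a toy bloom filter standing in for the real IdBloomFilter (all names here are illustrative assumptions, not the actual Milvus API):

#include <algorithm>
#include <array>
#include <cstdint>
#include <functional>
#include <vector>

using doc_id_t = int64_t;

// Toy stand-in for IdBloomFilter: two hash probes into a fixed bit array.
// false means "definitely absent"; true may be a false positive.
struct ToyBloomFilter {
    std::vector<bool> bits_ = std::vector<bool>(1 << 16);

    static std::array<size_t, 2> Probes(doc_id_t id) {
        size_t h1 = std::hash<doc_id_t>{}(id);
        size_t h2 = h1 * 0x9e3779b97f4a7c15ULL;  // cheap second hash via mixing
        return {h1 & 0xFFFF, (h2 >> 16) & 0xFFFF};
    }
    void Add(doc_id_t id) {
        for (size_t p : Probes(id)) bits_[p] = true;
    }
    bool Check(doc_id_t id) const {
        for (size_t p : Probes(id)) {
            if (!bits_[p]) return false;
        }
        return true;
    }
};

// Apply-delete check: cheap bloom-filter test first, then an exact binary
// search over the segment's sorted uid list to reject false positives,
// matching the "switch to binary search" change in the commit log.
bool SegmentContainsId(const ToyBloomFilter& filter,
                       const std::vector<doc_id_t>& sorted_uids, doc_id_t id) {
    if (!filter.Check(id)) {
        return false;  // id is certainly not in this segment
    }
    return std::binary_search(sorted_uids.begin(), sorted_uids.end(), id);
}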

core/src/codecs/IdIndexFormat.h (new file):

@@ -0,0 +1,33 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
namespace milvus {
namespace codec {
class IdIndexFormat {
// public:
// virtual IdIndex
// read() = 0;
//
// virtual void
// write(IdIndex id_index) = 0;
};
} // namespace codec
} // namespace milvus

core/src/codecs/VectorsFormat.h (new file):

@@ -0,0 +1,48 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
#include <memory>
#include <vector>
#include "segment/Vectors.h"
#include "store/Directory.h"
namespace milvus {
namespace codec {
class VectorsFormat {
public:
virtual void
read(const store::DirectoryPtr& directory_ptr, segment::VectorsPtr& vectors_read) = 0;
virtual void
write(const store::DirectoryPtr& directory_ptr, const segment::VectorsPtr& vectors) = 0;
virtual void
read_uids(const store::DirectoryPtr& directory_ptr, std::vector<segment::doc_id_t>& uids) = 0;
virtual void
read_vectors(const store::DirectoryPtr& directory_ptr, off_t offset, size_t num_bytes,
std::vector<uint8_t>& raw_vectors) = 0;
};
using VectorsFormatPtr = std::shared_ptr<VectorsFormat>;
} // namespace codec
} // namespace milvus

View File

@ -0,0 +1,33 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
namespace milvus {
namespace codec {
class VectorsIndexFormat {
// public:
// virtual VectorsIndex
// read() = 0;
//
// virtual void
// write(VectorsIndex vectors_index) = 0;
};
} // namespace codec
} // namespace milvus

View File

@ -0,0 +1,51 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "codecs/default/DefaultCodec.h"
#include <memory>
#include "DefaultDeletedDocsFormat.h"
#include "DefaultIdBloomFilterFormat.h"
#include "DefaultVectorsFormat.h"
namespace milvus {
namespace codec {
DefaultCodec::DefaultCodec() {
vectors_format_ptr_ = std::make_shared<DefaultVectorsFormat>();
deleted_docs_format_ptr_ = std::make_shared<DefaultDeletedDocsFormat>();
id_bloom_filter_format_ptr_ = std::make_shared<DefaultIdBloomFilterFormat>();
}
VectorsFormatPtr
DefaultCodec::GetVectorsFormat() {
return vectors_format_ptr_;
}
DeletedDocsFormatPtr
DefaultCodec::GetDeletedDocsFormat() {
return deleted_docs_format_ptr_;
}
IdBloomFilterFormatPtr
DefaultCodec::GetIdBloomFilterFormat() {
return id_bloom_filter_format_ptr_;
}
} // namespace codec
} // namespace milvus
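
DefaultCodec wires the three default formats together, so a caller only needs the codec plus a directory handle. A minimal usage sketch, assuming a store::DirectoryPtr for an existing segment directory and that segment::Vectors is default-constructible (both assumptions, not shown in this diff):

#include <memory>

#include "codecs/default/DefaultCodec.h"
#include "segment/Vectors.h"
#include "store/Directory.h"

void
ReadSegment(const milvus::store::DirectoryPtr& dir) {
    milvus::codec::DefaultCodec codec;

    // Load raw vectors and user ids (.rv / .uid files) from the segment.
    auto vectors = std::make_shared<milvus::segment::Vectors>();
    codec.GetVectorsFormat()->read(dir, vectors);

    // Load the per-segment delete list; read() allocates the object itself.
    milvus::segment::DeletedDocsPtr deleted_docs;
    codec.GetDeletedDocsFormat()->read(dir, deleted_docs);
}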

View File

@ -0,0 +1,45 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
#include "codecs/Codec.h"
namespace milvus {
namespace codec {
class DefaultCodec : public Codec {
public:
DefaultCodec();
VectorsFormatPtr
GetVectorsFormat() override;
DeletedDocsFormatPtr
GetDeletedDocsFormat() override;
IdBloomFilterFormatPtr
GetIdBloomFilterFormat() override;
private:
VectorsFormatPtr vectors_format_ptr_;
DeletedDocsFormatPtr deleted_docs_format_ptr_;
IdBloomFilterFormatPtr id_bloom_filter_format_ptr_;
};
} // namespace codec
} // namespace milvus

View File

@ -0,0 +1,73 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "codecs/default/DefaultDeletedDocsFormat.h"
#include <boost/filesystem.hpp>
#include <memory>
#include <string>
#include <vector>
#include "segment/Types.h"
#include "utils/Exception.h"
#include "utils/Log.h"
namespace milvus {
namespace codec {
void
DefaultDeletedDocsFormat::read(const store::DirectoryPtr& directory_ptr, segment::DeletedDocsPtr& deleted_docs) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = directory_ptr->GetDirPath();
const std::string del_file_path = dir_path + "/" + deleted_docs_filename_;
FILE* del_file = fopen(del_file_path.c_str(), "rb");
if (del_file == nullptr) {
std::string err_msg = "Failed to open file: " + del_file_path;
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
auto file_size = boost::filesystem::file_size(boost::filesystem::path(del_file_path));
auto deleted_docs_size = file_size / sizeof(segment::offset_t);
std::vector<segment::offset_t> deleted_docs_list;
deleted_docs_list.resize(deleted_docs_size);
size_t num_read = fread((void*)(deleted_docs_list.data()), sizeof(segment::offset_t), deleted_docs_size, del_file);
if (num_read != deleted_docs_size) {
std::string err_msg = "Failed to read from file: " + del_file_path;
ENGINE_LOG_ERROR << err_msg;
fclose(del_file);
throw Exception(SERVER_UNEXPECTED_ERROR, err_msg);
}
deleted_docs = std::make_shared<segment::DeletedDocs>(deleted_docs_list);
fclose(del_file);
}
void
DefaultDeletedDocsFormat::write(const store::DirectoryPtr& directory_ptr, const segment::DeletedDocsPtr& deleted_docs) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = directory_ptr->GetDirPath();
const std::string del_file_path = dir_path + "/" + deleted_docs_filename_;
FILE* del_file = fopen(del_file_path.c_str(), "ab"); // TODO(zhiru): append mode
if (del_file == nullptr) {
std::string err_msg = "Failed to open file: " + del_file_path;
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
auto deleted_docs_list = deleted_docs->GetDeletedDocs();
fwrite((void*)(deleted_docs_list.data()), sizeof(segment::offset_t), deleted_docs->GetSize(), del_file);
fclose(del_file);
}
} // namespace codec
} // namespace milvus
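
The layout read and written above is deliberately simple: the deleted_docs file is a flat, append-only array of segment::offset_t values, one per deleted row. A standalone sketch of a reader for that layout, assuming offset_t is a 32-bit integer (the actual width is defined in segment/Types.h, not shown here):

#include <cstdint>
#include <cstdio>
#include <vector>

std::vector<int32_t>
LoadDeletedOffsets(const char* path) {
    std::vector<int32_t> offsets;
    FILE* f = std::fopen(path, "rb");
    if (f == nullptr) {
        return offsets;  // no file yet means no deletes
    }
    std::fseek(f, 0, SEEK_END);
    long size = std::ftell(f);
    std::fseek(f, 0, SEEK_SET);
    offsets.resize(static_cast<size_t>(size) / sizeof(int32_t));
    size_t n = std::fread(offsets.data(), sizeof(int32_t), offsets.size(), f);
    offsets.resize(n);  // keep only fully-read entries
    std::fclose(f);
    return offsets;
}

Because write() opens the file in append mode, repeated flushes simply extend this array; a reader like the one above sees the union of all flushed delete batches.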

View File

@ -0,0 +1,54 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
#include <mutex>
#include <string>
#include "codecs/DeletedDocsFormat.h"
namespace milvus {
namespace codec {
class DefaultDeletedDocsFormat : public DeletedDocsFormat {
public:
DefaultDeletedDocsFormat() = default;
void
read(const store::DirectoryPtr& directory_ptr, segment::DeletedDocsPtr& deleted_docs) override;
void
write(const store::DirectoryPtr& directory_ptr, const segment::DeletedDocsPtr& deleted_docs) override;
// No copy and move
DefaultDeletedDocsFormat(const DefaultDeletedDocsFormat&) = delete;
DefaultDeletedDocsFormat(DefaultDeletedDocsFormat&&) = delete;
DefaultDeletedDocsFormat&
operator=(const DefaultDeletedDocsFormat&) = delete;
DefaultDeletedDocsFormat&
operator=(DefaultDeletedDocsFormat&&) = delete;
private:
std::mutex mutex_;
const std::string deleted_docs_filename_ = "deleted_docs";
};
} // namespace codec
} // namespace milvus

View File

@ -0,0 +1,82 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "codecs/default/DefaultIdBloomFilterFormat.h"
#include <cstring>
#include <memory>
#include <string>
#include "utils/Exception.h"
#include "utils/Log.h"
namespace milvus {
namespace codec {
constexpr unsigned int bloom_filter_capacity = 500000;
constexpr double bloom_filter_error_rate = 0.01;
void
DefaultIdBloomFilterFormat::read(const store::DirectoryPtr& directory_ptr,
segment::IdBloomFilterPtr& id_bloom_filter_ptr) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = directory_ptr->GetDirPath();
const std::string bloom_filter_file_path = dir_path + "/" + bloom_filter_filename_;
scaling_bloom_t* bloom_filter =
new_scaling_bloom_from_file(bloom_filter_capacity, bloom_filter_error_rate, bloom_filter_file_path.c_str());
if (bloom_filter == nullptr) {
std::string err_msg =
"Failed to read bloom filter from file: " + bloom_filter_file_path + ". " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_UNEXPECTED_ERROR, err_msg);
}
id_bloom_filter_ptr = std::make_shared<segment::IdBloomFilter>(bloom_filter);
}
void
DefaultIdBloomFilterFormat::write(const store::DirectoryPtr& directory_ptr,
const segment::IdBloomFilterPtr& id_bloom_filter_ptr) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = directory_ptr->GetDirPath();
const std::string bloom_filter_file_path = dir_path + "/" + bloom_filter_filename_;
if (scaling_bloom_flush(id_bloom_filter_ptr->GetBloomFilter()) == -1) {
std::string err_msg =
"Failed to write bloom filter to file: " + bloom_filter_file_path + ". " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_UNEXPECTED_ERROR, err_msg);
}
}
void
DefaultIdBloomFilterFormat::create(const store::DirectoryPtr& directory_ptr,
segment::IdBloomFilterPtr& id_bloom_filter_ptr) {
std::string dir_path = directory_ptr->GetDirPath();
const std::string bloom_filter_file_path = dir_path + "/" + bloom_filter_filename_;
scaling_bloom_t* bloom_filter =
new_scaling_bloom(bloom_filter_capacity, bloom_filter_error_rate, bloom_filter_file_path.c_str());
if (bloom_filter == nullptr) {
std::string err_msg =
"Failed to create bloom filter file: " + bloom_filter_file_path + ". " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_UNEXPECTED_ERROR, err_msg);
}
id_bloom_filter_ptr = std::make_shared<segment::IdBloomFilter>(bloom_filter);
}
} // namespace codec
} // namespace milvus
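
The new_scaling_bloom* and scaling_bloom_flush calls come from the dablooms C library, which backs segment::IdBloomFilter. A hedged sketch of that API in isolation, using the same capacity and error rate as the codec above (the file path and key are illustrative, and the exact signatures should be checked against the vendored header):

#include <cstring>

#include "dablooms.h"  // assumed include path for the vendored library

void
BloomSketch() {
    scaling_bloom_t* bloom = new_scaling_bloom(500000, 0.01, "/tmp/demo_bloom");

    const char* key = "42";  // doc ids are added as byte strings
    scaling_bloom_add(bloom, key, std::strlen(key), /*id=*/0);

    if (scaling_bloom_check(bloom, key, std::strlen(key))) {
        // "possibly present": bloom filters admit false positives,
        // so a hit still has to be confirmed against the uid data
    }

    scaling_bloom_flush(bloom);  // persist the mmap-backed file
    free_scaling_bloom(bloom);
}

Note that create() above does not take mutex_, unlike read() and write(); callers are presumably expected to create the filter before the segment becomes visible to concurrent readers.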

View File

@ -0,0 +1,59 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
#include <mutex>
#include <string>
#include "codecs/IdBloomFilterFormat.h"
#include "segment/IdBloomFilter.h"
#include "store/Directory.h"
namespace milvus {
namespace codec {
class DefaultIdBloomFilterFormat : public IdBloomFilterFormat {
public:
DefaultIdBloomFilterFormat() = default;
void
read(const store::DirectoryPtr& directory_ptr, segment::IdBloomFilterPtr& id_bloom_filter_ptr) override;
void
write(const store::DirectoryPtr& directory_ptr, const segment::IdBloomFilterPtr& id_bloom_filter_ptr) override;
void
create(const store::DirectoryPtr& directory_ptr, segment::IdBloomFilterPtr& id_bloom_filter_ptr) override;
// No copy and move
DefaultIdBloomFilterFormat(const DefaultIdBloomFilterFormat&) = delete;
DefaultIdBloomFilterFormat(DefaultIdBloomFilterFormat&&) = delete;
DefaultIdBloomFilterFormat&
operator=(const DefaultIdBloomFilterFormat&) = delete;
DefaultIdBloomFilterFormat&
operator=(DefaultIdBloomFilterFormat&&) = delete;
private:
std::mutex mutex_;
const std::string bloom_filter_filename_ = "bloom_filter";
};
} // namespace codec
} // namespace milvus

View File

@ -0,0 +1,263 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#include "codecs/default/DefaultVectorsFormat.h"
#include <fcntl.h>
#include <unistd.h>
#include <boost/filesystem.hpp>
#include <cstring>
#include <string>
#include <vector>
#include "utils/Exception.h"
#include "utils/Log.h"
namespace milvus {
namespace codec {
void
DefaultVectorsFormat::read(const store::DirectoryPtr& directory_ptr, segment::VectorsPtr& vectors_read) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = directory_ptr->GetDirPath();
if (!boost::filesystem::is_directory(dir_path)) {
std::string err_msg = "Directory: " + dir_path + "does not exist";
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_INVALID_ARGUMENT, err_msg);
}
boost::filesystem::path target_path(dir_path);
typedef boost::filesystem::directory_iterator d_it;
d_it it_end;
d_it it(target_path);
// for (auto& it : boost::filesystem::directory_iterator(dir_path)) {
for (; it != it_end; ++it) {
const auto& path = it->path();
if (path.extension().string() == raw_vector_extension_) {
int rv_fd = open(path.c_str(), O_RDONLY, 00664);
if (rv_fd == -1) {
std::string err_msg = "Failed to open file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
size_t num_bytes = boost::filesystem::file_size(path);
std::vector<uint8_t> vector_list;
vector_list.resize(num_bytes);
if (::read(rv_fd, vector_list.data(), num_bytes) == -1) {
std::string err_msg = "Failed to read from file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
vectors_read->AddData(vector_list);
vectors_read->SetName(path.stem().string());
if (::close(rv_fd) == -1) {
std::string err_msg = "Failed to close file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
}
if (path.extension().string() == user_id_extension_) {
int uid_fd = open(path.c_str(), O_RDONLY, 00664);
if (uid_fd == -1) {
std::string err_msg = "Failed to open file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
auto file_size = boost::filesystem::file_size(path);
auto count = file_size / sizeof(segment::doc_id_t);
std::vector<segment::doc_id_t> uids;
uids.resize(count);
if (::read(uid_fd, uids.data(), file_size) == -1) {
std::string err_msg = "Failed to read from file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
vectors_read->AddUids(uids);
if (::close(uid_fd) == -1) {
std::string err_msg = "Failed to close file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
}
}
}
void
DefaultVectorsFormat::write(const store::DirectoryPtr& directory_ptr, const segment::VectorsPtr& vectors) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = directory_ptr->GetDirPath();
const std::string rv_file_path = dir_path + "/" + vectors->GetName() + raw_vector_extension_;
const std::string uid_file_path = dir_path + "/" + vectors->GetName() + user_id_extension_;
/*
FILE* rv_file = fopen(rv_file_path.c_str(), "wb");
if (rv_file == nullptr) {
std::string err_msg = "Failed to open file: " + rv_file_path;
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
fwrite((void*)(it.second->GetData()), sizeof(char), it.second->GetNumBytes(), rv_file);
fclose(rv_file);
FILE* uid_file = fopen(uid_file_path.c_str(), "wb");
if (uid_file == nullptr) {
std::string err_msg = "Failed to open file: " + uid_file_path;
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
fwrite((void*)(it.second->GetUids()), sizeof it.second->GetUids()[0], it.second->GetCount(), uid_file);
fclose(rv_file);
*/
int rv_fd = open(rv_file_path.c_str(), O_WRONLY | O_TRUNC | O_CREAT, 00664);
if (rv_fd == -1) {
std::string err_msg = "Failed to open file: " + rv_file_path + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
int uid_fd = open(uid_file_path.c_str(), O_WRONLY | O_TRUNC | O_CREAT, 00664);
if (uid_fd == -1) {
std::string err_msg = "Failed to open file: " + uid_file_path + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
if (::write(rv_fd, vectors->GetData().data(), vectors->GetData().size()) == -1) {
std::string err_msg = "Failed to write to file" + rv_file_path + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
if (::close(rv_fd) == -1) {
std::string err_msg = "Failed to close file: " + rv_file_path + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
if (::write(uid_fd, vectors->GetUids().data(), sizeof(segment::doc_id_t) * vectors->GetCount()) == -1) {
std::string err_msg = "Failed to write to file" + uid_file_path + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
if (::close(uid_fd) == -1) {
std::string err_msg = "Failed to close file: " + uid_file_path + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
}
void
DefaultVectorsFormat::read_uids(const store::DirectoryPtr& directory_ptr, std::vector<segment::doc_id_t>& uids) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = directory_ptr->GetDirPath();
if (!boost::filesystem::is_directory(dir_path)) {
std::string err_msg = "Directory: " + dir_path + "does not exist";
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_INVALID_ARGUMENT, err_msg);
}
boost::filesystem::path target_path(dir_path);
typedef boost::filesystem::directory_iterator d_it;
d_it it_end;
d_it it(target_path);
// for (auto& it : boost::filesystem::directory_iterator(dir_path)) {
for (; it != it_end; ++it) {
const auto& path = it->path();
if (path.extension().string() == user_id_extension_) {
int uid_fd = open(path.c_str(), O_RDONLY, 00664);
if (uid_fd == -1) {
std::string err_msg = "Failed to open file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
auto file_size = boost::filesystem::file_size(path);
auto count = file_size / sizeof(segment::doc_id_t);
uids.resize(count);
if (::read(uid_fd, uids.data(), file_size) == -1) {
std::string err_msg = "Failed to read from file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
if (::close(uid_fd) == -1) {
std::string err_msg = "Failed to close file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
}
}
}
void
DefaultVectorsFormat::read_vectors(const store::DirectoryPtr& directory_ptr, off_t offset, size_t num_bytes,
std::vector<uint8_t>& raw_vectors) {
const std::lock_guard<std::mutex> lock(mutex_);
std::string dir_path = directory_ptr->GetDirPath();
if (!boost::filesystem::is_directory(dir_path)) {
std::string err_msg = "Directory: " + dir_path + "does not exist";
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_INVALID_ARGUMENT, err_msg);
}
boost::filesystem::path target_path(dir_path);
typedef boost::filesystem::directory_iterator d_it;
d_it it_end;
d_it it(target_path);
// for (auto& it : boost::filesystem::directory_iterator(dir_path)) {
for (; it != it_end; ++it) {
const auto& path = it->path();
if (path.extension().string() == raw_vector_extension_) {
int rv_fd = open(path.c_str(), O_RDONLY, 00664);
if (rv_fd == -1) {
std::string err_msg = "Failed to open file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_CANNOT_CREATE_FILE, err_msg);
}
off_t off = lseek(rv_fd, offset, SEEK_SET);
if (off == -1) {
std::string err_msg = "Failed to seek file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
raw_vectors.resize(num_bytes);
if (::read(rv_fd, raw_vectors.data(), num_bytes) == -1) {
std::string err_msg = "Failed to read from file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
if (::close(rv_fd) == -1) {
std::string err_msg = "Failed to close file: " + path.string() + ", error: " + std::strerror(errno);
ENGINE_LOG_ERROR << err_msg;
throw Exception(SERVER_WRITE_ERROR, err_msg);
}
}
}
}
} // namespace codec
} // namespace milvus
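
read_vectors() above addresses the .rv file by raw byte offset, which works because the file is a dense array of fixed-size records. A small sketch of the implied addressing arithmetic for a float table (dim and sizeof(float) are the only inputs; binary tables would use their own record size):

#include <cstddef>
#include <sys/types.h>

// Record i of a dim-dimensional float table starts at this byte offset.
inline off_t
RawVectorOffset(size_t i, size_t dim) {
    return static_cast<off_t>(i * dim * sizeof(float));
}

// Total bytes occupied by `count` consecutive records.
inline size_t
RawVectorBytes(size_t count, size_t dim) {
    return count * dim * sizeof(float);
}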

View File

@ -0,0 +1,64 @@
// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements. See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership. The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied. See the License for the
// specific language governing permissions and limitations
// under the License.
#pragma once
#include <mutex>
#include <string>
#include <vector>
#include "codecs/VectorsFormat.h"
#include "segment/Vectors.h"
namespace milvus {
namespace codec {
class DefaultVectorsFormat : public VectorsFormat {
public:
DefaultVectorsFormat() = default;
void
read(const store::DirectoryPtr& directory_ptr, segment::VectorsPtr& vectors_read) override;
void
write(const store::DirectoryPtr& directory_ptr, const segment::VectorsPtr& vectors) override;
void
read_uids(const store::DirectoryPtr& directory_ptr, std::vector<segment::doc_id_t>& uids) override;
void
read_vectors(const store::DirectoryPtr& directory_ptr, off_t offset, size_t num_bytes,
std::vector<uint8_t>& raw_vectors) override;
// No copy and move
DefaultVectorsFormat(const DefaultVectorsFormat&) = delete;
DefaultVectorsFormat(DefaultVectorsFormat&&) = delete;
DefaultVectorsFormat&
operator=(const DefaultVectorsFormat&) = delete;
DefaultVectorsFormat&
operator=(DefaultVectorsFormat&&) = delete;
private:
std::mutex mutex_;
const std::string raw_vector_extension_ = ".rv";
const std::string user_id_extension_ = ".uid";
};
} // namespace codec
} // namespace milvus

View File

@ -44,7 +44,7 @@ class DB {
CreateTable(meta::TableSchema& table_schema_) = 0;
virtual Status
DropTable(const std::string& table_id, const meta::DatesT& dates) = 0;
DropTable(const std::string& table_id) = 0;
virtual Status
DescribeTable(meta::TableSchema& table_schema_) = 0;
@ -52,9 +52,15 @@ class DB {
virtual Status
HasTable(const std::string& table_id, bool& has_or_not_) = 0;
virtual Status
HasNativeTable(const std::string& table_id, bool& has_or_not_) = 0;
virtual Status
AllTables(std::vector<meta::TableSchema>& table_schema_array) = 0;
virtual Status
GetTableInfo(const std::string& table_id, TableInfo& table_info) = 0;
virtual Status
GetTableRowCount(const std::string& table_id, uint64_t& row_count) = 0;
@ -80,20 +86,44 @@ class DB {
virtual Status
InsertVectors(const std::string& table_id, const std::string& partition_tag, VectorsData& vectors) = 0;
virtual Status
DeleteVector(const std::string& table_id, IDNumber vector_id) = 0;
virtual Status
DeleteVectors(const std::string& table_id, IDNumbers vector_ids) = 0;
virtual Status
Flush(const std::string& table_id) = 0;
virtual Status
Flush() = 0;
virtual Status
Compact(const std::string& table_id) = 0;
virtual Status
GetVectorByID(const std::string& table_id, const IDNumber& vector_id, VectorsData& vector) = 0;
virtual Status
GetVectorIDs(const std::string& table_id, const std::string& segment_id, IDNumbers& vector_ids) = 0;
// virtual Status
// Merge(const std::set<std::string>& table_ids) = 0;
virtual Status
QueryByID(const std::shared_ptr<server::Context>& context, const std::string& table_id,
const std::vector<std::string>& partition_tags, uint64_t k, uint64_t nprobe, IDNumber vector_id,
ResultIds& result_ids, ResultDistances& result_distances) = 0;
virtual Status
Query(const std::shared_ptr<server::Context>& context, const std::string& table_id,
const std::vector<std::string>& partition_tags, uint64_t k, uint64_t nprobe, const VectorsData& vectors,
ResultIds& result_ids, ResultDistances& result_distances) = 0;
virtual Status
Query(const std::shared_ptr<server::Context>& context, const std::string& table_id,
const std::vector<std::string>& partition_tags, uint64_t k, uint64_t nprobe, const VectorsData& vectors,
const meta::DatesT& dates, ResultIds& result_ids, ResultDistances& result_distances) = 0;
virtual Status
QueryByFileID(const std::shared_ptr<server::Context>& context, const std::string& table_id,
const std::vector<std::string>& file_ids, uint64_t k, uint64_t nprobe, const VectorsData& vectors,
const meta::DatesT& dates, ResultIds& result_ids, ResultDistances& result_distances) = 0;
ResultIds& result_ids, ResultDistances& result_distances) = 0;
virtual Status
Size(uint64_t& result) = 0;
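
Taken together, the new virtuals define the CRUD flow this branch introduces. A hedged call-sequence sketch, assuming db is a DBPtr obtained from the engine's factory and that "demo_table" already exists (error handling elided):

void
CrudFlow(const milvus::engine::DBPtr& db) {
    milvus::engine::IDNumbers ids = {1, 2, 3};
    db->DeleteVectors("demo_table", ids);  // buffered via mem manager / WAL

    db->Flush("demo_table");    // apply buffered inserts and deletes to segments
    db->Compact("demo_table");  // rewrite segments to reclaim deleted rows

    milvus::engine::VectorsData vector;
    db->GetVectorByID("demo_table", 1, vector);  // empty result once deleted
}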

File diff suppressed because it is too large

View File

@ -28,6 +28,7 @@
#include "db/Types.h"
#include "db/insert/MemManager.h"
#include "utils/ThreadPool.h"
#include "wal/WalManager.h"
namespace milvus {
namespace engine {
@ -52,7 +53,7 @@ class DBImpl : public DB {
CreateTable(meta::TableSchema& table_schema) override;
Status
DropTable(const std::string& table_id, const meta::DatesT& dates) override;
DropTable(const std::string& table_id) override;
Status
DescribeTable(meta::TableSchema& table_schema) override;
@ -60,9 +61,15 @@ class DBImpl : public DB {
Status
HasTable(const std::string& table_id, bool& has_or_not) override;
Status
HasNativeTable(const std::string& table_id, bool& has_or_not_) override;
Status
AllTables(std::vector<meta::TableSchema>& table_schema_array) override;
Status
GetTableInfo(const std::string& table_id, TableInfo& table_info) override;
Status
PreloadTable(const std::string& table_id) override;
@ -88,6 +95,30 @@ class DBImpl : public DB {
Status
InsertVectors(const std::string& table_id, const std::string& partition_tag, VectorsData& vectors) override;
Status
DeleteVector(const std::string& table_id, IDNumber vector_id) override;
Status
DeleteVectors(const std::string& table_id, IDNumbers vector_ids) override;
Status
Flush(const std::string& table_id) override;
Status
Flush() override;
Status
Compact(const std::string& table_id) override;
Status
GetVectorByID(const std::string& table_id, const IDNumber& vector_id, VectorsData& vector) override;
Status
GetVectorIDs(const std::string& table_id, const std::string& segment_id, IDNumbers& vector_ids) override;
// Status
// Merge(const std::set<std::string>& table_ids) override;
Status
CreateIndex(const std::string& table_id, const TableIndex& index) override;
@ -97,20 +128,20 @@ class DBImpl : public DB {
Status
DropIndex(const std::string& table_id) override;
Status
QueryByID(const std::shared_ptr<server::Context>& context, const std::string& table_id,
const std::vector<std::string>& partition_tags, uint64_t k, uint64_t nprobe, IDNumber vector_id,
ResultIds& result_ids, ResultDistances& result_distances) override;
Status
Query(const std::shared_ptr<server::Context>& context, const std::string& table_id,
const std::vector<std::string>& partition_tags, uint64_t k, uint64_t nprobe, const VectorsData& vectors,
ResultIds& result_ids, ResultDistances& result_distances) override;
Status
Query(const std::shared_ptr<server::Context>& context, const std::string& table_id,
const std::vector<std::string>& partition_tags, uint64_t k, uint64_t nprobe, const VectorsData& vectors,
const meta::DatesT& dates, ResultIds& result_ids, ResultDistances& result_distances) override;
Status
QueryByFileID(const std::shared_ptr<server::Context>& context, const std::string& table_id,
const std::vector<std::string>& file_ids, uint64_t k, uint64_t nprobe, const VectorsData& vectors,
const meta::DatesT& dates, ResultIds& result_ids, ResultDistances& result_distances) override;
ResultIds& result_ids, ResultDistances& result_distances) override;
Status
Size(uint64_t& result) override;
@ -121,6 +152,10 @@ class DBImpl : public DB {
const meta::TableFilesSchema& files, uint64_t k, uint64_t nprobe, const VectorsData& vectors,
ResultIds& result_ids, ResultDistances& result_distances);
Status
GetVectorByIdHelper(const std::string& table_id, IDNumber vector_id, VectorsData& vector,
const meta::TableFilesSchema& files);
void
BackgroundTimerTask();
void
@ -132,36 +167,44 @@ class DBImpl : public DB {
StartMetricTask();
void
StartCompactionTask();
StartMergeTask();
Status
MergeFiles(const std::string& table_id, const meta::DateT& date, const meta::TableFilesSchema& files);
MergeFiles(const std::string& table_id, const meta::TableFilesSchema& files);
Status
BackgroundMergeFiles(const std::string& table_id);
void
BackgroundCompaction(std::set<std::string> table_ids);
BackgroundMerge(std::set<std::string> table_ids);
void
StartBuildIndexTask(bool force = false);
void
BackgroundBuildIndex();
Status
CompactFile(const std::string& table_id, const milvus::engine::meta::TableFileSchema& file);
/*
Status
SyncMemData(std::set<std::string>& sync_table_ids);
*/
Status
GetFilesToBuildIndex(const std::string& table_id, const std::vector<int>& file_types,
meta::TableFilesSchema& files);
Status
GetFilesToSearch(const std::string& table_id, const std::vector<size_t>& file_ids, const meta::DatesT& dates,
meta::TableFilesSchema& files);
GetFilesToSearch(const std::string& table_id, const std::vector<size_t>& file_ids, meta::TableFilesSchema& files);
Status
GetPartitionByTag(const std::string& table_id, const std::string& partition_tag, std::string& partition_name);
Status
GetPartitionsByTags(const std::string& table_id, const std::vector<std::string>& partition_tags,
std::set<std::string>& partition_name_array);
Status
DropTableRecursively(const std::string& table_id, const meta::DatesT& dates);
DropTableRecursively(const std::string& table_id);
Status
UpdateTableIndexRecursively(const std::string& table_id, const TableIndex& index);
@ -175,6 +218,12 @@ class DBImpl : public DB {
Status
GetTableRowCountRecursively(const std::string& table_id, uint64_t& row_count);
Status
ExecWalRecord(const wal::MXLogRecord& record);
void
BackgroundWalTask();
private:
const DBOptions options_;
@ -184,12 +233,49 @@ class DBImpl : public DB {
meta::MetaPtr meta_ptr_;
MemManagerPtr mem_mgr_;
std::mutex mem_serialize_mutex_;
ThreadPool compact_thread_pool_;
std::mutex compact_result_mutex_;
std::list<std::future<void>> compact_thread_results_;
std::set<std::string> compact_table_ids_;
std::shared_ptr<wal::WalManager> wal_mgr_;
std::thread bg_wal_thread_;
struct SimpleWaitNotify {
bool notified_ = false;
std::mutex mutex_;
std::condition_variable cv_;
void
Wait() {
std::unique_lock<std::mutex> lck(mutex_);
// loop to guard against spurious wakeups
while (!notified_) {
cv_.wait(lck);
}
notified_ = false;
}
void
Wait_Until(const std::chrono::system_clock::time_point& tm_point) {
std::unique_lock<std::mutex> lck(mutex_);
// loop to guard against spurious wakeups, but still honor the deadline
while (!notified_) {
if (cv_.wait_until(lck, tm_point) == std::cv_status::timeout) {
break;
}
}
notified_ = false;
}
void
Notify() {
std::unique_lock<std::mutex> lck(mutex_);
notified_ = true;
lck.unlock();
cv_.notify_one();
}
};
SimpleWaitNotify wal_task_swn_;
SimpleWaitNotify flush_task_swn_;
ThreadPool merge_thread_pool_;
std::mutex merge_result_mutex_;
std::list<std::future<void>> merge_thread_results_;
std::set<std::string> merge_table_ids_;
ThreadPool index_thread_pool_;
std::mutex index_result_mutex_;
@ -198,7 +284,8 @@ class DBImpl : public DB {
std::mutex build_index_mutex_;
IndexFailedChecker index_failed_checker_;
OngoingFileChecker ongoing_files_checker_;
std::mutex flush_merge_compact_mutex_;
}; // DBImpl
} // namespace engine
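
SimpleWaitNotify is the coordination primitive between the insert/delete paths and the new background WAL thread: producers call Notify() after appending a record, and the WAL thread parks in Wait()/Wait_Until() between batches. An illustrative free-standing sketch of that loop (the stop flag and the drain step are hypothetical stand-ins, and the real struct is private to DBImpl):

#include <atomic>
#include <chrono>

void
WalLoop(SimpleWaitNotify& wal_task_swn, std::atomic<bool>& stop) {
    while (!stop.load()) {
        // Wake on Notify(), or at the latest after 100 ms so periodic
        // work (e.g. auto flush) still runs without producer traffic.
        wal_task_swn.Wait_Until(std::chrono::system_clock::now() +
                                std::chrono::milliseconds(100));
        // drain and apply pending WAL records here
    }
}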

View File

@ -17,6 +17,12 @@
namespace milvus {
namespace engine {
OngoingFileChecker&
OngoingFileChecker::GetInstance() {
static OngoingFileChecker instance;
return instance;
}
Status
OngoingFileChecker::MarkOngoingFile(const meta::TableFileSchema& table_file) {
std::lock_guard<std::mutex> lck(mutex_);

View File

@ -23,8 +23,11 @@
namespace milvus {
namespace engine {
class OngoingFileChecker : public meta::Meta::CleanUpFilter {
class OngoingFileChecker {
public:
static OngoingFileChecker&
GetInstance();
Status
MarkOngoingFile(const meta::TableFileSchema& table_file);
@ -38,7 +41,7 @@ class OngoingFileChecker : public meta::Meta::CleanUpFilter {
UnmarkOngoingFiles(const meta::TableFilesSchema& table_files);
bool
IsIgnored(const meta::TableFileSchema& schema) override;
IsIgnored(const meta::TableFileSchema& schema);
private:
Status

View File

@ -70,6 +70,14 @@ struct DBOptions {
size_t insert_buffer_size_ = 4 * ONE_GB;
bool insert_cache_immediately_ = false;
int auto_flush_interval_ = 1000;
// wal relative configurations
bool wal_enable_ = true;
bool recovery_error_ignore_ = true;
uint32_t buffer_size_ = 256;
std::string mxlog_path_ = "/tmp/milvus/wal/";
}; // Options
} // namespace engine
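
A minimal sketch of turning the new WAL options on, mirroring the defaults above (the path is illustrative, and the unit of buffer_size_ is assumed to be megabytes):

milvus::engine::DBOptions
MakeWalOptions() {
    milvus::engine::DBOptions options;
    options.wal_enable_ = true;
    options.recovery_error_ignore_ = false;  // stop recovery on a bad record
    options.buffer_size_ = 256;              // WAL buffer size (assumed MB)
    options.mxlog_path_ = "/var/lib/milvus/wal/";
    return options;
}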

View File

@ -11,20 +11,22 @@
#pragma once
#include "db/engine/ExecutionEngine.h"
#include <faiss/Index.h>
#include <stdint.h>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>
#include "db/engine/ExecutionEngine.h"
#include "segment/Types.h"
namespace milvus {
namespace engine {
typedef int64_t IDNumber;
typedef segment::doc_id_t IDNumber;
typedef IDNumber* IDNumberPtr;
typedef std::vector<IDNumber> IDNumbers;
@ -49,5 +51,23 @@ using Table2FileErr = std::map<std::string, File2ErrArray>;
using File2RefCount = std::map<std::string, int64_t>;
using Table2FileRef = std::map<std::string, File2RefCount>;
struct SegmentStat {
std::string name_;
int64_t row_count_ = 0;
std::string index_name_;
int64_t data_size_ = 0;
};
struct PartitionStat {
std::string tag_;
std::vector<SegmentStat> segments_stat_;
};
struct TableInfo {
std::vector<PartitionStat> partitions_stat_;
};
static const char* DEFAULT_PARTITON_TAG = "_default";
} // namespace engine
} // namespace milvus
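
The new TableInfo/PartitionStat/SegmentStat structs are what GetTableInfo() fills in. A short sketch of walking the result (field names are exactly those defined above):

#include <iostream>

void
PrintTableInfo(const milvus::engine::TableInfo& info) {
    for (const auto& partition : info.partitions_stat_) {
        std::cout << "partition " << partition.tag_ << "\n";
        for (const auto& segment : partition.segments_stat_) {
            std::cout << "  segment " << segment.name_
                      << " rows=" << segment.row_count_
                      << " index=" << segment.index_name_
                      << " bytes=" << segment.data_size_ << "\n";
        }
    }
}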

View File

@ -10,10 +10,6 @@
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "db/Utils.h"
#include "server/Config.h"
#include "storage/s3/S3ClientWrapper.h"
#include "utils/CommonUtil.h"
#include "utils/Log.h"
#include <fiu-local.h>
#include <boost/filesystem.hpp>
@ -22,6 +18,11 @@
#include <regex>
#include <vector>
#include "server/Config.h"
#include "storage/s3/S3ClientWrapper.h"
#include "utils/CommonUtil.h"
#include "utils/Log.h"
namespace milvus {
namespace engine {
namespace utils {
@ -36,7 +37,7 @@ std::mutex index_file_counter_mutex;
static std::string
ConstructParentFolder(const std::string& db_path, const meta::TableFileSchema& table_file) {
std::string table_path = db_path + TABLES_FOLDER + table_file.table_id_;
std::string partition_path = table_path + "/" + std::to_string(table_file.date_);
std::string partition_path = table_path + "/" + table_file.segment_id_;
return partition_path;
}
@ -163,7 +164,7 @@ GetTableFilePath(const DBMetaOptions& options, meta::TableFileSchema& table_file
return Status::OK();
}
if (boost::filesystem::exists(file_path)) {
if (boost::filesystem::exists(parent_path)) {
table_file.location_ = file_path;
return Status::OK();
}
@ -171,7 +172,7 @@ GetTableFilePath(const DBMetaOptions& options, meta::TableFileSchema& table_file
for (auto& path : options.slave_paths_) {
parent_path = ConstructParentFolder(path, table_file);
file_path = parent_path + "/" + table_file.file_id_;
if (boost::filesystem::exists(file_path)) {
if (boost::filesystem::exists(parent_path)) {
table_file.location_ = file_path;
return Status::OK();
}
@ -192,6 +193,22 @@ DeleteTableFilePath(const DBMetaOptions& options, meta::TableFileSchema& table_f
return Status::OK();
}
Status
DeleteSegment(const DBMetaOptions& options, meta::TableFileSchema& table_file) {
utils::GetTableFilePath(options, table_file);
std::string segment_dir;
GetParentPath(table_file.location_, segment_dir);
boost::filesystem::remove_all(segment_dir);
return Status::OK();
}
Status
GetParentPath(const std::string& path, std::string& parent_path) {
boost::filesystem::path p(path);
parent_path = p.parent_path().string();
return Status::OK();
}
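
The ConstructParentFolder change above is the on-disk side of the segment refactor: files are now grouped per segment instead of per date. A sketch of the resulting layout, assuming TABLES_FOLDER expands to "/tables/":

#include <string>

// old:  <db_path>/tables/<table_id>/<date>/<file_id>
// new:  <db_path>/tables/<table_id>/<segment_id>/<file_id>
std::string
SegmentParentFolder(const std::string& db_path, const std::string& table_id,
                    const std::string& segment_id) {
    return db_path + "/tables/" + table_id + "/" + segment_id;
}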
bool
IsSameIndex(const TableIndex& index1, const TableIndex& index2) {
return index1.engine_type_ == index2.engine_type_ && index1.nlist_ == index2.nlist_ &&

View File

@ -11,13 +11,13 @@
#pragma once
#include <ctime>
#include <string>
#include "Options.h"
#include "db/Types.h"
#include "db/meta/MetaTypes.h"
#include <ctime>
#include <string>
namespace milvus {
namespace engine {
namespace utils {
@ -36,6 +36,11 @@ Status
GetTableFilePath(const DBMetaOptions& options, meta::TableFileSchema& table_file);
Status
DeleteTableFilePath(const DBMetaOptions& options, meta::TableFileSchema& table_file);
Status
DeleteSegment(const DBMetaOptions& options, meta::TableFileSchema& table_file);
Status
GetParentPath(const std::string& path, std::string& parent_path);
bool
IsSameIndex(const TableIndex& index1, const TableIndex& index2);

View File

@ -84,8 +84,14 @@ class ExecutionEngine {
// virtual std::shared_ptr<ExecutionEngine>
// Clone() = 0;
// virtual Status
// Merge(const std::string& location) = 0;
virtual Status
Merge(const std::string& location) = 0;
GetVectorByID(const int64_t& id, float* vector, bool hybrid) = 0;
virtual Status
GetVectorByID(const int64_t& id, uint8_t* vector, bool hybrid) = 0;
virtual Status
Search(int64_t n, const float* data, int64_t k, int64_t nprobe, float* distances, int64_t* labels, bool hybrid) = 0;
@ -94,6 +100,10 @@ class ExecutionEngine {
Search(int64_t n, const uint8_t* data, int64_t k, int64_t nprobe, float* distances, int64_t* labels,
bool hybrid) = 0;
virtual Status
Search(int64_t n, const std::vector<int64_t>& ids, int64_t k, int64_t nprobe, float* distances, int64_t* labels,
bool hybrid) = 0;
virtual std::shared_ptr<ExecutionEngine>
BuildIndex(const std::string& location, EngineType engine_type) = 0;

View File

@ -11,13 +11,16 @@
#include "db/engine/ExecutionEngineImpl.h"
#include <faiss/utils/ConcurrentBitset.h>
#include <fiu-local.h>
#include <stdexcept>
#include <utility>
#include <vector>
#include "cache/CpuCacheMgr.h"
#include "cache/GpuCacheMgr.h"
#include "db/Utils.h"
#include "knowhere/common/Config.h"
#include "metrics/Metrics.h"
#include "scheduler/Utils.h"
@ -25,8 +28,8 @@
#include "utils/CommonUtil.h"
#include "utils/Exception.h"
#include "utils/Log.h"
#include "utils/TimeRecorder.h"
#include "utils/ValidationUtil.h"
#include "wrapper/BinVecImpl.h"
#include "wrapper/ConfAdapter.h"
#include "wrapper/ConfAdapterMgr.h"
@ -356,6 +359,7 @@ ExecutionEngineImpl::Serialize() {
return status;
}
/*
Status
ExecutionEngineImpl::Load(bool to_cache) {
index_ = std::static_pointer_cast<VecIndex>(cache::CpuCacheMgr::GetInstance()->GetIndex(location_));
@ -383,6 +387,134 @@ ExecutionEngineImpl::Load(bool to_cache) {
}
return Status::OK();
}
*/
Status
ExecutionEngineImpl::Load(bool to_cache) {
// TODO(zhiru): refactor
index_ = std::static_pointer_cast<VecIndex>(cache::CpuCacheMgr::GetInstance()->GetIndex(location_));
bool already_in_cache = (index_ != nullptr);
if (!already_in_cache) {
std::string segment_dir;
utils::GetParentPath(location_, segment_dir);
auto segment_reader_ptr = std::make_shared<segment::SegmentReader>(segment_dir);
if (index_type_ == EngineType::FAISS_IDMAP || index_type_ == EngineType::FAISS_BIN_IDMAP) {
index_ = index_type_ == EngineType::FAISS_IDMAP ? GetVecIndexFactory(IndexType::FAISS_IDMAP)
: GetVecIndexFactory(IndexType::FAISS_BIN_IDMAP);
TempMetaConf temp_conf;
temp_conf.gpu_id = gpu_num_;
temp_conf.dim = dim_;
auto status = MappingMetricType(metric_type_, temp_conf.metric_type);
if (!status.ok()) {
return status;
}
auto adapter = AdapterMgr::GetInstance().GetAdapter(index_->GetType());
auto conf = adapter->Match(temp_conf);
status = segment_reader_ptr->Load();
if (!status.ok()) {
std::string msg = "Failed to load segment from " + location_;
ENGINE_LOG_ERROR << msg;
return Status(DB_ERROR, msg);
}
segment::SegmentPtr segment_ptr;
segment_reader_ptr->GetSegment(segment_ptr);
auto& vectors = segment_ptr->vectors_ptr_;
auto& deleted_docs = segment_ptr->deleted_docs_ptr_->GetDeletedDocs();
auto vectors_uids = vectors->GetUids();
index_->SetUids(vectors_uids);
auto vectors_data = vectors->GetData();
faiss::ConcurrentBitsetPtr concurrent_bitset_ptr =
std::make_shared<faiss::ConcurrentBitset>(vectors->GetCount());
for (auto& offset : deleted_docs) {
if (!concurrent_bitset_ptr->test(offset)) {
concurrent_bitset_ptr->set(offset);
}
}
ErrorCode ec = KNOWHERE_UNEXPECTED_ERROR;
if (index_type_ == EngineType::FAISS_IDMAP) {
std::vector<float> float_vectors;
float_vectors.resize(vectors_data.size() / sizeof(float));
memcpy(float_vectors.data(), vectors_data.data(), vectors_data.size());
ec = std::static_pointer_cast<BFIndex>(index_)->Build(conf);
if (ec != KNOWHERE_SUCCESS) {
return Status(DB_ERROR, "Failed to build FAISS_IDMAP index");
}
status = std::static_pointer_cast<BFIndex>(index_)->AddWithoutIds(vectors->GetCount(),
float_vectors.data(), Config());
status = std::static_pointer_cast<BFIndex>(index_)->SetBlacklist(concurrent_bitset_ptr);
} else if (index_type_ == EngineType::FAISS_BIN_IDMAP) {
ec = std::static_pointer_cast<BinBFIndex>(index_)->Build(conf);
if (ec != KNOWHERE_SUCCESS) {
return Status(DB_ERROR, "Failed to build FAISS_BIN_IDMAP index");
}
status = std::static_pointer_cast<BinBFIndex>(index_)->AddWithoutIds(vectors->GetCount(),
vectors_data.data(), Config());
status = std::static_pointer_cast<BinBFIndex>(index_)->SetBlacklist(concurrent_bitset_ptr);
}
if (!status.ok()) {
return status;
}
ENGINE_LOG_DEBUG << "Finished loading raw data from segment " << segment_dir;
} else {
try {
double physical_size = PhysicalSize();
server::CollectExecutionEngineMetrics metrics(physical_size);
index_ = read_index(location_);
if (index_ == nullptr) {
std::string msg = "Failed to load index from " + location_;
ENGINE_LOG_ERROR << msg;
return Status(DB_ERROR, msg);
} else {
segment::DeletedDocsPtr deleted_docs_ptr;
auto status = segment_reader_ptr->LoadDeletedDocs(deleted_docs_ptr);
if (!status.ok()) {
std::string msg = "Failed to load deleted docs from " + location_;
ENGINE_LOG_ERROR << msg;
return Status(DB_ERROR, msg);
}
auto& deleted_docs = deleted_docs_ptr->GetDeletedDocs();
faiss::ConcurrentBitsetPtr concurrent_bitset_ptr =
std::make_shared<faiss::ConcurrentBitset>(index_->Count());
for (auto& offset : deleted_docs) {
if (!concurrent_bitset_ptr->test(offset)) {
concurrent_bitset_ptr->set(offset);
}
}
index_->SetBlacklist(concurrent_bitset_ptr);
std::vector<segment::doc_id_t> uids;
segment_reader_ptr->LoadUids(uids);
index_->SetUids(uids);
ENGINE_LOG_DEBUG << "Finished loading index file from segment " << segment_dir;
}
} catch (std::exception& e) {
ENGINE_LOG_ERROR << e.what();
return Status(DB_ERROR, e.what());
}
}
}
if (!already_in_cache && to_cache) {
Cache();
}
return Status::OK();
}
Status
ExecutionEngineImpl::CopyToGpu(uint64_t device_id, bool hybrid) {
@ -520,6 +652,7 @@ ExecutionEngineImpl::CopyToCpu() {
// return ret;
//}
/*
Status
ExecutionEngineImpl::Merge(const std::string& location) {
if (location == location_) {
@ -564,6 +697,7 @@ ExecutionEngineImpl::Merge(const std::string& location) {
return Status(DB_ERROR, "file index type is not idmap");
}
}
*/
ExecutionEnginePtr
ExecutionEngineImpl::BuildIndex(const std::string& location, EngineType engine_type) {
@ -664,6 +798,7 @@ ExecutionEngineImpl::Search(int64_t n, const float* data, int64_t k, int64_t npr
}
}
#endif
TimeRecorder rc("ExecutionEngineImpl::Search");
if (index_ == nullptr) {
ENGINE_LOG_ERROR << "ExecutionEngineImpl: index is null, failed to search";
@ -684,7 +819,20 @@ ExecutionEngineImpl::Search(int64_t n, const float* data, int64_t k, int64_t npr
HybridLoad();
}
rc.RecordSection("search prepare");
auto status = index_->Search(n, data, distances, labels, conf);
rc.RecordSection("search done");
// map offsets to ids
const std::vector<segment::doc_id_t>& uids = index_->GetUids();
for (int64_t i = 0; i < n * k; i++) {
int64_t offset = labels[i];
if (offset != -1) {
labels[i] = uids[offset];
}
}
rc.RecordSection("map uids");
if (hybrid) {
HybridUnset();
@ -699,6 +847,8 @@ ExecutionEngineImpl::Search(int64_t n, const float* data, int64_t k, int64_t npr
Status
ExecutionEngineImpl::Search(int64_t n, const uint8_t* data, int64_t k, int64_t nprobe, float* distances,
int64_t* labels, bool hybrid) {
TimeRecorder rc("ExecutionEngineImpl::Search");
if (index_ == nullptr) {
ENGINE_LOG_ERROR << "ExecutionEngineImpl: index is null, failed to search";
return Status(DB_ERROR, "index is null");
@ -718,7 +868,174 @@ ExecutionEngineImpl::Search(int64_t n, const uint8_t* data, int64_t k, int64_t n
HybridLoad();
}
rc.RecordSection("search prepare");
auto status = index_->Search(n, data, distances, labels, conf);
rc.RecordSection("search done");
// map offsets to ids
const std::vector<segment::doc_id_t>& uids = index_->GetUids();
for (int64_t i = 0; i < n * k; i++) {
int64_t offset = labels[i];
if (offset != -1) {
labels[i] = uids[offset];
}
}
rc.RecordSection("map uids");
if (hybrid) {
HybridUnset();
}
if (!status.ok()) {
ENGINE_LOG_ERROR << "Search error:" << status.message();
}
return status;
}
Status
ExecutionEngineImpl::Search(int64_t n, const std::vector<int64_t>& ids, int64_t k, int64_t nprobe, float* distances,
int64_t* labels, bool hybrid) {
TimeRecorder rc("ExecutionEngineImpl::Search");
if (index_ == nullptr) {
ENGINE_LOG_ERROR << "ExecutionEngineImpl: index is null, failed to search";
return Status(DB_ERROR, "index is null");
}
ENGINE_LOG_DEBUG << "Search by ids Params: [k] " << k << " [nprobe] " << nprobe;
// TODO(linxj): remove here. Get conf from function
TempMetaConf temp_conf;
temp_conf.k = k;
temp_conf.nprobe = nprobe;
auto adapter = AdapterMgr::GetInstance().GetAdapter(index_->GetType());
auto conf = adapter->MatchSearch(temp_conf, index_->GetType());
if (hybrid) {
HybridLoad();
}
rc.RecordSection("search prepare");
// std::string segment_dir;
// utils::GetParentPath(location_, segment_dir);
// segment::SegmentReader segment_reader(segment_dir);
// segment::IdBloomFilterPtr id_bloom_filter_ptr;
// segment_reader.LoadBloomFilter(id_bloom_filter_ptr);
// Check if the id is present. If so, find its offset
const std::vector<segment::doc_id_t>& uids = index_->GetUids();
std::vector<int64_t> offsets;
/*
std::vector<segment::doc_id_t> uids;
auto status = segment_reader.LoadUids(uids);
if (!status.ok()) {
return status;
}
*/
// There is only one id in ids
for (auto& id : ids) {
// if (id_bloom_filter_ptr->Check(id)) {
// if (uids.empty()) {
// segment_reader.LoadUids(uids);
// }
// auto found = std::find(uids.begin(), uids.end(), id);
// if (found != uids.end()) {
// auto offset = std::distance(uids.begin(), found);
// offsets.emplace_back(offset);
// }
// }
auto found = std::find(uids.begin(), uids.end(), id);
if (found != uids.end()) {
auto offset = std::distance(uids.begin(), found);
offsets.emplace_back(offset);
}
}
rc.RecordSection("get offset");
auto status = Status::OK();
if (!offsets.empty()) {
status = index_->SearchById(offsets.size(), offsets.data(), distances, labels, conf);
rc.RecordSection("search by id done");
// map offsets to ids
for (int64_t i = 0; i < static_cast<int64_t>(offsets.size()) * k; i++) {
int64_t offset = labels[i];
if (offset != -1) {
labels[i] = uids[offset];
}
}
rc.RecordSection("map uids");
}
if (hybrid) {
HybridUnset();
}
if (!status.ok()) {
ENGINE_LOG_ERROR << "Search error:" << status.message();
}
return status;
}
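
The id-to-offset resolution above uses std::find over the uid vector, which is linear per requested id (the commented-out bloom filter path hints at a cheaper pre-check). A hedged alternative sketch that makes the uid/offset relationship explicit by building a reverse map once; this is not what the code above does, only an illustration:

#include <cstdint>
#include <unordered_map>
#include <vector>

std::vector<int64_t>
IdsToOffsets(const std::vector<int64_t>& uids, const std::vector<int64_t>& ids) {
    std::unordered_map<int64_t, int64_t> uid_to_offset;
    uid_to_offset.reserve(uids.size());
    for (int64_t offset = 0; offset < static_cast<int64_t>(uids.size()); ++offset) {
        uid_to_offset.emplace(uids[offset], offset);
    }

    std::vector<int64_t> offsets;
    for (auto id : ids) {
        auto it = uid_to_offset.find(id);
        if (it != uid_to_offset.end()) {
            offsets.push_back(it->second);  // ids not present are skipped
        }
    }
    return offsets;
}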
Status
ExecutionEngineImpl::GetVectorByID(const int64_t& id, float* vector, bool hybrid) {
if (index_ == nullptr) {
ENGINE_LOG_ERROR << "ExecutionEngineImpl: index is null, failed to search";
return Status(DB_ERROR, "index is null");
}
// TODO(linxj): remove here. Get conf from function
TempMetaConf temp_conf;
auto adapter = AdapterMgr::GetInstance().GetAdapter(index_->GetType());
auto conf = adapter->MatchSearch(temp_conf, index_->GetType());
if (hybrid) {
HybridLoad();
}
// Only one id for now
std::vector<int64_t> ids{id};
auto status = index_->GetVectorById(1, ids.data(), vector, conf);
if (hybrid) {
HybridUnset();
}
if (!status.ok()) {
ENGINE_LOG_ERROR << "Search error:" << status.message();
}
return status;
}
Status
ExecutionEngineImpl::GetVectorByID(const int64_t& id, uint8_t* vector, bool hybrid) {
if (index_ == nullptr) {
ENGINE_LOG_ERROR << "ExecutionEngineImpl: index is null, failed to search";
return Status(DB_ERROR, "index is null");
}
ENGINE_LOG_DEBUG << "Get binary vector by id: " << id;
// TODO(linxj): remove here. Get conf from function
TempMetaConf temp_conf;
auto adapter = AdapterMgr::GetInstance().GetAdapter(index_->GetType());
auto conf = adapter->MatchSearch(temp_conf, index_->GetType());
if (hybrid) {
HybridLoad();
}
// Only one id for now
std::vector<int64_t> ids{id};
auto status = index_->GetVectorById(1, ids.data(), vector, conf);
if (hybrid) {
HybridUnset();

View File

@ -11,11 +11,14 @@
#pragma once
#include "ExecutionEngine.h"
#include "wrapper/VecIndex.h"
#include <src/segment/SegmentReader.h>
#include <memory>
#include <string>
#include <vector>
#include "ExecutionEngine.h"
#include "wrapper/VecIndex.h"
namespace milvus {
namespace engine {
@ -64,8 +67,14 @@ class ExecutionEngineImpl : public ExecutionEngine {
// ExecutionEnginePtr
// Clone() override;
// Status
// Merge(const std::string& location) override;
Status
Merge(const std::string& location) override;
GetVectorByID(const int64_t& id, float* vector, bool hybrid) override;
Status
GetVectorByID(const int64_t& id, uint8_t* vector, bool hybrid) override;
Status
Search(int64_t n, const float* data, int64_t k, int64_t nprobe, float* distances, int64_t* labels,
@ -75,6 +84,10 @@ class ExecutionEngineImpl : public ExecutionEngine {
Search(int64_t n, const uint8_t* data, int64_t k, int64_t nprobe, float* distances, int64_t* labels,
bool hybrid = false) override;
Status
Search(int64_t n, const std::vector<int64_t>& ids, int64_t k, int64_t nprobe, float* distances, int64_t* labels,
bool hybrid) override;
ExecutionEnginePtr
BuildIndex(const std::string& location, EngineType engine_type) override;

View File

@ -11,23 +11,40 @@
#pragma once
#include "db/Types.h"
#include "utils/Status.h"
#include <memory>
#include <set>
#include <string>
#include "db/Types.h"
#include "utils/Status.h"
namespace milvus {
namespace engine {
class MemManager {
public:
virtual Status
InsertVectors(const std::string& table_id, VectorsData& vectors) = 0;
InsertVectors(const std::string& table_id, int64_t length, const IDNumber* vector_ids, int64_t dim,
const float* vectors, uint64_t lsn, std::set<std::string>& flushed_tables) = 0;
virtual Status
Serialize(std::set<std::string>& table_ids) = 0;
InsertVectors(const std::string& table_id, int64_t length, const IDNumber* vector_ids, int64_t dim,
const uint8_t* vectors, uint64_t lsn, std::set<std::string>& flushed_tables) = 0;
virtual Status
DeleteVector(const std::string& table_id, IDNumber vector_id, uint64_t lsn) = 0;
virtual Status
DeleteVectors(const std::string& table_id, int64_t length, const IDNumber* vector_ids, uint64_t lsn) = 0;
virtual Status
Flush(const std::string& table_id) = 0;
virtual Status
Flush(std::set<std::string>& table_ids) = 0;
// virtual Status
// Serialize(std::set<std::string>& table_ids) = 0;
virtual Status
EraseMemVector(const std::string& table_id) = 0;
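
The interface now takes raw pointers plus an LSN instead of a VectorsData reference, so the WAL layer can hand buffers straight through. A hedged sketch of the new insert path, assuming mem_mgr is a MemManagerPtr and using a placeholder LSN:

#include <set>
#include <string>
#include <vector>

void
InsertDemo(const milvus::engine::MemManagerPtr& mem_mgr) {
    std::vector<milvus::engine::IDNumber> ids = {1, 2};
    std::vector<float> data(2 * 128, 0.0f);  // two 128-dim vectors

    std::set<std::string> flushed_tables;  // filled if a forced flush ran
    mem_mgr->InsertVectors("demo_table", ids.size(), ids.data(), /*dim=*/128,
                           data.data(), /*lsn=*/0, flushed_tables);
}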

View File

@ -10,12 +10,13 @@
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "db/insert/MemManagerImpl.h"
#include <thread>
#include "VectorSource.h"
#include "db/Constants.h"
#include "utils/Log.h"
#include <thread>
namespace milvus {
namespace engine {
@ -31,37 +32,177 @@ MemManagerImpl::GetMemByTable(const std::string& table_id) {
}
Status
MemManagerImpl::InsertVectors(const std::string& table_id, VectorsData& vectors) {
while (GetCurrentMem() > options_.insert_buffer_size_) {
std::this_thread::sleep_for(std::chrono::milliseconds(1));
MemManagerImpl::InsertVectors(const std::string& table_id, int64_t length, const IDNumber* vector_ids, int64_t dim,
const float* vectors, uint64_t lsn, std::set<std::string>& flushed_tables) {
flushed_tables.clear();
if (GetCurrentMem() > options_.insert_buffer_size_) {
ENGINE_LOG_DEBUG << "Insert buffer size exceeds limit. Performing force flush";
auto status = Flush(flushed_tables);
if (!status.ok()) {
return status;
}
}
VectorsData vectors_data;
vectors_data.vector_count_ = length;
vectors_data.float_data_.resize(length * dim);
memcpy(vectors_data.float_data_.data(), vectors, length * dim * sizeof(float));
vectors_data.id_array_.resize(length);
memcpy(vectors_data.id_array_.data(), vector_ids, length * sizeof(IDNumber));
VectorSourcePtr source = std::make_shared<VectorSource>(vectors_data);
std::unique_lock<std::mutex> lock(mutex_);
return InsertVectorsNoLock(table_id, vectors);
return InsertVectorsNoLock(table_id, source, lsn);
}
Status
MemManagerImpl::InsertVectorsNoLock(const std::string& table_id, VectorsData& vectors) {
MemTablePtr mem = GetMemByTable(table_id);
VectorSourcePtr source = std::make_shared<VectorSource>(vectors);
auto status = mem->Add(source);
if (status.ok()) {
if (vectors.id_array_.empty()) {
vectors.id_array_ = source->GetVectorIds();
MemManagerImpl::InsertVectors(const std::string& table_id, int64_t length, const IDNumber* vector_ids, int64_t dim,
const uint8_t* vectors, uint64_t lsn, std::set<std::string>& flushed_tables) {
flushed_tables.clear();
if (GetCurrentMem() > options_.insert_buffer_size_) {
ENGINE_LOG_DEBUG << "Insert buffer size exceeds limit. Performing force flush";
auto status = Flush(flushed_tables);
if (!status.ok()) {
return status;
}
}
VectorsData vectors_data;
vectors_data.vector_count_ = length;
vectors_data.binary_data_.resize(length * dim);
memcpy(vectors_data.binary_data_.data(), vectors, length * dim * sizeof(uint8_t));
vectors_data.id_array_.resize(length);
memcpy(vectors_data.id_array_.data(), vector_ids, length * sizeof(IDNumber));
VectorSourcePtr source = std::make_shared<VectorSource>(vectors_data);
std::unique_lock<std::mutex> lock(mutex_);
return InsertVectorsNoLock(table_id, source, lsn);
}
Status
MemManagerImpl::InsertVectorsNoLock(const std::string& table_id, const VectorSourcePtr& source, uint64_t lsn) {
MemTablePtr mem = GetMemByTable(table_id);
mem->SetLSN(lsn);
auto status = mem->Add(source);
return status;
}
Status
MemManagerImpl::DeleteVector(const std::string& table_id, IDNumber vector_id, uint64_t lsn) {
std::unique_lock<std::mutex> lock(mutex_);
MemTablePtr mem = GetMemByTable(table_id);
mem->SetLSN(lsn);
auto status = mem->Delete(vector_id);
return status;
}
Status
MemManagerImpl::DeleteVectors(const std::string& table_id, int64_t length, const IDNumber* vector_ids, uint64_t lsn) {
std::unique_lock<std::mutex> lock(mutex_);
MemTablePtr mem = GetMemByTable(table_id);
mem->SetLSN(lsn);
IDNumbers ids;
ids.resize(length);
memcpy(ids.data(), vector_ids, length * sizeof(IDNumber));
auto status = mem->Delete(ids);
if (!status.ok()) {
return status;
}
// // TODO(zhiru): loop for now
// for (auto& id : ids) {
// auto status = mem->Delete(id);
// if (!status.ok()) {
// return status;
// }
// }
return Status::OK();
}
Status
MemManagerImpl::Flush(const std::string& table_id) {
ToImmutable(table_id);
// TODO: There is actually only one memTable in the immutable list
MemList temp_immutable_list;
{
std::unique_lock<std::mutex> lock(mutex_);
immu_mem_list_.swap(temp_immutable_list);
}
std::unique_lock<std::mutex> lock(serialization_mtx_);
auto max_lsn = GetMaxLSN(temp_immutable_list);
for (auto& mem : temp_immutable_list) {
ENGINE_LOG_DEBUG << "Flushing table: " << mem->GetTableId();
auto status = mem->Serialize(max_lsn);
if (!status.ok()) {
ENGINE_LOG_ERROR << "Flush table " << mem->GetTableId() << " failed";
return status;
}
ENGINE_LOG_DEBUG << "Flushed table: " << mem->GetTableId();
}
return Status::OK();
}
Status
MemManagerImpl::Flush(std::set<std::string>& table_ids) {
ToImmutable();
MemList temp_immutable_list;
{
std::unique_lock<std::mutex> lock(mutex_);
immu_mem_list_.swap(temp_immutable_list);
}
std::unique_lock<std::mutex> lock(serialization_mtx_);
table_ids.clear();
auto max_lsn = GetMaxLSN(temp_immutable_list);
for (auto& mem : temp_immutable_list) {
ENGINE_LOG_DEBUG << "Flushing table: " << mem->GetTableId();
auto status = mem->Serialize(max_lsn);
if (!status.ok()) {
ENGINE_LOG_ERROR << "Flush table " << mem->GetTableId() << " failed";
return status;
}
table_ids.insert(mem->GetTableId());
ENGINE_LOG_DEBUG << "Flushed table: " << mem->GetTableId();
}
meta_->SetGlobalLastLSN(max_lsn);
return Status::OK();
}
Status
MemManagerImpl::ToImmutable(const std::string& table_id) {
std::unique_lock<std::mutex> lock(mutex_);
auto memIt = mem_id_map_.find(table_id);
if (memIt != mem_id_map_.end()) {
if (!memIt->second->Empty()) {
immu_mem_list_.push_back(memIt->second);
mem_id_map_.erase(memIt);
}
// std::string err_msg = "Could not find table = " + table_id + " to flush";
// ENGINE_LOG_ERROR << err_msg;
// return Status(DB_NOT_FOUND, err_msg);
}
return Status::OK();
}
Status
MemManagerImpl::ToImmutable() {
std::unique_lock<std::mutex> lock(mutex_);
MemIdMap temp_map;
for (auto& kv : mem_id_map_) {
if (kv.second->Empty()) {
// empty table, no need to serialize
// empty table without any deletes, no need to serialize
temp_map.insert(kv);
} else {
immu_mem_list_.push_back(kv.second);
@ -72,19 +213,6 @@ MemManagerImpl::ToImmutable() {
return Status::OK();
}
Status
MemManagerImpl::Serialize(std::set<std::string>& table_ids) {
ToImmutable();
std::unique_lock<std::mutex> lock(serialization_mtx_);
table_ids.clear();
for (auto& mem : immu_mem_list_) {
mem->Serialize();
table_ids.insert(mem->GetTableId());
}
immu_mem_list_.clear();
return Status::OK();
}
Status
MemManagerImpl::EraseMemVector(const std::string& table_id) {
{ // erase MemVector from rapid-insert cache
@ -132,5 +260,17 @@ MemManagerImpl::GetCurrentMem() {
return GetCurrentMutableMem() + GetCurrentImmutableMem();
}
uint64_t
MemManagerImpl::GetMaxLSN(const MemList& tables) {
uint64_t max_lsn = 0;
for (auto& table : tables) {
auto cur_lsn = table->GetLSN();
if (cur_lsn > max_lsn) {
max_lsn = cur_lsn;
}
}
return max_lsn;
}
} // namespace engine
} // namespace milvus
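To make the LSN bookkeeping above concrete, a worked restatement with assumed values (the LSNs 7, 12 and 9 are illustrative):

    // Three immutable mem tables carry LSNs 7, 12 and 9.
    uint64_t max_lsn = GetMaxLSN(temp_immutable_list);  // -> 12
    for (auto& mem : temp_immutable_list) {
        mem->Serialize(max_lsn);                        // each segment gets flush_lsn = 12
    }
    meta_->SetGlobalLastLSN(max_lsn);                   // WAL may discard records with lsn <= 12

Note also the locking pattern: Flush swaps the immutable list out under the short-lived mutex_, then serializes under serialization_mtx_, so new inserts can proceed while segments are written to disk.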

View File

@ -11,12 +11,6 @@
#pragma once
#include "MemManager.h"
#include "MemTable.h"
#include "db/meta/Meta.h"
#include "server/Config.h"
#include "utils/Status.h"
#include <ctime>
#include <map>
#include <memory>
@ -25,12 +19,20 @@
#include <string>
#include <vector>
#include "MemManager.h"
#include "MemTable.h"
#include "db/meta/Meta.h"
#include "server/Config.h"
#include "utils/Status.h"
namespace milvus {
namespace engine {
class MemManagerImpl : public MemManager {
public:
using Ptr = std::shared_ptr<MemManagerImpl>;
using MemIdMap = std::map<std::string, MemTablePtr>;
using MemList = std::vector<MemTablePtr>;
MemManagerImpl(const meta::MetaPtr& meta, const DBOptions& options) : meta_(meta), options_(options) {
server::Config& config = server::Config::GetInstance();
@ -56,10 +58,27 @@ class MemManagerImpl : public MemManager {
}
Status
InsertVectors(const std::string& table_id, VectorsData& vectors) override;
InsertVectors(const std::string& table_id, int64_t length, const IDNumber* vector_ids, int64_t dim,
const float* vectors, uint64_t lsn, std::set<std::string>& flushed_tables) override;
Status
Serialize(std::set<std::string>& table_ids) override;
InsertVectors(const std::string& table_id, int64_t length, const IDNumber* vector_ids, int64_t dim,
const uint8_t* vectors, uint64_t lsn, std::set<std::string>& flushed_tables) override;
Status
DeleteVector(const std::string& table_id, IDNumber vector_id, uint64_t lsn) override;
Status
DeleteVectors(const std::string& table_id, int64_t length, const IDNumber* vector_ids, uint64_t lsn) override;
Status
Flush(const std::string& table_id) override;
Status
Flush(std::set<std::string>& table_ids) override;
// Status
// Serialize(std::set<std::string>& table_ids) override;
Status
EraseMemVector(const std::string& table_id) override;
@ -78,12 +97,17 @@ class MemManagerImpl : public MemManager {
GetMemByTable(const std::string& table_id);
Status
InsertVectorsNoLock(const std::string& table_id, VectorsData& vectors);
InsertVectorsNoLock(const std::string& table_id, const VectorSourcePtr& source, uint64_t lsn);
Status
ToImmutable();
using MemIdMap = std::map<std::string, MemTablePtr>;
using MemList = std::vector<MemTablePtr>;
Status
ToImmutable(const std::string& table_id);
uint64_t
GetMaxLSN(const MemList& tables);
std::string identity_;
MemIdMap mem_id_map_;
MemList immu_mem_list_;

View File

@ -10,10 +10,20 @@
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "db/insert/MemTable.h"
#include "utils/Log.h"
#include <cache/CpuCacheMgr.h>
#include <segment/SegmentReader.h>
#include <wrapper/VecIndex.h>
#include <algorithm>
#include <chrono>
#include <memory>
#include <string>
#include <unordered_map>
#include "db/OngoingFileChecker.h"
#include "db/Utils.h"
#include "utils/Log.h"
namespace milvus {
namespace engine {
@ -23,7 +33,7 @@ MemTable::MemTable(const std::string& table_id, const meta::MetaPtr& meta, const
}
Status
MemTable::Add(VectorSourcePtr& source) {
MemTable::Add(const VectorSourcePtr& source) {
while (!source->AllAdded()) {
MemTableFilePtr current_mem_table_file;
if (!mem_table_file_list_.empty()) {
@ -50,6 +60,32 @@ MemTable::Add(VectorSourcePtr& source) {
return Status::OK();
}
Status
MemTable::Delete(segment::doc_id_t doc_id) {
// Locate which table file the doc id lands in
for (auto& table_file : mem_table_file_list_) {
table_file->Delete(doc_id);
}
// Add the id to the delete list so it can be applied to other segments on disk during the next flush
doc_ids_to_delete_.insert(doc_id);
return Status::OK();
}
Status
MemTable::Delete(const std::vector<segment::doc_id_t>& doc_ids) {
// Locate which table files the doc ids land in
for (auto& table_file : mem_table_file_list_) {
table_file->Delete(doc_ids);
}
// Add the ids to the delete list so they can be applied to other segments on disk during the next flush
for (auto& id : doc_ids) {
doc_ids_to_delete_.insert(id);
}
return Status::OK();
}
void
MemTable::GetCurrentMemTableFile(MemTableFilePtr& mem_table_file) {
mem_table_file = mem_table_file_list_.back();
@ -61,23 +97,48 @@ MemTable::GetTableFileCount() {
}
Status
MemTable::Serialize() {
for (auto mem_table_file = mem_table_file_list_.begin(); mem_table_file != mem_table_file_list_.end();) {
auto status = (*mem_table_file)->Serialize();
MemTable::Serialize(uint64_t wal_lsn) {
auto start = std::chrono::high_resolution_clock::now();
if (!doc_ids_to_delete_.empty()) {
auto status = ApplyDeletes();
if (!status.ok()) {
std::string err_msg = "Insert data serialize failed: " + status.ToString();
ENGINE_LOG_ERROR << err_msg;
return Status(DB_ERROR, err_msg);
return Status(DB_ERROR, status.message());
}
std::lock_guard<std::mutex> lock(mutex_);
mem_table_file = mem_table_file_list_.erase(mem_table_file);
}
for (auto mem_table_file = mem_table_file_list_.begin(); mem_table_file != mem_table_file_list_.end();) {
auto status = (*mem_table_file)->Serialize(wal_lsn);
if (!status.ok()) {
return status;
}
ENGINE_LOG_DEBUG << "Flushed segment " << (*mem_table_file)->GetSegmentId();
{
std::lock_guard<std::mutex> lock(mutex_);
mem_table_file = mem_table_file_list_.erase(mem_table_file);
}
}
// Update flush lsn
auto status = meta_->UpdateTableFlushLSN(table_id_, wal_lsn);
if (!status.ok()) {
std::string err_msg = "Failed to write flush lsn to meta: " + status.ToString();
ENGINE_LOG_ERROR << err_msg;
return Status(DB_ERROR, err_msg);
}
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> diff = end - start;
ENGINE_LOG_DEBUG << "Finished flushing for table " << table_id_ << " in " << diff.count() << " s";
return Status::OK();
}
bool
MemTable::Empty() {
return mem_table_file_list_.empty();
return mem_table_file_list_.empty() && doc_ids_to_delete_.empty();
}
const std::string&
@ -95,5 +156,236 @@ MemTable::GetCurrentMem() {
return total_mem;
}
Status
MemTable::ApplyDeletes() {
// Applying deletes to other segments on disk and their corresponding cache:
// For each segment in the table:
//     Load its bloom filter
//     For each id in the delete list:
//         If present, add the id to the segment's uid list
// For each segment with a non-empty uid list:
//     Get its cache if it exists
//     Load its uids file
//     Scan the uids; for any uid that appears in the segment's uid list:
//         add its offset to deletedDoc
//         remove the id from the bloom filter
//         set the black list in the cache
//     Serialize the segment's deletedDoc  TODO(zhiru): append directly to the previous file for now, may have duplicates
//     Serialize the bloom filter
ENGINE_LOG_DEBUG << "Applying " << doc_ids_to_delete_.size() << " deletes in table: " << table_id_;
auto start_total = std::chrono::high_resolution_clock::now();
auto start = std::chrono::high_resolution_clock::now();
std::vector<int> file_types{meta::TableFileSchema::FILE_TYPE::RAW, meta::TableFileSchema::FILE_TYPE::TO_INDEX,
meta::TableFileSchema::FILE_TYPE::BACKUP};
meta::TableFilesSchema table_files;
auto status = meta_->FilesByType(table_id_, file_types, table_files);
if (!status.ok()) {
std::string err_msg = "Failed to apply deletes: " + status.ToString();
ENGINE_LOG_ERROR << err_msg;
return Status(DB_ERROR, err_msg);
}
OngoingFileChecker::GetInstance().MarkOngoingFiles(table_files);
std::unordered_map<size_t, std::vector<segment::doc_id_t>> ids_to_check_map;
for (size_t i = 0; i < table_files.size(); ++i) {
auto& table_file = table_files[i];
std::string segment_dir;
utils::GetParentPath(table_file.location_, segment_dir);
segment::SegmentReader segment_reader(segment_dir);
segment::IdBloomFilterPtr id_bloom_filter_ptr;
segment_reader.LoadBloomFilter(id_bloom_filter_ptr);
for (auto& id : doc_ids_to_delete_) {
if (id_bloom_filter_ptr->Check(id)) {
ids_to_check_map[i].emplace_back(id);
}
}
}
meta::TableFilesSchema files_to_check;
for (auto& kv : ids_to_check_map) {
files_to_check.emplace_back(table_files[kv.first]);
}
OngoingFileChecker::GetInstance().UnmarkOngoingFiles(table_files);
OngoingFileChecker::GetInstance().MarkOngoingFiles(files_to_check);
auto end = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> diff = end - start;
ENGINE_LOG_DEBUG << "Found " << ids_to_check_map.size() << " segment to apply deletes in " << diff.count() << " s";
meta::TableFilesSchema table_files_to_update;
for (auto& kv : ids_to_check_map) {
auto& table_file = table_files[kv.first];
ENGINE_LOG_DEBUG << "Applying deletes in segment: " << table_file.segment_id_;
start = std::chrono::high_resolution_clock::now();
std::string segment_dir;
utils::GetParentPath(table_file.location_, segment_dir);
segment::SegmentReader segment_reader(segment_dir);
auto index =
std::static_pointer_cast<VecIndex>(cache::CpuCacheMgr::GetInstance()->GetIndex(table_file.location_));
faiss::ConcurrentBitsetPtr blacklist = nullptr;
if (index != nullptr) {
status = index->GetBlacklist(blacklist);
}
std::vector<segment::doc_id_t> uids;
status = segment_reader.LoadUids(uids);
if (!status.ok()) {
break;
}
segment::IdBloomFilterPtr id_bloom_filter_ptr;
status = segment_reader.LoadBloomFilter(id_bloom_filter_ptr);
if (!status.ok()) {
break;
}
auto& ids_to_check = kv.second;
segment::DeletedDocsPtr deleted_docs = std::make_shared<segment::DeletedDocs>();
end = std::chrono::high_resolution_clock::now();
diff = end - start;
ENGINE_LOG_DEBUG << "Loading uids and deleted docs took " << diff.count() << " s";
start = std::chrono::high_resolution_clock::now();
std::sort(ids_to_check.begin(), ids_to_check.end());
end = std::chrono::high_resolution_clock::now();
diff = end - start;
ENGINE_LOG_DEBUG << "Sorting " << ids_to_check.size() << " ids took " << diff.count() << " s";
size_t delete_count = 0;
auto find_diff = std::chrono::duration<double>::zero();
auto set_diff = std::chrono::duration<double>::zero();
for (size_t i = 0; i < uids.size(); ++i) {
auto find_start = std::chrono::high_resolution_clock::now();
auto found = std::binary_search(ids_to_check.begin(), ids_to_check.end(), uids[i]);
auto find_end = std::chrono::high_resolution_clock::now();
find_diff += (find_end - find_start);
if (found) {
auto set_start = std::chrono::high_resolution_clock::now();
delete_count++;
deleted_docs->AddDeletedDoc(i);
if (id_bloom_filter_ptr->Check(uids[i])) {
id_bloom_filter_ptr->Remove(uids[i]);
}
if (blacklist != nullptr) {
if (!blacklist->test(i)) {
blacklist->set(i);
}
}
auto set_end = std::chrono::high_resolution_clock::now();
set_diff += (set_end - set_start);
}
}
ENGINE_LOG_DEBUG << "Finding " << ids_to_check.size() << " uids in " << uids.size() << " uids took "
<< find_diff.count() << " s in total";
ENGINE_LOG_DEBUG << "Setting deleted docs and bloom filter took " << set_diff.count() << " s in total";
if (index != nullptr) {
index->SetBlacklist(blacklist);
}
start = std::chrono::high_resolution_clock::now();
segment::Segment tmp_segment;
segment::SegmentWriter segment_writer(segment_dir);
status = segment_writer.WriteDeletedDocs(deleted_docs);
if (!status.ok()) {
break;
}
end = std::chrono::high_resolution_clock::now();
diff = end - start;
ENGINE_LOG_DEBUG << "Appended " << deleted_docs->GetSize()
<< " offsets to deleted docs in segment: " << table_file.segment_id_ << " in " << diff.count()
<< " s";
start = std::chrono::high_resolution_clock::now();
status = segment_writer.WriteBloomFilter(id_bloom_filter_ptr);
if (!status.ok()) {
break;
}
end = std::chrono::high_resolution_clock::now();
diff = end - start;
ENGINE_LOG_DEBUG << "Updated bloom filter in segment: " << table_file.segment_id_ << " in " << diff.count()
<< " s";
// Update table file row count
start = std::chrono::high_resolution_clock::now();
auto& segment_id = table_file.segment_id_;
meta::TableFilesSchema segment_files;
status = meta_->GetTableFilesBySegmentId(segment_id, segment_files);
if (!status.ok()) {
break;
}
for (auto& file : segment_files) {
if (file.file_type_ == meta::TableFileSchema::RAW || file.file_type_ == meta::TableFileSchema::TO_INDEX ||
file.file_type_ == meta::TableFileSchema::INDEX || file.file_type_ == meta::TableFileSchema::BACKUP) {
file.row_count_ -= delete_count;
table_files_to_update.emplace_back(file);
}
}
}
end = std::chrono::high_resolution_clock::now();
diff = end - start;
status = meta_->UpdateTableFiles(table_files_to_update);
ENGINE_LOG_DEBUG << "Updated meta in table: " << table_id_ << " in " << diff.count() << " s";
if (!status.ok()) {
std::string err_msg = "Failed to apply deletes: " + status.ToString();
ENGINE_LOG_ERROR << err_msg;
return Status(DB_ERROR, err_msg);
}
doc_ids_to_delete_.clear();
auto end_total = std::chrono::high_resolution_clock::now();
std::chrono::duration<double> diff_total = end_total - start_total;
ENGINE_LOG_DEBUG << "Finished applying deletes in table " << table_id_ << " in " << diff_total.count() << " s";
OngoingFileChecker::GetInstance().UnmarkOngoingFiles(files_to_check);
return Status::OK();
}
uint64_t
MemTable::GetLSN() {
return lsn_;
}
void
MemTable::SetLSN(uint64_t lsn) {
lsn_ = lsn;
}
} // namespace engine
} // namespace milvus
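The heart of ApplyDeletes above is a sorted-membership scan over each segment's uids; a self-contained sketch of that pattern (types simplified to int64_t; the real code additionally consults the bloom filter, the cache blacklist and the segment reader):

    #include <algorithm>
    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Return the offsets in `uids` whose ids appear in `ids_to_check`.
    // Sorting the candidates once and binary-searching per uid keeps the
    // scan at O(n log m) instead of O(n * m) for n uids and m deletes.
    std::vector<size_t>
    CollectDeletedOffsets(const std::vector<int64_t>& uids, std::vector<int64_t> ids_to_check) {
        std::sort(ids_to_check.begin(), ids_to_check.end());
        std::vector<size_t> offsets;
        for (size_t i = 0; i < uids.size(); ++i) {
            if (std::binary_search(ids_to_check.begin(), ids_to_check.end(), uids[i])) {
                offsets.push_back(i);
            }
        }
        return offsets;
    }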

View File

@ -11,15 +11,17 @@
#pragma once
#include <atomic>
#include <memory>
#include <mutex>
#include <set>
#include <string>
#include <vector>
#include "MemTableFile.h"
#include "VectorSource.h"
#include "utils/Status.h"
#include <memory>
#include <mutex>
#include <string>
#include <vector>
namespace milvus {
namespace engine {
@ -30,7 +32,13 @@ class MemTable {
MemTable(const std::string& table_id, const meta::MetaPtr& meta, const DBOptions& options);
Status
Add(VectorSourcePtr& source);
Add(const VectorSourcePtr& source);
Status
Delete(segment::doc_id_t doc_id);
Status
Delete(const std::vector<segment::doc_id_t>& doc_ids);
void
GetCurrentMemTableFile(MemTableFilePtr& mem_table_file);
@ -39,7 +47,7 @@ class MemTable {
GetTableFileCount();
Status
Serialize();
Serialize(uint64_t wal_lsn);
bool
Empty();
@ -50,6 +58,16 @@ class MemTable {
size_t
GetCurrentMem();
uint64_t
GetLSN();
void
SetLSN(uint64_t lsn);
private:
Status
ApplyDeletes();
private:
const std::string table_id_;
@ -60,6 +78,10 @@ class MemTable {
DBOptions options_;
std::mutex mutex_;
std::set<segment::doc_id_t> doc_ids_to_delete_;
std::atomic<uint64_t> lsn_;
}; // MemTable
using MemTablePtr = std::shared_ptr<MemTable>;
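One behavioral note on this header: Empty() now also checks doc_ids_to_delete_, so a mem table that buffered no inserts but holds pending deletes is still queued by ToImmutable() and gets its ApplyDeletes run at the next Serialize. A hedged illustration (meta and options are placeholder arguments):

    #include <cassert>

    milvus::engine::MemTable mt("demo_table", meta, options);  // placeholders
    mt.Delete(/*doc_id=*/7);  // no inserts, one pending delete
    assert(!mt.Empty());      // the delete list alone keeps the table flushable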

View File

@ -10,13 +10,20 @@
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "db/insert/MemTableFile.h"
#include <algorithm>
#include <cmath>
#include <iterator>
#include <string>
#include <vector>
#include "db/Constants.h"
#include "db/Utils.h"
#include "db/engine/EngineFactory.h"
#include "metrics/Metrics.h"
#include "segment/SegmentReader.h"
#include "utils/Log.h"
#include <cmath>
#include <string>
#include "utils/ValidationUtil.h"
namespace milvus {
namespace engine {
@ -26,9 +33,12 @@ MemTableFile::MemTableFile(const std::string& table_id, const meta::MetaPtr& met
current_mem_ = 0;
auto status = CreateTableFile();
if (status.ok()) {
execution_engine_ = EngineFactory::Build(
/*execution_engine_ = EngineFactory::Build(
table_file_schema_.dimension_, table_file_schema_.location_, (EngineType)table_file_schema_.engine_type_,
(MetricType)table_file_schema_.metric_type_, table_file_schema_.nlist_);
(MetricType)table_file_schema_.metric_type_, table_file_schema_.nlist_);*/
std::string directory;
utils::GetParentPath(table_file_schema_.location_, directory);
segment_writer_ptr_ = std::make_shared<segment::SegmentWriter>(directory);
}
}
@ -47,7 +57,7 @@ MemTableFile::CreateTableFile() {
}
Status
MemTableFile::Add(VectorSourcePtr& source) {
MemTableFile::Add(const VectorSourcePtr& source) {
if (table_file_schema_.dimension_ <= 0) {
std::string err_msg =
"MemTableFile::Add: table_file_schema dimension = " + std::to_string(table_file_schema_.dimension_) +
@ -61,7 +71,9 @@ MemTableFile::Add(VectorSourcePtr& source) {
if (mem_left >= single_vector_mem_size) {
size_t num_vectors_to_add = std::ceil(mem_left / single_vector_mem_size);
size_t num_vectors_added;
auto status = source->Add(execution_engine_, table_file_schema_, num_vectors_to_add, num_vectors_added);
auto status = source->Add(/*execution_engine_,*/ segment_writer_ptr_, table_file_schema_, num_vectors_to_add,
num_vectors_added);
if (status.ok()) {
current_mem_ += (num_vectors_added * single_vector_mem_size);
}
@ -70,6 +82,39 @@ MemTableFile::Add(VectorSourcePtr& source) {
return Status::OK();
}
Status
MemTableFile::Delete(segment::doc_id_t doc_id) {
segment::SegmentPtr segment_ptr;
segment_writer_ptr_->GetSegment(segment_ptr);
// Check whether the doc_id is present; if so, delete its corresponding buffer
auto uids = segment_ptr->vectors_ptr_->GetUids();
auto found = std::find(uids.begin(), uids.end(), doc_id);
if (found != uids.end()) {
auto offset = std::distance(uids.begin(), found);
segment_ptr->vectors_ptr_->Erase(offset);
}
return Status::OK();
}
Status
MemTableFile::Delete(const std::vector<segment::doc_id_t>& doc_ids) {
segment::SegmentPtr segment_ptr;
segment_writer_ptr_->GetSegment(segment_ptr);
// Check whether each doc_id is present; if so, delete its corresponding buffer
auto uids = segment_ptr->vectors_ptr_->GetUids();
for (auto& doc_id : doc_ids) {
auto found = std::find(uids.begin(), uids.end(), doc_id);
if (found != uids.end()) {
auto offset = std::distance(uids.begin(), found);
segment_ptr->vectors_ptr_->Erase(offset);
uids = segment_ptr->vectors_ptr_->GetUids();
}
}
return Status::OK();
}
size_t
MemTableFile::GetCurrentMem() {
return current_mem_;
@ -87,15 +132,35 @@ MemTableFile::IsFull() {
}
Status
MemTableFile::Serialize() {
MemTableFile::Serialize(uint64_t wal_lsn) {
size_t size = GetCurrentMem();
server::CollectSerializeMetrics metrics(size);
execution_engine_->Serialize();
table_file_schema_.file_size_ = execution_engine_->PhysicalSize();
table_file_schema_.row_count_ = execution_engine_->Count();
auto status = segment_writer_ptr_->Serialize();
if (!status.ok()) {
ENGINE_LOG_ERROR << "Failed to serialize segment: " << table_file_schema_.segment_id_;
// if index type isn't IDMAP, set file type to TO_INDEX if file size exceeds index_file_size
/* Can't mark it as to_delete because data is stored in this mem table file. Any further flush
* will try to serialize the same mem table file and it won't be able to find the directory
* to write to or update the associated table file in meta.
*
table_file_schema_.file_type_ = meta::TableFileSchema::TO_DELETE;
meta_->UpdateTableFile(table_file_schema_);
ENGINE_LOG_DEBUG << "Failed to serialize segment, mark file: " << table_file_schema_.file_id_
<< " to to_delete";
*/
return status;
}
// execution_engine_->Serialize();
// TODO(zhiru):
// table_file_schema_.file_size_ = execution_engine_->PhysicalSize();
// table_file_schema_.row_count_ = execution_engine_->Count();
table_file_schema_.file_size_ = segment_writer_ptr_->Size();
table_file_schema_.row_count_ = segment_writer_ptr_->VectorCount();
// if index type isn't IDMAP, set file type to TO_INDEX if file size exceeds index_file_size
// else set file type to RAW, no need to build index
if (table_file_schema_.engine_type_ != (int)EngineType::FAISS_IDMAP &&
table_file_schema_.engine_type_ != (int)EngineType::FAISS_BIN_IDMAP) {
@ -105,17 +170,32 @@ MemTableFile::Serialize() {
table_file_schema_.file_type_ = meta::TableFileSchema::RAW;
}
auto status = meta_->UpdateTableFile(table_file_schema_);
// Set table file's flush_lsn so WAL can roll back and delete garbage files which can be obtained from
// GetTableFilesByFlushLSN() in meta.
table_file_schema_.flush_lsn_ = wal_lsn;
status = meta_->UpdateTableFile(table_file_schema_);
ENGINE_LOG_DEBUG << "New " << ((table_file_schema_.file_type_ == meta::TableFileSchema::RAW) ? "raw" : "to_index")
<< " file " << table_file_schema_.file_id_ << " of size " << size << " bytes";
<< " file " << table_file_schema_.file_id_ << " of size " << size << " bytes, lsn = " << wal_lsn;
// TODO(zhiru): cache
/*
if (options_.insert_cache_immediately_) {
execution_engine_->Cache();
}
*/
if (options_.insert_cache_immediately_) {
execution_engine_->Cache();
segment_writer_ptr_->Cache();
}
return status;
}
const std::string&
MemTableFile::GetSegmentId() const {
return table_file_schema_.segment_id_;
}
} // namespace engine
} // namespace milvus
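Two details of Serialize(wal_lsn) above are worth spelling out. The RAW vs TO_INDEX decision is truncated in this hunk, so the sketch below assumes the elided condition compares file_size_ against index_file_size_ (the names here are local stand-ins):

    static int
    DecideFileType(int engine_type, size_t file_size, size_t index_file_size) {
        using milvus::engine::EngineType;
        using milvus::engine::meta::TableFileSchema;
        if (engine_type != (int)EngineType::FAISS_IDMAP &&
            engine_type != (int)EngineType::FAISS_BIN_IDMAP &&
            file_size >= index_file_size) {
            return TableFileSchema::TO_INDEX;  // large enough: schedule an index build
        }
        return TableFileSchema::RAW;  // IDMAP engines and small files stay raw
    }

Second, stamping flush_lsn onto the table file is what makes crash recovery workable: files written for an LSN the WAL never confirmed can later be located through GetTableFilesByFlushLSN() and discarded.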

View File

@ -11,14 +11,17 @@
#pragma once
#include <segment/SegmentWriter.h>
#include <memory>
#include <string>
#include <vector>
#include "VectorSource.h"
#include "db/engine/ExecutionEngine.h"
#include "db/meta/Meta.h"
#include "utils/Status.h"
#include <memory>
#include <string>
namespace milvus {
namespace engine {
@ -27,7 +30,13 @@ class MemTableFile {
MemTableFile(const std::string& table_id, const meta::MetaPtr& meta, const DBOptions& options);
Status
Add(VectorSourcePtr& source);
Add(const VectorSourcePtr& source);
Status
Delete(segment::doc_id_t doc_id);
Status
Delete(const std::vector<segment::doc_id_t>& doc_ids);
size_t
GetCurrentMem();
@ -39,7 +48,10 @@ class MemTableFile {
IsFull();
Status
Serialize();
Serialize(uint64_t wal_lsn);
const std::string&
GetSegmentId() const;
private:
Status
@ -52,7 +64,8 @@ class MemTableFile {
DBOptions options_;
size_t current_mem_;
ExecutionEnginePtr execution_engine_;
// ExecutionEnginePtr execution_engine_;
segment::SegmentWriterPtr segment_writer_ptr_;
}; // MemTableFile
using MemTableFilePtr = std::shared_ptr<MemTableFile>;

View File

@ -10,6 +10,10 @@
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "db/insert/VectorSource.h"
#include <utility>
#include <vector>
#include "db/engine/EngineFactory.h"
#include "db/engine/ExecutionEngine.h"
#include "metrics/Metrics.h"
@ -18,14 +22,15 @@
namespace milvus {
namespace engine {
VectorSource::VectorSource(VectorsData& vectors)
: vectors_(vectors), id_generator_(std::make_shared<SimpleIDGenerator>()) {
VectorSource::VectorSource(VectorsData vectors)
: vectors_(std::move(vectors)), id_generator_(std::make_shared<SimpleIDGenerator>()) {
current_num_vectors_added = 0;
}
Status
VectorSource::Add(const ExecutionEnginePtr& execution_engine, const meta::TableFileSchema& table_file_schema,
const size_t& num_vectors_to_add, size_t& num_vectors_added) {
VectorSource::Add(/*const ExecutionEnginePtr& execution_engine,*/ const segment::SegmentWriterPtr& segment_writer_ptr,
const meta::TableFileSchema& table_file_schema, const size_t& num_vectors_to_add,
size_t& num_vectors_added) {
uint64_t n = vectors_.vector_count_;
server::CollectAddMetrics metrics(n, table_file_schema.dimension_);
@ -36,25 +41,46 @@ VectorSource::Add(const ExecutionEnginePtr& execution_engine, const meta::TableF
id_generator_->GetNextIDNumbers(num_vectors_added, vector_ids_to_add);
} else {
vector_ids_to_add.resize(num_vectors_added);
for (int pos = current_num_vectors_added; pos < current_num_vectors_added + num_vectors_added; pos++) {
for (size_t pos = current_num_vectors_added; pos < current_num_vectors_added + num_vectors_added; pos++) {
vector_ids_to_add[pos - current_num_vectors_added] = vectors_.id_array_[pos];
}
}
Status status;
if (!vectors_.float_data_.empty()) {
/*
status = execution_engine->AddWithIds(
num_vectors_added, vectors_.float_data_.data() + current_num_vectors_added * table_file_schema.dimension_,
vector_ids_to_add.data());
*/
std::vector<uint8_t> vectors;
auto size = num_vectors_added * table_file_schema.dimension_ * sizeof(float);
vectors.resize(size);
memcpy(vectors.data(), vectors_.float_data_.data() + current_num_vectors_added * table_file_schema.dimension_,
size);
status = segment_writer_ptr->AddVectors(table_file_schema.file_id_, vectors, vector_ids_to_add);
} else if (!vectors_.binary_data_.empty()) {
/*
status = execution_engine->AddWithIds(
num_vectors_added,
vectors_.binary_data_.data() + current_num_vectors_added * SingleVectorSize(table_file_schema.dimension_),
vector_ids_to_add.data());
*/
std::vector<uint8_t> vectors;
auto size = num_vectors_added * SingleVectorSize(table_file_schema.dimension_) * sizeof(uint8_t);
vectors.resize(size);
memcpy(
vectors.data(),
vectors_.binary_data_.data() + current_num_vectors_added * SingleVectorSize(table_file_schema.dimension_),
size);
status = segment_writer_ptr->AddVectors(table_file_schema.file_id_, vectors, vector_ids_to_add);
}
// Clear vector data
if (status.ok()) {
current_num_vectors_added += num_vectors_added;
// TODO(zhiru): remove
vector_ids_.insert(vector_ids_.end(), std::make_move_iterator(vector_ids_to_add.begin()),
std::make_move_iterator(vector_ids_to_add.end()));
} else {

View File

@ -11,23 +11,26 @@
#pragma once
#include <memory>
#include "db/IDGenerator.h"
#include "db/engine/ExecutionEngine.h"
#include "db/meta/Meta.h"
#include "segment/SegmentWriter.h"
#include "utils/Status.h"
#include <memory>
namespace milvus {
namespace engine {
// TODO(zhiru): this class needs to be refactored once attributes are added
class VectorSource {
public:
explicit VectorSource(VectorsData& vectors);
explicit VectorSource(VectorsData vectors);
Status
Add(const ExecutionEnginePtr& execution_engine, const meta::TableFileSchema& table_file_schema,
const size_t& num_vectors_to_add, size_t& num_vectors_added);
Add(/*const ExecutionEnginePtr& execution_engine,*/ const segment::SegmentWriterPtr& segment_writer_ptr,
const meta::TableFileSchema& table_file_schema, const size_t& num_vectors_to_add, size_t& num_vectors_added);
size_t
GetNumVectorsAdded();
@ -42,7 +45,7 @@ class VectorSource {
GetVectorIds();
private:
VectorsData& vectors_;
VectorsData vectors_;
IDNumbers vector_ids_;
size_t current_num_vectors_added;
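The byte counts in VectorSource::Add follow from the two storage formats; a small sketch of the arithmetic (the 1-bit-per-dimension packing, i.e. SingleVectorSize(dim) == dim / 8, is an assumption inferred from the copies above):

    size_t n = 100, dim = 128;                      // illustrative values
    size_t float_bytes  = n * dim * sizeof(float);  // 51200 bytes: 4 bytes per dimension
    size_t binary_bytes = n * (dim / 8);            // 1600 bytes: 8 dimensions per byte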

View File

@ -11,30 +11,33 @@
#pragma once
#include "MetaTypes.h"
#include "db/Options.h"
#include "db/Types.h"
#include "utils/Status.h"
#include <cstddef>
#include <memory>
#include <string>
#include <vector>
#include "MetaTypes.h"
#include "db/Options.h"
#include "db/Types.h"
#include "utils/Status.h"
namespace milvus {
namespace engine {
namespace meta {
static const char* META_ENVIRONMENT = "Environment";
static const char* META_TABLES = "Tables";
static const char* META_TABLEFILES = "TableFiles";
class Meta {
/*
public:
class CleanUpFilter {
public:
virtual bool
IsIgnored(const TableFileSchema& schema) = 0;
};
*/
public:
virtual ~Meta() = default;
@ -54,6 +57,15 @@ class Meta {
virtual Status
UpdateTableFlag(const std::string& table_id, int64_t flag) = 0;
virtual Status
UpdateTableFlushLSN(const std::string& table_id, uint64_t flush_lsn) = 0;
virtual Status
GetTableFlushLSN(const std::string& table_id, uint64_t& flush_lsn) = 0;
virtual Status
GetTableFilesByFlushLSN(uint64_t flush_lsn, TableFilesSchema& table_files) = 0;
virtual Status
DropTable(const std::string& table_id) = 0;
@ -64,10 +76,10 @@ class Meta {
CreateTableFile(TableFileSchema& file_schema) = 0;
virtual Status
DropDataByDate(const std::string& table_id, const DatesT& dates) = 0;
GetTableFiles(const std::string& table_id, const std::vector<size_t>& ids, TableFilesSchema& table_files) = 0;
virtual Status
GetTableFiles(const std::string& table_id, const std::vector<size_t>& ids, TableFilesSchema& table_files) = 0;
GetTableFilesBySegmentId(const std::string& segment_id, TableFilesSchema& table_files) = 0;
virtual Status
UpdateTableFile(TableFileSchema& file_schema) = 0;
@ -88,7 +100,8 @@ class Meta {
DropTableIndex(const std::string& table_id) = 0;
virtual Status
CreatePartition(const std::string& table_name, const std::string& partition_name, const std::string& tag) = 0;
CreatePartition(const std::string& table_name, const std::string& partition_name, const std::string& tag,
uint64_t lsn) = 0;
virtual Status
DropPartition(const std::string& partition_name) = 0;
@ -100,11 +113,10 @@ class Meta {
GetPartitionName(const std::string& table_name, const std::string& tag, std::string& partition_name) = 0;
virtual Status
FilesToSearch(const std::string& table_id, const std::vector<size_t>& ids, const DatesT& dates,
DatePartionedTableFilesSchema& files) = 0;
FilesToSearch(const std::string& table_id, const std::vector<size_t>& ids, TableFilesSchema& files) = 0;
virtual Status
FilesToMerge(const std::string& table_id, DatePartionedTableFilesSchema& files) = 0;
FilesToMerge(const std::string& table_id, TableFilesSchema& files) = 0;
virtual Status
FilesToIndex(TableFilesSchema&) = 0;
@ -122,13 +134,19 @@ class Meta {
CleanUpShadowFiles() = 0;
virtual Status
CleanUpFilesWithTTL(uint64_t seconds, CleanUpFilter* filter = nullptr) = 0;
CleanUpFilesWithTTL(uint64_t seconds /*, CleanUpFilter* filter = nullptr*/) = 0;
virtual Status
DropAll() = 0;
virtual Status
Count(const std::string& table_id, uint64_t& result) = 0;
virtual Status
SetGlobalLastLSN(uint64_t lsn) = 0;
virtual Status
GetGlobalLastLSN(uint64_t& lsn) = 0;
}; // MetaData
using MetaPtr = std::shared_ptr<Meta>;
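A hedged sketch of how the new LSN hooks could compose during recovery (only the method names come from this interface; meta and the cleanup step are assumptions):

    uint64_t global_lsn = 0;
    meta->GetGlobalLastLSN(global_lsn);     // last LSN meta recorded as fully flushed
    uint64_t unacked_lsn = global_lsn + 1;  // assumption: first unconfirmed flush
    milvus::engine::meta::TableFilesSchema orphans;
    meta->GetTableFilesByFlushLSN(unacked_lsn, orphans);
    // ...mark orphans TO_DELETE, then replay WAL records with lsn > global_lsn.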

View File

@ -11,15 +11,15 @@
#pragma once
#include "db/Constants.h"
#include "db/engine/ExecutionEngine.h"
#include "src/version.h"
#include <map>
#include <memory>
#include <string>
#include <vector>
#include "db/Constants.h"
#include "db/engine/ExecutionEngine.h"
#include "src/version.h"
namespace milvus {
namespace engine {
namespace meta {
@ -35,7 +35,10 @@ constexpr int64_t FLAG_MASK_HAS_USERID = 0x1 << 1;
using DateT = int;
const DateT EmptyDate = -1;
using DatesT = std::vector<DateT>;
struct EnvironmentSchema {
uint64_t global_lsn_ = 0;
}; // EnvironmentSchema
struct TableSchema {
typedef enum {
@ -56,6 +59,7 @@ struct TableSchema {
std::string owner_table_;
std::string partition_tag_;
std::string version_ = CURRENT_VERSION;
uint64_t flush_lsn_ = 0;
}; // TableSchema
struct TableFileSchema {
@ -72,12 +76,14 @@ struct TableFileSchema {
size_t id_ = 0;
std::string table_id_;
std::string segment_id_;
std::string file_id_;
int32_t file_type_ = NEW;
size_t file_size_ = 0;
size_t row_count_ = 0;
DateT date_ = EmptyDate;
uint16_t dimension_ = 0;
// TODO(zhiru)
std::string location_;
int64_t updated_time_ = 0;
int64_t created_on_ = 0;
@ -85,11 +91,11 @@ struct TableFileSchema {
int32_t engine_type_ = DEFAULT_ENGINE_TYPE;
int32_t nlist_ = DEFAULT_NLIST; // not persist to meta
int32_t metric_type_ = DEFAULT_METRIC_TYPE; // not persist to meta
}; // TableFileSchema
uint64_t flush_lsn_ = 0;
}; // TableFileSchema
using TableFileSchemaPtr = std::shared_ptr<meta::TableFileSchema>;
using TableFilesSchema = std::vector<TableFileSchema>;
using DatePartionedTableFilesSchema = std::map<DateT, TableFilesSchema>;
} // namespace meta
} // namespace engine

View File

@ -10,19 +10,12 @@
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "db/meta/MySQLMetaImpl.h"
#include "MetaConsts.h"
#include "db/IDGenerator.h"
#include "db/Utils.h"
#include "metrics/Metrics.h"
#include "utils/CommonUtil.h"
#include "utils/Exception.h"
#include "utils/Log.h"
#include "utils/StringHelpFunctions.h"
#include <fiu-local.h>
#include <mysql++/mysql++.h>
#include <string.h>
#include <unistd.h>
#include <boost/filesystem.hpp>
#include <chrono>
#include <fstream>
@ -35,6 +28,16 @@
#include <string>
#include <thread>
#include "MetaConsts.h"
#include "db/IDGenerator.h"
#include "db/OngoingFileChecker.h"
#include "db/Utils.h"
#include "metrics/Metrics.h"
#include "utils/CommonUtil.h"
#include "utils/Exception.h"
#include "utils/Log.h"
#include "utils/StringHelpFunctions.h"
namespace milvus {
namespace engine {
namespace meta {
@ -146,12 +149,14 @@ static const MetaSchema TABLES_SCHEMA(META_TABLES, {
MetaField("partition_tag", "VARCHAR(255)", "NOT NULL"),
MetaField("version", "VARCHAR(64)",
std::string("DEFAULT '") + CURRENT_VERSION + "'"),
MetaField("flush_lsn", "BIGINT", "DEFAULT 0 NOT NULL"),
});
// TableFiles schema
static const MetaSchema TABLEFILES_SCHEMA(META_TABLEFILES, {
MetaField("id", "BIGINT", "PRIMARY KEY AUTO_INCREMENT"),
MetaField("table_id", "VARCHAR(255)", "NOT NULL"),
MetaField("segment_id", "VARCHAR(255)", "NOT NULL"),
MetaField("engine_type", "INT", "DEFAULT 1 NOT NULL"),
MetaField("file_id", "VARCHAR(255)", "NOT NULL"),
MetaField("file_type", "INT", "DEFAULT 0 NOT NULL"),
@ -160,6 +165,7 @@ static const MetaSchema TABLEFILES_SCHEMA(META_TABLEFILES, {
MetaField("updated_time", "BIGINT", "NOT NULL"),
MetaField("created_on", "BIGINT", "NOT NULL"),
MetaField("date", "INT", "DEFAULT -1 NOT NULL"),
MetaField("flush_lsn", "BIGINT", "DEFAULT 0 NOT NULL"),
});
} // namespace
@ -395,12 +401,13 @@ MySQLMetaImpl::CreateTable(TableSchema& table_schema) {
std::string& owner_table = table_schema.owner_table_;
std::string& partition_tag = table_schema.partition_tag_;
std::string& version = table_schema.version_;
std::string flush_lsn = std::to_string(table_schema.flush_lsn_);
createTableQuery << "INSERT INTO " << META_TABLES << " VALUES(" << id << ", " << mysqlpp::quote << table_id
<< ", " << state << ", " << dimension << ", " << created_on << ", " << flag << ", "
<< index_file_size << ", " << engine_type << ", " << nlist << ", " << metric_type << ", "
<< mysqlpp::quote << owner_table << ", " << mysqlpp::quote << partition_tag << ", "
<< mysqlpp::quote << version << ");";
<< mysqlpp::quote << version << ", " << flush_lsn << ");";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::CreateTable: " << createTableQuery.str();
@ -438,7 +445,7 @@ MySQLMetaImpl::DescribeTable(TableSchema& table_schema) {
mysqlpp::Query describeTableQuery = connectionPtr->query();
describeTableQuery
<< "SELECT id, state, dimension, created_on, flag, index_file_size, engine_type, nlist, metric_type"
<< " ,owner_table, partition_tag, version"
<< " ,owner_table, partition_tag, version, flush_lsn"
<< " FROM " << META_TABLES << " WHERE table_id = " << mysqlpp::quote << table_schema.table_id_
<< " AND state <> " << std::to_string(TableSchema::TO_DELETE) << ";";
@ -461,6 +468,7 @@ MySQLMetaImpl::DescribeTable(TableSchema& table_schema) {
resRow["owner_table"].to_string(table_schema.owner_table_);
resRow["partition_tag"].to_string(table_schema.partition_tag_);
resRow["version"].to_string(table_schema.version_);
table_schema.flush_lsn_ = resRow["flush_lsn"];
} else {
return Status(DB_NOT_FOUND, "Table " + table_schema.table_id_ + " not found");
}
@ -525,9 +533,9 @@ MySQLMetaImpl::AllTables(std::vector<TableSchema>& table_schema_array) {
mysqlpp::Query allTablesQuery = connectionPtr->query();
allTablesQuery << "SELECT id, table_id, dimension, engine_type, nlist, index_file_size, metric_type"
<< " ,owner_table, partition_tag, version"
<< " ,owner_table, partition_tag, version, flush_lsn"
<< " FROM " << META_TABLES << " WHERE state <> " << std::to_string(TableSchema::TO_DELETE)
<< ";";
<< " AND owner_table = \"\";";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::AllTables: " << allTablesQuery.str();
@ -546,6 +554,7 @@ MySQLMetaImpl::AllTables(std::vector<TableSchema>& table_schema_array) {
resRow["owner_table"].to_string(table_schema.owner_table_);
resRow["partition_tag"].to_string(table_schema.partition_tag_);
resRow["version"].to_string(table_schema.version_);
table_schema.flush_lsn_ = resRow["flush_lsn"];
table_schema_array.emplace_back(table_schema);
}
@ -653,6 +662,9 @@ MySQLMetaImpl::CreateTableFile(TableFileSchema& file_schema) {
server::MetricCollector metric;
NextFileId(file_schema.file_id_);
if (file_schema.segment_id_.empty()) {
file_schema.segment_id_ = file_schema.file_id_;
}
file_schema.dimension_ = table_schema.dimension_;
file_schema.file_size_ = 0;
file_schema.row_count_ = 0;
@ -665,6 +677,7 @@ MySQLMetaImpl::CreateTableFile(TableFileSchema& file_schema) {
std::string id = "NULL"; // auto-increment
std::string table_id = file_schema.table_id_;
std::string segment_id = file_schema.segment_id_;
std::string engine_type = std::to_string(file_schema.engine_type_);
std::string file_id = file_schema.file_id_;
std::string file_type = std::to_string(file_schema.file_type_);
@ -673,6 +686,7 @@ MySQLMetaImpl::CreateTableFile(TableFileSchema& file_schema) {
std::string updated_time = std::to_string(file_schema.updated_time_);
std::string created_on = std::to_string(file_schema.created_on_);
std::string date = std::to_string(file_schema.date_);
std::string flush_lsn = std::to_string(file_schema.flush_lsn_);
{
mysqlpp::ScopedConnection connectionPtr(*mysql_connection_pool_, safe_grab_);
@ -687,9 +701,10 @@ MySQLMetaImpl::CreateTableFile(TableFileSchema& file_schema) {
mysqlpp::Query createTableFileQuery = connectionPtr->query();
createTableFileQuery << "INSERT INTO " << META_TABLEFILES << " VALUES(" << id << ", " << mysqlpp::quote
<< table_id << ", " << engine_type << ", " << mysqlpp::quote << file_id << ", "
<< file_type << ", " << file_size << ", " << row_count << ", " << updated_time << ", "
<< created_on << ", " << date << ");";
<< table_id << ", " << mysqlpp::quote << segment_id << ", " << engine_type << ", "
<< mysqlpp::quote << file_id << ", " << file_type << ", " << file_size << ", "
<< row_count << ", " << updated_time << ", " << created_on << ", " << date << ", "
<< flush_lsn << ");";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::CreateTableFile: " << createTableFileQuery.str();
@ -709,61 +724,6 @@ MySQLMetaImpl::CreateTableFile(TableFileSchema& file_schema) {
}
}
// TODO(myh): Delete single vector by id
Status
MySQLMetaImpl::DropDataByDate(const std::string& table_id, const DatesT& dates) {
if (dates.empty()) {
return Status::OK();
}
TableSchema table_schema;
table_schema.table_id_ = table_id;
auto status = DescribeTable(table_schema);
if (!status.ok()) {
return status;
}
try {
std::stringstream dateListSS;
for (auto& date : dates) {
dateListSS << std::to_string(date) << ", ";
}
std::string dateListStr = dateListSS.str();
dateListStr = dateListStr.substr(0, dateListStr.size() - 2); // remove the last ", "
{
mysqlpp::ScopedConnection connectionPtr(*mysql_connection_pool_, safe_grab_);
bool is_null_connection = (connectionPtr == nullptr);
fiu_do_on("MySQLMetaImpl.DropDataByDate.null_connection", is_null_connection = true);
fiu_do_on("MySQLMetaImpl.DropDataByDate.throw_exception", throw std::exception(););
if (is_null_connection) {
return Status(DB_ERROR, "Failed to connect to meta server(mysql)");
}
mysqlpp::Query dropPartitionsByDatesQuery = connectionPtr->query();
dropPartitionsByDatesQuery << "UPDATE " << META_TABLEFILES
<< " SET file_type = " << std::to_string(TableFileSchema::TO_DELETE)
<< " ,updated_time = " << utils::GetMicroSecTimeStamp()
<< " WHERE table_id = " << mysqlpp::quote << table_id << " AND date in ("
<< dateListStr << ");";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::DropDataByDate: " << dropPartitionsByDatesQuery.str();
if (!dropPartitionsByDatesQuery.exec()) {
return HandleException("QUERY ERROR WHEN DROPPING PARTITIONS BY DATES",
dropPartitionsByDatesQuery.error());
}
} // Scoped Connection
ENGINE_LOG_DEBUG << "Successfully drop data by date, table id = " << table_schema.table_id_;
} catch (std::exception& e) {
return HandleException("GENERAL ERROR WHEN DROPPING PARTITIONS BY DATES", e.what());
}
return Status::OK();
}
Status
MySQLMetaImpl::GetTableFiles(const std::string& table_id, const std::vector<size_t>& ids,
TableFilesSchema& table_files) {
@ -791,10 +751,11 @@ MySQLMetaImpl::GetTableFiles(const std::string& table_id, const std::vector<size
}
mysqlpp::Query getTableFileQuery = connectionPtr->query();
getTableFileQuery << "SELECT id, engine_type, file_id, file_type, file_size, row_count, date, created_on"
<< " FROM " << META_TABLEFILES << " WHERE table_id = " << mysqlpp::quote << table_id
<< " AND (" << idStr << ")"
<< " AND file_type <> " << std::to_string(TableFileSchema::TO_DELETE) << ";";
getTableFileQuery
<< "SELECT id, segment_id, engine_type, file_id, file_type, file_size, row_count, date, created_on"
<< " FROM " << META_TABLEFILES << " WHERE table_id = " << mysqlpp::quote << table_id << " AND ("
<< idStr << ")"
<< " AND file_type <> " << std::to_string(TableFileSchema::TO_DELETE) << ";";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::GetTableFiles: " << getTableFileQuery.str();
@ -810,6 +771,7 @@ MySQLMetaImpl::GetTableFiles(const std::string& table_id, const std::vector<size
TableFileSchema file_schema;
file_schema.id_ = resRow["id"];
file_schema.table_id_ = table_id;
resRow["segment_id"].to_string(file_schema.segment_id_);
file_schema.index_file_size_ = table_schema.index_file_size_;
file_schema.engine_type_ = resRow["engine_type"];
file_schema.nlist_ = table_schema.nlist_;
@ -833,6 +795,66 @@ MySQLMetaImpl::GetTableFiles(const std::string& table_id, const std::vector<size
}
}
Status
MySQLMetaImpl::GetTableFilesBySegmentId(const std::string& segment_id,
milvus::engine::meta::TableFilesSchema& table_files) {
try {
mysqlpp::StoreQueryResult res;
{
mysqlpp::ScopedConnection connectionPtr(*mysql_connection_pool_, safe_grab_);
if (connectionPtr == nullptr) {
return Status(DB_ERROR, "Failed to connect to meta server(mysql)");
}
mysqlpp::Query getTableFileQuery = connectionPtr->query();
getTableFileQuery << "SELECT id, table_id, segment_id, engine_type, file_id, file_type, file_size, "
<< "row_count, date, created_on"
<< " FROM " << META_TABLEFILES << " WHERE segment_id = " << mysqlpp::quote << segment_id
<< " AND file_type <> " << std::to_string(TableFileSchema::TO_DELETE) << ";";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::GetTableFilesBySegmentId: " << getTableFileQuery.str();
res = getTableFileQuery.store();
} // Scoped Connection
if (!res.empty()) {
TableSchema table_schema;
res[0]["table_id"].to_string(table_schema.table_id_);
auto status = DescribeTable(table_schema);
if (!status.ok()) {
return status;
}
for (auto& resRow : res) {
TableFileSchema file_schema;
file_schema.id_ = resRow["id"];
file_schema.table_id_ = table_schema.table_id_;
resRow["segment_id"].to_string(file_schema.segment_id_);
file_schema.index_file_size_ = table_schema.index_file_size_;
file_schema.engine_type_ = resRow["engine_type"];
file_schema.nlist_ = table_schema.nlist_;
file_schema.metric_type_ = table_schema.metric_type_;
resRow["file_id"].to_string(file_schema.file_id_);
file_schema.file_type_ = resRow["file_type"];
file_schema.file_size_ = resRow["file_size"];
file_schema.row_count_ = resRow["row_count"];
file_schema.date_ = resRow["date"];
file_schema.created_on_ = resRow["created_on"];
file_schema.dimension_ = table_schema.dimension_;
utils::GetTableFilePath(options_, file_schema);
table_files.emplace_back(file_schema);
}
}
ENGINE_LOG_DEBUG << "Get table files by segment id";
return Status::OK();
} catch (std::exception& e) {
return HandleException("GENERAL ERROR WHEN RETRIEVING TABLE FILES BY SEGMENT ID", e.what());
}
}
Status
MySQLMetaImpl::UpdateTableIndex(const std::string& table_id, const TableIndex& index) {
try {
@ -924,6 +946,113 @@ MySQLMetaImpl::UpdateTableFlag(const std::string& table_id, int64_t flag) {
return Status::OK();
}
Status
MySQLMetaImpl::UpdateTableFlushLSN(const std::string& table_id, uint64_t flush_lsn) {
try {
server::MetricCollector metric;
{
mysqlpp::ScopedConnection connectionPtr(*mysql_connection_pool_, safe_grab_);
if (connectionPtr == nullptr) {
return Status(DB_ERROR, "Failed to connect to meta server(mysql)");
}
mysqlpp::Query updateTableFlushLSNQuery = connectionPtr->query();
updateTableFlushLSNQuery << "UPDATE " << META_TABLES << " SET flush_lsn = " << flush_lsn
<< " WHERE table_id = " << mysqlpp::quote << table_id << ";";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::UpdateTableFlushLSN: " << updateTableFlushLSNQuery.str();
if (!updateTableFlushLSNQuery.exec()) {
return HandleException("QUERY ERROR WHEN UPDATING TABLE FLUSH_LSN", updateTableFlushLSNQuery.error());
}
} // Scoped Connection
ENGINE_LOG_DEBUG << "Successfully update table flush_lsn, table id = " << table_id;
} catch (std::exception& e) {
return HandleException("GENERAL ERROR WHEN UPDATING TABLE FLUSH_LSN", e.what());
}
return Status::OK();
}
Status
MySQLMetaImpl::GetTableFlushLSN(const std::string& table_id, uint64_t& flush_lsn) {
return Status::OK();
}
Status
MySQLMetaImpl::GetTableFilesByFlushLSN(uint64_t flush_lsn, TableFilesSchema& table_files) {
table_files.clear();
try {
server::MetricCollector metric;
mysqlpp::StoreQueryResult res;
{
mysqlpp::ScopedConnection connectionPtr(*mysql_connection_pool_, safe_grab_);
if (connectionPtr == nullptr) {
return Status(DB_ERROR, "Failed to connect to meta server(mysql)");
}
mysqlpp::Query filesByLSNQuery = connectionPtr->query();
filesByLSNQuery << "SELECT id, table_id, segment_id, engine_type, file_id, file_type, file_size, "
"row_count, date, created_on"
<< " FROM " << META_TABLEFILES << " WHERE flush_lsn = " << flush_lsn << ";";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::GetTableFilesByFlushLSN: " << filesByLSNQuery.str();
res = filesByLSNQuery.store();
} // Scoped Connection
Status ret;
std::map<std::string, TableSchema> groups;
TableFileSchema table_file;
for (auto& resRow : res) {
table_file.id_ = resRow["id"]; // implicit conversion
resRow["table_id"].to_string(table_file.table_id_);
resRow["segment_id"].to_string(table_file.segment_id_);
table_file.engine_type_ = resRow["engine_type"];
resRow["file_id"].to_string(table_file.file_id_);
table_file.file_type_ = resRow["file_type"];
table_file.file_size_ = resRow["file_size"];
table_file.row_count_ = resRow["row_count"];
table_file.date_ = resRow["date"];
table_file.created_on_ = resRow["created_on"];
auto groupItr = groups.find(table_file.table_id_);
if (groupItr == groups.end()) {
TableSchema table_schema;
table_schema.table_id_ = table_file.table_id_;
auto status = DescribeTable(table_schema);
if (!status.ok()) {
return status;
}
groups[table_file.table_id_] = table_schema;
}
table_file.dimension_ = groups[table_file.table_id_].dimension_;
table_file.index_file_size_ = groups[table_file.table_id_].index_file_size_;
table_file.nlist_ = groups[table_file.table_id_].nlist_;
table_file.metric_type_ = groups[table_file.table_id_].metric_type_;
auto status = utils::GetTableFilePath(options_, table_file);
if (!status.ok()) {
ret = status;
}
table_files.push_back(table_file);
}
if (res.size() > 0) {
ENGINE_LOG_DEBUG << "Collect " << res.size() << " files with flush_lsn = " << flush_lsn;
}
return ret;
} catch (std::exception& e) {
return HandleException("GENERAL ERROR WHEN FINDING TABLE FILES BY LSN", e.what());
}
}
// ZR: this function assumes all fields in file_schema have a value
Status
MySQLMetaImpl::UpdateTableFile(TableFileSchema& file_schema) {
@ -1213,7 +1342,8 @@ MySQLMetaImpl::DropTableIndex(const std::string& table_id) {
}
Status
MySQLMetaImpl::CreatePartition(const std::string& table_id, const std::string& partition_name, const std::string& tag) {
MySQLMetaImpl::CreatePartition(const std::string& table_id, const std::string& partition_name, const std::string& tag,
uint64_t lsn) {
server::MetricCollector metric;
TableSchema table_schema;
@ -1252,6 +1382,7 @@ MySQLMetaImpl::CreatePartition(const std::string& table_id, const std::string& p
table_schema.created_on_ = utils::GetMicroSecTimeStamp();
table_schema.owner_table_ = table_id;
table_schema.partition_tag_ = valid_tag;
table_schema.flush_lsn_ = lsn;
status = CreateTable(table_schema);
fiu_do_on("MySQLMetaImpl.CreatePartition.aleady_exist", status = Status(DB_ALREADY_EXIST, ""));
@ -1283,8 +1414,10 @@ MySQLMetaImpl::ShowPartitions(const std::string& table_id, std::vector<meta::Tab
}
mysqlpp::Query allPartitionsQuery = connectionPtr->query();
allPartitionsQuery << "SELECT table_id FROM " << META_TABLES << " WHERE owner_table = " << mysqlpp::quote
<< table_id << " AND state <> " << std::to_string(TableSchema::TO_DELETE) << ";";
allPartitionsQuery << "SELECT table_id, id, state, dimension, created_on, flag, index_file_size,"
<< " engine_type, nlist, metric_type, partition_tag, version FROM " << META_TABLES
<< " WHERE owner_table = " << mysqlpp::quote << table_id << " AND state <> "
<< std::to_string(TableSchema::TO_DELETE) << ";";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::AllTables: " << allPartitionsQuery.str();
@ -1294,7 +1427,19 @@ MySQLMetaImpl::ShowPartitions(const std::string& table_id, std::vector<meta::Tab
for (auto& resRow : res) {
meta::TableSchema partition_schema;
resRow["table_id"].to_string(partition_schema.table_id_);
DescribeTable(partition_schema);
partition_schema.id_ = resRow["id"]; // implicit conversion
partition_schema.state_ = resRow["state"];
partition_schema.dimension_ = resRow["dimension"];
partition_schema.created_on_ = resRow["created_on"];
partition_schema.flag_ = resRow["flag"];
partition_schema.index_file_size_ = resRow["index_file_size"];
partition_schema.engine_type_ = resRow["engine_type"];
partition_schema.nlist_ = resRow["nlist"];
partition_schema.metric_type_ = resRow["metric_type"];
partition_schema.owner_table_ = table_id;
resRow["partition_tag"].to_string(partition_schema.partition_tag_);
resRow["version"].to_string(partition_schema.version_);
partition_schema_array.emplace_back(partition_schema);
}
} catch (std::exception& e) {
@ -1349,8 +1494,7 @@ MySQLMetaImpl::GetPartitionName(const std::string& table_id, const std::string&
}
Status
MySQLMetaImpl::FilesToSearch(const std::string& table_id, const std::vector<size_t>& ids, const DatesT& dates,
DatePartionedTableFilesSchema& files) {
MySQLMetaImpl::FilesToSearch(const std::string& table_id, const std::vector<size_t>& ids, TableFilesSchema& files) {
files.clear();
try {
@ -1367,19 +1511,9 @@ MySQLMetaImpl::FilesToSearch(const std::string& table_id, const std::vector<size
}
mysqlpp::Query filesToSearchQuery = connectionPtr->query();
filesToSearchQuery << "SELECT id, table_id, engine_type, file_id, file_type, file_size, row_count, date"
<< " FROM " << META_TABLEFILES << " WHERE table_id = " << mysqlpp::quote << table_id;
if (!dates.empty()) {
std::stringstream partitionListSS;
for (auto& date : dates) {
partitionListSS << std::to_string(date) << ", ";
}
std::string partitionListStr = partitionListSS.str();
partitionListStr = partitionListStr.substr(0, partitionListStr.size() - 2); // remove the last ", "
filesToSearchQuery << " AND date IN (" << partitionListStr << ")";
}
filesToSearchQuery
<< "SELECT id, table_id, segment_id, engine_type, file_id, file_type, file_size, row_count, date"
<< " FROM " << META_TABLEFILES << " WHERE table_id = " << mysqlpp::quote << table_id;
if (!ids.empty()) {
std::stringstream idSS;
@ -1410,10 +1544,11 @@ MySQLMetaImpl::FilesToSearch(const std::string& table_id, const std::vector<size
}
Status ret;
TableFileSchema table_file;
for (auto& resRow : res) {
TableFileSchema table_file;
table_file.id_ = resRow["id"]; // implicit conversion
resRow["table_id"].to_string(table_file.table_id_);
resRow["segment_id"].to_string(table_file.segment_id_);
table_file.index_file_size_ = table_schema.index_file_size_;
table_file.engine_type_ = resRow["engine_type"];
table_file.nlist_ = table_schema.nlist_;
@ -1430,12 +1565,7 @@ MySQLMetaImpl::FilesToSearch(const std::string& table_id, const std::vector<size
ret = status;
}
auto dateItr = files.find(table_file.date_);
if (dateItr == files.end()) {
files[table_file.date_] = TableFilesSchema();
}
files[table_file.date_].push_back(table_file);
files.emplace_back(table_file);
}
if (res.size() > 0) {
@ -1448,7 +1578,7 @@ MySQLMetaImpl::FilesToSearch(const std::string& table_id, const std::vector<size
}
Status
MySQLMetaImpl::FilesToMerge(const std::string& table_id, DatePartionedTableFilesSchema& files) {
MySQLMetaImpl::FilesToMerge(const std::string& table_id, TableFilesSchema& files) {
files.clear();
try {
@ -1474,10 +1604,11 @@ MySQLMetaImpl::FilesToMerge(const std::string& table_id, DatePartionedTableFiles
}
mysqlpp::Query filesToMergeQuery = connectionPtr->query();
filesToMergeQuery
<< "SELECT id, table_id, file_id, file_type, file_size, row_count, date, engine_type, created_on"
<< " FROM " << META_TABLEFILES << " WHERE table_id = " << mysqlpp::quote << table_id
<< " AND file_type = " << std::to_string(TableFileSchema::RAW) << " ORDER BY row_count DESC;";
filesToMergeQuery << "SELECT id, table_id, segment_id, file_id, file_type, file_size, row_count, date, "
"engine_type, created_on"
<< " FROM " << META_TABLEFILES << " WHERE table_id = " << mysqlpp::quote << table_id
<< " AND file_type = " << std::to_string(TableFileSchema::RAW)
<< " ORDER BY row_count DESC;";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::FilesToMerge: " << filesToMergeQuery.str();
@ -1495,6 +1626,7 @@ MySQLMetaImpl::FilesToMerge(const std::string& table_id, DatePartionedTableFiles
table_file.id_ = resRow["id"]; // implicit conversion
resRow["table_id"].to_string(table_file.table_id_);
resRow["segment_id"].to_string(table_file.segment_id_);
resRow["file_id"].to_string(table_file.file_id_);
table_file.file_type_ = resRow["file_type"];
table_file.row_count_ = resRow["row_count"];
@ -1511,13 +1643,8 @@ MySQLMetaImpl::FilesToMerge(const std::string& table_id, DatePartionedTableFiles
ret = status;
}
auto dateItr = files.find(table_file.date_);
if (dateItr == files.end()) {
files[table_file.date_] = TableFilesSchema();
++to_merge_files;
}
files[table_file.date_].push_back(table_file);
files.emplace_back(table_file);
++to_merge_files;
}
if (to_merge_files > 0) {
@ -1547,10 +1674,10 @@ MySQLMetaImpl::FilesToIndex(TableFilesSchema& files) {
}
mysqlpp::Query filesToIndexQuery = connectionPtr->query();
filesToIndexQuery
<< "SELECT id, table_id, engine_type, file_id, file_type, file_size, row_count, date, created_on"
<< " FROM " << META_TABLEFILES << " WHERE file_type = " << std::to_string(TableFileSchema::TO_INDEX)
<< ";";
filesToIndexQuery << "SELECT id, table_id, segment_id, engine_type, file_id, file_type, file_size, "
"row_count, date, created_on"
<< " FROM " << META_TABLEFILES
<< " WHERE file_type = " << std::to_string(TableFileSchema::TO_INDEX) << ";";
ENGINE_LOG_DEBUG << "MySQLMetaImpl::FilesToIndex: " << filesToIndexQuery.str();
@ -1563,6 +1690,7 @@ MySQLMetaImpl::FilesToIndex(TableFilesSchema& files) {
for (auto& resRow : res) {
table_file.id_ = resRow["id"]; // implicit conversion
resRow["table_id"].to_string(table_file.table_id_);
resRow["segment_id"].to_string(table_file.segment_id_);
table_file.engine_type_ = resRow["engine_type"];
resRow["file_id"].to_string(table_file.file_id_);
table_file.file_type_ = resRow["file_type"];
@ -1610,6 +1738,8 @@ MySQLMetaImpl::FilesByType(const std::string& table_id, const std::vector<int>&
return Status(DB_ERROR, "file types array is empty");
}
Status ret = Status::OK();
try {
table_files.clear();
@ -1635,7 +1765,7 @@ MySQLMetaImpl::FilesByType(const std::string& table_id, const std::vector<int>&
mysqlpp::Query hasNonIndexFilesQuery = connectionPtr->query();
// select the files of the requested types for this table
hasNonIndexFilesQuery
<< "SELECT id, engine_type, file_id, file_type, file_size, row_count, date, created_on"
<< "SELECT id, segment_id, engine_type, file_id, file_type, file_size, row_count, date, created_on"
<< " FROM " << META_TABLEFILES << " WHERE table_id = " << mysqlpp::quote << table_id
<< " AND file_type in (" << types << ");";
@ -1644,6 +1774,13 @@ MySQLMetaImpl::FilesByType(const std::string& table_id, const std::vector<int>&
res = hasNonIndexFilesQuery.store();
} // Scoped Connection
TableSchema table_schema;
table_schema.table_id_ = table_id;
auto status = DescribeTable(table_schema);
if (!status.ok()) {
return status;
}
if (res.num_rows() > 0) {
int raw_count = 0, new_count = 0, new_merge_count = 0, new_index_count = 0;
int to_index_count = 0, index_count = 0, backup_count = 0;
@ -1651,6 +1788,7 @@ MySQLMetaImpl::FilesByType(const std::string& table_id, const std::vector<int>&
TableFileSchema file_schema;
file_schema.id_ = resRow["id"];
file_schema.table_id_ = table_id;
resRow["segment_id"].to_string(file_schema.segment_id_);
file_schema.engine_type_ = resRow["engine_type"];
resRow["file_id"].to_string(file_schema.file_id_);
file_schema.file_type_ = resRow["file_type"];
@ -1659,6 +1797,16 @@ MySQLMetaImpl::FilesByType(const std::string& table_id, const std::vector<int>&
file_schema.date_ = resRow["date"];
file_schema.created_on_ = resRow["created_on"];
file_schema.index_file_size_ = table_schema.index_file_size_;
file_schema.nlist_ = table_schema.nlist_;
file_schema.metric_type_ = table_schema.metric_type_;
file_schema.dimension_ = table_schema.dimension_;
auto status = utils::GetTableFilePath(options_, file_schema);
if (!status.ok()) {
ret = status;
}
table_files.emplace_back(file_schema);
int32_t file_type = resRow["file_type"];
@ -1723,7 +1871,7 @@ MySQLMetaImpl::FilesByType(const std::string& table_id, const std::vector<int>&
return HandleException("GENERAL ERROR WHEN GET FILE BY TYPE", e.what());
}
return Status::OK();
return ret;
}
// TODO(myh): Support swap to cloud storage
@ -1866,7 +2014,7 @@ MySQLMetaImpl::CleanUpShadowFiles() {
}
Status
MySQLMetaImpl::CleanUpFilesWithTTL(uint64_t seconds, CleanUpFilter* filter) {
MySQLMetaImpl::CleanUpFilesWithTTL(uint64_t seconds /*, CleanUpFilter* filter*/) {
auto now = utils::GetMicroSecTimeStamp();
std::set<std::string> table_ids;
@ -1886,7 +2034,7 @@ MySQLMetaImpl::CleanUpFilesWithTTL(uint64_t seconds, CleanUpFilter* filter) {
}
mysqlpp::Query query = connectionPtr->query();
query << "SELECT id, table_id, file_id, file_type, date"
query << "SELECT id, table_id, segment_id, engine_type, file_id, file_type, date"
<< " FROM " << META_TABLEFILES << " WHERE file_type IN ("
<< std::to_string(TableFileSchema::TO_DELETE) << "," << std::to_string(TableFileSchema::BACKUP) << ")"
<< " AND updated_time < " << std::to_string(now - seconds * US_PS) << ";";
@ -1902,12 +2050,14 @@ MySQLMetaImpl::CleanUpFilesWithTTL(uint64_t seconds, CleanUpFilter* filter) {
for (auto& resRow : res) {
table_file.id_ = resRow["id"]; // implicit conversion
resRow["table_id"].to_string(table_file.table_id_);
resRow["segment_id"].to_string(table_file.segment_id_);
table_file.engine_type_ = resRow["engine_type"];
resRow["file_id"].to_string(table_file.file_id_);
table_file.date_ = resRow["date"];
table_file.file_type_ = resRow["file_type"];
// check if the file can be deleted
if (filter && filter->IsIgnored(table_file)) {
if (OngoingFileChecker::GetInstance().IsIgnored(table_file)) {
ENGINE_LOG_DEBUG << "File:" << table_file.file_id_
<< " currently is in use, not able to delete now";
continue; // ignore this file, don't delete it
@ -1919,9 +2069,19 @@ MySQLMetaImpl::CleanUpFilesWithTTL(uint64_t seconds, CleanUpFilter* filter) {
server::CommonUtil::EraseFromCache(table_file.location_);
if (table_file.file_type_ == (int)TableFileSchema::TO_DELETE) {
// delete file from disk storage
utils::DeleteTableFilePath(options_, table_file);
ENGINE_LOG_DEBUG << "Remove file id:" << table_file.id_ << " location:" << table_file.location_;
// If we are deleting a raw table file, it means it's okay to delete the entire segment directory.
// Otherwise, we can only delete the single file.
// TODO(zhiru): We determine whether a table file is raw by its engine type. This is a bit hacky
if (table_file.engine_type_ == (int32_t)EngineType::FAISS_IDMAP ||
table_file.engine_type_ == (int32_t)EngineType::FAISS_BIN_IDMAP) {
utils::DeleteSegment(options_, table_file);
std::string segment_dir;
utils::GetParentPath(table_file.location_, segment_dir);
ENGINE_LOG_DEBUG << "Remove segment directory: " << segment_dir;
} else {
utils::DeleteTableFilePath(options_, table_file);
ENGINE_LOG_DEBUG << "Remove table file: " << table_file.location_;
}
idsToDelete.emplace_back(std::to_string(table_file.id_));
table_ids.insert(table_file.table_id_);
@ -2196,6 +2356,16 @@ MySQLMetaImpl::DiscardFiles(int64_t to_discard_size) {
}
}
Status
MySQLMetaImpl::SetGlobalLastLSN(uint64_t lsn) {
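// no-op in this change: the MySQL backend does not persist the global lsn yet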
return Status::OK();
}
Status
MySQLMetaImpl::GetGlobalLastLSN(uint64_t& lsn) {
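// no-op in this change: lsn is left untouched, so callers are expected to pre-initialize it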
return Status::OK();
}
} // namespace meta
} // namespace engine
} // namespace milvus


@ -11,16 +11,17 @@
#pragma once
#include "Meta.h"
#include "MySQLConnectionPool.h"
#include "db/Options.h"
#include <mysql++/mysql++.h>
#include <memory>
#include <mutex>
#include <string>
#include <vector>
#include "Meta.h"
#include "MySQLConnectionPool.h"
#include "db/Options.h"
namespace milvus {
namespace engine {
namespace meta {
@ -52,10 +53,10 @@ class MySQLMetaImpl : public Meta {
CreateTableFile(TableFileSchema& file_schema) override;
Status
DropDataByDate(const std::string& table_id, const DatesT& dates) override;
GetTableFiles(const std::string& table_id, const std::vector<size_t>& ids, TableFilesSchema& table_files) override;
Status
GetTableFiles(const std::string& table_id, const std::vector<size_t>& ids, TableFilesSchema& table_files) override;
GetTableFilesBySegmentId(const std::string& segment_id, TableFilesSchema& table_files) override;
Status
UpdateTableIndex(const std::string& table_id, const TableIndex& index) override;
@ -63,6 +64,15 @@ class MySQLMetaImpl : public Meta {
Status
UpdateTableFlag(const std::string& table_id, int64_t flag) override;
Status
UpdateTableFlushLSN(const std::string& table_id, uint64_t flush_lsn) override;
Status
GetTableFlushLSN(const std::string& table_id, uint64_t& flush_lsn) override;
Status
GetTableFilesByFlushLSN(uint64_t flush_lsn, TableFilesSchema& table_files) override;
Status
UpdateTableFile(TableFileSchema& file_schema) override;
@ -79,7 +89,8 @@ class MySQLMetaImpl : public Meta {
DropTableIndex(const std::string& table_id) override;
Status
CreatePartition(const std::string& table_id, const std::string& partition_name, const std::string& tag) override;
CreatePartition(const std::string& table_id, const std::string& partition_name, const std::string& tag,
uint64_t lsn) override;
Status
DropPartition(const std::string& partition_name) override;
@ -91,11 +102,10 @@ class MySQLMetaImpl : public Meta {
GetPartitionName(const std::string& table_id, const std::string& tag, std::string& partition_name) override;
Status
FilesToSearch(const std::string& table_id, const std::vector<size_t>& ids, const DatesT& dates,
DatePartionedTableFilesSchema& files) override;
FilesToSearch(const std::string& table_id, const std::vector<size_t>& ids, TableFilesSchema& files) override;
Status
FilesToMerge(const std::string& table_id, DatePartionedTableFilesSchema& files) override;
FilesToMerge(const std::string& table_id, TableFilesSchema& files) override;
Status
FilesToIndex(TableFilesSchema&) override;
@ -114,7 +124,7 @@ class MySQLMetaImpl : public Meta {
CleanUpShadowFiles() override;
Status
CleanUpFilesWithTTL(uint64_t seconds, CleanUpFilter* filter = nullptr) override;
CleanUpFilesWithTTL(uint64_t seconds /*, CleanUpFilter* filter = nullptr*/) override;
Status
DropAll() override;
@ -122,6 +132,12 @@ class MySQLMetaImpl : public Meta {
Status
Count(const std::string& table_id, uint64_t& result) override;
Status
SetGlobalLastLSN(uint64_t lsn) override;
Status
GetGlobalLastLSN(uint64_t& lsn) override;
private:
Status
NextFileId(std::string& file_id);

File diff suppressed because it is too large


@ -11,13 +11,13 @@
#pragma once
#include "Meta.h"
#include "db/Options.h"
#include <mutex>
#include <string>
#include <vector>
#include "Meta.h"
#include "db/Options.h"
namespace milvus {
namespace engine {
namespace meta {
@ -52,10 +52,10 @@ class SqliteMetaImpl : public Meta {
CreateTableFile(TableFileSchema& file_schema) override;
Status
DropDataByDate(const std::string& table_id, const DatesT& dates) override;
GetTableFiles(const std::string& table_id, const std::vector<size_t>& ids, TableFilesSchema& table_files) override;
Status
GetTableFiles(const std::string& table_id, const std::vector<size_t>& ids, TableFilesSchema& table_files) override;
GetTableFilesBySegmentId(const std::string& segment_id, TableFilesSchema& table_files) override;
Status
UpdateTableIndex(const std::string& table_id, const TableIndex& index) override;
@ -63,6 +63,15 @@ class SqliteMetaImpl : public Meta {
Status
UpdateTableFlag(const std::string& table_id, int64_t flag) override;
Status
UpdateTableFlushLSN(const std::string& table_id, uint64_t flush_lsn) override;
Status
GetTableFlushLSN(const std::string& table_id, uint64_t& flush_lsn) override;
Status
GetTableFilesByFlushLSN(uint64_t flush_lsn, TableFilesSchema& table_files) override;
Status
UpdateTableFile(TableFileSchema& file_schema) override;
@ -79,7 +88,8 @@ class SqliteMetaImpl : public Meta {
DropTableIndex(const std::string& table_id) override;
Status
CreatePartition(const std::string& table_id, const std::string& partition_name, const std::string& tag) override;
CreatePartition(const std::string& table_id, const std::string& partition_name, const std::string& tag,
uint64_t lsn) override;
Status
DropPartition(const std::string& partition_name) override;
@ -91,11 +101,10 @@ class SqliteMetaImpl : public Meta {
GetPartitionName(const std::string& table_id, const std::string& tag, std::string& partition_name) override;
Status
FilesToSearch(const std::string& table_id, const std::vector<size_t>& ids, const DatesT& dates,
DatePartionedTableFilesSchema& files) override;
FilesToSearch(const std::string& table_id, const std::vector<size_t>& ids, TableFilesSchema& files) override;
Status
FilesToMerge(const std::string& table_id, DatePartionedTableFilesSchema& files) override;
FilesToMerge(const std::string& table_id, TableFilesSchema& files) override;
Status
FilesToIndex(TableFilesSchema&) override;
@ -114,7 +123,7 @@ class SqliteMetaImpl : public Meta {
CleanUpShadowFiles() override;
Status
CleanUpFilesWithTTL(uint64_t seconds, CleanUpFilter* filter = nullptr) override;
CleanUpFilesWithTTL(uint64_t seconds /*, CleanUpFilter* filter = nullptr*/) override;
Status
DropAll() override;
@ -122,6 +131,12 @@ class SqliteMetaImpl : public Meta {
Status
Count(const std::string& table_id, uint64_t& result) override;
Status
SetGlobalLastLSN(uint64_t lsn) override;
Status
GetGlobalLastLSN(uint64_t& lsn) override;
private:
Status
NextFileId(std::string& file_id);


@ -0,0 +1,420 @@
// Copyright (C) 2019-2020 Zilliz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "db/wal/WalBuffer.h"
#include <cstring>
#include "db/wal/WalDefinations.h"
#include "utils/Log.h"
namespace milvus {
namespace engine {
namespace wal {
inline std::string
ToFileName(int32_t file_no) {
return std::to_string(file_no) + ".wal";
}
inline void
BuildLsn(uint32_t file_no, uint32_t offset, uint64_t& lsn) {
lsn = (uint64_t)file_no << 32 | offset;
}
inline void
ParserLsn(uint64_t lsn, uint32_t& file_no, uint32_t& offset) {
file_no = uint32_t(lsn >> 32);
offset = uint32_t(lsn & LSN_OFFSET_MASK);
}
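The lsn therefore packs the wal file number into the high 32 bits and the byte offset within that file into the low 32 bits, so a single 64-bit value addresses any position across all wal files (capping each file at 4 GB). A minimal round-trip sketch using the two helpers above, with illustrative values:
uint32_t file_no = 3;
uint32_t offset = 0x100; // 256 bytes into 3.wal
uint64_t lsn = 0;
BuildLsn(file_no, offset, lsn); // lsn == 0x0000000300000100
uint32_t parsed_file_no = 0;
uint32_t parsed_offset = 0;
ParserLsn(lsn, parsed_file_no, parsed_offset); // yields 3 and 0x100 again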
MXLogBuffer::MXLogBuffer(const std::string& mxlog_path, const uint32_t buffer_size)
: mxlog_buffer_size_(buffer_size), mxlog_writer_(mxlog_path) {
if (mxlog_buffer_size_ < (uint32_t)WAL_BUFFER_MIN_SIZE) {
WAL_LOG_INFO << "config wal buffer size is too small " << mxlog_buffer_size_;
mxlog_buffer_size_ = (uint32_t)WAL_BUFFER_MIN_SIZE;
} else if (mxlog_buffer_size_ > (uint32_t)WAL_BUFFER_MAX_SIZE) {
WAL_LOG_INFO << "config wal buffer size is too larger " << mxlog_buffer_size_;
mxlog_buffer_size_ = (uint32_t)WAL_BUFFER_MAX_SIZE;
}
}
MXLogBuffer::~MXLogBuffer() {
}
/**
* alloc space for buffers
* @param buffer_size
* @return
*/
bool
MXLogBuffer::Init(uint64_t start_lsn, uint64_t end_lsn) {
WAL_LOG_DEBUG << "start_lsn " << start_lsn << " end_lsn " << end_lsn;
ParserLsn(start_lsn, mxlog_buffer_reader_.file_no, mxlog_buffer_reader_.buf_offset);
ParserLsn(end_lsn, mxlog_buffer_writer_.file_no, mxlog_buffer_writer_.buf_offset);
if (start_lsn == end_lsn) {
// no data needs recovery, start a new file_no
if (mxlog_buffer_writer_.buf_offset != 0) {
mxlog_buffer_writer_.file_no++;
mxlog_buffer_writer_.buf_offset = 0;
mxlog_buffer_reader_.file_no++;
mxlog_buffer_reader_.buf_offset = 0;
}
} else {
// to check whether buffer_size is enough
MXLogFileHandler file_handler(mxlog_writer_.GetFilePath());
uint32_t buffer_size_need = 0;
for (auto i = mxlog_buffer_reader_.file_no; i < mxlog_buffer_writer_.file_no; i++) {
file_handler.SetFileName(ToFileName(i));
auto file_size = file_handler.GetFileSize();
if (file_size == 0) {
WAL_LOG_ERROR << "bad wal file " << i;
return false;
}
if (file_size > buffer_size_need) {
buffer_size_need = file_size;
}
}
if (mxlog_buffer_writer_.buf_offset > buffer_size_need) {
buffer_size_need = mxlog_buffer_writer_.buf_offset;
}
if (buffer_size_need > mxlog_buffer_size_) {
mxlog_buffer_size_ = buffer_size_need;
WAL_LOG_INFO << "recovery will need more buffer, buffer size changed " << mxlog_buffer_size_;
}
}
buf_[0] = BufferPtr(new char[mxlog_buffer_size_]);
buf_[1] = BufferPtr(new char[mxlog_buffer_size_]);
if (mxlog_buffer_reader_.file_no == mxlog_buffer_writer_.file_no) {
// read-write buffer
mxlog_buffer_reader_.buf_idx = 0;
mxlog_buffer_writer_.buf_idx = 0;
mxlog_writer_.SetFileName(ToFileName(mxlog_buffer_writer_.file_no));
if (mxlog_buffer_writer_.buf_offset == 0) {
mxlog_writer_.SetFileOpenMode("w");
} else {
mxlog_writer_.SetFileOpenMode("r+");
if (!mxlog_writer_.FileExists()) {
WAL_LOG_ERROR << "wal file not exist " << mxlog_buffer_writer_.file_no;
return false;
}
auto read_offset = mxlog_buffer_reader_.buf_offset;
auto read_size = mxlog_buffer_writer_.buf_offset - mxlog_buffer_reader_.buf_offset;
if (!mxlog_writer_.Load(buf_[0].get() + read_offset, read_offset, read_size)) {
WAL_LOG_ERROR << "load wal file error " << read_offset << " " << read_size;
return false;
}
}
} else {
// read buffer
mxlog_buffer_reader_.buf_idx = 0;
MXLogFileHandler file_handler(mxlog_writer_.GetFilePath());
file_handler.SetFileName(ToFileName(mxlog_buffer_reader_.file_no));
file_handler.SetFileOpenMode("r");
auto read_offset = mxlog_buffer_reader_.buf_offset;
auto read_size = file_handler.Load(buf_[0].get() + read_offset, read_offset);
mxlog_buffer_reader_.max_offset = read_size + read_offset;
file_handler.CloseFile();
// write buffer
mxlog_buffer_writer_.buf_idx = 1;
mxlog_writer_.SetFileName(ToFileName(mxlog_buffer_writer_.file_no));
mxlog_writer_.SetFileOpenMode("r+");
if (!mxlog_writer_.FileExists()) {
WAL_LOG_ERROR << "wal file not exist " << mxlog_buffer_writer_.file_no;
return false;
}
if (!mxlog_writer_.Load(buf_[1].get(), 0, mxlog_buffer_writer_.buf_offset)) {
WAL_LOG_ERROR << "load wal file error " << mxlog_buffer_writer_.file_no;
return false;
}
}
SetFileNoFrom(mxlog_buffer_reader_.file_no);
return true;
}
void
MXLogBuffer::Reset(uint64_t lsn) {
WAL_LOG_DEBUG << "reset lsn " << lsn;
buf_[0] = BufferPtr(new char[mxlog_buffer_size_]);
buf_[1] = BufferPtr(new char[mxlog_buffer_size_]);
ParserLsn(lsn, mxlog_buffer_writer_.file_no, mxlog_buffer_writer_.buf_offset);
if (mxlog_buffer_writer_.buf_offset != 0) {
mxlog_buffer_writer_.file_no++;
mxlog_buffer_writer_.buf_offset = 0;
}
mxlog_buffer_writer_.buf_idx = 0;
memcpy(&mxlog_buffer_reader_, &mxlog_buffer_writer_, sizeof(MXLogBufferHandler));
mxlog_writer_.CloseFile();
mxlog_writer_.SetFileName(ToFileName(mxlog_buffer_writer_.file_no));
mxlog_writer_.SetFileOpenMode("w");
SetFileNoFrom(mxlog_buffer_reader_.file_no);
}
uint32_t
MXLogBuffer::GetBufferSize() {
return mxlog_buffer_size_;
}
// the writer needs to know how much space is left in the current buffer
uint32_t
MXLogBuffer::SurplusSpace() {
return mxlog_buffer_size_ - mxlog_buffer_writer_.buf_offset;
}
uint32_t
MXLogBuffer::RecordSize(const MXLogRecord& record) {
return SizeOfMXLogRecordHeader + (uint32_t)record.table_id.size() + (uint32_t)record.partition_tag.size() +
record.length * (uint32_t)sizeof(IDNumber) + record.data_size;
}
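With the packed header defined in WalBuffer.h this is 21 bytes plus the variable-length parts. As a worked example, assuming IDNumber is the 8-byte id type from db/Types.h, a record carrying table id "tbl" (3 bytes), partition tag "p1" (2 bytes) and 10 four-dimensional float vectors takes:
21 (header) + 3 (table id) + 2 (partition tag) + 10 * 8 (ids) + 10 * 4 * 4 (vector data) = 266 bytes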
ErrorCode
MXLogBuffer::Append(MXLogRecord& record) {
uint32_t record_size = RecordSize(record);
if (SurplusSpace() < record_size) {
// writer buffer has no space, switch wal file and write to a new buffer
std::unique_lock<std::mutex> lck(mutex_);
if (mxlog_buffer_writer_.buf_idx == mxlog_buffer_reader_.buf_idx) {
// switch writer buffer
mxlog_buffer_reader_.max_offset = mxlog_buffer_writer_.buf_offset;
mxlog_buffer_writer_.buf_idx ^= 1;
}
mxlog_buffer_writer_.file_no++;
mxlog_buffer_writer_.buf_offset = 0;
lck.unlock();
// ReBorn closes the old wal file and opens a new one
if (!mxlog_writer_.ReBorn(ToFileName(mxlog_buffer_writer_.file_no), "w")) {
WAL_LOG_ERROR << "ReBorn wal file error " << mxlog_buffer_writer_.file_no;
return WAL_FILE_ERROR;
}
}
// point to the offset of current record in wal file
char* current_write_buf = buf_[mxlog_buffer_writer_.buf_idx].get();
uint32_t current_write_offset = mxlog_buffer_writer_.buf_offset;
MXLogRecordHeader head;
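// the lsn written into the header encodes the offset just past this record, i.e. where the next record starts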
BuildLsn(mxlog_buffer_writer_.file_no, mxlog_buffer_writer_.buf_offset + (uint32_t)record_size, head.mxl_lsn);
head.mxl_type = (uint8_t)record.type;
head.table_id_size = (uint16_t)record.table_id.size();
head.partition_tag_size = (uint16_t)record.partition_tag.size();
head.vector_num = record.length;
head.data_size = record.data_size;
memcpy(current_write_buf + current_write_offset, &head, SizeOfMXLogRecordHeader);
current_write_offset += SizeOfMXLogRecordHeader;
if (!record.table_id.empty()) {
memcpy(current_write_buf + current_write_offset, record.table_id.data(), record.table_id.size());
current_write_offset += record.table_id.size();
}
if (!record.partition_tag.empty()) {
memcpy(current_write_buf + current_write_offset, record.partition_tag.data(), record.partition_tag.size());
current_write_offset += record.partition_tag.size();
}
if (record.ids != nullptr && record.length > 0) {
memcpy(current_write_buf + current_write_offset, record.ids, record.length * sizeof(IDNumber));
current_write_offset += record.length * sizeof(IDNumber);
}
if (record.data != nullptr && record.data_size > 0) {
memcpy(current_write_buf + current_write_offset, record.data, record.data_size);
current_write_offset += record.data_size;
}
bool write_rst = mxlog_writer_.Write(current_write_buf + mxlog_buffer_writer_.buf_offset, record_size);
if (!write_rst) {
WAL_LOG_ERROR << "write wal file error";
return WAL_FILE_ERROR;
}
mxlog_buffer_writer_.buf_offset = current_write_offset;
record.lsn = head.mxl_lsn;
return WAL_SUCCESS;
}
ErrorCode
MXLogBuffer::Next(const uint64_t last_applied_lsn, MXLogRecord& record) {
// init output
record.type = MXLogType::None;
// the reader has caught up with the writer: no next record, return with record.type == None
if (GetReadLsn() >= last_applied_lsn) {
return WAL_SUCCESS;
}
// otherwise, the next record must exist, either in the buffer or in a wal log file
bool need_load_new = false;
std::unique_lock<std::mutex> lck(mutex_);
if (mxlog_buffer_reader_.file_no != mxlog_buffer_writer_.file_no) {
if (mxlog_buffer_reader_.buf_offset == mxlog_buffer_reader_.max_offset) { // last record
mxlog_buffer_reader_.file_no++;
mxlog_buffer_reader_.buf_offset = 0;
need_load_new = (mxlog_buffer_reader_.file_no != mxlog_buffer_writer_.file_no);
if (!need_load_new) {
// the reader has reached the write buffer
mxlog_buffer_reader_.buf_idx = mxlog_buffer_writer_.buf_idx;
}
}
}
lck.unlock();
if (need_load_new) {
MXLogFileHandler mxlog_reader(mxlog_writer_.GetFilePath());
mxlog_reader.SetFileName(ToFileName(mxlog_buffer_reader_.file_no));
mxlog_reader.SetFileOpenMode("r");
uint32_t file_size = mxlog_reader.Load(buf_[mxlog_buffer_reader_.buf_idx].get(), 0);
if (file_size == 0) {
WAL_LOG_ERROR << "load wal file error " << mxlog_buffer_reader_.file_no;
return WAL_FILE_ERROR;
}
mxlog_buffer_reader_.max_offset = file_size;
}
char* current_read_buf = buf_[mxlog_buffer_reader_.buf_idx].get();
uint64_t current_read_offset = mxlog_buffer_reader_.buf_offset;
MXLogRecordHeader* head = (MXLogRecordHeader*)(current_read_buf + current_read_offset);
record.type = (MXLogType)head->mxl_type;
record.lsn = head->mxl_lsn;
record.length = head->vector_num;
record.data_size = head->data_size;
current_read_offset += SizeOfMXLogRecordHeader;
if (head->table_id_size != 0) {
record.table_id.assign(current_read_buf + current_read_offset, head->table_id_size);
current_read_offset += head->table_id_size;
} else {
record.table_id = "";
}
if (head->partition_tag_size != 0) {
record.partition_tag.assign(current_read_buf + current_read_offset, head->partition_tag_size);
current_read_offset += head->partition_tag_size;
} else {
record.partition_tag = "";
}
if (head->vector_num != 0) {
record.ids = (IDNumber*)(current_read_buf + current_read_offset);
current_read_offset += head->vector_num * sizeof(IDNumber);
} else {
record.ids = nullptr;
}
if (record.data_size != 0) {
record.data = current_read_buf + current_read_offset;
} else {
record.data = nullptr;
}
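// head->mxl_lsn holds the offset just past this record, so this advances the reader to the next record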
mxlog_buffer_reader_.buf_offset = uint32_t(head->mxl_lsn & LSN_OFFSET_MASK);
return WAL_SUCCESS;
}
uint64_t
MXLogBuffer::GetReadLsn() {
uint64_t read_lsn;
BuildLsn(mxlog_buffer_reader_.file_no, mxlog_buffer_reader_.buf_offset, read_lsn);
return read_lsn;
}
bool
MXLogBuffer::ResetWriteLsn(uint64_t lsn) {
WAL_LOG_INFO << "reset write lsn " << lsn;
int32_t old_file_no = mxlog_buffer_writer_.file_no;
ParserLsn(lsn, mxlog_buffer_writer_.file_no, mxlog_buffer_writer_.buf_offset);
if (old_file_no == mxlog_buffer_writer_.file_no) {
WAL_LOG_DEBUG << "file No. is not changed";
return true;
}
std::unique_lock<std::mutex> lck(mutex_);
if (mxlog_buffer_writer_.file_no == mxlog_buffer_reader_.file_no) {
mxlog_buffer_writer_.buf_idx = mxlog_buffer_reader_.buf_idx;
WAL_LOG_DEBUG << "file No. is the same as reader";
return true;
}
lck.unlock();
if (!mxlog_writer_.ReBorn(ToFileName(mxlog_buffer_writer_.file_no), "r+")) {
WAL_LOG_ERROR << "reborn file error " << mxlog_buffer_writer_.file_no;
return false;
}
if (!mxlog_writer_.Load(buf_[mxlog_buffer_writer_.buf_idx].get(), 0, mxlog_buffer_writer_.buf_offset)) {
WAL_LOG_ERROR << "load file error";
return false;
}
return true;
}
void
MXLogBuffer::SetFileNoFrom(uint32_t file_no) {
file_no_from_ = file_no;
if (file_no > 0) {
// remove the files whose No. is less than file_no
MXLogFileHandler file_handler(mxlog_writer_.GetFilePath());
do {
file_handler.SetFileName(ToFileName(--file_no));
if (!file_handler.FileExists()) {
break;
}
WAL_LOG_INFO << "Delete wal file " << file_no;
file_handler.DeleteFile();
} while (file_no > 0);
}
}
void
MXLogBuffer::RemoveOldFiles(uint64_t flushed_lsn) {
uint32_t file_no;
uint32_t offset;
ParserLsn(flushed_lsn, file_no, offset);
if (file_no_from_ < file_no) {
MXLogFileHandler file_handler(mxlog_writer_.GetFilePath());
do {
file_handler.SetFileName(ToFileName(file_no_from_));
WAL_LOG_INFO << "Delete wal file " << file_no_from_;
file_handler.DeleteFile();
} while (++file_no_from_ < file_no);
}
}
} // namespace wal
} // namespace engine
} // namespace milvus

core/src/db/wal/WalBuffer.h

@ -0,0 +1,108 @@
// Copyright (C) 2019-2020 Zilliz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#pragma once
#include <atomic>
#include <memory>
#include <mutex>
#include <string>
#include "WalDefinations.h"
#include "WalFileHandler.h"
#include "WalMetaHandler.h"
#include "utils/Error.h"
namespace milvus {
namespace engine {
namespace wal {
#pragma pack(push)
#pragma pack(1)
struct MXLogRecordHeader {
uint64_t mxl_lsn; // log sequence number (high 32 bits: file No. inc by 1, low 32 bits: offset in file, max 4GB)
uint8_t mxl_type; // record type, insert/delete/update/flush...
uint16_t table_id_size;
uint16_t partition_tag_size;
uint32_t vector_num;
uint32_t data_size;
};
const uint32_t SizeOfMXLogRecordHeader = sizeof(MXLogRecordHeader);
#pragma pack(pop)
struct MXLogBufferHandler {
uint32_t max_offset;
uint32_t file_no;
uint32_t buf_offset;
uint8_t buf_idx;
};
using BufferPtr = std::shared_ptr<char[]>;  // array form so buffers allocated with new[] are released with delete[]
class MXLogBuffer {
public:
MXLogBuffer(const std::string& mxlog_path, const uint32_t buffer_size);
~MXLogBuffer();
bool
Init(uint64_t read_lsn, uint64_t write_lsn);
// discard all old wal files
void
Reset(uint64_t lsn);
// Note: record.lsn is set inside Append
ErrorCode
Append(MXLogRecord& record);
ErrorCode
Next(const uint64_t last_applied_lsn, MXLogRecord& record);
uint64_t
GetReadLsn();
bool
ResetWriteLsn(uint64_t lsn);
void
SetFileNoFrom(uint32_t file_no);
void
RemoveOldFiles(uint64_t flushed_lsn);
uint32_t
GetBufferSize();
uint32_t
SurplusSpace();
private:
uint32_t
RecordSize(const MXLogRecord& record);
private:
uint32_t mxlog_buffer_size_; // from config
BufferPtr buf_[2];
std::mutex mutex_;
uint32_t file_no_from_;
MXLogBufferHandler mxlog_buffer_reader_;
MXLogBufferHandler mxlog_buffer_writer_;
MXLogFileHandler mxlog_writer_;
};
using MXLogBufferPtr = std::shared_ptr<MXLogBuffer>;
} // namespace wal
} // namespace engine
} // namespace milvus
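The two MXLogBufferHandler members implement a double buffer: while reader and writer sit in the same wal file they share one of buf_[0]/buf_[1], otherwise each side owns its own buffer. A minimal usage sketch, assuming the wal directory already exists and with illustrative values:
milvus::engine::wal::MXLogBuffer buffer("/tmp/wal/", 32 * 1024 * 1024);
if (buffer.Init(0, 0)) { // empty wal: reader == writer, starts at 0.wal
milvus::engine::IDNumber ids[2] = {1, 2};
milvus::engine::wal::MXLogRecord record;
record.type = milvus::engine::wal::MXLogType::Delete;
record.table_id = "tbl";
record.partition_tag = "";
record.length = 2;
record.ids = ids;
record.data_size = 0;
record.data = nullptr;
buffer.Append(record); // writes to 0.wal and assigns record.lsn
milvus::engine::wal::MXLogRecord replay;
buffer.Next(record.lsn, replay); // reads the same record back
}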


@ -0,0 +1,53 @@
// Copyright (C) 2019-2020 Zilliz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#pragma once
#include <memory>
#include <string>
#include <unordered_map>
#include "db/Types.h"
#include "db/meta/MetaTypes.h"
namespace milvus {
namespace engine {
namespace wal {
using TableSchemaPtr = std::shared_ptr<milvus::engine::meta::TableSchema>;
using TableMetaPtr = std::shared_ptr<std::unordered_map<std::string, TableSchemaPtr> >;
#define WAL_BUFFER_MAX_SIZE ((uint32_t)2 * 1024 * 1024 * 1024)
#define WAL_BUFFER_MIN_SIZE ((uint32_t)32 * 1024 * 1024)
#define LSN_OFFSET_MASK 0x00000000ffffffff
enum class MXLogType { InsertBinary, InsertVector, Delete, Update, Flush, None };
struct MXLogRecord {
uint64_t lsn;
MXLogType type;
std::string table_id;
std::string partition_tag;
uint32_t length;
const IDNumber* ids;
uint32_t data_size;
const void* data;
};
struct MXLogConfiguration {
bool recovery_error_ignore;
uint32_t buffer_size;
std::string mxlog_path;
};
} // namespace wal
} // namespace engine
} // namespace milvus
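WalManager (later in this diff) consumes this struct: it treats buffer_size as a megabyte count and normalizes mxlog_path to end with '/'. A hypothetical configuration sketch; the path is an assumption:
milvus::engine::wal::MXLogConfiguration config;
config.recovery_error_ignore = true; // on a damaged wal, reset it instead of failing Init
config.buffer_size = 64; // in MB; MXLogBuffer clamps the byte size to [32 MB, 2 GB]
config.mxlog_path = "/var/lib/milvus/wal"; // hypothetical; a trailing '/' is appended if missing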


@ -0,0 +1,139 @@
// Copyright (C) 2019-2020 Zilliz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "db/wal/WalFileHandler.h"
#include <sys/stat.h>
#include <unistd.h>
namespace milvus {
namespace engine {
namespace wal {
MXLogFileHandler::MXLogFileHandler(const std::string& mxlog_path) : file_path_(mxlog_path), p_file_(nullptr) {
}
MXLogFileHandler::~MXLogFileHandler() {
CloseFile();
}
bool
MXLogFileHandler::OpenFile() {
if (p_file_ == nullptr) {
p_file_ = fopen((file_path_ + file_name_).c_str(), file_mode_.c_str());
}
return (p_file_ != nullptr);
}
uint32_t
MXLogFileHandler::Load(char* buf, uint32_t data_offset) {
uint32_t read_size = 0;
if (OpenFile()) {
uint32_t file_size = GetFileSize();
if (file_size > data_offset) {
read_size = file_size - data_offset;
fseek(p_file_, data_offset, SEEK_SET);
fread(buf, 1, read_size, p_file_);
}
}
return read_size;
}
bool
MXLogFileHandler::Load(char* buf, uint32_t data_offset, uint32_t data_size) {
if (OpenFile() && data_size != 0) {
auto file_size = GetFileSize();
if ((file_size < data_offset) || (file_size - data_offset < data_size)) {
return false;
}
fseek(p_file_, data_offset, SEEK_SET);
fread(buf, 1, data_size, p_file_);
}
return true;
}
bool
MXLogFileHandler::Write(char* buf, uint32_t data_size, bool is_sync) {
uint32_t written_size = 0;
if (OpenFile() && data_size != 0) {
written_size = fwrite(buf, 1, data_size, p_file_);
fflush(p_file_);
}
return (written_size == data_size);
}
bool
MXLogFileHandler::ReBorn(const std::string& file_name, const std::string& open_mode) {
CloseFile();
SetFileName(file_name);
SetFileOpenMode(open_mode);
return OpenFile();
}
bool
MXLogFileHandler::CloseFile() {
if (p_file_ != nullptr) {
fclose(p_file_);
p_file_ = nullptr;
}
return true;
}
std::string
MXLogFileHandler::GetFilePath() {
return file_path_;
}
std::string
MXLogFileHandler::GetFileName() {
return file_name_;
}
uint32_t
MXLogFileHandler::GetFileSize() {
struct stat statbuf;
if (0 == stat((file_path_ + file_name_).c_str(), &statbuf)) {
return (uint32_t)statbuf.st_size;
}
return 0;
}
void
MXLogFileHandler::DeleteFile() {
remove((file_path_ + file_name_).c_str());
file_name_ = "";
}
bool
MXLogFileHandler::FileExists() {
return access((file_path_ + file_name_).c_str(), 0) != -1;
}
void
MXLogFileHandler::SetFileOpenMode(const std::string& open_mode) {
file_mode_ = open_mode;
}
void
MXLogFileHandler::SetFileName(const std::string& file_name) {
file_name_ = file_name;
}
void
MXLogFileHandler::SetFilePath(const std::string& file_path) {
file_path_ = file_path;
}
} // namespace wal
} // namespace engine
} // namespace milvus


@ -0,0 +1,65 @@
// Copyright (C) 2019-2020 Zilliz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#pragma once
#include <string>
#include "WalDefinations.h"
namespace milvus {
namespace engine {
namespace wal {
class MXLogFileHandler {
public:
explicit MXLogFileHandler(const std::string& mxlog_path);
~MXLogFileHandler();
std::string
GetFilePath();
std::string
GetFileName();
bool
OpenFile();
bool
CloseFile();
uint32_t
Load(char* buf, uint32_t data_offset);
bool
Load(char* buf, uint32_t data_offset, uint32_t data_size);
bool
Write(char* buf, uint32_t data_size, bool is_sync = false);
bool
ReBorn(const std::string& file_name, const std::string& open_mode);
uint32_t
GetFileSize();
void
SetFileOpenMode(const std::string& open_mode);
void
SetFilePath(const std::string& file_path);
void
SetFileName(const std::string& file_name);
void
DeleteFile();
bool
FileExists();
private:
std::string file_path_;
std::string file_name_;
std::string file_mode_;
FILE* p_file_;
};
} // namespace wal
} // namespace engine
} // namespace milvus


@ -0,0 +1,397 @@
// Copyright (C) 2019-2020 Zilliz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "db/wal/WalManager.h"
#include <unistd.h>
#include <algorithm>
#include <memory>
#include "utils/CommonUtil.h"
#include "utils/Exception.h"
#include "utils/Log.h"
namespace milvus {
namespace engine {
namespace wal {
WalManager::WalManager(const MXLogConfiguration& config) {
mxlog_config_.recovery_error_ignore = config.recovery_error_ignore;
mxlog_config_.buffer_size = config.buffer_size * 1024 * 1024;
mxlog_config_.mxlog_path = config.mxlog_path;
// make sure the path ends with '/'
if (mxlog_config_.mxlog_path.back() != '/') {
mxlog_config_.mxlog_path += '/';
}
// create the path if it does not exist
auto status = server::CommonUtil::CreateDirectory(mxlog_config_.mxlog_path);
if (!status.ok()) {
std::string msg = "failed to create wal directory " + mxlog_config_.mxlog_path;
ENGINE_LOG_ERROR << msg;
throw Exception(WAL_PATH_ERROR, msg);
}
}
WalManager::~WalManager() {
}
ErrorCode
WalManager::Init(const meta::MetaPtr& meta) {
uint64_t applied_lsn = 0;
p_meta_handler_ = std::make_shared<MXLogMetaHandler>(mxlog_config_.mxlog_path);
if (p_meta_handler_ != nullptr) {
p_meta_handler_->GetMXLogInternalMeta(applied_lsn);
}
uint64_t recovery_start = 0;
if (meta != nullptr) {
meta->GetGlobalLastLSN(recovery_start);
std::vector<meta::TableSchema> table_schema_array;
auto status = meta->AllTables(table_schema_array);
if (!status.ok()) {
return WAL_META_ERROR;
}
if (!table_schema_array.empty()) {
// get min and max flushed lsn
uint64_t min_flushed_lsn = table_schema_array[0].flush_lsn_;
uint64_t max_flushed_lsn = table_schema_array[0].flush_lsn_;
for (size_t i = 1; i < table_schema_array.size(); i++) {
if (min_flushed_lsn > table_schema_array[i].flush_lsn_) {
min_flushed_lsn = table_schema_array[i].flush_lsn_;
} else if (max_flushed_lsn < table_schema_array[i].flush_lsn_) {
max_flushed_lsn = table_schema_array[i].flush_lsn_;
}
}
if (applied_lsn < max_flushed_lsn) {
// a new WAL folder?
applied_lsn = max_flushed_lsn;
}
if (recovery_start < min_flushed_lsn) {
// not all tables are flushed yet
recovery_start = min_flushed_lsn;
}
for (auto& schema : table_schema_array) {
TableLsn tb_lsn = {schema.flush_lsn_, applied_lsn};
tables_[schema.table_id_] = tb_lsn;
}
}
}
// all tables are dropped and a new wal path?
if (applied_lsn < recovery_start) {
applied_lsn = recovery_start;
}
ErrorCode error_code = WAL_ERROR;
p_buffer_ = std::make_shared<MXLogBuffer>(mxlog_config_.mxlog_path, mxlog_config_.buffer_size);
if (p_buffer_ != nullptr) {
if (p_buffer_->Init(recovery_start, applied_lsn)) {
error_code = WAL_SUCCESS;
} else if (mxlog_config_.recovery_error_ignore) {
p_buffer_->Reset(applied_lsn);
error_code = WAL_SUCCESS;
} else {
error_code = WAL_FILE_ERROR;
}
}
// buffer size may have changed
mxlog_config_.buffer_size = p_buffer_->GetBufferSize();
last_applied_lsn_ = applied_lsn;
return error_code;
}
ErrorCode
WalManager::GetNextRecovery(MXLogRecord& record) {
ErrorCode error_code = WAL_SUCCESS;
while (true) {
error_code = p_buffer_->Next(last_applied_lsn_, record);
if (error_code != WAL_SUCCESS) {
if (mxlog_config_.recovery_error_ignore) {
// reset and break recovery
p_buffer_->Reset(last_applied_lsn_);
record.type = MXLogType::None;
error_code = WAL_SUCCESS;
}
break;
}
if (record.type == MXLogType::None) {
break;
}
// the background thread has not started yet, so no lock is needed here
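// replay a record only when its table still exists and the record is newer than that table's flushed lsn; everything else is skipped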
auto it = tables_.find(record.table_id);
if (it != tables_.end()) {
if (it->second.flush_lsn < record.lsn) {
break;
}
}
}
WAL_LOG_INFO << "record type " << (int32_t)record.type << " record lsn " << record.lsn << " error code "
<< error_code;
return error_code;
}
ErrorCode
WalManager::GetNextRecord(MXLogRecord& record) {
auto check_flush = [&]() -> bool {
std::lock_guard<std::mutex> lck(mutex_);
if (flush_info_.IsValid()) {
if (p_buffer_->GetReadLsn() >= flush_info_.lsn_) {
// the flush request can now be executed
record.type = MXLogType::Flush;
record.table_id = flush_info_.table_id_;
record.lsn = flush_info_.lsn_;
flush_info_.Clear();
WAL_LOG_INFO << "record flush table " << record.table_id << " lsn " << record.lsn;
return true;
}
}
return false;
};
if (check_flush()) {
return WAL_SUCCESS;
}
ErrorCode error_code = WAL_SUCCESS;
while (WAL_SUCCESS == p_buffer_->Next(last_applied_lsn_, record)) {
if (record.type == MXLogType::None) {
if (check_flush()) {
return WAL_SUCCESS;
}
break;
}
std::lock_guard<std::mutex> lck(mutex_);
auto it = tables_.find(record.table_id);
if (it != tables_.end()) {
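// deliver the record only if its table still exists; records of dropped tables are skipped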
break;
}
}
WAL_LOG_INFO << "record type " << (int32_t)record.type << " table " << record.table_id << " lsn " << record.lsn;
return error_code;
}
uint64_t
WalManager::CreateTable(const std::string& table_id) {
WAL_LOG_INFO << "create table " << table_id << " " << last_applied_lsn_;
std::lock_guard<std::mutex> lck(mutex_);
uint64_t applied_lsn = last_applied_lsn_;
tables_[table_id] = {applied_lsn, applied_lsn};
return applied_lsn;
}
void
WalManager::DropTable(const std::string& table_id) {
WAL_LOG_INFO << "drop table " << table_id;
std::lock_guard<std::mutex> lck(mutex_);
tables_.erase(table_id);
}
void
WalManager::TableFlushed(const std::string& table_id, uint64_t lsn) {
std::unique_lock<std::mutex> lck(mutex_);
auto it = tables_.find(table_id);
if (it != tables_.end()) {
it->second.flush_lsn = lsn;
}
lck.unlock();
WAL_LOG_INFO << table_id << " is flushed by lsn " << lsn;
}
template <typename T>
bool
WalManager::Insert(const std::string& table_id, const std::string& partition_tag, const IDNumbers& vector_ids,
const std::vector<T>& vectors) {
MXLogType log_type;
if (std::is_same<T, float>::value) {
log_type = MXLogType::InsertVector;
} else if (std::is_same<T, uint8_t>::value) {
log_type = MXLogType::InsertBinary;
} else {
return false;
}
size_t vector_num = vector_ids.size();
if (vector_num == 0) {
WAL_LOG_ERROR << "The ids is empty.";
return false;
}
size_t dim = vectors.size() / vector_num;
size_t unit_size = dim * sizeof(T) + sizeof(IDNumber);
size_t head_size = SizeOfMXLogRecordHeader + table_id.length() + partition_tag.length();
MXLogRecord record;
record.type = log_type;
record.table_id = table_id;
record.partition_tag = partition_tag;
uint64_t new_lsn = 0;
for (size_t i = 0; i < vector_num; i += record.length) {
size_t surplus_space = p_buffer_->SurplusSpace();
size_t max_rcd_num = 0;
if (surplus_space >= head_size + unit_size) {
max_rcd_num = (surplus_space - head_size) / unit_size;
} else {
max_rcd_num = (mxlog_config_.buffer_size - head_size) / unit_size;
}
if (max_rcd_num == 0) {
WAL_LOG_ERROR << "Wal buffer size is too small " << mxlog_config_.buffer_size << " unit " << unit_size;
return false;
}
record.length = std::min(vector_num - i, max_rcd_num);
record.ids = vector_ids.data() + i;
record.data_size = record.length * dim * sizeof(T);
record.data = vectors.data() + i * dim;
auto error_code = p_buffer_->Append(record);
if (error_code != WAL_SUCCESS) {
p_buffer_->ResetWriteLsn(last_applied_lsn_);
return false;
}
new_lsn = record.lsn;
}
std::unique_lock<std::mutex> lck(mutex_);
last_applied_lsn_ = new_lsn;
auto it = tables_.find(table_id);
if (it != tables_.end()) {
it->second.wal_lsn = new_lsn;
}
lck.unlock();
WAL_LOG_INFO << table_id << " insert in part " << partition_tag << " with lsn " << new_lsn;
return p_meta_handler_->SetMXLogInternalMeta(new_lsn);
}
bool
WalManager::DeleteById(const std::string& table_id, const IDNumbers& vector_ids) {
size_t vector_num = vector_ids.size();
if (vector_num == 0) {
WAL_LOG_ERROR << "The ids is empty.";
return false;
}
size_t unit_size = sizeof(IDNumber);
size_t head_size = SizeOfMXLogRecordHeader + table_id.length();
MXLogRecord record;
record.type = MXLogType::Delete;
record.table_id = table_id;
record.partition_tag = "";
uint64_t new_lsn = 0;
for (size_t i = 0; i < vector_num; i += record.length) {
size_t surplus_space = p_buffer_->SurplusSpace();
size_t max_rcd_num = 0;
if (surplus_space >= head_size + unit_size) {
max_rcd_num = (surplus_space - head_size) / unit_size;
} else {
max_rcd_num = (mxlog_config_.buffer_size - head_size) / unit_size;
}
record.length = std::min(vector_num - i, max_rcd_num);
record.ids = vector_ids.data() + i;
record.data_size = 0;
record.data = nullptr;
auto error_code = p_buffer_->Append(record);
if (error_code != WAL_SUCCESS) {
p_buffer_->ResetWriteLsn(last_applied_lsn_);
return false;
}
new_lsn = record.lsn;
}
std::unique_lock<std::mutex> lck(mutex_);
last_applied_lsn_ = new_lsn;
auto it = tables_.find(table_id);
if (it != tables_.end()) {
it->second.wal_lsn = new_lsn;
}
lck.unlock();
WAL_LOG_INFO << table_id << " delete rows by id, lsn " << new_lsn;
return p_meta_handler_->SetMXLogInternalMeta(new_lsn);
}
uint64_t
WalManager::Flush(const std::string table_id) {
std::lock_guard<std::mutex> lck(mutex_);
// At most one flush request is pending at any time.
// Otherwise, flush_info_ would need to become a list.
__glibcxx_assert(!flush_info_.IsValid());
uint64_t lsn = 0;
if (table_id.empty()) {
// flush all tables
for (auto& it : tables_) {
if (it.second.wal_lsn > it.second.flush_lsn) {
lsn = last_applied_lsn_;
break;
}
}
} else {
// flush one table
auto it = tables_.find(table_id);
if (it != tables_.end()) {
if (it->second.wal_lsn > it->second.flush_lsn) {
lsn = it->second.wal_lsn;
}
}
}
if (lsn != 0) {
flush_info_.table_id_ = table_id;
flush_info_.lsn_ = lsn;
}
WAL_LOG_INFO << table_id << " want to be flush, lsn " << lsn;
return lsn;
}
void
WalManager::RemoveOldFiles(uint64_t flushed_lsn) {
if (p_buffer_ != nullptr) {
p_buffer_->RemoveOldFiles(flushed_lsn);
}
}
template bool
WalManager::Insert<float>(const std::string& table_id, const std::string& partition_tag, const IDNumbers& vector_ids,
const std::vector<float>& vectors);
template bool
WalManager::Insert<uint8_t>(const std::string& table_id, const std::string& partition_tag, const IDNumbers& vector_ids,
const std::vector<uint8_t>& vectors);
} // namespace wal
} // namespace engine
} // namespace milvus
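A minimal sketch of the write path, assuming a valid meta::MetaPtr and the configuration from WalDefinations.h (table name and sizes are illustrative):
milvus::engine::wal::WalManager manager(config);
manager.Init(meta); // computes the recovery window from the tables' flushed lsn
manager.CreateTable("tbl");
milvus::engine::IDNumbers ids = {1, 2};
std::vector<float> vectors(2 * 128); // two 128-dimensional vectors
manager.Insert("tbl", "", ids, vectors);
uint64_t flush_lsn = manager.Flush("tbl"); // nonzero if something is not yet flushed
// ... once the db has flushed everything up to flush_lsn:
manager.TableFlushed("tbl", flush_lsn);
manager.RemoveOldFiles(flush_lsn);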


@ -0,0 +1,159 @@
// Copyright (C) 2019-2020 Zilliz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#pragma once
#include <atomic>
#include <map>
#include <string>
#include <utility>
#include <vector>
#include "WalBuffer.h"
#include "WalDefinations.h"
#include "WalFileHandler.h"
#include "WalMetaHandler.h"
#include "utils/Error.h"
namespace milvus {
namespace engine {
namespace wal {
class WalManager {
public:
explicit WalManager(const MXLogConfiguration& config);
~WalManager();
/*
* init
* @param meta
* @retval error_code
*/
ErrorCode
Init(const meta::MetaPtr& meta);
/*
* Get the next record to replay during recovery
* @param record[out]: record
* @retval error_code
*/
ErrorCode
GetNextRecovery(MXLogRecord& record);
/*
* Get next record
* @param record[out]: record
* @retval error_code
*/
ErrorCode
GetNextRecord(MXLogRecord& record);
/*
* Create table
* @param table_id: table id
* @retval lsn
*/
uint64_t
CreateTable(const std::string& table_id);
/*
* Drop table
* @param table_id: table id
* @retval none
*/
void
DropTable(const std::string& table_id);
/*
* Table is flushed
* @param table_id: table id
* @param lsn: flushed lsn
*/
void
TableFlushed(const std::string& table_id, uint64_t lsn);
/*
* Insert
* @param table_id: table id
* @param partition_tag: partition tag
* @param vector_ids: vector ids
* @param vectors: vectors
*/
template <typename T>
bool
Insert(const std::string& table_id, const std::string& partition_tag, const IDNumbers& vector_ids,
const std::vector<T>& vectors);
/*
* Delete by ids
* @param table_id: table id
* @param vector_ids: vector ids
*/
bool
DeleteById(const std::string& table_id, const IDNumbers& vector_ids);
/*
* Get flush lsn
* @param table_id: table id (empty means all tables)
* @retval if there is something not flushed, return lsn;
* else, return 0
*/
uint64_t
Flush(const std::string table_id = "");
void
RemoveOldFiles(uint64_t flushed_lsn);
private:
WalManager
operator=(WalManager&);
MXLogConfiguration mxlog_config_;
MXLogBufferPtr p_buffer_;
MXLogMetaHandlerPtr p_meta_handler_;
struct TableLsn {
uint64_t flush_lsn;
uint64_t wal_lsn;
};
std::mutex mutex_;
std::map<std::string, TableLsn> tables_;
std::atomic<uint64_t> last_applied_lsn_;
// if Flush() can be called from multiple threads, turn this into a list
struct FlushInfo {
std::string table_id_;
uint64_t lsn_ = 0;
bool
IsValid() {
return (lsn_ != 0);
}
void
Clear() {
lsn_ = 0;
}
};
FlushInfo flush_info_;
};
extern template bool
WalManager::Insert<float>(const std::string& table_id, const std::string& partition_tag, const IDNumbers& vector_ids,
const std::vector<float>& vectors);
extern template bool
WalManager::Insert<uint8_t>(const std::string& table_id, const std::string& partition_tag, const IDNumbers& vector_ids,
const std::vector<uint8_t>& vectors);
} // namespace wal
} // namespace engine
} // namespace milvus


@ -0,0 +1,70 @@
// Copyright (C) 2019-2020 Zilliz. All rights reserved.
//
// Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance
// with the License. You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software distributed under the License
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "db/wal/WalMetaHandler.h"
#include <cstring>
namespace milvus {
namespace engine {
namespace wal {
MXLogMetaHandler::MXLogMetaHandler(const std::string& internal_meta_file_path) {
std::string file_full_path = internal_meta_file_path + WAL_META_FILE_NAME;
wal_meta_fp_ = fopen(file_full_path.c_str(), "r+");
if (wal_meta_fp_ == nullptr) {
wal_meta_fp_ = fopen(file_full_path.c_str(), "w");
} else {
uint64_t all_wal_lsn[3] = {0, 0, 0};
auto rt_val = fread(&all_wal_lsn, sizeof(all_wal_lsn), 1, wal_meta_fp_);
if (rt_val == 1) {
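// the lsn is stored three times: slot 0 keeps the previous value, slots 1 and 2
// both receive the new one. If slots 1 and 2 match, the last write completed and
// can be trusted; otherwise it was torn and the previous lsn in slot 0 is used.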
if (all_wal_lsn[2] == all_wal_lsn[1]) {
latest_wal_lsn_ = all_wal_lsn[2];
} else {
latest_wal_lsn_ = all_wal_lsn[0];
}
}
}
}
MXLogMetaHandler::~MXLogMetaHandler() {
if (wal_meta_fp_ != nullptr) {
fclose(wal_meta_fp_);
wal_meta_fp_ = nullptr;
}
}
bool
MXLogMetaHandler::GetMXLogInternalMeta(uint64_t& wal_lsn) {
wal_lsn = latest_wal_lsn_;
return true;
}
bool
MXLogMetaHandler::SetMXLogInternalMeta(uint64_t wal_lsn) {
if (wal_meta_fp_ != nullptr) {
uint64_t all_wal_lsn[3] = {latest_wal_lsn_, wal_lsn, wal_lsn};
fseek(wal_meta_fp_, 0, SEEK_SET);
auto rt_val = fwrite(&all_wal_lsn, sizeof(all_wal_lsn), 1, wal_meta_fp_);
if (rt_val == 1) {
fflush(wal_meta_fp_);
latest_wal_lsn_ = wal_lsn;
return true;
}
}
return false;
}
} // namespace wal
} // namespace engine
} // namespace milvus


@ -13,27 +13,38 @@
#include <memory>
#include <string>
#include <unordered_map>
#include "server/delivery/request/BaseRequest.h"
#include "db/meta/Meta.h"
#include "db/meta/MetaFactory.h"
#include "db/meta/MetaTypes.h"
#include "db/wal/WalDefinations.h"
#include "db/wal/WalFileHandler.h"
namespace milvus {
namespace server {
namespace engine {
namespace wal {
class DeleteByDateRequest : public BaseRequest {
static const char* WAL_META_FILE_NAME = "mxlog.meta";
class MXLogMetaHandler {
public:
static BaseRequestPtr
Create(const std::shared_ptr<Context>& context, const std::string& table_name, const Range& range);
explicit MXLogMetaHandler(const std::string& internal_meta_file_path);
~MXLogMetaHandler();
protected:
DeleteByDateRequest(const std::shared_ptr<Context>& context, const std::string& table_name, const Range& range);
bool
GetMXLogInternalMeta(uint64_t& wal_lsn);
Status
OnExecute() override;
bool
SetMXLogInternalMeta(uint64_t wal_lsn);
private:
const std::string table_name_;
const Range& range_;
FILE* wal_meta_fp_;
uint64_t latest_wal_lsn_ = 0;
};
} // namespace server
using MXLogMetaHandlerPtr = std::shared_ptr<MXLogMetaHandler>;
} // namespace wal
} // namespace engine
} // namespace milvus


@ -25,6 +25,7 @@ static const char* MilvusService_method_names[] = {
"/milvus.grpc.MilvusService/DescribeTable",
"/milvus.grpc.MilvusService/CountTable",
"/milvus.grpc.MilvusService/ShowTables",
"/milvus.grpc.MilvusService/ShowTableInfo",
"/milvus.grpc.MilvusService/DropTable",
"/milvus.grpc.MilvusService/CreateIndex",
"/milvus.grpc.MilvusService/DescribeIndex",
@ -33,11 +34,16 @@ static const char* MilvusService_method_names[] = {
"/milvus.grpc.MilvusService/ShowPartitions",
"/milvus.grpc.MilvusService/DropPartition",
"/milvus.grpc.MilvusService/Insert",
"/milvus.grpc.MilvusService/GetVectorByID",
"/milvus.grpc.MilvusService/GetVectorIDs",
"/milvus.grpc.MilvusService/Search",
"/milvus.grpc.MilvusService/SearchByID",
"/milvus.grpc.MilvusService/SearchInFiles",
"/milvus.grpc.MilvusService/Cmd",
"/milvus.grpc.MilvusService/DeleteByDate",
"/milvus.grpc.MilvusService/DeleteByID",
"/milvus.grpc.MilvusService/PreloadTable",
"/milvus.grpc.MilvusService/Flush",
"/milvus.grpc.MilvusService/Compact",
};
std::unique_ptr< MilvusService::Stub> MilvusService::NewStub(const std::shared_ptr< ::grpc::ChannelInterface>& channel, const ::grpc::StubOptions& options) {
@ -52,19 +58,25 @@ MilvusService::Stub::Stub(const std::shared_ptr< ::grpc::ChannelInterface>& chan
, rpcmethod_DescribeTable_(MilvusService_method_names[2], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_CountTable_(MilvusService_method_names[3], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_ShowTables_(MilvusService_method_names[4], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_DropTable_(MilvusService_method_names[5], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_CreateIndex_(MilvusService_method_names[6], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_DescribeIndex_(MilvusService_method_names[7], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_DropIndex_(MilvusService_method_names[8], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_CreatePartition_(MilvusService_method_names[9], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_ShowPartitions_(MilvusService_method_names[10], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_DropPartition_(MilvusService_method_names[11], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_Insert_(MilvusService_method_names[12], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_Search_(MilvusService_method_names[13], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_SearchInFiles_(MilvusService_method_names[14], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_Cmd_(MilvusService_method_names[15], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_DeleteByDate_(MilvusService_method_names[16], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_PreloadTable_(MilvusService_method_names[17], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_ShowTableInfo_(MilvusService_method_names[5], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_DropTable_(MilvusService_method_names[6], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_CreateIndex_(MilvusService_method_names[7], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_DescribeIndex_(MilvusService_method_names[8], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_DropIndex_(MilvusService_method_names[9], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_CreatePartition_(MilvusService_method_names[10], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_ShowPartitions_(MilvusService_method_names[11], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_DropPartition_(MilvusService_method_names[12], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_Insert_(MilvusService_method_names[13], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_GetVectorByID_(MilvusService_method_names[14], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_GetVectorIDs_(MilvusService_method_names[15], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_Search_(MilvusService_method_names[16], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_SearchByID_(MilvusService_method_names[17], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_SearchInFiles_(MilvusService_method_names[18], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_Cmd_(MilvusService_method_names[19], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_DeleteByID_(MilvusService_method_names[20], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_PreloadTable_(MilvusService_method_names[21], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_Flush_(MilvusService_method_names[22], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
, rpcmethod_Compact_(MilvusService_method_names[23], ::grpc::internal::RpcMethod::NORMAL_RPC, channel)
{}
::grpc::Status MilvusService::Stub::CreateTable(::grpc::ClientContext* context, const ::milvus::grpc::TableSchema& request, ::milvus::grpc::Status* response) {
@@ -207,6 +219,34 @@ void MilvusService::Stub::experimental_async::ShowTables(::grpc::ClientContext*
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::TableNameList>::Create(channel_.get(), cq, rpcmethod_ShowTables_, context, request, false);
}
::grpc::Status MilvusService::Stub::ShowTableInfo(::grpc::ClientContext* context, const ::milvus::grpc::TableName& request, ::milvus::grpc::TableInfo* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_ShowTableInfo_, context, request, response);
}
void MilvusService::Stub::experimental_async::ShowTableInfo(::grpc::ClientContext* context, const ::milvus::grpc::TableName* request, ::milvus::grpc::TableInfo* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_ShowTableInfo_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::ShowTableInfo(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::TableInfo* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_ShowTableInfo_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::ShowTableInfo(::grpc::ClientContext* context, const ::milvus::grpc::TableName* request, ::milvus::grpc::TableInfo* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_ShowTableInfo_, context, request, response, reactor);
}
void MilvusService::Stub::experimental_async::ShowTableInfo(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::TableInfo* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_ShowTableInfo_, context, request, response, reactor);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::TableInfo>* MilvusService::Stub::AsyncShowTableInfoRaw(::grpc::ClientContext* context, const ::milvus::grpc::TableName& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::TableInfo>::Create(channel_.get(), cq, rpcmethod_ShowTableInfo_, context, request, true);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::TableInfo>* MilvusService::Stub::PrepareAsyncShowTableInfoRaw(::grpc::ClientContext* context, const ::milvus::grpc::TableName& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::TableInfo>::Create(channel_.get(), cq, rpcmethod_ShowTableInfo_, context, request, false);
}
::grpc::Status MilvusService::Stub::DropTable(::grpc::ClientContext* context, const ::milvus::grpc::TableName& request, ::milvus::grpc::Status* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_DropTable_, context, request, response);
}
@@ -431,6 +471,62 @@ void MilvusService::Stub::experimental_async::Insert(::grpc::ClientContext* cont
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::VectorIds>::Create(channel_.get(), cq, rpcmethod_Insert_, context, request, false);
}
::grpc::Status MilvusService::Stub::GetVectorByID(::grpc::ClientContext* context, const ::milvus::grpc::VectorIdentity& request, ::milvus::grpc::VectorData* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_GetVectorByID_, context, request, response);
}
void MilvusService::Stub::experimental_async::GetVectorByID(::grpc::ClientContext* context, const ::milvus::grpc::VectorIdentity* request, ::milvus::grpc::VectorData* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_GetVectorByID_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::GetVectorByID(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::VectorData* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_GetVectorByID_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::GetVectorByID(::grpc::ClientContext* context, const ::milvus::grpc::VectorIdentity* request, ::milvus::grpc::VectorData* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_GetVectorByID_, context, request, response, reactor);
}
void MilvusService::Stub::experimental_async::GetVectorByID(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::VectorData* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_GetVectorByID_, context, request, response, reactor);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::VectorData>* MilvusService::Stub::AsyncGetVectorByIDRaw(::grpc::ClientContext* context, const ::milvus::grpc::VectorIdentity& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::VectorData>::Create(channel_.get(), cq, rpcmethod_GetVectorByID_, context, request, true);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::VectorData>* MilvusService::Stub::PrepareAsyncGetVectorByIDRaw(::grpc::ClientContext* context, const ::milvus::grpc::VectorIdentity& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::VectorData>::Create(channel_.get(), cq, rpcmethod_GetVectorByID_, context, request, false);
}
::grpc::Status MilvusService::Stub::GetVectorIDs(::grpc::ClientContext* context, const ::milvus::grpc::GetVectorIDsParam& request, ::milvus::grpc::VectorIds* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_GetVectorIDs_, context, request, response);
}
void MilvusService::Stub::experimental_async::GetVectorIDs(::grpc::ClientContext* context, const ::milvus::grpc::GetVectorIDsParam* request, ::milvus::grpc::VectorIds* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_GetVectorIDs_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::GetVectorIDs(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::VectorIds* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_GetVectorIDs_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::GetVectorIDs(::grpc::ClientContext* context, const ::milvus::grpc::GetVectorIDsParam* request, ::milvus::grpc::VectorIds* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_GetVectorIDs_, context, request, response, reactor);
}
void MilvusService::Stub::experimental_async::GetVectorIDs(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::VectorIds* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_GetVectorIDs_, context, request, response, reactor);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::VectorIds>* MilvusService::Stub::AsyncGetVectorIDsRaw(::grpc::ClientContext* context, const ::milvus::grpc::GetVectorIDsParam& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::VectorIds>::Create(channel_.get(), cq, rpcmethod_GetVectorIDs_, context, request, true);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::VectorIds>* MilvusService::Stub::PrepareAsyncGetVectorIDsRaw(::grpc::ClientContext* context, const ::milvus::grpc::GetVectorIDsParam& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::VectorIds>::Create(channel_.get(), cq, rpcmethod_GetVectorIDs_, context, request, false);
}
::grpc::Status MilvusService::Stub::Search(::grpc::ClientContext* context, const ::milvus::grpc::SearchParam& request, ::milvus::grpc::TopKQueryResult* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_Search_, context, request, response);
}
@@ -459,6 +555,34 @@ void MilvusService::Stub::experimental_async::Search(::grpc::ClientContext* cont
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::TopKQueryResult>::Create(channel_.get(), cq, rpcmethod_Search_, context, request, false);
}
::grpc::Status MilvusService::Stub::SearchByID(::grpc::ClientContext* context, const ::milvus::grpc::SearchByIDParam& request, ::milvus::grpc::TopKQueryResult* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_SearchByID_, context, request, response);
}
void MilvusService::Stub::experimental_async::SearchByID(::grpc::ClientContext* context, const ::milvus::grpc::SearchByIDParam* request, ::milvus::grpc::TopKQueryResult* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_SearchByID_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::SearchByID(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::TopKQueryResult* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_SearchByID_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::SearchByID(::grpc::ClientContext* context, const ::milvus::grpc::SearchByIDParam* request, ::milvus::grpc::TopKQueryResult* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_SearchByID_, context, request, response, reactor);
}
void MilvusService::Stub::experimental_async::SearchByID(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::TopKQueryResult* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_SearchByID_, context, request, response, reactor);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::TopKQueryResult>* MilvusService::Stub::AsyncSearchByIDRaw(::grpc::ClientContext* context, const ::milvus::grpc::SearchByIDParam& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::TopKQueryResult>::Create(channel_.get(), cq, rpcmethod_SearchByID_, context, request, true);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::TopKQueryResult>* MilvusService::Stub::PrepareAsyncSearchByIDRaw(::grpc::ClientContext* context, const ::milvus::grpc::SearchByIDParam& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::TopKQueryResult>::Create(channel_.get(), cq, rpcmethod_SearchByID_, context, request, false);
}
::grpc::Status MilvusService::Stub::SearchInFiles(::grpc::ClientContext* context, const ::milvus::grpc::SearchInFilesParam& request, ::milvus::grpc::TopKQueryResult* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_SearchInFiles_, context, request, response);
}
@@ -515,32 +639,32 @@ void MilvusService::Stub::experimental_async::Cmd(::grpc::ClientContext* context
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::StringReply>::Create(channel_.get(), cq, rpcmethod_Cmd_, context, request, false);
}
::grpc::Status MilvusService::Stub::DeleteByDate(::grpc::ClientContext* context, const ::milvus::grpc::DeleteByDateParam& request, ::milvus::grpc::Status* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_DeleteByDate_, context, request, response);
::grpc::Status MilvusService::Stub::DeleteByID(::grpc::ClientContext* context, const ::milvus::grpc::DeleteByIDParam& request, ::milvus::grpc::Status* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_DeleteByID_, context, request, response);
}
void MilvusService::Stub::experimental_async::DeleteByDate(::grpc::ClientContext* context, const ::milvus::grpc::DeleteByDateParam* request, ::milvus::grpc::Status* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_DeleteByDate_, context, request, response, std::move(f));
void MilvusService::Stub::experimental_async::DeleteByID(::grpc::ClientContext* context, const ::milvus::grpc::DeleteByIDParam* request, ::milvus::grpc::Status* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_DeleteByID_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::DeleteByDate(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::Status* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_DeleteByDate_, context, request, response, std::move(f));
void MilvusService::Stub::experimental_async::DeleteByID(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::Status* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_DeleteByID_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::DeleteByDate(::grpc::ClientContext* context, const ::milvus::grpc::DeleteByDateParam* request, ::milvus::grpc::Status* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_DeleteByDate_, context, request, response, reactor);
void MilvusService::Stub::experimental_async::DeleteByID(::grpc::ClientContext* context, const ::milvus::grpc::DeleteByIDParam* request, ::milvus::grpc::Status* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_DeleteByID_, context, request, response, reactor);
}
void MilvusService::Stub::experimental_async::DeleteByDate(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::Status* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_DeleteByDate_, context, request, response, reactor);
void MilvusService::Stub::experimental_async::DeleteByID(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::Status* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_DeleteByID_, context, request, response, reactor);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::Status>* MilvusService::Stub::AsyncDeleteByDateRaw(::grpc::ClientContext* context, const ::milvus::grpc::DeleteByDateParam& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::Status>::Create(channel_.get(), cq, rpcmethod_DeleteByDate_, context, request, true);
::grpc::ClientAsyncResponseReader< ::milvus::grpc::Status>* MilvusService::Stub::AsyncDeleteByIDRaw(::grpc::ClientContext* context, const ::milvus::grpc::DeleteByIDParam& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::Status>::Create(channel_.get(), cq, rpcmethod_DeleteByID_, context, request, true);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::Status>* MilvusService::Stub::PrepareAsyncDeleteByDateRaw(::grpc::ClientContext* context, const ::milvus::grpc::DeleteByDateParam& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::Status>::Create(channel_.get(), cq, rpcmethod_DeleteByDate_, context, request, false);
::grpc::ClientAsyncResponseReader< ::milvus::grpc::Status>* MilvusService::Stub::PrepareAsyncDeleteByIDRaw(::grpc::ClientContext* context, const ::milvus::grpc::DeleteByIDParam& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::Status>::Create(channel_.get(), cq, rpcmethod_DeleteByID_, context, request, false);
}
::grpc::Status MilvusService::Stub::PreloadTable(::grpc::ClientContext* context, const ::milvus::grpc::TableName& request, ::milvus::grpc::Status* response) {
@@ -571,6 +695,62 @@ void MilvusService::Stub::experimental_async::PreloadTable(::grpc::ClientContext
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::Status>::Create(channel_.get(), cq, rpcmethod_PreloadTable_, context, request, false);
}
::grpc::Status MilvusService::Stub::Flush(::grpc::ClientContext* context, const ::milvus::grpc::FlushParam& request, ::milvus::grpc::Status* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_Flush_, context, request, response);
}
void MilvusService::Stub::experimental_async::Flush(::grpc::ClientContext* context, const ::milvus::grpc::FlushParam* request, ::milvus::grpc::Status* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_Flush_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::Flush(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::Status* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_Flush_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::Flush(::grpc::ClientContext* context, const ::milvus::grpc::FlushParam* request, ::milvus::grpc::Status* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_Flush_, context, request, response, reactor);
}
void MilvusService::Stub::experimental_async::Flush(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::Status* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_Flush_, context, request, response, reactor);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::Status>* MilvusService::Stub::AsyncFlushRaw(::grpc::ClientContext* context, const ::milvus::grpc::FlushParam& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::Status>::Create(channel_.get(), cq, rpcmethod_Flush_, context, request, true);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::Status>* MilvusService::Stub::PrepareAsyncFlushRaw(::grpc::ClientContext* context, const ::milvus::grpc::FlushParam& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::Status>::Create(channel_.get(), cq, rpcmethod_Flush_, context, request, false);
}
::grpc::Status MilvusService::Stub::Compact(::grpc::ClientContext* context, const ::milvus::grpc::TableName& request, ::milvus::grpc::Status* response) {
return ::grpc::internal::BlockingUnaryCall(channel_.get(), rpcmethod_Compact_, context, request, response);
}
void MilvusService::Stub::experimental_async::Compact(::grpc::ClientContext* context, const ::milvus::grpc::TableName* request, ::milvus::grpc::Status* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_Compact_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::Compact(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::Status* response, std::function<void(::grpc::Status)> f) {
::grpc_impl::internal::CallbackUnaryCall(stub_->channel_.get(), stub_->rpcmethod_Compact_, context, request, response, std::move(f));
}
void MilvusService::Stub::experimental_async::Compact(::grpc::ClientContext* context, const ::milvus::grpc::TableName* request, ::milvus::grpc::Status* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_Compact_, context, request, response, reactor);
}
void MilvusService::Stub::experimental_async::Compact(::grpc::ClientContext* context, const ::grpc::ByteBuffer* request, ::milvus::grpc::Status* response, ::grpc::experimental::ClientUnaryReactor* reactor) {
::grpc_impl::internal::ClientCallbackUnaryFactory::Create(stub_->channel_.get(), stub_->rpcmethod_Compact_, context, request, response, reactor);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::Status>* MilvusService::Stub::AsyncCompactRaw(::grpc::ClientContext* context, const ::milvus::grpc::TableName& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::Status>::Create(channel_.get(), cq, rpcmethod_Compact_, context, request, true);
}
::grpc::ClientAsyncResponseReader< ::milvus::grpc::Status>* MilvusService::Stub::PrepareAsyncCompactRaw(::grpc::ClientContext* context, const ::milvus::grpc::TableName& request, ::grpc::CompletionQueue* cq) {
return ::grpc_impl::internal::ClientAsyncResponseReaderFactory< ::milvus::grpc::Status>::Create(channel_.get(), cq, rpcmethod_Compact_, context, request, false);
}
MilvusService::Service::Service() {
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[0],
@@ -600,68 +780,98 @@ MilvusService::Service::Service() {
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[5],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::TableName, ::milvus::grpc::TableInfo>(
std::mem_fn(&MilvusService::Service::ShowTableInfo), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[6],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::TableName, ::milvus::grpc::Status>(
std::mem_fn(&MilvusService::Service::DropTable), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[6],
MilvusService_method_names[7],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::IndexParam, ::milvus::grpc::Status>(
std::mem_fn(&MilvusService::Service::CreateIndex), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[7],
MilvusService_method_names[8],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::TableName, ::milvus::grpc::IndexParam>(
std::mem_fn(&MilvusService::Service::DescribeIndex), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[8],
MilvusService_method_names[9],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::TableName, ::milvus::grpc::Status>(
std::mem_fn(&MilvusService::Service::DropIndex), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[9],
MilvusService_method_names[10],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::PartitionParam, ::milvus::grpc::Status>(
std::mem_fn(&MilvusService::Service::CreatePartition), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[10],
MilvusService_method_names[11],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::TableName, ::milvus::grpc::PartitionList>(
std::mem_fn(&MilvusService::Service::ShowPartitions), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[11],
MilvusService_method_names[12],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::PartitionParam, ::milvus::grpc::Status>(
std::mem_fn(&MilvusService::Service::DropPartition), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[12],
MilvusService_method_names[13],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::InsertParam, ::milvus::grpc::VectorIds>(
std::mem_fn(&MilvusService::Service::Insert), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[13],
MilvusService_method_names[14],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::VectorIdentity, ::milvus::grpc::VectorData>(
std::mem_fn(&MilvusService::Service::GetVectorByID), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[15],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::GetVectorIDsParam, ::milvus::grpc::VectorIds>(
std::mem_fn(&MilvusService::Service::GetVectorIDs), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[16],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::SearchParam, ::milvus::grpc::TopKQueryResult>(
std::mem_fn(&MilvusService::Service::Search), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[14],
MilvusService_method_names[17],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::SearchByIDParam, ::milvus::grpc::TopKQueryResult>(
std::mem_fn(&MilvusService::Service::SearchByID), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[18],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::SearchInFilesParam, ::milvus::grpc::TopKQueryResult>(
std::mem_fn(&MilvusService::Service::SearchInFiles), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[15],
MilvusService_method_names[19],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::Command, ::milvus::grpc::StringReply>(
std::mem_fn(&MilvusService::Service::Cmd), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[16],
MilvusService_method_names[20],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::DeleteByDateParam, ::milvus::grpc::Status>(
std::mem_fn(&MilvusService::Service::DeleteByDate), this)));
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::DeleteByIDParam, ::milvus::grpc::Status>(
std::mem_fn(&MilvusService::Service::DeleteByID), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[17],
MilvusService_method_names[21],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::TableName, ::milvus::grpc::Status>(
std::mem_fn(&MilvusService::Service::PreloadTable), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[22],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::FlushParam, ::milvus::grpc::Status>(
std::mem_fn(&MilvusService::Service::Flush), this)));
AddMethod(new ::grpc::internal::RpcServiceMethod(
MilvusService_method_names[23],
::grpc::internal::RpcMethod::NORMAL_RPC,
new ::grpc::internal::RpcMethodHandler< MilvusService::Service, ::milvus::grpc::TableName, ::milvus::grpc::Status>(
std::mem_fn(&MilvusService::Service::Compact), this)));
}
MilvusService::Service::~Service() {
@@ -702,6 +912,13 @@ MilvusService::Service::~Service() {
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
::grpc::Status MilvusService::Service::ShowTableInfo(::grpc::ServerContext* context, const ::milvus::grpc::TableName* request, ::milvus::grpc::TableInfo* response) {
(void) context;
(void) request;
(void) response;
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
::grpc::Status MilvusService::Service::DropTable(::grpc::ServerContext* context, const ::milvus::grpc::TableName* request, ::milvus::grpc::Status* response) {
(void) context;
(void) request;
@@ -758,6 +975,20 @@ MilvusService::Service::~Service() {
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
::grpc::Status MilvusService::Service::GetVectorByID(::grpc::ServerContext* context, const ::milvus::grpc::VectorIdentity* request, ::milvus::grpc::VectorData* response) {
(void) context;
(void) request;
(void) response;
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
::grpc::Status MilvusService::Service::GetVectorIDs(::grpc::ServerContext* context, const ::milvus::grpc::GetVectorIDsParam* request, ::milvus::grpc::VectorIds* response) {
(void) context;
(void) request;
(void) response;
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
::grpc::Status MilvusService::Service::Search(::grpc::ServerContext* context, const ::milvus::grpc::SearchParam* request, ::milvus::grpc::TopKQueryResult* response) {
(void) context;
(void) request;
@@ -765,6 +996,13 @@ MilvusService::Service::~Service() {
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
::grpc::Status MilvusService::Service::SearchByID(::grpc::ServerContext* context, const ::milvus::grpc::SearchByIDParam* request, ::milvus::grpc::TopKQueryResult* response) {
(void) context;
(void) request;
(void) response;
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
::grpc::Status MilvusService::Service::SearchInFiles(::grpc::ServerContext* context, const ::milvus::grpc::SearchInFilesParam* request, ::milvus::grpc::TopKQueryResult* response) {
(void) context;
(void) request;
@@ -779,7 +1017,7 @@ MilvusService::Service::~Service() {
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
::grpc::Status MilvusService::Service::DeleteByDate(::grpc::ServerContext* context, const ::milvus::grpc::DeleteByDateParam* request, ::milvus::grpc::Status* response) {
::grpc::Status MilvusService::Service::DeleteByID(::grpc::ServerContext* context, const ::milvus::grpc::DeleteByIDParam* request, ::milvus::grpc::Status* response) {
(void) context;
(void) request;
(void) response;
@@ -793,6 +1031,20 @@ MilvusService::Service::~Service() {
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
::grpc::Status MilvusService::Service::Flush(::grpc::ServerContext* context, const ::milvus::grpc::FlushParam* request, ::milvus::grpc::Status* response) {
(void) context;
(void) request;
(void) response;
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
::grpc::Status MilvusService::Service::Compact(::grpc::ServerContext* context, const ::milvus::grpc::TableName* request, ::milvus::grpc::Status* response) {
(void) context;
(void) request;
(void) response;
return ::grpc::Status(::grpc::StatusCode::UNIMPLEMENTED, "");
}
} // namespace milvus
} // namespace grpc
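Server side, every new RPC in the regenerated base class defaults to UNIMPLEMENTED (see the handlers above), so a method only takes effect once a subclass overrides it. A minimal sketch, assuming the generated header is named milvus.grpc.pb.h; the handler body is a placeholder, not the real DBImpl wiring:

#include "milvus.grpc.pb.h"  // assumed name of the generated header

// Overriding one of the new RPCs; without this, the generated default
// above answers every Compact call with StatusCode::UNIMPLEMENTED.
class MilvusServiceImpl final : public ::milvus::grpc::MilvusService::Service {
    ::grpc::Status
    Compact(::grpc::ServerContext* context, const ::milvus::grpc::TableName* request,
            ::milvus::grpc::Status* response) override {
        // Hypothetical: forward request->table_name() to the DB layer here.
        response->set_error_code(::milvus::grpc::ErrorCode::SUCCESS);
        return ::grpc::Status::OK;
    }
};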

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

View File

@@ -61,21 +61,21 @@ static ::PROTOBUF_NAMESPACE_ID::Message const * const file_default_instances[] =
const char descriptor_table_protodef_status_2eproto[] PROTOBUF_SECTION_VARIABLE(protodesc_cold) =
"\n\014status.proto\022\013milvus.grpc\"D\n\006Status\022*\n"
"\nerror_code\030\001 \001(\0162\026.milvus.grpc.ErrorCod"
"e\022\016\n\006reason\030\002 \001(\t*\253\004\n\tErrorCode\022\013\n\007SUCCE"
"e\022\016\n\006reason\030\002 \001(\t*\230\004\n\tErrorCode\022\013\n\007SUCCE"
"SS\020\000\022\024\n\020UNEXPECTED_ERROR\020\001\022\022\n\016CONNECT_FA"
"ILED\020\002\022\025\n\021PERMISSION_DENIED\020\003\022\024\n\020TABLE_N"
"OT_EXISTS\020\004\022\024\n\020ILLEGAL_ARGUMENT\020\005\022\021\n\rILL"
"EGAL_RANGE\020\006\022\025\n\021ILLEGAL_DIMENSION\020\007\022\026\n\022I"
"LLEGAL_INDEX_TYPE\020\010\022\026\n\022ILLEGAL_TABLE_NAM"
"E\020\t\022\020\n\014ILLEGAL_TOPK\020\n\022\025\n\021ILLEGAL_ROWRECO"
"RD\020\013\022\025\n\021ILLEGAL_VECTOR_ID\020\014\022\031\n\025ILLEGAL_S"
"EARCH_RESULT\020\r\022\022\n\016FILE_NOT_FOUND\020\016\022\017\n\013ME"
"TA_FAILED\020\017\022\020\n\014CACHE_FAILED\020\020\022\030\n\024CANNOT_"
"CREATE_FOLDER\020\021\022\026\n\022CANNOT_CREATE_FILE\020\022\022"
"\030\n\024CANNOT_DELETE_FOLDER\020\023\022\026\n\022CANNOT_DELE"
"TE_FILE\020\024\022\025\n\021BUILD_INDEX_ERROR\020\025\022\021\n\rILLE"
"GAL_NLIST\020\026\022\027\n\023ILLEGAL_METRIC_TYPE\020\027\022\021\n\r"
"OUT_OF_MEMORY\020\030b\006proto3"
"OT_EXISTS\020\004\022\024\n\020ILLEGAL_ARGUMENT\020\005\022\025\n\021ILL"
"EGAL_DIMENSION\020\007\022\026\n\022ILLEGAL_INDEX_TYPE\020\010"
"\022\026\n\022ILLEGAL_TABLE_NAME\020\t\022\020\n\014ILLEGAL_TOPK"
"\020\n\022\025\n\021ILLEGAL_ROWRECORD\020\013\022\025\n\021ILLEGAL_VEC"
"TOR_ID\020\014\022\031\n\025ILLEGAL_SEARCH_RESULT\020\r\022\022\n\016F"
"ILE_NOT_FOUND\020\016\022\017\n\013META_FAILED\020\017\022\020\n\014CACH"
"E_FAILED\020\020\022\030\n\024CANNOT_CREATE_FOLDER\020\021\022\026\n\022"
"CANNOT_CREATE_FILE\020\022\022\030\n\024CANNOT_DELETE_FO"
"LDER\020\023\022\026\n\022CANNOT_DELETE_FILE\020\024\022\025\n\021BUILD_"
"INDEX_ERROR\020\025\022\021\n\rILLEGAL_NLIST\020\026\022\027\n\023ILLE"
"GAL_METRIC_TYPE\020\027\022\021\n\rOUT_OF_MEMORY\020\030b\006pr"
"oto3"
;
static const ::PROTOBUF_NAMESPACE_ID::internal::DescriptorTable*const descriptor_table_status_2eproto_deps[1] = {
};
@@ -85,7 +85,7 @@ static ::PROTOBUF_NAMESPACE_ID::internal::SCCInfoBase*const descriptor_table_sta
static ::PROTOBUF_NAMESPACE_ID::internal::once_flag descriptor_table_status_2eproto_once;
static bool descriptor_table_status_2eproto_initialized = false;
const ::PROTOBUF_NAMESPACE_ID::internal::DescriptorTable descriptor_table_status_2eproto = {
&descriptor_table_status_2eproto_initialized, descriptor_table_protodef_status_2eproto, "status.proto", 663,
&descriptor_table_status_2eproto_initialized, descriptor_table_protodef_status_2eproto, "status.proto", 644,
&descriptor_table_status_2eproto_once, descriptor_table_status_2eproto_sccs, descriptor_table_status_2eproto_deps, 1, 0,
schemas, file_default_instances, TableStruct_status_2eproto::offsets,
file_level_metadata_status_2eproto, 1, file_level_enum_descriptors_status_2eproto, file_level_service_descriptors_status_2eproto,
@@ -107,7 +107,6 @@ bool ErrorCode_IsValid(int value) {
case 3:
case 4:
case 5:
case 6:
case 7:
case 8:
case 9:

View File

@@ -75,7 +75,6 @@ enum ErrorCode : int {
PERMISSION_DENIED = 3,
TABLE_NOT_EXISTS = 4,
ILLEGAL_ARGUMENT = 5,
ILLEGAL_RANGE = 6,
ILLEGAL_DIMENSION = 7,
ILLEGAL_INDEX_TYPE = 8,
ILLEGAL_TABLE_NAME = 9,

View File

@@ -11,13 +11,6 @@ message TableName {
string table_name = 1;
}
/**
* @brief Partition name
*/
message PartitionName {
string partition_name = 1;
}
/**
* @brief Table name list
*/
@@ -42,8 +35,7 @@
*/
message PartitionParam {
string table_name = 1;
string partition_name = 2;
string tag = 3;
string tag = 2;
}
/**
@@ -51,15 +43,7 @@
*/
message PartitionList {
Status status = 1;
repeated PartitionParam partition_array = 2;
}
/**
* @brief Range schema
*/
message Range {
string start_value = 1;
string end_value = 2;
repeated string partition_tag_array = 2;
}
/**
@@ -94,10 +78,9 @@ message VectorIds {
message SearchParam {
string table_name = 1;
repeated RowRecord query_record_array = 2;
repeated Range query_range_array = 3;
int64 topk = 4;
int64 nprobe = 5;
repeated string partition_tag_array = 6;
int64 topk = 3;
int64 nprobe = 4;
repeated string partition_tag_array = 5;
}
/**
@@ -108,6 +91,17 @@ message SearchInFilesParam {
SearchParam search_param = 2;
}
/**
* @brief Params for searching vector by ID
*/
message SearchByIDParam {
string table_name = 1;
int64 id = 2;
int64 topk = 3;
int64 nprobe = 4;
repeated string partition_tag_array = 5;
}
/**
* @brief Query result params
*/
@@ -169,11 +163,70 @@ message IndexParam {
}
/**
* @brief table name and range for DeleteByDate
* @brief Flush params
*/
message DeleteByDateParam {
Range range = 1;
string table_name = 2;
message FlushParam {
repeated string table_name_array = 1;
}
/**
* @brief Delete by ID params
*/
message DeleteByIDParam {
string table_name = 1;
repeated int64 id_array = 2;
}
/**
* @brief segment statistics
*/
message SegmentStat {
string segment_name = 1;
int64 row_count = 2;
string index_name = 3;
int64 data_size = 4;
}
/**
* @brief table statistics
*/
message PartitionStat {
string tag = 1;
int64 total_row_count = 2;
repeated SegmentStat segments_stat = 3;
}
/**
* @brief table information
*/
message TableInfo {
Status status = 1;
int64 total_row_count = 2;
repeated PartitionStat partitions_stat = 3;
}
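TableInfo nests one PartitionStat per partition and one SegmentStat per segment, so a ShowTableInfo reply exposes per-segment row counts and data sizes directly. A minimal sketch of walking the reply with the conventionally generated C++ accessors (the generated header name is assumed):

#include <iostream>
#include "milvus.grpc.pb.h"  // assumed generated header

// Walks a ShowTableInfo reply: table -> partitions -> segments.
void PrintTableInfo(const ::milvus::grpc::TableInfo& info) {
    std::cout << "total rows: " << info.total_row_count() << "\n";
    for (const auto& partition : info.partitions_stat()) {
        std::cout << "partition " << partition.tag()
                  << " rows=" << partition.total_row_count() << "\n";
        for (const auto& segment : partition.segments_stat()) {
            std::cout << "  segment " << segment.segment_name()
                      << " rows=" << segment.row_count()
                      << " index=" << segment.index_name()
                      << " bytes=" << segment.data_size() << "\n";
        }
    }
}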
/**
* @brief vector identity
*/
message VectorIdentity {
string table_name = 1;
int64 id = 2;
}
/**
* @brief vector data
*/
message VectorData {
Status status = 1;
RowRecord vector_data = 2;
}
/**
* @brief get vector ids from a segment parameters
*/
message GetVectorIDsParam {
string table_name = 1;
string segment_name = 2;
}
service MilvusService {
@@ -222,6 +275,15 @@ service MilvusService {
*/
rpc ShowTables(Command) returns (TableNameList) {}
/**
* @brief This method is used to get table detail information.
*
* @param TableName, target table name.
*
* @return TableInfo
*/
rpc ShowTableInfo(TableName) returns (TableInfo) {}
/**
* @brief This method is used to delete table.
*
@@ -294,6 +356,24 @@ service MilvusService {
*/
rpc Insert(InsertParam) returns (VectorIds) {}
/**
* @brief This method is used to get vector data by id.
*
* @param VectorIdentity, target vector id.
*
* @return VectorData
*/
rpc GetVectorByID(VectorIdentity) returns (VectorData) {}
/**
* @brief This method is used to get vector ids from a segment
*
* @param GetVectorIDsParam, target table and segment
*
* @return VectorIds
*/
rpc GetVectorIDs(GetVectorIDsParam) returns (VectorIds) {}
/**
* @brief This method is used to query vector in table.
*
@@ -303,6 +383,15 @@
*/
rpc Search(SearchParam) returns (TopKQueryResult) {}
/**
* @brief This method is used to query vector by id.
*
* @param SearchByIDParam, search parameters.
*
* @return TopKQueryResult
*/
rpc SearchByID(SearchByIDParam) returns (TopKQueryResult) {}
/**
* @brief This method is used to query vector in specified files.
*
@@ -321,21 +410,39 @@
*/
rpc Cmd(Command) returns (StringReply) {}
/**
* @brief This method is used to delete vector by date range
* @brief This method is used to delete vector by id
*
* @param DeleteByDateParam, delete parameters.
* @param DeleteByIDParam, delete parameters.
*
* @return status
*/
rpc DeleteByDate(DeleteByDateParam) returns (Status) {}
rpc DeleteByID(DeleteByIDParam) returns (Status) {}
/**
* @brief This method is used to preload table
*
* @param TableName, target table name.
*
* @return Status
*/
rpc PreloadTable(TableName) returns (Status) {}
/**
* @brief This method is used to flush buffer into storage.
*
* @param FlushParam, flush parameters
*
* @return Status
*/
rpc Flush(FlushParam) returns (Status) {}
/**
* @brief This method is used to compact table
*
* @param TableName, target table name.
*
* @return Status
*/
rpc Compact(TableName) returns (Status) {}
}
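Taken together, the new RPCs close the CRUD loop: fetch a vector back by id, delete by id, flush the change to storage, then compact the table to reclaim space. A minimal client sketch against the generated stub; the channel address (Milvus's default port 19530), insecure credentials, and the header name are assumptions, and status checks are elided:

#include <grpcpp/grpcpp.h>
#include "milvus.grpc.pb.h"  // assumed generated header

int main() {
    auto channel = grpc::CreateChannel("localhost:19530", grpc::InsecureChannelCredentials());
    auto stub = ::milvus::grpc::MilvusService::NewStub(channel);

    ::milvus::grpc::VectorIdentity identity;          // fetch one vector by id
    identity.set_table_name("demo_table");
    identity.set_id(42);
    ::milvus::grpc::VectorData vector_data;
    grpc::ClientContext get_ctx;
    stub->GetVectorByID(&get_ctx, identity, &vector_data);

    ::milvus::grpc::DeleteByIDParam del;              // delete the same id
    del.set_table_name("demo_table");
    del.add_id_array(42);
    ::milvus::grpc::Status del_status;
    grpc::ClientContext del_ctx;
    stub->DeleteByID(&del_ctx, del, &del_status);

    ::milvus::grpc::FlushParam flush;                 // make the delete durable
    flush.add_table_name_array("demo_table");
    ::milvus::grpc::Status flush_status;
    grpc::ClientContext flush_ctx;
    stub->Flush(&flush_ctx, flush, &flush_status);

    ::milvus::grpc::TableName table;                  // reclaim deleted space
    table.set_table_name("demo_table");
    ::milvus::grpc::Status compact_status;
    grpc::ClientContext compact_ctx;
    stub->Compact(&compact_ctx, table, &compact_status);
    return 0;
}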

View File

@@ -9,7 +9,6 @@ enum ErrorCode {
PERMISSION_DENIED = 3;
TABLE_NOT_EXISTS = 4;
ILLEGAL_ARGUMENT = 5;
ILLEGAL_RANGE = 6;
ILLEGAL_DIMENSION = 7;
ILLEGAL_INDEX_TYPE = 8;
ILLEGAL_TABLE_NAME = 9;

View File

@@ -9,14 +9,14 @@
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "knowhere/index/vector_index/IndexBinaryIDMAP.h"
#include <faiss/IndexBinaryFlat.h>
#include <faiss/MetaIndexes.h>
#include <faiss/index_factory.h>
#include "knowhere/adapter/VectorAdapter.h"
#include "knowhere/common/Exception.h"
#include "knowhere/index/vector_index/IndexBinaryIDMAP.h"
namespace knowhere {
@@ -72,7 +72,7 @@ void
BinaryIDMAP::search_impl(int64_t n, const uint8_t* data, int64_t k, float* distances, int64_t* labels,
const Config& cfg) {
int32_t* pdistances = (int32_t*)distances;
index_->search(n, (uint8_t*)data, k, pdistances, labels);
index_->search(n, (uint8_t*)data, k, pdistances, labels, bitset_);
}
void
@@ -137,4 +137,97 @@ BinaryIDMAP::Seal() {
// do nothing
}
void
BinaryIDMAP::AddWithoutId(const DatasetPtr& dataset, const Config& config) {
if (!index_) {
KNOWHERE_THROW_MSG("index not initialize");
}
std::lock_guard<std::mutex> lk(mutex_);
GETBINARYTENSOR(dataset)
std::vector<int64_t> new_ids(rows);
for (int i = 0; i < rows; ++i) {
new_ids[i] = i;
}
index_->add_with_ids(rows, (uint8_t*)p_data, new_ids.data());
}
DatasetPtr
BinaryIDMAP::GetVectorById(const DatasetPtr& dataset, const Config& config) {
if (!index_) {
KNOWHERE_THROW_MSG("index not initialize");
}
// GETBINARYTENSOR(dataset)
// auto rows = dataset->Get<int64_t>(meta::ROWS);
auto p_data = dataset->Get<const int64_t*>(meta::IDS);
auto elems = dataset->Get<int64_t>(meta::DIM);
size_t p_x_size = sizeof(uint8_t) * elems;
auto p_x = (uint8_t*)malloc(p_x_size);
index_->get_vector_by_id(1, p_data, p_x, bitset_);
auto ret_ds = std::make_shared<Dataset>();
ret_ds->Set(meta::TENSOR, p_x);
return ret_ds;
}
DatasetPtr
BinaryIDMAP::SearchById(const DatasetPtr& dataset, const Config& config) {
if (!index_) {
KNOWHERE_THROW_MSG("index not initialize");
}
// auto search_cfg = std::dynamic_pointer_cast<BinIDMAPCfg>(config);
// if (search_cfg == nullptr) {
// KNOWHERE_THROW_MSG("not support this kind of config");
// }
// GETBINARYTENSOR(dataset)
auto dim = dataset->Get<int64_t>(meta::DIM);
auto rows = dataset->Get<int64_t>(meta::ROWS);
auto p_data = dataset->Get<const int64_t*>(meta::IDS);
auto elems = rows * config->k;
size_t p_id_size = sizeof(int64_t) * elems;
size_t p_dist_size = sizeof(float) * elems;
auto p_id = (int64_t*)malloc(p_id_size);
auto p_dist = (float*)malloc(p_dist_size);
auto* pdistances = (int32_t*)p_dist;
// index_->searchById(rows, (uint8_t*)p_data, config->k, pdistances, p_id, bitset_);
// auto blacklist = dataset->Get<faiss::ConcurrentBitsetPtr>("bitset");
index_->search_by_id(rows, p_data, config->k, pdistances, p_id, bitset_);
auto ret_ds = std::make_shared<Dataset>();
if (index_->metric_type == faiss::METRIC_Hamming) {
auto pf_dist = (float*)malloc(p_dist_size);
int32_t* pi_dist = (int32_t*)p_dist;
for (int i = 0; i < elems; i++) {
*(pf_dist + i) = (float)(*(pi_dist + i));
}
ret_ds->Set(meta::IDS, p_id);
ret_ds->Set(meta::DISTANCE, pf_dist);
free(p_dist);
} else {
ret_ds->Set(meta::IDS, p_id);
ret_ds->Set(meta::DISTANCE, p_dist);
}
return ret_ds;
}
void
BinaryIDMAP::SetBlacklist(faiss::ConcurrentBitsetPtr list) {
bitset_ = std::move(list);
}
void
BinaryIDMAP::GetBlacklist(faiss::ConcurrentBitsetPtr& list) {
list = bitset_;
}
} // namespace knowhere
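The bitset_ threaded through the calls above is how deletes reach an already-open index: deleted row offsets are marked in a ConcurrentBitset, the bitset is installed via SetBlacklist, and every search/get path then hands it down to faiss so marked rows are skipped. A sketch of the marking step, assuming ConcurrentBitset takes a capacity at construction and exposes set():

#include <memory>
#include <vector>
#include <faiss/utils/ConcurrentBitset.h>
#include "knowhere/index/vector_index/IndexBinaryIDMAP.h"

// Marks deleted row offsets so search/SearchById/GetVectorById skip them.
void MaskDeletedRows(knowhere::BinaryIDMAP& index, int64_t total_rows,
                     const std::vector<int64_t>& deleted_offsets) {
    faiss::ConcurrentBitsetPtr blacklist;
    index.GetBlacklist(blacklist);
    if (blacklist == nullptr) {
        blacklist = std::make_shared<faiss::ConcurrentBitset>(total_rows);
    }
    for (auto offset : deleted_offsets) {
        blacklist->set(offset);
    }
    index.SetBlacklist(blacklist);
}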

View File

@@ -11,6 +11,7 @@
#pragma once
#include <faiss/utils/ConcurrentBitset.h>
#include <memory>
#include <mutex>
#include <utility>
@@ -41,6 +42,9 @@ class BinaryIDMAP : public VectorIndex, public FaissBaseBinaryIndex {
void
Add(const DatasetPtr& dataset, const Config& config) override;
void
AddWithoutId(const DatasetPtr& dataset, const Config& config);
void
Train(const Config& config);
@@ -59,12 +63,27 @@ class BinaryIDMAP : public VectorIndex, public FaissBaseBinaryIndex {
const int64_t*
GetRawIds();
DatasetPtr
GetVectorById(const DatasetPtr& dataset, const Config& config) override;
DatasetPtr
SearchById(const DatasetPtr& dataset, const Config& config) override;
void
SetBlacklist(faiss::ConcurrentBitsetPtr list);
void
GetBlacklist(faiss::ConcurrentBitsetPtr& list);
protected:
virtual void
search_impl(int64_t n, const uint8_t* data, int64_t k, float* distances, int64_t* labels, const Config& cfg);
protected:
std::mutex mutex_;
private:
faiss::ConcurrentBitsetPtr bitset_ = nullptr;
};
using BinaryIDMAPPtr = std::shared_ptr<BinaryIDMAP>;

View File

@@ -9,14 +9,15 @@
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include "knowhere/index/vector_index/IndexBinaryIVF.h"
#include <faiss/IndexBinaryFlat.h>
#include <faiss/IndexBinaryIVF.h>
#include <chrono>
#include "knowhere/adapter/VectorAdapter.h"
#include "knowhere/common/Exception.h"
#include "knowhere/index/vector_index/IndexBinaryIVF.h"
#include <chrono>
namespace knowhere {
@@ -91,7 +92,10 @@ BinaryIVF::search_impl(int64_t n, const uint8_t* data, int64_t k, float* distanc
ivf_index->nprobe = params->nprobe;
int32_t* pdistances = (int32_t*)distances;
stdclock::time_point before = stdclock::now();
ivf_index->search(n, (uint8_t*)data, k, pdistances, labels);
// todo: remove static cast (zhiru)
static_cast<faiss::IndexBinary*>(index_.get())->search(n, (uint8_t*)data, k, pdistances, labels, bitset_);
stdclock::time_point after = stdclock::now();
double search_cost = (std::chrono::duration<double, std::micro>(after - before)).count();
KNOWHERE_LOG_DEBUG << "IVF search cost: " << search_cost
@@ -153,4 +157,92 @@ BinaryIVF::Seal() {
// do nothing
}
DatasetPtr
BinaryIVF::GetVectorById(const DatasetPtr& dataset, const Config& config) {
if (!index_ || !index_->is_trained) {
KNOWHERE_THROW_MSG("index not initialize or trained");
}
// GETBINARYTENSOR(dataset)
// auto rows = dataset->Get<int64_t>(meta::ROWS);
auto p_data = dataset->Get<const int64_t*>(meta::IDS);
auto elems = dataset->Get<int64_t>(meta::DIM);
try {
size_t p_x_size = sizeof(uint8_t) * elems;
auto p_x = (uint8_t*)malloc(p_x_size);
index_->get_vector_by_id(1, p_data, p_x, bitset_);
auto ret_ds = std::make_shared<Dataset>();
ret_ds->Set(meta::TENSOR, p_x);
return ret_ds;
} catch (faiss::FaissException& e) {
KNOWHERE_THROW_MSG(e.what());
} catch (std::exception& e) {
KNOWHERE_THROW_MSG(e.what());
}
}
DatasetPtr
BinaryIVF::SearchById(const DatasetPtr& dataset, const Config& config) {
if (!index_ || !index_->is_trained) {
KNOWHERE_THROW_MSG("index not initialize or trained");
}
// auto search_cfg = std::dynamic_pointer_cast<IVFBinCfg>(config);
// if (search_cfg == nullptr) {
// KNOWHERE_THROW_MSG("not support this kind of config");
// }
// GETBINARYTENSOR(dataset)
auto rows = dataset->Get<int64_t>(meta::ROWS);
auto p_data = dataset->Get<const int64_t*>(meta::IDS);
try {
auto elems = rows * config->k;
size_t p_id_size = sizeof(int64_t) * elems;
size_t p_dist_size = sizeof(float) * elems;
auto p_id = (int64_t*)malloc(p_id_size);
auto p_dist = (float*)malloc(p_dist_size);
int32_t* pdistances = (int32_t*)p_dist;
// auto blacklist = dataset->Get<faiss::ConcurrentBitsetPtr>("bitset");
// index_->searchById(rows, (uint8_t*)p_data, config->k, pdistances, p_id, blacklist);
index_->search_by_id(rows, p_data, config->k, pdistances, p_id, bitset_);
auto ret_ds = std::make_shared<Dataset>();
if (index_->metric_type == faiss::METRIC_Hamming) {
auto pf_dist = (float*)malloc(p_dist_size);
int32_t* pi_dist = (int32_t*)p_dist;
for (int i = 0; i < elems; i++) {
*(pf_dist + i) = (float)(*(pi_dist + i));
}
ret_ds->Set(meta::IDS, p_id);
ret_ds->Set(meta::DISTANCE, pf_dist);
free(p_dist);
} else {
ret_ds->Set(meta::IDS, p_id);
ret_ds->Set(meta::DISTANCE, p_dist);
}
return ret_ds;
} catch (faiss::FaissException& e) {
KNOWHERE_THROW_MSG(e.what());
} catch (std::exception& e) {
KNOWHERE_THROW_MSG(e.what());
}
}
void
BinaryIVF::SetBlacklist(faiss::ConcurrentBitsetPtr list) {
bitset_ = std::move(list);
}
void
BinaryIVF::GetBlacklist(faiss::ConcurrentBitsetPtr& list) {
list = bitset_;
}
} // namespace knowhere
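One detail shared by both binary index paths above: faiss reports Hamming distances as int32, the code lets it write into the float-sized result allocation (valid because sizeof(int32_t) == sizeof(float)), and for METRIC_Hamming it then widens each element into a real float buffer. A standalone illustration of the buffer trick, under those assumptions:

#include <cstdint>
#include <cstdio>
#include <cstdlib>

int main() {
    const int elems = 4;
    // One allocation serves both element types, as in SearchById above.
    float* p_dist = (float*)malloc(sizeof(float) * elems);
    int32_t* pi_dist = (int32_t*)p_dist;       // faiss writes int32 Hamming here
    for (int i = 0; i < elems; i++) pi_dist[i] = i * 3;
    float* pf_dist = (float*)malloc(sizeof(float) * elems);
    for (int i = 0; i < elems; i++) pf_dist[i] = (float)pi_dist[i];
    for (int i = 0; i < elems; i++) printf("%.1f ", pf_dist[i]);  // 0.0 3.0 6.0 9.0
    free(p_dist);
    free(pf_dist);
    return 0;
}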

View File

@@ -16,6 +16,7 @@
#include <utility>
#include <vector>
#include <faiss/utils/ConcurrentBitset.h>
#include "FaissBaseBinaryIndex.h"
#include "VectorIndex.h"
#include "faiss/IndexIVF.h"
@@ -54,6 +55,18 @@ class BinaryIVF : public VectorIndex, public FaissBaseBinaryIndex {
int64_t
Dimension() override;
DatasetPtr
GetVectorById(const DatasetPtr& dataset, const Config& config);
DatasetPtr
SearchById(const DatasetPtr& dataset, const Config& config);
void
SetBlacklist(faiss::ConcurrentBitsetPtr list);
void
GetBlacklist(faiss::ConcurrentBitsetPtr& list);
protected:
virtual std::shared_ptr<faiss::IVFSearchParameters>
GenParams(const Config& config);
@@ -63,6 +76,9 @@ class BinaryIVF : public VectorIndex, public FaissBaseBinaryIndex {
protected:
std::mutex mutex_;
private:
faiss::ConcurrentBitsetPtr bitset_ = nullptr;
};
using BinaryIVFIndexPtr = std::shared_ptr<BinaryIVF>;

View File

@@ -27,6 +27,14 @@
namespace knowhere {
void
normalize_vector(float* data, float* norm_array, size_t dim) {
float norm = 0.0f;
for (int i = 0; i < dim; i++) norm += data[i] * data[i];
norm = 1.0f / (sqrtf(norm) + 1e-30f);
for (int i = 0; i < dim; i++) norm_array[i] = data[i] * norm;
}
BinarySet
IndexHNSW::Serialize() {
if (!index_) {
@@ -59,6 +67,8 @@ IndexHNSW::Load(const BinarySet& index_binary) {
hnswlib::SpaceInterface<float>* space;
index_ = std::make_shared<hnswlib::HierarchicalNSW<float>>(space);
index_->loadIndex(reader);
normalize = index_->metric_type_ == 1 ? true : false; // 1 == InnerProduct
} catch (std::exception& e) {
KNOWHERE_THROW_MSG(e.what());
}
@@ -69,6 +79,13 @@ IndexHNSW::Search(const DatasetPtr& dataset, const Config& config) {
if (!index_) {
KNOWHERE_THROW_MSG("index not initialize or trained");
}
auto search_cfg = std::dynamic_pointer_cast<HNSWCfg>(config);
if (search_cfg == nullptr) {
KNOWHERE_THROW_MSG("search conf is null");
}
index_->setEf(search_cfg->ef);
GETTENSOR(dataset)
size_t id_size = sizeof(int64_t) * config->k;
@@ -77,18 +94,34 @@ IndexHNSW::Search(const DatasetPtr& dataset, const Config& config) {
auto p_dist = (float*)malloc(dist_size * rows);
using P = std::pair<float, int64_t>;
auto compare = [](P& v1, P& v2) { return v1.first < v2.first; };
auto compare = [](const P& v1, const P& v2) { return v1.first < v2.first; };
#pragma omp parallel for
for (unsigned int i = 0; i < rows; ++i) {
const float* single_query = p_data + i * dim;
std::vector<std::pair<float, int64_t>> ret = index_->searchKnn(single_query, config->k, compare);
std::vector<P> ret;
const float* single_query = p_data + i * Dimension();
// if (normalize) {
// std::vector<float> norm_vector(Dimension());
// normalize_vector((float*)(single_query), norm_vector.data(), Dimension());
// ret = index_->searchKnn((float*)(norm_vector.data()), config->k, compare);
// } else {
// ret = index_->searchKnn((float*)single_query, config->k, compare);
// }
ret = index_->searchKnn((float*)single_query, config->k, compare);
while (ret.size() < config->k) {
ret.push_back(std::make_pair(-1, -1));
}
std::vector<float> dist;
std::vector<int64_t> ids;
std::transform(ret.begin(), ret.end(), std::back_inserter(dist),
[](const std::pair<float, int64_t>& e) { return e.first; });
if (normalize) {
std::transform(ret.begin(), ret.end(), std::back_inserter(dist),
[](const std::pair<float, int64_t>& e) { return float(1 - e.first); });
} else {
std::transform(ret.begin(), ret.end(), std::back_inserter(dist),
[](const std::pair<float, int64_t>& e) { return e.first; });
}
std::transform(ret.begin(), ret.end(), std::back_inserter(ids),
[](const std::pair<float, int64_t>& e) { return e.second; });
@@ -105,8 +138,8 @@ IndexHNSW::Search(const DatasetPtr& dataset, const Config& config) {
IndexModelPtr
IndexHNSW::Train(const DatasetPtr& dataset, const Config& config) {
auto build_cfg = std::dynamic_pointer_cast<HNSWCfg>(config);
if (build_cfg != nullptr) {
build_cfg->CheckValid(); // throw exception
if (build_cfg == nullptr) {
KNOWHERE_THROW_MSG("build conf is null");
}
GETTENSOR(dataset)
@@ -116,6 +149,7 @@ IndexHNSW::Train(const DatasetPtr& dataset, const Config& config) {
space = new hnswlib::L2Space(dim);
} else if (config->metric_type == METRICTYPE::IP) {
space = new hnswlib::InnerProductSpace(dim);
normalize = true;
}
index_ = std::make_shared<hnswlib::HierarchicalNSW<float>>(space, rows, build_cfg->M, build_cfg->ef);
@@ -133,12 +167,28 @@ IndexHNSW::Add(const DatasetPtr& dataset, const Config& config) {
GETTENSOR(dataset)
auto p_ids = dataset->Get<const int64_t*>(meta::IDS);
for (int i = 0; i < 1; i++) {
index_->addPoint((void*)(p_data + dim * i), p_ids[i]);
}
// if (normalize) {
// std::vector<float> ep_norm_vector(Dimension());
// normalize_vector((float*)(p_data), ep_norm_vector.data(), Dimension());
// index_->addPoint((void*)(ep_norm_vector.data()), p_ids[0]);
// #pragma omp parallel for
// for (int i = 1; i < rows; ++i) {
// std::vector<float> norm_vector(Dimension());
// normalize_vector((float*)(p_data + Dimension() * i), norm_vector.data(), Dimension());
// index_->addPoint((void*)(norm_vector.data()), p_ids[i]);
// }
// } else {
// index_->addPoint((void*)(p_data), p_ids[0]);
// #pragma omp parallel for
// for (int i = 1; i < rows; ++i) {
// index_->addPoint((void*)(p_data + Dimension() * i), p_ids[i]);
// }
// }
index_->addPoint((void*)(p_data), p_ids[0]);
#pragma omp parallel for
for (int i = 1; i < rows; i++) {
index_->addPoint((void*)(p_data + dim * i), p_ids[i]);
for (int i = 1; i < rows; ++i) {
index_->addPoint((void*)(p_data + Dimension() * i), p_ids[i]);
}
}

View File

@@ -56,6 +56,7 @@ class IndexHNSW : public VectorIndex {
Dimension() override;
private:
bool normalize = false;
std::mutex mutex_;
std::shared_ptr<hnswlib::HierarchicalNSW<float>> index_;
};
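Background on the new normalize flag: hnswlib's InnerProductSpace returns 1 - &lt;x, y&gt; as its distance, so Train sets normalize = true for METRICTYPE::IP and Search maps results back with 1 - e.first. The normalize_vector helper scales a vector to unit L2 norm, after which inner products are plain cosine similarities. A standalone check of the helper, copied from IndexHNSW.cpp above:

#include <cmath>
#include <cstdio>
#include <vector>

// Same logic as knowhere::normalize_vector: divide by the L2 norm,
// with a small epsilon so a zero vector does not divide by zero.
static void normalize_vector(float* data, float* norm_array, size_t dim) {
    float norm = 0.0f;
    for (size_t i = 0; i < dim; i++) norm += data[i] * data[i];
    norm = 1.0f / (sqrtf(norm) + 1e-30f);
    for (size_t i = 0; i < dim; i++) norm_array[i] = data[i] * norm;
}

int main() {
    std::vector<float> v = {3.0f, 4.0f};      // L2 norm 5
    std::vector<float> unit(v.size());
    normalize_vector(v.data(), unit.data(), v.size());
    printf("%.1f %.1f\n", unit[0], unit[1]);  // prints 0.6 0.8
    return 0;
}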

View File

@@ -9,10 +9,9 @@
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#include <faiss/AutoTune.h>
#include <faiss/IndexFlat.h>
#include <faiss/MetaIndexes.h>
#include <faiss/AutoTune.h>
#include <faiss/clone_index.h>
#include <faiss/index_factory.h>
#include <faiss/index_io.h>
@@ -78,7 +77,7 @@ IDMAP::Search(const DatasetPtr& dataset, const Config& config) {
void
IDMAP::search_impl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& cfg) {
index_->search(n, (float*)data, k, distances, labels);
index_->search(n, (float*)data, k, distances, labels, bitset_);
}
void
@@ -101,7 +100,8 @@ IDMAP::AddWithoutId(const DatasetPtr& dataset, const Config& config) {
}
std::lock_guard<std::mutex> lk(mutex_);
GETTENSOR(dataset)
auto rows = dataset->Get<int64_t>(meta::ROWS);
auto p_data = dataset->Get<const float*>(meta::TENSOR);
std::vector<int64_t> new_ids(rows);
for (int i = 0; i < rows; ++i) {
@@ -185,4 +185,60 @@ IDMAP::Seal() {
// do nothing
}
DatasetPtr
IDMAP::GetVectorById(const DatasetPtr& dataset, const Config& config) {
if (!index_) {
KNOWHERE_THROW_MSG("index not initialize");
}
// GETTENSOR(dataset)
// auto rows = dataset->Get<int64_t>(meta::ROWS);
auto p_data = dataset->Get<const int64_t*>(meta::IDS);
auto elems = dataset->Get<int64_t>(meta::DIM);
size_t p_x_size = sizeof(float) * elems;
auto p_x = (float*)malloc(p_x_size);
index_->get_vector_by_id(1, p_data, p_x, bitset_);
auto ret_ds = std::make_shared<Dataset>();
ret_ds->Set(meta::TENSOR, p_x);
return ret_ds;
}
DatasetPtr
IDMAP::SearchById(const DatasetPtr& dataset, const Config& config) {
if (!index_) {
KNOWHERE_THROW_MSG("index not initialize");
}
// GETTENSOR(dataset)
auto rows = dataset->Get<int64_t>(meta::ROWS);
auto p_data = dataset->Get<const int64_t*>(meta::IDS);
auto elems = rows * config->k;
size_t p_id_size = sizeof(int64_t) * elems;
size_t p_dist_size = sizeof(float) * elems;
auto p_id = (int64_t*)malloc(p_id_size);
auto p_dist = (float*)malloc(p_dist_size);
// todo: enable search by id (zhiru)
// auto blacklist = dataset->Get<faiss::ConcurrentBitsetPtr>("bitset");
// index_->searchById(rows, (float*)p_data, config->k, p_dist, p_id, blacklist);
index_->search_by_id(rows, p_data, config->k, p_dist, p_id, bitset_);
auto ret_ds = std::make_shared<Dataset>();
ret_ds->Set(meta::IDS, p_id);
ret_ds->Set(meta::DISTANCE, p_dist);
return ret_ds;
}
void
IDMAP::SetBlacklist(faiss::ConcurrentBitsetPtr list) {
bitset_ = std::move(list);
}
void
IDMAP::GetBlacklist(faiss::ConcurrentBitsetPtr& list) {
list = bitset_;
}
} // namespace knowhere
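A hypothetical caller of the new IDMAP::GetVectorById, assuming the Dataset accessors and meta keys used in the diff (buffer handling and identifiers ours):

#include <memory>

// Sketch: fetch one raw vector back out of an IDMAP index by id.
float* fetch_vector(const knowhere::IDMAPPtr& idmap, int64_t id, int64_t dim) {
    auto ds = std::make_shared<knowhere::Dataset>();
    ds->Set(knowhere::meta::IDS, (const int64_t*)&id);
    ds->Set(knowhere::meta::DIM, dim);
    auto out = idmap->GetVectorById(ds, nullptr);  // config is unused here
    // the buffer is malloc()ed inside GetVectorById; the caller must free()
    return out->Get<float*>(knowhere::meta::TENSOR);
}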

View File

@ -13,6 +13,7 @@
#include "IndexIVF.h"
#include <faiss/utils/ConcurrentBitset.h>
#include <memory>
#include <utility>
@ -55,6 +56,7 @@ class IDMAP : public VectorIndex, public FaissBaseIndex {
VectorIndexPtr
CopyCpuToGpu(const int64_t& device_id, const Config& config);
void
Seal() override;
@ -64,12 +66,27 @@ class IDMAP : public VectorIndex, public FaissBaseIndex {
virtual const int64_t*
GetRawIds();
DatasetPtr
GetVectorById(const DatasetPtr& dataset, const Config& config);
DatasetPtr
SearchById(const DatasetPtr& dataset, const Config& config);
void
SetBlacklist(faiss::ConcurrentBitsetPtr list);
void
GetBlacklist(faiss::ConcurrentBitsetPtr& list);
protected:
virtual void
search_impl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& cfg);
protected:
std::mutex mutex_;
private:
faiss::ConcurrentBitsetPtr bitset_ = nullptr;
};
using IDMAPPtr = std::shared_ptr<IDMAP>;
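The blacklist is how deletes become visible to searches: a set bit marks an id as deleted. A short sketch, assuming faiss::ConcurrentBitset is constructed with the number of vectors in the index:

#include <faiss/utils/ConcurrentBitset.h>
#include <memory>

// Sketch: mark ids 3 and 7 as deleted so subsequent searches skip them.
void blacklist_ids(const knowhere::IDMAPPtr& idmap) {
    auto blacklist = std::make_shared<faiss::ConcurrentBitset>(idmap->Count());
    blacklist->set(3);
    blacklist->set(7);
    idmap->SetBlacklist(blacklist);
    // Search()/SearchById()/GetVectorById() now pass bitset_ down to
    // faiss, which skips any id whose bit is set.
}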

View File

@ -217,8 +217,10 @@ IVF::GenGraph(const float* data, const int64_t& k, Graph& graph, const Config& c
void
IVF::search_impl(int64_t n, const float* data, int64_t k, float* distances, int64_t* labels, const Config& cfg) {
auto params = GenParams(cfg);
auto ivf_index = dynamic_cast<faiss::IndexIVF*>(index_.get());
ivf_index->nprobe = params->nprobe;
stdclock::time_point before = stdclock::now();
faiss::ivflib::search_with_parameters(index_.get(), n, (float*)data, k, distances, labels, params.get());
ivf_index->search(n, (float*)data, k, distances, labels, bitset_);
stdclock::time_point after = stdclock::now();
double search_cost = (std::chrono::duration<double, std::micro>(after - before)).count();
KNOWHERE_LOG_DEBUG << "IVF search cost: " << search_cost
@ -271,6 +273,99 @@ IVF::Seal() {
SealImpl();
}
DatasetPtr
IVF::GetVectorById(const DatasetPtr& dataset, const Config& config) {
if (!index_ || !index_->is_trained) {
KNOWHERE_THROW_MSG("index not initialized or trained");
}
auto search_cfg = std::dynamic_pointer_cast<IVFCfg>(config);
if (search_cfg == nullptr) {
KNOWHERE_THROW_MSG("unsupported config type");
}
// auto rows = dataset->Get<int64_t>(meta::ROWS);
auto p_data = dataset->Get<const int64_t*>(meta::IDS);
auto elems = dataset->Get<int64_t>(meta::DIM);
try {
size_t p_x_size = sizeof(float) * elems;
auto p_x = (float*)malloc(p_x_size);
auto index_ivf = std::static_pointer_cast<faiss::IndexIVF>(index_);
index_ivf->get_vector_by_id(1, p_data, p_x, bitset_);
auto ret_ds = std::make_shared<Dataset>();
ret_ds->Set(meta::TENSOR, p_x);
return ret_ds;
} catch (faiss::FaissException& e) {
KNOWHERE_THROW_MSG(e.what());
} catch (std::exception& e) {
KNOWHERE_THROW_MSG(e.what());
}
}
DatasetPtr
IVF::SearchById(const DatasetPtr& dataset, const Config& config) {
if (!index_ || !index_->is_trained) {
KNOWHERE_THROW_MSG("index not initialized or trained");
}
auto search_cfg = std::dynamic_pointer_cast<IVFCfg>(config);
if (search_cfg == nullptr) {
KNOWHERE_THROW_MSG("unsupported config type");
}
auto rows = dataset->Get<int64_t>(meta::ROWS);
auto p_data = dataset->Get<const int64_t*>(meta::IDS);
try {
auto elems = rows * search_cfg->k;
size_t p_id_size = sizeof(int64_t) * elems;
size_t p_dist_size = sizeof(float) * elems;
auto p_id = (int64_t*)malloc(p_id_size);
auto p_dist = (float*)malloc(p_dist_size);
// todo: enable search by id (zhiru)
// auto blacklist = dataset->Get<faiss::ConcurrentBitsetPtr>("bitset");
auto index_ivf = std::static_pointer_cast<faiss::IndexIVF>(index_);
index_ivf->search_by_id(rows, p_data, search_cfg->k, p_dist, p_id, bitset_);
// std::stringstream ss_res_id, ss_res_dist;
// for (int i = 0; i < 10; ++i) {
// printf("%llu", res_ids[i]);
// printf("\n");
// printf("%.6f", res_dis[i]);
// printf("\n");
// ss_res_id << res_ids[i] << " ";
// ss_res_dist << res_dis[i] << " ";
// }
// std::cout << std::endl << "after search: " << std::endl;
// std::cout << ss_res_id.str() << std::endl;
// std::cout << ss_res_dist.str() << std::endl << std::endl;
auto ret_ds = std::make_shared<Dataset>();
ret_ds->Set(meta::IDS, p_id);
ret_ds->Set(meta::DISTANCE, p_dist);
return ret_ds;
} catch (faiss::FaissException& e) {
KNOWHERE_THROW_MSG(e.what());
} catch (std::exception& e) {
KNOWHERE_THROW_MSG(e.what());
}
}
void
IVF::SetBlacklist(faiss::ConcurrentBitsetPtr list) {
bitset_ = std::move(list);
}
void
IVF::GetBlacklist(faiss::ConcurrentBitsetPtr& list) {
list = bitset_;
}
IVFIndexModel::IVFIndexModel(std::shared_ptr<faiss::Index> index) : FaissBaseIndex(std::move(index)) {
}
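A sketch of id-based search through the new IVF::SearchById (IVFCfg fields and meta keys taken from the diff; buffers and identifiers ours):

#include <memory>

// Sketch: look up nq stored ids and get back nq * k neighbours.
void search_by_ids(const knowhere::IVFIndexPtr& ivf,
                   const int64_t* query_ids, int64_t nq) {
    auto cfg = std::make_shared<knowhere::IVFCfg>();
    cfg->k = 10;        // top-k per query
    cfg->nprobe = 16;   // inverted lists to probe

    auto ds = std::make_shared<knowhere::Dataset>();
    ds->Set(knowhere::meta::ROWS, nq);
    ds->Set(knowhere::meta::IDS, query_ids);

    auto res = ivf->SearchById(ds, cfg);
    auto ids = res->Get<int64_t*>(knowhere::meta::IDS);       // nq * k labels
    auto dist = res->Get<float*>(knowhere::meta::DISTANCE);   // nq * k distances
    // consume ids / dist, then free(ids); free(dist);
}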

View File

@ -19,6 +19,7 @@
#include "FaissBaseIndex.h"
#include "VectorIndex.h"
#include "faiss/IndexIVF.h"
#include "faiss/utils/ConcurrentBitset.h"
namespace knowhere {
@ -71,6 +72,18 @@ class IVF : public VectorIndex, public FaissBaseIndex {
virtual VectorIndexPtr
CopyCpuToGpu(const int64_t& device_id, const Config& config);
DatasetPtr
GetVectorById(const DatasetPtr& dataset, const Config& config) override;
DatasetPtr
SearchById(const DatasetPtr& dataset, const Config& config) override;
void
SetBlacklist(faiss::ConcurrentBitsetPtr list);
void
GetBlacklist(faiss::ConcurrentBitsetPtr& list);
protected:
virtual std::shared_ptr<faiss::IVFSearchParameters>
GenParams(const Config& config);
@ -83,6 +96,9 @@ class IVF : public VectorIndex, public FaissBaseIndex {
protected:
std::mutex mutex_;
private:
faiss::ConcurrentBitsetPtr bitset_ = nullptr;
};
using IVFIndexPtr = std::shared_ptr<IVF>;

View File

@ -12,12 +12,14 @@
#pragma once
#include <memory>
#include <vector>
#include "knowhere/common/Config.h"
#include "knowhere/common/Dataset.h"
#include "knowhere/index/Index.h"
#include "knowhere/index/preprocessor/Preprocessor.h"
#include "knowhere/index/vector_index/helpers/IndexParameter.h"
#include "segment/Types.h"
namespace knowhere {
@ -36,6 +38,16 @@ class VectorIndex : public Index {
return nullptr;
}
virtual DatasetPtr
GetVectorById(const DatasetPtr& dataset, const Config& config) {
return nullptr;
}
virtual DatasetPtr
SearchById(const DatasetPtr& dataset, const Config& config) {
return nullptr;
}
virtual void
Add(const DatasetPtr& dataset, const Config& config) = 0;
@ -51,6 +63,20 @@ class VectorIndex : public Index {
virtual int64_t
Dimension() = 0;
virtual const std::vector<milvus::segment::doc_id_t>&
GetUids() const {
return uids_;
}
virtual void
SetUids(std::vector<milvus::segment::doc_id_t>& uids) {
uids_.clear();
uids_.swap(uids);
}
private:
std::vector<milvus::segment::doc_id_t> uids_;
};
} // namespace knowhere
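Note that SetUids takes the caller's vector by non-const reference and swaps it into uids_ rather than copying, so the argument comes back empty. A two-assert illustration (index stands for any VectorIndex subclass; assumption ours):

#include <cassert>
#include <vector>

void set_uids_example(knowhere::VectorIndex* index) {
    std::vector<milvus::segment::doc_id_t> uids = {100, 101, 102};
    index->SetUids(uids);                  // swap, not copy
    assert(uids.empty());                  // caller's vector was emptied
    assert(index->GetUids().size() == 3);
}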

View File

@ -9,8 +9,6 @@
// is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express
// or implied. See the License for the specific language governing permissions and limitations under the License.
#pragma once
#include <immintrin.h>
#include "knowhere/index/vector_index/nsg/Distance.h"

View File

@ -41,13 +41,19 @@ void Index::assign (idx_t n, const float * x, idx_t * labels, idx_t k)
search (n, x, k, distances, labels);
}
void Index::add_with_ids(
idx_t /*n*/,
const float* /*x*/,
const idx_t* /*xids*/) {
void Index::add_with_ids(idx_t n, const float* x, const idx_t* xids) {
FAISS_THROW_MSG ("add_with_ids not implemented for this type of index");
}
void Index::get_vector_by_id (idx_t n, const idx_t *xid, float *x, ConcurrentBitsetPtr bitset) {
FAISS_THROW_MSG ("get_vector_by_id not implemented for this type of index");
}
void Index::search_by_id (idx_t n, const idx_t *xid, idx_t k, float *distances, idx_t *labels,
ConcurrentBitsetPtr bitset) {
FAISS_THROW_MSG ("search_by_id not implemented for this type of index");
}
size_t Index::remove_ids(const IDSelector& /*sel*/) {
FAISS_THROW_MSG ("remove_ids not implemented for this type of index");
return -1;
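Because the base-class stubs above throw, callers that may hold an index type without id-based lookup have to guard the call. A sketch (identifiers ours):

#include <faiss/Index.h>
#include <faiss/impl/FaissException.h>
#include <vector>

// Sketch: guard get_vector_by_id on index types that may not support it.
bool try_fetch(faiss::Index& idx, faiss::Index::idx_t xid, std::vector<float>& buf) {
    buf.resize(idx.d);
    try {
        idx.get_vector_by_id(1, &xid, buf.data());
        return true;
    } catch (const faiss::FaissException&) {
        return false;  // "get_vector_by_id not implemented for this type of index"
    }
}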

View File

@ -16,6 +16,8 @@
#include <string>
#include <sstream>
#include <faiss/utils/ConcurrentBitset.h>
#define FAISS_VERSION_MAJOR 1
#define FAISS_VERSION_MINOR 6
#define FAISS_VERSION_PATCH 0
@ -132,9 +134,34 @@ struct Index {
* @param x input vectors to search, size n * d
* @param labels output labels of the NNs, size n*k
* @param distances output pairwise distances, size n*k
* @param bitset flags to check the validity of vectors
*/
virtual void search (idx_t n, const float *x, idx_t k,
float *distances, idx_t *labels) const = 0;
virtual void search (idx_t n, const float *x, idx_t k, float *distances, idx_t *labels,
ConcurrentBitsetPtr bitset = nullptr) const = 0;
/** query n raw vectors from the index by ids.
*
* return n raw vectors.
*
* @param n input num of xid
* @param xid input ids of the vectors to fetch, size n
* @param x output raw vectors, size n * d
* @param bitset flags to check the validity of vectors
*/
virtual void get_vector_by_id (idx_t n, const idx_t *xid, float *x, ConcurrentBitsetPtr bitset = nullptr);
/** query n vectors of dimension d to the index by ids.
*
* return at most k vectors. If there are not enough results for a
* query, the result array is padded with -1s.
*
* @param xid input ids to search, size n
* @param labels output labels of the NNs, size n*k
* @param distances output pairwise distances, size n*k
* @param bitset flags to check the validity of vectors
*/
virtual void search_by_id (idx_t n, const idx_t *xid, idx_t k, float *distances, idx_t *labels,
ConcurrentBitsetPtr bitset = nullptr);
/** query n vectors of dimension d to the index.
*
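Putting the extended search contract above to work, a filtered query at the faiss level might look like this (sizes and ids are assumptions ours):

#include <faiss/Index.h>
#include <faiss/utils/ConcurrentBitset.h>
#include <memory>
#include <vector>

// Sketch: a set bit marks a deleted vector; search() never returns it.
void filtered_search(const faiss::Index& index, const float* queries,
                     faiss::Index::idx_t nq, faiss::Index::idx_t k,
                     faiss::Index::idx_t deleted_id) {
    auto bitset = std::make_shared<faiss::ConcurrentBitset>(index.ntotal);
    bitset->set(deleted_id);

    std::vector<float> D(nq * k);
    std::vector<faiss::Index::idx_t> I(nq * k);
    index.search(nq, queries, k, D.data(), I.data(), bitset);
    // I never contains deleted_id; since bitset defaults to nullptr,
    // existing call sites keep the unfiltered behaviour.
}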

View File

@ -166,7 +166,8 @@ void Index2Layer::search(
const float* /*x*/,
idx_t /*k*/,
float* /*distances*/,
idx_t* /*labels*/) const {
idx_t* /*labels*/,
ConcurrentBitsetPtr bitset) const {
FAISS_THROW_MSG("not implemented");
}

View File

@ -60,7 +60,8 @@ struct Index2Layer: Index {
const float* x,
idx_t k,
float* distances,
idx_t* labels) const override;
idx_t* labels,
ConcurrentBitsetPtr bitset = nullptr) const override;
void reconstruct_n(idx_t i0, idx_t ni, float* recons) const override;

View File

@ -35,6 +35,15 @@ void IndexBinary::add_with_ids(idx_t, const uint8_t *, const idx_t *) {
FAISS_THROW_MSG("add_with_ids not implemented for this type of index");
}
void IndexBinary::get_vector_by_id (idx_t n, const idx_t *xid, uint8_t *x, ConcurrentBitsetPtr bitset) {
FAISS_THROW_MSG("get_vector_by_id not implemented for this type of index");
}
void IndexBinary::search_by_id (idx_t n, const idx_t *xid, idx_t k, int32_t *distances, idx_t *labels,
ConcurrentBitsetPtr bitset) {
FAISS_THROW_MSG("search_by_id not implemented for this type of index");
}
size_t IndexBinary::remove_ids(const IDSelector&) {
FAISS_THROW_MSG("remove_ids not implemented for this type of index");
return 0;

View File

@ -93,11 +93,37 @@ struct IndexBinary {
* @param x input vectors to search, size n * d / 8
* @param labels output labels of the NNs, size n*k
* @param distances output pairwise distances, size n*k
* @param bitset flags to check the validity of vectors
*/
virtual void search(idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels) const = 0;
virtual void search (idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels,
ConcurrentBitsetPtr bitset = nullptr) const = 0;
/** Query n vectors of dimension d to the index.
/** query n raw vectors from the index by ids.
*
* return n raw vectors.
*
* @param n input num of xid
* @param xid input ids of the vectors to fetch, size n
* @param x output raw vectors, size n * d
* @param bitset flags to check the validity of vectors
*/
virtual void get_vector_by_id (idx_t n, const idx_t *xid, uint8_t *x, ConcurrentBitsetPtr bitset = nullptr);
/** query n vectors of dimension d to the index by ids.
*
* return at most k vectors. If there are not enough results for a
* query, the result array is padded with -1s.
*
* @param xid input ids to search, size n
* @param labels output labels of the NNs, size n*k
* @param distances output pairwise distances, size n*k
* @param bitset flags to check the validity of vectors
*/
virtual void search_by_id (idx_t n, const idx_t *xid, idx_t k, int32_t *distances, idx_t *labels,
ConcurrentBitsetPtr bitset = nullptr);
/** Query n vectors of dimension d to the index.
*
* return all vectors with distance < radius. Note that many
* indexes do not implement the range_search (only the k-NN search

View File

@ -39,57 +39,57 @@ void IndexBinaryFlat::reset() {
}
void IndexBinaryFlat::search(idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels) const {
void IndexBinaryFlat::search(idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels, ConcurrentBitsetPtr bitset) const {
const idx_t block_size = query_batch_size;
if (metric_type == METRIC_Jaccard || metric_type == METRIC_Tanimoto) {
float *D = new float[k * n];
for (idx_t s = 0; s < n; s += block_size) {
idx_t nn = block_size;
if (s + block_size > n) {
nn = n - s;
}
if (use_heap) {
// We see the distances and labels as heaps.
float_maxheap_array_t res = {
size_t(nn), size_t(k), labels + s * k, D + s * k
};
jaccard_knn_hc(&res, x + s * code_size, xb.data(), ntotal, code_size,
/* ordered = */ true);
jaccard_knn_hc(&res, x + s * code_size, xb.data(), ntotal, code_size,
/* ordered = */ true, bitset);
} else {
FAISS_THROW_MSG("tanimoto_knn_mc not implemented");
}
}
if (metric_type == METRIC_Tanimoto) {
for (int i = 0; i < k * n; i++) {
D[i] = -log2(1-D[i]);
}
}
memcpy(distances, D, sizeof(float) * n * k);
delete [] D;
} else {
for (idx_t s = 0; s < n; s += block_size) {
idx_t nn = block_size;
if (s + block_size > n) {
nn = n - s;
}
if (use_heap) {
// We see the distances and labels as heaps.
int_maxheap_array_t res = {
size_t(nn), size_t(k), labels + s * k, distances + s * k
};
hammings_knn_hc(&res, x + s * code_size, xb.data(), ntotal, code_size,
/* ordered = */ true);
hammings_knn_hc(&res, x + s * code_size, xb.data(), ntotal, code_size,
/* ordered = */ true, bitset);
} else {
hammings_knn_mc(x + s * code_size, xb.data(), nn, ntotal, k, code_size,
distances + s * k, labels + s * k);
hammings_knn_mc(x + s * code_size, xb.data(), nn, ntotal, k, code_size,
distances + s * k, labels + s * k, bitset);
}
}
}
}
size_t IndexBinaryFlat::remove_ids(const IDSelector& sel) {
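The Tanimoto branch above reuses the Jaccard computation and converts afterwards with D[i] = -log2(1 - D[i]). Written out as a helper (ours), the mapping is:

#include <cmath>

// Jaccard distance -> Tanimoto distance: T = -log2(1 - J).
// J = 0 maps to T = 0; J -> 1 diverges to +infinity.
inline float jaccard_to_tanimoto(float jaccard_distance) {
    return -std::log2(1.0f - jaccard_distance);
}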

View File

@ -37,8 +37,8 @@ struct IndexBinaryFlat : IndexBinary {
void reset() override;
void search(idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels) const override;
void search (idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels, ConcurrentBitsetPtr bitset = nullptr) const override;
void reconstruct(idx_t key, uint8_t *recons) const override;

View File

@ -50,7 +50,7 @@ void IndexBinaryFromFloat::reset() {
}
void IndexBinaryFromFloat::search(idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels) const {
int32_t *distances, idx_t *labels, ConcurrentBitsetPtr bitset) const {
constexpr idx_t bs = 32768;
std::unique_ptr<float[]> xf(new float[bs * d]);
std::unique_ptr<float[]> df(new float[bs * k]);

View File

@ -41,7 +41,7 @@ struct IndexBinaryFromFloat : IndexBinary {
void reset() override;
void search(idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels) const override;
int32_t *distances, idx_t *labels, ConcurrentBitsetPtr bitset = nullptr) const override;
void train(idx_t n, const uint8_t *x) override;
};

View File

@ -196,7 +196,7 @@ void IndexBinaryHNSW::train(idx_t n, const uint8_t *x)
}
void IndexBinaryHNSW::search(idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels) const
int32_t *distances, idx_t *labels, ConcurrentBitsetPtr bitset) const
{
#pragma omp parallel
{

View File

@ -45,7 +45,7 @@ struct IndexBinaryHNSW : IndexBinary {
/// entry point for search
void search(idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels) const override;
int32_t *distances, idx_t *labels, ConcurrentBitsetPtr bitset = nullptr) const override;
void reconstruct(idx_t key, uint8_t* recons) const override;

View File

@ -146,8 +146,8 @@ void IndexBinaryIVF::make_direct_map(bool new_maintain_direct_map) {
maintain_direct_map = new_maintain_direct_map;
}
void IndexBinaryIVF::search(idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels) const {
void IndexBinaryIVF::search(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels,
ConcurrentBitsetPtr bitset) const {
std::unique_ptr<idx_t[]> idx(new idx_t[n * nprobe]);
std::unique_ptr<int32_t[]> coarse_dis(new int32_t[n * nprobe]);
@ -159,10 +159,40 @@ void IndexBinaryIVF::search(idx_t n, const uint8_t *x, idx_t k,
invlists->prefetch_lists(idx.get(), n * nprobe);
search_preassigned(n, x, k, idx.get(), coarse_dis.get(),
distances, labels, false);
distances, labels, false, nullptr, bitset);
indexIVF_stats.search_time += getmillisecs() - t0;
}
void IndexBinaryIVF::get_vector_by_id(idx_t n, const idx_t *xid, uint8_t *x, ConcurrentBitsetPtr bitset) {
if (!maintain_direct_map) {
make_direct_map(true);
}
/* only get vector by 1 id */
FAISS_ASSERT(n == 1);
if (!bitset || !bitset->test(xid[0])) {
reconstruct(xid[0], x + 0 * d);
} else {
memset(x, UINT8_MAX, d * sizeof(uint8_t));
}
}
void IndexBinaryIVF::search_by_id (idx_t n, const idx_t *xid, idx_t k, int32_t *distances, idx_t *labels,
ConcurrentBitsetPtr bitset) {
if (!maintain_direct_map) {
make_direct_map(true);
}
auto x = new uint8_t[n * d];
for (idx_t i = 0; i < n; ++i) {
reconstruct(xid[i], x + i * d);
}
search(n, x, k, distances, labels, bitset);
delete []x;
}
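search_by_id above is reconstruct-then-search: each query vector is rebuilt from the inverted lists (building the direct map on first use) and fed to the ordinary search. The same pattern as a free-standing sketch (identifiers ours; note the stride of code_size = d / 8 bytes per binary vector):

#include <faiss/IndexBinaryIVF.h>
#include <cstdint>
#include <vector>

void search_by_id_sketch(faiss::IndexBinaryIVF& index,
                         int64_t n, const int64_t* xid, int64_t k,
                         int32_t* distances, int64_t* labels,
                         faiss::ConcurrentBitsetPtr bitset) {
    if (!index.maintain_direct_map) {
        index.make_direct_map(true);  // reconstruct() needs the direct map
    }
    std::vector<uint8_t> x(n * index.code_size);
    for (int64_t i = 0; i < n; ++i) {
        index.reconstruct(xid[i], x.data() + i * index.code_size);
    }
    index.search(n, x.data(), k, distances, labels, bitset);
}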
void IndexBinaryIVF::reconstruct(idx_t key, uint8_t *recons) const {
FAISS_THROW_IF_NOT_MSG(direct_map.size() == ntotal,
"direct map is not initialized");
@ -376,18 +406,22 @@ struct IVFBinaryScannerL2: BinaryInvertedListScanner {
const uint8_t *codes,
const idx_t *ids,
int32_t *simi, idx_t *idxi,
size_t k) const override
size_t k,
ConcurrentBitsetPtr bitset) const override
{
using C = CMax<int32_t, idx_t>;
size_t nup = 0;
for (size_t j = 0; j < n; j++) {
uint32_t dis = hc.hamming (codes);
if (dis < simi[0]) {
heap_pop<C> (k, simi, idxi);
idx_t id = store_pairs ? (list_no << 32 | j) : ids[j];
heap_push<C> (k, simi, idxi, dis, id);
nup++;
if (!bitset || !bitset->test(ids[j])) {
uint32_t dis = hc.hamming (codes);
if (dis < simi[0]) {
heap_pop<C> (k, simi, idxi);
idx_t id = store_pairs ? (list_no << 32 | j) : ids[j];
heap_push<C> (k, simi, idxi, dis, id);
nup++;
}
}
codes += code_size;
}
@ -422,18 +456,22 @@ struct IVFBinaryScannerJaccard: BinaryInvertedListScanner {
const uint8_t *codes,
const idx_t *ids,
int32_t *simi, idx_t *idxi,
size_t k) const override
size_t k,
ConcurrentBitsetPtr bitset = nullptr) const override
{
using C = CMax<float, idx_t>;
float* psimi = (float*)simi;
size_t nup = 0;
for (size_t j = 0; j < n; j++) {
float dis = hc.jaccard (codes);
if (dis < psimi[0]) {
heap_pop<C> (k, psimi, idxi);
idx_t id = store_pairs ? (list_no << 32 | j) : ids[j];
heap_push<C> (k, psimi, idxi, dis, id);
nup++;
if(!bitset || !bitset->test(ids[j])){
float dis = hc.jaccard (codes);
if (dis < psimi[0]) {
heap_pop<C> (k, psimi, idxi);
idx_t id = store_pairs ? (list_no << 32 | j) : ids[j];
heap_push<C> (k, psimi, idxi, dis, id);
nup++;
}
}
codes += code_size;
}
@ -496,7 +534,8 @@ void search_knn_hamming_heap(const IndexBinaryIVF& ivf,
const int32_t * coarse_dis,
int32_t *distances, idx_t *labels,
bool store_pairs,
const IVFSearchParameters *params)
const IVFSearchParameters *params,
ConcurrentBitsetPtr bitset = nullptr)
{
long nprobe = params ? params->nprobe : ivf.nprobe;
long max_codes = params ? params->max_codes : ivf.max_codes;
@ -556,7 +595,7 @@ void search_knn_hamming_heap(const IndexBinaryIVF& ivf,
}
nheap += scanner->scan_codes (list_size, scodes.get(),
ids, simi, idxi, k);
ids, simi, idxi, k, bitset);
nscan += list_size;
if (max_codes && nscan >= max_codes)
@ -588,7 +627,8 @@ void search_knn_jaccard_heap(const IndexBinaryIVF& ivf,
const float * coarse_dis,
float *distances, idx_t *labels,
bool store_pairs,
const IVFSearchParameters *params)
const IVFSearchParameters *params,
ConcurrentBitsetPtr bitset = nullptr)
{
long nprobe = params ? params->nprobe : ivf.nprobe;
long max_codes = params ? params->max_codes : ivf.max_codes;
@ -643,7 +683,7 @@ void search_knn_jaccard_heap(const IndexBinaryIVF& ivf,
}
nheap += scanner->scan_codes (list_size, scodes.get(),
ids, (int32_t*)simi, idxi, k);
ids, (int32_t*)simi, idxi, k, bitset);
nscan += list_size;
if (max_codes && nscan >= max_codes)
@ -671,7 +711,8 @@ void search_knn_hamming_count(const IndexBinaryIVF& ivf,
int k,
int32_t *distances,
idx_t *labels,
const IVFSearchParameters *params) {
const IVFSearchParameters *params,
ConcurrentBitsetPtr bitset = nullptr) {
const int nBuckets = ivf.d + 1;
std::vector<int> all_counters(nx * nBuckets, 0);
std::unique_ptr<idx_t[]> all_ids_per_dis(new idx_t[nx * nBuckets * k]);
@ -719,10 +760,12 @@ void search_knn_hamming_count(const IndexBinaryIVF& ivf,
: ivf.invlists->get_ids(key);
for (size_t j = 0; j < list_size; j++) {
const uint8_t * yj = list_vecs + ivf.code_size * j;
idx_t id = store_pairs ? (key << 32 | j) : ids[j];
csi.update_counter(yj, id);
if (!bitset || !bitset->test(ids[j])) {
const uint8_t * yj = list_vecs + ivf.code_size * j;
idx_t id = store_pairs ? (key << 32 | j) : ids[j];
csi.update_counter(yj, id);
}
}
if (ids)
ivf.invlists->release_ids (key, ids);
@ -764,12 +807,13 @@ void search_knn_hamming_count_1 (
int k,
int32_t *distances,
idx_t *labels,
const IVFSearchParameters *params) {
const IVFSearchParameters *params,
ConcurrentBitsetPtr bitset = nullptr) {
switch (ivf.code_size) {
#define HANDLE_CS(cs) \
case cs: \
search_knn_hamming_count<HammingComputer ## cs, store_pairs>( \
ivf, nx, x, keys, k, distances, labels, params); \
ivf, nx, x, keys, k, distances, labels, params, bitset); \
break;
HANDLE_CS(4);
HANDLE_CS(8);
@ -781,13 +825,13 @@ void search_knn_hamming_count_1 (
default:
if (ivf.code_size % 8 == 0) {
search_knn_hamming_count<HammingComputerM8, store_pairs>
(ivf, nx, x, keys, k, distances, labels, params);
(ivf, nx, x, keys, k, distances, labels, params, bitset);
} else if (ivf.code_size % 4 == 0) {
search_knn_hamming_count<HammingComputerM4, store_pairs>
(ivf, nx, x, keys, k, distances, labels, params);
(ivf, nx, x, keys, k, distances, labels, params, bitset);
} else {
search_knn_hamming_count<HammingComputerDefault, store_pairs>
(ivf, nx, x, keys, k, distances, labels, params);
(ivf, nx, x, keys, k, distances, labels, params, bitset);
}
break;
}
@ -821,7 +865,8 @@ void IndexBinaryIVF::search_preassigned(idx_t n, const uint8_t *x, idx_t k,
const int32_t * coarse_dis,
int32_t *distances, idx_t *labels,
bool store_pairs,
const IVFSearchParameters *params
const IVFSearchParameters *params,
ConcurrentBitsetPtr bitset
) const {
if (metric_type == METRIC_Jaccard || metric_type == METRIC_Tanimoto) {
@ -831,7 +876,7 @@ void IndexBinaryIVF::search_preassigned(idx_t n, const uint8_t *x, idx_t k,
memcpy(c_dis, coarse_dis, sizeof(float) * n * nprobe);
search_knn_jaccard_heap (*this, n, x, k, idx, c_dis ,
D, labels, store_pairs,
params);
params, bitset);
if (metric_type == METRIC_Tanimoto) {
for (int i = 0; i < k * n; i++) {
D[i] = -log2(1-D[i]);
@ -847,14 +892,14 @@ void IndexBinaryIVF::search_preassigned(idx_t n, const uint8_t *x, idx_t k,
if (use_heap) {
search_knn_hamming_heap (*this, n, x, k, idx, coarse_dis,
distances, labels, store_pairs,
params);
params, bitset);
} else {
if (store_pairs) {
search_knn_hamming_count_1<true>
(*this, n, x, idx, k, distances, labels, params);
(*this, n, x, idx, k, distances, labels, params, bitset);
} else {
search_knn_hamming_count_1<false>
(*this, n, x, idx, k, distances, labels, params);
(*this, n, x, idx, k, distances, labels, params, bitset);
}
}
}

View File

@ -105,7 +105,8 @@ struct IndexBinaryIVF : IndexBinary {
const int32_t *centroid_dis,
int32_t *distances, idx_t *labels,
bool store_pairs,
const IVFSearchParameters *params=nullptr
const IVFSearchParameters *params=nullptr,
ConcurrentBitsetPtr bitset = nullptr
) const;
virtual BinaryInvertedListScanner *get_InvertedListScanner (
@ -115,8 +116,14 @@ struct IndexBinaryIVF : IndexBinary {
bool store_pairs=false) const;
/** assign the vectors, then call search_preassign */
virtual void search(idx_t n, const uint8_t *x, idx_t k,
int32_t *distances, idx_t *labels) const override;
void search(idx_t n, const uint8_t *x, idx_t k, int32_t *distances, idx_t *labels,
ConcurrentBitsetPtr bitset = nullptr) const override;
/** get raw vectors by ids */
void get_vector_by_id(idx_t n, const idx_t *xid, uint8_t *x, ConcurrentBitsetPtr bitset = nullptr) override;
void search_by_id (idx_t n, const idx_t *xid, idx_t k, int32_t *distances, idx_t *labels,
ConcurrentBitsetPtr bitset = nullptr) override;
void reconstruct(idx_t key, uint8_t *recons) const override;
@ -204,7 +211,8 @@ struct BinaryInvertedListScanner {
const uint8_t *codes,
const idx_t *ids,
int32_t *distances, idx_t *labels,
size_t k) const = 0;
size_t k,
ConcurrentBitsetPtr bitset = nullptr) const = 0;
virtual ~BinaryInvertedListScanner () {}

View File

@ -38,30 +38,29 @@ void IndexFlat::reset() {
ntotal = 0;
}
void IndexFlat::search (idx_t n, const float *x, idx_t k,
float *distances, idx_t *labels) const
void IndexFlat::search(idx_t n, const float* x, idx_t k, float* distances, idx_t* labels,
ConcurrentBitsetPtr bitset) const
{
// we see the distances and labels as heaps
if (metric_type == METRIC_INNER_PRODUCT) {
float_minheap_array_t res = {
size_t(n), size_t(k), labels, distances};
knn_inner_product (x, xb.data(), d, n, ntotal, &res);
knn_inner_product (x, xb.data(), d, n, ntotal, &res, bitset);
} else if (metric_type == METRIC_L2) {
float_maxheap_array_t res = {
size_t(n), size_t(k), labels, distances};
knn_L2sqr (x, xb.data(), d, n, ntotal, &res);
knn_L2sqr (x, xb.data(), d, n, ntotal, &res, bitset);
} else if (metric_type == METRIC_Jaccard) {
float_maxheap_array_t res = {
size_t(n), size_t(k), labels, distances};
knn_jaccard (x, xb.data(), d, n, ntotal, &res);
knn_jaccard (x, xb.data(), d, n, ntotal, &res, bitset);
} else {
float_maxheap_array_t res = {
size_t(n), size_t(k), labels, distances};
knn_extra_metrics (x, xb.data(), d, n, ntotal,
metric_type, metric_arg,
&res);
&res, bitset);
}
}
@ -245,7 +244,8 @@ void IndexFlatL2BaseShift::search (
const float *x,
idx_t k,
float *distances,
idx_t *labels) const
idx_t *labels,
ConcurrentBitsetPtr bitset) const
{
FAISS_THROW_IF_NOT (shift.size() == ntotal);
@ -328,7 +328,8 @@ static void reorder_2_heaps (
void IndexRefineFlat::search (
idx_t n, const float *x, idx_t k,
float *distances, idx_t *labels) const
float *distances, idx_t *labels,
ConcurrentBitsetPtr bitset) const
{
FAISS_THROW_IF_NOT (is_trained);
idx_t k_base = idx_t (k * k_factor);
@ -421,7 +422,8 @@ void IndexFlat1D::search (
const float *x,
idx_t k,
float *distances,
idx_t *labels) const
idx_t *labels,
ConcurrentBitsetPtr bitset) const
{
FAISS_THROW_IF_NOT_MSG (perm.size() == ntotal,
"Call update_permutation before search");

View File

@ -33,7 +33,8 @@ struct IndexFlat: Index {
const float* x,
idx_t k,
float* distances,
idx_t* labels) const override;
idx_t* labels,
ConcurrentBitsetPtr bitset = nullptr) const override;
void range_search(
idx_t n,
@ -103,7 +104,8 @@ struct IndexFlatL2BaseShift: IndexFlatL2 {
const float* x,
idx_t k,
float* distances,
idx_t* labels) const override;
idx_t* labels,
ConcurrentBitsetPtr bitset = nullptr) const override;
};
@ -138,7 +140,8 @@ struct IndexRefineFlat: Index {
const float* x,
idx_t k,
float* distances,
idx_t* labels) const override;
idx_t* labels,
ConcurrentBitsetPtr bitset = nullptr) const override;
~IndexRefineFlat() override;
};
@ -166,7 +169,8 @@ struct IndexFlat1D:IndexFlatL2 {
const float* x,
idx_t k,
float* distances,
idx_t* labels) const override;
idx_t* labels,
ConcurrentBitsetPtr bitset = nullptr) const override;
};

View File

@ -242,7 +242,7 @@ void IndexHNSW::train(idx_t n, const float* x)
}
void IndexHNSW::search (idx_t n, const float *x, idx_t k,
float *distances, idx_t *labels) const
float *distances, idx_t *labels, ConcurrentBitsetPtr bitset) const
{
FAISS_THROW_IF_NOT_MSG(storage,
@ -961,7 +961,7 @@ int search_from_candidates_2(const HNSW & hnsw,
} // namespace
void IndexHNSW2Level::search (idx_t n, const float *x, idx_t k,
float *distances, idx_t *labels) const
float *distances, idx_t *labels, ConcurrentBitsetPtr bitset) const
{
if (dynamic_cast<const Index2Layer*>(storage)) {
IndexHNSW::search (n, x, k, distances, labels);

View File

@ -91,7 +91,7 @@ struct IndexHNSW : Index {
/// entry point for search
void search (idx_t n, const float *x, idx_t k,
float *distances, idx_t *labels) const override;
float *distances, idx_t *labels, ConcurrentBitsetPtr bitset = nullptr) const override;
void reconstruct(idx_t key, float* recons) const override;
@ -162,7 +162,7 @@ struct IndexHNSW2Level : IndexHNSW {
/// entry point for search
void search (idx_t n, const float *x, idx_t k,
float *distances, idx_t *labels) const override;
float *distances, idx_t *labels, ConcurrentBitsetPtr bitset = nullptr) const override;
};

View File

@ -297,10 +297,8 @@ void IndexIVF::make_direct_map (bool new_maintain_direct_map)
maintain_direct_map = new_maintain_direct_map;
}
void IndexIVF::search (idx_t n, const float *x, idx_t k,
float *distances, idx_t *labels) const
{
void IndexIVF::search (idx_t n, const float *x, idx_t k, float *distances, idx_t *labels,
ConcurrentBitsetPtr bitset) const {
std::unique_ptr<idx_t[]> idx(new idx_t[n * nprobe]);
std::unique_ptr<float[]> coarse_dis(new float[n * nprobe]);
@ -312,18 +310,47 @@ void IndexIVF::search (idx_t n, const float *x, idx_t k,
invlists->prefetch_lists (idx.get(), n * nprobe);
search_preassigned (n, x, k, idx.get(), coarse_dis.get(),
distances, labels, false);
distances, labels, false, nullptr, bitset);
indexIVF_stats.search_time += getmillisecs() - t0;
}
void IndexIVF::get_vector_by_id (idx_t n, const idx_t *xid, float *x, ConcurrentBitsetPtr bitset) {
if (!maintain_direct_map) {
make_direct_map(true);
}
/* only get vector by 1 id */
FAISS_ASSERT(n == 1);
if (!bitset || !bitset->test(xid[0])) {
reconstruct(xid[0], x + 0 * d);
} else {
memset(x, UINT8_MAX, d * sizeof(float));
}
}
void IndexIVF::search_by_id (idx_t n, const idx_t *xid, idx_t k, float *distances, idx_t *labels,
ConcurrentBitsetPtr bitset) {
if (!maintain_direct_map) {
make_direct_map(true);
}
auto x = new float[n * d];
for (idx_t i = 0; i < n; ++i) {
reconstruct(xid[i], x + i * d);
}
search(n, x, k, distances, labels, bitset);
delete []x;
}
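One subtlety of the float get_vector_by_id above: a blacklisted id is not an error; the output buffer is filled with 0xFF bytes, which a float read sees as NaN. A sketch of detecting that sentinel (identifiers ours):

#include <faiss/IndexIVF.h>
#include <cmath>
#include <vector>

// Sketch: returns false when xid was blacklisted (nothing reconstructed).
bool fetch_or_deleted(faiss::IndexIVF& index, int64_t xid,
                      faiss::ConcurrentBitsetPtr bitset, std::vector<float>& vec) {
    vec.resize(index.d);
    index.get_vector_by_id(1, &xid, vec.data(), bitset);
    return !std::isnan(vec[0]);
}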
void IndexIVF::search_preassigned (idx_t n, const float *x, idx_t k,
const idx_t *keys,
const float *coarse_dis ,
float *distances, idx_t *labels,
bool store_pairs,
const IVFSearchParameters *params) const
const IVFSearchParameters *params,
ConcurrentBitsetPtr bitset) const
{
long nprobe = params ? params->nprobe : this->nprobe;
long max_codes = params ? params->max_codes : this->max_codes;
@ -373,7 +400,7 @@ void IndexIVF::search_preassigned (idx_t n, const float *x, idx_t k,
// single list scan using the current scanner (with query
// set properly) and storing results in simi and idxi
auto scan_one_list = [&] (idx_t key, float coarse_dis_i,
float *simi, idx_t *idxi) {
float *simi, idx_t *idxi, ConcurrentBitsetPtr bitset) {
if (key < 0) {
// not enough centroids for multiprobe
@ -405,7 +432,7 @@ void IndexIVF::search_preassigned (idx_t n, const float *x, idx_t k,
}
nheap += scanner->scan_codes (list_size, scodes.get(),
ids, simi, idxi, k);
ids, simi, idxi, k, bitset);
return list_size;
};
@ -438,7 +465,7 @@ void IndexIVF::search_preassigned (idx_t n, const float *x, idx_t k,
nscan += scan_one_list (
keys [i * nprobe + ik],
coarse_dis[i * nprobe + ik],
simi, idxi
simi, idxi, bitset
);
if (max_codes && nscan >= max_codes) {
@ -467,7 +494,7 @@ void IndexIVF::search_preassigned (idx_t n, const float *x, idx_t k,
ndis += scan_one_list
(keys [i * nprobe + ik],
coarse_dis[i * nprobe + ik],
local_dis.data(), local_idx.data());
local_dis.data(), local_idx.data(), bitset);
// can't do the test on max_codes
}
