Jael Gu edited this page 2021-10-18 15:51:42 +08:00

Migration

Follow this guide to migrate data to or from Milvus. The following options are available: Milvus to HDF5, HDF5 to Milvus, Faiss to Milvus, and Milvus 1.x to 2.0.

You will need to install MilvusDM first.
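Before working through the steps below, it can help to confirm that the tool is reachable. The following is a minimal sketch, assuming that installing MilvusDM places a `milvusdm` executable on your PATH (the run commands in this guide invoke it that way). The helper name `tool_available` is hypothetical and not part of MilvusDM.

```python
import shutil


def tool_available(name: str = "milvusdm") -> bool:
    """Return True if an executable called `name` is found on PATH."""
    return shutil.which(name) is not None


if __name__ == "__main__":
    if tool_available():
        print("milvusdm found on PATH")
    else:
        print("milvusdm not found; install MilvusDM before continuing")
```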

Milvus to HDF5

You can save data in Milvus as HDF5 files using MilvusDM.

1. Download M2H.yaml:

wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/M2H.yaml

2. Set parameters:

  • source_milvus_path: working directory of Milvus
  • mysql_parameter: MySQL settings for Milvus (if MySQL is not used, set this parameter to '')
  • source_collection: names of the collection and its partitions in Milvus
  • data_dir: directory to save HDF5 files

Example:

M2H:
  milvus_version: 2.x
  source_milvus_path: '/home/user/milvus'
  mysql_parameter:
    host: '127.0.0.1'
    user: 'root'
    port: 3306
    password: '123456'
    database: 'milvus'
  source_collection: # specify the 'partition_1' and 'partition_2' partitions of the 'test' collection.
    test:
      - 'partition_1'
      - 'partition_2'
  data_dir: '/home/user/data'   
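The `source_collection` entry maps each collection name to the partitions to export. As an illustration only, the sketch below expands such a mapping into (collection, partition) pairs, which is the shape of argument the sample calls in this guide take. The dict literal mirrors the YAML example above; `expand_source_collection` is a hypothetical helper, not a MilvusDM function.

```python
from typing import Dict, Iterator, List, Tuple


def expand_source_collection(
    source_collection: Dict[str, List[str]]
) -> Iterator[Tuple[str, str]]:
    """Yield one (collection, partition) pair per listed partition."""
    for collection_name, partitions in source_collection.items():
        for partition_tag in partitions:
            yield collection_name, partition_tag


# Same content as the source_collection entry in the YAML example.
source_collection = {"test": ["partition_1", "partition_2"]}

for collection_name, partition_tag in expand_source_collection(source_collection):
    print(collection_name, partition_tag)
```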

3. Run MilvusDM:

$ milvusdm --yaml M2H.yaml

Sample Code:

  1. Read the data under milvus/db on your local drive, and retrieve vectors and their corresponding IDs from Milvus according to the metadata of the specified collection or partitions:
collection_parameter, version = milvus_meta.get_collection_info(collection_name)
r_vectors, r_ids, r_rows = milvusdb.read_milvus_file(self.milvus_meta, collection_name, partition_tag)
  2. Save the retrieved data as HDF5 files:
data_save.save_yaml(collection_name, partition_tag, collection_parameter, version, save_hdf5_name)

HDF5 to Milvus

You can migrate HDF5 files to Milvus using MilvusDM.

1. Download H2M.yaml:

wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/H2M.yaml

2. Set parameters:

  • data_path: paths to the HDF5 files
  • data_dir: directory of the HDF5 files
  • dest_host: Milvus server address
  • dest_port: Milvus server port
  • mode: mode of migration
    • Skip: skip data migration if the specified collection or partition already exists
    • Append: append data if the specified collection or partition already exists
    • Overwrite: delete existing data before insertion if the specified collection or partition already exists
  • dest_collection_name: name of the collection to import data to
  • dest_partition_name: name of the partition to import data to
  • collection_parameter: collection-specific information such as vector dimension, index file size, and similarity metric

Note:

Set either data_path or data_dir. Do not set both. Use data_path to specify multiple file paths, or data_dir to specify the directory holding your HDF5 files.
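The either/or rule in this note can be checked before running the migration. The sketch below is one way to do that in Python on an already-loaded H2M configuration dict. `resolve_hdf5_source` is a hypothetical helper, and treating "neither value set" as an error is an assumption drawn from the wording of the note, not confirmed MilvusDM behavior.

```python
from typing import Any, Dict, List, Optional


def resolve_hdf5_source(config: Dict[str, Any]) -> Dict[str, Any]:
    """Check that exactly one of data_path / data_dir is set and return it."""
    # Empty values (None, '', []) count as "not set".
    data_path: Optional[List[str]] = config.get("data_path") or None
    data_dir: Optional[str] = config.get("data_dir") or None

    if data_path and data_dir:
        raise ValueError("Set either data_path or data_dir, not both.")
    if not data_path and not data_dir:
        # Assumption: at least one source must be given.
        raise ValueError("Set one of data_path or data_dir.")
    if data_path:
        return {"kind": "files", "value": data_path}
    return {"kind": "directory", "value": data_dir}
```

For example, a configuration with `data_path` filled in and `data_dir` left empty resolves to the list of files, while filling in both raises `ValueError`.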

Example:

H2M:
  milvus-version: 2.x
  data_path:
    - /Users/zilliz/float_1.h5
    - /Users/zilliz/float_2.h5
  data_dir:
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'overwrite'        # 'skip/append/overwrite'
  dest_collection_name: 'test_float'
  dest_partition_name: 'partition_1'
  collection_parameter:
    dimension: 128
    index_file_size: 1024
    metric_type: 'L2'

3. Run MilvusDM:

$ milvusdm --yaml H2M.yaml

Sample Code:

  1. Read the HDF5 files to retrieve vectors and their corresponding IDs:
vectors, ids = self.file.read_hdf5_data()
  2. Insert the retrieved data into Milvus:
ids = insert_milvus.insert_data(vectors, self.c_name, self.c_param, self.mode, ids, self.p_name)

Faiss to Milvus

You can migrate data from Faiss to Milvus using MilvusDM.

1. Download F2M.yaml:

wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/F2M.yaml

2. Set parameters:

  • data_path: path to the data in Faiss
  • dest_host: Milvus server address
  • dest_port: Milvus server port
  • mode: mode of migration
    • Skip: skip data migration if the specified collection or partition already exists
    • Append: append data if the specified collection or partition already exists
    • Overwrite: delete existing data before insertion if the specified collection or partition already exists
  • dest_collection_name: name of the collection to import data to
  • dest_partition_name: name of the partition to import data to
  • collection_parameter: collection-specific information such as vector dimension, index file size, and similarity metric
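The three `mode` values differ only in what happens when the target already exists. The sketch below models that behavior on a plain Python dict standing in for Milvus; it does not call MilvusDM or Milvus. `apply_mode` is a hypothetical function, and inserting normally when the target does not yet exist is an assumption read from the "if ... already exists" wording above.

```python
from typing import Dict, List


def apply_mode(store: Dict[str, List[int]], name: str, rows: List[int], mode: str) -> None:
    """Apply skip/append/overwrite semantics to an in-memory stand-in for Milvus."""
    mode = mode.lower()
    if mode not in ("skip", "append", "overwrite"):
        raise ValueError(f"unknown mode: {mode!r}")

    if name not in store:
        # Target does not exist yet: every mode simply inserts (assumption).
        store[name] = list(rows)
    elif mode == "skip":
        # Target exists: leave it untouched.
        return
    elif mode == "append":
        # Target exists: add the new rows after the existing ones.
        store[name].extend(rows)
    else:
        # overwrite: drop the existing rows, then insert the new ones.
        store[name] = list(rows)


store = {"test": [1, 2]}
apply_mode(store, "test", [3], "skip")       # store["test"] stays [1, 2]
apply_mode(store, "test", [3], "append")     # store["test"] becomes [1, 2, 3]
apply_mode(store, "test", [9], "overwrite")  # store["test"] becomes [9]
apply_mode(store, "new", [7], "skip")        # "new" did not exist, so it is created
```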

Example:

F2M:
  milvus_version: 2.x
  data_path: '/home/data/faiss.index'
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'append'        # 'skip/append/overwrite'
  dest_collection_name: 'test'
  dest_partition_name: ''
  collection_parameter:
    dimension: 256
    index_file_size: 1024
    metric_type: 'L2'

3. Run MilvusDM:

$ milvusdm --yaml F2M.yaml

Sample Code:

  1. Read Faiss data files to retrieve vectors and their corresponding IDs:
ids, vectors = faiss_data.read_faiss_data()
  2. Insert the retrieved data into Milvus:
insert_milvus.insert_data(vectors, self.dest_collection_name, self.collection_parameter, self.mode, ids, self.dest_partition_name)

Milvus 1.x to 2.0

You can use MilvusDM to migrate data from Milvus 1.x to Milvus 2.0.

Note:

MilvusDM does not support migrating data from Milvus 2.0 standalone to Milvus 2.0 cluster.

To upgrade Milvus 2.0 (e.g., from 2.0-rc4 to 2.0-rc5), refer to Upgrade Milvus using Helm Chart.

1. Download M2M.yaml:

wget https://raw.githubusercontent.com/milvus-io/milvus-tools/main/yamls/M2M.yaml

2. Set parameters:

  • source_milvus_path: working directory of the source Milvus
  • mysql_parameter: MySQL settings for the source Milvus (if MySQL is not used, set this parameter to '')
  • source_collection: names of the collection and its partitions in the source Milvus
  • dest_host: target Milvus server address
  • dest_port: target Milvus server port
  • mode: mode of migration
    • Skip: skip data migration if the specified collection or partition already exists
    • Append: append data if the specified collection or partition already exists
    • Overwrite: delete existing data before insertion if the specified collection or partition already exists

Example:

M2M:
  milvus_version: 2.x
  source_milvus_path: '/home/user/milvus'
  mysql_parameter:
    host: '127.0.0.1'
    user: 'root'
    port: 3306
    password: '123456'
    database: 'milvus'
  source_collection:
    test:
      - 'partition_1'
      - 'partition_2'
  dest_host: '127.0.0.1'
  dest_port: 19530
  mode: 'skip' # 'skip/append/overwrite'

3. Run MilvusDM:

$ milvusdm --yaml M2M.yaml

Sample Code:

  1. Read the data under milvus/db on your local drive, and retrieve vectors and their corresponding IDs from the source Milvus according to the metadata of the specified collections or partitions:
collection_parameter, _ = milvus_meta.get_collection_info(collection_name)
r_vectors, r_ids, r_rows = milvusdb.read_milvus_file(self.milvus_meta, collection_name, partition_tag) 
  2. Insert the retrieved vectors and the corresponding IDs into the target Milvus:
milvus_insert.insert_data(r_vectors, collection_name, collection_parameter, self.mode, r_ids, partition_tag)