Use Logstash to migrate full or incremental data from a self-managed Elasticsearch cluster to Alibaba Cloud Elasticsearch

Usage notes

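The ECS instance that hosts the self-managed Logstash must reside in the same virtual private cloud (VPC) as the Alibaba Cloud Elasticsearch cluster. Logstash must also be able to access both the source (self-managed) Elasticsearch cluster and the destination (Alibaba Cloud) Elasticsearch cluster.
Data can be migrated in full or incrementally. If your workload writes or updates data continuously, run a full migration first, and then migrate incremental data based on a time field (or another field that identifies changed data). Otherwise, newer data in the destination is likely to be overwritten by older data. If the destination already holds the full data, the incremental migration based on that field is sufficient.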
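
The ordering rule above can be illustrated with a toy model. This is only a sketch under simplifying assumptions: plain dictionaries stand in for indices, and `apply_batch` is a hypothetical helper that mimics writes keyed by document ID. It is not a Logstash or Elasticsearch API.

```python
def apply_batch(target, batch):
    """Write every document of batch into target, overwriting by document ID."""
    for doc_id, doc in batch.items():
        target[doc_id] = doc
    return target


# Snapshot read by the full migration (older state of document "1").
full_snapshot = {
    "1": {"status": "created", "updated_at": 100},
    "2": {"status": "created", "updated_at": 100},
}
# Change picked up later by the incremental migration.
incremental = {"1": {"status": "paid", "updated_at": 200}}

# Full migration first, incremental second: the newer state survives.
right = apply_batch(apply_batch({}, full_snapshot), incremental)
# Reversed order: the stale snapshot overwrites the newer document.
wrong = apply_batch(apply_batch({}, incremental), full_snapshot)

print(right["1"]["status"])  # paid
print(wrong["1"]["status"])  # created
```

In the reversed order, document "1" ends up in its older state, which is the overwrite that the note warns about.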

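Procedure

Step 1: Prepare the environment and instances

Activate Alibaba Cloud Elasticsearch, deploy a self-managed Elasticsearch cluster on an ECS instance, prepare the data to be migrated, and deploy a self-managed Logstash.
(Optional) Step 2: Migrate index metadata (settings and mappings)

Run a Python script on the ECS instance to migrate the index metadata.
Step 3: Migrate full data

Use a Logstash pipeline configuration to migrate all data from the self-managed Elasticsearch cluster to the Alibaba Cloud Elasticsearch cluster.
Step 4: Migrate incremental data

Use a scheduled Logstash pipeline with a time-range query to migrate newly written data.
Step 5: View the data migration results

Query the migrated indices and documents in the Kibana console.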

Data architecture

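Step 1: Prepare the environment and instances

Create an Alibaba Cloud Elasticsearch instance.

For more information, see "Create an Alibaba Cloud Elasticsearch instance". The test environment used in this topic is as follows.

Item	Description
Region	China (Hangzhou)
Version	Standard Edition 7.10.0
Instance specifications	3 zones, 3 data nodes, 4 vCPUs and 16 GB of memory per node, 100 GB ESSD per node

Create an ECS instance to host the self-managed Elasticsearch, Kibana, and Logstash.

For more information, see "Create an instance on the Custom Launch tab". The test environment used in this topic is as follows.

Item	Description
Region	China (Hangzhou)
Instance type	4 vCPUs, 16 GiB of memory
Image	Public image, CentOS 7.9 64-bit
Storage	System disk, ESSD, 100 GiB
Network	The same VPC as the Alibaba Cloud Elasticsearch instance. Assign Public IPv4 Address is selected, the billing method is pay-by-traffic, and the peak bandwidth is 100 Mbps.
Security group	Add an inbound rule for port 5601 (the Kibana port) and specify the IP address of your client as the authorization object.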
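Deploy the self-managed Elasticsearch cluster.

This topic uses self-managed Elasticsearch 7.6.2 with one data node. Perform the following steps:
1. Connect to the ECS instance.
  
  For more information, see "Connect to a Linux instance by using a password or key".
2. As the root user, create a user named elastic.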
```
useradd elastic
```
3. Set a password for the elastic user.
```
passwd elastic
```
  The system prompts you to enter and confirm the password of the elastic user.
4. Switch from the root user to the elastic user.
```
su -l elastic
```
5. Download and decompress the Elasticsearch installation package.
```
wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.6.2-linux-x86_64.tar.gz
tar -zvxf elasticsearch-7.6.2-linux-x86_64.tar.gz
```
6. Start Elasticsearch.
  Go to the Elasticsearch installation directory and start the service as a daemon.
```
cd elasticsearch-7.6.2
./bin/elasticsearch -d
```
7. Verify that the Elasticsearch service is running.
```
cd ~ 
curl localhost:9200
```
  If the service is running, the response contains the Elasticsearch version number and the tagline "You Know, for Search".
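
The check in the last step can also be scripted. The following is a minimal sketch that assumes the root endpoint returns a JSON document with `version.number` and `tagline` fields. `summarize_root_response` is a hypothetical helper name, and the sample body is a hand-written, trimmed illustration rather than captured output.

```python
import json


def summarize_root_response(body):
    """Return (version, ok) parsed from the JSON body of GET / on Elasticsearch."""
    info = json.loads(body)
    version = info.get("version", {}).get("number", "")
    ok = info.get("tagline") == "You Know, for Search"
    return version, ok


# Hand-written sample with only the two fields that the check reads.
sample = '{"version": {"number": "7.6.2"}, "tagline": "You Know, for Search"}'
print(summarize_root_response(sample))  # ('7.6.2', True)
```

To use it against a live node, pass in the text returned by `curl localhost:9200`.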
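Deploy the self-managed Kibana and prepare test data.

This topic uses self-managed Kibana 7.6.2. The migration steps later in this topic use the Kibana sample data indices (kibana_sample_data_*) as the data to be migrated. Perform the following steps:
1. Connect to the ECS instance.
  
  For more information, see "Connect to a Linux instance by using a password or key".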
2. Download and decompress the Kibana installation package.
```
wget https://artifacts.elastic.co/downloads/kibana/kibana-7.6.2-linux-x86_64.tar.gz
tar -zvxf kibana-7.6.2-linux-x86_64.tar.gz
```
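3. Go to the Kibana installation directory and edit the configuration file config/kibana.yml for your environment.
```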
cd kibana-7.6.2-linux-x86_64
vi config/kibana.yml
```
4. Start Kibana in the background.
```
sudo nohup ./bin/kibana &
```
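Deploy the self-managed Logstash.

The commands in this section use Logstash 7.10.0. Perform the following steps:
1. Connect to the ECS instance.
2. Download and decompress the Logstash installation package.
```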
cd ~
wget https://artifacts.elastic.co/downloads/logstash/logstash-7.10.0-linux-x86_64.tar.gz
tar -zvxf logstash-7.10.0-linux-x86_64.tar.gz
```
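3. Go to the Logstash installation directory and edit the JVM settings (such as the heap size) in config/jvm.options.
```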
cd logstash-7.10.0
sudo vi config/jvm.options
```
4. Edit the pipeline settings in config/pipelines.yml.
```
vi config/pipelines.yml
```
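5. Verify that Logstash works.
  1. Start a test pipeline that reads from standard input and prints each event to standard output.
```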
  bin/logstash -e 'input { stdin { } } output { stdout {} }'
```

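(Optional) Step 2: Migrate index metadata (settings and mappings)

On the ECS instance, create a Python script named indiceCreate.py and copy the following code into it.
```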
sudo vi indiceCreate.py
```

```
#!/usr/bin/python
# -*- coding: UTF-8 -*-
# File name: indiceCreate.py
# This script targets Python 2 (it uses httplib), which is what /usr/bin/python runs on CentOS 7.
import base64
import httplib
import json

## Host of the source cluster.
oldClusterHost = "localhost:9200"
## Username of the source cluster. Can be left empty.
oldClusterUserName = "elastic"
## Password of the source cluster. Can be left empty.
oldClusterPassword = "xxxxxx"
## Host of the destination cluster. You can obtain it on the Basic Information page of the Alibaba Cloud Elasticsearch instance.
newClusterHost = "es-cn-zvp2m4bko0009****.elasticsearch.aliyuncs.com:9200"
## Username of the destination cluster.
newClusterUser = "elastic"
## Password of the destination cluster.
newClusterPassword = "xxxxxx"
DEFAULT_REPLICAS = 0

def httpRequest(method, host, endpoint, params="", username="", password=""):
    conn = httplib.HTTPConnection(host)
    headers = {}
    if username != "":
        # Add an HTTP basic authentication header when credentials are provided.
        credentials = "{username}:{password}".format(username=username, password=password)
        headers["Authorization"] = "Basic %s" % base64.b64encode(credentials)
    if method == "GET":
        headers["Content-Type"] = "application/x-www-form-urlencoded"
        conn.request(method=method, url=endpoint, headers=headers)
    else:
        headers["Content-Type"] = "application/json"
        conn.request(method=method, url=endpoint, body=params, headers=headers)
    response = conn.getresponse()
    res = response.read()
    return res

def httpGet(host, endpoint, username="", password=""):
    return httpRequest("GET", host, endpoint, "", username, password)

def httpPost(host, endpoint, params, username="", password=""):
    return httpRequest("POST", host, endpoint, params, username, password)

def httpPut(host, endpoint, params, username="", password=""):
    return httpRequest("PUT", host, endpoint, params, username, password)

def getIndices(host, username="", password=""):
    endpoint = "/_cat/indices"
    # Query the cluster passed in by the caller instead of the global source cluster.
    indicesResult = httpGet(host, endpoint, username, password)
    indicesList = indicesResult.split("\n")
    indexList = []
    for indices in indicesList:
        # Keep only open indices. The third column of _cat/indices is the index name.
        if indices.find("open") > 0:
            indexList.append(indices.split()[2])
    return indexList

def getSettings(index, host, username="", password=""):
    endpoint = "/" + index + "/_settings"
    indexSettings = httpGet(host, endpoint, username, password)
    print (index + " original settings:\n" + indexSettings)
    settingsDict = json.loads(indexSettings)
    ## By default, the number of shards is the same as that of the source index.
    number_of_shards = settingsDict[index]["settings"]["index"]["number_of_shards"]
    ## By default, the number of replicas is 0.
    number_of_replicas = DEFAULT_REPLICAS
    newSetting = "\"settings\": {\"number_of_shards\": %s, \"number_of_replicas\": %s}" % (number_of_shards, number_of_replicas)
    return newSetting

def getMapping(index, host, username="", password=""):
    endpoint = "/" + index + "/_mapping"
    indexMapping = httpGet(host, endpoint, username, password)
    print (index + " original mappings:\n" + indexMapping)
    mappingDict = json.loads(indexMapping)
    mappings = json.dumps(mappingDict[index]["mappings"])
    newMapping = "\"mappings\" : " + mappings
    return newMapping

def createIndexStatement(oldIndexName):
    settingStr = getSettings(oldIndexName, oldClusterHost, oldClusterUserName, oldClusterPassword)
    mappingStr = getMapping(oldIndexName, oldClusterHost, oldClusterUserName, oldClusterPassword)
    createstatement = "{\n" + str(settingStr) + ",\n" + str(mappingStr) + "\n}"
    return createstatement

def createIndex(oldIndexName, newIndexName=""):
    if newIndexName == "":
        newIndexName = oldIndexName
    createstatement = createIndexStatement(oldIndexName)
    print ("Settings and mappings of the new index " + newIndexName + ":\n" + createstatement)
    endpoint = "/" + newIndexName
    createResult = httpPut(newClusterHost, endpoint, createstatement, newClusterUser, newClusterPassword)
    print ("Result of creating the new index " + newIndexName + ": " + createResult)

## main
indexList = getIndices(oldClusterHost, oldClusterUserName, oldClusterPassword)
systemIndex = []
for index in indexList:
    if index.startswith("."):
        systemIndex.append(index)
    else:
        createIndex(index, index)
if len(systemIndex) > 0:
    for index in systemIndex:
        print (index + " may be a system index and is not re-created. Handle it separately if required.")
```

Run the Python script to create the indices in the destination cluster.
```
sudo /usr/bin/python indiceCreate.py
```
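After the script finishes, run the following command in the Kibana console of the destination cluster to confirm that the indices were created.
```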
GET /_cat/indices?v
```
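
The script above assembles the index-creation request body by concatenating strings. As an alternative, the body can be built as a dictionary and serialized once. The following Python 3 sketch shows that idea under stated assumptions: `build_create_body` is a hypothetical helper that is not part of the original script, and the two sample responses are hand-written, trimmed illustrations of what `GET /<index>/_settings` and `GET /<index>/_mapping` return.

```python
import json

DEFAULT_REPLICAS = 0


def build_create_body(index, settings_resp, mapping_resp, replicas=DEFAULT_REPLICAS):
    """Build the JSON body for PUT /<index> from the source settings and mappings."""
    index_settings = json.loads(settings_resp)[index]["settings"]["index"]
    mappings = json.loads(mapping_resp)[index]["mappings"]
    body = {
        "settings": {
            # Keep the shard count of the source index; replicas default to 0.
            "number_of_shards": int(index_settings["number_of_shards"]),
            "number_of_replicas": replicas,
        },
        "mappings": mappings,
    }
    return json.dumps(body)


# Hand-written sample responses for a hypothetical index named my_index.
settings_sample = '{"my_index": {"settings": {"index": {"number_of_shards": "5", "number_of_replicas": "1"}}}}'
mapping_sample = '{"my_index": {"mappings": {"properties": {"title": {"type": "text"}}}}}'

print(build_create_body("my_index", settings_sample, mapping_sample))
```

Serializing a dictionary avoids hand-escaped quotes and always yields valid JSON, at the cost of diverging from the layout of the original script.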

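Step 3: Migrate full data

Go to the config directory of Logstash and create a pipeline configuration file named es2es_all.conf.
```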
cd logstash-7.10.0/config
vi es2es_all.conf
```

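Configuration for Logstash 7.10.0:
```
input{
    elasticsearch{
        # Address of the source Elasticsearch cluster.
        hosts =>  ["http://localhost:9200"]
        # Username and password for a cluster that has security enabled.
        user => "xxxxxx"
        password => "xxxxxx"
        # Indices to migrate. Separate multiple indices with commas (,).
        index => "kibana_sample_data_*"
        # The following three settings can be left as they are. They control the number of slices (threads) and the batch size, and are related to the Logstash JVM configuration.
        docinfo=>true
        slices => 5
        size => 5000
    }
}
filter {
  # Remove fields that Logstash adds.
  mutate {
    remove_field => ["@timestamp", "@version"]
  }
}
output{
    elasticsearch{
        # Address of the destination Elasticsearch cluster. You can obtain it on the Basic Information page of the Alibaba Cloud Elasticsearch instance.
        hosts => ["http://es-cn-zvp2m4bko0009****.elasticsearch.aliyuncs.com:9200"]
        # Username and password for a cluster that has security enabled.
        user => "elastic"
        password => "xxxxxx"
        # Destination index name. This setting keeps the index name the same as in the source.
        index => "%{[@metadata][_index]}"
        # Destination index type. This setting keeps the type the same as in the source.
        document_type => "%{[@metadata][_type]}"
        # Destination document ID. If you do not need to keep the original ID, delete this line for better performance.
        document_id => "%{[@metadata][_id]}"
        ilm_enabled => false
        manage_template => false
    }
}
```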

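Configuration for Logstash 8.5.1:
```
input{
    elasticsearch{
        # Address of the source Elasticsearch cluster.
        hosts =>  ["http://es-cn-uqm3811160002***.elasticsearch.aliyuncs.com:9200"]
        # Username and password for a cluster that has security enabled.
        user => "elastic"
        password => ""
        # Indices to migrate. Separate multiple indices with commas (,).
        index => "test_ecommerce"
        # The following three settings can be left as they are. They control the batch size and where document metadata is stored, and are related to the Logstash JVM configuration.
        docinfo => true
        size => 10000
        docinfo_target => "[@metadata]"
    }
}
filter {
  # Remove fields that Logstash adds.
  mutate {
    remove_field => ["@timestamp","@version"]
  }
}
output{
    elasticsearch{
        # Address of the destination Elasticsearch cluster. You can obtain it on the Basic Information page of the Alibaba Cloud Elasticsearch instance.
        hosts => ["http://es-cn-nwy38aixp0001****.elasticsearch.aliyuncs.com:9200"]
        # Username and password for a cluster that has security enabled.
        user => "elastic"
        password => ""
        # Destination index name. This setting keeps the index name the same as in the source.
        index => "%{[@metadata][_index]}"
        # Destination document ID. If you do not need to keep the original ID, delete this line for better performance.
        document_id => "%{[@metadata][_id]}"
        ilm_enabled => false
        manage_template => false
    }
}
```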

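To run the migration at a specific time, add the schedule parameter, which uses cron syntax, to the elasticsearch input. For example, the following setting triggers the pipeline at 13:20 on March 5:
```
schedule => "20 13 5 3 *"
```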
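
The meaning of each position in a five-field cron expression such as the one above can be spelled out with a small sketch. `describe_cron` is a hypothetical helper written for illustration. It handles only the plain five-field form and ignores any extensions that the plugin's scheduler may accept.

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expression):
    """Map each field of a five-field cron expression to its name."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError("expected 5 fields, got %d" % len(parts))
    return dict(zip(CRON_FIELDS, parts))


print(describe_cron("20 13 5 3 *"))
```

For "20 13 5 3 *", the result maps minute to 20, hour to 13, day_of_month to 5, month to 3, and day_of_week to the wildcard, which reads as 13:20 on March 5.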

Go to the Logstash installation directory and start the full migration in the background.
```
cd ~/logstash-7.10.0
nohup bin/logstash -f config/es2es_all.conf >/dev/null 2>&1 &
```

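Step 4: Migrate incremental data

In the config directory of Logstash, create a pipeline configuration file named es2es_kibana_sample_data_logs.conf for the incremental migration.
```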
cd config
vi es2es_kibana_sample_data_logs.conf
```

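```
input{
    elasticsearch{
        # Address of the source Elasticsearch cluster.
        hosts =>  ["http://localhost:9200"]
        # Username and password for a cluster that has security enabled.
        user => "xxxxxx"
        password => "xxxxxx"
        # Indices to migrate. Separate multiple indices with commas (,).
        index => "kibana_sample_data_logs"
        # Query incremental data by time range. This setting queries the data of the last 5 minutes.
        query => '{"query":{"range":{"@timestamp":{"gte":"now-5m","lte":"now/m"}}}}'
        # Scheduled task. This setting runs the query once every minute.
        schedule => "* * * * *"
        scroll => "5m"
        docinfo=>true
        size => 5000
    }
}
filter {
  # Remove fields that Logstash adds.
  mutate {
    remove_field => ["@timestamp", "@version"]
  }
}
output{
    elasticsearch{
        # Address of the destination Elasticsearch cluster. You can obtain it on the Basic Information page of the Alibaba Cloud Elasticsearch instance.
        hosts => ["http://es-cn-zvp2m4bko0009****.elasticsearch.aliyuncs.com:9200"]
        # Username and password for a cluster that has security enabled.
        user => "elastic"
        password => "xxxxxx"
        # Destination index name. This setting keeps the index name the same as in the source.
        index => "%{[@metadata][_index]}"
        # Destination index type. This setting keeps the type the same as in the source.
        document_type => "%{[@metadata][_type]}"
        # Destination document ID. If you do not need to keep the original ID, delete this line for better performance.
        document_id => "%{[@metadata][_id]}"
        ilm_enabled => false
        manage_template => false
    }
}
```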

Go to the Logstash installation directory and start the incremental migration in the background.
```
cd ~/logstash-7.10.0
sudo nohup bin/logstash -f config/es2es_kibana_sample_data_logs.conf >/dev/null 2>&1 &
```
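
The range query used by the incremental pipeline above can be generated rather than typed by hand, which helps when the time field or the look-back window differs in your data. This is a minimal sketch: `build_incremental_query` is a hypothetical helper, and its defaults simply mirror the values in the configuration file.

```python
import json


def build_incremental_query(field="@timestamp", window="5m"):
    """Return the range query string for the query setting of the elasticsearch input."""
    query = {"query": {"range": {field: {"gte": "now-" + window, "lte": "now/m"}}}}
    # Compact separators reproduce the single-line form used in the pipeline file.
    return json.dumps(query, separators=(",", ":"))


print(build_incremental_query())
```

Because the 5-minute window is longer than the 1-minute schedule, consecutive runs read overlapping ranges. With document_id preserved in the output, a document that is read again is written to the same ID in the destination instead of being duplicated.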

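In the Kibana console, run the following query to view the most recently updated records.
```
GET kibana_sample_data_logs/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-5m",
        "lte": "now/m"
      }
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ]
}
```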

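Step 5: View the data migration results

1. In the Kibana console of the destination cluster, view the migrated indices.
```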
GET _cat/indices?v
```
2. Query the most recently migrated incremental data.
```
GET kibana_sample_data_logs/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-5m",
        "lte": "now/m"
      }
    }
  },
  "sort": [
    {
      "@timestamp": {
        "order": "desc"
      }
    }
  ]
}
```
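
Comparing per-index document counts between the two clusters is a quick way to spot an incomplete migration. The following sketch assumes that you saved the output of `GET _cat/indices?format=json` from each cluster. `compare_doc_counts` is a hypothetical helper, and the sample rows and counts are made up for illustration.

```python
import json


def compare_doc_counts(source_cat_json, target_cat_json):
    """Return {index: (source_count, target_count)} for non-system indices whose counts differ."""
    def counts(raw):
        # docs.count is returned as a string and may be null for closed indices.
        return {row["index"]: int(row["docs.count"] or 0) for row in json.loads(raw)}

    source, target = counts(source_cat_json), counts(target_cat_json)
    return {
        name: (count, target.get(name))
        for name, count in source.items()
        if not name.startswith(".") and target.get(name) != count
    }


# Made-up sample rows, trimmed to the two columns that the helper reads.
source_sample = json.dumps([
    {"index": "kibana_sample_data_logs", "docs.count": "14074"},
    {"index": "kibana_sample_data_flights", "docs.count": "13059"},
    {"index": ".kibana_1", "docs.count": "10"},
])
target_sample = json.dumps([
    {"index": "kibana_sample_data_logs", "docs.count": "14074"},
    {"index": "kibana_sample_data_flights", "docs.count": "13000"},
])

print(compare_doc_counts(source_sample, target_sample))
```

An empty result means that every non-system index has the same count on both sides. While writes are still arriving at the source, small transient differences are expected.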
