
ElasticSearch Study Notes: Adding a Custom Dictionary to the ik Analyzer

Prerequisite: the ik analyzer plugin must already be installed. See:

Elasticsearch Study Notes: Analyzers

1. Add a dictionary file under the ik analyzer's config directory

~/software/apache/elasticsearch-6.2.4/config/analysis-ik$ ls | grep mydic.dic
mydic.dic

Its content (one term per line, saved as UTF-8):

我给祖国献石油
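
For example, the file can be created straight from the shell (any editor works, as long as the file ends up UTF-8 encoded):

# write the custom term into the dictionary file (terminal assumed to be UTF-8)
cd ~/software/apache/elasticsearch-6.2.4/config/analysis-ik
echo "我给祖国献石油" > mydic.dic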

2. Configure the dictionary path: edit the IKAnalyzer.cfg.xml configuration file and register the new dictionary, as sketched below.
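
A minimal sketch of what the edited IKAnalyzer.cfg.xml can look like, based on the template shipped with the elasticsearch-analysis-ik plugin; the ext_dict entry points at the mydic.dic file created above (the path is relative to the analysis-ik config directory, and multiple files can be separated with semicolons):

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- custom extension dictionaries -->
    <entry key="ext_dict">mydic.dic</entry>
    <!-- custom stop-word dictionaries -->
    <entry key="ext_stopwords"></entry>
</properties>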

3. Restart Elasticsearch so the new dictionary is loaded
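
For example, with the tarball install under ~/software/apache/elasticsearch-6.2.4 used above and Elasticsearch started from the command line (adapt as needed if you run it as a service), one way to restart it is:

# find and stop the running Elasticsearch process (replace <pid> with the actual process id)
ps -ef | grep elasticsearch
kill <pid>

# start it again as a daemon
~/software/apache/elasticsearch-6.2.4/bin/elasticsearch -d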

4. Test

Create the analyze request body, data.json:

{
  "analyzer": "ik_max_word",
  "text": "我给祖国献石油"
}

IK analysis result before the dictionary was added; the phrase is split into single characters and existing words, with no token for the whole phrase:

curl -H 'Content-Type: application/json' http://localhost:9200/_analyze?pretty=true -d@data.json
{
  "tokens" : [
    {
      "token" : "我",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "给",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "CN_CHAR",
      "position" : 1
    },
    {
      "token" : "祖国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "献",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 3
    },
    {
      "token" : "石油",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}

IK analysis result after the dictionary was added; the tokens now include "我给祖国献石油":

curl -H 'Content-Type: application/json' http://localhost:9200/_analyze?pretty=true -d@data.json
{
  "tokens" : [
    {
      "token" : "我给祖国献石油",
      "start_offset" : 0,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 0
    },
    {
      "token" : "祖国",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "献",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "CN_CHAR",
      "position" : 2
    },
    {
      "token" : "石油",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}