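Installing the analysis-ik Chinese word segmentation plugin for Elasticsearch

Environment: Elasticsearch 2.3.2 with analysis-ik 1.9.3, used as the example throughout.

At first I downloaded the latest ik release, but after installing it Elasticsearch refused to start and reported a version incompatibility.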

/etc/init.d/elasticsearch start
Starting elasticsearch: Exception in thread "main" java.lang.IllegalArgumentException: Plugin [analysis-ik] is incompatible with Elasticsearch [2.3.2]. Was designed for version [5.0.0]
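
This kind of mismatch can be caught before restarting the service. The following is a minimal sketch, assuming the unpacked plugin contains a plugin-descriptor.properties file with an elasticsearch.version key (as Elasticsearch 2.x plugins generally do). The function names and the sample descriptor text are illustrative, not taken from a real release.

```python
def read_properties(text):
    """Parse simple key=value lines from Java-style .properties text."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def plugin_matches(descriptor_text, es_version):
    """Return True when the descriptor declares exactly this Elasticsearch version."""
    declared = read_properties(descriptor_text).get("elasticsearch.version")
    return declared == es_version


# Hypothetical descriptor content mirroring the error message above.
sample = "name=analysis-ik\nelasticsearch.version=5.0.0\n"
print(plugin_matches(sample, "2.3.2"))  # prints False
```

In practice the descriptor text would be read from the file inside the plugin directory before the service is started.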

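After searching again, the fix turned out to be simple, and there is no need to rebuild the package with mvn.

Go to https://github.com/medcl/elasticsearch-analysis-ik/releases, download the zip for the release that matches your Elasticsearch version, and unzip it into /usr/share/elasticsearch/plugins/ik. That is all.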

Configuring the dictionaries (ik ships with a Sogou dictionary)
Configuration file: /usr/share/elasticsearch/plugins/ik/config/ik/IKAnalyzer.cfg.xml

<entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic;custom/sougou.dic</entry>
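
To double-check which dictionary files the entry points at, the value can be split on semicolons. This is a rough sketch: the `<properties>` root element and the function name `ext_dict_paths` are assumptions for illustration, and the sample string stands in for the real IKAnalyzer.cfg.xml.

```python
import xml.etree.ElementTree as ET


def ext_dict_paths(cfg_xml):
    """Return the dictionary paths listed in the ext_dict entry, in order."""
    root = ET.fromstring(cfg_xml)
    for entry in root.iter("entry"):
        if entry.get("key") == "ext_dict":
            # Paths are separated by semicolons; drop empty pieces.
            return [p.strip() for p in (entry.text or "").split(";") if p.strip()]
    return []


# Stand-in for the real config file; the root element name is assumed.
sample = """<properties>
    <entry key="ext_dict">custom/mydict.dic;custom/single_word_low_freq.dic;custom/sougou.dic</entry>
</properties>"""

for path in ext_dict_paths(sample):
    print(path)
```

Each printed path is relative to the directory that holds the config file, so a missing .dic file is easy to spot by checking those locations.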

Open ES_HOME/config/elasticsearch.yml.

Append the following to the end of the file:

index:
  analysis:                   
    analyzer:      
      ik:
          alias: [ik_analyzer]
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
          type: ik
          use_smart: false
      ik_smart:
          type: ik
          use_smart: true
index.analysis.analyzer.default.type: ik

Restart elasticsearch:
service elasticsearch restart

Testing

http://localhost:9200/<any-index-name>/_analyze?analyzer=ik&pretty=true&text=深圳热销限时促销优惠600元
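
Browsers percent-encode the Chinese text automatically, but scripts and some command-line clients do not. Here is a small sketch of building the same request URL with the text encoded as UTF-8. The function name `analyze_url` and the index name `my_index` are placeholders.

```python
from urllib.parse import quote, urlencode


def analyze_url(index, text, analyzer="ik", host="http://localhost:9200"):
    """Build an _analyze URL with the query string percent-encoded as UTF-8."""
    query = urlencode({"analyzer": analyzer, "pretty": "true", "text": text})
    return "{}/{}/_analyze?{}".format(host, quote(index), query)


# my_index is a placeholder; any existing index name works.
url = analyze_url("my_index", "深圳热销限时促销优惠600元")
print(url)
```

The resulting URL can be passed to any HTTP client; the index must already exist on the node for the request to succeed.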

{
  "tokens" : [ {
    "token" : "深圳",
    "start_offset" : 0,
    "end_offset" : 2,
    "type" : "CN_WORD",
    "position" : 0
  }, {
    "token" : "圳",
    "start_offset" : 1,
    "end_offset" : 2,
    "type" : "CN_WORD",
    "position" : 1
  }, {
    "token" : "热销",
    "start_offset" : 2,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "热",
    "start_offset" : 2,
    "end_offset" : 3,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "销",
    "start_offset" : 3,
    "end_offset" : 4,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "限时",
    "start_offset" : 4,
    "end_offset" : 6,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "促销",
    "start_offset" : 6,
    "end_offset" : 8,
    "type" : "CN_WORD",
    "position" : 6
  }, {
    "token" : "促",
    "start_offset" : 6,
    "end_offset" : 7,
    "type" : "CN_WORD",
    "position" : 7
  }, {
    "token" : "销",
    "start_offset" : 7,
    "end_offset" : 8,
    "type" : "CN_WORD",
    "position" : 8
  }, {
    "token" : "优惠",
    "start_offset" : 8,
    "end_offset" : 10,
    "type" : "CN_WORD",
    "position" : 9
  }, {
    "token" : "惠",
    "start_offset" : 9,
    "end_offset" : 10,
    "type" : "CN_WORD",
    "position" : 10
  }, {
    "token" : "600",
    "start_offset" : 10,
    "end_offset" : 13,
    "type" : "ARABIC",
    "position" : 11
  }, {
    "token" : "元",
    "start_offset" : 13,
    "end_offset" : 14,
    "type" : "COUNT",
    "position" : 12
  } ]
}
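
For a quicker read of a response like the one above, the token strings can be pulled out of the JSON. This is only a sketch: `token_strings` is an illustrative name, and the sample is a shortened copy of the output shown above.

```python
import json


def token_strings(response_text, token_type=None):
    """Return the token strings from an _analyze response, optionally filtered by type."""
    data = json.loads(response_text)
    return [
        t["token"]
        for t in data["tokens"]
        if token_type is None or t["type"] == token_type
    ]


# Shortened copy of the response above (first, second, and twelfth tokens).
sample = """{
  "tokens": [
    {"token": "深圳", "start_offset": 0, "end_offset": 2, "type": "CN_WORD", "position": 0},
    {"token": "圳", "start_offset": 1, "end_offset": 2, "type": "CN_WORD", "position": 1},
    {"token": "600", "start_offset": 10, "end_offset": 13, "type": "ARABIC", "position": 11}
  ]
}"""

print(" | ".join(token_strings(sample)))
```

Filtering by type, for example `token_strings(sample, "ARABIC")`, makes it easy to see how numbers and counters are separated from Chinese words.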