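Indexing Your Data
The previous sections covered how to deploy an Elasticsearch cluster and get it running. Next, we look at how to index, delete, and retrieve data through the Elasticsearch REST API.
Elasticsearch officially provides clients for many languages; see https://www.elastic.co/guide/index.html
Creating an Index
When we create our first document in an Elasticsearch cluster, we do not need to concern ourselves with the whole index-creation process. To create an index explicitly, the following command is all we need: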
curl -XPUT http://localhost:9200/blog/
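As you can see, we use the command-line tool curl to talk to Elasticsearch, sending JSON over HTTP on port 9200 to Elasticsearch's RESTful API.
If an index does not exist when a document is first indexed into it, Elasticsearch creates the index automatically. Here, however, we told Elasticsearch explicitly that we want to create an index named blog. If everything went well, Elasticsearch returns the following response: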
{"acknowledged":true}
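Configuring the Newly Created Index
An index that we create manually also needs some configuration parameters, such as the number of shards and the number of replicas. For example: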
curl -XPUT http://localhost:9200/blog/ -d '{
"settings" : {
"number_of_shards" : 1,
"number_of_replicas" : 2
}
}'
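This command creates an index named blog with one shard and two replicas, which means three physical Lucene indices in total: the primary shard plus its two replica copies. Other settings can be configured in the same way.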
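The shard arithmetic above can be sketched in a few lines of Python. This is only an illustration: the helper names total_lucene_indices and index_settings_body are hypothetical, and the second one merely builds the JSON body that the curl command passes with -d.

```python
import json


def total_lucene_indices(number_of_shards: int, number_of_replicas: int) -> int:
    """Each primary shard and each of its replica copies is one Lucene index."""
    return number_of_shards * (1 + number_of_replicas)


def index_settings_body(number_of_shards: int, number_of_replicas: int) -> str:
    """Build the JSON settings body used when creating the index."""
    return json.dumps({
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    })


# One shard with two replicas, as in the curl example.
print(total_lucene_indices(1, 2))  # 3
print(index_settings_body(1, 2))
```

With five shards and one replica the same formula gives ten Lucene indices, which is why replica counts multiply storage requirements rather than add to them.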
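We now have a brand-new index of our own. There is one problem, though: we forgot to provide the mappings that describe the structure of the index. What can we do? Because the index does not hold any data yet, we can simply delete it and start over. We only need to run the following command:
curl -XDELETE http://localhost:9200/blog
As with the earlier command, you should see: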
{"acknowledged":true}
Creating a New Document in the Index
curl -XPUT http://localhost:9200/blog/article/1 -d '{"title": "New version of Elasticsearch released!", "content": "...", "tags": ["announce", "elasticsearch", "release"] }'
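As a rough Python counterpart to the curl call above, the request can be assembled with the standard library. This is a sketch under assumptions: BASE_URL points at a local node as in the curl examples, build_index_request is a hypothetical helper, and the Content-Type header is added because newer Elasticsearch releases require it. The request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumed local node, as in the curl examples


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) the PUT request that indexes one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


article = {
    "title": "New version of Elasticsearch released!",
    "content": "...",
    "tags": ["announce", "elasticsearch", "release"],
}
request = build_index_request("blog", "article", 1, article)
print(request.get_method(), request.full_url)

# To send it against a running node:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the body with json.dumps avoids the quoting problems that hand-written JSON inside a shell string tends to cause.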
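Mappings Configuration
If you have used an SQL database, you know that before inserting data you first have to create a schema that describes what your data looks like. Although Elasticsearch is a schema-less search engine, we can still define the data structure ourselves.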
Index structure mapping
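A schema mapping defines the structure of the index: which fields a document contains and, for each field, its type, whether it is stored, and which analyzers are used at index time and at query time.
Anatomy of a mapping
A mapping refers to one or more analyzers, and an analyzer in turn is made up of one or more filters. When ES indexes a document, it passes the content of each field to the corresponding analyzer, which passes it on through its filters. (In Elasticsearch's own terminology an analyzer also contains a tokenizer, which splits the text into tokens before the filters run.)
A filter is easy to understand: it is a function that transforms data. It takes a string as input and returns another string; a function that converts a string to lowercase is a good example of a filter.
An analyzer consists of an ordered sequence of filters. Analysis means calling the filters one after another in that order, and ES stores and indexes the final result.
In short, a mapping applies a series of instructions that turn the input data into searchable index terms.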
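The idea of an analyzer as an ordered chain of filters can be sketched as plain function composition. This is a toy model, not Elasticsearch's implementation: the whitespace split stands in for a real tokenizer, the filters work on token lists rather than single strings, and STOP_WORDS is a small illustrative list.

```python
from typing import Callable, List

Filter = Callable[[List[str]], List[str]]

STOP_WORDS = {"a", "an", "the"}  # illustrative subset, not the real default list


def lowercase_filter(tokens: List[str]) -> List[str]:
    """Lowercase every token."""
    return [token.lower() for token in tokens]


def stop_filter(tokens: List[str]) -> List[str]:
    """Drop tokens that are stop words."""
    return [token for token in tokens if token not in STOP_WORDS]


def analyze(text: str, filters: List[Filter]) -> List[str]:
    """Split on whitespace, then apply each filter in order."""
    tokens = text.split()
    for apply_filter in filters:
        tokens = apply_filter(tokens)
    return tokens


print(analyze("A Pretty cool guy", [lowercase_filter, stop_filter]))
# ['pretty', 'cool', 'guy']
```

The order of the chain matters: if stop_filter ran before lowercase_filter, the capitalized "A" would not match the lowercase stop list and would survive into the index.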
Core types
In Elasticsearch, the type of every field can be specified. The core types are:
- String
- Number
- Date
- Boolean
- Binary
Common attributes
Each Elasticsearch type has its own attributes, but some attributes are common to all types.
- index_name:
This defines the name of the field that will be stored in the index. If this is not defined, the name will be set to the name of the object that the field is defined with.
- index:
This can take the values analyzed and no; string-based fields additionally accept not_analyzed. If set to analyzed (the default), the field is indexed and thus searchable. If set to no, you won't be able to search on the field. If set to not_analyzed, the field is indexed but not analyzed: it is written to the index exactly as it was sent to Elasticsearch, and only a perfect match counts during a search. Setting index to no also disables the include_in_all property of the field.
- store:
This can take the values yes and no and specifies if the original value of the field should be written into the index. The default value is no, which means that you can't return that field in the results (although, if you use the _source field, you can return the value even if it is not stored), but if you have it indexed, you can still search the data on the basis of it.
- boost:
The default value of this attribute is 1. Basically, it defines how important the field is inside the document; the higher the boost, the more important the values in the field.
- null_value:
This attribute specifies a value that should be written into the index in case that field is not a part of an indexed document. The default behavior will just omit that field.
- copy_to:
This attribute specifies a field to which all field values will be copied.
- include_in_all:
This attribute specifies if the field should be included in the _all field. By default, if the _all field is used, all the fields will be included in it.
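To make the common attributes more concrete, the sketch below assembles a properties fragment for a few string fields. The helper string_field and the field names title, status, internal_note and summary are hypothetical; the attribute names and allowed values are the ones listed above.

```python
import json

ALLOWED_INDEX_VALUES = {"analyzed", "not_analyzed", "no"}


def string_field(index="analyzed", store="no", boost=1.0,
                 null_value=None, copy_to=None, include_in_all=True):
    """Return a mapping fragment for one string field (hypothetical helper)."""
    if index not in ALLOWED_INDEX_VALUES:
        raise ValueError(f"unsupported index value: {index!r}")
    if index == "no":
        # As noted above, index=no disables include_in_all for the field.
        include_in_all = False
    field = {
        "type": "string",
        "index": index,
        "store": store,
        "boost": boost,
        "include_in_all": include_in_all,
    }
    if null_value is not None:
        field["null_value"] = null_value
    if copy_to is not None:
        field["copy_to"] = copy_to
    return field


properties = {
    "title": string_field(boost=2.0, copy_to="summary"),
    "status": string_field(index="not_analyzed", null_value="unknown"),
    "internal_note": string_field(index="no", store="yes"),
}
print(json.dumps({"properties": properties}, indent=2))
```

Validating the index value up front catches typos such as "not-analyzed" before the mapping is ever sent to the cluster.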
Default analyzer
{
  "mappings": {
    "item": {
      "properties": {
        "description": {
          "type": "string"
        },
        "name": {
          "type": "string"
        }
      }
    }
  }
}
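ES guesses that the description field is a string and therefore creates a default string mapping for it, which uses the default global analyzer. The default analyzer is the standard analyzer, which has three filters: a token filter, a lowercase filter, and a stop token filter. We can use the _analyze endpoint to see the analysis process. The following command shows how a value of the description field is transformed: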
curl -X GET "http://localhost:9200/test/_analyze?analyzer=standard&pretty=true" -d "A Pretty cool guy."
{
"tokens" : [ {
"token" : "pretty",
"start_offset" : 2,
"end_offset" : 8,
"type" : "<ALPHANUM>",
"position" : 2
}, {
"token" : "cool",
"start_offset" : 9,
"end_offset" : 13,
"type" : "<ALPHANUM>",
"position" : 3
}, {
"token" : "guy",
"start_offset" : 14,
"end_offset" : 17,
"type" : "<ALPHANUM>",
"position" : 4
} ]
}
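Note that the token a is missing from the output: the stop token filter removed it. Now look at the result of searching for the word a:
curl -X GET "http://localhost:9200/test/_search?pretty=true" -d '{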
"query" : {
"text" : { "description": "a" }
}
}'
{
"took" : 29,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 0,
"max_score" : null,
"hits" : [ ]
}
}
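Why the search for a returns no hits can be reproduced with a toy model of the analysis step. This is a sketch under assumptions, not the standard analyzer itself: tokens are found with a simple \w+ regular expression, STOP_WORDS is a small illustrative list, and positions are counted from 1 to match the _analyze output shown earlier.

```python
import re

STOP_WORDS = {"a", "an", "the"}  # illustrative subset


def analyze_with_positions(text):
    """Lowercase each word and drop stop words, keeping offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text), start=1):
        term = match.group().lower()
        if term in STOP_WORDS:
            continue  # the position is still consumed, leaving a gap
        tokens.append({
            "token": term,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


def build_inverted_index(documents):
    """Map each surviving term to the ids of the documents containing it."""
    inverted = {}
    for doc_id, text in documents.items():
        for token in analyze_with_positions(text):
            inverted.setdefault(token["token"], set()).add(doc_id)
    return inverted


inverted_index = build_inverted_index({1: "A Pretty cool guy."})
print(analyze_with_positions("A Pretty cool guy.")[0])
print(sorted(inverted_index.get("pretty", set())))  # [1]
print(sorted(inverted_index.get("a", set())))       # []
```

Because the stop word never makes it into the inverted index, there is simply no entry to match, which is what the empty hits array reflects.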
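Ways to Configure a Mapping
There are two ways to add a mapping: define it in a configuration file, or submit it manually at runtime.
- Via a configuration file
Put a [mapping name].json file in the config/mappings/[index name] directory, which you have to create yourself. Each mapping file belongs to one index. You can also define a default mapping by placing your own default-mapping.json in the config directory.
- Submitting the mapping manually through the Java API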
import org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest;
import org.elasticsearch.client.Requests;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

// The root key of the mapping is the type name, so it is set to the same value as the type on the request.
String typeName = "Aaaa";
XContentBuilder mapping = XContentFactory.jsonBuilder()
.startObject().startObject(typeName)
// Do not keep the original JSON document in _source.
.startObject("_source").field("enabled", false).endObject()
.startObject("properties")
.startObject("appId").field("type","string").field("index", "not_analyzed").endObject()
.startObject("cell").field("type","string").field("index", "not_analyzed").endObject()
.startObject("phoneType").field("type","string").field("index", "not_analyzed").endObject()
.startObject("province").field("type","string").field("index", "not_analyzed").endObject()
.startObject("tags").field("type","string").field("index_analyzer", "whitespace").field("search_analyzer", "whitespace").endObject()
.startObject("city").field("type","string").field("index", "not_analyzed").endObject()
.startObject("sex").field("type","double").field("index", "analyzed").endObject()
.startObject("price").field("type","double").field("index", "analyzed").endObject()
.startObject("usertype").field("type","string").field("index_analyzer", "whitespace").field("search_analyzer", "whitespace").endObject()
.startObject("mobiler").field("type","string").field("index", "analyzed").endObject()
.endObject().endObject().endObject();
// Apply the mapping to the index "userindex"; client is assumed to be an already connected Client.
PutMappingRequest mappingRequest = Requests.putMappingRequest("userindex")
.type(typeName).source(mapping);
client.admin().indices().putMapping(mappingRequest).actionGet();
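For the configuration-file approach described in the first bullet, the directory layout can be sketched in Python. Everything here is an assumption for illustration: write_mapping_file is a hypothetical helper, a temporary directory stands in for the Elasticsearch home, the blog/article names and field definitions are made up, and the file content is wrapped in the mapping name as its root key, following the shape of the example mapping in this document.

```python
import json
import tempfile
from pathlib import Path


def write_mapping_file(config_dir, index_name, mapping_name, mapping):
    """Write config/mappings/[index name]/[mapping name].json and return its path."""
    target_dir = Path(config_dir) / "mappings" / index_name
    target_dir.mkdir(parents=True, exist_ok=True)  # the directory is not created for us
    target = target_dir / f"{mapping_name}.json"
    target.write_text(json.dumps({mapping_name: mapping}, indent=2), encoding="utf-8")
    return target


# Hypothetical mapping for illustration only.
article_mapping = {
    "properties": {
        "title": {"type": "string"},
        "tags": {"type": "string", "index": "not_analyzed"},
    }
}

with tempfile.TemporaryDirectory() as es_home:
    path = write_mapping_file(Path(es_home) / "config", "blog", "article", article_mapping)
    relative = path.relative_to(es_home).as_posix()
    loaded = json.loads(path.read_text(encoding="utf-8"))

print(relative)  # config/mappings/blog/article.json
```

Whether a given Elasticsearch release still reads mappings from this directory should be checked against its documentation; the sketch only shows where the file would go.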
Elasticsearch mapping example
{
"_default_":{
"properties":{
"appId":{
"type":"string",
"index":"not_analyzed"
},
"cell":{
"type":"string",
"index":"not_analyzed"
},
"phoneType":{
"type":"string",
"index":"not_analyzed"
},
"province":{
"type":"string",
"index":"not_analyzed"
},
"tags":{
"type":"string",
"index_analyzer":"whitespace",
"search_analyzer":"whitespace"
},
"city":{
"type":"string",
"index":"not_analyzed"
},
"sex":{
"type":"double",
"index":"analyzed"
},
"price":{
"type":"double",
"index":"analyzed"
},
"usertype":{
"type":"string",
"index_analyzer":"whitespace",
"search_analyzer":"whitespace"
},
"mobiler":{
"type":"string",
"index":"analyzed"
},
"userTag":{
"type":"string",
"index_analyzer":"whitespace",
"search_analyzer":"whitespace"
}
}
}
}
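The example mapping uses two indexing styles for strings: not_analyzed for fields such as city, and the whitespace analyzer for fields such as tags. The toy function below shows the practical difference in which terms end up in the index. It is a simplified model with a hypothetical helper name (index_terms) and made-up sample values, covering only these two cases.

```python
def index_terms(value, field_mapping):
    """Return the terms a value would be indexed as, for the two cases modelled."""
    if field_mapping.get("index") == "not_analyzed":
        # The whole value is a single term, so only an exact match finds it.
        return [value]
    if field_mapping.get("index_analyzer") == "whitespace":
        # Split on whitespace only; case and punctuation are left untouched.
        return value.split()
    raise ValueError("only not_analyzed and the whitespace analyzer are modelled")


city = {"type": "string", "index": "not_analyzed"}
tags = {"type": "string", "index_analyzer": "whitespace", "search_analyzer": "whitespace"}

print(index_terms("New York", city))         # ['New York']
print(index_terms("VIP gold member", tags))  # ['VIP', 'gold', 'member']
```

A query for York would therefore not match the city field, while a query for gold would match the tags field.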