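1. Field datatypes
1.1 string type
Starting with Elasticsearch 5.x, the string field type is deprecated and replaced by text and keyword. A mapping that still uses string is accepted, but the response carries a deprecation warning: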
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "string" }
      }
    }
  }
}

#! Deprecation: The [string] field is deprecated, please use [text] or [keyword] instead on [title]
{
  "acknowledged": true,
  "shards_acknowledged": true
}
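1.2 text type
text replaces string for fields that need full-text search, such as the body of an email or a product description. The content of a text field is analyzed: before the inverted index is built, an analyzer splits the string into individual terms. text fields are not used for sorting and are seldom used for aggregations (the significant terms aggregation is a notable exception).
The following mapping sets the full_name field to the text type: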
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_name": { "type": "text" }
      }
    }
  }
}
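1.3 keyword type
The keyword type is meant for indexing structured content such as email addresses, hostnames, status codes, and tags. Use it for fields that need filtering (for example, finding the blog posts whose status is published), sorting, or aggregations. A keyword field can only be found by searching for its exact value.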
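The difference between an analyzed text field and an exact-value keyword field can be sketched in plain Python. This is only a toy model: analyze_text is a simplified stand-in for a real analyzer (it lowercases and splits on non-alphanumeric characters), and build_inverted_index is a hypothetical helper, not part of any Elasticsearch API.

```python
import re


def analyze_text(value):
    """Simplified stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", value.lower()) if token]


def analyze_keyword(value):
    """A keyword field keeps the whole value as a single term."""
    return [value]


def build_inverted_index(docs, analyze):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, value in docs.items():
        for term in analyze(value):
            index.setdefault(term, set()).add(doc_id)
    return index


docs = {1: "Quick brown fox", 2: "Brown bear"}
text_index = build_inverted_index(docs, analyze_text)
keyword_index = build_inverted_index(docs, analyze_keyword)

print(sorted(text_index["brown"]))   # both documents contain the term "brown"
print(keyword_index.get("brown"))    # no exact value "brown" exists
print(keyword_index["Brown bear"])   # only the full original value is a term
```

In this model a single word finds both documents through the text index, while the keyword index only answers lookups for the complete original value.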
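1.4 Numeric types
Elasticsearch supports the following numeric types:
Type | Range |
---|---|
long | -2^63 to 2^63-1 |
integer | -2^31 to 2^31-1 |
short | -32,768 to 32,767 |
byte | -128 to 127 |
double | 64-bit double-precision IEEE 754 floating point |
float | 32-bit single-precision IEEE 754 floating point |
half_float | 16-bit half-precision IEEE 754 floating point |
scaled_float | floating point stored with a fixed scaling factor (for example, a price that only needs cent precision: with a scaling factor of 100, a price of 57.34 is stored as 5734) |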
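For double, float, and half_float, -0.0 and +0.0 are different values. A term query for -0.0 does not match +0.0, and vice versa. The same holds for range queries: an upper bound of -0.0 does not match +0.0, and a lower bound of +0.0 does not match -0.0.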
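The following sketch shows why the two zeros can be told apart at all: IEEE 754 gives them different bit patterns, even though ordinary comparison treats them as equal. It only illustrates the floating-point encoding; how Elasticsearch encodes values internally is not shown here, and double_bits is a hypothetical helper name.

```python
import struct


def double_bits(value):
    """Return the raw 64-bit IEEE 754 pattern of a Python float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", value))[0]


positive_zero = 0.0
negative_zero = -0.0

# Ordinary numeric comparison treats the two zeros as equal...
print(positive_zero == negative_zero)

# ...but their encodings differ in the sign bit.
print(hex(double_bits(positive_zero)))
print(hex(double_bits(negative_zero)))
```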
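Points to consider when choosing among these numeric types:
- Pick the smallest type that covers the values you need. If a field never exceeds 100, byte is enough. The longest verified human lifespan is 122 years, so short is more than enough for an age field. The smaller the type, the more efficient indexing and searching are.
- For floating-point data, prefer a floating-point type with a scaling factor (scaled_float) where it fits.
Example: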
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_of_bytes": { "type": "integer" },
        "time_in_seconds": { "type": "float" },
        "price": { "type": "scaled_float", "scaling_factor": 100 }
      }
    }
  }
}
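The two rules of thumb above can be sketched in Python. smallest_integer_type, encode_scaled_float, and decode_scaled_float are hypothetical helper names used for illustration only; the encoding simply models "multiply by the scaling factor and round", which is the idea described in the table.

```python
# Integer types from the table above, ordered from narrowest to widest.
INTEGER_TYPES = [
    ("byte", -2**7, 2**7 - 1),
    ("short", -2**15, 2**15 - 1),
    ("integer", -2**31, 2**31 - 1),
    ("long", -2**63, 2**63 - 1),
]


def smallest_integer_type(min_value, max_value):
    """Return the narrowest integer type whose range covers [min_value, max_value]."""
    for name, low, high in INTEGER_TYPES:
        if low <= min_value and max_value <= high:
            return name
    raise ValueError("range does not fit in a long")


def encode_scaled_float(value, scaling_factor):
    """Model of scaled_float storage: multiply by the factor and round to an integer."""
    return round(value * scaling_factor)


def decode_scaled_float(stored, scaling_factor):
    """Recover the approximate original value from the stored integer."""
    return stored / scaling_factor


print(smallest_integer_type(0, 100))
print(smallest_integer_type(0, 150))
print(encode_scaled_float(57.34, 100))
print(decode_scaled_float(5734, 100))
```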
1.5 Object type
JSON documents are hierarchical by nature, so a document may contain inner objects:
POST my_index/my_type/1
{
  "region": "US",
  "manager": {
    "age": 30,
    "name": { "first": "John", "last": "Smith" }
  }
}
The outer document is itself a JSON object. It contains a manager object, which in turn contains a name object. Internally, the document is indexed as a flat list of key-value pairs:
{
  "region": "US",
  "manager.age": 30,
  "manager.name.first": "John",
  "manager.name.last": "Smith"
}
The mapping for this document structure is:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "region": { "type": "keyword" },
        "manager": {
          "properties": {
            "age": { "type": "integer" },
            "name": {
              "properties": {
                "first": { "type": "text" },
                "last": { "type": "text" }
              }
            }
          }
        }
      }
    }
  }
}
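The flattening of inner objects into dotted keys can be reproduced with a short recursive function. This is a sketch of the idea only; flatten is a hypothetical helper, and it handles plain nested dictionaries, not arrays of objects.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dotted key paths."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into inner objects, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


document = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}
print(flatten(document))
```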
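1.6 date type
JSON has no date type, so a date in Elasticsearch can be any of the following:
- a string containing a formatted date, e.g. "2015-01-01" or "2015/01/01 12:10:30"
- a long number representing milliseconds-since-the-epoch
- an integer representing seconds-since-the-epoch
Date formats can be customized. If no format is specified, the default is: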
"strict_date_optional_time||epoch_millis"
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": { "type": "date" }
      }
    }
  }
}

POST my_index/my_type/1
{ "date": "2015-01-01" }

POST my_index/my_type/2
{ "date": "2015-01-01T12:10:30Z" }

POST my_index/my_type/3
{ "date": 1420070400001 }

POST my_index/_search
{
  "sort": { "date": "asc" }
}
A search without the sort clause returns the three documents:
{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": 1, "_source": { "date": "2015-01-01T12:10:30Z" } },
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "date": "2015-01-01" } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "date": 1420070400001 } }
    ]
  }
}
Sorted result (the sort values are milliseconds since the epoch):
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": null, "_source": { "date": "2015-01-01" }, "sort": [ 1420070400000 ] },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": null, "_source": { "date": 1420070400001 }, "sort": [ 1420070400001 ] },
      { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": null, "_source": { "date": "2015-01-01T12:10:30Z" }, "sort": [ 1420114230000 ] }
    ]
  }
}
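The sort values in the response can be checked by converting each of the three inputs to milliseconds since the epoch. The sketch below assumes that date strings without a time zone are interpreted as UTC and that integers are already epoch milliseconds; to_epoch_millis is a hypothetical helper, not an Elasticsearch function.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(value):
    """Convert an ISO-8601 date string or an epoch-millis integer to epoch millis."""
    if isinstance(value, int):
        return value
    # fromisoformat() in older Python versions does not accept a trailing "Z".
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return (parsed - EPOCH) // timedelta(milliseconds=1)


docs = {"1": "2015-01-01", "2": "2015-01-01T12:10:30Z", "3": 1420070400001}
sorted_ids = sorted(docs, key=lambda doc_id: to_epoch_millis(docs[doc_id]))

for doc_id in sorted_ids:
    print(doc_id, to_epoch_millis(docs[doc_id]))
```

The resulting order (1, 3, 2) and the millisecond values agree with the sort array of each hit in the response above.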
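1.7 Array type
Elasticsearch has no dedicated array type. By default, any field can hold one or more values, but all values in an array must be of the same type. For example:
- an array of strings: [ "one", "two" ]
- an array of integers: [ 1, 3 ]
- a nested array: [ 1, [ 2, 3 ] ], which is equivalent to [ 1, 2, 3 ]
- an array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 } ]
Notes:
- When a field is added dynamically, the type of the first value in the array determines the type of the whole array.
- Arrays with mixed types are not supported, e.g. [ 1, "abc" ].
- Arrays may contain null values. An empty array [ ] is treated as a missing field.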
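The array rules above can be modelled with two small helpers. This is a simplified sketch: flatten_values and array_type are hypothetical names, Python type names stand in for field types, and the strict "one type only" check ignores any numeric coercion a real cluster might apply.

```python
def flatten_values(values):
    """Flatten nested arrays, so [1, [2, 3]] becomes [1, 2, 3]."""
    flat = []
    for value in values:
        if isinstance(value, list):
            flat.extend(flatten_values(value))
        else:
            flat.append(value)
    return flat


def array_type(values):
    """Return the single type shared by all non-null values, or None if there are none."""
    kinds = {type(v).__name__ for v in flatten_values(values) if v is not None}
    if len(kinds) > 1:
        raise TypeError(f"mixed array types are not supported: {sorted(kinds)}")
    return kinds.pop() if kinds else None


print(flatten_values([1, [2, 3]]))
print(array_type(["one", "two"]))
print(array_type([1, None, 3]))   # null values are skipped
print(array_type([]))             # empty array: no values, like a missing field
```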
1.8 binary type
The binary type accepts a Base64-encoded string. By default the field is not stored and is not searchable.
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { "type": "text" },
        "blob": { "type": "binary" }
      }
    }
  }
}

POST my_index/my_type/1
{
  "name": "Some binary blob",
  "blob": "U29tZSBiaW5hcnkgYmxvYg=="
}
Searching the blob field fails:
POST my_index/_search
{
  "query": {
    "match": { "blob": "test" }
  }
}
Response:
{
  "error": {
    "root_cause": [
      { "type": "query_shard_exception", "reason": "Binary fields do not support searching", "index_uuid": "fgA7UM5XSS-56JO4F4fYug", "index": "my_index" }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "3dQd1RRVTMiKdTckM68nPQ",
        "reason": { "type": "query_shard_exception", "reason": "Binary fields do not support searching", "index_uuid": "fgA7UM5XSS-56JO4F4fYug", "index": "my_index" }
      }
    ]
  },
  "status": 400
}
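The blob value in the indexing request is just the Base64 encoding of the bytes "Some binary blob". A minimal sketch using Python's standard base64 module shows how such a value can be produced and decoded on the client side; the variable names are illustrative.

```python
import base64

raw = b"Some binary blob"

# Encode raw bytes into the ASCII string that a binary field expects.
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)

# Decode the stored string back into the original bytes.
decoded = base64.b64decode(encoded)
print(decoded)
```

Base64 is an encoding, not encryption: anyone can decode the value, so it offers no confidentiality.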
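1.9 ip type
An ip field stores IPv4 or IPv6 addresses. The term query below uses CIDR notation to match every address in the 192.168.0.0/16 network:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "ip_addr": { "type": "ip" }
      }
    }
  }
}

POST my_index/my_type/1
{ "ip_addr": "192.168.1.1" }

POST my_index/_search
{
  "query": {
    "term": { "ip_addr": "192.168.0.0/16" }
  }
}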
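Whether an address falls inside a CIDR block can be checked locally with Python's standard ipaddress module. This sketch only illustrates the network-membership test behind a CIDR query; in_network is a hypothetical helper.

```python
import ipaddress


def in_network(address, cidr):
    """Return True if the address belongs to the given CIDR network."""
    return ipaddress.ip_address(address) in ipaddress.ip_network(cidr)


print(in_network("192.168.1.1", "192.168.0.0/16"))
print(in_network("10.0.0.1", "192.168.0.0/16"))
print(in_network("2001:db8::1", "2001:db8::/32"))
```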
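1.10 range types
The following range types are supported:
Type | Range |
---|---|
integer_range | -2^31 to 2^31-1 |
float_range | 32-bit IEEE 754 |
long_range | -2^63 to 2^63-1 |
double_range | 64-bit IEEE 754 |
date_range | 64-bit integer, milliseconds since the epoch |
Typical use cases for range types are front-end forms such as a time picker or an age-range selector.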
Example:
PUT range_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "expected_attendees": { "type": "integer_range" },
        "time_frame": {
          "type": "date_range",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}

POST range_index/my_type/1
{
  "expected_attendees": { "gte": 10, "lte": 20 },
  "time_frame": { "gte": "2015-10-31 12:00:00", "lte": "2015-11-01" }
}
The code above creates an index named range_index and indexes a document whose expected_attendees ranges from 10 to 20 and whose time_frame runs from 2015-10-31 12:00:00 to 2015-11-01.
Query:
POST range_index/_search
{
  "query": {
    "range": {
      "time_frame": {
        "gte": "2015-08-01",
        "lte": "2015-12-01",
        "relation": "within"
      }
    }
  }
}
Query result:
{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "range_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "expected_attendees": { "gte": 10, "lte": 20 },
          "time_frame": { "gte": "2015-10-31 12:00:00", "lte": "2015-11-01" }
        }
      }
    ]
  }
}
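The relation parameter decides how the stored range must relate to the query range. The sketch below models the three relations (within, contains, intersects) with inclusive bounds on both sides; range_matches is a hypothetical helper, and date rounding rules of a real cluster are not modelled.

```python
from datetime import datetime


def range_matches(doc_range, query_range, relation="intersects"):
    """Compare an indexed range with a query range; all bounds are inclusive."""
    doc_low, doc_high = doc_range
    query_low, query_high = query_range
    if relation == "within":
        # The document range lies entirely inside the query range.
        return query_low <= doc_low and doc_high <= query_high
    if relation == "contains":
        # The document range entirely covers the query range.
        return doc_low <= query_low and query_high <= doc_high
    if relation == "intersects":
        # The two ranges overlap in at least one point.
        return doc_low <= query_high and query_low <= doc_high
    raise ValueError(f"unknown relation: {relation}")


time_frame = (datetime(2015, 10, 31, 12, 0, 0), datetime(2015, 11, 1))
query = (datetime(2015, 8, 1), datetime(2015, 12, 1))

print(range_matches(time_frame, query, "within"))
print(range_matches(time_frame, query, "contains"))
print(range_matches(time_frame, query, "intersects"))
```

With the values from the example, the document's time_frame lies within the query range, which is why the document is returned.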
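1.11 nested type
The nested type is a specialized version of the object type that allows arrays of objects to be indexed and queried independently of each other. Plain object arrays can give surprising results. Consider the document my_index/my_type/1 with the following structure:
POST my_index/my_type/1
{
  "group": "fans",
  "user": [
    { "first": "John", "last": "Smith" },
    { "first": "Alice", "last": "White" }
  ]
}
The user field is dynamically added as an object field.
The document is then transformed internally into the following flattened form:
{
  "group": "fans",
  "user.first": [ "alice", "john" ],
  "user.last": [ "smith", "white" ]
}
user.first and user.last are flattened into multi-value fields, and the association between Alice and White is lost. As a result, the document incorrectly matches the following query, even though no user named Alice Smith exists:
POST my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" } },
        { "match": { "user.last": "Smith" } }
      ]
    }
  }
}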
The nested field type addresses this shortcoming of the object type:
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": { "type": "nested" }
      }
    }
  }
}

POST my_index/my_type/1
{
  "group": "fans",
  "user": [
    { "first": "John", "last": "Smith" },
    { "first": "Alice", "last": "White" }
  ]
}
The query for Alice and Smith no longer matches, because the two values are not in the same nested object:
POST my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" } },
            { "match": { "user.last": "Smith" } }
          ]
        }
      }
    }
  }
}
The query for Alice and White matches, and inner_hits shows which nested object matched:
POST my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" } },
            { "match": { "user.last": "White" } }
          ]
        }
      },
      "inner_hits": {
        "highlight": {
          "fields": { "user.first": {} }
        }
      }
    }
  }
}
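The difference between the two behaviours can be modelled in a few lines of Python. matches_as_object and matches_as_nested are hypothetical helpers that mimic, respectively, matching against flattened multi-value fields and matching each inner object on its own; they are a sketch of the semantics, not of how Elasticsearch executes queries.

```python
doc = {
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}


def matches_as_object(doc, first, last):
    """Object semantics: first and last names are checked against flattened value sets."""
    firsts = {user["first"].lower() for user in doc["user"]}
    lasts = {user["last"].lower() for user in doc["user"]}
    return first.lower() in firsts and last.lower() in lasts


def matches_as_nested(doc, first, last):
    """Nested semantics: both conditions must hold inside the same inner object."""
    return any(
        user["first"].lower() == first.lower() and user["last"].lower() == last.lower()
        for user in doc["user"]
    )


print(matches_as_object(doc, "Alice", "Smith"))  # false positive
print(matches_as_nested(doc, "Alice", "Smith"))  # correctly rejected
print(matches_as_nested(doc, "Alice", "White"))  # correctly matched
```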
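1.12 token_count type
A token_count field counts the number of tokens in a string (not how often a term occurs): the value is analyzed, and the number of resulting tokens is indexed as an integer.
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "length": { "type": "token_count", "analyzer": "standard" }
          }
        }
      }
    }
  }
}

POST my_index/my_type/1
{ "name": "John Smith" }

POST my_index/my_type/2
{ "name": "Rachel Alice Williams" }
The term query below matches only document 2, whose name contains three tokens:
POST my_index/_search
{
  "query": {
    "term": { "name.length": 3 }
  }
}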
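What ends up in name.length can be approximated in Python by counting tokens. The tokenizer here is a deliberately simple stand-in for the standard analyzer (it extracts runs of word characters), and count_tokens is a hypothetical helper.

```python
import re


def count_tokens(text):
    """Count tokens using a simplified tokenizer: runs of word characters."""
    return len(re.findall(r"\w+", text))


names = {"1": "John Smith", "2": "Rachel Alice Williams"}
lengths = {doc_id: count_tokens(name) for doc_id, name in names.items()}
print(lengths)

# Equivalent of the term query on name.length = 3.
matching_ids = [doc_id for doc_id, length in lengths.items() if length == 3]
print(matching_ids)
```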
1.13 geo_point type
A geo_point field stores a latitude-longitude pair. A point can be given as an object, a string, a geohash, or an array. Note the order: the string form is "lat,lon", whereas the array form is [ lon, lat ].
PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}

POST my_index/my_type/1
{
  "text": "Geo-point as an object",
  "location": { "lat": 41.12, "lon": -71.34 }
}

POST my_index/my_type/2
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34"
}

POST my_index/my_type/3
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86"
}

POST my_index/my_type/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ]
}

POST my_index/_search
{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": { "lat": 42, "lon": -72 },
        "bottom_right": { "lat": 40, "lon": -74 }
      }
    }
  }
}
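Because the accepted formats differ in coordinate order, it can help to normalize them on the client before indexing. The sketch below handles the object, "lat,lon" string, and [lon, lat] array forms; parse_geo_point is a hypothetical helper, and geohash strings are deliberately not handled.

```python
def parse_geo_point(value):
    """Normalize an object, "lat,lon" string, or [lon, lat] array to a (lat, lon) tuple."""
    if isinstance(value, dict):
        return (float(value["lat"]), float(value["lon"]))
    if isinstance(value, str):
        # String form is latitude first. Geohashes are not supported in this sketch.
        lat, lon = value.split(",")
        return (float(lat), float(lon))
    if isinstance(value, (list, tuple)):
        # Array form is longitude first.
        lon, lat = value
        return (float(lat), float(lon))
    raise TypeError(f"unsupported geo-point format: {value!r}")


print(parse_geo_point({"lat": 41.12, "lon": -71.34}))
print(parse_geo_point("41.12,-71.34"))
print(parse_geo_point([-71.34, 41.12]))
```

All three inputs from the example normalize to the same (lat, lon) pair.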