ElasticSearch 5.x in Practice_day05_01_Mapping_Field datatype
Published: 2019-06-20


I. Field datatypes

1.1 string type

Starting with Elasticsearch 5.x, the string field type is deprecated and replaced by text and keyword. A mapping that still uses string is accepted, but the response carries a deprecation warning.

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "title": { "type": "string" }
      }
    }
  }
}

#! Deprecation: The [string] field is deprecated, please use [text] or [keyword] instead on [title]
{
  "acknowledged": true,
  "shards_acknowledged": true
}

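1.2 text type

text replaces string for fields that need full-text search, such as the body of an email or a product description. The content of a text field is analyzed: before the inverted index is built, the analyzer splits the string into individual terms. text fields are not used for sorting and are seldom used for aggregations (the significant terms aggregation is a notable exception).

The mapping that sets the full_name field to text looks like this:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "full_name": { "type": "text" }
      }
    }
  }
}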

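1.3 keyword type

The keyword type is meant for structured content such as email addresses, hostnames, status codes, and tags. Use it when a field is needed for filtering (for example, finding the blog posts whose status is published), sorting, or aggregations. A keyword field can only be found by searching for its exact value.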
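The difference between the two types can be sketched in a few lines of Python. This is only an illustration: analyze_text is a hypothetical stand-in that lowercases and splits on non-alphanumeric characters, which is far simpler than a real Elasticsearch analyzer, and analyze_keyword simply keeps the value as a single term.

```python
import re

def analyze_text(value):
    """Rough stand-in for text analysis: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", value.lower()) if tok]

def analyze_keyword(value):
    """A keyword field indexes the whole value as one unmodified term."""
    return [value]

value = "Published Draft"
text_terms = analyze_text(value)        # several lowercase terms
keyword_terms = analyze_keyword(value)  # one exact term

print(text_terms)
print(keyword_terms)
```

Under these assumptions a search for the single word "published" would hit the text terms but not the keyword term, which only matches the full original string.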

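1.4 Numeric types

Elasticsearch supports the following numeric types:

Type            Range
long            -2^63 to 2^63-1
integer         -2^31 to 2^31-1
short           -32,768 to 32,767
byte            -128 to 127
double          64-bit double-precision IEEE 754 floating point
float           32-bit single-precision IEEE 754 floating point
half_float      16-bit half-precision IEEE 754 floating point
scaled_float    floating point stored as a long with a fixed scaling factor (for example, a price that only needs cent precision: with a scaling factor of 100, 57.34 is stored as 5734)

For double, float, and half_float, -0.0 and +0.0 are different values. A term query for -0.0 does not match +0.0; likewise, a range query whose upper bound is -0.0 does not match +0.0, and one whose lower bound is +0.0 does not match -0.0.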
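The signed-zero behaviour comes from IEEE 754 itself, which keeps a sign bit on zero. The short Python sketch below only shows the representation; it makes no claim about how Elasticsearch encodes the values internally.

```python
import math
import struct

neg_zero, pos_zero = -0.0, 0.0

# Ordinary comparison treats the two zeros as equal ...
equal_by_value = (neg_zero == pos_zero)

# ... but their 64-bit patterns differ in the sign bit.
neg_bits = struct.pack(">d", neg_zero).hex()
pos_bits = struct.pack(">d", pos_zero).hex()

# copysign exposes the sign that == hides.
neg_sign = math.copysign(1.0, neg_zero)

print(equal_by_value, neg_bits, pos_bits, neg_sign)
```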

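When choosing among the numeric types above, keep two things in mind:

  1. Pick the smallest type that satisfies the requirement. If a field never exceeds 100, byte is enough. The oldest verified human age on record is 122 years, so short is more than enough for an age field. The shorter the field, the more efficient indexing and searching are.
  2. For floating-point values, prefer scaled_float with a scaling factor when the required precision is fixed.

Example:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "number_of_bytes": { "type": "integer" },
        "time_in_seconds": { "type": "float" },
        "price": {
          "type": "scaled_float",
          "scaling_factor": 100
        }
      }
    }
  }
}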
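The arithmetic behind a scaling factor can be sketched as follows. The helper names to_scaled and from_scaled are hypothetical, and Python's round is used as an approximation of the rounding Elasticsearch applies, so treat this as an illustration of the idea rather than the exact implementation.

```python
def to_scaled(value, scaling_factor):
    """Multiply by the scaling factor and round to the nearest integer."""
    return round(value * scaling_factor)

def from_scaled(stored, scaling_factor):
    """Recover the floating-point value from the stored integer."""
    return stored / scaling_factor

stored = to_scaled(57.34, 100)       # the price from the table above
restored = from_scaled(stored, 100)

# Anything finer than the scaling factor is lost when the value is stored.
lossy = from_scaled(to_scaled(57.349, 100), 100)

print(stored, restored, lossy)
```

With a scaling factor of 100 the third decimal place cannot survive, which is the trade-off accepted in exchange for storing an integer.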

1.5 Object type

JSON is hierarchical by nature, so a document can contain nested objects:

POST my_index/my_type/1
{
  "region": "US",
  "manager": {
    "age": 30,
    "name": {
      "first": "John",
      "last": "Smith"
    }
  }
}

The document as a whole is a JSON object; it contains a manager object, which in turn contains a name object. Internally, the document is indexed as a flat list of key-value pairs:

{
  "region":             "US",
  "manager.age":        30,
  "manager.name.first": "John",
  "manager.name.last":  "Smith"
}
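The flattening into dotted keys can be reproduced with a small recursive function. This is a sketch of the idea only; the name flatten is hypothetical and the function ignores arrays of objects.

```python
def flatten(doc, prefix=""):
    """Turn nested dicts into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, dict):
            # Recurse into inner objects, extending the dotted path.
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat

document = {
    "region": "US",
    "manager": {"age": 30, "name": {"first": "John", "last": "Smith"}},
}
flat_document = flatten(document)
print(flat_document)
```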

The mapping for the document structure above is:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "region": { "type": "keyword" },
        "manager": {
          "properties": {
            "age": { "type": "integer" },
            "name": {
              "properties": {
                "first": { "type": "text" },
                "last":  { "type": "text" }
              }
            }
          }
        }
      }
    }
  }
}

1.6 date type

JSON has no date type, so in Elasticsearch a date can be any of the following:

  1. A string containing a formatted date, e.g. "2015-01-01" or "2015/01/01 12:10:30".
  2. A long number of milliseconds (milliseconds-since-the-epoch).
  3. An integer number of seconds (seconds-since-the-epoch).

The date format can be customized. If none is specified, the default is:

"strict_date_optional_time||epoch_millis"

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "date": { "type": "date" }
      }
    }
  }
}

POST my_index/my_type/1
{ "date": "2015-01-01" }

POST my_index/my_type/2
{ "date": "2015-01-01T12:10:30Z" }

POST my_index/my_type/3
{ "date": 1420070400001 }

POST my_index/_search
{
  "sort": { "date": "asc" }
}

Listing the three documents:

{
  "took": 0,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": 1, "_source": { "date": "2015-01-01T12:10:30Z" } },
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": 1, "_source": { "date": "2015-01-01" } },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": 1, "_source": { "date": 1420070400001 } }
    ]
  }
}

Sorted result:

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      { "_index": "my_index", "_type": "my_type", "_id": "1", "_score": null, "_source": { "date": "2015-01-01" }, "sort": [ 1420070400000 ] },
      { "_index": "my_index", "_type": "my_type", "_id": "3", "_score": null, "_source": { "date": 1420070400001 }, "sort": [ 1420070400001 ] },
      { "_index": "my_index", "_type": "my_type", "_id": "2", "_score": null, "_source": { "date": "2015-01-01T12:10:30Z" }, "sort": [ 1420114230000 ] }
    ]
  }
}
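The sort values in the response are milliseconds since the epoch in UTC. The Python sketch below recomputes them for the three documents; to_epoch_millis is a hypothetical helper that only handles the two string shapes used in this example, not the full set of formats Elasticsearch accepts.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_epoch_millis(value):
    """Convert an ISO date string or an epoch-millis integer to epoch millis (UTC)."""
    if isinstance(value, int):
        return value
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # A date without a zone is interpreted as UTC in this sketch.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(milliseconds=1)

docs = {"1": "2015-01-01", "2": "2015-01-01T12:10:30Z", "3": 1420070400001}
sort_values = {doc_id: to_epoch_millis(v) for doc_id, v in docs.items()}
ascending = sorted(docs, key=lambda doc_id: sort_values[doc_id])

print(sort_values)
print(ascending)
```

Document 3 lands between the other two because its value is one millisecond after midnight on 2015-01-01.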

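1.7 Array type

Elasticsearch has no dedicated array type. By default any field can hold one or more values, but all values in an array must be of the same type. For example:

  1. Array of strings: [ "one", "two" ]
  2. Array of integers: [1,3]
  3. Nested array: [1,[2,3]], which is equivalent to [1,2,3]
  4. Array of objects: [ { "name": "Mary", "age": 12 }, { "name": "John", "age": 10 }]

Notes:

  • When a field is added dynamically, the type of the first value in the array determines the type of the whole array.
  • Arrays of mixed types are not supported, for example [1,"abc"].
  • An array may contain null values; an empty array [ ] is treated as a missing field.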
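The array rules above can be sketched in Python. Both helpers are hypothetical, and has_single_type is stricter than a real mapping, which may coerce some values (for example a numeric string into a numeric field); the sketch only mirrors the rules as stated here.

```python
def flatten_array(values):
    """Flatten nested lists, so [1, [2, 3]] becomes [1, 2, 3]."""
    flat = []
    for item in values:
        if isinstance(item, list):
            flat.extend(flatten_array(item))
        else:
            flat.append(item)
    return flat

def has_single_type(values):
    """True if every non-null value in the (flattened) array has the same type."""
    kinds = {type(item) for item in flatten_array(values) if item is not None}
    return len(kinds) <= 1

print(flatten_array([1, [2, 3]]))
print(has_single_type([1, "abc"]))
print(has_single_type([1, None, 3]))
```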

1.8 binary type

The binary type accepts a Base64-encoded string. By default the field is neither stored nor searchable.

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": { "type": "text" },
        "blob": { "type": "binary" }
      }
    }
  }
}

POST my_index/my_type/1
{
  "name": "Some binary blob",
  "blob": "U29tZSBiaW5hcnkgYmxvYg=="
}

Searching the blob field:

POST my_index/_search
{
  "query": {
    "match": { "blob": "test" }
  }
}

Response:

{
  "error": {
    "root_cause": [
      {
        "type": "query_shard_exception",
        "reason": "Binary fields do not support searching",
        "index_uuid": "fgA7UM5XSS-56JO4F4fYug",
        "index": "my_index"
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "my_index",
        "node": "3dQd1RRVTMiKdTckM68nPQ",
        "reason": {
          "type": "query_shard_exception",
          "reason": "Binary fields do not support searching",
          "index_uuid": "fgA7UM5XSS-56JO4F4fYug",
          "index": "my_index"
        }
      }
    ]
  },
  "status": 400
}

Base64 is an encoding, not encryption; any Base64 encoder/decoder can produce or read the blob value.
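The blob value used in the example can be produced and checked with Python's standard base64 module. The snippet assumes the original bytes are the ASCII text "Some binary blob", which is what the encoded string decodes to.

```python
import base64

raw = b"Some binary blob"

# Encode the bytes to the Base64 string that goes into the document.
encoded = base64.b64encode(raw).decode("ascii")

# Decoding returns the original bytes.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded)
```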

1.9 ip type

An ip field stores an IPv4 or IPv6 address. The most common way to query it is with CIDR notation.

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "ip_addr": { "type": "ip" }
      }
    }
  }
}

POST my_index/my_type/1
{
  "ip_addr": "192.168.1.1"
}

POST my_index/_search
{
  "query": {
    "term": { "ip_addr": "192.168.0.0/16" }
  }
}
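A term query on an ip field accepts either a single address or a CIDR block. The matching logic can be sketched with Python's ipaddress module; ip_matches is a hypothetical helper and says nothing about how Elasticsearch evaluates the query internally.

```python
import ipaddress

def ip_matches(stored, query):
    """Match a stored address against a single address or a CIDR block."""
    address = ipaddress.ip_address(stored)
    if "/" in query:
        # CIDR notation: the stored address must fall inside the network.
        return address in ipaddress.ip_network(query)
    # A bare address only matches exactly.
    return address == ipaddress.ip_address(query)

in_block = ip_matches("192.168.1.1", "192.168.0.0/16")
exact = ip_matches("192.168.1.1", "192.168.0.0")

print(in_block, exact)
```

The stored address 192.168.1.1 falls inside 192.168.0.0/16 but is not equal to the bare address 192.168.0.0, so only the CIDR form finds the document.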

1.10 range types

The following range types are supported:

Type             Range
integer_range    -2^31 to 2^31-1
float_range      32-bit IEEE 754
long_range       -2^63 to 2^63-1
double_range     64-bit IEEE 754
date_range       64-bit integer, milliseconds since the epoch

Typical use cases for range types are front-end forms such as a time-range picker or an age-range selector.

Example:

PUT range_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "expected_attendees": { "type": "integer_range" },
        "time_frame": {
          "type": "date_range",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}

POST range_index/my_type/1
{
  "expected_attendees": {
    "gte": 10,
    "lte": 20
  },
  "time_frame": {
    "gte": "2015-10-31 12:00:00",
    "lte": "2015-11-01"
  }
}

The code above creates the range_index index and indexes a document whose expected_attendees range is 10 to 20 and whose time_frame runs from 2015-10-31 12:00:00 to 2015-11-01.

Query:

POST range_index/_search
{
  "query": {
    "range": {
      "time_frame": {
        "gte": "2015-08-01",
        "lte": "2015-12-01",
        "relation": "within"
      }
    }
  }
}

Result:

{
  "took": 2,
  "timed_out": false,
  "_shards": { "total": 5, "successful": 5, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "range_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "expected_attendees": { "gte": 10, "lte": 20 },
          "time_frame": { "gte": "2015-10-31 12:00:00", "lte": "2015-11-01" }
        }
      }
    ]
  }
}
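The relation parameter decides how the stored range is compared with the query range. A minimal Python sketch of the three relations (within, contains, intersects) is shown below; relation_holds is a hypothetical helper that treats both ranges as closed intervals and ignores the date rounding Elasticsearch applies to bounds such as "2015-11-01".

```python
from datetime import datetime

def relation_holds(doc_range, query_range, relation="intersects"):
    """Compare a stored closed interval with a query interval."""
    doc_low, doc_high = doc_range
    query_low, query_high = query_range
    if relation == "within":
        # The stored range lies entirely inside the query range.
        return query_low <= doc_low and doc_high <= query_high
    if relation == "contains":
        # The stored range entirely covers the query range.
        return doc_low <= query_low and query_high <= doc_high
    if relation == "intersects":
        # The two ranges overlap somewhere.
        return doc_low <= query_high and query_low <= doc_high
    raise ValueError("unknown relation: " + relation)

time_frame = (datetime(2015, 10, 31, 12, 0, 0), datetime(2015, 11, 1))
query = (datetime(2015, 8, 1), datetime(2015, 12, 1))

is_within = relation_holds(time_frame, query, "within")
is_contained = relation_holds(time_frame, query, "contains")
overlaps = relation_holds(time_frame, query, "intersects")
print(is_within, is_contained, overlaps)
```

The stored time_frame sits inside August to December 2015, so the within query returns the document, whereas a contains query with the same bounds would not.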

1.11 nested type

The nested type is a specialized version of the object type that lets an array of objects be indexed and queried independently of each other. Using the plain object type can cause problems. Consider the document my_index/my_type/1 with the following structure:

POST my_index/my_type/1
{
  "group": "fans",
  "user": [
    { "first": "John",  "last": "Smith" },
    { "first": "Alice", "last": "White" }
  ]
}

The user field is dynamically added as an object type.

Internally the document is converted into the following flat form:

{
  "group":      "fans",
  "user.first": [ "alice", "john" ],
  "user.last":  [ "smith", "white" ]
}

user.first and user.last are flattened into multi-value fields, so the association between Alice and White is lost. The document therefore incorrectly matches the following query (it is returned even though no Alice Smith exists):

POST my_index/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "user.first": "Alice" }},
        { "match": { "user.last":  "Smith" }}
      ]
    }
  }
}

The nested field type addresses this shortcoming of the object type:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "user": { "type": "nested" }
      }
    }
  }
}

POST my_index/my_type/1
{
  "group": "fans",
  "user": [
    { "first": "John",  "last": "Smith" },
    { "first": "Alice", "last": "White" }
  ]
}

POST my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "Smith" }}
          ]
        }
      }
    }
  }
}

POST my_index/_search
{
  "query": {
    "nested": {
      "path": "user",
      "query": {
        "bool": {
          "must": [
            { "match": { "user.first": "Alice" }},
            { "match": { "user.last":  "White" }}
          ]
        }
      },
      "inner_hits": {
        "highlight": {
          "fields": {
            "user.first": {}
          }
        }
      }
    }
  }
}
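Why the object mapping produces a false match while the nested mapping does not can be shown with a small Python model. Both functions are hypothetical simplifications: match_flattened checks each field against the pooled values of all users, and match_nested checks both fields against one user at a time.

```python
users = [
    {"first": "John", "last": "Smith"},
    {"first": "Alice", "last": "White"},
]

def match_flattened(users, first, last):
    """Object-style matching: values from all users are pooled per field."""
    firsts = {user["first"].lower() for user in users}
    lasts = {user["last"].lower() for user in users}
    return first.lower() in firsts and last.lower() in lasts

def match_nested(users, first, last):
    """Nested-style matching: both conditions must hold for the same user."""
    return any(
        user["first"].lower() == first.lower() and user["last"].lower() == last.lower()
        for user in users
    )

false_positive = match_flattened(users, "Alice", "Smith")
nested_miss = match_nested(users, "Alice", "Smith")
nested_hit = match_nested(users, "Alice", "White")
print(false_positive, nested_miss, nested_hit)
```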

1.12 token_count type

A token_count field accepts a string, analyzes it, and indexes the number of tokens it contains:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "name": {
          "type": "text",
          "fields": {
            "length": {
              "type":     "token_count",
              "analyzer": "standard"
            }
          }
        }
      }
    }
  }
}

POST my_index/my_type/1
{ "name": "John Smith" }

POST my_index/my_type/2
{ "name": "Rachel Alice Williams" }

POST my_index/_search
{
  "query": {
    "term": { "name.length": 3 }
  }
}
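What the term query on name.length selects can be approximated in Python. The sketch assumes whitespace splitting is close enough to the standard analyzer for these two names; token_count here is a hypothetical helper, not the Elasticsearch implementation.

```python
def token_count(value):
    """Count whitespace-separated tokens in a string."""
    return len(value.split())

names = {"1": "John Smith", "2": "Rachel Alice Williams"}
counts = {doc_id: token_count(name) for doc_id, name in names.items()}

# Equivalent of the term query name.length = 3.
matching_ids = [doc_id for doc_id, count in counts.items() if count == 3]

print(counts)
print(matching_ids)
```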

1.13 geo_point type

The geo_point type stores the latitude and longitude of a location:

PUT my_index
{
  "mappings": {
    "my_type": {
      "properties": {
        "location": { "type": "geo_point" }
      }
    }
  }
}

POST my_index/my_type/1
{
  "text": "Geo-point as an object",
  "location": {
    "lat": 41.12,
    "lon": -71.34
  }
}

POST my_index/my_type/2
{
  "text": "Geo-point as a string",
  "location": "41.12,-71.34"
}

POST my_index/my_type/3
{
  "text": "Geo-point as a geohash",
  "location": "drm3btev3e86"
}

POST my_index/my_type/4
{
  "text": "Geo-point as an array",
  "location": [ -71.34, 41.12 ]
}

POST my_index/_search
{
  "query": {
    "geo_bounding_box": {
      "location": {
        "top_left": {
          "lat": 42,
          "lon": -72
        },
        "bottom_right": {
          "lat": 40,
          "lon": -74
        }
      }
    }
  }
}
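The input formats differ in the order of latitude and longitude: the object and the string put latitude first, while the array puts longitude first. The hypothetical parse_geo_point helper below normalizes three of the four formats to a (lat, lon) tuple; geohash decoding is deliberately left out of this sketch.

```python
def parse_geo_point(value):
    """Normalize an object, "lat,lon" string, or [lon, lat] array to (lat, lon)."""
    if isinstance(value, dict):
        return (float(value["lat"]), float(value["lon"]))
    if isinstance(value, str):
        if "," not in value:
            # A string without a comma would be a geohash, not handled here.
            raise ValueError("geohash input is not handled in this sketch")
        lat, lon = value.split(",")
        return (float(lat), float(lon))
    if isinstance(value, (list, tuple)):
        lon, lat = value  # arrays use [lon, lat] order
        return (float(lat), float(lon))
    raise TypeError("unsupported geo_point value")

as_object = parse_geo_point({"lat": 41.12, "lon": -71.34})
as_string = parse_geo_point("41.12,-71.34")
as_array = parse_geo_point([-71.34, 41.12])
print(as_object, as_string, as_array)
```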


Reposted from: https://my.oschina.net/LucasZhu/blog/1491253
