What’s new in the Elastic Stack - 7.x Edition

A presentation at Elastic Brown Bag Lunch in April 2019 in Munich, Germany by Alexander Reelsen

Slide 1

Slide 1

What’s new in the Elastic Stack? Alexander Reelsen alex@elastic.co @spinscale

Slide 2

Slide 2

Agenda ‣ What’s new in 6.x? ‣ What’s new in 7.x? ‣ Q&A

Slide 3

Slide 3

What’s new in 6.x?

Slide 4

Slide 4

Elasticsearch 6.x 6.0 Zero downtime upgrades Cross cluster search Sequence id based recoveries Index sorting range based datatypes 6.1 Index splitting 6.2 Rank evaluation API 6.3 Rollup Java 10 support 6.4 Reloadable secure settings Field Aliases Korean analyzer 6.5 G1GC support, Java 11 Minimal snapshots (50% less) 6.6 Frozen indices BKD backed geoshapes 6.7 CCR SQL ILM Upgrade Assistant 6.8 ECK (Elastic for Kubernetes) Move security features into basic

Slide 5

Slide 5

Elasticsearch 6.7 - CCR Cross Cluster Replication + UI Replicate data across data centers Leader index requires soft deletes to be set Follower index configures cluster and leader index Follower index can also be a pattern

Slide 6

Slide 6

Elasticsearch 6.7 - SQL REST API CLI JDBC ODBC

Slide 7

Slide 7

Elasticsearch 6.7 - ILM PUT _ilm/policy/full_policy { “policy”: { “phases”: { “hot”: { “actions”: { “rollover”: { “max_age”: “7d”, “max_size”: “50G” } } }, “warm”: { “min_age”: “30d”, “actions”: { “forcemerge”: { “max_num_segments”: 1 }, “shrink”: { “number_of_shards”: 1 }, “allocate”: { “number_of_replicas”: 2 } } }, } } } “cold”: { “min_age”: “60d”, “actions”: { “allocate”: { “require”: { “type”: “cold” } } } }, “delete”: { “min_age”: “90d”, “actions”: { “delete”: {} } }

Slide 8

Slide 8

Kibana 6.7 Maps Uptime Canvas Infrastructure is GA Logs is GA Localization Upgrade Assistant

Slide 9

Slide 9

Kibana 6.7 - Maps

Slide 10

Slide 10

Kibana 6.7 - Uptime

Slide 11

Slide 11

Kibana 6.7 - Infrastructure UI

Slide 12

Slide 12

Kibana 6.7 - Logs UI

Slide 13

Slide 13

Elasticsearch 6.8 - Security & ECK Native & file realm now free TLS now free Elastic Cloud on Kubernetes (Operator)

Slide 14

Slide 14

What’s new in 7.0?

Slide 15

Slide 15

Kibana 7.0 Elastic UI Library KQL by default (+ autocomplete) Responsive dashboards

Slide 16

Slide 16

Kibana 7.0

Slide 17

Slide 17

Kibana 7.0

Slide 18

Slide 18

Ingest 7.0 Beats: ECS/ILM integration Filebeat: zeek, santa, netflow support, encodings Auditbeat: system module Metricbeat Elasticsearch, Logstash & Kibana modules NATS, MSSQL, EC2, CouchDB Logstash Native Java Plugins Java execution engine on by default

Slide 19

Slide 19

Stack 7.0 ECS ES-Hadoop Kerberos Integration Java 8 required Cascading support removed Clients Rewritten JavaScript client New Go Client Java: High Level REST Client

Slide 20

Slide 20

Elasticsearch 7.0 Mapping types Searches date_nanos faster top-k retrieval rank_feature adaptive replica selection enabled by rank_features default dense_vector No refresh on idle shards (faster indexing) sparse_vector Queries intervals query script_score query (supercedes function_score) rank_feature query Others Rewritten cluster coordination Lucene 8 High Level REST client Docker part of the build Single shard index by default Rewritten memory circuit breaker Type is optional now TLS 1.3 Ships with OpenJDK

Slide 21

Slide 21

Elasticsearch 7.0 - Rewritten cluster coordination Gone: discovery.zen.minimum_master_nodes Sub-second master election Simplifying growing/shrinking of cluster Cluster bootstrapping/Voting configuration Rolling upgrades from 6 to 7 work Formal verification via TLA+

Slide 22

Slide 22

Elasticsearch 7.0 - Faster top-k retrieval While querying, exclude documents that cannot make it into the top hits Search: Elasticsearch OR Kibana Term 1: Elasticsearch (max score 5.0) Term 2: Kibana (max score 3.0) If first k results all have a score > 3.0, then documents only containing Kibana can be ignored Number of potential candidates is reduced while running

Slide 23

Slide 23

Elasticsearch 7.0 - Faster top-k retrieval Scores may no longer be negative Total hits are not counted by default

Slide 24

Slide 24

Elasticsearch - Adaptive Replica Selection Problem: Coordinating node round robins requests between data nodes Underperforming node harms the whole cluster Adaptive replica selection Response time of previous requests Search execution time of the data node Queue size of the search threadpool on the data node

Slide 25

Slide 25

Elasticsearch - Rank feature New rank_feature type New rank_feature query Index numbers than can be used to boost queries Modifies the scoring formula to in-/decrease score based on the value of the document Query functions: saturation, logarithm, sigmoid

Slide 26

Slide 26

Elasticsearch - Rank feature PUT test { “mappings”: { “properties”: { “pagerank”: { “type”: “rank_feature” }, “url_length”: { “type”: “rank_feature”, “positive_score_impact”: false } } } }

Slide 27

Slide 27

Elasticsearch - Rank feature PUT test/_doc/1 { “url”: “http://en.wikipedia.org/wiki/2016_Summer_Olympics”, “content”: “Rio 2016”, “pagerank”: 50.3, “url_length”: 42, } PUT test/_doc/2 { “url”: “http://en.wikipedia.org/wiki/2016_Brazilian_Grand_Prix”, “content”: “Formula One motor race held on 13 November 2016 at the Autódromo José Carlos Pace in São Paulo, Brazil”, “pagerank”: 50.3, “url_length”: 47, } PUT test/doc/3 { “url”: “http://en.wikipedia.org/wiki/Deadpool(film)”, “content”: “Deadpool is a 2016 American superhero film”, “pagerank”: 50.3, “url_length”: 37, }

Slide 28

Slide 28

Elasticsearch - Rank feature GET test/_search { “query”: { “bool”: { “must”: [ { “match”: { “content”: “2016” } } ], “should”: [ { “rank_feature”: { “field”: “pagerank” } }, { “rank_feature”: { “field”: “url_length”, “boost”: 0.1 } } ] } } }

Slide 29

Slide 29

Elasticsearch - Rank features New rank_features type Key/Value pairs instead of single values

Slide 30

Slide 30

Elasticsearch - Rank feature PUT test/_doc/1 { “url”: “http://en.wikipedia.org/wiki/2016_Summer_Olympics”, “content”: “Rio 2016”, “pagerank”: 50.3, “url_length”: 42, “topics”: { “sports”: 50, “brazil”: 30 } }

Slide 31

Slide 31

Elasticsearch - Rank feature GET test/_search { “query”: { “bool”: { “must”: [ { “match”: { “content”: “2016” } } ], “should”: [ { “rank_feature”: { “field”: “pagerank” } }, { “rank_feature”: { “field”: “url_length”, “boost”: 0.1 } }, { “rank_feature”: { “field”: “topics.sports”, “boost”: 0.4 } } ] } } }

Slide 32

Slide 32

Elasticsearch - Rank feature GET test/_search { “query”: { “rank_feature”: { “field”: “pagerank”, “saturation”: { “pivot”: 8 } } } }

Slide 33

Slide 33

Elasticsearch - Rank feature Limitations Field values must be single-valued and positive rank_feature fields do not support querying, sorting or aggregating Field values are not exact (relative error of about 0.4%) Uses top-k faster retrieval mechanism for speed (hit count!)

Slide 34

Slide 34

Elasticsearch - script_score query replaces the function_score query full painless scripting predefined functions: saturation, sigmoid, randomScore, decay(Numeric|Date|Geo)(Linear|Exp|Gauss)

Slide 35

Slide 35

Elasticsearch - Nanosecond support new datatype: date_nanos stores nanoseconds since the epoch (reduced range!) internally: moved from Joda-Time to java time Aggregations: millisecond resolution! Beware: Upgrade path from 6.x!

Slide 36

Slide 36

What’s new in 7.1?

Slide 37

Slide 37

Elasticsearch 7.1 security features moved into basic ECK (Elastic Cloud on Kubernetes/K8s)

Slide 38

Slide 38

What’s new in 7.2?

Slide 39

Slide 39

Elasticsearch 7.2 search_as_you_type mapping distance_feature query Replication of closed indices dense/sparse_vector datatype

Slide 40

Slide 40

Elasticsearch - vector datatypes dense_vector: stores dense vectors of float values, supplied as an array sparse_vector: stores sparse vectors of float values, supplied as map Use-Case: User centric recommendation based on past decisions

Slide 41

Slide 41

Elasticsearch - vector datatypes PUT my_index { “mappings”: { “properties”: { “sparse”: { “type”: “sparse_vector” }, “dense”: { “type”: “dense_vector” }, “my_text” : { “type” : “keyword” } } } }

Slide 42

Slide 42

Elasticsearch - vector datatypes PUT my_index/_doc/1 { “my_text” : “text1”, “dense” : [0.5, 10, 6] “sparse” : {“1”: 0.5, “5”: -0.5, “100”: 1} } PUT my_index/_doc/2 { “my_text” : “text2”, “dense” : [-0.5, 10, 10, 4] “sparse” : {“103”: 0.5, “4”: -0.5, } “5”: 1, “11” : 1.2}

Slide 43

Slide 43

Elasticsearch - script_score query sparse/dense functions: cosineSimilarity(Sparse) dotProduct(Sparse)

Slide 44

Slide 44

Discussion … ask all the things!

Slide 45

Slide 45

Links Elasticsearch https://www.elastic.co/blog/easier-relevance-tuning-elasticsearch-7-0 https://www.elastic.co/blog/faster-retrieval-of-top-hits-in-elasticsearch-with-block-max-wand https://www.elastic.co/blog/creating-frozen-indices-with-the-elasticsearch-freeze-index-api https://www.elastic.co/blog/follow-the-leader-an-introduction-to-cross-cluster-replication-in-elasticsearch https://www.elastic.co/blog/moving-from-types-to-typeless-apis-in-elasticsearch-7-0 https://www.elastic.co/blog/improving-node-resiliency-with-the-real-memory-circuit-breaker https://www.elastic.co/blog/a-new-era-for-cluster-coordination-in-elasticsearch https://www.elastic.co/elasticon/conf/2018/sf/reliable-by-design-applying-formal-methods-to-distributedsystems https://github.com/elastic/elasticsearch-formal-models C3: https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-suresh.pdf Beats https://www.elastic.co/blog/introducing-auditbeat-system-module

Slide 46

Slide 46

Links https://www.elastic.co/blog/security-forelasticsearch-is-now-free https://www.elastic.co/blog/introducing-elasticcloud-on-kubernetes-the-elasticsearch-operator-andbeyond