Getting started with Elastic Stack

A presentation at Kochi Elastic Meetup in July 2018 in Kochi, Kerala, India by Aravind Putrevu

Slide 1

Getting Started with Elastic Stack
Aravind Putrevu, Developer | Evangelist
@aravindputrevu | aravindputrevu.in

Slide 2

Elastic Stack: 100% open source

Slide 3

X-Pack: Extensions for the Elastic Stack
Single install. Subscription pricing.
Security, Alerting, Monitoring, Reporting, Graph, Machine Learning

Slide 4

Enterprise Customers in Every Industry
Tech, Finance, Telco, Consumer

Slide 5

Massive Startup Adoption

Slide 6

Search and analytics: it all started here
More than 60% of our customers have a search or analytics use case.

Slide 7


Slide 8


Slide 9


Slide 10

Logs, logs, logs: many devices, many systems
More than 40% of our customers use our products for operational log analysis.

Slide 11

"We collect more than 1.2 TB of logs every day from our infrastructure, web servers, and applications."

Slide 12

75% of our customers use our products for multiple use cases:
Search, Security, Custom Apps, Metrics, Operational Analytics, Log Analytics

Slide 13

"1,000+ developers use the Elastic Stack for use cases from trade tracking to creating new HR and compliance apps."

Slide 14

"We send from Mars more than 30K messages and 100K documents 4x a day for operational, telemetry, anomaly resolution, and log analysis."

Slide 15

DB-Engines Ranking of Search Engines
https://db-engines.com/en/ranking/search+engine

Slide 16

Elastic: Product Portfolio
Elastic Stack:
• Kibana: User Interface
• Elasticsearch: Store, Index, & Analyze
• Logstash, Beats: Ingest
X-Pack: Security, Alerting, Monitoring, Reporting, Graph, Machine Learning
Elastic Cloud

Slide 17

Elasticsearch: Heart of the Elastic Stack
• Distributed, Scalable
• High-availability
• Multi-tenancy
• Developer Friendly
• Real-time, Full-text Search
• Aggregations

Slide 18

Kibana: Window into the Elastic Stack
• Visualize and analyze
• Geospatial
• Customize and Share Reports
• Graph Exploration
• UX to secure and manage the Elastic Stack
• Build Custom Apps

Slide 19


Slide 20

Slide 21

Beats: Window into the Elastic Stack
• Ship data from the source
• Ship and centralize in Elasticsearch
• Ship to Logstash for transformation and parsing
• Ship to Elastic Cloud
• Libbeat: API framework to build custom Beats
• 30+ community Beats

Slide 22

BEAT         DATA
FILEBEAT     Log Files
METRICBEAT   Metrics
PACKETBEAT   Network Data
WINLOGBEAT   Windows Events
HEARTBEAT    Uptime Monitoring

More than 30 community Beats and growing: apachebeat, dockbeat, httpbeat, mysqlbeat, nginxbeat, redisbeat, twitterbeat, and more.

Slide 23

Logstash: Data processing pipeline
• Ingest data of all shapes, sizes, and sources
• Parse and dynamically transform data
• Transport data to any output
• Secure and encrypt data inputs
• Build your own pipeline
• More than 200 plugins

Slide 24

Popular Data Sources
MQs, AWS, Web Apps, IoT, DBs
Analysis, Archiving, Monitoring, Alerting

Slide 25

System Architecture (diagram)
• Data sources: Log Files, Metrics, Wire Data, your{beat}, Datastore, Web APIs, Social, Sensors
• Shipping and queuing: Beats, Messaging Queue (Kafka, Redis)
• Logstash: Nodes (X)
• Elasticsearch: Master Nodes (3), Ingest Nodes (X), Data Nodes – Hot (X), Data Nodes – Warm (X), with X-Pack
• Kibana: Instances (X), with X-Pack
• Integrations: Custom UI, Authentication (LDAP, AD, SSO), Notification, ES-Hadoop, Hadoop Ecosystem

Slide 26

X-Pack: Extensions for the Elastic Stack
Security, Alerting, Monitoring, Graph, Reporting, Machine Learning

Slide 27

Graph (X-Pack)

A NEW WAY TO EXPLORE DATA
• Uses relevance capabilities of Elasticsearch
• Discover linkages and connections
• Leverage API and UI-driven tool

EXTEND TO NEW USE CASES
• Fraud discovery
• Recommendations
• Cyber security
• Behavioral analyses

Slide 28


Slide 29

Machine Learning (X-Pack)

UNSUPERVISED MACHINE LEARNING
• Automatically detect anomalies
• Advanced correlation and categorization
• Identify root cause(s)
• Expose early warning signs

ENABLE NEW USE CASES
• Analyze time series data
• Detect anomalies
• Expand security, IT Ops, fraud, finance, and many more use cases

Slide 30


Slide 31

Elastic Cloud
• Hosted Elasticsearch & Kibana
• Includes X-Pack features
• Free 14-day trial

Slide 32

Elastic Cloud Enterprise
• Provision and manage multiple Elastic Stack environments
• Expose logging as a service to your entire organization

Slide 33

Slide 34

Elasticsearch: Heart of the Elastic Stack
• Distributed, Scalable
• High-availability
• Multi-tenancy
• Developer Friendly
• Real-time, Full-text Search
• Aggregations

Slide 35

Elasticsearch is…
an open source, distributed, scalable, highly available, document-oriented, RESTful, full text search engine with real-time search and analytics capabilities.

Apache 2.0 License: https://www.apache.org/licenses/LICENSE-2.0

Slide 36

Apache Lucene
• Created by: Doug Cutting
• Written in: Java
• Used by: Apache Solr, Elasticsearch

Slide 37

Elasticsearch is…
an open source, distributed, scalable, highly available, document-oriented, RESTful, full text search engine with real-time search and analytics capabilities.

Slide 38

Elasticsearch Cluster
• Data is partitioned into shards.
• Each shard is a standalone Lucene index.
(Diagram: a node running the Elasticsearch process, holding shards 0 to 4)
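
As a rough illustration of how a document could be assigned to one of the shards on this slide, the sketch below hashes the document id and takes the result modulo the number of primary shards. Elasticsearch itself hashes the routing value with Murmur3; `zlib.crc32` is only a stand-in here, and `pick_shard` is a hypothetical helper name, not part of any Elasticsearch API.

```python
import zlib

NUMBER_OF_SHARDS = 5  # the five shards (0 to 4) drawn on the slide


def pick_shard(doc_id: str, number_of_shards: int = NUMBER_OF_SHARDS) -> int:
    """Map a document id to a shard number in [0, number_of_shards)."""
    # Assumption: crc32 stands in for the Murmur3 hash Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3"]:
        print(doc_id, "-> shard", pick_shard(doc_id))
```

Because the shard is derived from the hash modulo the shard count, the same id always lands on the same shard, which is one reason the number of primary shards is fixed when the index is created.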

Slide 39

Elasticsearch Cluster
Shards are distributed across the nodes of a cluster.

Slide 40

Elasticsearch Cluster
Shards are distributed across the nodes of a cluster.

Slide 41

Elasticsearch Cluster
Documents are replicated between a primary and its replicas.

Slide 42

Elasticsearch Cluster
If a node fails, all of the data is still available on the other nodes.

Slide 43

Elasticsearch Cluster
Elasticsearch will promote a replica to a primary and start replicating the unassigned replicas.
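
The promotion step can be sketched with a toy cluster model. This is only an illustration under simplifying assumptions: the node names, the `fail_node` helper, and the data layout are all hypothetical, and the sketch stops after promotion (it does not re-create the missing replicas on other nodes, which Elasticsearch would do next).

```python
from typing import Dict, List, Tuple

# node name -> list of (shard_number, role) copies; layout is illustrative only
Cluster = Dict[str, List[Tuple[int, str]]]


def fail_node(cluster: Cluster, dead: str) -> Cluster:
    """Drop a node and promote a surviving replica for each primary it held."""
    lost = cluster[dead]
    survivors = {n: list(copies) for n, copies in cluster.items() if n != dead}
    for shard, role in lost:
        if role != "primary":
            continue  # a lost replica needs no promotion
        for copies in survivors.values():
            if (shard, "replica") in copies:
                # Promote the first surviving replica of this shard.
                copies[copies.index((shard, "replica"))] = (shard, "primary")
                break
    return survivors


if __name__ == "__main__":
    cluster = {
        "node1": [(0, "primary"), (1, "replica")],
        "node2": [(1, "primary"), (0, "replica")],
    }
    print(fail_node(cluster, "node1"))
```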

Slide 44

Elasticsearch Cluster
(Diagram: the remaining nodes after primaries and replicas have been reassigned)

Slide 45

Cluster
• Set using cluster.name
• Defaults to "elasticsearch"

config/elasticsearch.yml:

    cluster.name: elasticsearch

Command line:

    bin/elasticsearch -E cluster.name=elasticsearch

Slide 46

Node
• Every node should have a unique node.name.
• Nodes with the same cluster.name join the same cluster.

config/elasticsearch.yml:

    node.name: "Node1"

Command line:

    bin/elasticsearch -E node.name=Node1

Slide 47

Shard & Replica

    curl -XPUT "http://localhost:9200/books" -H 'Content-Type: application/json' -d'
    {
      "settings": {
        "number_of_shards": 5,
        "number_of_replicas": 1
      }
    }'

    curl -XPUT "http://localhost:9200/books/_settings" -H 'Content-Type: application/json' -d'
    {
      "number_of_replicas": 0
    }'

Slide 48

Elasticsearch is…
an open source, distributed, scalable, highly available, document-oriented, RESTful, full text search engine with real-time search and analytics capabilities.

Source: http://json.org

Slide 49

Elasticsearch is…
an open source, distributed, scalable, highly available, document-oriented, RESTful, full text search engine with real-time search and analytics capabilities.

    $ curl -XPUT http://localhost:9200/books/book/1 -d '
    {
      "title" : "Elasticsearch Guide",
      "author" : "Kim",
      "date" : "2014-05-01",
      "pages" : 250
    }' -H 'Content-Type: application/json'

    {"_index":"books","_type":"book","_id":"1","_version":1,"created":true}

Parts of the request: method (PUT), host (localhost), port (9200), index (books), type (book), document id (1).

Slide 50

Indexing a Document
Use the PUT command to index a document.

    curl -XPUT 'http://localhost:9200/books/book/1' -d '
    {
      "title": "Romeo and Juliet",
      "author": "William Shakespeare",
      "category": "Tragedies",
      "written": "1562-12-01T20:40:00",
      "pages": 125
    }' -H 'Content-Type: application/json'

    curl -XPUT 'http://localhost:9200/books/book/2' -d '
    {
      "title": "Hamlet",
      "author": "William Shakespeare",
      "category": "Tragedies",
      "written": "1599-06-01T12:34:00",
      "pages": 172
    }' -H 'Content-Type: application/json'

    curl -XPUT 'http://localhost:9200/books/book/3' -d '
    {
      "title": "The Prince and the Pauper",
      "author": "Mark Twain",
      "category": "Children book",
      "written": "1881-08-01T10:34:00",
      "pages": 79
    }' -H 'Content-Type: application/json'

Slide 51

Retrieving a Document
Use GET to retrieve an indexed document.

    curl -XGET 'http://localhost:9200/books/book/1?pretty' -H 'Content-Type: application/json'

    {
      "_index" : "books",
      "_type" : "book",
      "_id" : "1",
      "_version" : 3,
      "found" : true,
      "_source" : {
        "title" : "Romeo and Juliet",
        "author" : "William Shakespeare",
        "category" : "Tragedies",
        "written" : "1562-12-01T20:40:00",
        "pages" : 125
      }
    }

Slide 52

Deleting a Document
Use the DELETE command to delete a single document or a whole index.

Delete one document:

    curl -XDELETE localhost:9200/books/book/1 -H 'Content-Type: application/json'

Delete the index:

    curl -XDELETE localhost:9200/books -H 'Content-Type: application/json'

Slide 53

Bulk API
Use the _bulk API to run multiple commands in one request. Each action line and each document goes on its own line.

    curl -XPOST "http://localhost:9200/books/book/_bulk" -d '
    {"index":{"_id":"1"}}
    {"title":"Romeo and Juliet","author":"William Shakespeare","category":"Tragedies","written":"1562-12-01T20:40:00","pages":125}
    {"index":{"_id":"2"}}
    {"title":"Hamlet","author":"William Shakespeare","category":"Tragedies","written":"1599-06-01T12:34:00","pages":172}
    {"index":{"_id":"3"}}
    {"title":"The Prince and the Pauper","author":"Mark Twain","category":"Children book","written":"1881-08-01T10:34:00","pages":79}
    ' -H 'Content-Type: application/json'
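
The body sent to _bulk is newline-delimited JSON: one action line followed by one document line, with a trailing newline at the end. A minimal sketch of assembling such a body in Python follows; `build_bulk_body` is a hypothetical helper name, and the sketch only builds the string without sending anything to a cluster.

```python
import json
from typing import Dict, Iterable, Tuple


def build_bulk_body(docs: Iterable[Tuple[str, Dict]]) -> str:
    """Return a newline-delimited body of index actions for the _bulk API."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))  # action line
        lines.append(json.dumps(source))                      # document line
    # The bulk body has to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    books = [
        ("2", {"title": "Hamlet", "author": "William Shakespeare", "pages": 172}),
        ("3", {"title": "The Prince and the Pauper", "author": "Mark Twain", "pages": 79}),
    ]
    print(build_bulk_body(books), end="")
```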

Slide 54

Search documents
Use the _search API to search an index.

    curl -XGET "http://localhost:9200/books/_search?pretty=true"

    {
      … },
      "hits" : {
        "total" : 3,
        "max_score" : 1.0,
        "hits" : [
          {
            …
            "_source": {
              "title": "Romeo and Juliet",
              "author": "William Shakespeare",
              "category": "Tragedies",
              "written": "1562-12-01T20:40:00",
              "pages" : 125
            }
          },
          …

Slide 55

URI Search

    curl -XGET "http://localhost:9200/books/_search?pretty=true"
    curl -XGET "http://localhost:9200/books/_search?q=author:william&pretty=true"
    curl -XGET "http://localhost:9200/books/_search?q=william&df=author&pretty=true"
    curl -XGET "http://localhost:9200/books/_search?q=william%20AND%20romeo&pretty=true"
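
Query strings such as `william AND romeo` have to be percent-encoded before they go into the URL. A small sketch using the Python standard library shows one way to build such URLs; `search_url` is a hypothetical helper, and the host and port are the localhost defaults used on the slides.

```python
from urllib.parse import quote, urlencode


def search_url(index: str, **params: str) -> str:
    """Build a URI search URL with percent-encoded query parameters."""
    # quote_via=quote encodes spaces as %20 (the default would use "+").
    query = urlencode(params, quote_via=quote)
    return f"http://localhost:9200/{index}/_search?{query}"


if __name__ == "__main__":
    print(search_url("books", q="william AND romeo", pretty="true"))
    print(search_url("books", q="william", df="author", pretty="true"))
```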

Slide 56

Request Body Search

    curl -XGET "http://localhost:9200/books/_search?pretty=true" -d'
    {
      "query": {
        "match": {
          "author": "william"
        }
      }
    }' -H 'Content-Type: application/json'

Slide 57

Elasticsearch is…
an open source, distributed, scalable, highly available, document-oriented, RESTful, full text search engine with real-time search and analytics capabilities.

Slide 58

An RDBMS creates a table

DOC  TEXT
1    The quick brown fox jumps over the lazy dog
2    Fast jumping rabbits

Slide 59

Elasticsearch creates an inverted index

TOKEN (TERM)  DOC
Fast          2
The           1
brown         1
dog           1
fox           1
jumping       2
jumps         1
lazy          1
over          1
quick         1
rabbits       2
the           1
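
The table above can be reproduced with a few lines of Python. This is a minimal sketch of the idea only: it splits on whitespace and applies no analysis, so "The" and "the" are still separate terms. The function name `build_inverted_index` is an assumption made for this example.

```python
from collections import defaultdict
from typing import Dict, List

DOCS = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "Fast jumping rabbits",
}


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    """Map every whitespace-separated token to the ids of docs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


if __name__ == "__main__":
    for token, doc_ids in sorted(build_inverted_index(DOCS).items()):
        print(f"{token:10} {doc_ids}")
```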

Slide 60

A real inverted index looks like this
Elasticsearch runs a text analysis process at indexing time.

TOKEN (TERM)  DOC
brown         1
dog           1
fast          1, 2
fox           1
jump          1, 2
lazi          1
over          1
quick         1, 2
rabbit        2

Slide 61

Tokenizing
Mostly uses the Whitespace Tokenizer.

TEXT
The quick brown fox jumps over the lazy dog
Fast jumping rabbits

TOKEN (TERM)
Fast, The, brown, dog, fox, jumping, jumps, lazy, over, quick, rabbits, the

Slide 62

Token Filtering – lowercase

TOKEN (TERM)  DOC
Fast → fast   2
The → the     1
brown         1
dog           1
fox           1
jumping       2
jumps         1
lazy          1
over          1
quick         1
rabbits       2
the           1

Slide 63

Token Filtering – ascii folding

TOKEN (TERM)  DOC
brown         1
dog           1
fast          2
fox           1
jumping       2
jumps         1
lazy          1
over          1
quick         1
rabbits       2
the           1
the           1

Slide 64

Token Filtering – remove stopwords
Stopwords: a, an, are, at, be, but, by, do, for, i, no, the, to …

TOKEN (TERM)  DOC
brown         1
dog           1
fast          2
fox           1
jumping       2
jumps         1
lazy          1
over          1
quick         1
rabbits       2
the           1   (removed)
the           1   (removed)

Slide 65

Stemming
Remove ~s, ~ing. Mostly uses the Snowball Token Filter.

TOKEN (TERM)       DOC
brown              1
dog                1
fast               2
fox                1
jumping → jump     2
jumps → jump       1
lazy → lazi        1
over               1
quick              1
rabbits → rabbit   2

Slide 66

Inverted Index
"jumping" and "jumps" both became "jump", so they are merged into a single index entry.

TOKEN (TERM)  DOC
brown         1
dog           1
fast          2
fox           1
jump          1, 2
lazi          1
over          1
quick         1
rabbit        2

Slide 67

Synonym
We can define "fast" and "quick" as synonyms.

TOKEN (TERM)  DOC
brown         1
dog           1
fast          1, 2
fox           1
jump          1, 2
lazi          1
over          1
quick         1, 2
rabbit        2

Slide 68

_analyze API
We can simulate an analyzer with the _analyze API. An analyzer consists of a single tokenizer and multiple token filters.

    curl -XPOST "http://localhost:9200/_analyze?pretty" -d '
    {
      "tokenizer": "whitespace",
      "filter": ["lowercase", "stop", "snowball"],
      "text": [
        "The quick brown fox jumps over the lazy dog",
        "Fast jumping rabbits"
      ]
    }' -H 'Content-Type: application/json'
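
To make the tokenizer-plus-filters chain concrete, here is a toy analyzer in plain Python. It is a sketch under loud assumptions: `toy_stem` is a hand-written stand-in that only handles the suffixes in this example and is not the Snowball stemmer, and the stopword set contains only the words listed on the stopwords slide.

```python
from typing import List

# Only the stopwords shown on the slide; the real list is longer.
STOPWORDS = {"a", "an", "are", "at", "be", "but", "by", "do", "for", "i", "no", "the", "to"}


def toy_stem(token: str) -> str:
    """Crude stand-in for a stemmer: strips -ing / -s and turns a final y into i."""
    if token.endswith("ing") and len(token) > 5:
        token = token[:-3]
    elif token.endswith("s") and len(token) > 3:
        token = token[:-1]
    if token.endswith("y") and len(token) > 2:
        token = token[:-1] + "i"
    return token


def analyze(text: str) -> List[str]:
    tokens = text.split()                                # whitespace tokenizer
    tokens = [t.lower() for t in tokens]                 # lowercase filter
    tokens = [t for t in tokens if t not in STOPWORDS]   # stop filter
    return [toy_stem(t) for t in tokens]                 # toy stemming filter


if __name__ == "__main__":
    print(analyze("The quick brown fox jumps over the lazy dog"))
    print(analyze("Fast jumping rabbits"))
```

The same `analyze` function would be applied to search text, which is why a search for "The lazy rabbits" ends up looking for the terms "lazi" and "rabbit".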

Slide 69

Searching
The search text is analyzed as well.

Search text: "The lazy rabbits" → terms: lazi, rabbit
• lazi matches doc 1
• rabbit matches doc 2

DOC  TEXT
1    The quick brown fox jumps over the lazy dog
2    Fast jumping rabbits

Slide 70

Termvectors API

    curl -XGET "http://localhost:9200/books/book/1/_termvectors?fields=author&pretty"

    "terms": {
      "shakespeare": { "term_freq": 1, "tokens": [ … ] },
      "william": { "term_freq": 1, "tokens": [ … ] }
    }

Slide 71

match query
The query text is analyzed.

    curl -XGET "http://localhost:9200/books/_search?pretty=true" -d'
    {
      "query": {
        "match": {
          "author": "William"
        }
      }
    }' -H 'Content-Type: application/json'

Slide 72

term query
The query text is NOT analyzed.

    curl -XGET "http://localhost:9200/books/_search?pretty=true" -d'
    {
      "query": {
        "term": {
          "author": "William"
        }
      }
    }' -H 'Content-Type: application/json'
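
The difference between the two queries can be sketched with a tiny in-memory index. The assumption here is a simplified analyzer that only lowercases and splits on whitespace; `term_query` and `match_query` are hypothetical functions that mimic the behavior described on the slides, not Elasticsearch code.

```python
from typing import Dict, List, Set

AUTHORS = {1: "William Shakespeare", 2: "William Shakespeare", 3: "Mark Twain"}


def analyze(text: str) -> List[str]:
    """Simplified analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def build_index(field_values: Dict[int, str]) -> Dict[str, Set[int]]:
    index: Dict[str, Set[int]] = {}
    for doc_id, value in field_values.items():
        for term in analyze(value):
            index.setdefault(term, set()).add(doc_id)
    return index


def term_query(index: Dict[str, Set[int]], value: str) -> List[int]:
    """Look the value up as-is, without analysis."""
    return sorted(index.get(value, set()))


def match_query(index: Dict[str, Set[int]], text: str) -> List[int]:
    """Analyze the text first, then look up each resulting term."""
    hits: Set[int] = set()
    for term in analyze(text):
        hits |= index.get(term, set())
    return sorted(hits)


if __name__ == "__main__":
    index = build_index(AUTHORS)
    print("term  'William':", term_query(index, "William"))
    print("match 'William':", match_query(index, "William"))
```

The index only holds the lowercase term "william", so the unanalyzed term lookup for "William" finds nothing while the analyzed match lookup finds both Shakespeare documents.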

Slide 73

Mappings
• Mappings can be created automatically.
• Since 5.x, string data is mapped as a text field with a keyword multi-field.

    curl -XGET "http://localhost:9200/books/_mappings?pretty"

    "properties" : {
      "author" : {
        "type" : "text",
        "fields" : {
          "keyword" : {
            "type" : "keyword",
            "ignore_above" : 256
          }

Slide 74

Keyword
• Raw data, which is not analyzed.

    curl -XGET "http://localhost:9200/books/book/1/_termvectors?fields=author.keyword&pretty"

    "terms" : {
      "William Shakespeare" : {
        "term_freq" : 1,
        "tokens" : [ … ]
      }
    }

Slide 75

Keyword
• Can be queried with a term query.

    curl -XGET "http://localhost:9200/books/_search?pretty=true" -d'
    {
      "query": {
        "term": {
          "author.keyword": "William Shakespeare"
        }
      }
    }' -H 'Content-Type: application/json'

Slide 76

Aggregation
(Diagram labels: Search, aggregation, hits)

Slide 77

Aggregation
Can be used together with a query on the _search API.

    curl 'localhost:9200/_search' -d '
    {
      "query" : {
        // query
      },
      "aggregations" : {      // or "aggs"
        "aggs_name" : {
          // a set of aggregations
        }
      }
    }'

Slide 78

Aggregation
Can be used together with a query on the _search API.

    curl -XGET "http://localhost:9200/books/_search?pretty" -d'
    {
      "query": { "match_all": {} },
      "aggs": {
        "authors": {
          "terms": { "field": "author.keyword" }
        }
      }
    }' -H 'Content-Type: application/json'

    …
    "aggregations" : {
      "authors" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          { "key" : "William Shakespeare", "doc_count" : 2 },
          { "key" : "Mark Twain", "doc_count" : 1 }
        ]
      }

Slide 79

Aggregation: Bucket, Metric, Pipeline
• Bucket
  ‒ Creates buckets, which are sub-groups of documents.
  ‒ A bucket can contain another aggregation.
• Metric
  ‒ Calculates a value from a numeric field.
• Pipeline
  ‒ Re-calculates the results of metric aggregations.

Slide 80

Aggregation
• percentile
• percentile_ranks
• cardinality
• significant_terms
• top hits
• scripted_metric
• filters
• range
• geohash
• terms
• histogram
• date_histogram
• stats
• extended stats
• min / max
• sum
• pipeline aggregations

Slide 81

Agg. Combination

    curl -XGET "http://localhost:9200/books/_search?pretty" -d'
    {
      "query": { "match_all": {} },
      "aggs": {
        "authors": {
          "terms": { "field": "author.keyword" },
          "aggs": {
            "pages_per_author": {
              "sum": { "field": "pages" }
            }
          }
        }
      }
    }' -H 'Content-Type: application/json'

    …
    "aggregations" : {
      "authors" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
          {
            "key" : "William Shakespeare",
            "doc_count" : 2,
            "pages_per_author" : { "value" : 297.0 }
          },
          {
            "key" : "Mark Twain",
            "doc_count" : 1,
            "pages_per_author" : { "value" : 79.0 }
          }
        ]
      }
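
What the combined aggregation computes can be sketched in plain Python: group the three example books by author (the bucket aggregation) and sum the pages inside each bucket (the metric aggregation). `terms_with_sum` is a hypothetical helper written for this illustration; it mirrors the shape of the response above but is not how Elasticsearch implements aggregations.

```python
from collections import defaultdict
from typing import Dict, List

BOOKS = [
    {"title": "Romeo and Juliet", "author": "William Shakespeare", "pages": 125},
    {"title": "Hamlet", "author": "William Shakespeare", "pages": 172},
    {"title": "The Prince and the Pauper", "author": "Mark Twain", "pages": 79},
]


def terms_with_sum(docs: List[Dict], bucket_field: str, metric_field: str) -> List[Dict]:
    """Bucket docs by one field, then sum another field within each bucket."""
    buckets = defaultdict(lambda: {"doc_count": 0, "sum": 0.0})
    for doc in docs:
        bucket = buckets[doc[bucket_field]]
        bucket["doc_count"] += 1
        bucket["sum"] += doc[metric_field]
    # Like the terms aggregation, list the largest buckets first.
    ordered = sorted(buckets.items(), key=lambda item: -item[1]["doc_count"])
    return [{"key": key, **values} for key, values in ordered]


if __name__ == "__main__":
    for bucket in terms_with_sum(BOOKS, "author", "pages"):
        print(bucket)
```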

Slide 82

Agg. Combination in Kibana

Slide 83

Search sequence – 1. Query Phase
The coordinating node forwards the query from the client to every shard of the index.

Slide 84

Search sequence – 1. Query Phase
Each shard returns the doc IDs and ranking scores of its top hits, up to the size of the query.

Slide 85

Search sequence – 2. Fetch Phase
The coordinating node sorts the docs by ranking score, then requests the top documents (up to size) from the shards that hold them.

Slide 86

Search sequence – 2. Fetch Phase
The shards return the full document contents, including _source, and the coordinating node sends them to the client.
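
The two phases can be sketched as a scatter-gather over in-memory "shards". Everything in this example is illustrative: the scores are made up rather than computed, and `query_phase`, `fetch_phase`, and `search` are hypothetical names for the steps described on the slides.

```python
import heapq
from typing import Dict, List, Tuple

# shard number -> {doc_id: (precomputed score, _source)}; scores are invented
Shard = Dict[str, Tuple[float, dict]]
SHARDS: Dict[int, Shard] = {
    0: {"1": (1.2, {"title": "Romeo and Juliet"}), "2": (0.4, {"title": "Hamlet"})},
    1: {"3": (0.9, {"title": "The Prince and the Pauper"})},
}


def query_phase(shard: Shard, size: int) -> List[Tuple[float, str]]:
    """Each shard returns only (score, doc_id) for its top `size` hits."""
    return heapq.nlargest(size, ((score, doc_id) for doc_id, (score, _) in shard.items()))


def fetch_phase(shards: Dict[int, Shard], shard_no: int, doc_id: str) -> dict:
    """The shard holding a winning doc returns its full _source."""
    return shards[shard_no][doc_id][1]


def search(shards: Dict[int, Shard], size: int) -> List[dict]:
    candidates = []
    for shard_no, shard in shards.items():                 # scatter
        for score, doc_id in query_phase(shard, size):
            candidates.append((score, shard_no, doc_id))
    top = heapq.nlargest(size, candidates)                 # global sort on the coordinator
    return [fetch_phase(shards, shard_no, doc_id) for _, shard_no, doc_id in top]


if __name__ == "__main__":
    print(search(SHARDS, size=2))
```

Only ids and scores travel in the first phase, so the full documents are read and transferred just for the final top hits.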

Slide 87

Ranking algorithm
Usually TF/IDF. Elasticsearch uses BM25.

Slide 88

TF / IDF (Term Frequency / Inverse Document Frequency)
• Term Frequency
  ‒ The more often a term appears in a field, the more important it is.
• Inverse Document Frequency
  ‒ The more documents that contain the term, the less important the term is.
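
A textbook version of the two ideas above fits in a few lines. This sketch uses the plain formulation tf × ln(N / df) on the analyzed terms of the two example documents; it is an assumption-laden illustration, not the exact formula of Lucene's classic similarity or of BM25, both of which add normalization and smoothing.

```python
import math
from typing import Dict, List

# Analyzed terms of the two example documents from the earlier slides.
DOCS: Dict[int, List[str]] = {
    1: ["quick", "brown", "fox", "jump", "over", "lazi", "dog"],
    2: ["fast", "jump", "rabbit"],
}


def term_frequency(term: str, tokens: List[str]) -> int:
    """How often the term occurs in one document."""
    return tokens.count(term)


def inverse_document_frequency(term: str, docs: Dict[int, List[str]]) -> float:
    """ln(N / df): terms found in fewer documents get a higher weight."""
    containing = sum(1 for tokens in docs.values() if term in tokens)
    if containing == 0:
        return 0.0
    return math.log(len(docs) / containing)


def tf_idf(term: str, doc_id: int, docs: Dict[int, List[str]] = DOCS) -> float:
    return term_frequency(term, docs[doc_id]) * inverse_document_frequency(term, docs)


if __name__ == "__main__":
    for term, doc_id in [("jump", 1), ("rabbit", 2), ("rabbit", 1)]:
        print(term, doc_id, tf_idf(term, doc_id))
```

In this tiny corpus "jump" occurs in both documents, so its inverse document frequency is zero and it does not help to tell the documents apart, while "rabbit" occurs in only one and scores higher there.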

Slide 89

Elasticsearch is…
an open source, distributed, scalable, highly available, document-oriented, RESTful, full text search engine with real-time search and analytics capabilities.

Slide 90

Lucene Segments
Files that contain the inverted index, doc values, _source, etc.
(Diagram: shard > segment > document)
• Elasticsearch creates a new segment every second (default).
• A segment is immutable: it cannot be changed.
• Small segments are merged into bigger segments.
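
The append-only behavior of segments can be modeled with immutable tuples. This is a deliberately simplified sketch: `refresh` and `merge_small` are hypothetical helpers, the size threshold is arbitrary, and Lucene's real merge policy chooses segments with far more care.

```python
from typing import List, Tuple

Segment = Tuple[str, ...]  # an immutable batch of document ids


def refresh(segments: List[Segment], buffered_docs: List[str]) -> List[Segment]:
    """Write buffered docs as a new segment; existing segments are untouched."""
    if not buffered_docs:
        return segments
    return segments + [tuple(buffered_docs)]


def merge_small(segments: List[Segment], max_size: int) -> List[Segment]:
    """Combine all segments of at most `max_size` docs into one new segment."""
    small = [s for s in segments if len(s) <= max_size]
    large = [s for s in segments if len(s) > max_size]
    if len(small) < 2:
        return segments  # nothing worth merging
    merged = tuple(doc for segment in small for doc in segment)
    return large + [merged]


if __name__ == "__main__":
    segments: List[Segment] = []
    segments = refresh(segments, ["a"])
    segments = refresh(segments, ["b", "c"])
    print(segments)
    print(merge_small(segments, max_size=2))
```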

Slide 91

Segment Merge
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
(Diagram: numbered segments before and after a merge)

Slide 92

Thank you
https://www.elastic.co/community
https://www.facebook.com/groups/ElasticIndiaUserGroup/
community@elastic.co