Elasticsearch - You know, for Search

A presentation at DigitalOcean Webinar Series in June 2018 by Aravind Putrevu

Slide 1

Elasticsearch
Search Engine on your server
Aravind Putrevu, Developer | Evangelist
@aravindputrevu | aravindputrevu.in
elastic.co/community

Slide 2

Agenda
1. Terms
2. Talking to Elasticsearch
3. Mappings
4. Analyzers and Aggregations
5. Capacity Planning

Slide 7

Elastic Stack
No enterprise edition. All new versions with 6.2.
X-Pack: Security, Alerting, Monitoring, Reporting, Machine Learning, Graph

Slide 8

Why is it Popular?
● Speed
● Scale
● Relevance

Slide 9

Terms
● Cluster: A cluster is a collection of one or more nodes (servers).
● Node: A node is a single server that is part of your cluster, stores your data, and participates in the cluster's indexing and search capabilities.
● Index: An index is a collection of documents that have somewhat similar characteristics.
● Type (deprecated in 6.0.0): A type used to be a logical category/partition of your index to allow you to store different types of documents in the same index.
● Document: A document is a basic unit of information that can be indexed. It is expressed in JSON (JavaScript Object Notation), a ubiquitous internet data interchange format.
● Shard: Elasticsearch provides the ability to subdivide your index into multiple pieces called shards.
● Replica: Elasticsearch allows you to make one or more copies of your index's shards into what are called replica shards, or replicas for short.
https://www.elastic.co/guide/en/elasticsearch/reference/current/glossary.html
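
To make the shard and replica terms concrete, here is a minimal sketch of the settings body one might send when creating an index, plus the resulting number of shard copies. The function names and the chosen counts are illustrative assumptions, not part of the talk; `number_of_shards` and `number_of_replicas` are the index settings Elasticsearch documents for this purpose.

```python
import json


def index_settings(primary_shards: int, replicas: int) -> dict:
    """Build an index-creation body with shard and replica counts."""
    if primary_shards < 1 or replicas < 0:
        raise ValueError("need at least one primary shard and no negative replicas")
    return {
        "settings": {
            "number_of_shards": primary_shards,
            "number_of_replicas": replicas,
        }
    }


def total_shard_copies(primary_shards: int, replicas: int) -> int:
    """Every primary shard exists once, plus `replicas` extra copies."""
    return primary_shards * (1 + replicas)


# Hypothetical example: 3 primaries, each with 1 replica.
body = index_settings(primary_shards=3, replicas=1)
print(json.dumps(body, indent=2))
print(total_shard_copies(3, 1))
```

With 3 primaries and 1 replica the cluster holds 6 shards in total, which is why a replica setting of 1 doubles the storage needed.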

Slide 10

Elasticsearch Node Types
Nodes can play one or more roles, for workload isolation and scaling
Diagram: Master (3), Ingest (X), Data – Hot (X), Data – Warm (X), Coordinating (X), Machine Learning (2+)
● Master Nodes
  – Control the cluster, requires a minimum of 3, one is active at any given time
● Data Nodes
  – Hold indexed data and perform data related operations
  – Differentiated Hot and Warm Data nodes can be used
● Ingest Nodes
  – Use ingest pipelines to transform and enrich before indexing
● Coordinating Nodes
  – Route requests, handle search reduce phase, distribute bulk indexing
  – All nodes function as coordinating nodes
● Machine Learning Nodes (X-Pack)
  – Run machine learning jobs
All product names, logos, and brands are property of their respective owners and are used only for identification purposes. This is not an endorsement.

Slide 11

What powers Elasticsearch?
Apache Lucene:
● A Java library
● Great for full-text search
But
● Challenging to use
● Not designed for scale
https://www.elastic.co/blog/found-elasticsearch-top-down

Slide 12

Talking to Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/client/index.html

Slide 13

Indexing a document
https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
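
The request shown on this slide did not survive in the transcript. As a rough sketch of what indexing one document over the REST API can look like, the snippet below prepares a `PUT` request with Python's standard library and does not send it. The host `localhost:9200`, the index name `webinars`, the document id, and the fields are all assumptions for illustration; `_doc` is the type name commonly used with 6.2 and later.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_id, document):
    """Prepare (but do not send) a PUT request that indexes one JSON document."""
    url = f"{base_url}/{index}/_doc/{doc_id}"
    data = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="PUT",
        # Elasticsearch 6.x rejects bodies without an explicit content type.
        headers={"Content-Type": "application/json"},
    )


# Hypothetical document and endpoint.
request = build_index_request(
    "http://localhost:9200",
    "webinars",
    "1",
    {"title": "You know, for Search", "speaker": "Aravind Putrevu"},
)
print(request.get_method(), request.full_url)
# Against a running node one would send it with: urllib.request.urlopen(request)
```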

Slide 14

Inserting data
_bulk
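
The `_bulk` endpoint takes newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end. The helper below is a sketch of building such a body; the function name, index name, and documents are hypothetical. Note that 6.x releases generally also expect a type, given either in the URL path or as `_type` in the action line.

```python
import json


def build_bulk_body(index, documents):
    """Serialize (doc_id, source) pairs into newline-delimited JSON for _bulk."""
    lines = []
    for doc_id, source in documents:
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [
    ("1", {"title": "Terms"}),
    ("2", {"title": "Mappings"}),
]
bulk_body = build_bulk_body("webinars", docs)
print(bulk_body, end="")
```

Such a body is typically sent with the `Content-Type: application/x-ndjson` header.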

Slide 15

Where will my data go?
The default value used for _routing is the document's _id.
0 <= shard <= number_of_primary_shards - 1
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html
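
The linked routing documentation describes the shard choice as `shard_num = hash(_routing) % num_primary_shards`. The sketch below mimics that rule with CRC32 as a stand-in hash. This is an assumption made only to keep the example self-contained: Elasticsearch uses its own hash function, so the shard numbers printed here will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document _id) to a shard number."""
    # Stand-in hash for illustration; Elasticsearch does not use CRC32 here.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


for doc_id in ("1", "2", "user-42"):
    print(doc_id, "->", pick_shard(doc_id, 5))
```

Because the result depends on the number of primary shards, changing that number would send existing ids to different shards, which is why the primary shard count is fixed at index creation.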

Slide 16

Mappings
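
The slide's example is not in the transcript. As a hedged illustration, a mapping declares a type for each field. The field names below are invented; `text`, `keyword`, `date`, and `integer` are standard Elasticsearch field types. How the `properties` block is nested under `mappings` differs between versions (6.x places it under a type name), so only the `properties` part is sketched.

```python
import json

# Hypothetical fields for a webinar document.
mapping = {
    "properties": {
        "title": {"type": "text"},         # analyzed for full-text search
        "speaker": {"type": "keyword"},    # stored as one exact value
        "presented_on": {"type": "date"},
        "attendees": {"type": "integer"},
    }
}
print(json.dumps(mapping, indent=2))
```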

Slide 17

Full Text Analysis
Inverted Index
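
An inverted index maps each term to the documents that contain it, so a lookup by term does not have to scan every document. The toy version below is a sketch of the idea only, with made-up documents and a naive lowercase-and-split step. It is not how Lucene stores its index.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercased term to the sorted list of document ids containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {
    1: "Elasticsearch is fast",
    2: "Search is relevant",
    3: "Fast search at scale",
}
index = build_inverted_index(docs)
print(index["fast"])    # [1, 3]
print(index["search"])  # [2, 3]
```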

Slide 18

Analyzer
Helps in converting text into tokens for better search capability
1. Character filters
2. Tokenizer
3. Token filters
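
The three stages can be imitated in a few lines of Python. This is a simplified sketch, not Elasticsearch's implementation: the tag-stripping regex, the tokenizer pattern, and the stopword list are all assumptions picked for the example.

```python
import re

# Hypothetical stopword list, far shorter than a real one.
STOPWORDS = {"the", "a", "an", "is", "for"}


def char_filter(text):
    """Character filter: strip HTML-like tags before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer(text):
    """Tokenizer: split on anything that is not an ASCII letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filters(tokens):
    """Token filters: lowercase every token, then drop stopwords."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    """Run the three stages in order."""
    return token_filters(tokenizer(char_filter(text)))


print(analyze("<b>Elasticsearch</b>: You know, for Search"))
# ['elasticsearch', 'you', 'know', 'search']
```

The same chain runs at index time and, for full-text queries, at search time, which is what lets "Search" in a query match "search" in the index.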

Slide 19

Aggregations
● Metrics
● Bucket
● Pipeline
● and so on...
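
As an illustration of how bucket and metrics aggregations combine, the sketch below builds a request body that groups documents with a `terms` bucket and computes an `avg` metric inside each bucket. The field names `host` and `response_ms` and the aggregation names are hypothetical; `terms`, `avg`, and `size` are standard request keys.

```python
import json


def avg_per_bucket(bucket_field, metric_field):
    """Bucket documents by one field, then average another field per bucket."""
    return {
        "size": 0,  # return only aggregation results, no search hits
        "aggs": {
            "by_bucket": {
                "terms": {"field": bucket_field},
                "aggs": {
                    "average": {"avg": {"field": metric_field}},
                },
            }
        },
    }


agg_body = avg_per_bucket("host", "response_ms")
print(json.dumps(agg_body, indent=2))
```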

Slide 20

Querying Data
● Full Text Queries
● Term Level Queries
● Compound Queries
● Geo Queries

Slide 21

Query DSL
Match Query
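
The query from this slide is missing from the transcript. A `match` query is the standard full-text query: the query text is analyzed before it is compared with the indexed terms. A minimal sketch of the request body, with a hypothetical field and search text:

```python
import json


def match_query(field, text):
    """Full-text query: `text` is analyzed before matching against `field`."""
    return {"query": {"match": {field: text}}}


match_body = match_query("title", "search engine")
print(json.dumps(match_body))
```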

Slide 22

Query DSL
Term Queries
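
Term-level queries skip analysis and look for the exact value in the index, which makes them a natural fit for `keyword` fields. A sketch with an assumed field name and value:

```python
import json


def term_query(field, value):
    """Exact-value query: `value` is not analyzed."""
    return {"query": {"term": {field: value}}}


term_body = term_query("speaker", "Aravind Putrevu")
print(json.dumps(term_body))
```

Running a `term` query against an analyzed `text` field often finds nothing, because the indexed tokens are typically lowercased and split while the query value is used as-is.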

Slide 23

Query DSL
Nested queries
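
A `nested` query searches inside objects stored under a field mapped as `nested`, so that conditions must hold within the same inner object. The body below is a sketch. The path `comments` and its sub-fields are invented for illustration and assume a matching nested mapping exists.

```python
import json


def nested_query(path, author, min_stars):
    """Match documents having one nested object that satisfies both clauses."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "bool": {
                        "must": [
                            {"match": {f"{path}.author": author}},
                            {"range": {f"{path}.stars": {"gte": min_stars}}},
                        ]
                    }
                },
            }
        }
    }


nested_body = nested_query("comments", "alice", 4)
print(json.dumps(nested_body, indent=2))
```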

Slide 24

Query DSL
Geo queries
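
One common geo query is `geo_distance`, which keeps documents whose `geo_point` field lies within a radius of a given point. The sketch below wraps it in a `bool` filter, which also serves as a small example of a compound query. The field name `location`, the coordinates, and the radius are assumptions.

```python
import json


def within_distance(field, lat, lon, distance):
    """Filter documents whose geo_point `field` is within `distance` of a point."""
    return {
        "query": {
            "bool": {
                "must": {"match_all": {}},
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                },
            }
        }
    }


geo_body = within_distance("location", 12.97, 77.59, "10km")
print(json.dumps(geo_body, indent=2))
```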

Slide 25

Elastic Stack architecture (diagram)
Data sources: Log Files, Metrics, Wire Data, Datastore, Web APIs, Social, Sensors
Shippers: Beats, your{beat}
Messaging Queue: Kafka, Redis
Logstash: Nodes (X)
Elasticsearch: Master Nodes (3), Ingest Nodes (X), Data Nodes – Hot (X), Data Nodes – Warm (X)
Kibana: Instances (X); Custom UI
X-Pack: Authentication (LDAP, AD, SSO), Notification
Hadoop Ecosystem: ES-Hadoop

Slide 26

Capacity Planning
It depends...
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Slide 27

Capacity Planning
What is your use case?
● Full text search
● Logging/Metrics
● Complex aggregations with a lot of users
Each use case needs a different cluster configuration.
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Slide 28

Capacity Planning
Let us take Logging...
● Inflow of data per day
  ○ Per day: 10GB
  ○ Per Month: 300GB
  ○ Per Year: 3600GB
● Data Retention
  ○ 15 days
● High Availability (Replication factor)
  ○ 1, i.e., 7200GB Per Year
● Type of Queries
Master Node: X, Data Node: X
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
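
The figures on this slide follow from simple multiplication, sketched below. The helper name is hypothetical, and the calculation assumes raw data volume only (no indexing overhead or compression), 30-day months, and a 360-day year, which is what the slide's 300GB and 3600GB imply.

```python
def storage_gb(daily_gb, days, replicas):
    """Raw storage for `days` of data, counting primaries plus replica copies."""
    return daily_gb * days * (1 + replicas)


print(storage_gb(10, 30, 0))    # 300  -> per month, primaries only
print(storage_gb(10, 360, 0))   # 3600 -> per year, primaries only
print(storage_gb(10, 360, 1))   # 7200 -> per year with replication factor 1
print(storage_gb(10, 15, 1))    # 300  -> 15-day retention with replication factor 1
```

With a 15-day retention policy the cluster only ever holds about 300GB of raw data at a time, far less than the yearly figure suggests.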

Slide 29

Capacity Planning
Hardware Recommendations
● SSDs are the best
● Local disk is king!
● Prefer medium-size machines over large-size machines
● Give only 50% of your RAM to Elasticsearch
● Don't cross 32GB of Java heap space
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
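
The two memory rules combine into a one-line calculation: take half of the machine's RAM, but never more than the heap ceiling. In the sketch below the function name is hypothetical, and the default cap of 31GB is an assumption chosen to stay safely under the slide's 32GB limit.

```python
def recommended_heap_gb(ram_gb, heap_cap_gb=31.0):
    """Half of RAM for the JVM heap, capped below the 32GB limit."""
    # heap_cap_gb=31.0 is an assumed safety margin, not a figure from the talk.
    return min(ram_gb * 0.5, heap_cap_gb)


for ram in (16, 64, 128):
    print(f"{ram}GB RAM -> {recommended_heap_gb(ram)}GB heap")
```

On the 128GB machine most of the RAM stays outside the heap, where the operating system can use it for the file system cache. This is one reason the slide prefers several medium machines over one large one.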

Slide 30

Elastic Stack architecture with hot and warm data nodes (the diagram from Slide 25, shown again)
https://www.elastic.co/blog/hot-warm-architecture-in-elasticsearch-5-x

Slide 31

training.elastic.co

Slide 32

Resources
• https://www.elastic.co/learn
• https://www.elastic.co/blog/category/engineering
• https://discuss.elastic.co/
• https://fb.com/groups/ElasticIndiaUserGroup
• https://elastic.co/community

Slide 33

Fin!
discuss.elastic.co | aravind@elastic.co | @aravindputrevu