Elasticsearch: Search Engine on your server
Aravind Putrevu
Developer | Evangelist
@aravindputrevu | aravindputrevu.in
elastic.co/community

Agenda
1. Terms
2. Talking to Elasticsearch
3. Mappings
4. Analyzers and Aggregations
5. Capacity Planning

Elastic Stack
● No enterprise edition
● All new versions with 6.2
X-Pack: Security, Alerting, Monitoring, Reporting, Machine Learning, Graph

Why is it popular?
● Speed
● Scale
● Relevance

Terms
● Cluster: A cluster is a collection of one or more nodes (servers).
● Node: A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities.
● Index: An index is a collection of documents that have somewhat similar characteristics.
● Type (deprecated in 6.0.0): A type used to be a logical category/partition of your index to allow you to store different types of documents in the same index.
● Document: A document is a basic unit of information that can be indexed. This document is expressed in JSON (JavaScript Object Notation), which is a ubiquitous internet data interchange format.
● Shard: Elasticsearch provides the ability to subdivide your index into multiple pieces called shards.
● Replica: Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short.
https://www.elastic.co/guide/en/elasticsearch/reference/current/glossary.html

Elasticsearch Node Types
Nodes can play one or more roles, for workload isolation and scaling.
● Master Nodes: Control the cluster, requires a minimum of 3, one is active at any given time
● Data Nodes: Hold indexed data and perform data related operations. Differentiated Hot and Warm Data nodes can be used
● Ingest Nodes: Use ingest pipelines to transform and enrich before indexing
● Coordinating Nodes: Route requests, handle search reduce phase, distribute bulk indexing. All nodes function as coordinating nodes
● Machine Learning Nodes: Run machine learning jobs (X-Pack)
[Diagram: Elasticsearch cluster with Master (3), Ingest (X), Data – Hot (X), Data – Warm (X), Coordinating (X), Machine Learning (2+)]
All product names, logos, and brands are property of their respective owners and are used only for identification purposes. This is not an endorsement.

What powers Elasticsearch?
Apache Lucene
● A Java library
● Great for full-text search
But
● Challenging to use
● Not designed for scale
https://www.elastic.co/blog/found-elasticsearch-top-down

Talking to Elasticsearch
https://www.elastic.co/guide/en/elasticsearch/client/index.html
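Elasticsearch is reached over a JSON REST API, either directly or through one of the official clients listed at the link above. As a minimal sketch using only the Python standard library, the helper below builds such a request without sending it. The `ES_URL` value and the `build_request` name are assumptions for illustration, not part of the talk.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def build_request(method, path, body=None):
    """Build (but do not send) a JSON request for the Elasticsearch REST API."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url=f"{ES_URL}/{path.lstrip('/')}",
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Cluster health is a plain GET with no body.
health = build_request("GET", "_cluster/health")

# To send it against a running node:
# with urllib.request.urlopen(health) as resp:
#     print(json.load(resp)["status"])
```

Keeping request construction separate from sending makes the example usable without a running cluster.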

Indexing a document
https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
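Indexing comes down to two REST calls: optionally creating the index with explicit shard and replica counts, then putting a JSON document under an ID. The sketch below only builds the method, path, and body for each call. The index name `talks`, the document fields, and the helper names are hypothetical.

```python
import json


def create_index_request(index, shards=1, replicas=1):
    """Return (method, path, body) that creates an index with explicit shard settings."""
    settings = {"settings": {"number_of_shards": shards, "number_of_replicas": replicas}}
    return "PUT", f"/{index}", json.dumps(settings)


def index_document_request(index, doc_id, document):
    """Return (method, path, body) that indexes one JSON document under a known ID."""
    return "PUT", f"/{index}/_doc/{doc_id}", json.dumps(document)


# "talks" and the document fields are hypothetical.
create = create_index_request("talks", shards=3, replicas=1)
put_doc = index_document_request("talks", 1, {"title": "Search Engine on your server"})
```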

Inserting data
_bulk
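The `_bulk` endpoint takes newline-delimited JSON: one action line followed by one source line per document, with a trailing newline at the end of the body. The sketch below serializes such a body. It assumes the typeless action format of Elasticsearch 7 and later (on 6.x the action line also needs a `_type`). The `bulk_body` name and the sample documents are hypothetical.

```python
import json


def bulk_body(index, docs):
    """Serialize (doc_id, document) pairs into an NDJSON body for POST /_bulk."""
    lines = []
    for doc_id, doc in docs:
        # Action line: what to do and where.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the body must end with a newline


body = bulk_body("logs", [("1", {"msg": "started"}), ("2", {"msg": "stopped"})])
```

The body is sent with the `Content-Type: application/x-ndjson` header.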

Where will my data go?
The default value used for _routing is the document’s _id.
0 ≤ shard ≤ number_of_primary_shards - 1
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html
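The routing documentation linked above describes shard selection, in its basic form, as `shard_num = hash(_routing) % number_of_primary_shards`. The sketch below illustrates that idea only. Elasticsearch uses its own Murmur3-based hash, so `zlib.crc32` is a stand-in and the shard numbers it produces will not match a real cluster. The `pick_shard` name is hypothetical.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Map a routing value (by default the document _id) to a primary shard number."""
    # Stand-in hash: Elasticsearch uses a Murmur3-based hash, not CRC32.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# Every document ID lands on a shard in the range 0..number_of_primary_shards - 1.
shards = [pick_shard(str(doc_id), 5) for doc_id in range(100)]
```

Because the result depends on the shard count, the number of primary shards is fixed when the index is created.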

Mappings
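A mapping declares the fields of a document and how each one is indexed. As a minimal sketch, the body below could be sent when creating an index. It assumes the typeless mapping format of Elasticsearch 7 and later (6.x releases nest `properties` under a type name), and every field name is hypothetical.

```python
import json

# Hypothetical fields for a "talks" index.
mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},          # analyzed for full-text search
            "speaker": {"type": "keyword"},     # exact values, usable in aggregations
            "presented_on": {"type": "date"},
            "attendees": {"type": "integer"},
        }
    }
}

create_index_body = json.dumps(mapping)
```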

Full Text Analysis
Inverted Index
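An inverted index maps each term to the documents that contain it, so a search looks up terms instead of scanning every document. The toy version below shows the data structure only. It splits on whitespace and lowercases, which is far simpler than what Lucene does, and the sample documents are made up.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {
    1: "Elasticsearch is fast",
    2: "Lucene is a Java library",
    3: "fast search at scale",
}
index = build_inverted_index(docs)
```

Looking up `index["fast"]` returns the IDs of the documents containing that term without touching the document text again.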

Analyzer
Helps in converting text into tokens for better search capability
1. Character filters
2. Tokenizer
3. Token filters
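The three analyzer stages run in order: character filters clean the raw text, the tokenizer splits it into tokens, and token filters transform or drop tokens. The sketch below imitates that pipeline in plain Python. The specific choices (stripping HTML tags, splitting on word characters, lowercasing, a tiny stop list) are illustrative assumptions and not the behaviour of any particular built-in analyzer.

```python
import re

STOPWORDS = {"a", "an", "the", "is", "and"}  # assumption: tiny illustrative stop list


def char_filter(text):
    """Stage 1: replace HTML tags with spaces before tokenizing."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenizer(text):
    """Stage 2: split the text into word tokens."""
    return re.findall(r"\w+", text)


def token_filters(tokens):
    """Stage 3: lowercase each token, then drop stop words."""
    lowered = (t.lower() for t in tokens)
    return [t for t in lowered if t not in STOPWORDS]


def analyze(text):
    return token_filters(tokenizer(char_filter(text)))


tokens = analyze("<p>The Quick brown fox is FAST</p>")
```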

Aggregations
● Metrics
● Bucket
● Pipeline
● and so on...
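The three aggregation families can be combined in one search body. In the sketch below, a `terms` bucket aggregation groups documents, an `avg` metric aggregation runs inside each bucket, and a `max_bucket` pipeline aggregation picks the bucket with the highest average. The field names `speaker` and `attendees` and the aggregation names are hypothetical.

```python
import json

aggs_request = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        # Bucket: one bucket per distinct speaker.
        "by_speaker": {
            "terms": {"field": "speaker"},
            # Metric: average attendees within each bucket.
            "aggs": {"avg_attendees": {"avg": {"field": "attendees"}}},
        },
        # Pipeline: works on the output of the aggregations above.
        "best_speaker": {
            "max_bucket": {"buckets_path": "by_speaker>avg_attendees"}
        },
    },
}

search_body = json.dumps(aggs_request)
```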

Querying Data
● Full Text Queries
● Term Level Queries
● Compound Queries
● Geo Queries

Query DSL
Match Query
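A match query analyzes the search text with the field's analyzer before matching, which makes it the usual choice for full-text fields. A minimal sketch of the request body follows. The helper name and the `title` field are hypothetical.

```python
def match_query(field, text, operator="or"):
    """Build a search body with a full-text match query on one field."""
    return {"query": {"match": {field: {"query": text, "operator": operator}}}}


# With operator "and", every analyzed term must be present in the field.
body = match_query("title", "search engine", operator="and")
```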

Query DSL
Term Queries
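Term-level queries compare against the exact indexed value and do not analyze the search input, so they suit keyword, numeric, and date fields. The sketch below builds a `term` and a `range` query body. The helper names and field names are hypothetical.

```python
def term_query(field, value):
    """Build a search body that matches documents whose field holds the exact value."""
    return {"query": {"term": {field: value}}}


def range_query(field, gte=None, lte=None):
    """Build a search body that matches values within an inclusive range."""
    bounds = {}
    if gte is not None:
        bounds["gte"] = gte
    if lte is not None:
        bounds["lte"] = lte
    return {"query": {"range": {field: bounds}}}


exact = term_query("speaker", "Aravind Putrevu")
crowded = range_query("attendees", gte=100)
```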

Query DSL
Nested queries
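A nested query searches inside an array of objects that is mapped with the `nested` type, matching each inner object independently. A minimal sketch of the body, assuming a hypothetical `comments` field with an `author` sub-field:

```python
def nested_query(path, inner_query):
    """Wrap a query so it runs against each nested object under the given path."""
    return {"query": {"nested": {"path": path, "query": inner_query}}}


# "comments" must be mapped as type "nested" for this query to work.
body = nested_query("comments", {"match": {"comments.author": "aravind"}})
```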

Query DSL
Geo queries
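One common geo query is `geo_distance`, which keeps documents whose `geo_point` field lies within a given distance of a point. The sketch below places it in a `bool` filter. The `location` field name, the coordinates, and the helper name are hypothetical.

```python
def geo_distance_query(field, lat, lon, distance):
    """Build a search body that filters documents by distance from a point."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }


# "location" must be mapped as type "geo_point".
body = geo_distance_query("location", 12.97, 77.59, "10km")
```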

[Diagram: Elastic Stack architecture. Data sources (log files, metrics, wire data, your{beat}, datastore, web APIs, social, sensors) feed Beats and Logstash nodes (X), optionally through a messaging queue (Kafka, Redis), into Elasticsearch with master nodes (3), ingest nodes (X), hot data nodes (X) and warm data nodes (X). Kibana instances (X) and a custom UI sit on top. X-Pack provides authentication (LDAP, AD, SSO) and notification. ES-Hadoop connects the Hadoop ecosystem.]

Capacity Planning
It depends...
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Capacity Planning
What is your use case?
● Full text search
● Logging/Metrics
● Complex aggregations with a lot of users
Each use case needs a different cluster configuration.
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing

Capacity Planning
Let us take Logging..
● Inflow of data per day
  ○ Per day: 10GB
  ○ Per month: 300GB
  ○ Per year: 3600GB
● Data Retention
  ○ 15 days
● High Availability (Replication factor)
  ○ 1, i.e., 7200GB per year
● Type of Queries
Master Node: X
Data Node: X
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
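The arithmetic on this slide can be written as one formula: storage = daily inflow × days kept × (1 + replicas). The sketch below reproduces the slide's numbers under that assumption. It ignores indexing overhead, compression, and free-space headroom, so it gives a rough lower bound and not a sizing recommendation. The function name is hypothetical.

```python
def required_storage_gb(daily_gb, retention_days, replicas):
    """Raw storage needed for primaries plus replicas over the retention window."""
    return daily_gb * retention_days * (1 + replicas)


# 10GB/day kept for a year of 12 x 30 days, with one replica (the slide's 7200GB).
per_year = required_storage_gb(10, 360, 1)

# 10GB/day with the slide's 15-day retention and one replica.
on_disk = required_storage_gb(10, 15, 1)
```

With 15-day retention the cluster holds about 300GB at any time, far less than the yearly inflow.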

Capacity Planning
Hardware Recommendations
● SSDs are the best
● Local disk is king!
● Prefer medium-size machines over large-size machines
● Give only 50% of your RAM to Elasticsearch
● Don’t cross 32GB of Java heap space
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
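The two memory rules combine into a simple calculation: take half of the machine's RAM for the heap, but stay under the 32GB limit. The sketch below does that. The default cap of 31GB is an assumed safety margin under the slide's 32GB figure, and the function name is hypothetical.

```python
def recommended_heap_gb(ram_gb, heap_cap_gb=31.0):
    """Half of RAM for the JVM heap, capped below the 32GB limit."""
    # heap_cap_gb=31.0 is an assumption: a margin under the slide's 32GB rule.
    return min(ram_gb / 2, heap_cap_gb)


small_node = recommended_heap_gb(16)    # half of RAM
large_node = recommended_heap_gb(128)   # hits the cap
```

The cap is also why medium-size machines are preferred: RAM beyond roughly 64GB cannot go to the heap.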

[Diagram repeated: Elastic Stack architecture with master nodes (3), ingest nodes (X), hot data nodes (X) and warm data nodes (X)]
https://www.elastic.co/blog/hot-warm-architecture-in-elasticsearch-5-x

training.elastic.co

Resources
• https://www.elastic.co/learn
• https://www.elastic.co/blog/category/engineering
• https://discuss.elastic.co/
• https://fb.com/groups/ElasticIndiaUserGroup
• https://elastic.co/community

Fin!
discuss.elastic.co | aravind@elastic.co | @aravindputrevu