A presentation at DigitalOcean Webinar Series by Aravind Putrevu
Elasticsearch: Search Engine on your server
Aravind Putrevu, Developer | Evangelist
@aravindputrevu | aravindputrevu.in | elastic.co/community
Agenda
1 Terms
2 Talking to Elasticsearch
3 Mappings
4 Analyzers and Aggregations
5 Capacity Planning
Elastic Stack
● No enterprise edition
● All new versions with 6.2
X-Pack: Security, Alerting, Monitoring, Reporting, Machine Learning, Graph
Why is it popular?
● Speed
● Scale
● Relevance
Terms
● Cluster: A cluster is a collection of one or more nodes (servers)
● Node: A node is a single server that is part of your cluster, stores your data, and participates in the cluster’s indexing and search capabilities
● Index: An index is a collection of documents that have somewhat similar characteristics
● Type (deprecated in 6.0.0): A type used to be a logical category/partition of your index to allow you to store different types of documents in the same index
● Document: A document is a basic unit of information that can be indexed. It is expressed in JSON (JavaScript Object Notation), a ubiquitous internet data interchange format
● Shard: Elasticsearch provides the ability to subdivide your index into multiple pieces called shards
● Replica: Elasticsearch allows you to make one or more copies of your index’s shards into what are called replica shards, or replicas for short
https://www.elastic.co/guide/en/elasticsearch/reference/current/glossary.html
Elasticsearch Node Types
Nodes can play one or more roles, for workload isolation and scaling
● Master Nodes, Master (3)
○ Control the cluster, requires a minimum of 3, one is active at any given time
● Data Nodes, Data – Hot (X) and Data – Warm (X)
○ Hold indexed data and perform data related operations
○ Differentiated Hot and Warm Data nodes can be used
● Ingest Nodes, Ingest (X)
○ Use ingest pipelines to transform and enrich before indexing
● Coordinating Nodes, Coordinating (X)
○ Route requests, handle search reduce phase, distribute bulk indexing
○ All nodes function as coordinating nodes
● Machine Learning Nodes, Machine Learning (2+)
○ Run machine learning jobs (X-Pack)
All product names, logos, and brands are property of their respective owners and are used only for identification purposes. This is not an endorsement.
What powers Elasticsearch? Apache Lucene
● A Java library
● Great for full-text search
But
● Challenging to use
● Not designed for scale
https://www.elastic.co/blog/found-elasticsearch-top-down
Talking to Elasticsearch https://www.elastic.co/guide/en/elasticsearch/client/index.html
Indexing a document https://www.elastic.co/blog/how-many-shards-should-i-have-in-my-elasticsearch-cluster
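
The request shown on this slide is not part of the text. As a minimal sketch, an index request over the REST API could look like the following. It assumes a node at `http://localhost:9200`, a hypothetical index named `articles`, and the typeless `_doc` endpoint (on 6.x the path carries a type name instead). The request is only built, not sent.

```python
import json
import urllib.request

# Assumptions (not from the slides): a node at localhost:9200 and an
# index called "articles"; both are placeholders.
ES_URL = "http://localhost:9200"


def build_index_request(index, doc_id, document):
    """Build (but do not send) a PUT request that indexes one JSON document."""
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_doc/{doc_id}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_index_request("articles", "1", {"title": "Hello Elasticsearch"})
print(request.get_method(), request.full_url)
# To actually send it against a running node: urllib.request.urlopen(request)
```

Using an explicit id with PUT makes the call repeatable: sending the same request again overwrites the document rather than creating a second copy.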
Inserting data: _bulk
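
The `_bulk` endpoint takes newline-delimited JSON: an action line followed by a source line for each document. The sketch below builds such a body. The index name `logs` and the sample documents are placeholders, and the typeless action format is assumed (6.x clusters also expect a `_type` in the action line).

```python
import json


def build_bulk_body(index, documents):
    """Return an NDJSON body for _bulk: one action line, then one source line, per document."""
    lines = []
    for doc_id, source in documents:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


docs = [("1", {"message": "first"}), ("2", {"message": "second"})]
bulk_body = build_bulk_body("logs", docs)
print(bulk_body)
```

The body would be sent with `POST /_bulk` and the header `Content-Type: application/x-ndjson`.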
Where will my data go? The default value used for _routing is the document’s _id. 0 ≤ shard ≤ number_of_primary_shards - 1 https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-routing-field.html
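
The linked documentation describes routing as `shard_num = hash(_routing) % num_primary_shards`. The sketch below illustrates that idea only: `zlib.crc32` is a stand-in for Elasticsearch's own Murmur3-based hash, so the shard numbers it prints will not match a real cluster.

```python
import zlib


def pick_shard(routing, number_of_primary_shards):
    """Map a routing value to a shard number in [0, number_of_primary_shards - 1]."""
    # crc32 is a stand-in here; Elasticsearch uses its own Murmur3-based hash.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % number_of_primary_shards


shards = [pick_shard(f"doc-{i}", 5) for i in range(20)]
print(shards)
```

Because the result depends on the number of primary shards, that number cannot simply be changed after the index is created without reindexing.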
Mappings
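
The mapping shown on this slide is not part of the text. A minimal sketch of an explicit mapping follows, with hypothetical field names and the typeless format (on 6.x the `properties` block sits under a type name).

```python
import json

# Hypothetical field names, chosen only for illustration.
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},        # analyzed for full-text search
            "tags": {"type": "keyword"},      # exact values, for filters and aggregations
            "published_at": {"type": "date"},
            "location": {"type": "geo_point"},
        }
    }
}

# Body for: PUT /articles
print(json.dumps(mapping_body, indent=2))
```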
Full Text Analysis: Inverted Index
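
An inverted index maps each term to the documents that contain it. The toy version below uses whitespace splitting and lowercasing only. It is meant to show the data structure, not how Lucene stores it.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lowercase term to the sorted list of ids of documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


docs = {1: "Elasticsearch is fast", 2: "Search is fun", 3: "Fast search"}
index = build_inverted_index(docs)
print(index["fast"])    # [1, 3]
print(index["search"])  # [2, 3]
```

A search for a term is then a dictionary lookup rather than a scan over every document.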
Analyzer
Helps in converting text into tokens for better search capability
1 Character filters
2 Tokenizer
3 Token Filters
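
The three stages can be sketched as plain functions chained in order. This is an illustration, not Elasticsearch's implementation: the HTML-stripping character filter, the non-word tokenizer, and the tiny stop list are all assumptions picked for the example.

```python
import re

STOP_WORDS = {"the", "a", "is"}  # tiny illustrative stop list


def strip_html(text):
    """Character filter: replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def tokenize(text):
    """Tokenizer: split on runs of non-word characters."""
    return [token for token in re.split(r"\W+", text) if token]


def filter_tokens(tokens):
    """Token filters: lowercase, then drop stop words."""
    lowered = [token.lower() for token in tokens]
    return [token for token in lowered if token not in STOP_WORDS]


def analyze(text):
    """Run the three stages in order: character filter, tokenizer, token filters."""
    return filter_tokens(tokenize(strip_html(text)))


print(analyze("<p>The QUICK brown fox is fast!</p>"))
# ['quick', 'brown', 'fox', 'fast']
```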
Aggregations
● Metrics
● Bucket
● Pipeline
● and so on...
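
As a sketch of how bucket and metric aggregations combine, the search body below groups documents by a keyword field and computes an average inside each bucket. The field names `host` and `bytes` and the index `logs` are hypothetical.

```python
import json

# Hypothetical fields: "host" (keyword) and "bytes" (numeric).
aggregation_body = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "by_host": {
            "terms": {"field": "host"},  # bucket aggregation: one bucket per host
            "aggs": {
                "avg_bytes": {"avg": {"field": "bytes"}}  # metric aggregation per bucket
            },
        }
    },
}

# Body for: GET /logs/_search
print(json.dumps(aggregation_body, indent=2))
```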
Querying Data
● Full Text Queries
● Term Level Queries
● Compound Queries
● Geo Queries
Query DSL: Match Query
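
The query shown on this slide is not part of the text. A minimal match query might look like this, assuming a hypothetical analyzed text field named `title`.

```python
import json

# "title" is a hypothetical text field.
match_query = {
    "query": {
        "match": {
            "title": {
                "query": "quick brown fox",
                "operator": "and",  # every term must match; the default is "or"
            }
        }
    }
}

# Body for: GET /articles/_search
print(json.dumps(match_query, indent=2))
```

The query text goes through the field's analyzer before matching, which is what separates full-text queries from term-level ones.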
Query DSL: Term Queries
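
Term-level queries match exact values and skip analysis. Two hedged examples follow, a `term` query and a `range` query. The field names `status` and `age` are hypothetical.

```python
import json

# Exact match on a keyword field (hypothetical field "status").
term_query = {"query": {"term": {"status": {"value": "published"}}}}

# Numeric range on a hypothetical field "age".
range_query = {"query": {"range": {"age": {"gte": 10, "lte": 20}}}}

for body in (term_query, range_query):
    print(json.dumps(body))
```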
Query DSL: Nested queries
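
A nested query searches objects stored in a field mapped as `nested`, so that conditions apply to the same inner object. The sketch assumes a hypothetical `comments` field of that type with `author` and `stars` sub-fields.

```python
import json

# Assumes "comments" is mapped with "type": "nested" (hypothetical mapping).
nested_query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {
                "bool": {
                    "must": [
                        {"match": {"comments.author": "alice"}},
                        {"range": {"comments.stars": {"gte": 4}}},
                    ]
                }
            },
        }
    }
}

print(json.dumps(nested_query, indent=2))
```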
Query DSL: Geo queries
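
As one example of a geo query, `geo_distance` keeps documents whose `geo_point` field lies within a radius of a given point. The field name `location` and the coordinates are placeholders.

```python
import json

# Assumes "location" is mapped as geo_point (hypothetical field and coordinates).
geo_query = {
    "query": {
        "bool": {
            "filter": {
                "geo_distance": {
                    "distance": "10km",
                    "location": {"lat": 12.97, "lon": 77.59},
                }
            }
        }
    }
}

print(json.dumps(geo_query, indent=2))
```

Placing it in a `bool` filter clause means it narrows the result set without contributing to the relevance score.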
[Architecture diagram. Components shown: data sources (Log Files, Metrics, Wire Data, Web APIs, Datastore, Social, Sensors); Beats and your{beat}; Logstash; Messaging Queue (Kafka, Redis); Elasticsearch with Master Nodes (3), Ingest Nodes (X), Data Nodes – Hot (X), Data Nodes – Warm (X); Kibana and Custom UI; X-Pack with Authentication (LDAP, AD, SSO) and Notification; Hadoop Ecosystem via ES-Hadoop]
Capacity Planning: It depends... https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
Capacity Planning: What is your use case?
● Full text search
● Logging/Metrics
● Complex aggregations with a lot of users
Each use case needs a different cluster configuration.
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
Capacity Planning: Let us take logging
● Inflow of data
○ Per day: 10GB
○ Per month: 300GB
○ Per year: 3600GB
● Data retention
○ 15 days
● High availability (replication factor)
○ 1, i.e., 7200GB per year
● Master nodes: X
● Data nodes: X
● Type of queries
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
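
The arithmetic behind these figures can be sketched as below. The 10GB per day and the replication factor of 1 come from the slide. The 15-day retained figure is derived here rather than stated on the slide, and the calculation ignores indexing overhead and compression.

```python
def storage_gb(daily_gb, days, replicas):
    """Raw size for `days` of data, counting the primary plus each replica copy."""
    return daily_gb * days * (1 + replicas)


DAILY_GB = 10  # from the slide
REPLICAS = 1   # from the slide

per_month = storage_gb(DAILY_GB, 30, replicas=0)            # 300
per_year = storage_gb(DAILY_GB, 360, replicas=0)            # 3600 (12 months of 30 days)
per_year_replicated = storage_gb(DAILY_GB, 360, REPLICAS)   # 7200
retained = storage_gb(DAILY_GB, 15, REPLICAS)               # 300 on disk with 15-day retention

print(per_month, per_year, per_year_replicated, retained)
```

With a 15-day retention the cluster only ever holds the last figure, so retention, more than yearly inflow, drives the disk requirement.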
Capacity Planning: Hardware Recommendations
● SSDs are the best
● Local disk is king!
● Prefer medium-sized machines over large ones
● Give only 50% of your RAM to Elasticsearch
● Don’t cross 32GB of Java heap space
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
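
The last two recommendations can be combined into a small helper. This is a rule-of-thumb sketch: the 31GB cap is an assumption chosen to stay safely under the 32GB limit from the slide, and the function name is hypothetical.

```python
def recommended_heap_gb(ram_gb, heap_cap_gb=31):
    """Half of the machine's RAM for the JVM heap, capped below 32GB."""
    # heap_cap_gb=31 is an assumed safety margin, not a figure from the slides.
    return min(ram_gb / 2, heap_cap_gb)


for ram in (16, 64, 128):
    print(ram, "GB RAM ->", recommended_heap_gb(ram), "GB heap")
```

The chosen value is normally applied through the `-Xms` and `-Xmx` JVM options, set to the same size. The remaining memory is left to the operating system's filesystem cache.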
[Architecture diagram repeated from the earlier slide, showing hot and warm data nodes] https://www.elastic.co/blog/hot-warm-architecture-in-elasticsearch-5-x
training.elastic.co
Resources
• https://www.elastic.co/learn
• https://www.elastic.co/blog/category/engineering
• https://discuss.elastic.co/
• https://fb.com/groups/ElasticIndiaUserGroup
• https://elastic.co/community
Fin! discuss.elastic.co | aravind@elastic.co | @aravindputrevu