Introduction into Elasticsearch & Spring Data Elasticsearch

A presentation at JUG CH in June 2020 by Alexander Reelsen

Slide 1

Introduction into Elasticsearch & Spring Data Elasticsearch Alexander Reelsen Community Advocate alex@elastic.co | @spinscale

Slide 2

TOC
- Why do you need a search engine in your app?
- Introduction into Elasticsearch
- Introduction into Spring Data Elasticsearch
- Demo
- Running Elasticsearch: Scaling your cluster
- Next steps

Slide 3

Why do you need a search engine? … or any data store

Slide 4

Speed, Scale & Relevance

Slide 5

Speed

Slide 6

Scale

Slide 7

Relevance

Slide 8

… and much more
- NRT: Searching & Indexing
- Read scalability & write scalability
- Resiliency
- Operational simplicity & monitoring capabilities
- Developer experience
- Infrastructure integration
- Team experience
- Use-Cases: Observability, Workplace Search, Security, Product Search, Wikipedia

Slide 9

Product Overview

Slide 10

Solutions

Slide 11

Elastic Stack building blocks

Slide 12

Deployment options

Slide 13

Licensing

Slide 14

Elastic Stack building blocks

Slide 15

Elasticsearch in 10 seconds
- Search Engine (FTS, Analytics, Geo), near real-time
- Distributed, scalable, highly available, resilient
- Interface: HTTP & JSON
- Heart of the Elastic Stack (Kibana, Logstash, Beats)
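
To make the "Interface: HTTP & JSON" point concrete, here is a minimal sketch that builds a full-text search request using only the JDK's `java.net.http` classes (Java 11+). The class name, the index `persons`, the field `name`, and the node address `http://localhost:9200` are assumptions for illustration. The request is only built, not sent, and the naive string formatting does no JSON escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class SearchRequestSketch {

    // JSON body of a match query (full-text search on a single field).
    // Assumption: field and text contain no characters that need JSON escaping.
    static String matchQueryBody(String field, String text) {
        return String.format("{\"query\":{\"match\":{\"%s\":\"%s\"}}}", field, text);
    }

    // Builds, but does not send, a POST request to the index's _search endpoint.
    static HttpRequest buildSearch(String baseUrl, String index, String field, String text) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_search"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(matchQueryBody(field, text)))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical index and field names.
        HttpRequest request = buildSearch("http://localhost:9200", "persons", "name", "alex");
        System.out.println(request.method() + " " + request.uri());
        System.out.println(matchQueryBody("name", "alex"));
    }
}
```

With a node running locally, the request could be sent via `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`. In a Spring application the REST client and Spring Data Elasticsearch take care of this plumbing.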

Slide 16

Installation & Start

    # https://www.elastic.co/downloads/elasticsearch
    wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.8.0-darwin-x86_64.tar.gz
    # wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.8.0-linux-x86_64.tar.gz
    # wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.8.0-windows-x86_64.zip
    tar zxf elasticsearch-7.8.0-darwin-x86_64.tar.gz
    cd elasticsearch-7.8.0
    ./bin/elasticsearch

    # in a second terminal, since Elasticsearch keeps running in the foreground
    wget https://artifacts.elastic.co/downloads/kibana/kibana-7.8.0-darwin-x86_64.tar.gz
    # wget https://artifacts.elastic.co/downloads/kibana/kibana-7.8.0-linux-x86_64.tar.gz
    # wget https://artifacts.elastic.co/downloads/kibana/kibana-7.8.0-windows-x86_64.zip
    tar zxf kibana-7.8.0-darwin-x86_64.tar.gz
    # the Kibana archive unpacks into a directory that includes the platform suffix
    cd kibana-7.8.0-darwin-x86_64
    ./bin/kibana

Point your browser to http://localhost:5601/

Slide 17

Click Dev-Tools
- Samples in Kibana
- Samples in GitHub

Slide 18

Slide 19

Introduction into Spring Data Elasticsearch
- Community maintained Spring Data Extension
- Reactive extension
- Make sure to use major version 4 (based on Elasticsearch 7.x), default in Spring Boot 2.3
- Uses the Elasticsearch REST Client

Slide 20

Elasticsearch REST client
- Depends on the Elasticsearch core project
- Based on Apache HTTP Client (works on Java 8), might want to consider shading
- Supports synchronous calls & cancellable async calls
- Threadsafe
- RestClient
- RestHighLevelClient

Slide 21

Elasticsearch REST client architecture

Slide 22

Spring Data Elasticsearch - Basics
- ElasticsearchTemplate & ElasticsearchRestTemplate
- MappingElasticsearchConverter
- CrudRepository
- Auditing, Entity Callbacks, efficient scroll searching

Slide 23

Spring Data Elasticsearch - Entities

    @Document(indexName = "persons", shards = 1, createIndex = false)
    public class Person {

        @Id
        private String id;

        private String name;

        @Email
        @Field(type = FieldType.Keyword)
        private String email;

        @Field(name = "created_at", type = FieldType.Date, format = DateFormat.date_time)
        private Date createdAt;

        @Size(max = 500)
        @Pattern(regexp = "https?://.*", message = "must start with http:// or https://")
        @URL
        @Field(type = FieldType.Keyword)
        private String url;
    }

Further field examples:

    private List<Person> friends; // creates an array
    private Point location;       // maps to geo_point

Slide 24

Spring Data Elasticsearch - Repositories

    import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

    public interface UserProfileRepository extends ElasticsearchRepository<UserProfile, String> {
    }

- Dynamic finders like findByEmail(String email)
- Attention: Inefficient queries like findByDescriptionEndingWith()

Slide 25

Spring Data Elasticsearch - Searching

    final BoolQueryBuilder qb = QueryBuilders.boolQuery()
        // somewhat stable randomization to make sure users get an arbitrary document
        .must(QueryBuilders.scriptScoreQuery(QueryBuilders.matchAllQuery(),
            new Script("randomScore(1000, 'created_at')")))
        // only consider contributions that are not yet approved
        .filter(QueryBuilders.termQuery("state", Contribution.State.CREATED.name()))
        // ensure only contributions from the same region are shown
        .filter(QueryBuilders.termQuery("region", profile.getRegion()))
        // only consider languages spoken by the user as well as english
        .filter(QueryBuilders.termsQuery("language", languages))
        // exclude documents that were created by this user
        .mustNot(QueryBuilders.termQuery("submitted_by.email", profile.getEmail()))
        // exclude documents that were already voted on
        .mustNot(QueryBuilders.termQuery("comments.submitted_by.email", profile.getEmail()));

    Query query = new NativeSearchQuery(qb).setPageable(PageRequest.of(0, 1));
    final SearchHit<Contribution> result = elasticsearchRestTemplate.searchOne(query, Contribution.class);

Slide 26

Spring Data Elasticsearch - Count

    private boolean canSubmitMoreContributions(String email) {
        final BoolQueryBuilder qb = QueryBuilders.boolQuery()
            .filter(QueryBuilders.termQuery("submitted_by.email", email))
            .filter(QueryBuilders.rangeQuery("created_at").gte("now-1d"));
        final long recentlySubmittedCount =
            elasticsearchRestTemplate.count(new NativeSearchQuery(qb), Contribution.class);
        return recentlySubmittedCount <= 10;
    }

Slide 27

Spring Data Elasticsearch - Count

    public interface ContributionRepository extends ElasticsearchRepository<UserProfile, String> {

        // inner double quotes must be escaped inside the Java string literal
        @Query("{\"bool\": { \"must\": [ { \"term\": { \"submitted_by.email\": \"?0\" } }, { \"range\": { \"created_at\": { \"gte\": \"?1\" } } } ] } }")
        long countRecentContributions(String email, String date);
    }

Slide 28

Spring Data Elasticsearch - Aggregations

    // filter by approved
    final BoolQueryBuilder qb = QueryBuilders.boolQuery()
        .filter(QueryBuilders.termQuery("state", Contribution.State.APPROVED.name()))
        .filter(QueryBuilders.termQuery("region", region.name()));
    final NativeSearchQuery query = new NativeSearchQuery(qb);

    // aggregate on username, get top 10, sum up score
    query.addAggregation(AggregationBuilders.terms("by_user").field("submitted_by.email").size(40)
        .subAggregation(AggregationBuilders.sum("total_score").field("score"))
        // make sure we get the full name of the last contribution
        .subAggregation(AggregationBuilders.topHits("by_name").size(1)
            .sort(SortBuilders.fieldSort("created_at").order(SortOrder.DESC))
            .fetchSource("submitted_by.full_name", "")));
    query.setPageable(Pageable.unpaged());

    final SearchHits<Contribution> hits = elasticsearchRestTemplate.search(query, Contribution.class);
    // returns an Elasticsearch class
    final Aggregations aggregations = hits.getAggregations();

Slide 29

Demo

Slide 30

Running Elasticsearch: Scaling your cluster
- Do not overshard: a single shard can easily contain 20-50GB
- Let the filesystem cache get to work
- Performance test, on your data! Use Rally
- Hint: Capacity Planning Webinar
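
The "do not overshard" advice can be turned into a rough back-of-the-envelope calculation. The sketch below is not an official sizing formula. It divides an expected index size by a target shard size taken from the 20-50GB range above and rounds up. The class and method names and the example sizes are made up for illustration, and a benchmark on your own data should have the final word.

```java
public class ShardEstimate {

    // Smallest number of primary shards so that no shard exceeds targetShardGb,
    // assuming data is spread evenly across shards. Always at least 1.
    static int primaryShards(double expectedIndexGb, double targetShardGb) {
        if (targetShardGb <= 0) {
            throw new IllegalArgumentException("targetShardGb must be positive");
        }
        return Math.max(1, (int) Math.ceil(expectedIndexGb / targetShardGb));
    }

    public static void main(String[] args) {
        // Hypothetical sizes: a 120GB index with 40GB shards, and a small 10GB index.
        System.out.println(primaryShards(120, 40)); // prints 3
        System.out.println(primaryShards(10, 40));  // prints 1
    }
}
```

For small indices the result is a single primary shard, which matches the `shards = 1` setting used in the entity example earlier in the talk.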

Slide 31

Slide 32

Compute Resources
- Storage: SSDs for hot data, HDDs for warm/cold, avoid NAS
- Memory: JVM heap + OS cache
- Compute: Thread pool scaling based on CPU count
- Network: The faster the better (careful: cloud providers with burst rates)
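
As an illustration of "thread pool scaling based on CPU count", the following sketch computes default pool sizes under the assumption that the formulas from the Elasticsearch 7.x thread pool documentation apply: `int((processors * 3) / 2) + 1` for the search pool and one thread per allocated processor for the write pool. Check the reference documentation for your version before relying on these numbers. The class and method names are hypothetical.

```java
public class ThreadPoolSketch {

    // Assumed 7.x default for the search pool: int((processors * 3) / 2) + 1
    static int searchThreads(int allocatedProcessors) {
        return (allocatedProcessors * 3) / 2 + 1;
    }

    // Assumed 7.x default for the write pool: one thread per allocated processor
    static int writeThreads(int allocatedProcessors) {
        return allocatedProcessors;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("search threads: " + searchThreads(cpus));
        System.out.println("write threads:  " + writeThreads(cpus));
    }
}
```

Under these assumptions an 8-core node gets 13 search threads and 8 write threads, so adding cores raises the number of requests a node can work on concurrently.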

Slide 33

Next steps
- Improve your search: Learn about mappings and queries
- Improve your model
- Figure out expected throughput
- Use aliases, always!
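
One reason to "use aliases, always" is that an alias can be moved from an old index to a new one in a single atomic call, so the application keeps querying the same name while the data behind it is reindexed. A minimal sketch of the request body for the `_aliases` endpoint follows. The index names `persons-v1` and `persons-v2`, the alias `persons`, and the class name are assumptions for illustration. The string formatting does no JSON escaping.

```java
public class AliasSwapSketch {

    // Body for POST /_aliases: both actions are applied atomically, so the alias
    // never points at zero or two indices in between.
    static String swapAliasBody(String alias, String oldIndex, String newIndex) {
        return String.format(
                "{\"actions\":["
                        + "{\"remove\":{\"index\":\"%s\",\"alias\":\"%s\"}},"
                        + "{\"add\":{\"index\":\"%s\",\"alias\":\"%s\"}}"
                        + "]}",
                oldIndex, alias, newIndex, alias);
    }

    public static void main(String[] args) {
        // Hypothetical names: the application only ever talks to the alias "persons".
        System.out.println(swapAliasBody("persons", "persons-v1", "persons-v2"));
    }
}
```

The body would be sent with `POST /_aliases`. With Spring Data Elasticsearch, setting `indexName` in `@Document` to the alias keeps the entity code unchanged across such a swap.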

Slide 34

Summary
- Search is never done!
- Use the reference documentation
- Ask your users about expectations, do not guess!
- Testing: TestContainers

Slide 35

Resources
- spinscale/link-rating
- Qovery
- Spring Data Elasticsearch Documentation
- Elasticsearch Java REST Client Documentation
- Elasticsearch Nightly Benchmarks

Slide 36

Thanks for listening Q&A Alexander Reelsen Community Advocate alex@elastic.co | @spinscale

Slide 37

Elastic Cloud

Slide 38

Elastic Support Subscriptions

Slide 39

Discuss Forum https://discuss.elastic.co

Slide 40

Community & Meetups https://community.elastic.co

Slide 41

Thanks for listening Q&A Alexander Reelsen Community Advocate alex@elastic.co | @spinscale