Finding e-commerce products using Elasticsearch
Alexander Reelsen | @spinscale alex@elastic.co
Slide 2
search is hard!
Slide 3
search in ecommerce is harder
Slide 4
good data & good searches
Slide 5
bad data & smart searches
Slide 6
good data & worst searches
Slides 7-25
Agenda » faceted navigation » search bar » clean data » smart searches » synonyms » UOM » decompounding » relevancy » variants » deduplication » search as you type » analytics » data quality » mobile » product detail page » LTR » multi language » ETIME
clean data » Hardest thing to do ever » Formats being accepted? JSON, XML, CSV, EDIFACT? » How to train merchants? » Another local cleansing step? Accountability on failure? » If you fail here, stop optimising your search! » indexing pipeline: applying synonyms?
Slide 35
synonyms
Slide 36
synonyms » topf => kochtopf » naik => nike » portmonee => geldbörse » who maintains this list? » who keeps it updated? » who matches this against your worst queries, that return 0 hits? » reloadable without index closing (since ES 7.3)
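A minimal sketch of the reloadable setup mentioned above, assuming the synonyms live in a file on each node and are applied at search time only. The index name `products`, the field `title`, and the filter and analyzer names are hypothetical.

```python
# Synonym rules from the slide, in Solr format (one rule per line).
SYNONYM_RULES = [
    "topf => kochtopf",
    "naik => nike",
    "portmonee => geldbörse",
]


def synonyms_file_content(rules):
    """Render the rules as the text file Elasticsearch reads from its config dir."""
    return "\n".join(rules) + "\n"


def index_body(synonyms_path="analysis/synonyms.txt"):
    """Index settings with an updateable synonym filter.

    Updateable filters may only be used in a search analyzer, so the field
    keeps a plain index-time analyzer (names here are assumptions).
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "product_synonyms": {
                        "type": "synonym_graph",
                        "synonyms_path": synonyms_path,
                        "updateable": True,
                    }
                },
                "analyzer": {
                    "product_search": {
                        "tokenizer": "standard",
                        "filter": ["lowercase", "product_synonyms"],
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "title": {
                    "type": "text",
                    "analyzer": "standard",
                    "search_analyzer": "product_search",
                }
            }
        },
    }


# After the file has been updated on every node, reload without closing the index.
RELOAD_REQUEST = "POST /products/_reload_search_analyzers"
```

The body would be sent as `PUT /products`; the questions of who maintains the list and who matches it against zero-hit queries remain organisational, not technical.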
Slide 37
UOM
Slide 38
UOM » Unit of Measure (100cm vs. 1m) » Requires normalization: part of data cleansing » Dissecting into a base unit and a value in order to query » Who is doing this already? » JSR 385: Units of Measurement API 2.0 » Could be done in an Ingest Processor
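The dissection into a base unit and a value could look roughly like the following. This is a sketch only: the conversion table is a tiny hypothetical subset, and a real pipeline would more likely rely on a units library (such as a JSR 385 implementation) or an ingest processor.

```python
import re
from decimal import Decimal

# Hypothetical, minimal conversion table: unit -> (base unit, factor).
TO_BASE = {
    "mm": ("m", Decimal("0.001")),
    "cm": ("m", Decimal("0.01")),
    "m": ("m", Decimal("1")),
    "g": ("g", Decimal("1")),
    "kg": ("g", Decimal("1000")),
}

# A number (dot or comma as decimal separator) followed by a unit symbol.
PATTERN = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*([a-zA-Z]+)\s*$")


def normalize(raw):
    """Split a string like '100cm' into a value in the base unit plus that unit."""
    match = PATTERN.match(raw)
    if match is None:
        raise ValueError(f"cannot parse measurement: {raw!r}")
    number, unit = match.groups()
    unit = unit.lower()  # simplification: treats 'CM' and 'cm' alike
    if unit not in TO_BASE:
        raise ValueError(f"unknown unit: {unit!r}")
    base_unit, factor = TO_BASE[unit]
    value = Decimal(number.replace(",", ".")) * factor
    return {"value": value, "unit": base_unit}
```

With both fields indexed, `100cm` and `1m` end up with the same value and unit, so a range or term query on the normalized value matches either spelling.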
Slide 39
decompounding
Slide 40
decompounding
Slide 41
decompounding
Slide 42
decompounding
Slide 43
relevancy
Slide 44
relevancy » relevancy needs to be defined by the business owners (who rarely understand it) » BM25 is not the score you are looking for » need to incorporate business/product metrics » commission, item in stock, location, free shipping, last sale
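One possible way to fold such business metrics into the text score is a `function_score` query. The sketch below only builds the request body; the field names (`title`, `in_stock`, `free_shipping`, `commission`) and the weights are assumptions for illustration, not values from the talk.

```python
def business_scored_query(user_query):
    """Combine the text score with business signals (hypothetical fields)."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": user_query}},
                "functions": [
                    # Baseline so the summed function score never drops to zero.
                    {"weight": 1.0},
                    # Prefer items that can actually be shipped.
                    {"filter": {"term": {"in_stock": True}}, "weight": 2.0},
                    {"filter": {"term": {"free_shipping": True}}, "weight": 0.5},
                    # Dampened influence of a numeric business metric.
                    {
                        "field_value_factor": {
                            "field": "commission",
                            "modifier": "log1p",
                            "missing": 0,
                        }
                    },
                ],
                "score_mode": "sum",       # add up all matching functions
                "boost_mode": "multiply",  # then scale the text score
            }
        }
    }
```

Which signals to use and how to weight them is exactly the part the business owners have to define; the query is only the mechanism.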
Slide 45
relevancy » Search for ‘bicycle’ » Are 20 different bikes relevant results? » What about locks, lights, clothes? Maybe go with 10 bikes, 3 accessories? » User bought a bike three months ago, maybe they are searching for equipment? Or a replacement tire?
Slide 46
relevancy » are there certain products you always want to score higher?
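If certain products should always rank first, one option in recent Elasticsearch 7.x releases is the `pinned` query, which places a fixed list of document IDs above the organic results. A minimal sketch of the request body, with a hypothetical `title` field and example IDs:

```python
def pinned_products_query(user_query, pinned_ids):
    """Show the given document IDs first, followed by the normal text matches."""
    return {
        "query": {
            "pinned": {
                "ids": list(pinned_ids),
                "organic": {"match": {"title": user_query}},
            }
        }
    }


# Example: two promoted products for a campaign (IDs are made up).
example = pinned_products_query("bicycle", ["sku-1001", "sku-2002"])
```

The list of pinned IDs would typically come from the middleware rather than being hard-coded in the search application.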
Slide 47
relevancy
Slide 48
variants
Slide 49
variants
Slide 50
variants » how to model variants and their differences? » just attributes? and price? product title and description? » search: across all variants or the main products? » display: variants as own results or group them? » display: what happens when one product is out of stock?
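For the "group them" display option, a sketch assuming every variant is indexed as its own document and carries a shared `product_id` keyword field (a modelling assumption, as are `title` and `price`). Field collapsing then returns one hit per product, with a few variants attached as inner hits.

```python
def grouped_variant_search(user_query, variants_per_product=3):
    """Search across all variants, but return one result per main product."""
    return {
        "query": {"match": {"title": user_query}},
        "collapse": {
            # Must be a keyword or numeric field with doc values.
            "field": "product_id",
            "inner_hits": {
                "name": "variants",
                "size": variants_per_product,
                "sort": [{"price": "asc"}],  # cheapest variants first
            },
        },
    }
```

Searching across variants while displaying grouped products is only one answer to the questions above; out-of-stock handling still has to be decided separately.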
Slide 51
variants
Slide 52
deduplication
Slide 53
deduplication » Safe: ISBN, ASIN » Unsafe: Product images, description, name, release date, size » query time or index time?
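An index-time variant could be sketched as follows, assuming products arrive as dictionaries with optional `isbn` and `asin` keys (hypothetical field names). Only the safe identifiers are used; records without one are kept rather than guessed at.

```python
def deduplicate(products, key_fields=("isbn", "asin")):
    """Drop records whose safe identifier was already seen; keep the first one."""
    seen = set()
    unique = []
    for product in products:
        # First safe identifier present on the record, as a (field, value) pair.
        key = next(
            ((field, product[field]) for field in key_fields if product.get(field)),
            None,
        )
        if key is None:
            unique.append(product)  # no safe identifier: keep, do not guess
            continue
        if key in seen:
            continue
        seen.add(key)
        unique.append(product)
    return unique
```

The same key could also serve as the document ID, so that re-indexing a duplicate overwrites instead of adding a second document.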
Slide 54
deduplication
Slide 55
search as you type
Slide 56
search as you type » “The importance of search-as-you-type cannot be overstated” » Hint: make a user test first. There are users who do not look up when typing! » Speed is key » Rank your suggestions on your own criteria! » Ensure exact hits are scored up (brown fox vs. brown foxes) » Steer the user without showing any search results » Possibly an own index with reduced result set » Analyze searches and adapt to follow trends
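One way to build this on a dedicated suggestion index is the `search_as_you_type` field type combined with a `bool_prefix` multi-match query. The sketch below only builds the request bodies; the field name `suggest` and the result size are assumptions.

```python
def suggestion_index_body():
    """Mapping for a small, dedicated suggestion index."""
    return {
        "mappings": {
            "properties": {
                "suggest": {"type": "search_as_you_type"},
            }
        }
    }


def suggestion_query(typed_text, size=5):
    """Match complete terms plus a prefix on the last, still incomplete term."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": typed_text,
                "type": "bool_prefix",
                # The shingle subfields are created automatically by the field type.
                "fields": ["suggest", "suggest._2gram", "suggest._3gram"],
            }
        },
    }
```

Ranking on your own criteria and boosting exact hits would be layered on top, for example by wrapping the query in a scoring function fed from search analytics.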
Slide 57
search as you type
Slide 58
analytics
Slide 59
analytics » conversion rate » search results with zero hits » “one search and out” » busiest hours (planning downtime) » recommendations
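The zero-hit metric can be derived from a plain query log. A minimal sketch, assuming each log entry is a dictionary with a `query` string and a `hits` count (a hypothetical log format):

```python
from collections import Counter


def zero_hit_report(log_entries, top_n=3):
    """Share of searches without results, plus the most frequent offenders."""
    total = 0
    zero_hits = Counter()
    for entry in log_entries:
        total += 1
        if entry["hits"] == 0:
            # Normalize lightly so 'Naik' and 'naik ' count as one query.
            zero_hits[entry["query"].strip().lower()] += 1
    rate = sum(zero_hits.values()) / total if total else 0.0
    return {
        "zero_hit_rate": rate,
        "top_zero_hit_queries": zero_hits.most_common(top_n),
    }
```

The top zero-hit queries are natural candidates for new synonym rules or for data cleansing upstream.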
Slide 60
product detail page
Slide 61
product detail page » crucial to make a sale » what to display if the product is out of stock? » what to display if the product is EOL? » dynamic price calculation
Slide 62
LTR
Slide 63
LTR
Slide 64
summary
Slide 65
summary » ecommerce search is complex » so many things to take into account… » untold: index strategies, updates, management » always have a middleware (UI, query injection, a/b testing, landing pages, redirects, query logging, business owner endpoint)