A presentation at Search Meetup Hamburg in Hamburg, Germany by Alexander Reelsen
Finding e-commerce products using Elasticsearch Alexander Reelsen | @spinscale alex@elastic.co
search is hard!
search in ecommerce is harder
good data & good searches
bad data & smart searches
good data & worst searches
Agenda » facetted navigation » search bar » clean data » smart searches » synonyms » UOM » decompounding » relevancy » variants » deduplication » search as you type » analytics » data quality » mobile » product detail page » LTR » multi language » ETIME
demo
search bar
smart searches
smart searches » nike running hoodie xl » brand: nike » size: xl
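The brand/size annotation on this slide can be sketched as a small query parser. This is only an illustration: the lookup tables `KNOWN_BRANDS` and `KNOWN_SIZES`, the field names `title`, `brand` and `size`, and both function names are assumptions, not something shown in the talk.

```python
from dataclasses import dataclass, field

# Hypothetical lookup tables; in practice they would be derived from the catalog.
KNOWN_BRANDS = {"nike", "adidas", "puma"}
KNOWN_SIZES = {"xs", "s", "m", "l", "xl", "xxl"}


@dataclass
class ParsedQuery:
    text: str                                    # remaining free-text part
    filters: dict = field(default_factory=dict)  # recognised structured parts


def parse_query(raw: str) -> ParsedQuery:
    """Split a raw query into free text plus brand/size filters."""
    filters = {}
    remaining = []
    for token in raw.lower().split():
        if token in KNOWN_BRANDS and "brand" not in filters:
            filters["brand"] = token
        elif token in KNOWN_SIZES and "size" not in filters:
            filters["size"] = token
        else:
            remaining.append(token)
    return ParsedQuery(text=" ".join(remaining), filters=filters)


def to_es_query(parsed: ParsedQuery) -> dict:
    """Build a bool query: full-text match plus exact term filters."""
    must = [{"match": {"title": parsed.text}}] if parsed.text else [{"match_all": {}}]
    return {
        "query": {
            "bool": {
                "must": must,
                "filter": [{"term": {k: v}} for k, v in parsed.filters.items()],
            }
        }
    }


parsed = parse_query("nike running hoodie xl")
print(parsed.text, parsed.filters)
```

Moving brand and size into filters keeps them out of scoring, so only the free-text part ("running hoodie") influences the ranking.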
clean data
clean data » Hardest thing to do ever » Formats being accepted? JSON, XML, CSV, EDIFACT? » How to train merchants? » Another local cleansing step? Accountability on failure? » If you fail here, stop optimising your search! » indexing pipeline: applying synonyms?
synonyms
synonyms » topf => kochtopf » naik => nike » portmonee => geldbörse » who maintains this list? » who keeps it updated? » who matches this against your worst queries, that return 0 hits? » reloadable without index closing (since ES 7.3)
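A possible setup for the reloadable synonyms mentioned above, written as a Python dict that could be sent as the index creation body. The filter name `product_synonyms`, the analyzer name `product_search`, the file path and the `title` field are assumptions. Updateable synonym filters can only be used in search-time analyzers, which is why the field gets a separate `search_analyzer`. The helper at the end is a hypothetical way to check zero-hit queries against the maintained list.

```python
import json

# Synonym rules from the slide, in the Solr synonym format.
SYNONYM_RULES = ["topf => kochtopf", "naik => nike", "portmonee => geldbörse"]

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "product_synonyms": {
                    "type": "synonym_graph",
                    "synonyms_path": "analysis/synonyms.txt",  # assumed path
                    "updateable": True,  # allows reloading without closing the index
                }
            },
            "analyzer": {
                "product_search": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "product_synonyms"],
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "standard",
                "search_analyzer": "product_search",
            }
        }
    },
}


def uncovered_zero_hit_queries(zero_hit_queries, rules):
    """Return zero-hit queries that no synonym rule rewrites yet."""
    covered = {rule.split("=>")[0].strip().lower() for rule in rules}
    return [q for q in zero_hit_queries if q.strip().lower() not in covered]


print(json.dumps(index_body["settings"], indent=2, ensure_ascii=False))
print(uncovered_zero_hit_queries(["naik", "addidas"], SYNONYM_RULES))
```

After the synonyms file has been changed on the nodes, a `POST /<index>/_reload_search_analyzers` request picks up the new rules.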
UOM
UOM » Unit of Measure (100cm vs. 1m) » Requires normalization: part of data cleansing » Dissecting into a base unit and a value in order to query » Who is doing this already? » JSR 385: Units of Measurement API 2.0 » Could be done in an Ingest Processor
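Dissecting a measure into a value and a base unit could look roughly like the sketch below. The conversion table is deliberately tiny and the function name `normalize` is an assumption; a real pipeline would more likely rely on a units library or run this inside an ingest processor.

```python
import re
from decimal import Decimal

# Conversion factors to one base unit per dimension (metre, gram).
TO_BASE = {
    "mm": ("m", Decimal("0.001")),
    "cm": ("m", Decimal("0.01")),
    "m": ("m", Decimal("1")),
    "g": ("g", Decimal("1")),
    "kg": ("g", Decimal("1000")),
}

# A number (dot or comma as decimal separator) followed by a unit, e.g. "100cm".
MEASURE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*([a-zA-Z]+)\s*$")


def normalize(measure: str):
    """Return (value_in_base_unit, base_unit) for inputs like '100cm' or '1 m'."""
    match = MEASURE.match(measure)
    if match is None:
        raise ValueError(f"cannot parse measure: {measure!r}")
    value, unit = match.groups()
    unit = unit.lower()
    if unit not in TO_BASE:
        raise ValueError(f"unknown unit: {unit!r}")
    base_unit, factor = TO_BASE[unit]
    return Decimal(value.replace(",", ".")) * factor, base_unit


# Both spellings end up as the same value in the same base unit.
print(normalize("100cm") == normalize("1m"))
```

Storing the normalized value in a numeric field and the base unit in a keyword field makes range queries such as "between 0.5 m and 2 m" independent of how the merchant wrote the measure.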
decompounding
relevancy
relevancy » relevancy needs to be defined by the business owners (who rarely understand it) » BM25 is not the score you are looking for » need to incorporate business/product metrics » e.g. provision, item in stock, location, free shipping, last sale
relevancy » Search for ‘bicycle’ » Are 20 different bikes relevant results? » What about locks, lights, clothes? Maybe go with 10 bikes, 3 accessories? » User bought a bike three months ago, maybe they are searching for equipment? Or a replacement tire?
relevancy » are there certain products you always want to score higher?
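One way to fold business metrics into the text score is a `function_score` query. The sketch below builds such a request body. The field names `title`, `in_stock`, `free_shipping` and `margin`, the weights, and the `promoted_ids` parameter are all assumptions chosen for illustration; the right metrics and weights have to come from the business owners.

```python
def business_scored_query(user_query, promoted_ids=()):
    """Combine the BM25 text score with business signals."""
    functions = [
        # Items that can actually be sold get a higher score.
        {"filter": {"term": {"in_stock": True}}, "weight": 2.0},
        {"filter": {"term": {"free_shipping": True}}, "weight": 1.2},
        # Dampened numeric signal; documents without the field count as 1.
        {
            "field_value_factor": {
                "field": "margin",
                "modifier": "log1p",
                "missing": 1,
            }
        },
    ]
    if promoted_ids:
        # Products that should always score higher.
        functions.append(
            {"filter": {"ids": {"values": list(promoted_ids)}}, "weight": 10.0}
        )
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": user_query}},
                "functions": functions,
                "score_mode": "sum",       # add up the business signals
                "boost_mode": "multiply",  # then scale the text score with them
            }
        }
    }


body = business_scored_query("bicycle", promoted_ids=["sku-1"])
print(len(body["query"]["function_score"]["functions"]))
```

With `boost_mode` set to `multiply` the text match still decides whether a product is a candidate at all, while the business signals only reorder the candidates.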
variants
variants » how to model variants and their differences? » just attributes? and price? product title and description? » search: across all variants or the main products? » display: variants as own results or group them? » display: what happens when one product is out of stock?
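For the "group them" option, one possible model is one document per variant that carries the ID of its main product, grouped at query time with field collapsing. The sketch assumes a keyword field `product_id` and a numeric field `price`; neither name comes from the talk.

```python
def grouped_variant_search(user_query, page_size=10, variants_per_product=5):
    """Search across all variants, but return one hit per main product."""
    return {
        "size": page_size,
        "query": {"match": {"title": user_query}},
        "collapse": {
            "field": "product_id",  # assumed keyword field shared by all variants
            "inner_hits": {
                "name": "variants",
                "size": variants_per_product,
                "sort": [{"price": "asc"}],  # cheapest variants first
            },
        },
    }


body = grouped_variant_search("running hoodie")
print(body["collapse"]["field"])
```

Each top-level hit then represents a main product, and its `variants` inner hits can be rendered as colour or size options on the result tile.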
deduplication
deduplication » Safe: ISBN, ASIN » Unsafe: Product images, description, name, release date, size » query time or index time?
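A sketch of index-time deduplication that only trusts the safe identifiers from the slide. The dictionary keys `isbn` and `asin` and both function names are assumptions; products without a safe identifier are kept as they are instead of being guessed at.

```python
def dedup_key(product):
    """Return a key built from a safe identifier, or None if there is none."""
    for name in ("isbn", "asin"):
        value = product.get(name)
        if value:
            return f"{name}:{value.replace('-', '').upper()}"
    return None


def deduplicate(products):
    """Keep the first product per safe key; never merge on unsafe attributes."""
    seen = set()
    unique = []
    for product in products:
        key = dedup_key(product)
        if key is None:
            unique.append(product)  # no safe identifier: keep the product
        elif key not in seen:
            seen.add(key)
            unique.append(product)
    return unique


feed = [
    {"title": "Search Book", "isbn": "978-3-16-148410-0"},
    {"title": "Search Book (2nd merchant)", "isbn": "9783161484100"},
    {"title": "Hoodie"},
]
print(len(deduplicate(feed)))
```

Using the key as the document ID would be an alternative at index time, since indexing a second document with the same ID overwrites the first one.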
search as you type
search as you type » “The importance of search-as-you-type cannot be overstated” » Hint: make a user test first. There are users who do not look up when typing! » Speed is key » Rank your suggestions on your own criteria! » Ensure exact hits are scored up (brown fox vs. brown foxes) » Steer the user without showing any search results » Possibly an own index with reduced result set » Analyze searches and adapt to follow trends
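Ranking suggestions on your own criteria while scoring exact hits up can be illustrated in plain Python. The popularity numbers stand in for whatever criterion is chosen (past search counts here, as an assumption), and `rank_suggestions` is a hypothetical helper, not an Elasticsearch API.

```python
def rank_suggestions(prefix, candidates, limit=5):
    """Rank suggestions: exact hit first, then by popularity, then alphabetically.

    candidates maps a suggestion text to a popularity value.
    """
    needle = prefix.strip().lower()
    matches = [
        (text, popularity)
        for text, popularity in candidates.items()
        if text.lower().startswith(needle)
    ]
    matches.sort(key=lambda item: (item[0].lower() != needle, -item[1], item[0]))
    return [text for text, _ in matches[:limit]]


popularity = {"brown foxes": 50, "brown fox": 10, "brown bear": 30, "red fox": 99}
print(rank_suggestions("brown fox", popularity))
print(rank_suggestions("brown", popularity))
```

For "brown fox" the exact suggestion wins even though "brown foxes" is more popular; for the shorter prefix "brown" popularity alone decides the order.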
analytics
analytics » conversion rate » search results with zero hits » “one search and out” » busiest hours (planning downtime) » recommendations
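Three of the metrics above can be computed from a simple event log. The event shape (`session_id`, `type`, `hits`) is an assumption made for this sketch, and "one search and out" is interpreted here as a session that consists of exactly one search and nothing else.

```python
from collections import defaultdict


def search_metrics(events):
    """Compute zero-hit rate, one-search-and-out rate and conversion rate.

    events is a list of dicts with 'session_id' and 'type'
    ('search', 'click' or 'purchase'); search events also carry 'hits'.
    """
    searches = [e for e in events if e["type"] == "search"]
    zero_hit = sum(1 for e in searches if e["hits"] == 0)

    by_session = defaultdict(list)
    for e in events:
        by_session[e["session_id"]].append(e["type"])
    search_sessions = [types for types in by_session.values() if "search" in types]

    one_and_out = sum(1 for types in search_sessions if types == ["search"])
    converted = sum(1 for types in search_sessions if "purchase" in types)
    total = len(search_sessions)
    return {
        "zero_hit_rate": zero_hit / len(searches) if searches else 0.0,
        "one_search_and_out_rate": one_and_out / total if total else 0.0,
        "conversion_rate": converted / total if total else 0.0,
    }


log = [
    {"session_id": "a", "type": "search", "hits": 12},
    {"session_id": "a", "type": "click"},
    {"session_id": "a", "type": "purchase"},
    {"session_id": "b", "type": "search", "hits": 0},
    {"session_id": "c", "type": "search", "hits": 3},
    {"session_id": "c", "type": "search", "hits": 0},
    {"session_id": "d", "type": "search", "hits": 5},
    {"session_id": "d", "type": "click"},
]
print(search_metrics(log))
```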
product detail page
product detail page » crucial to make a sale » what to display if the product is out of stock? » what to display if the product is EOL? » dynamic price calculation
LTR
summary
summary » ecommerce search is complex » so many things to take into account… » untold: index strategies, updates, management » always have a middleware (UI, query injection, a/b testing, landing pages, redirects, query logging, business owner endpoint)
search ui https://github.com/elastic/search-ui
Elastic App Search
Elastic App Search https://www.elastic.co/blog/elastic-app-search-7-2-0-released
books
links
links » https://project-a.github.io/on-site-search-design-patterns-for-e-commerce/
Thank you for listening! Alexander Reelsen @spinscale alex@elastic.co
This talk covers the different aspects to keep in mind when running a product search engine in an ecommerce platform. That search engine is usually a direct driver of your conversion and thus needs to be handled with care. After a small demo of facetted/aggregated search, most of the talk deals with more advanced topics like proper data modeling, further query strategies, ranking strategies, feedback loops and query parsing.