Running a Serverless Lucene Reverse Geocoder

A presentation by Alexander Reelsen at JUG Nürnberg in Nuremberg, Germany, in March 2020

Slide 1

Running a Serverless Lucene Reverse Geocoder Alexander Reelsen alex@elastic.co @spinscale

Slide 2

Agenda ‣ What is serverless? ‣ Searching for Locations ‣ Demo ‣ How to execute Java code faster

Slide 3

Serverless?

Slide 4

Serverless? ‣ FaaS (Function as a Service) ‣ Execution Environment as a Service ‣ Payment model: Pay per code runtime ‣ Not running? No bill! ‣ Configure memory size (also changes CPU power) ‣ Maximum function execution time ‣ Provider takes care of scaling functions

Slide 5

Examples ‣ Good: Short-lived HTTP requests ‣ Good: Short-running jobs ‣ Good: Event streaming & processing ‣ Good: Shared-nothing web applications ‣ Good: Parallelizable workloads ‣ Bad: Slack bots?

Slide 6

Providers? ‣ AWS Lambda, GCP Cloud Functions, Azure Functions, Cloudflare, IBM OpenWhisk, Google Cloud Run ‣ Faastruby, Binaris, Spotinst ‣ K8s: Knative, Fission, Kubeless, Nuclio, OpenFaaS ‣ Docker: Fn, OpenFaaS

Slide 7

Java? ‣ Not too well suited for short-lived tasks ‣ JVM startup time ‣ JIT compiler ‣ Dependency initialisation ‣ Application initialisation

Slide 8

Location Tracker: Owntracks + Lambda + Kibana

Slide 9

Slide 10

https://github.com/spinscale/serverless-owntracks-kotlin

Slide 11

Location search

Slide 12

Reverse Geocoder ‣ Input: Latitude, Longitude ‣ Output: readable representation (City)

Slide 13

Search across points ‣ Each city gets indexed with a lat/lon pair ‣ Search for the nearest point to the supplied one ‣ Problem: Neighbours!
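
The point-based lookup above can be sketched in plain Java. This is a brute-force stand-in, not the Lucene-backed search from the talk: the `City` record, the class name, and the haversine distance are assumptions chosen for illustration. It also shows the neighbour problem: a query near the edge of a large city can be closer to the center point of the city next door.

```java
import java.util.List;

/** Brute-force nearest-point lookup; a stand-in for an indexed nearest-neighbour search. */
final class NearestCity {

    /** Hypothetical record for this sketch: a city reduced to a single lat/lon pair. */
    record City(String name, double lat, double lon) {}

    private static final double EARTH_RADIUS_METERS = 6_371_000.0; // mean Earth radius

    /** Great-circle distance in meters, computed with the haversine formula. */
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(a));
    }

    /** Returns the city whose point is closest to the query, or null for an empty list. */
    static City nearest(List<City> cities, double lat, double lon) {
        City best = null;
        double bestDistance = Double.MAX_VALUE;
        for (City city : cities) {
            double distance = haversineMeters(lat, lon, city.lat(), city.lon());
            if (distance < bestDistance) {
                bestDistance = distance;
                best = city;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Approximate center coordinates, for illustration only.
        List<City> cities = List.of(
                new City("Nuremberg", 49.452, 11.077),
                new City("Fuerth", 49.477, 10.989),
                new City("Munich", 48.137, 11.575));
        System.out.println(nearest(cities, 49.45, 11.08).name());
        // A query between the two centers resolves to whichever center point is closer.
        System.out.println(nearest(cities, 49.46, 11.02).name());
    }
}
```

A linear scan is only reasonable for a handful of points; an index is what makes this fast for a full city data set.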

Slide 14

Point based search: Near neighbours

Slide 15

Point based search: Near neighbours

Slide 16

Search across shapes ‣ Each city gets indexed with a lat/lon pair ‣ Certain cities get indexed as a geoshape ‣ Run two searches: ‣ Lat/Lon within any shape ‣ Lat/Lon nearby any point
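
The two-search strategy above might look roughly like this in plain Java: first check whether the point falls within any shape, then fall back to the nearest point. Everything here is an assumption made for the sketch (the `City` record, a ray-casting point-in-polygon test, and a flat-earth distance approximation); the talk itself relies on Lucene's geo queries instead.

```java
import java.util.List;
import java.util.Optional;

/** Shape-first lookup with a nearest-point fallback (illustrative, not Lucene-based). */
final class ShapeThenPointLookup {

    /** Hypothetical record: a center point plus an optional boundary of {lat, lon} vertices. */
    record City(String name, double lat, double lon, double[][] polygon) {}

    /** Ray-casting point-in-polygon test; the polygon is closed implicitly. */
    static boolean contains(double[][] polygon, double lat, double lon) {
        boolean inside = false;
        for (int i = 0, j = polygon.length - 1; i < polygon.length; j = i++) {
            double latI = polygon[i][0], lonI = polygon[i][1];
            double latJ = polygon[j][0], lonJ = polygon[j][1];
            boolean crosses = (latI > lat) != (latJ > lat)
                    && lon < (lonJ - lonI) * (lat - latI) / (latJ - latI) + lonI;
            if (crosses) {
                inside = !inside;
            }
        }
        return inside;
    }

    /** Squared distance in degrees, with longitude scaled by cos(latitude); fine for ranking nearby points. */
    static double approxDistanceSquared(double lat1, double lon1, double lat2, double lon2) {
        double dLat = lat2 - lat1;
        double dLon = (lon2 - lon1) * Math.cos(Math.toRadians((lat1 + lat2) / 2));
        return dLat * dLat + dLon * dLon;
    }

    static Optional<City> lookup(List<City> cities, double lat, double lon) {
        // Search 1: is the point within any indexed shape?
        for (City city : cities) {
            if (city.polygon() != null && contains(city.polygon(), lat, lon)) {
                return Optional.of(city);
            }
        }
        // Search 2: otherwise fall back to the nearest center point.
        City best = null;
        double bestDistance = Double.MAX_VALUE;
        for (City city : cities) {
            double distance = approxDistanceSquared(lat, lon, city.lat(), city.lon());
            if (distance < bestDistance) {
                bestDistance = distance;
                best = city;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

With this ordering, a point inside a large city's boundary resolves to that city even when a neighbouring city's center point is closer.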

Slide 17

Geo and Lucene: BFF! ‣ LatLonPoint: two dimensions (latitude, longitude), 4 bytes each ‣ LatLonShape: triangular mesh tessellation
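
To make the "4 bytes each" point concrete, here is a minimal sketch of quantizing a latitude/longitude pair into two 32-bit integers. It is modelled on the general idea of fixed-precision encoding and is not Lucene's implementation: the class name, scale constants, and byte layout are assumptions, and Lucene's sortable byte encoding is left out.

```java
import java.nio.ByteBuffer;

/** Quantizes latitude and longitude into 32-bit integers (illustrative sketch). */
final class LatLonQuantizer {

    // Number of integer steps per degree when the full range is spread over 2^32 values.
    private static final double LAT_SCALE = (1L << 32) / 180.0;
    private static final double LON_SCALE = (1L << 32) / 360.0;

    static int encodeLatitude(double lat) {
        if (Double.isNaN(lat) || lat < -90.0 || lat > 90.0) {
            throw new IllegalArgumentException("invalid latitude: " + lat);
        }
        // The upper bound itself would overflow, so nudge it just below 90.
        double value = (lat == 90.0) ? Math.nextDown(lat) : lat;
        return (int) Math.floor(value * LAT_SCALE);
    }

    static int encodeLongitude(double lon) {
        if (Double.isNaN(lon) || lon < -180.0 || lon > 180.0) {
            throw new IllegalArgumentException("invalid longitude: " + lon);
        }
        double value = (lon == 180.0) ? Math.nextDown(lon) : lon;
        return (int) Math.floor(value * LON_SCALE);
    }

    static double decodeLatitude(int encoded) {
        return encoded / LAT_SCALE;
    }

    static double decodeLongitude(int encoded) {
        return encoded / LON_SCALE;
    }

    /** Packs both dimensions into 8 bytes: 4 for latitude, 4 for longitude. */
    static byte[] toBytes(double lat, double lon) {
        return ByteBuffer.allocate(2 * Integer.BYTES)
                .putInt(encodeLatitude(lat))
                .putInt(encodeLongitude(lon))
                .array();
    }

    public static void main(String[] args) {
        int encoded = encodeLatitude(49.4521);
        System.out.println(encoded + " decodes to " + decodeLatitude(encoded));
        System.out.println("packed length: " + toBytes(49.4521, 11.0767).length);
    }
}
```

The round trip loses a little precision: with 2^32 steps across 180 degrees, one latitude step is roughly 4.2e-8 degrees, which is far finer than city-level reverse geocoding needs.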

Slide 18

Geo and Lucene: BFF! https://home.apache.org/~mikemccand/geobench.html

Slide 19

Geo and Lucene: BFF! https://home.apache.org/~mikemccand/geobench.html

Slide 20

Serverless Lucene ‣ Local execution, index part of the package ‣ Offline index creation ‣ Packaging index into code ‣ Index needs to be unpacked; using Lucene via classpath resources is tricky
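
One way to handle the unpacking step is to ship the index as a zip archive on the classpath and extract it to a temporary directory on first use. The sketch below assumes exactly that; the archive format, the class name, and the resource name passed to `unpackFromClasspath` are hypothetical and not taken from the talk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Extracts a zipped index from a stream into a directory on the local file system. */
final class IndexUnpacker {

    /** Extracts every entry of the zip stream below targetDir and returns the resolved directory. */
    static Path unpack(InputStream zipped, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        Files.createDirectories(root);
        try (ZipInputStream zip = new ZipInputStream(zipped)) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries such as "../x" that would land outside the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                    continue;
                }
                Files.createDirectories(out.getParent());
                // Copies the current entry only; the zip stream reports end-of-stream per entry.
                Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
            }
        }
        return root;
    }

    /** Unpacks a classpath resource (hypothetical name, e.g. "/index.zip") into a fresh temp directory. */
    static Path unpackFromClasspath(String resourceName) throws IOException {
        try (InputStream in = IndexUnpacker.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found: " + resourceName);
            }
            return unpack(in, Files.createTempDirectory("lucene-index"));
        }
    }
}
```

The returned directory could then be opened with Lucene's `FSDirectory.open(Path)`. Doing the extraction once per container, rather than per request, keeps the cost confined to the first invocation.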

Slide 21

Demo

Slide 22

Summary ‣ Works! ‣ Problem: Data quality, getting accurate shape data ‣ Problem: First invocation (up to 2s) ‣ JVM startup ‣ Lucene index opening

Slide 23

Faster startup & runtime

Slide 24

Enter GraalVM! ‣ A new compiler, usable as a JIT compiler inside HotSpot and for AOT compilation ‣ Graal compiler part of Java 9 (experimental!) ‣ Graal JIT compiler part of Java 10 (Linux 64-bit only) ‣ Project Metropolis: Java-on-Java HotSpot implementation ‣ Truffle: Framework to implement other languages on top of Graal (e.g. TruffleRuby as a JRuby replacement)

Slide 25

Enter GraalVM! ‣ AOT static compilation + SubstrateVM = executable binaries of Java apps ‣ Using SubstrateVM ‣ Reflection!

Slide 26

Slide 27

Deployment model

Slide 28

AWS Lambda ‣ Deployment model: Zip archive in S3 bucket ‣ Execution: Download zip archive & execute ‣ Requires regular Java start (AWS reduced JVM startup time) ‣ GraalVM can only be used with a custom runtime

Slide 29

runtime flow

Slide 30

CUSTOM RUNTIME

Slide 31

CUSTOM RUNTIME: AWS_LAMBDA_RUNTIME_API=localhost:12345 _HANDLER="my_handler" /bin/bootstrap

Slide 32

CUSTOM RUNTIME: AWS_LAMBDA_RUNTIME_API=localhost:12345 _HANDLER="my_handler" /bin/bootstrap
AWS ENDPOINT

Slide 33

CUSTOM RUNTIME: AWS_LAMBDA_RUNTIME_API=localhost:12345 _HANDLER="my_handler" /bin/bootstrap
CUSTOM RUNTIME → AWS ENDPOINT: GET /2018-06-01/runtime/invocation/next

Slide 34

CUSTOM RUNTIME: AWS_LAMBDA_RUNTIME_API=localhost:12345 _HANDLER="my_handler" /bin/bootstrap
CUSTOM RUNTIME → AWS ENDPOINT: GET /2018-06-01/runtime/invocation/next
AWS ENDPOINT → CUSTOM RUNTIME: Lambda-Runtime-Aws-Request-Id: 123 { "body": "{ \"foo\": \"bar\" }", "requestContext": "{ … }", "headers": { … } }

Slide 35

CUSTOM RUNTIME: AWS_LAMBDA_RUNTIME_API=localhost:12345 _HANDLER="my_handler" /bin/bootstrap
CUSTOM RUNTIME → AWS ENDPOINT: POST /2018-06-01/runtime/invocation/123/response { "body": "{ \"foo\": \"bar\" }", "headers": { … } }

Slide 36

CUSTOM RUNTIME: AWS_LAMBDA_RUNTIME_API=localhost:12345 _HANDLER="my_handler" /bin/bootstrap
CUSTOM RUNTIME → AWS ENDPOINT: POST /2018-06-01/runtime/invocation/123/response { "body": "{ \"foo\": \"bar\" }", "headers": { … } }
AWS ENDPOINT → CUSTOM RUNTIME: HTTP 202 Accepted

Slide 37

CUSTOM RUNTIME: AWS_LAMBDA_RUNTIME_API=localhost:12345 _HANDLER="my_handler" /bin/bootstrap
CUSTOM RUNTIME → AWS ENDPOINT: POST /2018-06-01/runtime/invocation/123/error { "statusCode": 500, "body": "…" }

Slide 38

CUSTOM RUNTIME: AWS_LAMBDA_RUNTIME_API=localhost:12345 _HANDLER="my_handler" /bin/bootstrap
CUSTOM RUNTIME → AWS ENDPOINT: POST /2018-06-01/runtime/init/error { "errorMessage": "…", "errorType": "…" }
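
The request flow on the preceding slides can be sketched as a small polling loop. This is a minimal illustration and not the runtime used in the demo: it assumes Java 11's `java.net.http.HttpClient`, an echo handler, and an error payload using the `errorMessage`/`errorType` fields shown on the init error slide. The class and method names are hypothetical, and the `_HANDLER` variable is ignored for brevity.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.function.Function;

/** Minimal polling loop against the Lambda runtime API (illustrative sketch). */
final class CustomRuntime {

    private static final String API_VERSION = "2018-06-01";

    static URI nextInvocation(String api) {
        return URI.create("http://" + api + "/" + API_VERSION + "/runtime/invocation/next");
    }

    static URI invocationResponse(String api, String requestId) {
        return URI.create("http://" + api + "/" + API_VERSION + "/runtime/invocation/" + requestId + "/response");
    }

    static URI invocationError(String api, String requestId) {
        return URI.create("http://" + api + "/" + API_VERSION + "/runtime/invocation/" + requestId + "/error");
    }

    /** Fetches one event, runs the handler, and posts either the result or an error. */
    static void handleOne(HttpClient client, String api, Function<String, String> handler)
            throws IOException, InterruptedException {
        HttpResponse<String> event = client.send(
                HttpRequest.newBuilder(nextInvocation(api)).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        String requestId = event.headers()
                .firstValue("Lambda-Runtime-Aws-Request-Id")
                .orElseThrow();

        URI target;
        String body;
        try {
            body = handler.apply(event.body());
            target = invocationResponse(api, requestId);
        } catch (RuntimeException e) {
            // Assumed error payload shape; adjust to what the endpoint expects.
            body = "{\"errorMessage\":\"handler failed\",\"errorType\":\""
                    + e.getClass().getSimpleName() + "\"}";
            target = invocationError(api, requestId);
        }
        client.send(
                HttpRequest.newBuilder(target).POST(HttpRequest.BodyPublishers.ofString(body)).build(),
                HttpResponse.BodyHandlers.discarding());
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        String api = System.getenv("AWS_LAMBDA_RUNTIME_API");
        if (api == null) {
            System.err.println("AWS_LAMBDA_RUNTIME_API is not set; nothing to poll.");
            return;
        }
        HttpClient client = HttpClient.newHttpClient();
        while (true) {
            handleOne(client, api, event -> event); // echo handler, purely illustrative
        }
    }
}
```

Compiled to a native binary and shipped as `bootstrap`, a loop of this shape is what lets a GraalVM image run on Lambda; the GET on `invocation/next` blocks until the next event arrives.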

Slide 39

Google Cloud Run

Slide 40

Google Cloud Run ‣ Serverless done ‘right’? ‣ Docker container as a web application listening on $PORT ‣ Deployment model: docker push && gcloud beta run ‣ Easier to test ‣ Configurable concurrency ‣ Improved billing model due to concurrency
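
The "web application listening on $PORT" contract needs very little code. The sketch below uses the JDK's built-in `com.sun.net.httpserver.HttpServer`; the class name, the fixed `ok` response, and the fallback to port 8080 for local runs are assumptions made for illustration, not details from the talk.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

/** Tiny HTTP server that binds to the port supplied by the environment (illustrative sketch). */
final class PortAwareServer {

    /** Parses the PORT value; falls back to 8080 (an assumed default) when it is missing or blank. */
    static int resolvePort(String portEnv) {
        if (portEnv == null || portEnv.isBlank()) {
            return 8080;
        }
        return Integer.parseInt(portEnv.trim());
    }

    /** Starts a server answering every request with a fixed body and returns it to the caller. */
    static HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok\n".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

A `main` method would call `PortAwareServer.start(PortAwareServer.resolvePort(System.getenv("PORT")))`. Because the container is an ordinary web application, the same image can be started locally with a port of your choice, which is much of what makes this model easier to test.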

Slide 41

Demo

Slide 42

Summary

Slide 43

Summary ‣ Economics ‣ Billing model ‣ Runtime cost vs. development cost ‣ Break-even vs. non-serverless model ‣ Development ‣ Vendor lock-in ‣ Deployment model is important ‣ Cloud Run: Run serverless or as container ‣ Scalability strategies: base load in containers, increased load serverless? ‣ Observability

Slide 44

Discussion … ask all the things!

Slide 45

Links ‣ https://serverless.com/framework/docs/ ‣ https://www.openfaas.com ‣ https://cloud.google.com/knative/ ‣ https://kubeless.io/ ‣ https://fission.io/ ‣ http://fnproject.io/ ‣ https://nuclio.io ‣ https://openwhisk.incubator.apache.org/ ‣ https://www.graalvm.org ‣ https://openjdk.java.net/projects/metropolis/ ‣ https://github.com/oracle/graal/tree/master/substratevm ‣ https://en.wikipedia.org/wiki/Reverse_geocoding

https://noti.st/spinscale/ACCnKE/running-a-serverless-lucene-reverse-geocoder