Elasticsearch - Securing software while maintaining usability

A presentation at JavaDay Odesa in September 2019 in Odesa, Odessa Oblast, Ukraine, 65000 by Alexander Reelsen

Slide 1

Elasticsearch: Securing a search engine while maintaining usability
Alexander Reelsen, @spinscale, alex@elastic.co

Slide 2

Elastic Stack

Slide 3

Elasticsearch in 10 seconds
- Search engine (FTS, analytics, geo), near real-time
- Distributed, scalable, highly available, resilient
- Interface: HTTP & JSON
- Centrepiece of the Elastic Stack (Kibana, Logstash, Beats, APM, ML, App Search, Enterprise Search)
- Uneducated conservative guess: tens of thousands of clusters worldwide, hundreds of thousands of instances

Slide 4

Agenda
- Security: Feature or non-functional requirement?
- Security Manager
- Production Mode vs. Development Mode
- Plugins
- Scripting language: Painless

Slide 5

Security Feature or non-functional requirement?

Slide 6

Security as a non-functional requirement
Software has to be secure! O RLY?
- Defensive programming
- Do not persist specific data (PCI DSS)
- Not exploitable (pro tip: not gonna happen)
- No unintended resource access (directory traversal)
- Least privilege principle
- Reduced impact surface (DoS)
https://www.theregister.co.uk/2017/03/26/miele_joins_internetofst_hall_of_shame/

Slide 7

Security as a feature
- Authentication
- Authorization (LDAP, users, PKI)
- TLS transport encryption
- Audit logging
- SSO/SAML/Kerberos

Slide 8

Security or safety or resiliency?
- Integrity checks
- Preventing OOMEs
- Prevent deep pagination
- Do not expose credentials in cluster state/REST APIs
- Stop writing data before running out of disk space
- Unable to call System.exit

Slide 9

„[T]HERE ARE KNOWN KNOWNS; THERE ARE THINGS WE KNOW WE KNOW. WE ALSO KNOW THERE ARE KNOWN UNKNOWNS; THAT IS TO SAY WE KNOW THERE ARE SOME THINGS WE DO NOT KNOW. BUT THERE ARE ALSO UNKNOWN UNKNOWNS – THERE ARE THINGS WE DO NOT KNOW WE DON’T KNOW.“ Donald Rumsfeld, former secretary of defense, IT Security Expert

Slide 13

Security Manager Have you ever called System.setSecurityManager()?

Slide 14

What is a sandbox?
Your code:
- connect 192.168.1.1:9300 ✅
- write /var/log/elasticsearch.log ✅
- unlink /var/lib/elasticsearch/… ✅

Slide 15

What is a sandbox?
Your code:
- open /etc/passwd ⛔
- connect bitcoin-miner.foo.bar ⛔
- unlink /var/lib/elasticsearch ⛔

Slide 16

What is a sandbox? sandbox ✅ Your code ⛔

Slide 17

Introduction
- Sandbox your Java application
- Prevent certain calls by your application
- Policy file grants permissions
- FilePermission (read, write)
- SocketPermission (connect, listen, accept)
- URLPermission, PropertyPermission, …

Slide 18

Java Security Manager Java Security Manager Java Program Policy

Slide 19

Java Security Manager
Diagram: Java Security Manager, Java Program, Policy
Policy:
- FilePermission read /etc/elasticsearch
- FilePermission write /var/log/elasticsearch
- SocketPermission connect *
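The policy on this slide can be approximated in plain Java. The following is a minimal sketch, not Elasticsearch's actual policy: it builds the three grants as an in-memory `Permissions` collection and asks whether a requested operation would be implied. The class name `PolicySketch` and the `/-` recursive path suffix are assumptions made for illustration, and no security manager is installed.

```java
import java.io.FilePermission;
import java.net.SocketPermission;
import java.security.Permission;
import java.security.Permissions;

// Sketch only: models the slide's policy as a permission set and queries it.
// Nothing is enforced here; we only ask what the grants would allow.
final class PolicySketch {

    // Builds the grants from the slide. "/-" means "all files below, recursively".
    static Permissions slidePolicy() {
        Permissions granted = new Permissions();
        granted.add(new FilePermission("/etc/elasticsearch/-", "read"));
        granted.add(new FilePermission("/var/log/elasticsearch/-", "write"));
        granted.add(new SocketPermission("*", "connect"));
        return granted;
    }

    // True if the requested permission is covered by one of the grants.
    static boolean allowed(Permission requested) {
        return slidePolicy().implies(requested);
    }

    public static void main(String[] args) {
        System.out.println(allowed(new FilePermission("/etc/elasticsearch/elasticsearch.yml", "read")));
        System.out.println(allowed(new FilePermission("/etc/passwd", "read")));
        System.out.println(allowed(new SocketPermission("192.168.1.1:9300", "connect")));
    }
}
```

Reading the configuration file is covered by the first grant, while `/etc/passwd` matches none of them, which is the allow-list behaviour the sandbox slides describe.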

Slide 20

Java Security Manager Java Security Manager Policy Java Program

Slide 21

Introduction

Slide 22

Introduction

Slide 23

DEMO

Slide 24

OHAI JLS https://docs.oracle.com/javase/specs/jls/se11/html/jls-17.html#jls-17.5.3

Slide 25

Drawbacks
- Hardcoded policies before startup
- DNS lookups are cached forever by default
- Forces you to think about dependencies!
- Many libraries are not even tested with the security manager, unknown code paths may be executed
- No OOM protection!
- No stack overflow protection!
- Granularity
- No protection against Java agents
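The DNS drawback comes from the JVM's `networkaddress.cache.ttl` security property, which defaults to caching successful lookups forever when a security manager is installed. A minimal sketch of capping it follows; the helper name `limitDnsCache` is hypothetical, and the property is generally only honoured if set before the first name lookup.

```java
import java.security.Security;

// Sketch only: overrides the JVM-wide DNS cache lifetime for successful lookups.
final class DnsCacheSketch {

    // Hypothetical helper: cache successful lookups for at most `seconds` seconds.
    static void limitDnsCache(int seconds) {
        Security.setProperty("networkaddress.cache.ttl", Integer.toString(seconds));
    }

    public static void main(String[] args) {
        limitDnsCache(60);
        System.out.println(Security.getProperty("networkaddress.cache.ttl"));
    }
}
```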

Slide 26

Reducing impact Bad things have less bad results

Slide 27

Reducing impact
- Elasticsearch integration of the Java Security Manager
- Least privilege principle
- Do not run as root
- No chance of forking a process
- Do not expose sensitive settings

Slide 28

Security Manager in Elasticsearch
- Initialization required before starting security manager
- Elasticsearch needs to read its configuration file first to find out about the file paths
- Native code needs to be executed first
- Solution: Start with empty security manager, bootstrap, apply secure security manager

Slide 29

Elasticsearch startup JVM Startup time

Slide 30

JVM Startup Elasticsearch startup time Read configuration file

Slide 31

time Read configuration file JVM Startup Elasticsearch startup Native system calls

Slide 32

time Native system calls Read configuration file JVM Startup Elasticsearch startup Set security manager

Slide 33

time Set security manager Native system calls Read configuration file JVM Startup Elasticsearch startup Load plugins

Slide 34

time Load plugins Set security manager Native system calls Read configuration file JVM Startup Elasticsearch startup Bootstrap checks

Slide 35

time Bootstrap checks Load plugins Set security manager Native system calls Read configuration file JVM Startup Elasticsearch startup Network enabled

Slide 36

Elasticsearch startup
Timeline: JVM startup -> read configuration file -> native system calls -> set security manager -> load plugins -> bootstrap checks -> network enabled

Slide 37

Security Manager in Elasticsearch
- Special security manager is used
- Does not grant the exitVM permission; only a few special classes are allowed to exit the JVM
- Thread & ThreadGroup security is enforced
- Also SpecialPermission was added, a special marker permission to prevent elevation by scripts
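The idea of a marker permission can be illustrated without touching Elasticsearch internals. In this hedged sketch, `MarkerPermission` and `MarkerSketch` are hypothetical stand-ins (not the real `SpecialPermission` class): trusted core code is granted the marker, script code is not, so a check for the marker separates code that may elevate privileges from code that may not.

```java
import java.security.BasicPermission;
import java.security.Permissions;

// Hypothetical marker permission: it carries no actions, only a fixed name.
final class MarkerPermission extends BasicPermission {
    MarkerPermission() {
        super("hypotheticalMarker");
    }
}

// Sketch only: two permission sets, one for trusted code and one for scripts.
final class MarkerSketch {

    // Trusted core code receives the marker.
    static Permissions coreGrants() {
        Permissions grants = new Permissions();
        grants.add(new MarkerPermission());
        return grants;
    }

    // Script code receives nothing, so it never holds the marker.
    static Permissions scriptGrants() {
        return new Permissions();
    }

    // Elevation is only permitted for callers whose grants include the marker.
    static boolean mayElevate(Permissions grants) {
        return grants.implies(new MarkerPermission());
    }

    public static void main(String[] args) {
        System.out.println("core may elevate: " + mayElevate(coreGrants()));
        System.out.println("script may elevate: " + mayElevate(scriptGrants()));
    }
}
```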

Slide 38

Security Manager in Elasticsearch
- ESPolicy allows for loading from files plus dynamic configuration (from the ES configuration file)
- Bootstrap check for java.security.AllPermission

Slide 39

#noroot there is no reason to run code as root!

Slide 40

Do not run as root
Timeline: JVM startup -> read configuration file -> native system calls -> set security manager -> load plugins -> bootstrap checks -> network enabled

Slide 41

Do not run as root

Slide 42

seccomp … or how I loved to abort system calls

Slide 43

Seccomp - prevent process forks
Timeline: JVM startup -> read configuration file -> native system calls -> set security manager -> load plugins -> bootstrap checks -> network enabled

Slide 44

Seccomp - prevent process forks
- Security manager could fail
- Elasticsearch should still not be able to fork processes
- One-way transition to tell the operating system to deny execve, fork, vfork, execveat system calls
- Works on Linux, Windows, Solaris, BSD, macOS

Slide 45

Seccomp - prevent process forks

Slide 46

Seccomp - prevent process forks

Slide 47

seccomp sandbox seccomp ✅ Your code ⛔

Slide 48

DEMO

Slide 49

Production mode vs Development mode Annoying you now instead of devastating you later

Slide 50

Bootstrap checks
Timeline: JVM startup -> read configuration file -> native system calls -> set security manager -> load plugins -> bootstrap checks -> network enabled

Slide 51

Is your dev setup equivalent to production?
- Development environments are rarely set up like production ones
- How to ensure certain preconditions in production but not for development?
- What is a good indicator?

Slide 52

Mode check
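The code shown on this slide is not part of the transcript. As a rough sketch of the idea, assuming the indicator is whether the node binds to a non-loopback address (which is how the Elasticsearch documentation describes the switch to production mode), a mode check could look like this. The class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

// Sketch only: decides between development and production mode from bound addresses.
final class ModeCheckSketch {

    // Production mode is assumed as soon as any bound address is not loopback.
    static boolean enforceBootstrapChecks(List<InetAddress> boundAddresses) {
        return boundAddresses.stream().anyMatch(address -> !address.isLoopbackAddress());
    }

    // Parses an IP literal; literals are parsed locally without a DNS lookup.
    static InetAddress ip(String literal) {
        try {
            return InetAddress.getByName(literal);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a valid IP literal: " + literal, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(enforceBootstrapChecks(List.of(ip("127.0.0.1"))));
        System.out.println(enforceBootstrapChecks(List.of(ip("127.0.0.1"), ip("192.168.1.1"))));
    }
}
```

A node that only listens on localhost cannot form a cluster with other machines, so treating it as a development setup is a reasonable default.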

Slide 53

Bootstrap checks

Slide 54

Bootstrap checks

Slide 55

Bootstrap checks
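The bootstrap check slides carry no text in this transcript. The sketch below shows one way such checks could be structured, assuming a hypothetical `HypotheticalCheck` interface: every check either passes or returns a message, failures abort startup in production mode, and the same failures are only logged as warnings in development mode. The heap size check (initial heap equal to maximum heap) is used as the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical check contract: empty means "passed", otherwise a failure message.
interface HypotheticalCheck {
    Optional<String> run();
}

// Sketch only: collects failures and decides whether they are fatal.
final class BootstrapSketch {

    // Fails when the initial heap size differs from the maximum heap size.
    static HypotheticalCheck heapSizeCheck(long initialHeapBytes, long maxHeapBytes) {
        return () -> initialHeapBytes == maxHeapBytes
                ? Optional.empty()
                : Optional.of("initial heap size [" + initialHeapBytes
                        + "] not equal to maximum heap size [" + maxHeapBytes + "]");
    }

    // Runs all checks and returns the messages of those that failed.
    static List<String> failures(List<HypotheticalCheck> checks) {
        List<String> messages = new ArrayList<>();
        for (HypotheticalCheck check : checks) {
            check.run().ifPresent(messages::add);
        }
        return messages;
    }

    // Production mode: failures abort startup. Development mode: failures are warnings.
    static void enforce(boolean productionMode, List<HypotheticalCheck> checks) {
        List<String> messages = failures(checks);
        if (messages.isEmpty()) {
            return;
        }
        if (productionMode) {
            throw new IllegalStateException("bootstrap checks failed: " + messages);
        }
        messages.forEach(message -> System.err.println("WARN: " + message));
    }

    public static void main(String[] args) {
        List<HypotheticalCheck> checks = List.of(heapSizeCheck(1L << 30, 2L << 30));
        enforce(false, checks); // development mode: prints a warning and continues
    }
}
```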

Slide 56

Plugins … remaining secure

Slide 57

Bootstrap checks
Timeline: JVM startup -> read configuration file -> native system calls -> set security manager -> load plugins -> bootstrap checks -> network enabled

Slide 58

Plugins in 60 seconds
- Plugins are just zip files
- Each plugin can have its own jars/dependencies
- Each plugin is loaded with its own classloader
- Each plugin can have its own security permissions
- ES core loads a bunch of code as modules (plugins that ship with Elasticsearch)

Slide 59

Plugins & modules Java Security Manager Policy Elasticsearch Plugin

Slide 60

Plugins & modules Java Security Manager Elasticsearch Plugin Policy

Slide 61

Plugins & modules Elasticsearch Module Elasticsearch Plugin Policy Policy Elasticsearch Module Policy

Slide 62

Sample permissions

Slide 63

Sample permissions

Slide 64

Sample permissions

Slide 65

Introducing Painless A scripting language for Elasticsearch

Slide 66

Scripting: Why and how?
- Expression evaluation without needing to write Java extensions for Elasticsearch
- Node ingest script processor
- Search queries (dynamic requests & fields)
- Aggregations (dynamic buckets)
- Templating (Mustache)

Slide 67

Scripting in Elasticsearch
- MVEL
- Groovy
- Expressions
- Painless

Slide 68

Painless - a secure scripting language
- Hard to take an existing programming language and make it secure, but remain fast
- Sandboxing
- Whitelisting over blacklisting, per method
- Opt-in to regular expressions
- Prevent endless loops
- Detect self references to prevent stack overflows
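Two of these points, per-method whitelisting and endless loop prevention, can be sketched in a few lines of Java. This is an illustration of the general technique and not Painless's implementation: the allowlist contents, the class `AllowlistSketch` and the `LoopCounter` budget are all hypothetical.

```java
import java.util.Map;
import java.util.Set;

// Sketch only: a per-method allowlist plus a loop budget, as a script engine might use.
final class AllowlistSketch {

    // Hypothetical allowlist: class name -> method names a script may call.
    // Anything not listed is rejected, which is the point of allowlisting.
    private static final Map<String, Set<String>> ALLOWED = Map.of(
            "java.lang.String", Set.of("length", "substring"),
            "java.lang.Math", Set.of("abs", "max"));

    static boolean isAllowed(String className, String methodName) {
        return ALLOWED.getOrDefault(className, Set.of()).contains(methodName);
    }

    // Loop guard: the engine calls tick() once per loop iteration of the script.
    static final class LoopCounter {
        private int remaining;

        LoopCounter(int budget) {
            this.remaining = budget;
        }

        // Throws once more iterations have run than the budget permits.
        void tick() {
            if (--remaining < 0) {
                throw new IllegalStateException("loop budget exceeded");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("java.lang.String", "length"));
        System.out.println(isAllowed("java.lang.System", "exit"));
    }
}
```

With an allowlist, a newly added JDK method is unavailable to scripts until someone reviews and lists it, whereas a blocklist would expose it by default.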

Slide 69

DEMO

Slide 70

Summary Security is hard - let’s go shopping!

Slide 71

Summary
- Not using the Security Manager - what’s your excuse?
- Scripting is important, is your implementation secure?
- Use operating system features!
- If you allow for plugins, remain secure!
- If you remove features, have alternatives!

Slide 72

Summary
- Development has a big impact on security
- Operations is happy to help with what is there out of the box
- Developers know their application best!
- Don’t reinvent, check out existing features!
- Developers are responsible for writing secure code! Before something happens!

Slide 73

Thanks for listening! Questions? Alexander Reelsen @spinscale alex@elastic.co

Slide 74

Resources

Slide 75

Resources

Slide 76

Resources
- https://github.com/elastic/elasticsearch/
- https://www.elastic.co/blog/bootstrap_checks_annoying_instead_of_devastating
- https://www.elastic.co/blog/scripting
- https://www.elastic.co/blog/scripting-security
- https://docs.oracle.com/javase/9/security/toc.htm
- https://docs.oracle.com/javase/9/security/permissions-java-development-kit.htm
- https://www.elastic.co/blog/seccomp-in-the-elastic-stack
- https://github.com/spinscale/talk-elasticsearch-security-manager-and-seccomp

Slide 77

Bonus register all your settings

Slide 78

Mark sensitive settings

Slide 79

Register all your settings
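The code on these two bonus slides is not in the transcript. A minimal sketch of the two ideas, assuming a hypothetical `SettingsSketch` registry rather than Elasticsearch's real settings classes: unknown settings are rejected at startup, and settings marked sensitive are filtered out before configuration is exposed through an API. The setting names are made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only: a registry that knows every valid setting and which ones are sensitive.
final class SettingsSketch {

    // Setting name -> true if the value must never be exposed.
    private final Map<String, Boolean> registered = new LinkedHashMap<>();

    void register(String name, boolean sensitive) {
        registered.put(name, sensitive);
    }

    // Rejects configuration that contains a setting nobody registered (e.g. a typo).
    void validate(Map<String, String> configured) {
        for (String name : configured.keySet()) {
            if (!registered.containsKey(name)) {
                throw new IllegalArgumentException("unknown setting [" + name + "]");
            }
        }
    }

    // Returns a copy that is safe to expose: sensitive settings are left out.
    Map<String, String> filtered(Map<String, String> configured) {
        Map<String, String> safe = new LinkedHashMap<>();
        configured.forEach((name, value) -> {
            if (!registered.getOrDefault(name, true)) {
                safe.put(name, value);
            }
        });
        return safe;
    }

    public static void main(String[] args) {
        SettingsSketch settings = new SettingsSketch();
        settings.register("hypothetical.plugin.url", false);
        settings.register("hypothetical.plugin.password", true);
        Map<String, String> configured = Map.of(
                "hypothetical.plugin.url", "https://example.org",
                "hypothetical.plugin.password", "secret");
        settings.validate(configured);
        System.out.println(settings.filtered(configured));
    }
}
```

In `filtered`, unregistered names default to sensitive, so a setting that slipped past validation is hidden rather than leaked.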

Slide 80

Bonus deep pagination vs search_after

Slide 81

Pagination: Request N C Find the first 10 results for Elasticsearch

Slide 82

Pagination: Request N C Find the first 10 results for Elasticsearch

Slide 83

Pagination: Request N N N C N N Find the first 10 results for Elasticsearch

Slide 84

Pagination: Query Phase
Diagram: five N nodes and one C node
SortedPriorityQueue size = 50
Each node returns 10 results, create real top 10 out of 50

Slide 85

Pagination: Fetch phase N N N C N N ask for the real top 10

Slide 86

Pagination: Query Phase N N N C N N return real top 10

Slide 87

Pagination: Query N N N C N N Find the 10 results starting at position 90

Slide 88

Pagination: Query Phase
Diagram: five N nodes and one C node
SortedPriorityQueue size = 500
Each node returns 100 results, create real top 90-100 out of 500

Slide 89

Pagination: Query N N N C N N Find the 10 results starting at position 99990

Slide 90

Pagination: Query Phase
Diagram: five N nodes and one C node
SortedPriorityQueue size = 500000
Each node returns 100k results

Slide 91

Pagination: Query 1 N N C N 100 Find the 10 results starting at position 99990 over 100 nodes

Slide 92

Pagination: Query
Diagram: N nodes numbered 1 to 100 and one C node
SortedPriorityQueue size = 10_000_000
Each node returns 100k results
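The queue sizes on the pagination slides follow from simple arithmetic: every node has to return `from + size` hits, and the node merging the results has to sort all of them. A minimal sketch that reproduces the slide numbers (class and method names are hypothetical):

```java
// Sketch only: the cost of position-based pagination as shown on the slides.
final class DeepPaginationSketch {

    // Each node must return everything up to the end of the requested page.
    static long hitsPerNode(long from, long size) {
        return from + size;
    }

    // The merging node holds the hits of all nodes in its priority queue.
    static long queueSize(long nodes, long from, long size) {
        return nodes * hitsPerNode(from, size);
    }

    public static void main(String[] args) {
        System.out.println(queueSize(5, 0, 10));       // first 10 results, 5 nodes
        System.out.println(queueSize(5, 90, 10));      // 10 results from position 90
        System.out.println(queueSize(5, 99_990, 10));  // 10 results from position 99990
        System.out.println(queueSize(100, 99_990, 10)); // same request over 100 nodes
    }
}
```

The cost grows with the page offset and with the number of nodes, even though the client only ever receives 10 documents.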

Slide 93

Solution: search_after
- Do not use numerical positions
- Use keys where you stopped in the inverted index
- Let the client tell you what the last key was
- Just specify the last sort value from the last document returned as a starting point

Slide 94

Pagination: search_after 1 N N C N 100 Find the 10 results starting at sort key name foo over 100 nodes

Slide 95

Pagination: search_after
Diagram: N nodes numbered 1 to 100 and one C node
SortedPriorityQueue size = 1000
Each node returns 10 results
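The principle behind search_after can be shown on a single sorted set of keys. This is a sketch of key-based paging in general, not of the Elasticsearch API: the client passes the last sort key it saw, and the next page starts strictly after that key, so no earlier results need to be collected. `SearchAfterSketch` and `pageAfter` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Sketch only: paging by "last key seen" instead of by numerical position.
final class SearchAfterSketch {

    // Returns up to `size` keys that sort strictly after `lastKey`.
    // A null `lastKey` means "start from the beginning".
    static List<String> pageAfter(NavigableSet<String> sortedKeys, String lastKey, int size) {
        NavigableSet<String> rest = lastKey == null ? sortedKeys : sortedKeys.tailSet(lastKey, false);
        List<String> page = new ArrayList<>();
        for (String key : rest) {
            if (page.size() == size) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    public static void main(String[] args) {
        NavigableSet<String> keys = new TreeSet<>(List.of("bar", "baz", "foo", "qux", "zap"));
        List<String> first = pageAfter(keys, null, 2);
        List<String> second = pageAfter(keys, first.get(first.size() - 1), 2);
        System.out.println(first + " then " + second);
    }
}
```

Because each node only returns `size` hits after the given key, the merge queue on the slide stays at 100 nodes times 10 results regardless of how far the client has paged.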

Slide 96

Bonus replacing delete by query

Slide 97

delete_by_query removal/replace
- delete_by_query API was not safe
- API endpoint was removed
- Extensive documentation was added on what to do instead
- Infrastructure for long running background tasks was added
- delete_by_query was reintroduced using above infra and doing the exact same thing as in the documentation
- data > convenience!

Slide 98

Thanks for listening! Questions? Alexander Reelsen @spinscale alex@elastic.co