Elasticsearch: Securing a search engine while maintaining usability
Alexander Reelsen @spinscale alex@elastic.co
Slide 2
Elastic Stack
Slide 3
Elasticsearch in 10 seconds
Search Engine (FTS, Analytics, Geo), near real-time
Distributed, scalable, highly available, resilient
Interface: HTTP & JSON
Centrepiece of the Elastic Stack (Kibana, Logstash, Beats, APM, ML, App Search, Enterprise Search)
Uneducated conservative guess: Tens of thousands of clusters worldwide, hundreds of thousands of instances
Slide 4
Agenda
Security: Feature or non-functional requirement?
Security Manager
Production Mode vs. Development Mode
Plugins
Scripting language: Painless
Slide 5
Security: Feature or non-functional requirement?
Slide 6
Software has to be secure! O RLY?
Defensive programming
Do not persist specific data (PCI DSS)
Not exploitable (pro tip: not gonna happen)
No unintended resource access (directory traversal)
Least privilege principle
Reduced impact surface (DoS)
https://www.theregister.co.uk/2017/03/26/miele_joins_internetofst_hall_of_shame/
Security as a non-functional requirement
Slide 7
Security as a feature
Authentication
Authorization (LDAP, users, PKI)
TLS transport encryption
Audit logging
SSO/SAML/Kerberos
Slide 8
Security or safety or resiliency?
Integrity checks
Preventing OOMEs
Prevent deep pagination
Do not expose credentials in cluster state/REST APIs
Stop writing data before running out of disk space
Unable to call System.exit
Slide 9
„[T]HERE ARE KNOWN KNOWNS; THERE ARE THINGS WE KNOW WE KNOW. WE ALSO KNOW THERE ARE KNOWN UNKNOWNS; THAT IS TO SAY WE KNOW THERE ARE SOME THINGS WE DO NOT KNOW. BUT THERE ARE ALSO UNKNOWN UNKNOWNS – THERE ARE THINGS WE DO NOT KNOW WE DON’T KNOW.“
Donald Rumsfeld, former secretary of defense, IT Security Expert
Slide 13
Security Manager
Have you ever called System.setSecurityManager()?
Slide 14
What is a sandbox?
Your code:
✅ connect 192.168.1.1:9300
✅ write /var/log/elasticsearch.log
✅ unlink /var/lib/elasticsearch/…
Slide 15
What is a sandbox?
Your code:
⛔ open /etc/passwd
⛔ connect bitcoin-miner.foo.bar
⛔ unlink /var/lib/elasticsearch
Slide 16
What is a sandbox?
[Diagram: a sandbox around your code; some calls pass (✅), others are blocked (⛔)]
Slide 17
Introduction
Sandbox your Java application
Prevent certain calls by your application
Policy file grants permissions
FilePermission (read, write)
SocketPermission (connect, listen, accept)
URLPermission, PropertyPermission, …
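The permission classes named on this slide can be tried out directly, without installing a security manager. The Java sketch below builds a small set of granted permissions in code (a real setup would declare them in a policy file) and asks whether a requested operation is implied by them. The class name PolicySketch and the chosen grants are illustrative assumptions that mirror the paths from the sandbox slides; they are not Elasticsearch's actual policy.

```java
import java.io.FilePermission;
import java.net.SocketPermission;
import java.security.Permission;
import java.security.Permissions;

// Illustrative only: PolicySketch and its grants are assumptions for this example.
final class PolicySketch {

    // Stand-in for the grants a policy file would declare.
    static Permissions granted() {
        Permissions granted = new Permissions();
        granted.add(new FilePermission("/var/log/elasticsearch.log", "read,write"));
        // A trailing "/-" covers everything below the directory, but not the directory itself.
        granted.add(new FilePermission("/var/lib/elasticsearch/-", "read,write,delete"));
        // IP literal, so evaluating the permission needs no DNS lookup.
        granted.add(new SocketPermission("192.168.1.1:9300", "connect"));
        return granted;
    }

    // True if the requested operation is covered by one of the grants.
    static boolean isAllowed(Permission requested) {
        return granted().implies(requested);
    }

    public static void main(String[] args) {
        System.out.println("write log:       "
                + isAllowed(new FilePermission("/var/log/elasticsearch.log", "write")));
        System.out.println("delete data file: "
                + isAllowed(new FilePermission("/var/lib/elasticsearch/nodes/0", "delete")));
        System.out.println("delete data dir:  "
                + isAllowed(new FilePermission("/var/lib/elasticsearch", "delete")));
        System.out.println("read /etc/passwd: "
                + isAllowed(new FilePermission("/etc/passwd", "read")));
    }
}
```

The last two checks correspond to the blocked calls on the sandbox slides: nothing grants access to /etc/passwd, and the recursive grant covers files below the data directory but not the directory itself.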
Slide 18
Java Security Manager
[Diagram: Java Program, Java Security Manager, Policy]
Drawbacks
Hardcoded policies before startup
DNS lookups are cached forever by default
Forces you to think about dependencies!
Many libraries are not even tested with the security manager, unknown code paths may be executed
No OOM protection!
No stack overflow protection!
Granularity
No protection against Java agents
Slide 26
Reducing impact Bad things have less bad results
Slide 27
Reducing impact
Elasticsearch integration of the Java Security Manager
Least privilege principle
Do not run as root
No chance of forking a process
Do not expose sensitive settings
Slide 28
Security Manager in Elasticsearch
Initialization required before starting security manager
Elasticsearch needs to read its configuration file first to find out about the file paths
Native code needs to be executed first
Solution: Start with empty security manager, bootstrap, apply secure security manager
Slide 29
Elasticsearch startup
Timeline (earliest step first; slides 29 to 36 add one step each):
JVM Startup
Read configuration file
Native system calls
Set security manager
Load plugins
Bootstrap checks
Network enabled
Slide 37
Security Manager in Elasticsearch
Special security manager is used
Does not grant the exitVM permission; only a few special classes are allowed to exit the JVM
Thread & ThreadGroup security is enforced
SpecialPermission was also added, a special marker permission to prevent privilege elevation by scripts
Slide 38
Security Manager in Elasticsearch
ESPolicy allows for loading from files plus dynamic configuration (from the ES configuration file)
Bootstrap check for java.security.AllPermission
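The idea behind a check for java.security.AllPermission can be sketched in a few lines of Java: if the granted permissions imply AllPermission, every other check becomes meaningless, so startup should refuse to continue. This is a rough sketch under assumptions, not Elasticsearch's actual bootstrap check; the class AllPermissionCheckSketch and its helper methods are hypothetical names.

```java
import java.io.FilePermission;
import java.security.AllPermission;
import java.security.PermissionCollection;
import java.security.Permissions;

// Hypothetical sketch of an "all permission" startup check.
final class AllPermissionCheckSketch {

    // True if the grants effectively allow everything.
    static boolean grantsEverything(PermissionCollection granted) {
        return granted.implies(new AllPermission());
    }

    // Fails fast, the way a startup check would.
    static void check(PermissionCollection granted) {
        if (grantsEverything(granted)) {
            throw new IllegalStateException("AllPermission granted: the sandbox would be ineffective");
        }
    }

    // Example grants limited to one log file.
    static Permissions narrowGrants() {
        Permissions granted = new Permissions();
        granted.add(new FilePermission("/var/log/elasticsearch.log", "read,write"));
        return granted;
    }

    // Example grants that include AllPermission.
    static Permissions broadGrants() {
        Permissions granted = narrowGrants();
        granted.add(new AllPermission());
        return granted;
    }

    public static void main(String[] args) {
        check(narrowGrants()); // passes silently
        try {
            check(broadGrants());
        } catch (IllegalStateException e) {
            System.out.println("startup refused: " + e.getMessage());
        }
    }
}
```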
Slide 39
#noroot there is no reason to run code as root!
Slide 40
Do not run as root
[Startup timeline: JVM Startup, Read configuration file, Native system calls, Set security manager, Load plugins, Bootstrap checks, Network enabled]
Slide 41
Do not run as root
Slide 42
seccomp … or how I loved to abort system calls
Slide 43
Seccomp - prevent process forks
[Startup timeline: JVM Startup, Read configuration file, Native system calls, Set security manager, Load plugins, Bootstrap checks, Network enabled]
Slide 44
Seccomp - prevent process forks
Security manager could fail
Elasticsearch should still not be able to fork processes
One-way transition to tell the operating system to deny execve, fork, vfork, execveat system calls
Works on Linux (seccomp), with equivalent restrictions on Windows, Solaris, BSD, macOS
Slide 45
Seccomp - prevent process forks
Slide 46
Seccomp - prevent process forks
Slide 47
seccomp sandbox
[Diagram: seccomp around your code; some calls pass (✅), others are blocked (⛔)]
Slide 48
DEMO
Slide 49
Production mode vs. Development mode
Annoying you now instead of devastating you later
Slide 50
Bootstrap checks
[Startup timeline: JVM Startup, Read configuration file, Native system calls, Set security manager, Load plugins, Bootstrap checks, Network enabled]
Slide 51
Is your dev setup equivalent to production?
Development environments are rarely set up like production ones
How to ensure certain preconditions in production but not for development?
What is a good indicator?
Slide 52
Mode check
Slide 53
Bootstrap checks
Slide 54
Bootstrap checks
Slide 55
Bootstrap checks
Slide 56
Plugins … remaining secure
Slide 57
Bootstrap checks
[Startup timeline: JVM Startup, Read configuration file, Native system calls, Set security manager, Load plugins, Bootstrap checks, Network enabled]
Slide 58
Plugins in 60 seconds
Plugins are just zip files
Each plugin can have its own jars/dependencies
Each plugin is loaded with its own classloader
Each plugin can have its own security permissions
ES core loads a bunch of code as modules (plugins that ship with Elasticsearch)
Introducing Painless A scripting language for Elasticsearch
Slide 66
Scripting: Why and how?
Expression evaluation without needing to write Java extensions for Elasticsearch
Node ingest script processor
Search queries (dynamic requests & fields)
Aggregations (dynamic buckets)
Templating (Mustache)
Slide 67
Scripting in Elasticsearch
MVEL
Groovy
Expressions
Painless
Slide 68
Painless - a secure scripting language
Hard to take an existing programming language and make it secure, but remain fast
Sandboxing
Whitelisting over blacklisting, per method
Opt-in to regular expressions
Prevent endless loops
Detect self references to prevent stack overflows
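Two of the ideas on this slide, per-method whitelisting and endless-loop prevention, can be illustrated with a small Java sketch. It is only an approximation of the concepts under stated assumptions, not how Painless is implemented: the class ScriptGuardSketch, the whitelist entries, and the loop limit are all made up for this example.

```java
import java.util.Set;

// Hypothetical sketch of two script-safety techniques; not Painless internals.
final class ScriptGuardSketch {

    // Whitelist: anything not listed here is rejected, including methods nobody thought of.
    private static final Set<String> ALLOWED_METHODS = Set.of(
            "java.lang.String#length",
            "java.lang.Math#max");

    static boolean isCallAllowed(String className, String methodName) {
        return ALLOWED_METHODS.contains(className + "#" + methodName);
    }

    // Budget of loop iterations a script may consume before it is aborted.
    static final class LoopCounter {
        private int remaining;

        LoopCounter(int maxIterations) {
            this.remaining = maxIterations;
        }

        // Called once per loop iteration; throws when the budget is used up.
        void tick() {
            if (remaining-- <= 0) {
                throw new IllegalStateException("loop limit exceeded");
            }
        }
    }

    // Simulates a script containing while (true) {} and returns how many iterations ran.
    static int runEndlessLoop(int maxIterations) {
        LoopCounter counter = new LoopCounter(maxIterations);
        int iterations = 0;
        try {
            while (true) {
                counter.tick();
                iterations++;
            }
        } catch (IllegalStateException e) {
            return iterations;
        }
    }

    public static void main(String[] args) {
        System.out.println(isCallAllowed("java.lang.String", "length"));
        System.out.println(isCallAllowed("java.lang.Runtime", "exec"));
        System.out.println(runEndlessLoop(1000));
    }
}
```

The whitelist makes the safe set explicit, so a newly added JDK method stays unavailable until someone opts it in, which is the advantage over a blacklist.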
Slide 69
DEMO
Slide 70
Summary Security is hard - let’s go shopping!
Slide 71
Summary
Not using the Security Manager - what’s your excuse?
Scripting is important, is your implementation secure?
Use operating system features!
If you allow for plugins, remain secure!
If you remove features, have alternatives!
Slide 72
Summary
Development has a big impact on security
Operations is happy to use what is there out of the box
Developers know their application best!
Don’t reinvent, check out existing features!
Developers are responsible for writing secure code!
Before something happens!
Slide 73
Thanks for listening! Questions?
Alexander Reelsen @spinscale alex@elastic.co
Pagination: Request
[Diagram: C and one N]
Find the first 10 results for Elasticsearch
Slide 83
Pagination: Request
[Diagram: C and five N]
Find the first 10 results for Elasticsearch
Slide 84
Pagination: Query Phase
SortedPriorityQueue size = 50
Each node returns 10 results, create real top 10 out of 50
Slide 85
Pagination: Fetch phase
Ask for the real top 10
Slide 86
Pagination: Query Phase
Return real top 10
Slide 87
Pagination: Query
Find the 10 results starting at position 90
Slide 88
Pagination: Query Phase
SortedPriorityQueue size = 500
Each node returns 100 results, create real top 90-100 out of 500
Slide 89
Pagination: Query
Find the 10 results starting at position 99990
Slide 90
Pagination: Query Phase
SortedPriorityQueue size = 500000
Each node returns 100k results
Slide 91
Pagination: Query
[Diagram: C and nodes numbered 1 to 100]
Find the 10 results starting at position 99990 over 100 nodes
Slide 92
Pagination: Query
SortedPriorityQueue size = 10_000_000
Each node returns 100k results
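The queue sizes on these slides all follow from one piece of arithmetic: every node has to return its own top (from + size) hits, so the merging side sorts nodes × (from + size) entries. A minimal Java sketch of that calculation follows, assuming five nodes for the small examples (implied by 50 = 5 × 10); the class and method names are made up for illustration.

```java
// Illustrative arithmetic only; DeepPaginationSketch is a hypothetical name.
final class DeepPaginationSketch {

    // Entries to merge when paging by numeric position:
    // each node contributes its top (from + size) hits.
    static long coordinatorQueueSize(int nodes, int from, int size) {
        return (long) nodes * (from + size);
    }

    public static void main(String[] args) {
        System.out.println(coordinatorQueueSize(5, 0, 10));       // first 10 results
        System.out.println(coordinatorQueueSize(5, 90, 10));      // 10 results from position 90
        System.out.println(coordinatorQueueSize(5, 99_990, 10));  // 10 results from position 99990
        System.out.println(coordinatorQueueSize(100, 99_990, 10)); // same request over 100 nodes
    }
}
```

The cost grows with the starting position, not with the number of results the client actually wants, which is why deep pagination by position is worth preventing.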
Slide 93
Solution: search_after
Do not use numerical positions
Use keys where you stopped in the inverted index
Let the client tell you what the last key was
Just specify the last sort value from the last document returned as a starting point
Slide 94
Pagination: search_after
[Diagram: C and nodes numbered 1 to 100]
Find the 10 results starting at sort key name foo over 100 nodes
Slide 95
Pagination: search_after
SortedPriorityQueue size = 1000
Each node returns 10 results
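The search_after idea can be simulated in plain Java: each node keeps its sort keys in order, skips straight to the first key after the one the client last saw, and returns only one page worth of keys, so the merge never handles more than nodes × size entries. This is a toy model under assumptions, not Elasticsearch code: the class SearchAfterSketch, the in-memory TreeSet per node, and the sample keys are invented, and sort keys are assumed to be unique (a real sort would need a tiebreaker).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Toy model of key-based pagination; all names and data are hypothetical.
final class SearchAfterSketch {

    // Returns the next page of keys strictly after lastKey (null means "from the start").
    static List<String> nextPage(List<NavigableSet<String>> nodes, String lastKey, int size) {
        TreeSet<String> merged = new TreeSet<>();
        for (NavigableSet<String> node : nodes) {
            // Jump directly behind the last key instead of counting positions.
            NavigableSet<String> candidates = lastKey == null ? node : node.tailSet(lastKey, false);
            int taken = 0;
            for (String key : candidates) {
                if (taken == size) {
                    break; // each node contributes at most one page
                }
                merged.add(key);
                taken++;
            }
        }
        // The merge holds at most nodes * size keys; keep the first page of them.
        List<String> page = new ArrayList<>();
        for (String key : merged) {
            if (page.size() == size) {
                break;
            }
            page.add(key);
        }
        return page;
    }

    // Three example nodes with interleaved keys.
    static List<NavigableSet<String>> sampleNodes() {
        List<NavigableSet<String>> nodes = new ArrayList<>();
        nodes.add(new TreeSet<>(List.of("a", "d", "g")));
        nodes.add(new TreeSet<>(List.of("b", "e", "h")));
        nodes.add(new TreeSet<>(List.of("c", "f", "i")));
        return nodes;
    }

    public static void main(String[] args) {
        List<NavigableSet<String>> nodes = sampleNodes();
        List<String> first = nextPage(nodes, null, 2);
        System.out.println(first);
        // The client passes back the last key it received.
        System.out.println(nextPage(nodes, first.get(first.size() - 1), 2));
    }
}
```

However deep the client pages, each request costs the same, because the work depends on the page size and node count rather than on the position reached.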
Slide 96
Bonus: replacing delete by query
Slide 97
delete_by_query removal/replace
delete_by_query API was not safe
API endpoint was removed
Extensive documentation was added on what to do instead
Infrastructure for long running background tasks was added
delete_by_query was reintroduced using above infra and doing the exact same thing as in the documentation
data > convenience!
Slide 98
Thanks for listening! Questions?
Alexander Reelsen @spinscale alex@elastic.co