Landscape of Open Source Databases

A presentation at EMFCamp in June 2022 in Eastnor, Ledbury HR8, UK by Lorna Jane Mitchell

Slide 1

Landscape of Open Source Databases
Lorna Mitchell, Aiven

Slide 2

Keeping up with Databases
• We need more databases, because we have more data
• The right technology choice is important
• Open source is securable and future-proof
@aiven_io ~ @lornajane

Slide 3

Data Sources
Two main sources of data:
• my own opinion
• https://db-engines.com/en/ranking

Slide 4

Relational Databases
Traditional databases
• pre-defined tables with columns
• relations between tables, e.g. book has an author
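
To make "book has an author" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names and sample row are assumptions chosen for illustration; they are not from the talk.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces relations when asked

# Pre-defined tables with columns (illustrative schema).
conn.execute("CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """CREATE TABLE book (
           id INTEGER PRIMARY KEY,
           title TEXT NOT NULL,
           author_id INTEGER NOT NULL REFERENCES author(id)
       )"""
)

# Sample data, for illustration only.
conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ursula K. Le Guin')")
conn.execute("INSERT INTO book (title, author_id) VALUES ('A Wizard of Earthsea', 1)")

# The relation lets a join answer "who wrote this book?".
rows = conn.execute(
    "SELECT book.title, author.name "
    "FROM book JOIN author ON author.id = book.author_id"
).fetchall()
print(rows)
```

The same schema idea carries over to MySQL, MariaDB or PostgreSQL, although the exact column types differ between them.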

Slide 5

MySQL
License: GPLv2
World’s most-used open source database
• part of LAMP stack (Linux, Apache, MySQL, PHP/Python/Perl)
• proprietary Enterprise Server version also available

Slide 6

MariaDB
License: GPLv2
• drop-in replacement for MySQL
• support for additional storage engines
• proprietary Enterprise Server version also available

Slide 7

PostgreSQL
License: PostgreSQL license (MIT-ish)
• powerful and performant relational database
• many contributors, healthy community
• lots of extensions

Slide 8

PostGIS
License: GPLv2
Spatial database, as an extension to PostgreSQL
• support for geographical object data types
• functions for working with area, distance, etc.
• specialist indexes to support spatial queries

Slide 9

TimescaleDB
License: Apache2, some features TSL
Extension for PostgreSQL
• table types for time series data
• additional SQL functions

Slide 10

Time Series Data
Time series data:
• a timestamp
• a measurement
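
A small sketch of what that shape looks like in practice: each point is a (timestamp, measurement) pair, and a typical query groups points into time buckets. The readings and the hourly_average helper are made up for illustration, and plain Python stands in for a real time series database.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Each point is (timestamp, measurement); values are invented for this sketch.
readings = [
    (datetime(2022, 6, 3, 10, 5), 18.5),
    (datetime(2022, 6, 3, 10, 35), 19.5),
    (datetime(2022, 6, 3, 11, 10), 21.0),
]


def hourly_average(points):
    """Group points into hour-long buckets and average each bucket."""
    buckets = defaultdict(list)
    for ts, value in points:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values) for hour, values in sorted(buckets.items())}


averages = hourly_average(readings)
for hour, value in averages.items():
    print(hour.isoformat(), value)
```

Time series databases offer this kind of bucketing as a built-in operation, so it does not have to be written by hand.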

Slide 11

InfluxDB
License: MIT
• time series database
• IoT, metrics, energy
• clustered version has proprietary license

Slide 12

Re-use wire protocols
Build a new database, and use an existing wire protocol to get clients and integrations for free.
Examples:
• CrateDB uses the PostgreSQL protocol
• VictoriaMetrics and M3DB use the Influx and Prometheus protocols

Slide 13

SQLite
License: public domain
• file based, no server
• embeddable
• ideal edge model database
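
"File based, no server" can be seen directly from Python, which ships with the sqlite3 module. This is a minimal sketch; the file name, table and row are assumptions made up for the example.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name; the whole database lives in this one file.
db_path = os.path.join(tempfile.mkdtemp(), "sensor.db")

conn = sqlite3.connect(db_path)  # creates the file, no server process involved
conn.execute("CREATE TABLE reading (taken_at TEXT, value REAL)")
conn.execute("INSERT INTO reading VALUES ('2022-06-03T10:05:00', 18.5)")
conn.commit()
conn.close()

# Reopen the same file later (or from another program) to read the data back.
conn = sqlite3.connect(db_path)
stored = conn.execute("SELECT taken_at, value FROM reading").fetchall()
conn.close()
print(stored)
```

Because the database is an ordinary file, copying or backing it up is a file copy.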

Slide 14

Redis
License: BSD
Speedy in-memory key value store
• used for caching, queueing
• supports many data types (lists, sets, hashes, etc.)
• 3rd most popular open source database
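
The caching use case usually follows a "check the cache, fall back to the slow source, store the result" pattern. The sketch below shows that pattern with a plain-Python dictionary standing in for the key value store. TinyCache, slow_lookup and get_user are hypothetical names for this example and are not the Redis API.

```python
import time


class TinyCache:
    """Plain-Python stand-in for a key/value cache; NOT the Redis API."""

    def __init__(self):
        self._data = {}

    def set(self, key, value, ttl_seconds):
        # Store the value together with the moment it stops being valid.
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entries behave like missing keys
            return None
        return value


calls = []  # records how often the slow source is actually hit


def slow_lookup(user_id):
    """Hypothetical expensive call, e.g. a query against the main database."""
    calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = TinyCache()


def get_user(user_id):
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached  # cache hit: skip the slow source
    value = slow_lookup(user_id)
    cache.set(key, value, ttl_seconds=60)
    return value


first = get_user(42)   # miss: calls slow_lookup
second = get_user(42)  # hit: served from the cache
```

With a real key value store the cache lives outside the application process, so several application instances can share it.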

Slide 15

Key/Value Stores
Other key value stores worth a mention:
• Memcached
• etcd
• ArangoDB

Slide 16

Apache Cassandra
License: Apache2
• distributed database for commodity hardware
• designed for very large volumes of data
• uses denormalised data storage (no joins)

Slide 17

Distributed Databases
Horizontally scalable for writes, spread across multiple nodes
• data organised into shards or partitions
• usually also replicated for redundancy
• complexity handled by database
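
As a rough sketch of the idea, the code below assigns each key to a primary node by hashing it, then copies it to the next node for redundancy. The node names, the replica count and the nodes_for helper are assumptions for illustration. It uses simple modulo hashing, and real distributed databases typically use more elaborate placement schemes.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical node names
REPLICAS = 2  # each key is stored on this many nodes


def nodes_for(key, nodes=NODES, replicas=REPLICAS):
    """Pick a primary node by hashing the key, then the following nodes as replicas."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


# Every client computes the same placement, so no central lookup is needed.
placement = {key: nodes_for(key) for key in ["sensor-1", "sensor-2", "sensor-3"]}
for key, owners in placement.items():
    print(key, "->", owners)
```

In a real system this routing happens inside the database or its client driver, which is what "complexity handled by database" refers to.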

Slide 18

OpenSearch
License: Apache2
Open source fork of Elasticsearch
• powerful search and aggregation features
• flexible data structure, but defined indexes
• OpenSearch Dashboards is the fork of Kibana

Slide 19

Open Source Databases
Best technology around, whatever your data needs

Slide 20

Resources
• https://aiven.io - DBaaS
• https://uptime.aiven.io - Open source data event
• https://lornajane.net - my website/blog
• 7 Databases in 7 Weeks (2nd edition)