Data lake: how Red Hat maintains data quality across multiple Drupal sites

A presentation at DrupalCon Pittsburgh 2023 in Pittsburgh, PA, USA by April Sides

This was co-presented with Melissa Bent.

Data accuracy and consistency are important goals for any organization.

Maintaining data quality across multiple websites and applications (Drupal or otherwise), with different teams managing the same data in multiple systems, becomes complex and difficult to manage. Having a pool of data becomes an attractive solution to resolve some of these issues and allow for greater transparency and consistency across an organization. But creating a scalable, reliable, and useful system can bring its own challenges.

Join us as we explore several ways that Red Hat is using a data lake architecture to share data between different Drupal sites.

We’ll cover:

  • What is a data lake?
  • The benefits, challenges, and considerations of using a data lake.
  • Several ways Red Hat has integrated a data lake architecture with Drupal.
  • Lessons learned along the way.
