Decoding Data Lakehouse: A Technical Breakdown

A presentation at Big Data Technology Warsaw in Warsaw, Poland by Dipankar Mazumdar

Data warehouses (RDBMS-OLAP) have for many years been foundational to democratizing data and enabling analytical use cases such as business intelligence and reporting. However, OLAP systems present some well-known challenges. Cloud data lakes have addressed some of the shortcomings of OLAP systems, but they present their own set of challenges. More recently, organizations have often followed a two-tier architectural approach that uses both cloud data lakes and OLAP systems to take advantage of each platform. However, this approach brings additional challenges, complexity, and overhead. This session presents how a data lakehouse, a new architectural approach, delivers the combined benefits of a data warehouse (OLAP) and a cloud data lake, while also providing additional advantages. The session breaks down the architecture of a data lakehouse and compares it to the traditional data warehouse. We will focus on the following aspects:

  • defining standard terminology for the data warehouse
  • breaking down/categorizing DWH into 3 parts:
      - technical components: storage, compute, table/file format
      - technical capabilities: concurrency, latency, ad hoc queries
      - tech-independent practices: modeling, ETL, MDM, SCD
  • showing how a lakehouse satisfies all 3 parts and the additional value that it brings