Business analytics with OlaPy

A presentation at Open Source Summit in December 2019 in Paris, France by Stefane Fermigier

Slide 1

OlaPy, a tool for business data analysis
Business analytics with OlaPy
Paris Open Source Summit - 11 Dec. 2019
Stéfane Fermigier
Founder & CEO, Abilian - Enterprise Social Software

Slide 2

Olapy in brief
• Developed since 2016 by Abilian
• In-memory data processing using Pandas
• Aggregated data browsing
• MDX support
• XMLA interface (-> Excel)
• Multiple back-ends (CSV, SQL)
• Simple web front-end and in-browser app

Slide 3

Before we start / motivations

Slide 4

Who am I?
• Stefane Fermigier, Python developer since 1996
• Founder of Abilian SAS
  • Python shop, developing business applications (collaboration, CRM, workflow…)
  • R&D activity (Wendelin -> Olapy)
• Organizer of the PyData Paris / PyParis conference (2014-2018)

Slide 5

Why use Python for business data analysis?
• Why not? :)
• Python is one of the leading languages for data science / data processing, and also a leading language for web & business apps
• As a Python shop, we’d like to leverage this leadership in data processing tools to build exploration / reporting features in our business applications using a familiar language

Slide 6

Concepts and architecture

Slide 7

On-Line Analytical Processing (OLAP) & Multidimensional Databases
• A multidimensional DB is a hypercube
• Axes are called user-defined dimensions
• Cells contain measures calculated from more or less complex formulas
• Operators on the cube are algebraic (return a cube) and can thus be combined
• Multi-dimensional database = “super-spreadsheet”
[Figure: data cube with three dimensions: Geography (Continent, Country, City), Product (Category, Sub category) and Time (2014, 2015, 2016); cells hold measures. Other labels in the figure: Black Friday, Company.]
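The cube vocabulary on this slide can be illustrated with plain Pandas. This is only a rough sketch, not OlaPy's own API: the `sales` table, its column names and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical fact table: one row per sale (made-up data for illustration).
sales = pd.DataFrame(
    {
        "continent": ["Europe", "Europe", "Europe", "America"],
        "country": ["France", "France", "Spain", "Canada"],
        "category": ["Food", "Books", "Food", "Books"],
        "year": [2014, 2015, 2015, 2016],
        "amount": [100.0, 50.0, 70.0, 30.0],
    }
)

# View the data along two dimensions (Geography x Time) and aggregate
# the "amount" measure in each cell.
cube = sales.pivot_table(
    index=["continent", "country"],
    columns="year",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)

# Roll up from Country to Continent: the result is again a (smaller) cube,
# so operations can be chained.
by_continent = cube.groupby(level="continent").sum()
print(by_continent)
```

The roll-up returns an object of the same kind as its input, which is the sense in which cube operators are "algebraic" and can be combined.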

Slide 8

MDX: a query language for business analytics
• MDX = Multi Dimensional Expressions
• SQL extension for querying a multi-dimensional database
• Example:
    SELECT [Geography].[Geo].[Country] ON ROWS,
           [Time].[Calendar].[Year].[2010] ON COLUMNS
    FROM sales
    WHERE [Measures].[Count]
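To make the example query more concrete, here is an approximate Pandas equivalent of what it asks for. This is a sketch under assumptions: the rows of the `sales` table are invented, and the Count measure is assumed to be a numeric column that is summed per cell. It does not use OlaPy itself.

```python
import pandas as pd

# Hypothetical contents of the "sales" cube (made-up rows).
sales = pd.DataFrame(
    {
        "country": ["France", "France", "Spain", "Spain", "Canada"],
        "year": [2010, 2010, 2010, 2011, 2010],
        "count": [3, 2, 5, 7, 1],
    }
)

# [Time].[Calendar].[Year].[2010] ON COLUMNS -> keep a single Year member.
in_2010 = sales[sales["year"] == 2010]

# [Geography].[Geo].[Country] ON ROWS and WHERE [Measures].[Count]
# -> one row per country, aggregating the Count measure.
result = in_2010.pivot_table(
    index="country", columns="year", values="count", aggfunc="sum"
)
print(result)
```

The MDX version expresses the same selection declaratively, by naming members of dimension hierarchies instead of filtering and grouping columns by hand.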

Slide 9

XMLA - Extensible Markup Language for Analysis
• Data Access Protocol
• Supports exchange of analytical data between clients and servers
  • Available on any device or platform
  • Using any programming language
• SOAP with just 2 methods
  • Discover
  • Execute
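As an illustration of the Execute method, the sketch below builds the SOAP envelope that an XMLA client would send, using only the Python standard library. The helper name `build_execute_request` and the catalog name are hypothetical; only the two XML namespaces and the element names come from the SOAP and XMLA specifications. Nothing is sent over the network here.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
XMLA_NS = "urn:schemas-microsoft-com:xml-analysis"


def build_execute_request(mdx: str, catalog: str) -> bytes:
    """Return a SOAP envelope wrapping an XMLA Execute call (hypothetical helper)."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    execute = ET.SubElement(body, f"{{{XMLA_NS}}}Execute")

    # The MDX text travels inside Command/Statement.
    command = ET.SubElement(execute, f"{{{XMLA_NS}}}Command")
    ET.SubElement(command, f"{{{XMLA_NS}}}Statement").text = mdx

    # Properties tell the server which catalog to query and which
    # result format to return.
    properties = ET.SubElement(execute, f"{{{XMLA_NS}}}Properties")
    prop_list = ET.SubElement(properties, f"{{{XMLA_NS}}}PropertyList")
    ET.SubElement(prop_list, f"{{{XMLA_NS}}}Catalog").text = catalog
    ET.SubElement(prop_list, f"{{{XMLA_NS}}}Format").text = "Multidimensional"

    return ET.tostring(envelope, encoding="utf-8")


request = build_execute_request(
    "SELECT [Geography].[Geo].[Country] ON ROWS FROM sales", catalog="sales"
)
print(request.decode("utf-8"))
```

A client would POST these bytes to the server's XMLA endpoint as XML; a Discover request has the same envelope shape but asks for metadata (catalogs, cubes, dimensions) instead of running a statement.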

Slide 10

Detailed architecture

Slide 11

Benchmarks (WIP)

Slide 12

Use cases & applications

Slide 13

From spreadsheet software (e.g. Excel)
• Install & run:
    pip install olapy
    olapy runserver
• Then, from Excel:
  • Go to Data / From other sources
  • Choose “Analysis Services”
  • Use URL: http://127.0.0.1:8000/xmla

Slide 14

Slide 15

Other clients
• xmla.js: JavaScript client
  • Ongoing work to be able to call OlaPy (or any other XMLA server) from browser-based spreadsheet software, such as OnlyOffice, Jexcel, Sheetjs, etc.
• olap4j: Java client
  • Used (among others) by the PalOOca plugin for LibreOffice
• Clients also for Python, .NET, Perl, Ruby, etc.

Slide 16

Web application (POC)
• Flask-based Web application (other frameworks will be supported)
• GUI-based MDX query editor
• GUI-based data explorer / aggregator
• Graphical widgets
• Support for dashboarding

Slide 17

Slide 18

As a Python library - using Jupyter (or not)

Slide 19

Notebook in the browser - using Pyodide
• Pyodide brings the Python runtime to the browser via WebAssembly, along with the Python scientific stack including NumPy, Pandas, Matplotlib, parts of SciPy, and NetworkX. The packages directory lists over 35 packages which are currently available.
• Pyodide provides transparent conversion of objects between JavaScript and Python. When used inside a browser, Python has full access to the Web APIs.
• While closely related to the iodide project, a tool for literate scientific computing and communication for the web, Pyodide goes beyond running in a notebook environment. To maximize the flexibility of the modern web, Pyodide may be used standalone in any context where you want to run Python inside a web browser.

Slide 20

Notebook in the browser - using Pyodide

Slide 21

Out-of-core in-memory computing - using Wendelin
“Wendelin is a big data framework designed for industrial applications based on python, NumPy, Scipy and other NumPy based libraries. It uses at its core the NEO distributed transactional NoSQL database to store petabytes of binary data. Wendelin combines the performance of scikit-learn machine learning with NEO distributed storage in order to provide out-of-core processing of large data sets. Its goal is to bring the best open source, big data engine based on Numpy python technologies and gather a wide community of contributors of new data analytics algorithms.”

Slide 22

Roadmap and support

Slide 23

Roadmap
• Version 0.8 will be released before year end
  • Last version to support Python 2.7
• Then (2020):
  • Supported release of Olapy / Pyodide
  • Integration with Web spreadsheets
  • Web app (both standalone and as a component)
  • More use cases

Slide 24

Support offer
• Starting with release 0.8, we will sell support on Olapy
• Contact us for details :)

Slide 25

Questions?