A presentation at Open Source Summit in Paris, France by Stefane Fermigier
OlaPy, a tool for business data analysis
Business analytics with OlaPy
Paris Open Source Summit - 11 Dec. 2019
Stéfane Fermigier
Founder & CEO, Abilian - Enterprise Social Software
Olapy in brief
• Developed since 2016 by Abilian
• In-memory data processing using Pandas
• Aggregated data browsing
• MDX support
• XMLA interface (-> Excel)
• Multiple back-ends (CSV, SQL)
• Simple web front-end and in-browser app
Before we start / motivations
Who am I?
• Stefane Fermigier, Python developer since 1996
• Founder of Abilian SAS
  • Python shop, developing business applications (collaboration, CRM, workflow…)
  • R&D activity (Wendelin -> Olapy)
• Organizer of the PyData Paris / PyParis conference (2014-2018)
Why use Python for business data analysis?
• Python is one of the leading languages for data science / data processing, and also a leading language for web & business apps
• As a Python shop, we’d like to leverage this leadership in data processing tools to build exploration / reporting features in our business applications using a familiar language
• Why not? :)
Concepts and architecture
On-Line Analytical Processing (OLAP) & Multidimensional Databases
• A multidimensional DB is a hypercube
• Axes are called user-defined dimensions
• Cells contain measures calculated from more or less complex formulas
• Operators on the cube are algebraic (return a cube) and can thus be combined
• Multi-dimensional database = “super-spreadsheet”
[Figure: example cube. Axis labels: Geography (Continent, Country, City), Product (Category, Sub category), Time (2014, 2015, 2016). Other labels: Company, Black Friday. Annotations: dimensions, measures.]
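The cube vocabulary above can be sketched with Pandas, which Olapy uses for in-memory processing. This is only an illustration of the concepts, not Olapy's internal representation: the table, column names and values are all invented for the example.

```python
import pandas as pd

# Hypothetical fact table: one row per sale (all values invented for illustration).
sales = pd.DataFrame(
    {
        "continent": ["Europe", "Europe", "Europe", "America"],
        "country": ["France", "France", "Spain", "Canada"],
        "year": [2014, 2015, 2015, 2015],
        "amount": [10.0, 20.0, 5.0, 7.0],
    }
)

# Dimensions (Geography, Time) become the axes of the table;
# the measure in each cell is an aggregate (here a sum) of "amount".
cube = sales.pivot_table(
    index=["continent", "country"],
    columns="year",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)

# Roll-up along Geography: aggregate countries into continents.
# The result has the same shape of object (axes + measures),
# which is why such operations can be chained.
by_continent = cube.groupby(level="continent").sum()

print(cube)
print(by_continent)
```

The roll-up returns another table indexed by dimension members, which is the "algebraic" property the slide refers to: each operator takes a cube and gives back a cube.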
MDX: a query language for business analytics
• MDX = Multi Dimensional Expressions
• SQL extension for querying a multi-dimensional database
• Example:
  SELECT
    [Geography].[Geo].[Country] ON ROWS,
    [Time].[Calendar].[Year].[2010] ON COLUMNS
  FROM sales
  WHERE [Measures].[Count]
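To make the example query concrete, here is a rough Pandas equivalent of what it asks for: countries on rows, the year 2010 on columns, and the Count measure in the cells. The data frame and its column names are hypothetical, and the mapping is a reading aid under those assumptions, not a description of how Olapy evaluates MDX.

```python
import pandas as pd

# Hypothetical "sales" facts (values invented for illustration).
facts = pd.DataFrame(
    {
        "country": ["France", "Spain", "France", "Spain"],
        "year": [2010, 2010, 2011, 2010],
        "count": [3, 1, 4, 2],
    }
)

# [Time].[Calendar].[Year].[2010] ON COLUMNS -> keep only the 2010 member.
in_2010 = facts[facts["year"] == 2010]

# [Geography].[Geo].[Country] ON ROWS, WHERE [Measures].[Count]
# -> one row per country, one column per selected year, cells = summed count.
result = in_2010.pivot_table(
    index="country",
    columns="year",
    values="count",
    aggfunc="sum",
)

print(result)
```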
XMLA - Extensible Markup Language for Analysis
• Data access protocol
• Supports exchange of analytical data between clients and servers
  • Available on any device or platform
  • Using any programming language
• SOAP with just 2 methods
  • Discover
  • Execute
Detailed architecture
Benchmarks (WIP)
Use cases & applications
From a spreadsheet software (e.g. Excel)
• Install & run:
  pip install olapy
  olapy runserver
• Then, from Excel go to:
  • Data / From other sources
  • And click on “Analysis Services”
  • Use URL: http://127.0.0.1:8000/xmla
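Before pointing Excel at the server, the endpoint can be checked from Python. The sketch below prepares a standard XMLA Discover request for the URL given on the slide. The function names are invented for the example, and the SOAPAction header is the conventional XMLA value; whether OlaPy requires it is an assumption. `ping()` only works once `olapy runserver` is running.

```python
import urllib.request

XMLA_URL = "http://127.0.0.1:8000/xmla"  # URL from the slide

# Minimal XMLA Discover call asking the server for its data sources.
DISCOVER_BODY = b"""<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <Discover xmlns="urn:schemas-microsoft-com:xml-analysis">
      <RequestType>DISCOVER_DATASOURCES</RequestType>
      <Restrictions><RestrictionList/></Restrictions>
      <Properties><PropertyList/></Properties>
    </Discover>
  </soap:Body>
</soap:Envelope>"""


def build_request(url=XMLA_URL):
    """Prepare (but do not send) the HTTP POST carrying the SOAP envelope."""
    return urllib.request.Request(
        url,
        data=DISCOVER_BODY,
        headers={
            "Content-Type": "text/xml; charset=utf-8",
            # Conventional XMLA header; assumed, not confirmed for OlaPy.
            "SOAPAction": '"urn:schemas-microsoft-com:xml-analysis:Discover"',
        },
        method="POST",
    )


def ping(url=XMLA_URL, timeout=5.0):
    """Send the request and return the HTTP status (needs a running server)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.status
```

With the server started, calling `ping()` should return the HTTP status of the Discover call; a connection error means the server is not listening on that address.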
Other clients
• xmla.js: JavaScript client
  • Ongoing work to be able to call OlaPy (or any other XMLA server) from browser-based spreadsheet software, such as OnlyOffice, Jexcel, Sheetjs, etc.
• olap4j: Java client
  • Used (among others) by the PalOOca plugin for LibreOffice
• Clients also for Python, .NET, Perl, Ruby, etc.
Web application (POC)
• Flask-based Web application (other frameworks will be supported)
• GUI-based MDX query editor
• GUI-based data explorer / aggregator
• Graphical widgets
• Support for dashboarding
As a Python library - using Jupyter (or not)
Notebook in the browser - using Pyodide
• Pyodide brings the Python runtime to the browser via WebAssembly, along with the Python scientific stack including NumPy, Pandas, Matplotlib, parts of SciPy, and NetworkX. The packages directory lists over 35 packages which are currently available.
• Pyodide provides transparent conversion of objects between JavaScript and Python. When used inside a browser, Python has full access to the Web APIs.
• While closely related to the iodide project, a tool for literate scientific computing and communication for the web, Pyodide goes beyond running in a notebook environment. To maximize the flexibility of the modern web, Pyodide may be used standalone in any context where you want to run Python inside a web browser.
Out-of-core in-memory computing - using Wendelin
“Wendelin is a big data framework designed for industrial applications based on python, NumPy, Scipy and other NumPy based libraries. It uses at its core the NEO distributed transactional NoSQL database to store petabytes of binary data. Wendelin combines the performance of scikit-learn machine learning with NEO distributed storage in order to provide out-of-core processing of large data sets. Its goal is to bring the best open source, big data engine based on Numpy python technologies and gather a wide community of contributors of new data analytics algorithms.”
Roadmap and support
Roadmap
• Version 0.8 will be released before year end
  • Last version to support Python 2.7
• Then (2020):
  • Supported release of Olapy / Pyodide
  • Integration with Web spreadsheets
  • Web app (both standalone and as a component)
  • More use cases
Support offer
• Starting with release 0.8, we will sell support on Olapy
• Contact us for details :)
Questions?