Without open data, there is no ethical machine learning.

A presentation at StrangeLoop in September 2023 in St. Louis, MO, USA by Erin Mikail Staples

Machine learning has exploded in popularity in recent years, and even more so in recent months. In response, companies such as Reddit, Twitter, and Snapchat have closed off or restricted access to their data, or have made headlines for how “open” they are with the data used to train large generative models.

Managing large open datasets is often thankless work, but it’s work we must invest in if we truly want to prioritize a future that includes ethical machine learning. In this talk, we’ll explore the role open data plays in machine learning, why it’s concerning to see organizations close off access to their APIs, and the costs and benefits institutions and individuals weigh when they open their data. Most importantly, we’ll walk through resources, projects, and opportunities that individuals can use to help steer us toward a better future at this juncture in open data and machine learning.