PyData Triangle March 2021 Online Meetup
PyData Triangle March 2021 Meetup Zoom Meeting
0:00:00 Intro
0:00:48 Presenter: Rachael Tatman
Title: Rules + Deep Learning: Why you need both to build Conversational AI that actually works
Presentation Overview: Current NLP research is focused on large neural models, and these models have seen a lot of success across many different applications. But to build a conversational AI system that works well in practice, there's no getting around it: you need some rules as well. This talk will put both rules and transformer models into their historical context in NLP and discuss best practices and examples for combining them in hybrid systems.
Bio: Rachael is a developer advocate for Rasa, where she's helping developers build and deploy conversational AI applications using their open source framework. Rachael has a PhD in Linguistics from the University of Washington. Her research was on computational sociolinguistics, or how our social identity affects the way we use language in computational contexts. Previously she was a data scientist at Kaggle, where she is still a Grandmaster.
0:42:05 Presenter: Alex Lew
Title: Probabilistic Scripting for Common-Sense Data Cleaning at Scale
Presentation Overview: Real-world data is often messy and incomplete, littered with typos, duplicates, NULL values, and other errors or inconsistencies. Although cleaning dirty data is important for many workflows, it has proven difficult to automate: cleaning often requires common-sense reasoning and judgment calls about objects in the world. In this talk, I’ll introduce a new declarative-programming approach to automating common-sense data cleaning, based on recent advances in probabilistic programming. Our system, PClean, allows users to express their uncertain knowledge about their datasets in declarative scripts, and compiles efficient cleaning algorithms guided by those scripts.
We’ll look at the probabilistic programming ideas that make PClean tick, and show how short (fewer than 50 lines) scripts can achieve state-of-the-art accuracy and performance on several cleaning tasks, scaling to millions of rows.
Bio: Alex Lew is a Ph.D. student at MIT's Probabilistic Computing Project, and a lead researcher for Metaprob, an open-source probabilistic programming language embedded in Clojure(Script). He aims to build tools that empower everyone to use probabilistic modeling and inference to solve problems creatively. Before coming to MIT, Alex designed and taught a four-year high-school computer science curriculum at the Commonwealth School in Boston. Before that, he was a student at Yale, where he received a B.S. in computer science and mathematics in 2015. A native of Durham, NC, he also returns home each summer to teach at the Duke Machine Learning Summer School (and spend time with his family and their dogs!).
===
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)(3) non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: https://github.com/numfocus/YouTubeVideoTimestamps