Vincent Warmerdam - Keynote "Natural Intelligence is All You Need [tm]"
- Added 25 Jul 2024
- In this talk I will try to show you what might happen if you allow yourself the creative freedom to rethink and reinvent common practices once in a while. As it turns out, in order to do that, natural intelligence is all you need. And we may start needing a lot of it in the near future.
I've met a lot of authoritative people in my field who pass out advice that sounds like this:
Working on recommenders? Collect all the data! Sessions!
Working on text classification? That's a solved problem! BERT!
Working with embeddings? There's a library for that already!
Working on tabular data? XGBoost for the win! GridSearch!
In short: "this is how you do data science, don't go and reinvent the wheel".
If you spend 5 minutes thinking about "the invention of the wheel" though, then you may start to rethink that advice. After all: the wheels on a bike are different from the wheels on an airplane, and both are different from the wheels of a tractor. And for Pete's sake: that's a good thing! If we hadn't reinvented those wheels, we'd be stuck with wooden horse carts.
So ... what might happen if we take the time to rethink a few things?
Specifically, this keynote will discuss the following topics:
text classification
fraud detection
product recommenders
active learning
embeddings
I hope you'll join me for some new ideas as well as some live demos.
Bio:
Vincent Warmerdam
Vincent D. Warmerdam is a software developer and senior data person. He currently works over at Explosion on data quality tools for developers. He's also known for creating calmcode.io as well as a bunch of open source projects. You can check out his blog over at koaning.io to learn more about those.
===
www.pydata.org
PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.
PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.
00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.
Want to help add timestamps to our CZcams videos to help with discoverability? Find out more here: github.com/numfocus/CZcamsVi... - Science and technology
very inspiring 😄
His sessions are ALWAYS inspiring! Congrats
you know shit is about to be good when this guy talks
Heck yeah! He's amazing! He questions the most basic aspects of data science, I love that about him. He's the one in a crowd who asks "Why didn't you use simple linear regression? Why use this neural network for everything?!"
Timestamps (Generated by Whisper & GPT-4):
00:00 - Introduction to Keynote and Talk Preparation
00:36 - Article Discussion and Smartwatch Data Set Overview
01:16 - Statistics Course Case Study with the Data Set
02:01 - Data Visualization and Analysis Methodology
03:04 - Insights from Data Set and the 'Gorilla' Concept
03:13 - Real-World Application: Recommender Systems for Used Cars
05:04 - Shifting Strategies: Classifier Over Recommender
06:01 - Innovative Approach: Recommender System Reversal
07:01 - Influence of the Netflix Prize and Kaggle on Problem-Solving Approaches
08:05 - Concept of Reinventing the Wheel in Data Science
08:36 - New Data Set on Credit Card Fraud and Algorithmic Approaches
10:03 - Rethinking Algorithmic Approaches and Visualization Techniques
11:00 - Demonstration: Analyzing the Credit Card Fraud Data Set
13:44 - Utilizing Visualization for Predictive Analysis
14:17 - Interactive Data Exploration and Simplification
15:18 - Comparing Different Algorithmic Approaches
16:11 - Rethinking the Use of Random Forests in Fraud Detection
17:10 - The Importance of Human Learning in Data Analysis
20:30 - Transition to Word Embeddings and Conceptual Understanding
23:01 - Advanced Techniques in Natural Language Processing
25:06 - Exploring Phrase Embeddings for Enhanced Contextual Understanding
27:07 - The Importance of Rethinking Traditional Approaches
28:07 - Finding Inspiration in Unconventional Data Sets
30:10 - Building a Classifier for Novel Data Sets
32:12 - Rethinking Annotation and Classification Strategies
33:43 - Innovations in Data Annotation and Model Training
38:04 - The Optimality Trap in Data Science and Machine Learning
40:40 - Avoiding Monoculture Thinking in Data Problem Solving
42:43 - The Role of Doubt in Creative Problem Solving
44:50 - Encouraging Creativity and Independent Thinking in Data Science
45:48 - The Future of Data Science: Independence Over Tool Dependence
46:06 - Final Thoughts and Invitation to Workshop
Great talk and message. I agree we need doubt and curiosity. Always ask your DS, why did you choose this model?
I really loved this talk. Man, how could I work with you?
You should volunteer for PyData, lots of interesting and smart people to work with :)
I was thinking exactly the same thing!
Very interesting session. Any chance you can share the Jupyter Notebook for the credit card dataset?
29:00 agricultural photography