RichardOnData
United States
Joined 15 Sep 2019
Knowledge and education are important. But you DON'T need multiple PhDs in mathematics, computer science, and statistics, plus working knowledge of C++, Java, Python, and Hadoop, to get MOST data science jobs.
On this channel, I want to help make both getting a data science job and acquiring the skills and thought processes you need a lot simpler and easier than that. I post here at least once a week with videos on my own experiences in data science, what's happening in the data science industry, practical applied statistics tutorials, and R programming tutorials, and I explain concepts so that you can understand them and use them! (Whether that's in the job you have now, the job you're trying to get, or for entertainment.)
Classification Metrics Explained | Sensitivity, Precision, AUROC, & More
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html
In this video, I go through the different types of binary classification metrics. These include: accuracy, prevalence, confusion matrices, sensitivity (aka recall or true positive rate), specificity (aka true negative rate), precision (aka positive predictive value), F1 score, and the areas under the precision-recall curve and the receiver operating characteristic curve, that is: AUPRC and AUROC. We close with how to implement these using the scikit-learn package in Python, going through a Jupyter notebook.
Code can be found here: github.com/RichardOnData/CZcams/blob/main/Python%20Notebooks/classification_metrics.ipynb
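As a rough companion to the metrics listed above, here is a minimal standard-library sketch of how they follow from confusion-matrix counts and from ranked scores. It is not the notebook linked above: the toy labels, scores, 0.5 threshold, and helper names are all assumptions made up for illustration. The comments name the `sklearn.metrics` functions that cover the same quantities.

```python
# Toy data invented for illustration (1 = positive class).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
threshold = 0.5  # assumed cut-off for turning scores into predicted labels
y_pred = [1 if s >= threshold else 0 for s in y_score]


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def threshold_metrics(y_true, y_pred):
    """Metrics that depend on a fixed threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    n = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # recall / TPR; sklearn: recall_score
    precision = tp / (tp + fp)     # PPV; sklearn: precision_score
    return {
        "accuracy": (tp + tn) / n,             # sklearn: accuracy_score
        "prevalence": (tp + fn) / n,
        "sensitivity": sensitivity,
        "specificity": tn / (tn + fp),         # true negative rate
        "precision": precision,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),  # sklearn: f1_score
    }


def auroc(y_true, y_score):
    """Probability that a random positive outscores a random negative
    (ties count one half). sklearn: roc_auc_score."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_score):
    """Mean of the precision at each positive's rank, a common AUPRC summary.
    Assumes no tied scores. sklearn: average_precision_score."""
    ranked = sorted(zip(y_score, y_true), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


metrics = threshold_metrics(y_true, y_pred)
roc_area = auroc(y_true, y_score)
pr_area = average_precision(y_true, y_score)
```

Note that accuracy, sensitivity, specificity, precision, and F1 all change when the threshold moves, while AUROC and the average-precision summary depend only on how the scores rank the observations.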
Patreon: www.patreon.com/richardondata
BTC: 3LM5d1vibhp1F7pcxAFX8Ys1DM6XLUoNVL
ETH: 0x3CfC599C4c1040963B644780a0E62d45999bE9D8
LTC: MH8yPjvSmKvpmRRmufofjRB9hnRAFHfx32
Views: 329
Video
SHAP Values: An Overview
435 views · 1 month ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html In this video, I talk about SHAP values and how these can be used for explainable AI and explaining how features contribute to a machine learning model's predictions for each observation. These are great tools when your goal isn't (only) prediction, but is also inference - that is, understanding the most important featur...
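To make the idea of per-observation feature contributions concrete, here is a minimal brute-force sketch of Shapley attributions, the quantity that SHAP values estimate. It does not use the `shap` package. The toy model, the all-zeros baseline, and the function names are assumptions for illustration. Real SHAP explainers average over a background dataset and use faster approximations, whereas this version replaces "missing" features with a single baseline value.

```python
from itertools import permutations


def toy_model(features):
    """Hypothetical model: one additive term plus one interaction term."""
    return 2 * features[0] + features[1] * features[2]


def shapley_values(predict, x, baseline):
    """Exact Shapley attributions for one observation x.

    For every ordering of the features, switch them one at a time from the
    baseline value to the observed value and credit each feature with the
    change in prediction it causes; then average over all orderings.
    Cost grows factorially with the number of features, so this is only
    for tiny examples.
    """
    n = len(x)
    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]
            updated = predict(current)
            phi[i] += updated - previous
            previous = updated
    return [total / len(orderings) for total in phi]


x = [1, 2, 3]
baseline = [0, 0, 0]  # assumed reference point standing in for "feature absent"
contributions = shapley_values(toy_model, x, baseline)
```

In this toy case the additive feature receives its full effect, and the two interacting features split their joint effect equally. The contributions also sum to the difference between the prediction for `x` and the prediction for the baseline, which is the property that makes these values useful for explaining individual predictions.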
Is ChatGPT-4 Worth It?
662 views · 1 month ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html NOTE: Sorry about the bad audio quality on this one. I switched microphones when I upgraded phones recently, and thought during testing that it would be a lot better than it was here. Looking into a REAL microphone upgrade here. NOTE 2: I didn't talk about DALL-E on this one, which is another feature to GPT-4. The ...
Follow THESE 5 Tips to Get a Data Job
836 views · 3 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html In this video I'll break down some tips that I have to get data jobs. This is going to be broad and apply to all types of positions, whether those are data analyst, data science, or data engineering jobs! To summarize: 1) Have good education in a field like statistics, computer science, math, engineering, business,...
Learn (and Do) Data Science FAST with ChatGPT
987 views · 3 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html In this video I show some ways I've used ChatGPT both to learn and to do data science faster. ChatGPT can be an excellent tool if you're responsible with it. It can provide great ideas to help get through creative roadblocks, as well as to generate great coding examples that you can turn around and use to learn. You ...
The Data Job Market in 2024
7K views · 4 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html My thoughts on the data job market in 2024. I looked at data scientist, data analyst, data engineer, and machine learning engineer jobs. In particular we talk about some broader trends in tech more recently, the recent tech layoffs, and what hiring and salaries are looking like for these positions. Crunchbase: news...
No, AI (Probably) Won’t Take Your Data Job Soon
596 views · 4 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html NOTE: The beginning of this video is somewhat tongue in cheek. Certain things, you just have to let yourself have fun with. Some of the articles and videos I reference make very different points, specifically regarding the rise of data engineering and constructing end-to-end machine learning pipelines. Those are va...
R or Python: Which Should You Learn in 2024?
4.1K views · 5 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html In this video we're revisiting the R vs Python comparison in the year 2024. How do they stand in recent job reports and in indices like PYPL or the TIOBE index?
Four Data Science Jobs: My Experiences
540 views · 5 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html In this video I talk about every data science job I've had, how each job was dramatically different from the others, and how each one sort of led to the next.
10 Python Packages You Should Know (in 2024)
849 views · 5 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html In this video I'm going to provide a recommended 10 packages that you should know and focus on to get strong at Python programming, in the context of data science. Recommended book "Python for Data Analysis": amzn.to/3cDXKcE 1. pandas pandas.pydata.org/Pandas_Cheat_Sheet.pdf 2. numpy images.datacamp.com/image/uploa...
What Is Survival Analysis?
499 views · 5 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html In this video I cover survival analysis. Specifically what it is, and why it's useful when the time until an event is important and when you have "censored" data. I talk about what censored data is and provide definitions of the survival and hazard functions. This is illustrated visually by showing a Kaplan-Meier c...
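As a small illustration of how censoring enters a Kaplan-Meier curve, here is a minimal sketch of the product-limit estimator in plain Python. The durations, event flags, and function name are invented for illustration and are not taken from the video. A flag of 1 means the event was observed at that time, and 0 means the subject was censored then.

```python
def kaplan_meier(durations, observed):
    """Return [(time, estimated survival probability)] at each event time.

    At every distinct event time t, the survival estimate is multiplied by
    (1 - deaths_at_t / number_still_at_risk_at_t). Censored subjects never
    count as deaths, but they stay in the risk set until their censoring time.
    """
    event_times = sorted({t for t, e in zip(durations, observed) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        deaths = sum(1 for d, e in zip(durations, observed) if d == t and e)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up times; the third and fifth subjects are censored.
durations = [2, 3, 3, 5, 8, 8]
observed = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(durations, observed)
```

The subject censored at time 3 still counts in the risk set at time 3 but drops out before time 5. This is how the estimator uses partial follow-up instead of discarding it.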
How I Would Learn Data Science in 2024 (If I Had to Start Over)
2K views · 5 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html ChatGPT: Bri Does AI: czcams.com/video/MnDudvCyWpc/video.html Ryan Scribner: czcams.com/video/X9ksiScY7hM/video.html Statistics: Duke: www.coursera.org/specializations/statistics Johns Hopkins: www.coursera.org/specializations/jhu-data-science University of Amsterdam: www.coursera.org/specializations/social-science ...
How to Setup Your Python Environment (With VSCode & Anaconda)
4.9K views · 6 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html In this video, I walk you through how to set up your Python development environment. If you're a complete beginner, you'll probably be good with just Anaconda/JupyterLab/Jupyter Notebooks. If you're going to be a serious developer, you'll want to use Visual Studio Code and as a best practice set up virtual environm...
How I Passed the Google Cloud Professional ML Engineer Exam
8K views · 6 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html 'Journey to Become a Google Cloud Machine Learning Engineer': amzn.to/3TjwmYT Exam guide: cloud.google.com/learn/certification/guides/machine-learning-engineer Github compilation: github.com/sathishvj/awesome-gcp-certifications/blob/master/professional-machine-learning-engineer.md Medium articles: towardsdatascienc...
Update | Where I’ve Been
750 views · 7 months ago
Subscribe to RichardOnData here: czcams.com/channels/KPyg5gsnt6h0aA8EBw3i6A.html Hi everyone. It's been a while.
Tufte's Principles of Graphical Integrity
4.3K views · 2 years ago
Why Is It SO HARD to Get a Data Science Job?
4.8K views · 2 years ago
10 Good Coding Practices for Data Science
3.7K views · 2 years ago
Data Science Advice for College Students
3K views · 2 years ago
The State of Data Science in 2021 | Anaconda's Annual Report
2.5K views · 2 years ago
When Should You Use Regression Methods?
5K views · 3 years ago
Tuning hyperparameters and stacking models with "tidymodels" | R Tutorial (2021)
2.3K views · 3 years ago
Evaluating ML Performance, Resampling, and Workflows in "tidymodels" | R Tutorial (2021)
1.9K views · 3 years ago
Intro to machine learning in R with "tidymodels" | R Tutorial (2021)
8K views · 3 years ago
Creating ROC curves and ensembling models in R with "caret" | R Tutorial (2021)
4.4K views · 3 years ago
Pls mic fix
I am a PhD candidate in data analytics. I've been looking for a job for eight months now and haven't had a single interview. It's very tough 😫
That was a really good explanation. Short and powerful.
I used Python during my PhD and ended up shifting to R. Python's statistical packages are lackluster (maybe not surprisingly). I'm not a big fan of dot chains and pandas' index system, and the deal breaker was that it was so sluggish and ran out of memory so often with medium-size data (20GB+), even on a machine with 60GB+ of RAM. I tried Dask, but it's pandas-based and slow; with DuckDB and Polars around, I think the Dask project will become less popular. I picked up tidyverse and data.table in R, and they did the job without a problem, and I kind of regretted learning Python. R has the fixest package, which is really fast for high-dimensional fixed-effects regressions, and Python doesn't seem to support large-scale regressions very well.
R is more capable of doing amazing things better than python
I am AWS ML certified
Good to hear that you learned R and then created the video. I can relate to the point @6:41. After learning C, C++, BASIC, and COBOL, i.e. coming from a programming background, R really felt funny and weird because there are multiple ways to do the same thing. But later I fell in love with R. I have heard NumPy and pandas were inspired by R's data structures. Python has computer engineers backing its development and usage, whereas R has a bunch of academics and statisticians. R initially looked like a hotchpotch, but after looking at NumPy and pandas alongside basic Python... I just laugh at how my judgements have reversed over time. Python seems more in line with traditional expectations of OOP syntax. I could go on... but both could have been more streamlined for the data science workflow.
I can't believe FORTRAN is #12! I programmed my master's thesis project in 1995 in FORTRAN and I thought nobody used it anymore. As a statistician I'm guessing R is the way to go.
Nice honest and informative video. Thank you.
Helpful video, thank you sir 🌟
It remains vague. What exactly can you do with R that is not possible with Python?
You mean Python along with packages like NumPy, pandas, scikit-learn, etc.
I like how you present the data/ideas 😂... Thanks for the information ❤
I dislike videos that make reproducibility challenging. You could demonstrate the exact same concepts using a simple data frame that can be found in seaborn (or any other imported package for that matter). Nice video otherwise
That's a totally fair point. If I'm understanding you correctly, the issue is basically that this dataset requires an API key and a few steps overall to get ahold of. I do find these concepts easy to understand through the lens of a disease, but I totally see what you mean here. I have a video coming out soon on bagging vs boosting, and I'll use a dataset for that one that's simpler to get your hands on.
@RichardOnData That's cool, man. I appreciate you replying and enjoy your overall content (I've been subscribed for quite some time now). To be clear, I didn't "dislike" as in the dislike button; I mean in general that I don't like the idea that [...]. Cheers!
good.mugo on data
Thanks for watching as always
how do you LEARN this stuff -- I mean really LEARN -- I took a course in biostatistics where we covered this and for every problem I had to keep referring back to a page where I had written all the formulas -- there was no way I could tell you the formula for specificity or sensitivity -- I understood the consequences and reasons for them (telling someone they have diabetes when they don't leads them to spending money on drugs they don't need) -- but as for applying the correct measure and formula to every scenario I was totally lost -- if we weren't allowed to use a page of formulas for the final I would have failed spectacularly
There's really no substitute for repetition and experience. Years ago I had to give multiple presentations for a sepsis prediction model and had to use a ton of these metrics and then answer questions. It went from always mixing them up, to being able to rattle them off in my sleep, but it did take a lot of time.
Thanks Richard, that was a good overview of classification metrics.
Thank you!
good.mugo on data
For me the greatest difference between the two languages is the mentality. R users are taught basic programming fundamentals and learn that for every solution there is a package they can use. Python users are taught programming first and how the language is used to create packages. So R users learn to use the language at a higher level, and when they go deeper, things get messy. Also, in 2024 I wouldn't keep putting labels on them, such as "R is for statistics" and "Python is general purpose". These kinds of labels are absolute nonsense.
Nice one! This is a topic I need to start getting into. Would love more content on XAI!
Awesome! Instead of Python, can this be done using R?
Thanks for the great video! Should I reverse the RemoteSigned execution policy after finishing the install and setup, or does it have to stay on Y permanently?
This was awesome. Thanks. I’m gonna go see how I can apply these when presenting to stakeholders
You are awesome bro, thanks
Well damn 1yr left to learn
How many weeks or months did you take to study for the exam?
My university economics program uses R. I learned both for obvious reasons
Thanks for the great video. It was an awesome comparison. Are you practicing data science? I am looking for a small role in data analysis with R. Do you have any advice? I have a master's degree in environmental science from Addis Ababa University. By the way, are you on LinkedIn? I would like to follow you. Thanks.
Hi Sir, thanks for yet another great video. Can you make a video on the most widely used ML tools? I have a background in chemistry and environmental science at the master's level, and I have started learning R by reading books and watching YouTube videos. Do you think I have a future in data science? I'm from Ethiopia.
Video of my wish😂
That data dictionary does seem useful. I would be curious to see how well it deals with more "disguised" features, such as a categorical feature that seems continuous, or the inverse. Sometimes I feel that getting the feature type right can really be a matter of knowing, so it would be impressive if it was very consistent.
Stopped using ai, as I stopped learning and got lazy. Now only reach for it as a last resort.
I'm a newbie of R and I like it. Thanks for the great video.
Thanks for this breakdown . I am graduating from a master's in analytics in a few months and this video came at the right time
This guy is a genius
Great and relevant content. Thanks my pal.
i like R
I got admitted to MSDS in MSU. Can you tell me some relevant course work that I should take to get into data engineering?
Thanks
If you're in healthcare or pharma, the other language to know is SAS. I know, it's old hat, but it has a simple syntax, a rich macro language and it is certified for use in FDA-regulated industries.
Or SPSS, which is similar but cheaper, and the user interface is much nicer
Has this helped with making any transitions to MLE? Have you noticed companies caring?
Thanks, Richard! Great video.
How to decrease the size of the column?
Julia is the light and the way
Welcome back!
Thank you for this. Very informative.
You mention getting a degree in a related area at the initial stages, but what about folks who have education and experience in a different area? For example, what if a designer or lawyer wants to get into the data field? What would your recommendations be for such people to build their knowledge and get their foot in the door?
Very good summary. Thank you!
please share your linkedin