Why Is It SO HARD to Get a Data Science Job?
- Added on June 1, 2024
- Subscribe to RichardOnData here: / @richardondata
In this video, I discuss some of the reasons why it can be so difficult to get a data science job. I talk about skills and mismatches that you may have with those skills, the different titles that you might look for to optimize your chances, and getting through interviews and assessments.
Articles:
www.kdnuggets.com/2020/09/mod...
towardsdatascience.com/why-ar....
towardsdatascience.com/5-reas...
towardsdatascience.com/acing-...
My videos referenced:
"How to Get a Data Science Job in 2021": • 5 Tips to Get a Data S...
"A Study Pathway for Data Science in 2020": • A Study Pathway for Da...
"Can You Become a Data Analyst/Scientist With No College Degree?": • Can You Become a Data ...
Statistics Coursera courses:
Duke: www.coursera.org/specializati...
Johns Hopkins: www.coursera.org/specializati...
University of Amsterdam: www.coursera.org/specializati...
Books:
"An Introduction to Statistical Learning": amzn.to/3mzrwmf
"The Hundred-Page Machine Learning Book": amzn.to/3mBOSHK
"The Guru's Guide to Transact-SQL": amzn.to/34ven7L
"R for Data Science": amzn.to/3nOHieF
"Python for Data Analysis": amzn.to/37ySAOc
#DataScience #BreakingIntoDataScience #StatisticsForDataScience
PayPal: richardondata@gmail.com
Patreon: / richardondata
BTC: 3LM5d1vibhp1F7pcxAFX8Ys1DM6XLUoNVL
ETH: 0x3CfC599C4c1040963B644780a0E62d45999bE9D8
LTC: MH8yPjvSmKvpmRRmufofjRB9hnRAFHfx32
Glad to have you back sir!
Much appreciated!
Welcome back. 🤗
I became a sports data scientist after 16 years of experience. I was a solution architect before.
Very nice!
Data science is INCREDIBLY saturated. The market has way more data scientists than the economy needs.
I'm taking an R course at a community college. There are a number of people who think that they can get an associate's degree in Business Analytics and become a Business Analyst or Data Scientist. I think they are way out of touch with reality, but I could be wrong.
Hi everyone! I'm currently considering a Bachelor of Science in Data Science, and I'm wondering if anyone has any experience with this program. I'm particularly interested in the statistics and data science components of the degree, and I'm wondering if they're comprehensive enough. I'm also wondering if there are any areas that could be improved.
Here are the data science courses and what the course outline is in this degree:
PROBABILITY AND STATISTICS
Course Outline:
Introduction to Statistics and Data Analysis, Statistical Inference, Samples, Populations,
and the Role of Probability. Sampling Procedures. Discrete and Continuous Data. Statistical
Modeling. Types of Statistical Studies. Probability: Sample Space, Events, Counting
Sample Points, Probability of an Event, Additive Rules, Conditional Probability,
Independence, and the Product Rule, Bayes’ Rule. Random Variables and Probability
Distributions. Mathematical Expectation: Mean of a Random Variable, Variance and
Covariance of Random Variables, Means and Variances of Linear Combinations of
Random Variables, Chebyshev’s Theorem. Discrete Probability Distributions. Continuous
Probability Distributions. Fundamental Sampling Distributions and Data Descriptions:
Random Sampling, Sampling Distributions, Sampling Distribution of Means and the
Central Limit Theorem. Sampling Distribution of S², t-Distribution, F-Distribution, Quantile and
Probability Plots. Single Sample & One- and Two-Sample Estimation
Problems. Single Sample & One- and Two-Sample Tests of Hypotheses. The Use of P-Values
for Decision Making in Testing Hypotheses (Single Sample & One- and Two-Sample Tests),
Linear Regression and Correlation. Least Squares and the Fitted Model, Multiple Linear
Regression and Certain Nonlinear Regression Models, Linear Regression Model Using
Matrices, Properties of the Least Squares Estimators.
Reference Materials:
1. Probability and Statistics for Engineers and Scientists by Ronald E. Walpole, Raymond
H. Myers, Sharon L. Myers and Keying E. Ye, Pearson; 9th Edition (January 6, 2011).
ISBN-10: 0321629116
2. Probability and Statistics for Engineers and Scientists by Anthony J. Hayter, Duxbury
Press; 3rd Edition (February 3, 2006), ISBN-10:0495107573
3. Schaum's Outline of Probability and Statistics, by John Schiller, R. Alu Srinivasan and
Murray Spiegel, McGraw-Hill; 3rd Edition (2008). ISBN-10:0071544259
INTRODUCTION TO DATA SCIENCE
Course Outline:
Introduction: What is Data Science? Big Data and Data Science hype, Datafication, Current
landscape of perspectives, Skill sets needed; Statistical Inference: Populations and samples,
Statistical modeling, probability distributions, fitting a model, Intro to Python; Exploratory
Data Analysis and the Data Science Process; Basic Machine Learning Algorithms: Linear
Regression, k-Nearest Neighbors (k-NN), k-means, Naive Bayes; Feature Generation and
Feature Selection; Dimensionality Reduction: Singular Value Decomposition, Principal
Component Analysis; Mining Social-Network Graphs: Social networks as graphs,
Clustering of graphs, Direct discovery of communities in graphs, Partitioning of graphs,
Neighborhood properties in graphs; Data Visualization: Basic principles, ideas and tools for
data visualization; Data Science and Ethical Issues: Discussions on privacy, security, ethics,
Next-generation data scientists.
Reference Materials:
1. Foundations of Data Science, Blum, A., Hopcroft, J., & Kannan, R., preliminary version of a
textbook, 2016.
2. An Introduction to Data Science, Jeffrey S. Saltz, Jeffrey M. Stanton, SAGE
Publications, 2017.
3. Python for everybody: Exploring data using Python 3, Severance, C.R., CreateSpace
Independent Pub Platform. 2016.
4. Doing Data Science, Straight Talk from the Frontline, Cathy O'Neil and Rachel Schutt,
O'Reilly. 2014.
5. Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and
Presenting Data, EMC Education Services, John Wiley & Sons, 2015.
ADVANCED STATISTICS
Course Outline:
Introduction to Statistics, Use of Statistics in Data Science, Experimental Design, Statistical
Techniques for Forecasting, Interpolation/ Extrapolation, Introduction to Probability,
Conditional Probability, Prior and Posterior Probability, Random number generation (RNG),
Techniques for RNG, Correlation analysis, Chi Square Dependency tests, Diversity Index,
Data Distributions Multivariate Distributions, Error estimation, Confidence Intervals, Linear
transformations, Gradient Descent and Coordinate Descent, Likelihood inference, Revision
of linear regression and likelihood inference, Fitting algorithms for nonlinear models and
related diagnostics, Generalized linear model; exponential families; variance and link
functions, Proportion and binary responses; logistic regression, Count data and Poisson
responses; log-linear models, Overdispersion and quasi-likelihood; estimating functions,
Mixed models, random effects, generalized additive models and penalized regression;
Introduction to SPSS, Probability/ Correlation analysis/ Dependency tests/ Regression in
SPSS.
Reference Materials:
Probability and Statistics for Computer Scientists, 2nd Edition, Michael Baron.
Probability for Computer Scientists, online Edition, David Forsyth
Discovering Statistics using SPSS for Windows, Andy Field
BIG DATA ANALYTICS
Course Outline:
Introduction and Overview of Big Data Systems; Platforms for Big Data, Hadoop as a
Platform, Hadoop Distributed File Systems (HDFS), MapReduce Framework, Resource
Management in the cluster (YARN), Apache Scala Basic, Apache Scala Advances, Resilient
Distributed Datasets (RDD), Apache Spark, Apache Spark SQL, Data analytics on Hadoop
/ Spark, Machine learning on Hadoop / Spark, Spark Streaming, Other Components of
Hadoop Ecosystem
Reference Materials:
1. White, Tom. “Hadoop: The definitive guide.” O'Reilly Media, Inc., 2012.
2. Karau, Holden, Andy Konwinski, Patrick Wendell, and Matei Zaharia. “Learning
spark: lightning-fast big data analysis.” O'Reilly Media, Inc., 2015.
3. Miner, Donald, and Adam Shook. “MapReduce design patterns: building effective
algorithms and analytics for Hadoop and other systems.” O'Reilly Media, Inc., 2012.
DATA WAREHOUSING AND BUSINESS INTELLIGENCE
Course Outline:
Introduction to Data Warehouse and Business Intelligence; Necessities and essentials of
Business Intelligence; DW Life Cycle and Basic Architecture; DW Architecture in SQL
Server; Logical Model; Indexes; Physical Model; Optimizations; OLAP Operations, Queries
and Query Optimization; Building the DW; Data visualization and reporting based on
Datawarehouse using SSAS and Tableau; Data visualization and reporting based on Cube;
Reports and Dashboard management on PowerBI; Dashboard Enrichment; Business
Intelligence Tools.
Reference Materials:
1. W. H. Inmon, “Building the Data Warehouse”, Wiley-India Edition.
2. Ralph Kimball, “The Data Warehouse Toolkit - Practical Techniques for Building
Dimensional Data Warehouse,” John Wiley & Sons, Inc.
3. Matteo Golfarelli, Stefano Rizzi, “Data Warehouse Design - Modern Principles and
Methodologies”, McGraw Hill Publisher
HERE'S THE REST
DATA VISUALISATION
Course Outline:
Introduction of Exploratory Data Analysis and Visualization, Building Blocks and Basic
Operations; Types of Exploratory Graphs, single and multi-dimensional summaries, five
number summary, box plots, histogram, bar plot and others; Distributions, their
representation using histograms, outliers, variance; Probability Mass Functions and their
visualization; Cumulative distribution functions, percentile-based statistics, random
numbers; Modelling distributions, exponential, normal, lognormal, pareto; Probability
density functions, kernel density estimation; Relationship between variables, scatter plots,
correlation, covariance; Estimation and Hypothesis Testing; Clustering using K-means and
Hierarchical; Time series and survival analysis; Implementing concepts with R (or similar
language)
Reference Materials:
“Exploratory Data Analysis with R” by Roger D. Peng
DATA MINING
Course Outline:
Introduction to data mining and basic concepts, Pre-Processing Techniques & Summary
Statistics, Association Rule mining using Apriori Algorithm and Frequent Pattern Trees,
Introduction to Classification Types, Supervised Classification (Decision trees, Naïve Bayes
Classification, K-Nearest Neighbors, Support Vector Machines etc.), Unsupervised
Classification (K Means, K Median, Hierarchical and Divisive Clustering, Kohonen Self
Organizing maps), outlier & anomaly detection, Web and Social Network Mining, Data
Mining Trends and Research Frontiers. Implementing concepts using Python
Reference Materials:
1. Jiawei Han, Micheline Kamber & Jian Pei (2011). Data Mining: Concepts and
Techniques, 3rd Edition.
2. Pang-Ning Tan, Michael Steinbach, and Vipin Kumar (2005). Introduction to Data
Mining.
3. Charu C. Aggarwal (2015). Data Mining: The Textbook
4. D. Hand, H. Mannila, P. Smyth (2001). Principles of Data Mining. MIT Press
ARTIFICIAL INTELLIGENCE
Course Outline:
An Introduction to Artificial Intelligence and its applications towards Knowledge Based
Systems; Introduction to Reasoning and Knowledge Representation, Problem Solving by
Searching (Informed searching, Uninformed searching, Heuristics, Local searching, Minimax
algorithm, Alpha-beta pruning, Game-playing); Case Studies: General Problem Solver,
Eliza, Student, Macsyma; Learning from examples; Natural Language Processing; Recent
trends in AI and applications of AI algorithms. Lisp & Prolog programming languages will
be used to explore and illustrate various issues and techniques in Artificial Intelligence.
Reference Materials:
1. Russell, S. and Norvig, P. “Artificial Intelligence. A Modern Approach”, 3rd ed, Prentice
Hall, Inc., 2015.
2. Norvig, P., “Paradigms of Artificial Intelligence Programming: Case studies in Common
Lisp”, Morgan Kaufman Publishers, Inc., 1992.
3. Luger, G.F. and Stubblefield, W.A., “AI algorithms, data structures, and idioms in Prolog,
Lisp, and Java”, Pearson Addison-Wesley. 2009.
I'm hoping that some of you can take a look at this list and let me know if you think the degree is up to par. Any feedback would be greatly appreciated!
Thanks in advance!
currently in the interview hunt process and it is taxing.
Maybe because econometrics, intermediate to advanced statistical analysis, and machine learning all require one to understand calculus and matrix algebra before one can understand them well. And then learning those subjects directly afterwards is no picnic either.
Then add coding on top of it, in two or three different languages minimum usually (SQL, R, Python, VBA, MATLAB, Julia, C++, or whatever else applies sometimes) which is difficult for most people, PLUS possess intermediate to advanced skills in Microsoft Excel and also some skills in either Tableau or Microsoft Power BI. I mean, why wouldn't that be easy lol?!?!?!
What about economics for data analytics, is it a good fit?
I have a college degree in mechanical engineering and naval engineering. Will this help me get a job as a data scientist?
Of course yes, as long as you can show that you understand programming as well. Then you're the perfect candidate
Oh man, I just graduated college like a few weeks ago
Congrats! How's the job search going?
@RichardOnData I graduated with a B.S. in physics with a computational concentration and a CS minor. I've been mostly applying to software engineering jobs. A lot of tech companies are trying to lowball me like I'm an Indian guy on an H-1B visa. Right now I am working on a data science/analytics certification through Coursera. I'm starting to feel my 4 years of college were a waste of time and money doing virgin math. At least on the bright side I have no student loans to pay since I worked jobs to pay for tuition.
I think anyone confused as to why it is very difficult to obtain enough of the required skills (both technical coding per se and applied data analytics skills) and knowledge (data structures, algorithms, machine learning methods, statistics) at a sufficiently high level has probably just been in the weeds for so long that they have forgotten that the job title has only two words in it and one of them is literally "scientist".
News flash, becoming a scientist is fucking hard, always has been and always will be.
Dude, that thumbnail is depressing me.
Hah, thank you, that was the idea. Not being able to find a job is depressing stuff...