Project - 8 | Data Analysis with Python |
- Added on 14 May 2021
- Download Source Code of this project (Rs.39) - rzp.io/l/project8sourcecode
Download - Complete Course Notes - Data Analyst Self Study Material (Rs.250) - datasciencelovers.graphy.com/...
Download Dataset File - shorturl.at/aot24
Enrol in our Udemy courses :
1. Python Data Analytics Projects - www.udemy.com/course/bigdata-...
2. Python For Data Science - www.udemy.com/course/python-f...
3. Numpy For Data Science - www.udemy.com/course/python-n...
Download Free Core Python Notes - datasciencelovers.graphy.com/...
Download - Python Pandas Notes ( Rs.50 ) - bit.ly/3KxMpgA
-----------------------------
Watch demo of Self Study Material - • Data Analyst - Course ...
Outside India, PayPal for Self Study Material ($4) - datasciencelovers@gmail.com
.......................................................................
Contact Mail Id : datasciencelovers@gmail.com
--------------------------------------------------------------------
In this video, you will learn how to work on a real project of Data Analysis with Python. Questions are given in the project and then solved with the help of Python. It is a project of Data Analysis with Python or you can say, Data Science with Python.
The commands that we used in this project :
* head() - It shows the first N rows in the data (by default, N=5).
* tail() - It shows the last N rows in the data (by default, N=5).
* shape - It shows the total no. of rows and no. of columns of the dataframe.
* size - To show No. of total values(elements) in the dataset.
* columns - To show each Column Name.
* dtypes - To show the data-type of each column.
* info() - To show indexes, columns, data-types of each column, memory at once.
* value_counts() - In a column, it shows all the unique values with their counts. It is usually applied to a single column (a Series).
* unique() - It shows the all unique values of the series.
* nunique() - It shows the total no. of unique values in the series.
* duplicated( ) - To check row wise and detect the Duplicate rows.
* isnull( ) - To show where Null value is present.
* dropna( ) - By default, it drops every row that contains at least one missing value (use how='all' to drop only rows where all values are missing).
* isin( ) - To show all records including particular elements.
* str.contains( ) - To get all records that contains a given string.
* str.split( ) - It splits each string of a column; with expand=True, the parts go into separate columns.
* to_datetime( ) - Converts the data-type of a Date-Time column into the datetime64[ns] datatype.
* dt.year.value_counts( ) - It counts the occurrence of all individual years in Time column.
* groupby( ) - Groupby is used to split the data into groups based on some criteria.
* sns.countplot(x=df['Col_name']) - To show the count of all unique values of any column in the form of a bar graph.
* max( ), min( ) - It shows the maximum/minimum value of the series.
* mean( ) - It shows the mean value of the series.
You will learn these things also:
Creating New Columns & Dataframe
Filtering (Single Column & Multiple Columns)
Filtering with And and OR
Seaborn Library - Bar Graphs
..............................................
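A minimal sketch of a few of the inspection commands listed above, run on a tiny made-up DataFrame. The column names and values are assumptions for illustration, not the project's dataset.

```python
import pandas as pd

# Hypothetical sample data; the real project uses the Netflix dataset.
df = pd.DataFrame({
    "Title": ["A", "B", "C", "C"],
    "Category": ["Movie", "TV Show", "Movie", "Movie"],
})

n_rows, n_cols = df.shape                          # (rows, columns)
n_values = df.size                                 # rows * columns
category_counts = df["Category"].value_counts()    # count per unique value
n_categories = df["Category"].nunique()            # number of unique values
```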
Task. 1) Is there any Duplicate Record in this dataset ? If yes, then remove the duplicate records.
Task. 2) Is there any Null Value present in any column ? Show with Heat-map.
Q. 1) For 'House of Cards', what is the Show Id and Who is the Director of this show ?
Q. 2) In which year the highest number of the TV Shows & Movies were released ? Show with Bar Graph.
Q. 3) How many Movies & TV Shows are in the dataset ? Show with Bar Graph.
Q. 4) Show all the Movies that were released in year 2000.
Q. 5) Show only the Titles of all TV Shows that were released in India only.
Q. 6) Show Top 10 Directors, who gave the highest number of TV Shows & Movies to Netflix ?
Q. 7) Show all the Records, where "Category is Movie and Type is Comedies" or "Country is United Kingdom".
Q. 8) In how many movies/shows, Tom Cruise was cast ?
Q. 9) What are the different Ratings defined by Netflix ?
Q. 9.1) How many Movies got the 'TV-14' rating, in Canada ?
Q. 9.2) How many TV Shows got the 'R' rating, after year 2018 ?
Q. 10) What is the maximum duration of a Movie/Show on Netflix ?
Q. 11) Which individual country has the Highest No. of TV Shows ?
Q. 12) How can we sort the dataset by Year ?
Q. 13) Find all the instances where: Category is 'Movie' and Type is 'Dramas' or Category is 'TV Show' & Type is 'Kids' TV'.
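Questions such as Q.7 and Q.13 combine conditions with AND and OR. A small sketch of how that can look in pandas follows; the rows are invented, and only the column names are taken from the questions.

```python
import pandas as pd

# Hypothetical rows; column names follow the questions above.
df = pd.DataFrame({
    "Title": ["T1", "T2", "T3", "T4"],
    "Category": ["Movie", "Movie", "TV Show", "TV Show"],
    "Type": ["Comedies", "Dramas", "Kids' TV", "Dramas"],
    "Country": ["India", "United Kingdom", "Canada", "India"],
})

# Q7 style: (Movie AND Comedies) OR United Kingdom.
q7 = df[((df["Category"] == "Movie") & (df["Type"] == "Comedies"))
        | (df["Country"] == "United Kingdom")]

# Q13 style: (Movie AND Dramas) OR (TV Show AND Kids' TV).
q13 = df[((df["Category"] == "Movie") & (df["Type"] == "Dramas"))
         | ((df["Category"] == "TV Show") & (df["Type"] == "Kids' TV"))]
```

Each comparison needs its own parentheses, because `&` and `|` bind more tightly than `==`.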
------------------
#python #dataanalytics #datascience #project
Download Source Code of this project (Rs.29) - rzp.io/l/project8sourcecode
Download - Python Data Analytics Course Notes and Projects Source Codes ( Rs.250 ) - datasciencelovers.graphy.com/products/Python---Data-Analytics-Study-Material-64d7b0bdfd6efd7c4587e233?dgps_s=dsh&dgps_u=c&dgps_uid=64cb5694e4b000cf748a30c2&dgps_t=cp_m
Get our "Self Study Material", which includes all the Projects Source Codes and Notes of the complete Data Analytics course, which contain all commands of Core Python, Numpy, Pandas, Matplotlib, SQL that we use for Big-Data Analytics ( cost @ Rs.250 or $20 or €20 )
Can I upload this project on GitHub?
I really like that you highlight the functions and methods you cover in the tutorial. This helps provide technical learning objectives many other videos do not cover. Great job.
All these projects helped me build the most important part of data analysis/science which is to 'think questions' in data and finding the solutions using tools like pandas and python language. Thanks for providing the content DSL.
This tutorial is awesome!
Just one annotation for Question 10):
After using str.split(), the new columns "Minutes" and "Unit" are formatted as a string, like you mentioned in the video. To get the correct answer of the max() and min() function, you have to convert the values of the column "Minutes" into integers. Otherwise the min() and max() functions will not work properly.
data[["Minutes", "Unit"]] = data["Duration"].str.split(" ", expand = True)
data["Minutes"] = data["Minutes"].astype(int)
data["Minutes"].max()
data["Minutes"].min()
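To illustrate why the conversion matters, here is a small sketch with made-up durations (not the real dataset). While the column holds strings, max() compares text, so "99" beats "312".

```python
import pandas as pd

# Hypothetical durations, all in minutes.
data = pd.DataFrame({"Duration": ["99 min", "312 min", "100 min"]})
data[["Minutes", "Unit"]] = data["Duration"].str.split(" ", expand=True)

max_as_text = data["Minutes"].max()              # strings: compared character by character
max_as_int = data["Minutes"].astype(int).max()   # integers: compared as numbers
```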
19:07 write this code to avoid error
df['Date_N'] = pd.to_datetime(df['Release_Date'], errors='coerce')
Thanks buddy❤
doing this I am getting different counts for this code................ df['N_Date'].dt.year.value_counts()
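One likely reason for different counts (an assumption, not checked against the project's file): errors='coerce' turns every value that cannot be parsed into NaT, and those rows then drop out of the year counts. A sketch with invented values:

```python
import pandas as pd

# Hypothetical values; the last one cannot be parsed as a date.
raw = pd.Series(["2020-08-14", "2019-09-01", "unknown"])
parsed = pd.to_datetime(raw, errors="coerce")

n_unparsed = parsed.isna().sum()               # rows silently turned into NaT
year_counts = parsed.dt.year.value_counts()    # NaT rows are not counted
```

Checking `n_unparsed` first shows how many rows were lost to the coercion.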
great efforts taken in making of this video thank you sir
This page is very underrated. Others don't provide such great content. I have done all the projects of this channel related to Data analysis. Looking forward to a more advanced project that will help me to enhance my Python skills in Visualization.
Data Science Lover Please provide more content related to seaborn, plotly, matplotlib, numpy.
How long does it take to finish the projects
Can i add this project in my resume?
59:32 The max value is wrong because the column is still of object dtype, as you said in the video, so we have to convert it to int.
code-- df['Minutes'] = pd.to_numeric(df['Minutes'])
you can check
df.dtypes (it converts object to int)
after that run df['Minutes'].max()
result is 312
I commented because if someone is looking for how to change object to int, this comment may help them. Thank you @datasciencelover, what a wonderful series.
its giving error pd is not defined
where to initialize exactly that to clear pls
First, import pandas as pd
@Ruchika. I agree. converting into time format will be more specific, I think!!!
thank u Ruchika
Thankyou for making this detailed video, it helped practice python and pandas skills.
Thanks it was very useful. I binged watched along with hands on Jupyter notebook.
Very pratical and great content, thank you !
thank you so much sir, for such a amazing content. God bless you
One of the best channels I have ever come across.
Q.10, add this line of code to get accurate results>>
data['Minutes'] = data['Minutes'].astype(float)
got error
@@muzammilgoraya data['Minutes'] = data['Minutes'].astype('float')
Good
We need to change it to numeric to get the maximum duration:-
The code is:-
df[['Minutes','Unit']]=df['Duration'].str.split(' ',expand=True)
df['Minutes']=df.Minutes.astype('int64')
df.Minutes.max()
312 will be the answer.
But I have a question here. What will be the duration of shows consisting of seasons? We didn't take that into account.
Still not getting it.
TypeError: not supported between instances of 'str' and 'float'
but its data type is 'object' brother
Hello Sir, thanks for the great job. For question 10..
We are missing two important facts :
Duration column has values with two types of units : seasons and mins
We cannot just find the max of the column after applying the split function.
Moreover, after the split it is good to convert the column with the number values to int.
We have to find the max by filtering on each Category type ( movie and TV show)
Here is my query :
netflix[['Number', 'Unit']] = netflix["Duration"].apply(lambda x: pd.Series(str(x).split(" ")))
netflix["Number"]= netflix["Number"].astype(int)
netflix_TV_Show = netflix[(netflix["Category"]== "TV Show")]
netflix_TV_Show[(netflix_TV_Show["Number"] == netflix_TV_Show["Number"].max())]
netflix_Movie = netflix[(netflix["Category"]== "Movie")]
netflix_Movie[(netflix_Movie["Number"] == netflix_Movie["Number"].max())]
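The same idea, one maximum per category, can also be sketched with groupby. This is an alternative to the filtering above, shown on invented rows; `Number` is the helper column name used in the comment.

```python
import pandas as pd

# Hypothetical rows mixing the two units.
netflix = pd.DataFrame({
    "Category": ["Movie", "Movie", "TV Show", "TV Show"],
    "Duration": ["99 min", "312 min", "4 Seasons", "1 Season"],
})

# Leading number of each duration, as an integer.
netflix["Number"] = netflix["Duration"].str.split(" ").str[0].astype(int)

# One maximum per category, so minutes and seasons are never compared.
max_per_category = netflix.groupby("Category")["Number"].max()
```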
Thanks.
Just now completed all project and practiced on the dataset given. Thanks a lot for creating this playlist. I hope you can make the series on visualization libraries. Also How can I raise different types of question by looking at dataset . as you have posted various question in description i felt very easy to solve it. But raising good question by looking at dataset which can generate insights. any Recommendation ? please reply..Thank you!
for question 8, we can convert the datatype of Cast column from object datatype to String and then use the following syntax to search for Tom Cruise:
dataframe['Cast'] = dataframe['Cast'].astype(str)
after that,
dataframe[dataframe['Cast'].str.contains('Tom Cruise')]
thanks for mentioning i was thinking about this as well because the cast is in the form of a list converted into a string same goes with type as well and when you do this you get 2 movies with Tom Cruise in them while in the video we get none
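A possible alternative to casting the column with astype(str) is the na parameter of str.contains, which decides how missing cast entries are treated. A minimal sketch on invented data:

```python
import pandas as pd

# Hypothetical cast lists; one entry is missing.
dataframe = pd.DataFrame({
    "Title": ["M1", "M2", "M3"],
    "Cast": ["Tom Cruise, Actor B", None, "Actor C, Tom Cruise"],
})

# na=False treats a missing cast as "no match" instead of NaN.
mask = dataframe["Cast"].str.contains("Tom Cruise", na=False)
tom_cruise_titles = dataframe.loc[mask, "Title"].tolist()
```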
Great video,Thanks
Can you do a video on Transport Optimization using the Pulp library. Using DHL or any other data set. I would love to learn it .
Thanks , very much use full.
Worth watching but instead of imputation you went for data drop which is not ideal. During data cleaning ideally should go for imputation like mean, mode etc. Data drop will cause loss of data. Just a thought 🙂🙂
If the portion of missing data is small enough , it can be safely dropped!
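Both views can be checked in code. The sketch below, on made-up data, first measures the share of missing values per column and then imputes instead of dropping; the placeholder "Unknown" and the use of the mode are illustrative choices, not what the video does.

```python
import pandas as pd

# Hypothetical data with gaps.
df = pd.DataFrame({
    "Director": ["A", None, "B", None],
    "Rating": ["TV-14", "R", None, "TV-14"],
})

missing_share = df.isnull().mean()   # fraction of nulls per column

# Impute instead of dropping: a placeholder for Director,
# the most frequent value (mode) for Rating.
filled = df.copy()
filled["Director"] = filled["Director"].fillna("Unknown")
filled["Rating"] = filled["Rating"].fillna(filled["Rating"].mode()[0])
```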
sir thank you so much , learnt a lot
This is the best tutorial video, I watched and practice all 8 of your youtube. could you start showing us how to use matplotlib or numPy in the tutorial video also. Thank you so much!!!!!
Sure
Thanks
Hi. Thanks for the informative video on DF with filtering and other concepts with tasks. Can you tell me HOW to filter UNIQUE countries from COUNTRY column and plot them as pie chart or other suitable chart... Am unable to do it.
Thank you very much sir
We can't directly apply .max() to a string column, because strings are compared character by character, so '90' counts as greater than '120'. We need to convert the column to integer first and then apply max.
Hey could you pls make a proj that can be put in resume..really looking fwd to it.. Love your channel :)
Its a request please make a playlist for ML and Deep Learning projects❤️
Have you framed these questions yourself only or it is given in the dataset? Because on Kaggle I barely found datasets with tasks. Do we have to create tasks by ourselves?
For every datasets, you must have to ask primary research questions that you want to find out throughout the data analysis process using statistical procedures, coefficients and correlation, regression etc. Your questions should be relevant according to the given variables.
Thank you for great support and teaching. Could you share your word file -Core Python ' please?
Good Video
bro after finding the null values how to handle we can't remove that much records from the data set right , so how to overcome up with this
great content
Sir you teach fantastic, pl take sql in hindi. I request u on the behalf all students
Thank you
Really so nicely explain, thanks for your learning process.
Thanks
You can enroll in our udemy course to get certificate - www.udemy.com/course/bigdata-analysis-python/?referralCode=F75B5F25D61BD4E5F161
Appreciate video but there are few Mistakes like in Q.10 maximum duration movie
df.loc[df['Category'] == 'Movie'].groupby('Category')['Numeric_duration'].max().reset_index()
Category Numeric_duration
0 Movie 312
Similarly , there are other mistakes Q.13 , either question is not framed properly, or Solution. There is mismatch .
Instead of data.groupby('Category')['Category'].count(), can we also use data['Category'].value_counts()?
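A quick sketch on invented data suggests the two give the same counts; only the ordering differs (groupby sorts by the key, value_counts sorts by count).

```python
import pandas as pd

# Hypothetical categories.
data = pd.DataFrame({"Category": ["Movie", "TV Show", "Movie", "Movie"]})

by_groupby = data.groupby("Category")["Category"].count()
by_value_counts = data["Category"].value_counts()
```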
Hi when I read or load the dataset it does not appear the way it appears in your videos.what can I do please help
Correction
Sir, 58:33
In Q10, where we have to find the duration of Movies/TV Shows, the 99 that came out is wrong. The max value in the dataset is 312 min. After you split the units and values, converting the object data type to int gives the correct max value.
For 11th Question, you have to segregate the data by split method and then use set_index,stack, reset_index function in it.
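A sketch of counting individual countries follows. It swaps in Series.explode for the set_index/stack/reset_index route named above, and it assumes countries are separated by ", "; the rows are invented.

```python
import pandas as pd

# Hypothetical TV-show rows; some list several countries.
shows = pd.DataFrame({
    "Title": ["S1", "S2", "S3"],
    "Country": ["India, United States", "United States",
                "India, Canada, United States"],
})

# One row per (show, country), then count shows per individual country.
countries = shows["Country"].str.split(", ").explode()
country_counts = countries.value_counts()
top_country = country_counts.idxmax()
```

The same pattern would apply to a column of multiple directors.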
hey bro, can you help me with the code if we wish to do 'cluster analysis' in this project
I like your videos
This is so cool
Q1.
Answer. df[df['Title']== 'House of Cards'] , can we also use this
@data_science_lovers. the records those the duplicated function returns are not duplicates, values on most of the columns are different. Could you please check and explain what is going on here?
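One possible explanation (an assumption, not verified on the dataset): by default duplicated() flags only the later copy of each duplicate, so the rows it shows are not duplicates of each other. Passing keep=False displays every member of a duplicate group. A sketch on invented rows:

```python
import pandas as pd

# Hypothetical rows; the first and third are identical in every column.
df = pd.DataFrame({
    "Title": ["X", "Y", "X"],
    "Director": ["D1", "D2", "D1"],
})

later_copies = df[df.duplicated()]            # only the repeated occurrence
all_copies = df[df.duplicated(keep=False)]    # every member of a duplicate group
deduplicated = df.drop_duplicates()           # keeps the first of each group
```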
Sir, I have a doubt: in order to show the top 10 directors, shouldn't we split the entries with multiple directors?
great effort i want to write code which show the most actor or actress in the tv shows or movies
19:26 why am I getting error here ? It's showing wrong regarding the format.
Can anyone help ?
Hi, I had the same problem, the solution is the following code:
data['Date_N']= pd.to_datetime(data['Release_Date'], format='mixed')
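Note that format='mixed' needs pandas 2.0 or newer. If the error comes from stray whitespace around some dates (an assumption about the data, not confirmed here), stripping it and passing one explicit format is another option. A sketch with invented values:

```python
import pandas as pd

# Hypothetical values; note the stray leading space in the second one.
dates = pd.Series(["August 14, 2020", " September 1, 2019", "December 15, 2020"])

# Remove surrounding whitespace, then parse with one explicit format.
parsed = pd.to_datetime(dates.str.strip(), format="%B %d, %Y")
years = parsed.dt.year.tolist()
```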
What is the best suited title for this project?
Sir make video data visualisation or web scrapping project
10:03 can u explain me how these 2 rows are duplicate??
In Q3, adding the movies and TV shows gives 7787, while after cleaning the available rows are 4809. Can someone explain this?
Sir can you please help me in how to deal with os error ?
Do more real world project end to end
Amazing Video
Thanks
I love this project. How can i upload it as a project in my resume.
hello I facing an error issue continuously, what can i do plz help
For first question we can directly use this
data[data['Title'] == 'House of Cards']
facing error with sns.countplot(data['Category']) with ValueError: could not convert string to float: 'TV Show', can anyone guide me here?
add X in the code...sns.countplot(x=data['Category'])
@@mahetsiedahi6530 thank you so much. It's so much helpful for me
@16:59 what if we dont know the spelling of the house of cards ? then if we use isin It shows no record which is logically not correct. What to do in that case?
df[df['title'].str.lower().isin(['house of cards','blood & water'])]. Hope this helps
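If only part of the title is known, a case-insensitive partial match is another option. A minimal sketch on invented titles, using the capitalised `Title` column name from the project:

```python
import pandas as pd

# Hypothetical titles.
df = pd.DataFrame({"Title": ["House of Cards", "Blood & Water", "Card Sharks"]})

# Partial, case-insensitive match; na=False skips missing titles.
mask = df["Title"].str.contains("house of", case=False, na=False)
matches = df.loc[mask, "Title"].tolist()
```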
first thanks for those videos but what should i know first to see this tutorial ? like prerequisites and is that Machine learning included ?
You should only learn basic python to solve these projects.
Machine Learning is not included in this video.
@@data_science_lovers
thanks for your help
Done
what is our target value please ?
In my entire life, yours is the only YouTube tutorial series I have watched all the way through.
It was very help full for my skills thank you💥💥💥💥💥💥💥💥💥💥💥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥❤🔥
Q.10.What is the maximum duration of a Movie/TV Show on Netflix.
Given answer '99' seems to be incorrect as if is fetching from a object dtype
use below line to convert object data type to float and fetch max value of it
max(data.Minutes.astype('float'))
Will give result of '312.00'
DSL team is doing good job and i would like to watch more informative videos on Data science
correct💯
in which type of analysis will you classify this one into?
Descriptive Analysis.
Exploratory Analysis.
Inferential Analysis.
Predictive Analysis.
Causal Analysis.
Mechanistic Analysis.???
descriptive analysis. cuz we are not interpreting anything or drawing conclusion from it. we are just describing it.
@@varunpatil7226 hey bro, how can we do 'cluster analysis' in this project
@@varunpatil7226 bro can I add this project to a resume?
and can u suggest the best description to this project
For resume project, kindly send a mail to datasciencelovers@gmail.com.
We will reply back once we get an appropriate project.
@@dilipy6676 have you added this inyour resume? do let me know the description
Hey can i add this to my resume? Or its not a resume add on project?
It's up to you.
But, you should also be prepared for some other types of questions that may be asked from it.
We will upload a proper project video for resume later.
Use this to avoid error
df['Date_N']= pd.to_datetime(df['Release_Date'],format='mixed')
Df[['Minutes',' Unit ']] = Df['Duration'].str.split(' ',expand= True)
I can't get this to work; it throws ValueError: Columns must be same length as key.
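This error appears when the split produces a different number of columns than the number of names on the left. One common cause (assumed here, since the failing data is not shown) is a value with more than one space. Limiting the split with n=1 keeps it to at most two parts:

```python
import pandas as pd

# Hypothetical durations; the last value contains two spaces.
df = pd.DataFrame({"Duration": ["90 min", "2 Seasons", "1 hr 30"]})

parts_all = df["Duration"].str.split(" ", expand=True)        # 3 columns here
parts_two = df["Duration"].str.split(" ", n=1, expand=True)   # at most 2 columns

# Two names on the left now match two columns on the right.
df[["Minutes", "Unit"]] = parts_two
```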
What skills do i need to get an internship for data analysis
Good to have skills on Tableau, SQL, Python, Advanced Excel..
But not all are required
@@data_science_lovers
should i know Excel if i do well in programming by python ?
Yes.
Atleast basic Excel required to get answers for basic queries from data
@@data_science_lovers Sir, I didn't follow Q10, because when you used the split command and separated on spaces, the split command should create a list.
And even if it doesn't, don't you think converting 4 seasons into 4 minutes is wrong?
@@data_science_lovers sir shouldn't we frame the question .
Here your giving us the question which need to framed and we are just writing the syntax for that.
Please can you make a tutorial on how to interpret the data and ask the right questions please
Q. 2) In which year the highest number of the TV Shows & Movies were released ? Show with Bar Graph.
i have problem with this code : data['Date_N'] = pd.to_datetime(data['Release_Date'])
how to fix it?
I solved this problem by adding a format="mixed".
data['Date_N'] = pd.to_datetime(data['Release_Date'], format='mixed')
@@juliacosta9308 thank you for sharing this insight, I was facing the same problem.
Sir, in the 7th question it is showing only United States, can you please check
Brother Follow the syntax mentioned in video its show the results for both conditions
can i use this project for bigdata analytics
For school level
Hello sir ,
Sir, can you help me to complete a project ?
I have send some details about my project .Project name -Customer service requests analysis. If you search that project on Google you get the whole details. Sir please help me .your explanation way is very good .
Can u pls send details to datasciencelovers@gmail.com.
Can we add this to our resume?as our project
As your wish.
But you should also be prepared for some other types of questions that may be asked.
We will upload a proper project video for resume later.
please how we can download the data ? Thank you
you can download the dataset from kaggle
The link to download the dataset file is available now in the video description.
Kindly download.
how can we remove null values
df.dropna() will remove all the rows that contain at least one null value
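A short sketch of how the how and subset parameters change what gets removed, on invented rows:

```python
import pandas as pd

# Hypothetical rows: one complete, one partly empty, one fully empty.
df = pd.DataFrame({
    "Title": ["A", "B", None],
    "Director": ["D1", None, None],
})

any_missing = df.dropna()                  # default how="any": keeps only row 0
all_missing = df.dropna(how="all")         # drops only the fully empty row 2
title_only = df.dropna(subset=["Title"])   # looks at the Title column only
```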
How can I mention this project in my resume? I mean what to write in the resume?
It's up to you.
But, you should also be prepared for some other types of questions that may be asked from it.
We will upload a proper project video for resume later.
@@data_science_lovers Please do a project that we can mention in the resume..
Yes
later
Is It Good To Keep This Project In My Resume ?
Yes for sure , if you're a fresher
@@Manojkumar-vh4tc hi manoj can I get job using these projects Iam fresher currently finding difficulty to land a job .please guide me bhai , dont know what to do tried everything to get job as data analyst .can these projects help me .i dont know what actual work in real life data analyst do .please help and guide me thank you
Have u got the job
And whats helped u to get it
@@Somnath-je9nd you will never get a job by copying youtube tutorials as most data analyst will know a tutorial from a original project. Work on your own project where you actually solve a problem. In real life various different type of businesses hire data analysts firms to make business strategy and forecasts.
For eg: A pharma company would like to know how many people in india can use their medicines, etc
Sir can i do the same analysis to put in my resume
Hi are u preparing for data analysis
@@amanrauthan1222 yes bro... Totally into it
@@amanrauthan1222 what about u
@@vivekshah8905 can we catch each other at Instagram my insta id is. hey.luckyyy
@@amanrauthan1222 i'll send u request soon
Can i add this project in my resume?
Not sure...but for resume we have another project, you can check here - czcams.com/video/3IVQUvT8lMg/video.html
Please make the Netflix Data Set Available!
Yes it is available to download.
Link is given in the video description.
15:20
It's very sad to see Tom cruise working only in 2 movies.
Can i get that dataset
Yes , the link to download dataset is given in the description of each video
Is a 3 MB file really a big dataset? 😂😂
Who makes a heat map of null values, brother? That was already clear from isnull itself.
I have sent you email, actually I want source code of this project
Reply done
where to get python code
The cost of all projects source codes & complete courses notes is Rs.750 only.
@@data_science_lovers can i get a data anlyst project sir
Ok.
Please send a mail..i will send the source code
@@data_science_lovers I SENT ALREADY SENT IT SIR
ONCEAGAIN I WILL SEND U SIR
data set?
Download link available in video description
@@data_science_lovers it says site can't be reached
It is working for me....share your email id
There is a correction at 59:00 where we were finding max duration of a movie.
-- Your answer for df.Minutes.max() = 99 is incorrect as 'Minutes ' was still Object .
-- To sort this , we can convert Minutes from Obj to int using --->. df['Minutes'] = df['Minutes'].astype('int')
-- After this , df.Minutes.max() Output will be 312
live in pakistan
Paypal
Id is datasciencelovers@gmail.com
@@data_science_lovers Paypal and phonepe both don't function in Pakistan. What to do then?
Kindly use Super Thanks option given below the video
Q10 is wrong in this video better way is this i think..
df[['minutes','unit']] = df['Duration'].str.split(' ',expand=True)
df
df['minutes']=df['minutes'].astype('int')
s = df.groupby(['Category']).get_group('TV Show')
m = df.groupby(['Category']).get_group('Movie')
#for tv shows
res = s[['minutes','unit']].max()
#for movies
res1 = m[['minutes','unit']].max()
print(res, "\n", res1)
because two categories are there so two different max values should be calculated also 99 is wrong 312 is ryt convert the column to int for ryt ans
hope this helps thankyou!!!
For Q7.
df[ ( (df['Category']=='Movie') & (df['Type'].str.contains('Comedies')) ) | ( df['Country']=='United Kingdom' ) ]
Thank you so much sir