Master Web Scraping | Python Tutorial - Make an extra $500 over a weekend. Up your Amazon FBA Game
- Published 10 Jun 2019
- Learn one of the hottest skills in data science today: web scraping. Combine it with the ability to analyze data and you've marked yourself as a hot commodity. Companies charge thousands per month per client using the same techniques you'll learn in this video.
Code on github: github.com/satssehgal/Booksto...
Watch how to do this with Selenium 👉 • Web Scraping with Sele...
Watch how to do this with Scrapy 👉 • Introduction to Scrapy...
👉 Facebook Group: / theaiwarriors
👉 Instagram: @theaiwarriors
👉 Corporate Training and Upskilling: levers.ai
Netfirms (Affiliate) - bit.ly/2KdJ4Dp
Linode Server - bit.ly/2XpqGi9
Bluehost (Affiliate) - bit.ly/2GxxBh1
PythonAnywhere (Affiliate) - bit.ly/2kWORVe
Heroku - www.heroku.co
NordVPN (Affiliate) - bit.ly/2W87je0
Here is a link to my python for beginners, master python course: bit.ly/2HIZS42
Music: ONE by Lahar / musicbylahar
Creative Commons - Attribution 3.0 Unported - CC BY 3.0
Free Download / Stream: bit.ly/ONE-Lahar
Music promoted by Audio Library
Watch Next --> czcams.com/video/p42e8NBnrGI/video.html
I love your tutorials when you explain every piece of code line by line. Thank you!!!!
Wow!! I just got to the end and I'm so floored. I had no idea. You made this so simple to follow.
This was by far one of the best videos I've watched on web scraping and I finally "get" it after watching many tutorials. Well done and thank you for explaining this so well . I finally have an answer to my project that I've been trying to solve for weeks. Great job!
This was a terrific tutorial! You are very clear and easy to follow/understand. I have subscribed and I'll be reviewing your previous tutorials. Thanks!
This is exactly what I was looking for and even everyone out there I am sure. Thanks a bunch dude! 😊🙌🏻
I just started studying data science and your channel has everything I've been thinking of learning! Thanks for your great work and for sharing it! You are awesome
Thanks, glad it's helping
I followed you step by step and this is amazing. Thank you very much for your time, and patience at clarifying, teaching and sharing your knowledge.
You are very welcome
Thanks a lot! My first successful web scraping, it means a lot to me!
Amazing keep up the good work
Great vid, very useful. I found a small tweak for the title extraction, as titles were getting cut off with "...". The full title is stored in the 'title' attribute of the 'a' tag inside each 'h3' tag:
for i in soup.find_all('h3'):
    ttl = i.a.attrs['title']  # within the h3 tag is an 'a' tag that has a 'title' attribute with the full title in it
    titles.append(ttl)
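A runnable sketch of that tweak, for anyone who wants to try it offline. The HTML snippet below is made up for illustration, shaped like the listing markup on books.toscrape.com:

```python
from bs4 import BeautifulSoup

# Inline HTML mimicking the book listing markup (illustrative, not fetched).
html = """
<ol>
  <li><h3><a href="a-light-in-the-attic_1000/index.html"
             title="A Light in the Attic">A Light in the ...</a></h3></li>
  <li><h3><a href="tipping-the-velvet_999/index.html"
             title="Tipping the Velvet">Tipping the Velvet</a></h3></li>
</ol>
"""

soup = BeautifulSoup(html, "html.parser")

titles = []
for h3 in soup.find_all("h3"):
    # The visible link text may be truncated with "...", but the <a> tag's
    # 'title' attribute carries the full book title.
    titles.append(h3.a.attrs["title"])

print(titles)  # ['A Light in the Attic', 'Tipping the Velvet']
```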
Nicely done. Thank you for sharing!
It really works, thank you so much sir, carry on the great work.
Thank you for this tutorial. The way you teach and explain makes it easy for dumb dumbs like me to be able to follow along. You just got a lifetime subscriber. Keep up the great work and would love to learn more from you and your tutorials.
Thank you 🙏
I have to fully agree with you man. You really got me hooked for life.
Excellent tutorial. - Thanks for sharing.
The level of knowledge you have is awesome.
Damn, this was satisfyingly well explained, thanks
Thanks for such a valuable stuff 👍👍👍
Awesome glad it added some value for you
Thanks! I integrated this tutorial with Flask and MySQL
Amazing keep it up
Thanks a lot, works well for me
+Adebayo Taiwo awesome
You did an excellent job explaining the process of web scraping.
I have one question. How is it that tags can receive src as an argument? I understand why you did it but not how it works.
Thanks!
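On the src question above: in BeautifulSoup, a Tag object supports dict-style indexing on its HTML attributes, which is why tag['src'] works. A small offline sketch (the image path below is made up for illustration):

```python
from bs4 import BeautifulSoup

# Illustrative thumbnail markup, similar to the one scraped in the video.
html = '<img class="thumbnail" src="media/cache/2c/da/thumb.jpg">'
tag = BeautifulSoup(html, "html.parser").find("img")

# A bs4 Tag behaves like a dict of its HTML attributes:
print(tag["src"])         # media/cache/2c/da/thumb.jpg
print(tag.attrs["src"])   # same lookup, spelled explicitly
```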
Question. I'm looking to pull data from multiple Excel spreadsheets located on a SharePoint and add the data into a new database. Would there be an easier way of doing this, or would scraping the data be just as simple?
Great Video Thanks
Please zoom in a bit...Great video...thank you
Great teacher
Thank you
How do I scrape an image that returns a status code of 302?
Hi, I think there is 1 error in this tutorial - 31:35
When you click on cell A1 on the Excel sheet, the title reads as A Light in the ...
Instead of A Light in the Attic which is how it is represented on the website.
Any clarity on how to get the exact/full title?
Other than that, video was flawless
Possibly the website you scraped doesn't show the entire string until the page has loaded or the title is clicked. Just an idea: check your raw data with a print or something. (I haven't watched the entire video.)
Hi, thanks for the tutorial. Your screen is too small to read the code comfortably. There are huge blank spaces on each side of the main frame; it would help if you could zoom in a bit more.
Thank you for the comment. Yes you are right. I posted the code on github so you all can follow. In most of my newer tutorials I’ve switched to jupyter notebooks and it’s a lot more clear.
@@SATSifaction I just finished a Data Analytics course at the university of Miami and they did everything through python in anaconda and Jupyter notebooks and Visual Code.
Excellent tutorial and well explained. I had one hangup and I spent a day on trying to figure it out. I used Google Colab and this is the first time it threw this error. I couldn't put the dataframe in to an excel file. I kept getting no such file or directory. Finally, I put it into a Linux terminal and it ran without issues.
One thing I'm not sure of, is does the data repeat? It does for me.
It doesnt for me. You can probably alter the code to suit your needs.
I was getting a value error when running the code. I could use some help. It says "ValueError: arrays must all be same length". Any help would be very appreciated. I've attached the code below.
import requests
from bs4 import BeautifulSoup as bs4
import pandas as pd

pages = []
prices = []
stars = []
titles = []
urlss = []
pages_to_scrape = 5

for i in range(1, pages_to_scrape + 1):
    url = 'http://books.toscrape.com/catalogue/page-{}.html'.format(i)
    pages.append(url)

for item in pages:
    page = requests.get(item)
    soup = bs4(page.text, 'html.parser')
    for i in soup.find_all('h3'):  # Gets titles
        ttl = i.getText()
        titles.append(ttl)
    for i in soup.find_all('p', class_='price_color'):
        price = i.getText()
        newprice = price.replace("Â", "")
        prices.append(newprice)
    for s in soup.find_all('p', class_='star-rating'):
        for k, v in s.attrs.items():
            star = v[1]
            stars.append(star)
    divs = soup.find_all('div', class_='image_container')
    for thumbs in divs:
        tgs = thumbs.find('img', class_='thumbnail')
        urls = 'http://books.toscrape.com/' + str(tgs['src'])
        newurls = urls.replace("../", "")
        urlss.append(newurls)

data = {'Titles': titles, 'Price': prices, 'URLS': urlss, 'Stars': stars}
print()
print(data)
df = pd.DataFrame(data=data)
df.index += 1
df
I get this error:

TypeError                                 Traceback (most recent call last)
<ipython-input> in <module>
     17 page = requests.get(item)
     18 soup = bs4(page.text, 'html.parser')
---> 19 for i in soup.findALL('h3'):
     20     ttl = i.getText()
     21     titles.append(ttl)

TypeError: 'NoneType' object is not callable
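For what it's worth, the traceback above points at the likely cause: BeautifulSoup has no findALL method, so that lookup doesn't resolve to a callable, hence "'NoneType' object is not callable". The supported spelling is find_all (findAll is the legacy alias). A quick offline check, using a made-up snippet:

```python
from bs4 import BeautifulSoup

# Illustrative markup standing in for a fetched page.
html = "<h3><a title='A Light in the Attic'>A Light in the ...</a></h3>"
soup = BeautifulSoup(html, "html.parser")

# Correct spelling: find_all (or the legacy alias findAll), not findALL.
titles = [h3.getText() for h3 in soup.find_all("h3")]
print(titles)  # ['A Light in the ...']
```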
I think the code works if the URL format actually has its pages (i.e. page 1 to page 5) increase incrementally by one at each click, but I don't think this works with, say, Amazon or other sites. How do we go about that?
The code is site specific. With Amazon I would use scrapy and for pagination they have a method for next page in scrapy that would handle that for you. It’s explained well in the scrapy docs
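The "next page" idea generalizes: instead of hard-coding page numbers, follow the next-page link until it disappears. A minimal offline sketch of that pattern, using BeautifulSoup over hard-coded HTML standing in for live responses (in Scrapy the equivalent is following the li.next a link from each parsed page, as shown in its docs):

```python
from bs4 import BeautifulSoup

# Fake "site": two pages, the first linking to the second, mimicking the
# <li class="next"> pagination used on books.toscrape.com.
SITE = {
    "page-1.html": '<h3><a title="Book One">Book One</a></h3>'
                   '<ul class="pager"><li class="next">'
                   '<a href="page-2.html">next</a></li></ul>',
    "page-2.html": '<h3><a title="Book Two">Book Two</a></h3>',  # no next link
}

titles = []
page = "page-1.html"
while page is not None:
    soup = BeautifulSoup(SITE[page], "html.parser")
    titles += [h3.a["title"] for h3 in soup.find_all("h3")]
    nxt = soup.select_one("li.next a")   # stop when there is no "next" link
    page = nxt["href"] if nxt else None

print(titles)  # ['Book One', 'Book Two']
```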
Awesome, boss. Do you think this field of data scraping will be needed more and more? It's 2023 right now and I'm taking a Google Cert on Data Analysis. Do you think we can still be valuable in data scraping? It's not taught in the class by Google. Please advise. ..JAY Thank you
Hi; I really enjoyed this video. My background isn't in programming, but I'm really interested in this web scraping (data science) type of work and have no idea or direction as to how to get started. I know Udemy has some related web scraping courses on HTML, CSS, Python etc. I also know that some universities and colleges offer courses in data science, but the cost is very high, almost $10,000 for a one-year course.
Hi there. Web scraping is a very good area to get into, especially around data collection. There are a lot of great free YouTube resources. I would first try out a few projects before paying that sum of money to an institution. Several times I've seen people invest a lot of money in a hot skill but later have no passion for it. To get you started you can view my web scraping courses...all free...enjoy -> czcams.com/play/PLM30lSIwxWOjrr-6zuMj28fC5RxrPY_Tc.html
Hi, can you explain why you use format(i) a little bit more?
You would use .format when you want to format a string to include a variable. If a variable X = 12, for example, the code 'I am happy to be {}'.format(X) would give 'I am happy to be 12'. Hope that helps.
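To make that reply concrete, here is the same example as runnable code, plus the equivalent f-string spelling:

```python
X = 12

# str.format substitutes the variable into the {} placeholder.
sentence = "I am happy to be {}".format(X)
print(sentence)  # I am happy to be 12

# Multiple placeholders are filled in order:
print("page {} of {}".format(3, 5))  # page 3 of 5

# Modern equivalent, an f-string:
print(f"I am happy to be {X}")  # I am happy to be 12
```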
When I run df.to_excel() I get an "openpyxl not found" error.
Other than that, good so far.
I can copy the code and run it in Sublime and Gitbash terminal with no errors. And the excel file is produced.
I had to do a pip install openpyxl to get past that error. I made sure I ran all of this in a virtual environment.
Please, which file on your GitHub is for this program? I found the other programs but not this one. The tutorial is great!!
I already found the GitHub file
If I am using Spyder to program in Python, would you suggest using BeautifulSoup or Scrapy?
Either is fine, though the two have different applications. BS is more for a quick and dirty web scrape, while Scrapy is more of a framework with broader applicability.
What do you suggest for scraping sites that have a login?
For logins I would recommend Selenium. Check out my video on building a billing bot, which covers the login process -> czcams.com/video/HsA0mJ4kNKE/video.html
I don't understand how someone can be IP blocked or something for web scraping... I mean, isn't it just reading the HTML code, searching for and finding targeted tags and paths, and then putting them in storage and organizing them? It seems like something that's completely client-side. How do they blacklist you or find out you're scraping them?
To web scrape you will be using the requests module. In order to get the data from their server you make a request to their server, which returns the HTML content. Every request you make hits their server. If you make too many requests, they can ban the IP address that makes them, in other words your IP. Also, if a user doesn't respect the robots.txt file, which outlines what you can and cannot scrape, they can ban the IP for making requests that aren't authorized.
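The robots.txt point can be checked in code before scraping. A minimal sketch using the standard library's urllib.robotparser, fed a hand-written robots.txt (the rules and URLs here are made up for illustration; normally you would fetch the site's real robots.txt):

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt: everything under /private/ is off limits.
robots_txt = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Check a URL before requesting it:
print(rp.can_fetch("*", "http://example.com/catalogue/page-1.html"))  # True
print(rp.can_fetch("*", "http://example.com/private/data.html"))      # False
```

Respecting these answers, and spacing requests out with a small delay, goes a long way toward not getting your IP banned.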
I'm wondering why you decided to use Jupyter notebooks?
No specific reason, other than the fact that it's a great tool to teach and train Python with...
Sir, there is one website, bol.com, that's hard to scrape. Would you please teach me how to scrape that website?
Hi. I can help you with this. Email me to take this further.
As a full-stack web developer, that means a lot to me. My feelings are hurt as fuck, lol.
How do I scrape Amazon product URLs and ASIN numbers? Please tell me
Watch this video. It’s a similar concept that you can apply to Amazon. However Amazon is a lot more difficult to scrape. 👉🏼 czcams.com/video/NXNhqNyYpHI/video.html
How would you turn the prices into USD?
You could always connect to an external API to convert them. It depends on how the website displays the data: the scraper will pick up whatever currency is shown, and you can add your own API call for currency conversion.
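As a concrete sketch of that post-scrape conversion step: prices on books.toscrape.com come back as strings like '£51.77', so you strip the symbol, parse the number, and multiply by a rate. The rate below is a made-up constant for illustration; in practice it would come from a live currency API:

```python
GBP_TO_USD = 1.27  # assumed fixed rate for illustration; fetch a live one in practice

def price_to_usd(price_text: str) -> float:
    """Parse a scraped price like '£51.77' and convert it to USD."""
    # Strip the pound sign and the stray 'Â' encoding artifact seen in scrapes.
    gbp = float(price_text.replace("£", "").replace("Â", "").strip())
    return round(gbp * GBP_TO_USD, 2)

print(price_to_usd("£51.77"))  # 65.75
```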
Thanks for sharing, sir. The code is uncomfortable to read; the font is too small. Please share with a zoomed-in font size.
raviranjan sharma thanks for the note. I cannot edit the video however I have uploaded the code on GitHub for you to use and follow. It’s in the link description. All my new videos use much bigger font 😊
Thank you Sir
What exactly are the prerequisites for learning web scraping? Please, sir, make a video on the prerequisites 🙏 We need to know exactly what to learn before web scraping 🙏
Great video, sir! Noob question: do web scraping jobs only revolve around extracting these kinds of data to be shown later in a table?
+Jon Jimlin Sumalhay no, there are more applications. Data such as prices that are web scraped can be inputs to machine learning models, like pricing algorithms, as an example.
@@SATSifaction would love to see an example of that someday
🤑🤩
Awesome. Create a web scraper using Django with an input URL from the user
Helpful tutorial, but poor video quality