Master Web Scraping | Python Tutorial - Make an extra $500 over a weekend. Up your Amazon FBA Game

  • Added 10 Jun 2019
  • Learn one of the hottest skills in data science today: web scraping. Combine that with the ability to analyze data and you've just marked yourself as a hot commodity. Companies charge thousands per month per client using the same techniques you will learn in this video.
    Code on github: github.com/satssehgal/Booksto...
    Watch how to do this with Selenium 👉 • Web Scraping with Sele...
    Watch how to do this with Scrapy 👉 • Introduction to Scrapy...
    👉 Facebook Group: / theaiwarriors
    👉 Instagram: @theaiwarriors
    👉 Corporate Training and Upskilling: levers.ai
    Netfirms (Affiliate) - bit.ly/2KdJ4Dp
    Linode Server - bit.ly/2XpqGi9
    Bluehost (Affiliate) - bit.ly/2GxxBh1
    PythonAnywhere (Affiliate) - bit.ly/2kWORVe
    Heroku - www.heroku.co
    NordVPN (Affiliate) - bit.ly/2W87je0
    Here is a link to my python for beginners, master python course: bit.ly/2HIZS42
    Music: ONE by Lahar / musicbylahar
    Creative Commons - Attribution 3.0 Unported - CC BY 3.0
    Free Download / Stream: bit.ly/ONE-Lahar
    Music promoted by Audio Library • Video
  • Howto & Style

Comments • 81

  • @SATSifaction • 4 years ago +2

    Watch Next --> czcams.com/video/p42e8NBnrGI/video.html

  • @oscarmartinezbeltran • 4 years ago +7

    I love your tutorials when you explain every piece of code line by line. Thank you!!!!

  • @BrianThomas • 4 years ago +2

    Wow!! I just got to the end and I'm so floored. I had no idea. You made this so simple to follow.

  • @powerb_i • 4 years ago +2

    This was by far one of the best videos I've watched on web scraping and I finally "get" it after watching many tutorials. Well done and thank you for explaining this so well. I finally have an answer to my project that I've been trying to solve for weeks. Great job!

  • @patrickstingley4126 • 1 year ago +1

    This was a terrific tutorial! You are very clear and easy to follow/understand. I have subscribed and I'll be reviewing your previous tutorials. Thanks!

  • @curioussouls5438 • 3 years ago +2

    This is exactly what I was looking for and even everyone out there I am sure. Thanks a bunch dude! 😊🙌🏻

  • @tjzz425 • 4 years ago +2

    I just started studying data science and your channel has everything I've been thinking of learning! Thanks for your great work and sharing it! You are awesome

  • @christianjimenezhernandez4262

    I followed you step by step and this is amazing. Thank you very much for your time, and patience at clarifying, teaching and sharing your knowledge.

  • @nicolewang6455 • 4 years ago +1

    Thanks a lot! My first successful web scraping, it means a lot to me!

  • @lajosfidy3785 • 3 years ago +1

    Great vid, very useful. I found a small tweak for the title extraction, as titles were getting cut off with "...". The full title is stored in the 'title' attribute of the 'a' tag inside each 'h3':
    for i in soup.findAll('h3'):
        # within the h3 tag is an 'a' tag that has a 'title' attribute with the full title in it
        ttl = i.a.attrs['title']
        titles.append(ttl)

  • @jaymanalastas • 5 years ago

    Nicely done. Thank you for sharing!

  • @AliRaza-vi6qj • 2 years ago

    It really works, thank you so much sir. Carry on the great work.

  • @patrickdancel5627 • 4 years ago +5

    Thank you for this tutorial. The way you teach and explain makes it easy for dumb dumbs like me to be able to follow along. You just got a lifetime subscriber. Keep up the great work and would love to learn more from you and your tutorials.

    • @SATSifaction • 4 years ago +1

      Thank you 🙏

    • @BrianThomas • 4 years ago +1

      I have to fully agree with you man. You really got me hooked for life.

  • @davebeckham5429 • 4 years ago

    Excellent tutorial. Thanks for sharing.

  • @chetanvgoudar9079 • 3 years ago

    The level of knowledge you have is awesome.

  • @sivexokashe6423 • 2 years ago

    Damn, this was satisfyingly well explained, thanks

  • @adnanyounas2541 • 3 years ago +1

    Thanks for such valuable stuff 👍👍👍

  • @fraann • 4 years ago +1

    Thanks! I integrated this tutorial with Flask and MySQL

  • @blood4bones366 • 4 years ago +1

    Thanks a lot, works well for me

  • @deedanner6431 • 3 years ago

    You did an excellent job explaining the process of web scraping.
    I have one question. How is it that tags can receive src as an argument? I understand why you did it but not how it works.
    Thanks!

  • @BrianThomas • 4 years ago

    Question. I'm looking to pull data from multiple Excel spreadsheets located on a SharePoint and add the data into a new database. Would there be an easier way of doing this, or would scraping the data be just as simple?

  • @morello6061 • 4 years ago

    Great video, thanks

  • @thennarasuthen9179 • 2 years ago +1

    Please zoom in a bit... Great video... thank you

  • @leventbozkurt9796 • 2 years ago +1

    Great teacher

  • @drac8854 • 4 years ago

    How do I scrape an image that returns a status code of 302?

  • @icedgodz428 • 3 years ago +1

    Hi, I think there is 1 error in this tutorial - 31:35
    When you click on cell A1 on the Excel sheet, the title reads as A Light in the ...
    Instead of A Light in the Attic which is how it is represented on the website.
    Any clarity on how to get the exact/full title?
    Other than that, video was flawless

    • @jackhales6179 • 3 years ago

      Potentially, a website you scraped had a propensity to not show the entire string until a page loaded in or the title was clicked. Just an idea - check your raw data with a print or something. (I haven't watched the entire video.)

  • @lenac3587 • 4 years ago +9

    Hi, thanks for the tutorial. Your screen is too small to read the code comfortably. There are huge blank spaces on each side of the main frame; if you could zoom in a bit more.

    • @SATSifaction • 4 years ago +4

      Thank you for the comment. Yes, you are right. I posted the code on GitHub so you all can follow. In most of my newer tutorials I've switched to Jupyter notebooks and it's a lot clearer.

    • @williambeasley838 • 3 years ago

      @@SATSifaction I just finished a Data Analytics course at the University of Miami and they did everything through Python in Anaconda, Jupyter notebooks, and Visual Code.

  • @BobGamble • 2 years ago

    Excellent tutorial and well explained. I had one hangup and I spent a day trying to figure it out. I used Google Colab and this is the first time it threw this error: I couldn't put the dataframe into an Excel file. I kept getting "no such file or directory". Finally, I put it into a Linux terminal and it ran without issues.
    One thing I'm not sure of: does the data repeat? It does for me.

    • @SATSifaction • 2 years ago

      It doesn't for me. You can probably alter the code to suit your needs.

  • @user-lh4hv3tx8b • 3 years ago

    I was getting a value error when running the code. I could use some help. It says "ValueError: arrays must all be same length". Any help would be very appreciated. I've attached the code below.
    import requests
    from bs4 import BeautifulSoup as bs4
    import pandas as pd

    pages = []
    prices = []
    stars = []
    titles = []
    urlss = []
    pages_to_scrape = 5
    for i in range(1, pages_to_scrape + 1):
        url = 'http://books.toscrape.com/catalogue/page-{}.html'.format(i)
        pages.append(url)
    for item in pages:
        page = requests.get(item)
        soup = bs4(page.text, 'html.parser')
        for i in soup.find_all('h3'):  # Gets titles
            ttl = i.getText()
            titles.append(ttl)
        for i in soup.find_all('p', class_='price_color'):
            price = i.getText()
            newprice = price.replace("Â", "")
            prices.append(newprice)
        for s in soup.find_all('p', class_='star-rating'):
            for k, v in s.attrs.items():
                star = v[1]
                stars.append(star)
        divs = soup.find_all('div', class_='image_container')
        for thumbs in divs:
            tgs = thumbs.find('img', class_='thumbnail')
            urls = 'http://books.toscrape.com/' + str(tgs['src'])
            newurls = urls.replace("../", "")
            urlss.append(newurls)
    data = {'Titles': titles, 'Price': prices, 'URLS': urlss, 'Stars': stars}
    print()
    print(data)
    df = pd.DataFrame(data=data)
    df.index += 1
    df

  • @previncoin8592 • 3 years ago

    I get this error:
    TypeError                                 Traceback (most recent call last)
         17 page = requests.get(item)
         18 soup = bs4(page.text, 'html.parser')
    ---> 19 for i in soup.findALL('h3'):
         20     ttl = i.getText()
         21     titles.append(ttl)
    TypeError: 'NoneType' object is not callable
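    For what it's worth, the error above comes from the findALL spelling: BeautifulSoup has no findALL method, so the attribute access falls back to a tag search that returns None, and calling None raises the TypeError. A small self-contained sketch of the fix (the HTML snippet is invented for illustration):

```python
import bs4

# A tiny stand-in for one of the scraped pages (hypothetical snippet).
html = '<h3><a title="A Light in the Attic">A Light in the ...</a></h3>'
soup = bs4.BeautifulSoup(html, 'html.parser')

# soup.findALL('h3') fails because 'findALL' is not a method: BeautifulSoup
# treats unknown attribute access as a search for a <findall> tag, gets None,
# and None('h3') raises TypeError. The correct name is find_all (the legacy
# camel-case alias findAll also works).
titles = [h3.get_text() for h3 in soup.find_all('h3')]
print(titles)
```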

  • @michellelee7585 • 4 years ago

    I think the code works if the URL format actually has its pages (i.e. page 1 - page 5) increase incrementally by 1 at each click, but I don't think this works with, say, Amazon or other sites. How do we go about that?

    • @SATSifaction • 4 years ago

      The code is site specific. With Amazon I would use Scrapy, and for pagination it has a method for requesting the next page that would handle that for you. It's explained well in the Scrapy docs.

  • @jay-rp6bm • 1 year ago

    Awesome boss. Do you think this field of data scraping will be needed more and more? It's 2023 right now and I'm taking a Google Cert on Data Analysis. Do you think we can still be valuable in data scraping? It's not taught in the class by Google. Please advise. Thank you, JAY

  • @danloyer6241 • 3 years ago +1

    Hi, I really enjoyed this video. My background isn't in programming, but I am really interested in this web scraping (data science) type of work and have no idea or direction as to how to get started. I know Udemy has some related web scraping courses and ways to learn HTML, CSS, Python, etc. I also know that some universities and colleges offer courses in data science, but the cost is very high, almost $10,000 for a one-year course.

    • @SATSifaction • 3 years ago

      Hi there. Web scraping is a very good area to get into, especially around data collection. There are a lot of great free YouTube resources. I would first try out a few projects before paying that sum of money to an institution. Several times I've seen people invest a lot of money in a hot skill but later have no passion for it. To get you started you can view my web scraping courses... all free... enjoy -> czcams.com/play/PLM30lSIwxWOjrr-6zuMj28fC5RxrPY_Tc.html

  • @adnanyounas2541 • 3 years ago +1

    Hi, can you explain why you use format(i) a little bit more?

    • @SATSifaction • 3 years ago

      You would use .format when you want to format a string to include a variable. If variable X=12, for example, the following code 'I am happy to be {}'.format(X) would give 'I am happy to be 12'. Hope that helps.
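      A runnable sketch of the same idea (the page URL is the one used in the tutorial):

```python
# str.format substitutes each {} placeholder with the given argument.
x = 12
print('I am happy to be {}'.format(x))  # -> I am happy to be 12

# The same pattern builds the tutorial's page URLs in the scraping loop:
for i in range(1, 3):
    print('http://books.toscrape.com/catalogue/page-{}.html'.format(i))
```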

  • @dennismartin5455 • 3 years ago

    When I run df.to_excel() I get the openpyxl "file not found" error.
    Other than that, good so far.
    I can copy the code and run it in Sublime and a Git Bash terminal with no errors, and the Excel file is produced.

    • @jjasghar • 2 years ago

      I had to do a pip install openpyxl to get past that error. I made sure I ran all this in a virtual environment.

  • @josecarlosalaodeoliveira9463

    Please, which file on your GitHub is for this program? I found other programs but not this one. The tutorial is great!!

  • @kasiopeaxerxa9754 • 3 years ago

    If I am using Spyder to program in Python, would you suggest using BeautifulSoup or Scrapy?

    • @SATSifaction • 3 years ago

      Either is fine, though the two have different applications. BS is more for a quick and dirty web scrape, while Scrapy is more of a framework with wider applicability.

  • @CurrentElectrical • 3 years ago

    What do you suggest for scraping sites that have a login?

    • @SATSifaction • 3 years ago

      For logins I would recommend Selenium. Check out my video on building a billing bot, which covers the login process -> czcams.com/video/HsA0mJ4kNKE/video.html

  • @jesuschrist1501 • 4 years ago

    I don't understand how someone can be IP blocked or something for web scraping. I mean, isn't it just reading the HTML code, searching for targeted tags and paths, and then putting them in storage and organizing them? It seems like something that's completely client-sided; how do they blacklist you or find out you're scraping them?

    • @SATSifaction • 4 years ago +2

      To web scrape you will be using the requests module. In order to get the data from their server you make a request, which returns the HTML content. Every request you make hits their server. If you make too many requests they can ban the IP address that makes them, in other words your IP. Also, if a user doesn't respect the robots.txt file, which outlines what you can and cannot scrape, then they can ban the IP for making requests that aren't authorized.
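      As an illustration of scraping politely, here is a minimal sketch; the helper name, the delay value, and the robots.txt handling are my assumptions, not something from the video:

```python
import time
import urllib.robotparser

import requests

def polite_get(url, rp, delay=1.0, agent='*'):
    """Fetch url only if robots.txt allows it, pausing between requests
    so the server isn't hammered (rapid-fire requests are a common
    trigger for IP bans)."""
    if not rp.can_fetch(agent, url):
        return None  # respect the site's robots.txt rules
    time.sleep(delay)
    return requests.get(url, timeout=10)

# Hypothetical usage (requires network access):
# rp = urllib.robotparser.RobotFileParser('http://books.toscrape.com/robots.txt')
# rp.read()
# resp = polite_get('http://books.toscrape.com/catalogue/page-1.html', rp)
```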

  • @First_Principals • 4 years ago

    I'm wondering why you decided to use Jupyter notebooks?

    • @SATSifaction • 4 years ago

      No specific reason, other than the fact that it's a great tool to teach and train Python with...

  • @muhammadatif2215 • 4 years ago

    Sir, there is one website, bol.com, that is hard to scrape. Would you please teach me how to scrape that website?

  • @joelfuentescuriel6174 • 3 years ago

    As a full-stack web developer, that means a lot to me. My feelings are hurt as fuck, lol.

  • @riteshpatel-yz7rd • 3 years ago

    How do I scrape Amazon product URLs and ASIN numbers? Please tell me

    • @SATSifaction • 3 years ago

      Watch this video. It's a similar concept that you can apply to Amazon; however, Amazon is a lot more difficult to scrape. 👉🏼 czcams.com/video/NXNhqNyYpHI/video.html

  • @SL-yj9vt • 3 years ago +1

    How would you turn the prices into USD?

    • @SATSifaction • 3 years ago

      You could always connect to an external API to convert it. It depends on how the website displays the data: if it's in USD it will scrape that, and you can add your own API for currency conversion.
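      A minimal sketch of such a conversion, assuming prices are scraped as strings like '£51.77' as on books.toscrape.com (the rate below is a made-up placeholder; in practice it would come from a live currency API):

```python
def to_usd(price_str, gbp_to_usd_rate):
    """Convert a scraped price string like '£51.77' to a USD amount."""
    amount = float(price_str.lstrip('£'))  # drop the currency symbol
    return round(amount * gbp_to_usd_rate, 2)

print(to_usd('£51.77', 1.25))  # -> 64.71 with the placeholder rate
```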

  • @raviranjansharma8953 • 4 years ago

    Thanks for sharing, sir. We are uncomfortable reading the code; it is too small. Please share with a zoomed-in font size.

    • @SATSifaction • 4 years ago +1

      @raviranjansharma8953 Thanks for the note. I cannot edit the video, however I have uploaded the code to GitHub for you to use and follow. It's in the link description. All my new videos use a much bigger font 😊

    • @raviranjansharma8953 • 4 years ago

      Thank you Sir

  • @BeingVikram16 • 3 years ago

    What exactly are the prerequisites for learning web scraping? Please sir, make a video on the prerequisites of web scraping 🙏
    We need exact knowledge of what to learn for web scraping 🙏

  • @LemonWarfare • 4 years ago

    Great video, sir! Noob question: do web scraping jobs only revolve around extracting these kinds of data to be later shown in a table?

    • @SATSifaction • 4 years ago

      +Jon Jimlin Sumalhay No, there are more applications. Data such as prices that are web scraped can be inputs to machine learning models, like pricing algorithms, as an example

    • @nitusidhu6808 • 4 years ago +2

      @@SATSifaction would love to see an example of that someday

  • @relaxinggospelmusic2421 • 5 months ago

    🤑🤩

  • @storytimekids123 • 4 years ago

    Awesome. Create a web scraper using Django with an input URL from the user

  • @allrounder8816 • 2 years ago

    Helpful tutorial, but poor video quality