Python Selenium Tutorial #10 - Scrape Websites with Infinite Scrolling

  • Published 13 Sep 2024
  • 🌐 NodeMaven Proxy Provider: go.nodemaven.c...
    💥 Special Bonus: Use "Michael" at checkout for an extra +2GB of bandwidth.
    🤖 2captcha Captcha Solving Service: bit.ly/2captch...
    This Selenium tutorial is designed for beginners learning how to use the Python Selenium library for web scraping, testing, and building website bots. Selenium provides a high-level API for driving real browsers such as Chrome/Chromium and Firefox through the WebDriver protocol, using the ChromeDriver and GeckoDriver executables. Selenium runs non-headless by default but can be configured to run headless. A minimal sketch of the scroll-and-scrape loop covered in the video is included at the end of this description.
    Playlist: • Python Selenium Tutorial
    Code: github.com/mic...
    Join our Discord: / discord
    Infinite Scrolling Demo: intoli.com/blo...
    Undetectable ChromeDriver: pypi.org/proje...
    Gecko Driver: github.com/moz...
    Chrome Driver: chromedriver.c...
    Download Visual Studio Code: code.visualstu...
    Download Python: www.python.org...
    Selenium Library: pypi.org/proje...
    Donate
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    PayPal: support@websidev.com
    Bitcoin Wallet: bc1q05j8gcnq4mzvgj603cxdc8xxck4jgnu2ljsrt4
    Ethereum Wallet: 0x5e7BD4f473f153d400b39D593A55D68Ce80F8a2e
    Social
    ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
    Website: websidev.com
    Linkedin: / michael-kitas-638aa4209
    Instagram: / michael_kitas
    Github: github.com/mic...
    Business Email: support@websidev.com
    Tags:
    - Python Selenium Tutorial
    - Full Course Selenium
    - Python Selenium
    - Web Scraping Full Course
    - Python Selenium Web Scraping Full Course
    #selenium #python #webscraping
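
    Below is a minimal sketch (not the video's exact code) of the scroll-and-scrape pattern described above. The URL and the ".item" selector are placeholders you would swap for the real page and its list-entry selector.

    import time

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # assumes ChromeDriver is available (Selenium 4.6+ fetches it automatically)
    driver.get("https://example.com/infinite-scroll")  # placeholder URL

    items = []
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        # Re-read every currently loaded entry and replace the old list (old + new);
        # this is simpler than trying to append without creating duplicates.
        items = [el.text for el in driver.find_elements(By.CSS_SELECTOR, ".item")]

        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(2)  # crude wait for the next batch to load

        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # the page stopped growing, so nothing more will load
        last_height = new_height

    print(len(items), "items scraped")
    driver.quit()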

Comments • 26

  • @Faybmi • 5 months ago +1

    Is it possible to start parsing right away with the fiftieth element, instead of parsing everything again?

    • @MichaelKitas • 4 months ago

      It wouldn't matter; we replace the old scraped values with old + new ones each time. Is there a reason you want to start specifically where you left off? (Performance-wise it doesn't matter.)
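
      A small sketch of that idea, in case starting from an offset is still wanted: since find_elements returns everything currently loaded, you can slice off the part already processed. The URL and ".item" selector are placeholders.

      import time

      from selenium import webdriver
      from selenium.webdriver.common.by import By

      driver = webdriver.Chrome()
      driver.get("https://example.com/infinite-scroll")  # placeholder URL

      items = []
      for _ in range(10):  # a fixed number of scroll passes, for brevity
          elements = driver.find_elements(By.CSS_SELECTOR, ".item")
          new_elements = elements[len(items):]  # only the entries added since the last pass
          items.extend(el.text for el in new_elements)

          driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
          time.sleep(2)

      driver.quit()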

  • @Yaser-ih2cx • 3 months ago

    I can't find the code for this video in your GitHub link.

  • @emphieishere • 6 months ago +1

    Thanks for a great video! Could you please explain one thing I just don't get: why should we update the items list every time instead of appending to it? I've tried to see how Instagram behaves, and it seems like every time it scrolls down it loads an exact set of items and removes the previous ones from the page's code. Or am I mistaken?

    • @MichaelKitas • 6 months ago

      Because we would get duplicates each time we append: when the new items are loaded, we still get the old items back as well.
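
      A toy illustration of that point, with a placeholder URL and ".item" selector: find_elements returns old and new entries on every pass, so appending piles up duplicates while replacing does not.

      import time

      from selenium import webdriver
      from selenium.webdriver.common.by import By

      driver = webdriver.Chrome()
      driver.get("https://example.com/infinite-scroll")  # placeholder URL

      items = []
      for _ in range(3):  # a few scroll passes
          loaded = driver.find_elements(By.CSS_SELECTOR, ".item")  # old + new entries each time

          # items += [el.text for el in loaded]   # appending would re-add the old entries
          items = [el.text for el in loaded]      # replacing keeps exactly one copy of each

          driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
          time.sleep(2)

      driver.quit()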

  • @huey-nibiru • 1 year ago +1

    Great video, thanks for the help!

  • @anurajms • 2 years ago +1

    thank you

  • @pineappily3119 • 9 months ago +1

    Hi, I have a question! Your code works very well, but when I scrape, the data gets scraped from the start again after some time. Is there any way around this?

    • @MichaelKitas • 9 months ago

      Yeah, you should add an if statement that checks whether the amount you just scraped is the same as the amount you currently have saved; if so, stop the script (see the sketch at the end of this thread).

    • @pineappily3119 • 9 months ago

      @MichaelKitas Actually it didn't scrape everything; it just scraped everything from the start again. But I got it solved. Thanks!
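
      A sketch of the stop condition suggested above, with a placeholder URL and ".item" selector: keep scrolling until a pass adds nothing new.

      import time

      from selenium import webdriver
      from selenium.webdriver.common.by import By

      driver = webdriver.Chrome()
      driver.get("https://example.com/infinite-scroll")

      items = []
      while True:
          previous_count = len(items)
          items = [el.text for el in driver.find_elements(By.CSS_SELECTOR, ".item")]

          if len(items) == previous_count:
              break  # the last scroll added nothing new, so stop

          driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
          time.sleep(2)

      driver.quit()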

  • @narkornchaiwong9114 • 1 year ago +1

    Does Selenium handle a website login with password and Google Authenticator? Can Python take the login from input for the website's login page ... the result can't load from Selenium.

  • @ronny584 • 1 year ago +1

    For some reason my website won't load after a Selenium scroll; it just gets stuck there.

  • @RealEstate3D • 2 years ago +1

    In my use case the first items disappear as new items are loaded, which makes sense for an application that doesn't want to exhaust RAM. In these cases, unfortunately, this wouldn't be a solution.

    • @MichaelKitas • 2 years ago

      Why not? Just save the items, and every time you scrape new ones, append them to an array or a JSON file (there's a sketch at the end of this thread).

    • @gomebenmoshe832 • 1 year ago

      Did you ever solve this? I have the same problem
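
      A sketch of the "save as you go" idea for feeds that unload older items, assuming the item text itself is a good enough dedup key; the URL, ".item" selector, and items.jsonl filename are placeholders.

      import json
      import time

      from selenium import webdriver
      from selenium.webdriver.common.by import By

      driver = webdriver.Chrome()
      driver.get("https://example.com/infinite-scroll")

      seen = set()
      with open("items.jsonl", "a", encoding="utf-8") as out:
          for _ in range(50):  # bounded number of scroll passes
              for el in driver.find_elements(By.CSS_SELECTOR, ".item"):
                  text = el.text
                  if text and text not in seen:
                      seen.add(text)
                      out.write(json.dumps({"text": text}) + "\n")  # persisted immediately

              driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
              time.sleep(2)

      driver.quit()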

  • @yafethtb • 2 years ago +1

    How about appending element.text directly to the items list instead of updating the items list with the textElements list? Or is it that each time Selenium scrolls the page, it scrapes all of the previous element.text values all over again? If that's the case, what if we use a set instead of a list to hold the results, so we only keep the unique ones?

    • @MichaelKitas • 2 years ago

      It scrapes everything all over again, correct. You can try a set; I'm not sure what the difference would be 👍

    • @yafethtb • 2 years ago +1

      @MichaelKitas Ah, I see. I assumed it would just scrape the current page after scrolling, but it seems it doesn't work like that. Thanks for the info.

    • @yafethtb • 2 years ago

      Then might it be better to scroll to the end of the page and only then scrape all the content? That way we wouldn't have to keep updating the list.

    • @MichaelKitas • 2 years ago

      @yafethtb That's bad practice, as some pages like Facebook Marketplace never have an ending, and by the time they do your RAM will be overloaded and you will never get any data.
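
      A sketch covering both points in this thread: collecting into a set drops duplicates automatically (at the cost of ordering), and capping the number of scrolls keeps a never-ending feed from running forever. The URL, ".item" selector, and the two limits are placeholders.

      import time

      from selenium import webdriver
      from selenium.webdriver.common.by import By

      MAX_SCROLLS = 30    # safety limit for feeds that never end
      TARGET_ITEMS = 500  # stop early once enough items are collected

      driver = webdriver.Chrome()
      driver.get("https://example.com/infinite-scroll")

      items = set()  # a set deduplicates but does not preserve order
      for _ in range(MAX_SCROLLS):
          items.update(el.text for el in driver.find_elements(By.CSS_SELECTOR, ".item"))
          if len(items) >= TARGET_ITEMS:
              break

          driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
          time.sleep(2)

      print(f"Collected {len(items)} unique items")
      driver.quit()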

  • @adamsteklov • 1 year ago +1

    Nah, nothing works. The browser just closes before scrolling to page 2.

    • @MichaelKitas • 1 year ago

      It's not that the method doesn't work; either you have an error and the browser is crashing, or you are closing the browser too soon.

    • @adamsteklov • 1 year ago +1

      @MichaelKitas Solved it with albums?page=*. The infinite scrolling has pages behind it.
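
      A sketch of that page-parameter approach: when the "infinite" feed is really backed by numbered pages, you can request them directly. The base URL, the albums?page= parameter, and ".item" are placeholders.

      from selenium import webdriver
      from selenium.webdriver.common.by import By

      driver = webdriver.Chrome()

      items = []
      page = 1
      while True:
          driver.get(f"https://example.com/albums?page={page}")
          batch = [el.text for el in driver.find_elements(By.CSS_SELECTOR, ".item")]
          if not batch:
              break  # an empty page means we ran past the last one
          items.extend(batch)
          page += 1

      print(f"Scraped {len(items)} items across {page - 1} pages")
      driver.quit()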