Python Selenium Tutorial #10 - Scrape Websites with Infinite Scrolling
- Uploaded 13. 09. 2024
- 🌐 NodeMaven Proxy Provider: go.nodemaven.c...
💥 Special Bonus: Use "Michael" at checkout for an extra +2GB of bandwidth.
🤖 2captcha Captcha Solving Service: bit.ly/2captch...
This Selenium tutorial is designed for beginners to learn how to use the Python Selenium library to perform web scraping, testing, and creating website bots. Selenium is a browser-automation library with Python bindings that drives real browsers such as Chrome/Chromium (via ChromeDriver) and Firefox (via GeckoDriver) over the WebDriver protocol. Selenium runs non-headless by default but can be configured to run headless.
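The scroll-to-the-bottom technique this video covers can be sketched roughly as follows. This is a minimal sketch, not the video's exact code: the demo URL comes from the description's infinite-scrolling link, and the `p` selector, pause time, and scroll cap are assumptions you would adapt to your target page.

```python
import time


def scrape_all_items(driver, selector, pause=2.0, max_scrolls=50):
    """Scroll to the bottom until the page stops growing, then collect item text."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded items time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # page stopped growing: assume we reached the end
        last_height = new_height
    # "css selector" is the string value behind selenium's By.CSS_SELECTOR
    return [el.text for el in driver.find_elements("css selector", selector)]


# Usage against a real browser (assumes selenium + a chromedriver on PATH):
#   from selenium import webdriver
#   driver = webdriver.Chrome()
#   driver.get("https://intoli.com/blog/scrape-infinite-scroll/demo.html")
#   print(scrape_all_items(driver, "p", pause=2.0))
#   driver.quit()
```

Passing the driver in as a parameter keeps the scroll loop testable and lets you reuse it with Chrome, Firefox, or the undetectable ChromeDriver linked above.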
Playlist: • Python Selenium Tutorial
Code: github.com/mic...
Join our Discord: / discord
Infinite Scrolling Demo: intoli.com/blo...
Undetectable ChromeDriver: pypi.org/proje...
Gecko Driver: github.com/moz...
Chrome Driver: chromedriver.c...
Download Visual Studio Code: code.visualstu...
Download Python: www.python.org...
Selenium Library: pypi.org/proje...
Donate
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
PayPal: support@websidev.com
Bitcoin Wallet: bc1q05j8gcnq4mzvgj603cxdc8xxck4jgnu2ljsrt4
Ethereum Wallet: 0x5e7BD4f473f153d400b39D593A55D68Ce80F8a2e
Social
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
Website: websidev.com
Linkedin: / michael-kitas-638aa4209
Instagram: / michael_kitas
Github: github.com/mic...
Business Email: support@websidev.com
Tags:
- Python Selenium Tutorial
- Full Course Selenium
- Python Selenium
- Web Scraping Full Course
- Python Selenium Web Scraping Full Course
#selenium #python #webscraping
Is it possible to start parsing right away with the fiftieth element, instead of parsing everything again?
It wouldn't matter; we replace the old scraped values with old + new ones each time. Is there a reason you want to start specifically where you left off? (Performance-wise it doesn't matter.)
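If you do want to skip elements you have already handled, one simple approach (a sketch, not code from the video; the `scraped_count` parameter and selector are placeholders) is to slice past the ones you already saved, since on most infinite-scroll pages the old elements stay in the DOM:

```python
def new_items_only(driver, selector, scraped_count):
    """Return only the texts of items beyond the first `scraped_count` elements."""
    # "css selector" is the string value behind selenium's By.CSS_SELECTOR
    elements = driver.find_elements("css selector", selector)
    return [el.text for el in elements[scraped_count:]]
```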
I can't find the code for this video in your github link.
Thanks for a great video! Could you please explain something I just don't get: why should we update the items list every time instead of appending to it? I've tried to see how Instagram behaves, and it seems like every time it scrolls down it loads an exact set of items and deletes the previous ones from the code. Or am I mistaken?
Because we would get duplicates each time we append, since when new items are loaded we also get the old items in there.
great video thanks for the help
thank you
Hi, I have a doubt! Your code works very well, but when I scrape, the data gets scraped from the start again after some time. Is there any way around it?
Yeah, you should add an if statement to check whether the amount you scraped is the same as the amount you currently have saved; if so, stop the script.
@@MichaelKitas Actually it didn't scrape everything; it just scrapes everything from the start again. But I got it solved. Thanks
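The stop condition described in this thread (compare the number of elements found after a scroll with the number you already have, and stop when they match) could look something like this sketch; the selector, pause, and scroll cap are assumptions:

```python
import time


def scroll_until_no_new(driver, selector, pause=2.0, max_scrolls=100):
    """Scroll down repeatedly; stop once a scroll yields no new items."""
    elements = []
    seen_count = 0
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazy-loaded content time to render
        # "css selector" is the string value behind selenium's By.CSS_SELECTOR
        elements = driver.find_elements("css selector", selector)
        if len(elements) == seen_count:
            break  # same amount as before: nothing new loaded, stop the script
        seen_count = len(elements)
    return [el.text for el in elements]
```

Counting elements instead of comparing page height makes the stop condition track the data you actually care about, which helps on pages whose height changes for unrelated reasons.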
is web login & password and google Authenticator for selenium ? is python create from input for login website page ... result can't load from a selenium
Not sure what you are talking about
For some reason my website can't load from a Selenium scroll, it just gets stuck there.
What do you mean? It doesn't scroll?
In my use case the first items disappear as new items are loaded, which makes sense for an application that doesn't want to exhaust RAM. In these cases, unfortunately, this wouldn't be a solution.
Why not? Just save the items, and every time you scrape new items, append them to an array or a JSON file.
Did you ever solve this? I have the same problem
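One way to sketch the "append to a JSON file" suggestion from this thread: merge each freshly scraped batch into a file on disk, skipping items already saved, so items the page later removes from the DOM (to save memory) are not lost. The file path and the items-are-strings format are assumptions:

```python
import json
import os


def append_new_items(path, new_items):
    """Merge freshly scraped items into a JSON array on disk, skipping duplicates."""
    existing = []
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as f:
            existing = json.load(f)
    seen = set(existing)
    for item in new_items:
        if item not in seen:
            existing.append(item)
            seen.add(item)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(existing, f, ensure_ascii=False, indent=2)
    return existing
```

Calling this after every scroll persists progress incrementally, so even if the browser crashes mid-run you keep everything scraped so far.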
How about appending element.text directly to the items list instead of updating the items list with the textElements list? Or does Selenium re-scrape all of the previous element.text each time it scrolls the page? If that's the case, what if we use a set instead of a list to hold the results, so we keep only the unique results?
It scrapes all over again, correct. You can try set, I am not sure what the difference is 👍
@@MichaelKitas Ah, I see. I assumed it would just scrape the current page after scrolling, but it seems it doesn't work like that. Thanks for the info.
Then might it be better to scroll the page to the very end and only then scrape all the content? That way we wouldn't have to keep updating the list.
@@yafethtb That's bad practice, as some pages like Facebook Marketplace never have an ending, and by the time they do your RAM will be overloaded and you will never get any data.
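On the set-versus-list question above: a set does drop duplicates, but iterating a set gives no guaranteed ordering, so the scraped items could come out shuffled. A small helper that deduplicates while preserving first-seen order (a common Python idiom, not code from the video):

```python
def unique_in_order(texts):
    """Deduplicate scraped texts while keeping the order they first appeared in."""
    seen = set()
    result = []
    for text in texts:
        if text not in seen:
            seen.add(text)
            result.append(text)
    return result
```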
Nah, nothing works. The browser just closes before scrolling to page 2.
It's not that the method doesn't work; either you have an error and the browser is crashing, or you are closing the browser too soon.
@@MichaelKitas Solved with albums?page=* . Infinite scrolling has pages.