How I Scrape Data with Multiple Selenium Instances
- added 11 Sep 2024
- DISCORD (NEW): / discord
Selenium Grid first look for web scraping concurrently with headless chrome
Patreon: / johnwatsonrooney (NEW free tier)
Scraper API www.scrapingbe...
Donations: www.paypal.com...
Hosting: Digital Ocean: m.do.co/c/c7c9...
Gear I use: www.amazon.co....
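The idea in the description, fanning scraping jobs out to a Selenium Grid from Python, can be sketched roughly as below. This is not the code from the video. It assumes a Selenium 4 Grid reachable at http://localhost:4444 (Grid's default port), and the helper names make_remote_driver, scrape_title and scrape_all are hypothetical. Each task opens its own Remote session and quits it in a finally block so the Grid slot is released. The local threads mostly wait on network I/O while the browsers themselves run on the Grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: a Selenium 4 Grid (standalone or hub) on its default port.
GRID_URL = "http://localhost:4444"


def make_remote_driver(grid_url=GRID_URL):
    """Open one new browser session on the Grid (hypothetical helper)."""
    # Imported lazily so the rest of the module runs without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()  # add Chrome arguments here if needed
    return webdriver.Remote(command_executor=grid_url, options=options)


def scrape_title(url, make_driver=make_remote_driver):
    """Load a single page in its own session and return (url, page title)."""
    driver = make_driver()
    try:
        driver.get(url)
        return url, driver.title
    finally:
        # Always end the session so the Grid slot is freed, even on errors.
        driver.quit()


def scrape_all(urls, max_workers=4, make_driver=make_remote_driver):
    """Run scrape_title for every URL on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda url: scrape_title(url, make_driver), urls))
```

With a Grid running, `scrape_all(["https://example.com", "https://example.org"])` would return a list of `(url, title)` pairs. Passing `make_driver` as a parameter keeps the scraping logic independent of where the browser actually runs.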
Yes!!! My scraper system has grown exponentially and it's a bit too much to handle. This is exactly what I've been looking for
I wish all tutorials were as descriptive and straightforward as this one. Immediately subscribed ❤
I never actually experiment with anything on my own. I just wait for your innovative solutions to come through so that I can learn and implement them.
I hope there are more sessions on Selenium Grid, not just for scraping but for operations like filling in a form on a webpage concurrently.
Thanks, John, for being an amazing teacher.
Just what I needed to watch, thank you very much.
I am working on a remote server and hadn't had time to check how Grid works. Now I know. Geez. This is what I was looking for. Thumbs up 👍
Thanks for watching !
Another thing you can do is use a browser as a service (like an API) and connect to that browser through API requests.
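One concrete form of "a browser behind an API" is the W3C WebDriver protocol itself, which Selenium Grid speaks over plain HTTP. The sketch below, using only the standard library, assumes a Selenium 4 Grid at http://localhost:4444 that accepts `POST /session` directly; the helper names webdriver_call and page_title are hypothetical, and hosted browser services may expose different APIs.

```python
import json
import urllib.request

# Assumption: a Selenium 4 Grid on its default port; it accepts W3C WebDriver
# commands such as POST /session directly.
GRID_URL = "http://localhost:4444"


def webdriver_call(method, path, body=None, base=GRID_URL):
    """Send one W3C WebDriver command over HTTP and return the "value" field."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    request = urllib.request.Request(
        base + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["value"]


def page_title(url, call=webdriver_call):
    """Open a session, navigate, read the title, and always delete the session."""
    session = call("POST", "/session",
                   {"capabilities": {"alwaysMatch": {"browserName": "chrome"}}})
    session_id = session["sessionId"]
    try:
        call("POST", f"/session/{session_id}/url", {"url": url})
        return call("GET", f"/session/{session_id}/title")
    finally:
        call("DELETE", f"/session/{session_id}")
```

This is essentially what `webdriver.Remote` does for you under the hood, which is why any language with an HTTP client can drive a Grid browser.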
This is very useful. I hope you make more tutorials on Selenium Grid, especially running in a cloud environment.
Yes, more coming!
Amazing, John Watson, this was exactly the issue I was struggling with.
And there isn't much information about it.
Thanks! Appreciate it
I'm very new to web scraping, but I can definitely try different approaches 😊 Thanks for sharing this information in your video 😊
Great, thanks for watching!
@JohnWatsonRooney I'm a regular follower.
Very cool! Definitely going to try to set this up myself and test it with multiple scrapers.
Please do, and let me know how you get on. I've got some more things to test, like running Grid over multiple servers via Docker Swarm.
@JohnWatsonRooney Will do. It would be very interesting to see how you set up scrapers running over multiple instances!
Thanks, John, for this video and illustration. I didn't know about this feature, which is awesome. I remember running multiple WebDriver instances simultaneously in the past, but it still seemed to be sequential! It was a bit of a hassle, but it worked; with this feature it will be much easier.
Awesome work as always John! Thanks brother!🎉
thanks!
As there is a Selenium ARM build for Docker, you can also run it on a Raspberry Pi or even a Pine64 without a GUI OS installed, like I do.
By the way, it's still a full browser that gets spun up, and it's not headless: you can VNC into those instances by clicking the camera icon and watch the browser open and close, just like you did on your desktop.
So those instances aren't headless; they just run inside Docker, which can be running on another host.
Thanks for the clarification about headless, you are right. I need to look into the RPi ARM version!
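As the thread points out, a Grid node runs a normal browser unless headless mode is requested. A minimal way to request it explicitly is to pass Chrome's `--headless=new` argument through the options sent to the Grid. This sketch assumes a recent Chrome (109 or later) on the node and a Grid at http://localhost:4444; the helper names chrome_arguments and remote_chrome are hypothetical.

```python
# Assumption: Chrome 109+ on the Grid node, where "--headless=new" is accepted.
def chrome_arguments(headless=True, window_size=(1920, 1080)):
    """Build the Chrome command-line arguments for a Grid session."""
    args = []
    if headless:
        args.append("--headless=new")
    # Pin an explicit window size so page layouts are predictable in either mode.
    args.append(f"--window-size={window_size[0]},{window_size[1]}")
    return args


def remote_chrome(grid_url="http://localhost:4444", headless=True):
    """Start a Remote Chrome session on the Grid, headless only if requested."""
    from selenium import webdriver  # imported lazily; needs Selenium 4

    options = webdriver.ChromeOptions()
    for arg in chrome_arguments(headless):
        options.add_argument(arg)
    return webdriver.Remote(command_executor=grid_url, options=options)
```

Keeping `headless=False` available is handy while debugging, since that is the mode in which the VNC view described above shows the browser at work.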
Amazing content. Would love it if you could create a Docker Crash Course.
I’d love to however I’ve still got a lot to learn about docker!
I know my comment might seem off-topic, but I really like your color theme. It looks so soothing and easy on the eyes. Could you please share the name of your color theme? Thank you.
No problem, sure, it's called Everforest.
Can you please make videos on Docker with these kinds of experiments? That would be awesome.
Yes, sure, there are more like this coming.
Thanks so much for the sharing of knowledge.
I think that's kind of similar to Playwright with a persistent context or a new browser context: different tabs/instances with different cookies, headless.
It spawns multiple instances rather than reusing the same one with extra pages. I think there was a time when you could connect Playwright to Grid. I'm going to explore the Playwright options too.
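For comparison, the Playwright approach mentioned here reuses one browser process and gives each job its own isolated context (separate cookies and storage), instead of spawning a separate browser per job as Grid does. A minimal async sketch, assuming Playwright for Python is installed with Chromium downloaded; scrape_titles is a hypothetical name and the concurrency limit is arbitrary.

```python
import asyncio


async def scrape_titles(urls, max_concurrency=4):
    """Fetch page titles concurrently, one isolated browser context per URL."""
    # Imported lazily; needs `pip install playwright` and `playwright install chromium`.
    from playwright.async_api import async_playwright

    semaphore = asyncio.Semaphore(max_concurrency)

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)

        async def scrape_one(url):
            async with semaphore:
                # A new context has its own cookies and storage, like a fresh profile.
                context = await browser.new_context()
                try:
                    page = await context.new_page()
                    await page.goto(url)
                    return url, await page.title()
                finally:
                    await context.close()

        try:
            return await asyncio.gather(*(scrape_one(url) for url in urls))
        finally:
            await browser.close()
```

Run it with `asyncio.run(scrape_titles(["https://example.com"]))`. The trade-off is that everything here lives on one machine in one browser process, whereas a Grid spreads full browsers across nodes that several scripts can share.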
Thanks, John, for this amazing video, as always. I have a question about Selenium: I tried the code from your last video and I want to add a Chrome profile, but I can't get it to work.
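One common way to load an existing Chrome profile with Selenium is to pass Chrome's `--user-data-dir` and `--profile-directory` arguments. The sketch below is hedged: the profile path is a placeholder you must replace, profile_arguments and chrome_with_profile are hypothetical names, and Chrome typically refuses to reuse a profile that another running Chrome window already has open. On a Grid, the path would have to exist on the node (for example inside its Docker container), not on your own machine.

```python
def profile_arguments(user_data_dir, profile_directory="Default"):
    """Chrome arguments that select an existing user profile."""
    return [
        f"--user-data-dir={user_data_dir}",          # folder that holds the profiles
        f"--profile-directory={profile_directory}",  # e.g. "Default", "Profile 1"
    ]


def chrome_with_profile(user_data_dir, profile_directory="Default"):
    """Start a local Chrome that reuses the given profile (cookies, logins, ...)."""
    from selenium import webdriver  # imported lazily; needs Selenium 4

    options = webdriver.ChromeOptions()
    for arg in profile_arguments(user_data_dir, profile_directory):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Pointing `user_data_dir` at a dedicated copy of the profile, rather than your everyday one, avoids the profile-lock conflict.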
A tutorial on how to set this up with AWS Lambda would be amazing!
When I deployed it to the cloud I couldn't get any response from the Amazon site, but every other site worked well.
Thanks, John. How can I bypass the Cloudflare CAPTCHA?
Have a look at cloudscraper and see if that helps you.
What about using this Grid or multi-session setup with a non-headless browser, Chrome extensions and Docker? It's challenging to set all of that up together.
What are the advantages of using Selenium Grid instead of Playwright async?
John, quality work as always. I have a question related to Neovim, mate: how's your experience with it been?
It seems like everyone these days has jumped on the bandwagon.
I use the default IDLE text editor for quick scripting and VS Code with Emacs key bindings for more complex projects.
I really love my Emacs key bindings. Is learning Vim a requirement for Neovim, or can I use Emacs key bindings as well?
Thanks, mate. Yeah, I'm loving Neovim, but yes, it's all Vim keybinds. I guess you could create your own keymap, but I don't think that would be worth it. I never learned Emacs, so once I got the basics of Vim movement, copy/paste and some basic motions, it really clicked for me. I'd say if you're happy with what you've got, don't worry about it. Nvim fits my flow really well and I feel faster than I was in VS Code/PyCharm. If I use VS Code now, I use it with Vim bindings too.
Amazing
Thanks
John, I can't follow you on the Java command stuff. I hope you make a detailed video about Selenium Grid.
Excellent
How can we add a proxy with the Remote driver?
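A minimal answer, assuming an unauthenticated HTTP or SOCKS proxy: pass Chrome's `--proxy-server` argument in the options sent to the Grid. Note that the proxy must be reachable from the Grid node (for example from inside its Docker container), not just from the machine running the script. As far as I know, Chrome does not accept credentials in this flag, so authenticated proxies need a different approach. The helper names proxy_argument and remote_chrome_with_proxy are hypothetical, and the Grid URL is assumed to be the default.

```python
def proxy_argument(host, port, scheme="http"):
    """Chrome's --proxy-server argument for an unauthenticated proxy."""
    return f"--proxy-server={scheme}://{host}:{port}"


def remote_chrome_with_proxy(host, port, grid_url="http://localhost:4444"):
    """Start a Remote Chrome session on the Grid that routes traffic via a proxy."""
    from selenium import webdriver  # imported lazily; needs Selenium 4

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(host, port))
    return webdriver.Remote(command_executor=grid_url, options=options)
```

Because each Remote session gets its own options, different concurrent sessions can be given different proxies.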
How can I use this type of script in mobile Python?
Hello, I need some help scraping the Amazon website, please reply.
Why use Selenium Grid when we can have concurrent threads, each with its own Selenium WebDriver?
Each thread would open a driver, get the data, and close it.
I don't get why we are using Selenium Grid here when this can be done with the basic Selenium WebDriver.
Or am I missing something? Or maybe this is not the best use case for Selenium Grid?
I'm going to test your solution, but when I used multiple Selenium instances at the same time with threads, my app broke.
@TheJFMR Is your code open source? Can you share it?
@MARTIN-101 I think you were right; I tested it and it worked
with concurrent.futures.ThreadPoolExecutor.
I remember that some time ago it didn't work with multiprocessing because it broke all the scripts.
@MARTIN-101 It works when you use multithreading within the same script, but imagine a scraping company that needs to run multiple scripts or Scrapy spiders from a crontab at the same time.
That's where Selenium Grid comes into play.
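The distinction in this exchange can be sketched as a driver factory: within one script, threads can each launch a local Chrome, but separate scripts or cron jobs can all point at one Grid, which caps the number of browsers and queues extra session requests until a slot frees up (or a timeout is reached). The environment variable SELENIUM_GRID_URL below is a hypothetical name chosen for this sketch, not a Selenium setting, and driver_target and make_driver are hypothetical helpers.

```python
import os


def driver_target(env=None):
    """Decide between a local browser and a Grid session from the environment."""
    env = os.environ if env is None else env
    # Hypothetical variable name chosen for this sketch, not a Selenium setting.
    grid_url = env.get("SELENIUM_GRID_URL")
    return ("remote", grid_url) if grid_url else ("local", None)


def make_driver(env=None):
    """Same scraping code, either a local Chrome or a shared Grid browser."""
    from selenium import webdriver  # imported lazily; needs Selenium 4

    kind, grid_url = driver_target(env)
    options = webdriver.ChromeOptions()
    if kind == "remote":
        # Every script or cron job points at one Grid; the Grid limits how many
        # browsers run at once and queues extra session requests.
        return webdriver.Remote(command_executor=grid_url, options=options)
    # Local mode: each thread launches its own Chrome on this machine.
    return webdriver.Chrome(options=options)
```

The scraping code itself does not change; only the factory decides whether browsers are local per-thread processes or shared Grid slots.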
Why use Selenium when there is Playwright?
I normally use Playwright, but Selenium 4 is pretty good too and has Grid.
@JohnWatsonRooney Wait, did Selenium just get updated? I don't remember that functionality, or it being so easy to import and use :0
@AmodeusR Selenium v4! (welcome to the Discord #101 ;D)
Where is the GitHub code?
❤
Thank you soooooooooooooo much~