How I Scrape Data with Multiple Selenium Instances

  • Added 11 Sep 2024
  • DISCORD (NEW): / discord
    A first look at Selenium Grid for concurrent web scraping with headless Chrome
    Patreon: / johnwatsonrooney (NEW free tier)
    Scraper API www.scrapingbe...
    Donations: www.paypal.com...
    Hosting: Digital Ocean: m.do.co/c/c7c9...
    Gear I use: www.amazon.co....

Comments • 61

  • @Septumsempra8818
    @Septumsempra8818 11 months ago +4

    Yes!!! My scraper system has grown exponentially and it's a bit too much to handle. This is exactly what I've been looking for

  • @anushibinj
    @anushibinj 6 months ago +1

    I wish all tutorials were as descriptive and straightforward as this one. Immediately subscribed ❤

  • @irfanshaikh262
    @irfanshaikh262 11 months ago +2

    I never actually experiment with anything on my own. I just wait for your innovative solutions to come through so that I can learn and implement them.
    Hope there are more sessions on Selenium Grid covering not just scraping but also operations like populating a form on a webpage concurrently.
    Thanks, John, for being an amazing teacher.

  • @matth3wss
    @matth3wss 25 days ago +1

    Just what I needed to watch, thank you very much.

  • @sviatkey
    @sviatkey 11 months ago +2

    I am working on a remote server and had no time to check how Grid works. Now I know. Geez. This is what I was looking for. Thumbs up 👍

  • @TheJFMR
    @TheJFMR 11 months ago +4

    Another thing you can do is use a browser as a service (like an API),
    and connect to that browser through API requests.

  • @anishpillai
    @anishpillai 11 months ago +2

    This is very useful. Hope you make more tutorials for selenium grid, especially running in a cloud environment.

  • @TheJFMR
    @TheJFMR 11 months ago +2

    Amazing, John Watson, this was exactly the issue I was struggling with.
    And there isn't much information out there.

  • @chandrasekaran2429
    @chandrasekaran2429 11 months ago +2

    I'm very new to web scraping, but I can definitely try different approaches 😊 Thanks for sharing this information in your video 😊

  • @rick-hoekman
    @rick-hoekman 11 months ago +1

    Very cool! Definitely going to try to set this up myself and test it with multiple scrapers.

    • @JohnWatsonRooney
      @JohnWatsonRooney 11 months ago +1

      Please do and let me know how you get on. I've got some more stuff to test, like running Grid over multiple servers via Docker Swarm.

    • @rick-hoekman
      @rick-hoekman 11 months ago +1

      @JohnWatsonRooney Will do. Running scrapers over multiple instances would be very interesting; I'd like to see how you would set that up!

  • @CodePhiles
    @CodePhiles 10 months ago

    Thanks, John, for this video and illustration. This feature was new to me, and it's awesome. I remember running multiple webdriver instances simultaneously in the past, but they still seemed to run sequentially! It was a bit of a hassle, but it worked. With this feature it will be much easier.

  • @pascal831
    @pascal831 11 months ago +1

    Awesome work as always John! Thanks brother!🎉

  • @soul_maestro
    @soul_maestro 11 months ago +3

    As there is a selenium-arm build for Docker, you can also run it on a Raspberry Pi or even a Pine64 without a GUI OS installed, like I do.
    Btw, it's still a browser that's spun up, and it's not headless: you can VNC into those instances by clicking on the camera and watch the browser open and close, just like you did on your desktop.
    So those instances aren't headless; they just open inside Docker, which can be running on another host.

    • @JohnWatsonRooney
      @JohnWatsonRooney 11 months ago

      Thanks for the clarification about headless, you are right. I need to look into the RPi ARM version!

  • @123arskas
    @123arskas 11 months ago +1

    Amazing content. Would love it if you could create a Docker Crash Course.

    • @JohnWatsonRooney
      @JohnWatsonRooney 11 months ago +1

      I'd love to; however, I've still got a lot to learn about Docker!

  • @41v47
    @41v47 11 months ago +1

    I know my comment might seem off-topic, but I really like your color theme. It looks so soothing and easy for the eyes. Could you please share the name of your color theme? Thank you.

  • @technicalking4711
    @technicalking4711 11 months ago +1

    Can you please make videos on Docker with these kinds of experiments? That would be awesome.

  • @jiaqint961
    @jiaqint961 5 months ago

    Thanks so much for sharing your knowledge.

  • @AllifIzzuddin
    @AllifIzzuddin 11 months ago +1

    I think that's kind of similar to Playwright with persistent contexts and new browser contexts: different tabs/instances with different cookies, headless.

    • @JohnWatsonRooney
      @JohnWatsonRooney 11 months ago

      It spawns multiple instances rather than reusing the same one with extra pages. I think there was a time when you could connect Playwright to Grid. I'm gonna explore the Playwright options too.

  • @alexdin1565
    @alexdin1565 11 months ago +1

    Thanks, John, for this amazing video, as always. I have a question about Selenium: I tried the code from your last video and I want to add a Chrome profile, but I can't.

  • @GusMD84
    @GusMD84 11 months ago

    A tutorial on how to set this up with AWS Lambda would be amazing!

  • @kanwaradnan4849
    @kanwaradnan4849 5 months ago

    When I deployed this to the cloud I couldn't get any response from the Amazon site, but every other site worked well.

  • @user-rk7dr8ff6v
    @user-rk7dr8ff6v 11 months ago +1

    Thanks, John. How can I bypass the Cloudflare CAPTCHA?

    • @JohnWatsonRooney
      @JohnWatsonRooney 11 months ago

      have a look for cloud scraper and see if that helps you

  • @richiestark4921
    @richiestark4921 9 months ago

    What about Grid or multi-session with a non-headless browser, Chrome extensions, and Docker? It's challenging to set these up together.

  • @jpeca13
    @jpeca13 7 months ago

    What are the advantages of using Selenium grid instead of Playwright async?

  • @CrazyFanaticMan
    @CrazyFanaticMan 11 months ago +1

    John, quality work as always. I have a question, mate, related to Neovim: how's your experience with it been?
    It seems like everyone these days has jumped on the bandwagon.
    I use the default IDLE text editor for quick scripting and VS Code with Emacs key bindings for more complex projects.
    I really love my Emacs key bindings. Is learning Vim a requirement for Neovim, or can I use Emacs keybindings as well?

    • @JohnWatsonRooney
      @JohnWatsonRooney 11 months ago +1

      Thanks mate. Yeah, I'm loving Neovim, but yes, it's all Vim keybinds. I guess you could create your own keymap, but I don't think that would be worth it. I never learned Emacs, so once I got the basics of the Vim movement, copy/paste and some basic motions, it really clicked for me. I'd say if you're happy with what you've got, don't worry about it. Nvim fits my flow really well and I feel faster than I was in VS Code/PyCharm. If I use VS Code now, I use it with Vim bindings too.

  • @technicalking4711
    @technicalking4711 11 months ago +1

    Amazing

  • @nizarfathurohman486
    @nizarfathurohman486 10 months ago

    John, I can't follow you on the Java command part. I hope you make a detailed video about Selenium Grid.

  • @Optimusjf
    @Optimusjf 11 months ago +1

    Excellent

  • @MDAbdurRahimcs50
    @MDAbdurRahimcs50 5 months ago

    How can we add a proxy with the Remote driver?

  • @lordlegendsss7776
    @lordlegendsss7776 6 months ago

    How can I use this type of script with Python on mobile?

  • @iamshiva003
    @iamshiva003 11 months ago

    Hello, I need some help scraping the Amazon website, please reply.

  • @MARTIN-101
    @MARTIN-101 11 months ago

    Why use Selenium Grid when we can have concurrent threads, one for each Selenium webdriver?
    Each thread will open a driver, get data, and close it.
    I don't get why we are using Selenium Grid here when this can be done with the basic Selenium webdriver.
    Or am I missing something? Or maybe this is not the best use case for Selenium Grid?

    • @TheJFMR
      @TheJFMR 11 months ago

      I'm going to test your solution, but when I used multiple Selenium instances at the same time with threads, my app broke.

    • @MARTIN-101
      @MARTIN-101 11 months ago

      @TheJFMR Is your code open source? Can you share it?

    • @TheJFMR
      @TheJFMR 11 months ago

      @MARTIN-101 I think you were right, I tested it and it worked
      with concurrent.futures.ThreadPoolExecutor.
      I remember that some time ago it did not work with multiprocessing because it broke all the scripts.

    • @TheJFMR
      @TheJFMR 10 months ago +1

      @MARTIN-101 It works when you use multithreading in the same script, but imagine a scraping company that needs to run multiple scripts or Scrapy spiders from a crontab at the same time.
      That's where Selenium Grid comes into play.
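
The approach discussed in this thread (a ThreadPoolExecutor where each thread opens a driver, gets data, and closes it) could be sketched roughly as follows, with each thread connecting to a Grid through a Remote driver. This is only a minimal sketch under assumptions, not the code from the video: the hub address `http://localhost:4444`, the helper names `fetch_title_via_grid` and `scrape_all`, and the choice of scraping the page title are all made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: a Selenium Grid hub is listening at this address.
GRID_URL = "http://localhost:4444"


def fetch_title_via_grid(url, grid_url=GRID_URL):
    """Open one remote Chrome session on the Grid, read the title, close it."""
    # Imported lazily so scrape_all() can be exercised without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    driver = webdriver.Remote(command_executor=grid_url, options=options)
    try:
        driver.get(url)
        return driver.title
    finally:
        # Always end the session so the Grid slot is released, even on errors.
        driver.quit()


def scrape_all(urls, fetch=fetch_title_via_grid, max_workers=4):
    """Call fetch(url) for every URL in a thread pool and map URL -> result."""
    urls = list(urls)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        return dict(zip(urls, pool.map(fetch, urls)))


# Hypothetical usage (needs a running Grid and the selenium package):
# titles = scrape_all(["https://example.com", "https://example.org"])
```

Because the sessions live on the Grid rather than inside one script, several independent scripts or cron jobs could point at the same hub address, which is the scenario described in the last reply above.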

  • @AmodeusR
    @AmodeusR 11 months ago +1

    Why use Selenium when there is Playwright?

    • @JohnWatsonRooney
      @JohnWatsonRooney 11 months ago +1

      I normally use Playwright, but Selenium 4 is pretty good too and has Grid.

    • @AmodeusR
      @AmodeusR 11 months ago +1

      @JohnWatsonRooney Wait, did Selenium just get updated? I don't remember such functionality, or it being so easy to import and use :0

    • @JohnWatsonRooney
      @JohnWatsonRooney 11 months ago +1

      @AmodeusR Selenium v4! (welcome to the discord #101 ;D)

  • @salamandralw
    @salamandralw 5 months ago

    Where is the GitHub code?

  • @dobcs3236
    @dobcs3236 6 months ago

  • @bakasenpaidesu
    @bakasenpaidesu 11 months ago +2

    .

  • @kaistai
    @kaistai 4 months ago

    Thank you soooooooooooooo much~