John Watson Rooney
  • 279 videos
  • 7,226,201 views
Every Web Scraper should know THIS
➡ WORK WITH ME
johnwr.com
➡ COMMUNITY
discord.gg/C4J2uckpbR
www.patreon.com/johnwatsonrooney
➡ PROXIES
proxyscrape.com/?ref=jhnwr
www.scrapingbee.com?fpr=jhnwr
➡ HOSTING
m.do.co/c/c7c90f161ff6
If you are new, welcome. I'm John, a self-taught Python developer working in the web and data space. I specialize in data extraction and automation. If you like programming and web content as much as I do, you can subscribe for weekly content.
⚠ DISCLAIMER
Some or all of the links above are affiliate links. If you choose to purchase any services or items through them, I receive a small commission.
Views: 1,210

Videos

You're missing out if you don't use THESE
1.6K views · 4 hours ago
Check out ProxyScrape here: proxyscrape.com/?ref=jhnwr
Python Automation- product checker and buyer
1.8K views · 21 hours ago
Check out ProxyScrape here: proxyscrape.com/?ref=jhnwr
How to Scrape 4 Sites with 1 Script (code along)
3K views · 14 days ago
Check out ProxyScrape here: proxyscrape.com/?ref=jhnwr
My System for Easily Scraping 150k Items from the web
3.2K views · 21 days ago
Use code JWR at checkout to get 2GB of proxies for free: go.nodemaven.com/scrapingproxy
How much slower is Playwright at Scraping?
2K views · 21 days ago
The Simple Automation Script my Colleagues Loved.
3.5K views · 1 month ago
The first 500 people to use my link skl.sh/johnwatsonrooney06241 will get a 1-month free trial of Skillshare Premium! This video is sponsored by Skillshare.
Scraping 7000 Products in 20 Minutes
4.1K views · 1 month ago
Go to proxyscrape.com/?ref=jhnwr for the proxies I use.
How I Scrape 7k Products with Python (code along)
8K views · 1 month ago
A short but complete project of scraping 7k products with Python.
This will change Web Scraping forever.
8K views · 2 months ago
Want to try this yourself? Sign up at www.zyte.com/ and use code JWR203 for $20 free each month for 3 months. Limited availability, first come first served. Once you have created an account, enter the coupon code JWR203 under Settings, Subscriptions, Modify & enter code. Zyte gave me access to their API and new AI spider tech to see how it compares to scraping manually, with incredible results...
The most important Python script I ever wrote
170K views · 2 months ago
The story of my first and most important automation script, plus an example of what it would look like now.
Why I chose Python & Polars for Data Analysis
5K views · 3 months ago
To try everything Brilliant has to offer, free for a full 30 days, visit brilliant.org/JohnWatsonRooney/. You'll also get 20% off an annual premium subscription. This video was sponsored by Brilliant.
The Best Tools to Scrape Data in 2024
7K views · 3 months ago
Python has a great ecosystem for web scraping, and in this video I run through the packages I use every day to scrape data.
Is Your Scraper Slow? Try THIS Simple Method
5K views · 3 months ago
Get proxies from NodeMaven: go.nodemaven.com/scrapingproxy and use code JWR for 2GB on purchase. Threads and parallel processing are still useful for scraping: even though most of the waiting is I/O, which is best served by async, they can still make your code much faster in the right situations and are very simple to implement.
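The threads-for-I/O point in that description can be sketched with a ThreadPoolExecutor; the fetch function here is a stand-in for a real network request:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url):
    # Stand-in for an I/O-bound call such as requests.get(url).text;
    # while one thread waits on the network, the others keep working.
    return f"fetched {url}"

urls = [f"https://example.com/page/{i}" for i in range(5)]

# pool.map preserves input order even though the fetches overlap in time.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, urls))
```

For CPU-bound work this would not help (Python threads share the GIL), but for waiting on sockets it often does.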
Scraping with Playwright 101 - Easy Mode
8K views · 3 months ago
Playwright is an incredibly versatile tool for browser automation, and in this video I run through a simple project to get you up and running scraping data with Playwright & Python.
Cleaning up 1000 Scraped Products with Polars
5K views · 4 months ago
Website to Dataset in an instant
7K views · 4 months ago
This is a Scraping Cheat Code (for certain sites)
4.5K views · 4 months ago
Let me explain my new Rust love affair..
982 views · 4 months ago
Stop Wasting Time on Simple Excel Tasks, Use Python
10K views · 5 months ago
The HTML Element I check FIRST when Web Scraping
2.8K views · 5 months ago
Try this SIMPLE trick when scraping product data
3.8K views · 5 months ago
More spiders, more data
2.8K views · 5 months ago
still the best way to scrape data.
15K views · 6 months ago
Make Queues, Run Jobs, Scrape Data.
4.4K views · 6 months ago
I had no idea you could scrape this site this way
4.4K views · 6 months ago
This is the ONLY way I'll use Selenium now
7K views · 7 months ago
Scraping HTML Tables VS Dynamic JavaScript Tables
3.6K views · 8 months ago
Scrapy in 30 Minutes (start here.)
15K views · 8 months ago
Web Scraping with Python: How to Save to CSV, JSON and Clean Data
5K views · 8 months ago

Comments

  • @marcosziadi9059 · 4 hours ago

    Hi John! I have a question. Following your hidden-API videos and some others, I finally finished a project that creates datasets of Walmart products based on whatever the user wants the dataset to be about. I did this using their hidden API, creating datasets that can get pretty big (15,000 products), but for every dataset I have to make around 100 to 200 GET requests to fetch all the products. Is it legal/ethical to put this on my CV or in a LinkedIn post as a personal project, even though the Walmart website says they do not allow web scraping?

  • @indrasaputraahmadi3449 · 6 hours ago

    Amazing explanation, thanks!

  • @acharafranklyn5167 · 6 hours ago

    This is gold

  • @CeratiGilmour · 7 hours ago

    Would it work together with Selenium?

  • @LinkedkefamFamlinkedIn · 7 hours ago

    John, please make it longer and scrape the data to a CSV file, and please use undetected-browser or captcha-solver methods to scrape data. I love your videos, John ❤

  • @tmb8807 · 8 hours ago

    I'm sure you know this by now, but the Polars read_csv method supports glob patterns, so the loop approach is unnecessary - you can simply pass _folder + "/*.csv"_ as the source parameter and the concatenation will be done automatically.

    • @JohnWatsonRooney · 7 hours ago

      Yeah, you're right, that's much better - thanks for clarifying!

  • @kirill_good_job · 9 hours ago

    How do I solve this error?

    ModuleNotFoundError                       Traceback (most recent call last)
    Cell In[1], line 1
    ----> 1 from selenium import webdriver
          3 url = 'www.youtube.com/@JohnWatsonRooney/videos'
          5 driver = webdriver.Chrome()
    ModuleNotFoundError: No module named 'selenium'
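That ModuleNotFoundError just means Selenium isn't installed in the environment the notebook is running in; assuming pip points at that same interpreter, the usual fix is:

```shell
pip install selenium
```

If the notebook uses a different interpreter than the shell, run it as a cell (`%pip install selenium`) or call `python -m pip install selenium` with the matching python binary.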

  • @zakariaboulouarde4591 · 10 hours ago

    Thaaank you so much 🙏🏽🙏🏽🙏🏽, I've really learned so much from your videos. What if the API is protected by Cloudflare and sometimes returns Unauthorized - is there a solution?

    • @JohnWatsonRooney · 10 hours ago

      Once you have the cookies you should be good; you'll need to refresh them every so often, either manually or by using an undetected browser/captcha solver.

    • @zakariaboulouarde4591 · 10 hours ago

      @@JohnWatsonRooney I am trying to visit the API from the browser and it gives me Unauthorized, so I think it's not the cookies. I can share the link with you to test.
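One way to apply the cookie advice above with requests - the endpoint and cookie value below are placeholders you would copy from a real, recently used browser session on the target site:

```python
import requests

# Carry a browser-copied Cloudflare clearance cookie on a session.
session = requests.Session()
session.headers["User-Agent"] = "Mozilla/5.0"  # should match the browser that earned the cookie
session.cookies.set("cf_clearance", "PASTE_VALUE_FROM_BROWSER")

# Prepare (but don't send) a request, to show the cookie is attached.
req = session.prepare_request(
    requests.Request("GET", "https://example.com/api/products")
)
```

In a real run you would call `session.get(...)` instead; the session then reuses the cookie on every request until it expires and has to be refreshed.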

  • @rodgerthat6287 · 11 hours ago

    Hey dude, just started my first internship, and this video has been immensely helpful! I really appreciate the effort put in and all the useful tips. Thanks!

  • @piercenorton1544 · 1 day ago

    What if we want to take a full page so we can give it to an LLM to parse? For example, what if we were parsing financial filings or contracts. We want chunks or pages to pass to an LLM to structure outputs. I think splitting the text on a tag and then joining the items together would be best, but maybe there is a better way.
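The split-on-a-tag idea from that comment is workable; a minimal sketch with BeautifulSoup, chunking a page at each <h2> so every heading travels with its following text (the HTML here is a made-up stand-in for a filing or contract):

```python
from bs4 import BeautifulSoup

html = """
<h2>Definitions</h2><p>Terms used in this agreement.</p>
<h2>Payment</h2><p>Fees are due within 30 days.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# One chunk per <h2> section: the heading plus everything up to the
# next <h2>. Chunks like these can then be passed to an LLM one by one.
chunks = []
for heading in soup.find_all("h2"):
    parts = [heading.get_text()]
    for sibling in heading.find_next_siblings(True):  # True = tags only
        if sibling.name == "h2":
            break
        parts.append(sibling.get_text())
    chunks.append(" ".join(parts))
```

For very long sections you would still need a secondary length-based split before sending chunks to the model.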

  • @HitAndMissLab · 1 day ago

    Do you have any videos on how to use proxies in Python?

    • @JohnWatsonRooney · 15 hours ago

      I don't specifically, but that's a good idea - I'll create a video on proxies, including how to use them.

  • @breandensamas8623 · 1 day ago

    Good one

  • @elmzlan · 1 day ago

    I hope you have a course

  • @milosZcr · 2 days ago

    Great content, very useful now that I'm learning about this subject. You've earned a new sub here.

  • @personofnote1571 · 2 days ago

    Great point about separation of concerns. As you stated, the scraper should only be concerned with getting data and saving data. I am curious what other use cases would be compatible with scrapy’s pipelines. Would pipelines be a good place for things like “save to this OTHER database”, or “upload to S3”, or “ping this api”? Will be diving into this myself soon but curious about your thoughts here.

    • @JohnWatsonRooney · 1 day ago

      Yes, absolutely - you could use an item field to decide whether to upload to X DB or Y DB, and uploading to S3 would certainly fit here too. Pinging an API - you mean to notify another system? I think that would be a great use case for pipelines (I hadn't thought of that before).
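The routing idea discussed above, sketched as a Scrapy-style pipeline class; the sink names are made up, and a real project would open actual database/S3 connections in open_spider instead of plain lists:

```python
class RoutingPipeline:
    """Route scraped items to different sinks based on an item field."""

    def open_spider(self, spider):
        # In a real project: connect to databases, build an S3 client, etc.
        self.sinks = {"db_x": [], "db_y": [], "s3": []}

    def process_item(self, item, spider):
        # The spider sets item["target"]; unknown targets fall back to db_x.
        target = item.get("target", "db_x")
        self.sinks.setdefault(target, []).append(item)
        return item  # always return the item so later pipelines still run

    def close_spider(self, spider):
        # In a real project: flush buffers and close connections here.
        pass


# Standalone demonstration of the routing logic, outside Scrapy:
pipeline = RoutingPipeline()
pipeline.open_spider(None)
pipeline.process_item({"target": "s3", "sku": 1}, None)
pipeline.process_item({"sku": 2}, None)  # no target -> db_x
```

Scrapy calls open_spider/process_item/close_spider itself once the class is listed in ITEM_PIPELINES; keeping the sinks behind one interface is what keeps the spider free of storage concerns.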

  • @jjeffery129 · 2 days ago

    What's wrong with scraping them as strings and converting them at the end in your output file?

  • @alexdin1565 · 2 days ago

    Hi John, I have a question: can we use Scrapy with Django? I mean, make the web scraper an online tool.

    • @RicardoPorteladaSilva · 2 days ago

      I think you could create a script to scrape separately and load the result into the Django database, so the processing happens at separate moments. I hope you understand my English - I'm from Brazil, learning English. If you need anything more specific, please feel free to get in touch; it's a great pleasure to help you.

    • @JohnWatsonRooney · 2 days ago

      This is pretty much it!

    • @HitAndMissLab · 14 hours ago

      @@RicardoPorteladaSilva what is the advantage of using Django DB?

  • @re1n751 · 3 days ago

    Dude, thank you so much ❤

  • @pkavenger9990 · 3 days ago

    Your content is good, but I think you should engage with your audience more instead of speaking like you're talking to yourself. You'll see that you get many more views. Take the GothamChess channel, for example: he is not a chess grandmaster, but his channel has more views and subscribers than Hikaru's and Magnus's because of his communication skills.

  • @bathuudamdin · 4 days ago

    Hi John, is there any way to get around a Cloudflare-protected API and get JSON data in Python?

  • @p0tv319 · 4 days ago

    I have an issue with the cookie-consent prompt - how can I solve this? It pops up in every new context...

  • @jw832 · 4 days ago

    Can you make a request-based auto product buyer instead and show us how to do that?

  • @merttarm848 · 5 days ago

    Thanks for the video - an amazing introduction to web scraping.

  • @Chiramisudo · 5 days ago

    Unfortunately, I don't know how to translate this to JavaScript, so sadly it's not very helpful to me. 😢

  • @user-fv1576 · 6 days ago

    The title is misleading: the Excel files I see are monsters in terms of complexity, and your example is seriously simplistic. 😊

  • @yellowboat8773 · 6 days ago

    Is it one core per instance?

  • @atulraaazzz2931 · 7 days ago

    Any Amazon automation?

  • @thecozyplace1206 · 7 days ago

    Unfortunately, what I understand from this is that it only works in browsers... The company I worked at had some sort of CRM - very old, very laggy at every button press, with tons of pages and small fields to fill in. I REALLY wish I could find a way to automate data entry for that one... It would replace all the back-office data-entry people of Italy xD

    • @JohnWatsonRooney · 7 days ago

      You could try PyAutoGUI - not sure how reliable it would be for you, but it might help in some way?

  • @hi_nesh · 8 days ago

    Honestly, this channel is marvelous. It has helped me a lot - and 'a lot' is even an understatement.

  • @4BroGame · 8 days ago

    Hey bro, I cloned a website and I'm opening its code in the VS Code editor, but after doing the necessary editing only the text changes, not the images. I'm putting my image URL in place of the website's image URL, but after saving and opening it with Live Server, the preview still shows the cloned website's images, not mine, and Inspect Element shows the cloned website's image code. Why? I've been trying for 6 hours and nothing works for me. Will you please tell me how I can change the images? Is there an API sending the data from the backend?

  • @raymondnepomuceno8815

    Great content, John - new subscriber here.

  • @sagedoescode · 9 days ago

    I also like the non-scraping content, keep it up!

  • @hugohoyzer2202 · 9 days ago

    👌

  • @lucasseagull8282 · 9 days ago

    By "stock" I understood something else - it would be great to see some Python automation on the stock market.

    • @JohnWatsonRooney · 9 days ago

      Yes, my bad, sorry - I can see it wasn't clear.

    • @lucasseagull8282 · 9 days ago

      @@JohnWatsonRooney No problem, and thank you for your work - looking forward to the next videos.

  • @xguns6418 · 9 days ago

    What Python website are you using?

  • @arsenalman30 · 9 days ago

    It would be good to do an example where there isn't a schema on the web page, as more websites are now not using one.

  • @junaidmughal3806 · 9 days ago

    For me, nothing beats Monokai.

  • @domenechj · 9 days ago

    Great video as always! Have you tried Crawlee for Python?

    • @JohnWatsonRooney · 9 days ago

      I haven't - is it worth looking at? Have you tested it?

  • @SAMWICK-fl1hi · 9 days ago

    I had this error: b'400 - Bad request'

  • @Frankie_Freedom · 9 days ago

    I kept getting a "not defined" error on the line that says "beer_list = [ ]", which I noticed you didn't - why would that have happened?

  • @citizen320 · 10 days ago

    Can you help me write a Python script that will make a PPT describing all the steps and processes of the SDLC for a team of 100 IT professionals? Thanks.

  • @arpitakar3384 · 10 days ago

    The Zorro of web scraping... Thanks 😊 for giving this to us.

  • @zedzpan · 10 days ago

    Thank you for this - I learnt so much. The try/except in the function helped a lot as well.

  • @MalikFaragalla · 10 days ago

    Amazing

  • @hamzaehsankhan · 11 days ago

    Great stuff

  • @899 · 12 days ago

    Just found your channel. Can’t wait to take a deep-dive. I have been automating and scraping for 20+ years and I’m hoping joining a community will step up my game into the future. Looking forward to learning w/ you.

  • @RatoCanguru_Lucas · 12 days ago

    Man, this is gold. Thanks for sharing!

  • @christophersmith1640 · 12 days ago

    You know, if you clicked Preview instead of Response it would have formatted the JSON without having to go to a website.

  • @michamr-o6960 · 12 days ago

    Very nice job. Good luck.

  • @randcoding · 12 days ago

    Installing scrapy-playwright using pipx on Linux is causing issues; this is the error:

    Traceback (most recent call last):
      File "/home/user/.local/share/pipx/venvs/scrapy/lib/python3.12/site-packages/twisted/internet/defer.py", line 1999, in _inlineCallbacks
        result = context.run(
      File "/home/user/.local/share/pipx/venvs/scrapy/lib/python3.12/site-packages/twisted/python/failure.py", line 519, in throwExceptionIntoGenerator
        return g.throw(self.value.with_traceback(self.tb))
      File "/home/user/.local/share/pipx/venvs/scrapy/lib/python3.12/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_request
        return (yield download_func(request=request, spider=spider))
      File "/home/user/.local/share/pipx/venvs/scrapy/lib/python3.12/site-packages/scrapy/utils/defer.py", line 81, in mustbe_deferred
        result = f(*args, **kw)
      File "/home/user/.local/share/pipx/venvs/scrapy/lib/python3.12/site-packages/scrapy/core/downloader/handlers/__init__.py", line 83, in download_request
        raise NotSupported(
    scrapy.exceptions.NotSupported: Unsupported URL scheme 'https': No module named 'scrapy_playwright'
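The "No module named 'scrapy_playwright'" at the bottom of that traceback is the key line: pipx installs scrapy into its own isolated virtualenv, so packages installed with plain pip are invisible to it. Assuming a standard pipx setup, injecting the plugin into scrapy's venv should fix it:

```shell
# Install scrapy-playwright inside the same pipx-managed venv as scrapy
pipx inject scrapy scrapy-playwright
```

After injecting, Playwright's browser binaries may still need to be installed with its own `playwright install` step inside that environment.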