John Watson Rooney
United Kingdom
Joined 30 October 2019
Let's learn about Python, web scraping and API's!
Every Web Scraper should know THIS
➡ WORK WITH ME
johnwr.com
➡ COMMUNITY
discord.gg/C4J2uckpbR
www.patreon.com/johnwatsonrooney
➡ PROXIES
proxyscrape.com/?ref=jhnwr
www.scrapingbee.com?fpr=jhnwr
➡ HOSTING
m.do.co/c/c7c90f161ff6
If you are new, welcome. I'm John, a self-taught Python developer working in the web and data space. I specialize in data extraction and automation. If you like programming and web content as much as I do, you can subscribe for weekly content.
⚠ DISCLAIMER
Some/all of the links above are affiliate links. By clicking on these links I receive a small commission should you choose to purchase any services or items.
Views: 1,210
Videos
You're missing out if you don't use THESE
Views: 1.6K · 4 hours ago
Check Out ProxyScrape here: proxyscrape.com/?ref=jhnwr ➡ WORK WITH ME johnwr.com ➡ COMMUNITY discord.gg/C4J2uckpbR www.patreon.com/johnwatsonrooney ➡ PROXIES proxyscrape.com/?ref=jhnwr ➡ HOSTING m.do.co/c/c7c90f161ff6 If you are new, welcome. I'm John, a self taught Python developer working in the web and data space. I specialize in data extraction and automation. If you like programming and we...
Python Automation- product checker and buyer
Views: 1.8K · 21 hours ago
Check Out ProxyScrape here: proxyscrape.com/?ref=jhnwr ➡ WORK WITH ME johnwr.com ➡ COMMUNITY discord.gg/C4J2uckpbR www.patreon.com/johnwatsonrooney ➡ PROXIES proxyscrape.com/?ref=jhnwr ➡ HOSTING m.do.co/c/c7c90f161ff6 If you are new, welcome. I'm John, a self taught Python developer working in the web and data space. I specialize in data extraction and automation. If you like programming and we...
How to Scrape 4 Sites with 1 Script (code along)
Views: 3K · 14 days ago
Check Out ProxyScrape here: proxyscrape.com/?ref=jhnwr ➡ WORK WITH ME johnwr.com ➡ COMMUNITY discord.gg/C4J2uckpbR www.patreon.com/johnwatsonrooney ➡ PROXIES proxyscrape.com/?ref=jhnwr ➡ HOSTING m.do.co/c/c7c90f161ff6 If you are new, welcome. I'm John, a self-taught Python developer working in the web and data space. I specialize in data extraction and automation. If you like programming and we...
My System for Easily Scraping 150k Items from the web
Views: 3.2K · 21 days ago
Use JWR at checkout to get 2GB of proxies for free: go.nodemaven.com/scrapingproxy ➡ E-commerce Data Extraction Specialist johnwr.com ➡ COMMUNITY discord.gg/C4J2uckpbR www.patreon.com/johnwatsonrooney ➡ PROXIES go.nodemaven.com/scrapingproxy ➡ HOSTING m.do.co/c/c7c90f161ff6 If you are new, welcome. I'm John, a self taught Python developer working in the web and data space. I specialize in data ...
How much slower is Playwright at Scraping?
Views: 2K · 21 days ago
➡ E-commerce Data Extraction Specialist johnwr.com ➡ COMMUNITY discord.gg/C4J2uckpbR www.patreon.com/johnwatsonrooney ➡ PROXIES nodemaven.com/?a_aid=JohnWatsonRooney ➡ HOSTING m.do.co/c/c7c90f161ff6 If you are new, welcome. I'm John, a self taught Python developer working in the web and data space. I specialize in data extraction and automation. If you like programming and web content as much a...
The Simple Automation Script my Colleagues Loved.
Views: 3.5K · a month ago
The first 500 people to use my link skl.sh/johnwatsonrooney06241 will get a 1 month free trial of Skillshare premium! This video is sponsored by Skillshare johnwr.com ➡ COMMUNITY discord.gg/C4J2uckpbR www.patreon.com/johnwatsonrooney ➡ PROXIES www.scrapingbee.com/?fpr=jhnwr proxyscrape.com/?ref=jhnwr ➡ HOSTING m.do.co/c/c7c90f161ff6 If you are new, welcome. I'm John, a self taught Python develo...
Scraping 7000 Products in 20 Minutes
Views: 4.1K · a month ago
Go to proxyscrape.com/?ref=jhnwr for the Proxies I use. johnwr.com ➡ COMMUNITY discord.gg/C4J2uckpbR www.patreon.com/johnwatsonrooney ➡ PROXIES www.scrapingbee.com/?fpr=jhnwr proxyscrape.com/?ref=jhnwr ➡ HOSTING m.do.co/c/c7c90f161ff6 If you are new, welcome. I'm John, a self taught Python developer working in the web and data space. I specialize in data extraction and automation. If you like p...
How I Scrape 7k Products with Python (code along)
Views: 8K · a month ago
A short but complete project of scraping 7k products with Python. johnwr.com ➡ COMMUNITY discord.gg/C4J2uckpbR www.patreon.com/johnwatsonrooney ➡ PROXIES www.scrapingbee.com/?fpr=jhnwr proxyscrape.com/?ref=jhnwr ➡ HOSTING m.do.co/c/c7c90f161ff6 If you are new, welcome. I'm John, a self taught Python developer working in the web and data space. I specialize in data extraction and automation. If ...
This will change Web Scraping forever.
Views: 8K · 2 months ago
Want to try this yourself? Sign up at www.zyte.com/ and use code JWR203 for $20 free each month for 3 months. Limited availability, first come first served. Once you have created an account enter the coupon code JWR203 under settings, subscriptions, modify & enter code. Zyte gave me access to their API and NEW AI spider tech to see how it compares to scraping manually, with incredible results...
The most important Python script I ever wrote
Views: 170K · 2 months ago
The story of my first and most important automation script, plus an example of what it would look like now. ✅ WORK WITH ME ✅ johnwr.com ➡ COMMUNITY discord.gg/C4J2uckpbR www.patreon.com/johnwatsonrooney ➡ PROXIES www.scrapingbee.com/?fpr=jhnwr proxyscrape.com/?ref=jhnwr ➡ HOSTING m.do.co/c/c7c90f161ff6 If you are new, welcome. I'm John, a self taught Python developer working in the web and data...
Why I chose Python & Polars for Data Analysis
Views: 5K · 3 months ago
To try everything Brilliant has to offer, free for a full 30 days, visit brilliant.org/JohnWatsonRooney/. You’ll also get 20% off an annual premium subscription. This video was sponsored by Brilliant. Join the Discord to discuss all things Python and Web with our growing community! discord.gg/C4J2uckpbR Work with me: johnwr.com If you are new, welcome! I am John, a self-taught Python developer w...
The Best Tools to Scrape Data in 2024
Views: 7K · 3 months ago
Python has a great ecosystem for web scraping, and in this video I run through the packages I use every day to scrape data. Join the Discord to discuss all things Python and Web with our growing community! discord.gg/C4J2uckpbR If you are new, welcome! I am John, a self-taught Python developer working in the web and data space. I specialize in data extraction and JSON web API's both server and cli...
Is Your Scraper Slow? Try THIS Simple Method
Views: 5K · 3 months ago
Get Proxies from Nodemaven now: go.nodemaven.com/scrapingproxy Use code JWR for 2 GB on purchase. Threads and parallel processing are still useful for scraping. Even though most of the waiting is I/O, which is best served by async, threading can still make your code much faster in the right situations, and it is very simple to implement. Join the Discord to discuss all things Python and Web with our growi...
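The threading idea described above can be sketched with the standard library's `ThreadPoolExecutor`. The URLs and the `fetch` function below are placeholders (no real network call is made), so this is an illustration of the pattern rather than the video's actual code:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(url: str) -> str:
    # Stand-in for a real request, e.g. requests.get(url).text;
    # simulated here so the sketch runs without network access.
    return f"<html>{url}</html>"

# Hypothetical list of pages to scrape.
urls = [f"https://example.com/page/{i}" for i in range(1, 6)]

# Threads let the I/O waits overlap, so total time approaches the
# slowest single request instead of the sum of all of them.
with ThreadPoolExecutor(max_workers=5) as pool:
    pages = list(pool.map(fetch, urls))  # results come back in input order

print(len(pages))
```

For CPU-bound parsing work a `ProcessPoolExecutor` has the same interface; for pure I/O, async is usually the better fit, as the description notes.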
Scraping with Playwright 101 - Easy Mode
Views: 8K · 3 months ago
Playwright is an incredibly versatile tool for browser automation, and in this video I run through a simple project to get you up and running scraping data with PW & Python. Join the Discord to discuss all things Python and Web with our growing community! discord.gg/C4J2uckpbR If you are new, welcome! I am John, a self-taught Python developer working in the web and data space. I specialize in da...
Cleaning up 1000 Scraped Products with Polars
Views: 5K · 4 months ago
This is a Scraping Cheat Code (for certain sites)
Views: 4.5K · 4 months ago
Let me explain my new Rust love affair..
Views: 982 · 4 months ago
Stop Wasting Time on Simple Excel Tasks, Use Python
Views: 10K · 5 months ago
The HTML Element I check FIRST when Web Scraping
Views: 2.8K · 5 months ago
Try this SIMPLE trick when scraping product data
Views: 3.8K · 5 months ago
I had no idea you could scrape this site this way
Views: 4.4K · 6 months ago
This is the ONLY way I'll use Selenium now
Views: 7K · 7 months ago
Scraping HTML Tables VS Dynamic JavaScript Tables
Views: 3.6K · 8 months ago
Webscraping with Python How to Save to CSV, JSON and Clean Data
Views: 5K · 8 months ago
Hi John! I have a question. Following your hidden API videos and some others, I finally finished a project that creates datasets of Walmart products based on whatever the user wants the dataset to be about. I did this project using their hidden API, creating datasets that can get pretty big (15,000 products), but for every dataset I have to make around 100 to 200 GET requests in order to get all the products. Is this legal/ethical to put on my CV or in a LinkedIn post as a personal project, even though the Walmart website says that they do not allow web scraping?
amazing explanation. thanks
This is gold
Would it work together with Selenium?
John, please make it longer and scrape data into a CSV file, and please use undetected browsers or captcha solver methods to scrape data. I love your videos, John ❤
I'm sure you know this by now, but the Polars read_csv method supports glob patterns, so the loop approach is unnecessary - you can simply pass _folder + "/*.csv"_ as the source parameter and the concatenation will be done automatically.
Yea you are right that’s much better thanks for clarifying!
How to solve this error?

ModuleNotFoundError                       Traceback (most recent call last)
Cell In[1], line 1
----> 1 from selenium import webdriver
      3 url = 'www.youtube.com/@JohnWatsonRooney/videos'
      5 driver = webdriver.Chrome()

ModuleNotFoundError: No module named 'selenium'
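A `ModuleNotFoundError` like the one above just means the selenium package isn't installed in the environment the notebook is running from. A minimal fix (assuming `python` on your PATH is the interpreter the notebook uses; using `python -m pip` avoids installing into a different interpreter by accident):

```shell
# Install selenium into the same environment Python runs from
python -m pip install selenium

# Verify the import now works
python -c "import selenium; print(selenium.__version__)"
```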
Thaaank you so much 🙏🏽🙏🏽🙏🏽, I've really learned so much from your videos. What if the API is protected by Cloudflare and sometimes it returns unauthorized, is there a solution?
Once you have the cookies you should be good, you’ll need to refresh them every so often, either manually or by using an undetected browser/captcha solver
@@JohnWatsonRooney I am trying to visit the API from the browser and it gives me unauthorized, so I think it is not from the cookies. I can share the link with you to test.
Hey dude, just started my first internship, and this video has been immensely helpful! I really appreciate the effort put in and all the useful tips. Thanks!
What if we want to take a full page so we can give it to an LLM to parse? For example, what if we were parsing financial filings or contracts. We want chunks or pages to pass to an LLM to structure outputs. I think splitting the text on a tag and then joining the items together would be best, but maybe there is a better way.
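The "split the text on a tag, then join items back together" idea from the comment above can be sketched with plain string handling. The sample HTML, the choice of `</p>` as the split point, and the character budget are all illustrative assumptions, not a recommendation for real filings:

```python
# Break a document into paragraph-level pieces at a closing tag, then
# group pieces into chunks until a size budget is reached, so each
# chunk can be passed to an LLM separately.
html = "<p>Section one text.</p><p>Section two text.</p><p>Section three.</p>"

paragraphs = [p.strip() for p in html.replace("</p>", "</p>\n").split("\n") if p.strip()]

chunks, current, budget = [], "", 40  # budget in characters, purely illustrative
for p in paragraphs:
    if current and len(current) + len(p) > budget:
        chunks.append(current)
        current = ""
    current += p
if current:
    chunks.append(current)

print(len(chunks))
```

For real contracts or filings, a proper HTML parser and a token-based (rather than character-based) budget would be the sturdier version of the same idea.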
Do you have any videos on how to use proxies in Python?
I don’t specifically but that’s a good idea I will create a video on proxies inc how to use
Good one
I hope you have a course
Great content, very useful now that I am learning about this subject. You earned a new sub here
Great point about separation of concerns. As you stated, the scraper should only be concerned with getting data and saving data. I am curious what other use cases would be compatible with scrapy’s pipelines. Would pipelines be a good place for things like “save to this OTHER database”, or “upload to S3”, or “ping this api”? Will be diving into this myself soon but curious about your thoughts here.
yes absolutely, you could use an item field to decide whether to upload to X DB or Y DB, and certainly uploading to S3 would come here too. pinging an API you mean like to notify another system? I think that would be a great use case for pipelines (not thought of that before)
What’s wrong with scraping them as strings and converting them at the end in your output file?
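The scrape-as-strings-then-convert approach the comment suggests can be sketched as a separate cleanup pass. The field names and currency format below are hypothetical examples, not tied to any particular site:

```python
# Scraped rows often arrive as raw strings; a dedicated cleanup pass
# keeps the scraper itself concerned only with getting and saving data.
raw_rows = [
    {"name": "widget", "price": "£1,250.00", "stock": "12"},
    {"name": "gadget", "price": "£99.50", "stock": "3"},
]

def clean(row: dict) -> dict:
    # Strip the currency symbol and thousands separator, then cast.
    return {
        "name": row["name"],
        "price": float(row["price"].lstrip("£").replace(",", "")),
        "stock": int(row["stock"]),
    }

cleaned = [clean(r) for r in raw_rows]
print(cleaned[0]["price"])
```

One trade-off: converting at scrape time surfaces malformed values immediately, while converting at the end keeps failures out of the scraping run, which is often the point of separating the two steps.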
Hi John, I have a question: can we use Scrapy with Django? I mean, make the web scraper an online tool.
I think you could create a script to scrape separately and load the result into Django's database. The processing happens at separate moments. I hope you understand my English; I'm from Brazil, learning English. If you need anything more specific, please feel free to get in touch. It's a great pleasure to help you.
this is pretty much it!
@@RicardoPorteladaSilva what is the advantage of using Django DB?
Dude thankyou so much❤
No problem 👍
Your content is good, but I think you should engage with your audience more instead of speaking like you are talking to yourself. You will see that you get many more views. Take the GothamChess channel for example: he is not a chess Grandmaster, but his channel has more views and subscribers than Hikaru's or Magnus's because of his communication skills.
Fair point thanks for the advice
Hi John, is there any way to get around a Cloudflare-protected API and get JSON data in Python?
I have an issue with the cookie consent prompt. How could I solve this? In every new context it pops up...
can you make a request based auto product buyer instead and show us how to do that?
thanks for the video, amazing introduction to webscraping
Unfortunately, I don't know how to translate this to JavaScript, so sadly, it's not very helpful to me. 😢
Title is misleading. The Excel files I see are monsters in terms of complexity. Your example is seriously simplistic. 😊
Is it one core per instance?
Any amazon automation
Unfortunately, what I understand from this is that it only works for browsers... The company I worked at had some sort of CRM, very old, very laggy at every button press, with tons of pages and small fields to fill in. I REALLY wish I could find a way to automate the data entry for that one... It would replace all the back-office data entry people of Italy xD
You could try pyautogui - not sure how reliable it would be for you, but it might help in some way?
Honestly, This channel is marvelous. It has helped me a lot. 'a lot' is even an understatement
Hey bro, I cloned a website and now I am opening that website's code in the VS Code editor, but after doing the necessary editing only the text is changing, not the images. I am putting my image URL in place of the website's image URL, but after saving it and opening it with Live Server, the preview shows me the images of the cloned website, not mine, and in inspect element it shows the image code of the cloned website, not mine. Why? I have been trying for 6 hours and nothing works for me. Will you please tell me how I can change the images and edit it? Is there an API sending the data from the backend?
Great content John, new subscriber here
thanks, welcome
I also like non scraping content, keep it up
👌
By Stock, I understood something else - it would be great to see some Python automation on the stock market.
Yes my bad sorry, I can see it wasn’t clear
@@JohnWatsonRooney no problem and thank you for your work - looking forward to the next videos.
What Python website are you using?
It would be good to do an example where there isn't a schema on the webpage, as more websites are now not using that.
for me nothing beats monokai
Great video as always! Have you tried Crawlee for Python?
I haven't - is it worth looking at? have you tested it?
I had this error: b'400 - Bad request'
I kept getting a "not defined" error on the line where it says "beer_list = [ ]", which I noticed you didn't. Why would that have happened?
Can you help me write a python script that will make a ppt that describes all the steps and processes for the SDLC of a team of 100 IT professionals? Thanks
The Zorro of Web Scraping... Thanks 😊 for giving this to us
Thank you for this. Learnt so much. The try/except in the function helped a lot as well.
Amazing
Great stuff
Just found your channel. Can’t wait to take a deep-dive. I have been automating and scraping for 20+ years and I’m hoping joining a community will step up my game into the future. Looking forward to learning w/ you.
Thank you and welcome in!
Man, this is gold. Thanks for sharing!
You know, if you clicked Preview instead of Response it would have formatted the JSON without you having to go to a website.
Very nice job. Good luck.
Installing scrapy-playwright using pipx on Linux is causing issues; this is the error:

Traceback (most recent call last):
  File "/home/user/.local/share/pipx/venvs/scrapy/lib/python3.12/site-packages/twisted/internet/defer.py", line 1999, in _inlineCallbacks
    result = context.run(
  File "/home/user/.local/share/pipx/venvs/scrapy/lib/python3.12/site-packages/twisted/python/failure.py", line 519, in throwExceptionIntoGenerator
    return g.throw(self.value.with_traceback(self.tb))
  File "/home/user/.local/share/pipx/venvs/scrapy/lib/python3.12/site-packages/scrapy/core/downloader/middleware.py", line 54, in process_request
    return (yield download_func(request=request, spider=spider))
  File "/home/user/.local/share/pipx/venvs/scrapy/lib/python3.12/site-packages/scrapy/utils/defer.py", line 81, in mustbe_deferred
    result = f(*args, **kw)
  File "/home/user/.local/share/pipx/venvs/scrapy/lib/python3.12/site-packages/scrapy/core/downloader/handlers/__init__.py", line 83, in download_request
    raise NotSupported(
scrapy.exceptions.NotSupported: Unsupported URL scheme 'https': No module named 'scrapy_playwright'