Run Your Web Scraper Automatically Once a DAY

  • Published 17 Nov 2020
  • In this video I'll show you one way of running your web scrapers automatically in the cloud, using cron jobs. We utilise a Linux VM from Digital Ocean and download and run our code at a set interval. I cover creating a droplet, downloading code from Git, and installing requirements.
    Code & Commands: github.com/jhnwr/whiskey-cronjob
    Digital Ocean (Affiliate Link) - m.do.co/c/c7c90f161ff6
    Crontab Guru - crontab.guru/
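As a sketch of the kind of schedule entry the video sets up: a daily cron job is a single line in the crontab, edited with `crontab -e`. The script path below is illustrative, not taken from the video:

```shell
# run the scraper every day at 06:00, appending stdout and stderr to a log
# (/root/whiskey-cronjob/scraper.py is a hypothetical path)
0 6 * * * /usr/bin/python3 /root/whiskey-cronjob/scraper.py >> /root/cron.log 2>&1
```

The five fields are minute, hour, day of month, month, and day of week; crontab.guru (linked above) can translate any combination into plain English.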
    -------------------------------------
    Disclaimer: These are affiliate links and as an Amazon Associate I earn from qualifying purchases
    -------------------------------------
    Sound like me:
    microphone amzn.to/36TbaAW
    mic arm amzn.to/33NJI5v
    audio interface amzn.to/2FlnfU0
    -------------------------------------
    Video like me:
    webcam amzn.to/2SJHopS
    camera amzn.to/3iVIJol
    lights amzn.to/2GN7INg
    -------------------------------------
    PC Stuff:
    case: amzn.to/3dEz6Jw
    psu: amzn.to/3kc7SfB
    cpu: amzn.to/2ILxGSh
    mobo: amzn.to/3lWmxw4
    ram: amzn.to/31muxPc
    gfx card amzn.to/2SKYraW
    27" monitor amzn.to/2GAH4r9
    24" monitor (vertical) amzn.to/3jIFamt
    dual monitor arm amzn.to/3lyFS6s
    mouse amzn.to/2SH1ssK
    keyboard amzn.to/2SKrjQA
  • Science & Technology

Comments • 67

  • @alejandrofrank7900
    @alejandrofrank7900 3 years ago +12

    Oh man, this was a life saver. I really enjoy your series; every time I need to scrape something, I know you must have done something similar!

  • @ShenderRamos
    @ShenderRamos 1 year ago +2

    John, you're great. I've been watching your content for the past few days; you explain everything so well and cover all the different scenarios for beginners and more advanced users. I've a few years of experience with Python and scraping but I'm still learning a lot from you 🙏🏾

  • @huzaifaameer8223
    @huzaifaameer8223 3 years ago +4

    Yo man, really really really appreciated!💚
    You fulfilled my 2nd request also!
    Keep posting quality content like this!
    Looking forward to REST APIs with Django and React!

  • @tubelessHuma
    @tubelessHuma 3 years ago +1

    Thanks a lot John for fulfilling the request so soon. Really appreciated. Keep growing dear.❤👍

  • @tdye
    @tdye 3 years ago +4

    I loved all of this. I even learned a bit about Digital Ocean that was wayyy better than most tutorials out there. Thank you so much

  • @thatguy6664
    @thatguy6664 3 years ago +7

    Thank you! Deployment is my biggest obstacle now, but videos like this really help. You covered SSH, deployment and cron jobs in less than 14 minutes - incredible!

  • @yaish9547
    @yaish9547 2 years ago +2

    Excellent stuff John. Concise and to the point. Top quality. Thanks a lot!

  • @boiboi1988
    @boiboi1988 2 years ago

    Thanks John. You helped me on this one again for my school thesis work. :)

  • @christopherpage327
    @christopherpage327 3 years ago +1

    This content is gold, it puts everything into perspective.

  • @YahiaHegazy
    @YahiaHegazy 3 years ago +1

    I greatly appreciate you uploading this video. Also thank you for the cool link!

  • @EPguitars
    @EPguitars 1 year ago +1

    Man! Linus bless you! It was short and very helpful for me, thanks!

  • @nishchalparne3436
    @nishchalparne3436 3 years ago +2

    DUDE YOU WILL REACH MILLION SUBS VERY FAST FOR SURE!!! IT WILL BE GREAT IF YOU MAKE FULL COURSE ON SCRAPING AND ML!!!

  • @wkowalski
    @wkowalski 3 years ago +1

    Just what I was hoping for... Thanks very much for this! You're awesome.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      You're very welcome!

    • @wkowalski
      @wkowalski 3 years ago

      @@JohnWatsonRooney any chance of you doing a follow-up on how to run Selenium on DigitalOcean?

  • @IanDangerfield
    @IanDangerfield 2 years ago +1

    Thanks for this, answered a question I had.

  • @testdeckel4752
    @testdeckel4752 1 year ago +1

    Great video, very compact! Thank you so much

  • @saifashraf2135
    @saifashraf2135 6 months ago

    Perfect video.

  • @ghaithmoe9573
    @ghaithmoe9573 3 years ago +1

    What a great video !

  • @joseignaciosolorzanosilva784

    THANK YOU!

  • @kaybecking2244
    @kaybecking2244 2 years ago

    Nice vid!

  • @ed-salinas-97
    @ed-salinas-97 3 years ago +1

    I'm a little bit new to Linux commands, but I wasn't getting the cron.log file to show up in my Home directory. I ended up having to change the permissions of my Home directory, and then it worked. I was using an Ubuntu VM instance on GCP, though. Not sure if that makes a difference from Digital Ocean.
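The permission issue described above can be checked in a couple of commands. A sketch using a scratch directory as a stand-in for the home directory (cron writes the log as the crontab's user, so that user needs write access to wherever the log goes):

```shell
# create a scratch directory standing in for the cron user's home
dir=$(mktemp -d)

# ensure the owner can read, write and enter it -- cron jobs run as the
# crontab's user, so the owner needs write access for the log to appear
chmod u+rwx "$dir"
ls -ld "$dir"               # inspect the mode and owner

# simulate what the cron redirection does
echo "job ran" >> "$dir/cron.log"
cat "$dir/cron.log"
```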

  • @marcusmerc615
    @marcusmerc615 3 years ago +1

    Hi sir. I have an app scraper. My app scrapes news sites, extracts URLs, and writes them to a .txt file. I deployed my app to Heroku, but Heroku doesn't have a persistent file system and doesn't update the .txt file. Can you show methods to connect databases to Heroku? For example, external clouds or Postgres.

  • @j4ck3
    @j4ck3 11 months ago +1

    My scraper is done with Node.js so it's not the same process, but still very helpful. Thanks!

  • @santisaldivar
    @santisaldivar 1 year ago

    Hey John,
    I hope this message finds you well. While performing the apt upgrade towards the end, I received a message reading "Daemons using outdated libraries". I just hit Enter and the pink-looking screen went away. Is this something you think I should look into?

  • @shahraanhussain7465
    @shahraanhussain7465 1 year ago +1

    Awesome video

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago

      Thanks!

    • @shahraanhussain7465
      @shahraanhussain7465 1 year ago

      @@JohnWatsonRooney could you please make a video on the Python Selenium stale element exception which occurs after using driver.back()

  • @MythicalMysteryQuest
    @MythicalMysteryQuest 1 year ago

    Does Digital Ocean support chrome-webdriver for scraping websites with Selenium?

  • @main5344
    @main5344 7 months ago

    John, it wants me to be in a venv when inputting your code at 6:18. This hasn't happened previously, like it does in your vid. Is this new?

  • @arturoisraelperezvargas7261

    Thanks, and do you have a video where you use Google BigQuery?

  • @itsaj007
    @itsaj007 1 year ago +1

    Quick question: I have a web scraper script that works on my local computer but not on an RDP/VPS. I tried different IPs etc. but no luck.

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago

      Hard to say without knowing what the error is. When you say different IPs, did you mean VPS IPs or proxies? If it's not that, it's usually an environment issue, like different Python versions or env variables.

  • @gotenksjd
    @gotenksjd 2 years ago

    John, hi! One question... when I run the script manually it runs OK, but when it is scheduled to run in the crontab it is not running. The paths are complete. What could I be missing?

    • @gotenksjd
      @gotenksjd 2 years ago +1

      The path to the creds file inside the script must also be the path on the Linux server. Solved it!😊
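The fix in this thread can be sketched in Python: cron starts jobs from the crontab user's home directory, so relative paths inside a script silently point at the wrong place. Anchoring paths to the script's own location avoids hard-coding the server path (`creds.json` is a hypothetical filename):

```python
from pathlib import Path

# cron runs jobs from the crontab user's home directory, so a bare
# relative path like "creds.json" resolves somewhere unexpected;
# anchor file paths to the script's own location instead
BASE_DIR = Path(__file__).resolve().parent
creds_path = BASE_DIR / "creds.json"  # hypothetical credentials file

print(creds_path.is_absolute())  # True -- works the same locally and under cron
```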

  • @alexmulo
    @alexmulo 3 years ago +1

    Hi John, why do you prefer the cloud over a local solution using a raspberry pi for example? Thanks

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      Generally yes, as I don’t have to worry about it as it’s all managed. I do have a Pi video coming though!

    • @alexmulo
      @alexmulo 3 years ago

      @@JohnWatsonRooney is there any particular technical reason why you used DigitalOcean over a dedicated scraping host? I'm asking to understand which option would be more suitable for my use case/approach

  • @user-vg4kj7mx2z
    @user-vg4kj7mx2z 3 years ago +1

    Hi John! Thanks for the video! I have a question about the email: does it have to be Gmail, or can it be another type of email?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      You can use any email, I just find it easiest to use a new gmail account for setting up

  • @kaladappanimi4269
    @kaladappanimi4269 3 years ago +1

    Hi John, great video as usual, but I'm having problems scraping a table on a sports site. I have tried Selenium and managed to get the first table...
    There are two tables with identical class names...
    I have tried indexing but it seems not to be working

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      Have you tried opening it with Selenium then saving the HTML via page_source? That gives you the HTML, which is easier to get the data from with bs4. I did a video on it not long ago. Thanks!

    • @kaladappanimi4269
      @kaladappanimi4269 3 years ago

      @@JohnWatsonRooney Please post the link to that exact video..
      Thank you
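The indexing approach from this thread can be sketched with bs4: `find_all` returns every matching tag, so two tables sharing a class name come back as a two-element list and the second one is index 1. The HTML and class name below are made up for illustration:

```python
from bs4 import BeautifulSoup

# stand-in for driver.page_source from Selenium: two tables
# that share the same class name
html = """
<table class="stats"><tr><td>first</td></tr></table>
<table class="stats"><tr><td>second</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all returns every match, in document order
tables = soup.find_all("table", class_="stats")
print(len(tables))        # 2
print(tables[1].td.text)  # second
```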

  • @M3ntalMaze
    @M3ntalMaze 2 years ago +1

    Any way of doing this with Heroku?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      I don’t think heroku allows cron on its free tier so you may need to upgrade

  • @emadkamel1961
    @emadkamel1961 1 year ago +1

    This is a great and valuable lesson.
    However, what if I would like to run my scraper code without turning on my machine/laptop?
    Is it possible to integrate my code somehow into my WordPress site, for example? Or get it to run in the cloud? If yes, can you kindly elaborate and/or share the link to your relevant tutorial?
    Looking forward to hearing back from you.

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago +1

      Hey, yes you can absolutely run in the cloud. I use digital ocean to run a Linux machine in the cloud that I run scripts on a cron job

    • @emadkamel1961
      @emadkamel1961 1 year ago +1

      @@JohnWatsonRooney Thank you for getting back to me.
      Is there a way to integrate or house the code in a WordPress website?

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago +1

      @@emadkamel1961 Not that I know of, no. If you had access to the server WordPress was running on you could run the code, but it wouldn't be related to the WordPress site itself

    • @emadkamel1961
      @emadkamel1961 1 year ago

      @@JohnWatsonRooney Thanks again.

  • @huzaifaameer8223
    @huzaifaameer8223 3 years ago +1

    Hey man, can you please make a video on how to write scraped data to a CSV file named with the current date or time?
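A minimal sketch of what that could look like, assuming the scraped data is already a list of dicts (the field names and row below are hypothetical):

```python
import csv
from datetime import datetime

# hypothetical scraped rows -- stand-ins for real scraper output
rows = [{"name": "Lagavulin 16", "price": "54.99"}]

# build a filename stamped with the current date and time,
# e.g. scrape-2020-11-17_09-30.csv
filename = f"scrape-{datetime.now():%Y-%m-%d_%H-%M}.csv"

with open(filename, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)

print(filename)
```

Run daily from cron, this produces one dated file per run instead of overwriting a single output file.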

  • @Cephasprincely
    @Cephasprincely 2 years ago +2

    Bro, is it possible to use this method on Windows?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      Yes, it's called Task Scheduler on Windows - but I've never used it, so I'm afraid I don't know how it compares or how to use it

    • @Cephasprincely
      @Cephasprincely 2 years ago +1

      @@JohnWatsonRooney oh, I'll check that out😃😃

  • @StraightCoding
    @StraightCoding 2 years ago +1

    Interesting Video

  • @aaronbell759
    @aaronbell759 1 year ago +1

    For some reason I can't get my cron jobs to work or test correctly on my AWS EC2 instance. I have this set up and the test file isn't updating.
    From the top directory on EC2, I have to navigate to a home folder, then an ubuntu folder; in the ubuntu folder is my working directory, where I have a test.py script and a test cron log file. I also have a main folder for the actual project I am trying to schedule.
    Any tips? Below is my test setup:
    * * * * * usr/bin/python3 /home/ubuntu/test.py >> /home/ubuntu/testcron123.log

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago

      Not sure specifically but I would check user permissions on the folder and check which users crontab you are using, I know users and permissions tripped me up before

    • @aaronbell759
      @aaronbell759 1 year ago

      @@JohnWatsonRooney turns out I didn't have a forward slash in front of the usr in the path to the Python interpreter... 5 hours of looking into this, haha.
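For reference, a corrected version of the entry from this thread: the interpreter path needs its leading slash, and appending `2>&1` (an addition, not part of the original setup) captures errors in the log as well, which would have surfaced this kind of mistake much sooner:

```shell
# every minute: absolute interpreter path, absolute script path,
# stdout and stderr both appended to the log
* * * * * /usr/bin/python3 /home/ubuntu/test.py >> /home/ubuntu/testcron123.log 2>&1
```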