Beginner's Guide to Web Scraping with Python - All You Need to Know

  • Published 23. 07. 2024
  • The web is full of data. Lots and lots of data. Data prime for scraping. But manually going to a website and copying and pasting the data into a spreadsheet or database is tedious and time-consuming. Enter web scraping! This guide will show you how to get started scraping web data to your heart's content in 8 minutes!
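The workflow the video covers can be sketched in a few lines. This is a minimal sketch, not the video's exact code: it parses a small inline HTML sample shaped like the markup on quotes.toscrape.com, the testing site linked below. In a real run you would download the page first, e.g. with the `requests` library, as shown in the comment.

```python
from bs4 import BeautifulSoup

# Inline sample shaped like quotes.toscrape.com's markup; with the live
# page you would fetch it first, e.g.:
#   import requests
#   html = requests.get("http://quotes.toscrape.com").text
html = """
<div class="quote"><span class="text">“Quote one”</span>
  <small class="author">Author One</small></div>
<div class="quote"><span class="text">“Quote two”</span>
  <small class="author">Author Two</small></div>
"""

soup = BeautifulSoup(html, "html.parser")
quotes = soup.find_all("span", class_="text")
authors = soup.find_all("small", class_="author")

# Pair each quote with its author and print them.
for quote, author in zip(quotes, authors):
    print(f"{quote.text} - {author.text}")
```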
    _____________________________
    📲🔗🔗📲 IMPORTANT LINKS 📲🔗🔗📲
    _____________________________
    • 💻PROJECT PAGE💻 - github.com/gigafide/basic_pyt...
    • Python 3 - www.python.org/downloads/
    • BeautifulSoup - www.crummy.com/software/Beaut...
    • Scraper Testing Website - quotes.toscrape.com/
    • Thonny - thonny.org/
    _____________________________
    📢📢📢📢 Follow 📢📢📢📢
    ____________________________
    redd.it/5o3tp8
    / tinkernut_ftw
    / tinkernut
    / tinkernut
    00:00 Introduction
    00:42 Setup
    01:16 Background
    02:23 Legality Concerns
    02:51 Writing The Code
    06:47 Conclusion

Comments • 178

  • @michaelmagill5466 · 2 years ago +112

    This editing is fantastic, the explanations are clear and concise and completely without obfuscation. You, sir, are a gentleman.

    • @chanson8508 · 5 months ago +1

      Big faxxx! so many nonsense intro to scraping vids, but not this one : ))

    • @Greshma123 · 3 months ago

      I’m sorry 😢 I’m not going

    • @SonicFusedWith_Goku · 2 months ago

      Bro this is crazy

    • @SonicFusedWith_Goku · 2 months ago

      I was trying to make a code to get stuff from my math homework website

  • @benjaminofurhie8178 · a month ago +4

    I have searched for scraping tutorials for the last month, but this is the BEST. Thanks so much!

  • @JoaquinRoibal · a year ago +25

    Great introduction. Clear, concise and covered related topics without being distracting. I look forward to your other videos on Python.

  • @Sivarajansam931 · 2 years ago +66

    When world needed him the most, He returned.

  • @lemonbread378 · a year ago +6

    currently planning for my computer science A level project and wanted to learn what this web scraping thingamajig was all about
    this video was an amazing introduction! simple, clear, but not overly professional
    didn't leave me feeling overwhelmed, and i'm going to watch more of your tuts now, cheers mate!

  • @sauceboss38 · 2 years ago +14

    This is exactly what I was looking for. Very concise and helpful, thank you!

  • @algj · 2 years ago +3

    This is crazy to see your videos again being recommended :o
    it has been years since I saw your last video!

  • @JccChanco · a month ago

    So far in my life, this has been the smoothest learning process I have ever experienced. Thank you kind sir!

  • @HayCorvus · 3 months ago +1

    I grew up in the early youtube days. I was enamored by the computer knowledge that I could only get from channels like Tinkernut. There really were no schools that offered nuanced coding/web lessons when I was growing up. It wasn't until I went to college and got my degree in Computer Science that I was able to build a foundation in computational theory and all sorts of other fun subjects related to computers.
    Thanks for helping me along the way on that journey, Tinker!

  • @Squid666 · 3 months ago

    I always end up back here when I need a refresher on scraping ❤ thank you!

  • @renaaaa05 · 3 days ago

    I was given a task in my internship that involved web scraping and this was very helpful, thank you!

  • @proxyscrape · a year ago +1

    I love that you used a Raspberry Pi in this tutorial. It's amazing to mess around on and do little experiments.

  • @benjaminblack8653 · 2 years ago +7

    So glad to see you posting again! I missed your videos so much. I believe my first video of yours was either How to Setup a Webserver or How to Make an Operating System. Both excellent videos!

  • @wrzq · 6 months ago

    Beautiful tutorial, exactly what I've been looking for. Thanks a lot, Man!

  • @htstube1 · a year ago +1

    great video! seems very straight forward and easy to follow. I will be trying it out in the next day or two

  • @YeshuaIsTheTruth · a year ago

    These are the kinds of programming videos we need!

  • @goodbook6865 · a year ago

    Awesome video! Short and to the point. Thank you!

  • @santiagoSosaH · 2 years ago

    wooooow, it's been years since I last saw a Tinkernut video. i think about 10 years ago i learned sql and php with your tutorial about making a webpage with users, passwords, etc.
    man, so nice to see a video of yours.

  • @Geeksmithing · 2 years ago +1

    Hey man, this is great!! Happy to see another video from ya!

  • @webslinger2011 · 2 years ago +25

    Your technological code geniusness shall be added to my own. Seriously looking for this. Thanks!

  • @TheJoyOfGaming · 2 years ago +5

    haha awesome man. I don't even do coding but couldn't resist following along just to try it! Cheers!

  • @lundebc · 2 years ago +1

    Thanks for this tutorial, Looking forward to the next part.

  • @dugumayeshitla3909 · a year ago

    One of my favorite channels for learning ... you rock

  • @kedrovasuma2857 · 2 years ago +17

    This smart man is still alive

    • @ten132 · 2 years ago

      I was about to comment the same lmao.

  • @colinbrown6629 · 2 months ago

    Amazing video to get you started with scraping, thanks!

  • @Syndesi · 2 years ago +13

    cool tutorial :D
    for more complicated data I use xpath, although its syntax is a bit weird at first.
    furthermore: validate, validate and validate your data. you do not want a program which crashes randomly, only because a value is missing, empty or malformed :)
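The validation advice above can be sketched like this; `clean_record` is a hypothetical helper, not something from the video, and it simply drops malformed rows instead of letting them crash the program later:

```python
def clean_record(quote, author):
    """Return a stripped (quote, author) pair, or None if either field is missing/empty."""
    if quote is None or author is None:
        return None
    quote, author = quote.strip(), author.strip()
    if not quote or not author:
        return None
    return (quote, author)

# Malformed rows are filtered out instead of crashing the CSV writer later.
raw = [("“Be yourself.”", "Oscar Wilde"), ("", "Nobody"), (None, "X"), ("  ", " ")]
valid = [r for r in (clean_record(q, a) for q, a in raw) if r is not None]
print(valid)  # only the well-formed record survives
```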

  • @teomanefe · 2 years ago +5

    I actually needed this!

  • @bng3832 · 2 years ago +1

    I swear to god you are the best!
    I now see why youtube doesn't recommend great videos. It's because youtube doesn't want people to study tech!!

  • @donsurlylyte · 2 years ago +1

    dude, that intro proves you have a bright future in infomercials!

  • @gamerguy9533 · 3 months ago

    Thanks! Super basic but it was what I needed to make my code start working!

  • @mudasir2168 · a year ago

    Awesome stuff.....much appreciated!

  • @pulp6667 · 2 years ago

    Thank you for this video I created another scraper for eth, it's rough but it's my first and I am so happy

  • @mrklean0292 · 3 months ago

    Man... I've seen other web scraping tutorials, and they take you ten miles down the road and throw all types of advanced garbage at you. Granted, I know what you have shown here is the quick and easy way, but that's all I wanted: an understanding of what it is and how it basically works. Thank you.

  • @NasimKhan-tk3ij · 11 months ago

    Overall, I highly recommend this video to anyone who is interested in learning Python. It is a comprehensive and informative resource that will teach you what you need to know to get started with this powerful programming language.

  • @InspiredInsights4U · 2 years ago +4

    A survey businessman could use web scraping to scrape a competitor's website for product pricing (including product numbers, photos, and prices), then use this to monitor their price changes and/or adjust his own prices to stay just a slight bit more competitive.

  • @Code___Play · 4 months ago

    Very practical and helpful video with very detailed explanation!

  • @craftedpixel · 2 years ago +2

    The legend is back!

  • @Corkyjett · 2 years ago

    this tutorial was great!! thank you!

  • @liamhughes7093 · a year ago

    Great video. With the phrase "web scraper", I can't help but picture a function that returns a digital box chevy with candy paint, 26" chrome rims, tinted windows, and triple 15" subs in the trunk with some Too $hort going. I hope someone else from Northern California is thinking the same thing, and cracks up seeing this.
    But thank you for your fantastic educational video! cheers.

  • @arjunaudupi7956 · 2 years ago +4

    @tinkernut you are the reason for me being a software developer..
    Thanks dude. Keep up the good work..

  • @thecryptocheckpoint5083

    Wow, really great production. Lots of history and info

  • @KowboyUSA · 2 years ago +2

    Just the inexpensive project I needed.

  • @deepvoyager01 · 5 months ago

    Thank you for the video
    it helped me understand how a scraper works

  • @kenjohnsiosan9707 · a year ago

    it's a coincidence that I have a task to scrape data and format it to CSV then send it to email. thank you for this tutorial, sir.

  • @jackschwabe4929 · 10 months ago

    great video. very easy to implement and understand

  • @Warkeds · 2 years ago

    This channel is awesome!!

  • @harrystone7954 · 2 years ago

    very logical and understandable explanation

  • @CareerHubSpot · a year ago

    Concise and precise

  • @mmuneebahmed · 2 years ago +2

    Thanks for sharing the expertise! However, I get the following error when running the code.
    writer.writerow([quote.text, author.text])
    UnicodeEncodeError: 'latin-1' codec can't encode character '\u201c' in position 0: ordinal not in range(256)
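That UnicodeEncodeError comes from writing a curly quotation mark (U+201C) through a file handle opened with a narrow default codec. A commonly suggested fix is to open the file with an explicit utf-8 encoding; the sketch below uses a temp-file path as a placeholder:

```python
import csv
import os
import tempfile

rows = [("“Be yourself; everyone else is already taken.”", "Oscar Wilde")]

# encoding='utf-8' lets the curly quotes through; newline='' is what the
# csv module's docs recommend, to avoid stray blank lines on Windows.
path = os.path.join(tempfile.gettempdir(), "scraped_quotes.csv")
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["QUOTES", "AUTHORS"])
    writer.writerows(rows)
```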

  • @user-vz7ff8ps8k · 8 months ago

    Thanks a lot for this clear video! How would I retrieve more information associated with the quote? For instance I would like to receive and print both the author and the associated tags.
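One way to keep the author and tags associated with each quote is to iterate over each quote's container div instead of collecting quotes and authors in separate lists. A sketch, assuming the class names used on quotes.toscrape.com (`quote`, `text`, `author`, `tag`) and an inline sample standing in for the fetched page:

```python
from bs4 import BeautifulSoup

# Inline sample mirroring one quote block on quotes.toscrape.com.
html = """
<div class="quote">
  <span class="text">“Truth can never be told so as to be understood.”</span>
  <small class="author">William Blake</small>
  <div class="tags"><a class="tag">truth</a> <a class="tag">understanding</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
# Iterating per container keeps text, author and tags grouped correctly.
for block in soup.find_all("div", class_="quote"):
    text = block.find("span", class_="text").text
    author = block.find("small", class_="author").text
    tags = [a.text for a in block.find_all("a", class_="tag")]
    print(text, "|", author, "|", ", ".join(tags))
```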

  • @myriadtechrepair1191 · 2 years ago +6

    Our lord has returned.

  • @RodWorldTours-fo6mh · 7 months ago

    Most well earned subscriber ever

  • @redentorg.bucalingjr.6320 · a month ago

    Very nice presentation...

  • @NitishKumarIndia · 11 months ago

    I love this man

  • @KontrolStyle · a year ago +1

    well explained, ty

  • @desecrated.eviscerated · 8 months ago +3

    if you get an error, try replacing the file-open line with: file = open('scrapped_quotes.csv', 'w', encoding='utf-8', newline='')

  • @fearlessAx · a year ago +3

    Hey, I'm getting "NameError: name 'page_to_scrape' is not defined"

  • @Raxer_th · 2 years ago +8

    This channel used to get like 100k views. Now it's down to just less than 10k. Idk why. When I was around 13, I wanted to make an fps game and found his video to be very interesting. I've followed this channel since then. Tinkernut was the reason I started learning programming, after watching his HTML tutorial (create a website from scratch). Even though I neither have a com-sci degree nor work as a programmer, I'm still learning python during my free time. Thank you Daniel.

  • @lucasn0tch · 2 years ago +3

    Long time no see.
    This may be useful for tracking stock for a PS5/Xbox/Switch/GPU in these times.

    • @JoaoPedro-ki7ct · 2 years ago

      Even a Switch is being scalped?
      I heard about PS5, Xbox Series X|S, GPUs but not about the Switch itself.

  • @JayD-jn9or · 3 months ago

    Thanks for the vid! After a VERY VERY long time i'm getting back into casual coding and looking to casually make some scraping info programs for games, with the option to select which info the person wants to see.
    So if the site allows scraping, would it be better to have my app-in-progress be independent, with checks done once a minute or every five minutes? Or have the info scraped, processed and posted on a site i create, and retrieved for ppl using the app? That is, if i start sharing the app. My concern is annoying the site owners by checking too often. Forgive me if it's a silly question, i'm not experienced with scraping.

  • @nikitadorosh244 · 4 months ago

    Nice stuff, X.

    • @RobloxPrompt · 3 months ago

      Yeah, I thought it was very nice too. I use visual studio and found it very helpful, since I was able to use python, install the pips for python via the command prompt, then use visual studio code. My primary application would be finding different sites from a website. Would be interesting for finding src's and href's. Nice name btw, I like the commonality of it.

  • @slankk · 2 years ago

    What a great video

  • @codingmaster24 · 2 years ago +1

    Best youtuber.

  • @sagarnewpane8549 · 2 years ago +4

    I need more content on the Raspberry Pi Pico !!

  • @OtherDalfite · 2 years ago +2

    Halloween intro? At the end of November? This video's been a while in the making, huh?😂

  • @serhiyranush4420 · 2 years ago +1

    Great explanation. Simple and to the point. Had to look up, though, what the zip function did, but, I guess, it's even better that I had to find it out on my own.
    However, the quotation marks are not saved right in the csv file; instead, they show as 3 weird characters. They do display correctly in Thonny, though.
    Also, the authors are not put into a separate column, but in the same one as the quote.
    Also, a quote with a semicolon in it got broken at the semicolon into two parts, and the second part was placed into a separate column.
    Also, in the csv file open I had to put encoding = "utf-8" after the "w", because I was getting an encoding error. Could this somehow be causing the above problems?
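For what it's worth, two things commonly cause those symptoms: viewing a utf-8 file as latin-1 (the "3 weird characters" are the curly quote's utf-8 bytes) and writing rows by hand rather than through csv.writer, which quotes any field containing the delimiter. A sketch of the csv.writer behaviour, using an in-memory buffer instead of a real file:

```python
import csv
import io

rows = [
    ("“A quote; with a semicolon”", "Some Author"),
    ("“Another, with a comma”", "Other Author"),
]

# csv.writer wraps fields containing the delimiter in quotes, so commas
# and semicolons inside a quote no longer spill into extra columns.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerows(rows)

# Reading it back shows each row still has exactly two columns.
for row in csv.reader(buf.getvalue().splitlines()):
    print(len(row), row)
```

If the file is destined for Excel, opening it with encoding='utf-8-sig' also helps, since the BOM tells Excel which encoding to use.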

    • @kaiperdaens7670 · 6 months ago +1

      same problems here (except the third). I'm happy that it isn't just me, but I don't know how to fix them because I'm new to this.

  • @ahoj113 · 2 years ago +1

    Cool!

  • @ArqitectTV · a year ago +1

    What if the data you are searching for is obtainable but is on separate pages within a given site.

  • @DrDre001 · 2 years ago

    Nice! I need to learn Python

  • @dillkhalifa · 6 months ago

    you owe me bro. i just subscribed to your channel😂😂

  • @nikro7239 · 5 months ago

    when I write to the csv file, for some reason there is always one blank row (with literally nothing) between the actual rows of data
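The csv module's documentation says to open the file with newline='' — without it, on Windows each '\r\n' row terminator gets translated again, which produces exactly that empty row between records. A sketch with a temp-file path as a placeholder:

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.gettempdir(), "quotes_no_blank_rows.csv")

# newline='' stops open() from translating the csv module's '\r\n' row
# endings a second time, which is what creates the blank rows.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["quote", "author"])
    writer.writerow(["“Hi”", "Someone"])

with open(path, encoding="utf-8", newline="") as f:
    print(sum(1 for _ in csv.reader(f)))  # 2 rows, no blanks
```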

  • @Jean_villegas · a month ago

    Thanks

  • @RigzoTV · 2 years ago +2

    Need more advanced lessons on scraping.

  • @HayaBaqir · 7 months ago

    What are the pips we need to install?

  • @InvinsableNoob · 2 years ago

    The avatar has returned 🙌

  • @elisabeth9626 · a year ago

    Thank you very much ❤

  • @vik237 · 2 years ago

    what Raspberry Pi do you use?

  • @DTMPro · 2 years ago +13

    Where can we find out if we are allowed to scrape data from a specific website, so that we don't eventually end up in trouble?
    Does the scraping code/process work the same way for scraping product prices, e.g. trying to replicate camel for amazon, or does that take additional authorization from amazon?

    • @Tinkernut · 2 years ago +13

      Excellent question! All popular websites have a scraping/crawling text file called "robots.txt". This tells what can and can't be scraped from a website. Here is an example of Amazon's robots.txt file (spoiler, you can't scrape much) www.amazon.com/robots.txt
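Python's standard library can read those rules for you: urllib.robotparser answers "may I fetch this URL?" directly. A sketch using an inline robots.txt body (in practice you would point set_url at the site's real /robots.txt and call read()):

```python
from urllib.robotparser import RobotFileParser

# An inline robots.txt body; against a live site you would instead do:
#   rp.set_url("https://example.com/robots.txt"); rp.read()
rules = """\
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# can_fetch(user_agent, url) applies the Disallow/Allow rules for you.
print(rp.can_fetch("*", "https://example.com/private/secret.html"))  # False
print(rp.can_fetch("*", "https://example.com/quotes.html"))          # True
```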

    • @jimavictor6022 · 2 years ago +1

      @@Tinkernut what about those non-popular websites with no robots.txt file?

    • @JoaoPedro-ki7ct · 2 years ago +2

      @@jimavictor6022 As long as you don't scrape things like other people's documents from governmental sites or usernames plus passwords, you should be fine with the rest.
      What website owners are really worried about is their website's availability (whether it is online or offline) and bandwidth usage, as they pay for each gigabyte they send to and receive from users.
      So as long as you don't consciously/unconsciously take down their site you're fine.

    • @JoaoPedro-ki7ct · 2 years ago +3

      @@jimavictor6022 On top of that, they have their automated ways to detect bots. The worst that can happen is getting your IP "banned" or simply restricted from viewing their webpages, and that will happen way, way, way... before you get sued by them.

    • @jimavictor6022 · 2 years ago +2

      @@JoaoPedro-ki7ct I really appreciate the reply. Thank you..

  • @jenschristiannrgaard4878 · 7 months ago

    how much more difficult is it if I want all sub-pages where you would normally find more information?

  • @Mcmiddies · 2 years ago

    Hey Tinkernut. Welcome back to my feed.

  • @santoshpandey23 · 5 months ago

    Thanks, this was very good. Can you share any link where you have done the same for a website which requires a username and password? Thanks a ton

  • @kyrianrahimatulla1561 · 2 years ago

    I had no clue it was this easy, but how do I find out which websites I'm not allowed to scrape? All I get from Google is ways to prevent scraping on my own website (which I don't have, but that's beside the point).

  • @DroidEagle · 2 years ago +2

    dude where were u?

  • @ura9390 · a year ago

    Can you do one for people who never used code?

  • @flobbie87 · 2 years ago

    Last time i did something like that i used a line mode browser to flatten the webpage.

  • @angeloj.willems4362 · 2 years ago

    Cool goggles, where can I get a pair?

  • @Autoscraping · 6 months ago

    An extraordinary piece of video material that has proven highly useful for our new team members. Your generosity is immensely appreciated!

  • @almutabbil-jn2pt · a month ago

    The code didn't create any csv file, although I didn't get any error! Why is that?

  • @durrium · 2 years ago

    What do i do if the page gives 404 ???

  • @lolkek6807 · 5 months ago

    what if I want just the first quote, not all of them?
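BeautifulSoup's find() returns only the first match, where findAll()/find_all() returns every match; alternatively, index the full list. A small sketch:

```python
from bs4 import BeautifulSoup

html = '<span class="text">“First”</span><span class="text">“Second”</span>'
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first matching tag...
print(soup.find("span", class_="text").text)          # “First”
# ...which is equivalent to taking element 0 of the full list.
print(soup.find_all("span", class_="text")[0].text)   # “First”
```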

  • @Pixilmb12 · 8 months ago

    I use IDLE, but for some reason the 'soup.findAll' call says "NameError: name 'soup' is not defined" :(
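A NameError on soup usually means the line that creates it never ran, or has a typo: every findAll/find_all call depends on the setup lines above it. A minimal sketch, with inline HTML standing in for the fetched page:

```python
from bs4 import BeautifulSoup

# 'soup' must be assigned before any soup.findAll(...) call runs.
# With a live page you would first do:
#   import requests
#   html = requests.get("http://quotes.toscrape.com").text
html = '<span class="text">“Hello”</span>'
soup = BeautifulSoup(html, "html.parser")

print(soup.find_all("span", class_="text")[0].text)  # “Hello”
```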

  • @royalhermit · 2 years ago +1

    What is line 10 "w"? I am getting NameError: name 'scraped_quotes' is not defined

    • @ashrude1071 · 2 years ago +1

      You probably have a typo

    • @Tinkernut · 2 years ago +2

      Running it with my code from github works fine github.com/gigafide/basic_python_scraping/blob/main/basic_scrape_csv_export.py

  • @martinrages · 2 years ago +1

    Can websites detect scraping? If so, how do i escape the dutch AIVD

    • @JoaoPedro-ki7ct · 2 years ago

      Yes, they have their ways to detect automated requests, but what they do when they detect "bots" is up to each website.

    • @LiEnby · 2 years ago +1

      yes and no. you can check for things like the user agent string or try to run javascript or something like that, however it's actually a really hard problem to solve because a scraping script can look indistinguishable from a browser..

  • @hussainmahady5295 · 2 years ago +1

    Awesome 🔥 bro. Can you make a tutorial about tunnelling and vpns

    • @Tinkernut · 2 years ago

      Sure can! I made them both a few years ago ;-) Just search my channel

  • @reghawkins73 · a year ago +1

    I had to add encoding to the line--- file = open("scraped_quotes.csv", "w", encoding='utf-8')

  • @ejonesss · 2 years ago

    how can a web site ban scraping, since once the data is downloaded it is open for the taking?
    unless the scraping script acts as a browser and they can figure it out based on user agents or the lack thereof.
    in which case, could you intercept the data from the html source in the browser, so it is as if you saved the page as an html file, ran it through the script, then refreshed the page and repeated?

    • @LiEnby · 2 years ago

      technically speaking, there is basically no way to stop it, besides maybe recaptcha, but even then you can simply just have a human do the captcha

    • @pakistaniraveasylum1396 · 2 years ago

      Law

    • @LiEnby · 2 years ago

      @@pakistaniraveasylum1396 it's never even been tried in a court tbh

    • @linuxramblingproductions8554 · a year ago

      @@pakistaniraveasylum1396 that's like trying to make inspect element illegal, it just doesn't work

    • @pakistaniraveasylum1396 · a year ago

      @@linuxramblingproductions8554 yea the law and bureaucracy in general is retarded

  • @mrmxyzptlk8175 · a year ago +2

    Error: "No module named bs4"

    • @recursion. · 11 months ago +1

      Facing the same, were you able to fix it?
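"No module named bs4" means BeautifulSoup isn't installed for the interpreter that runs the script. The usual fix is installing it with pip; note the PyPI package is named beautifulsoup4 even though it imports as bs4:

```shell
# Install BeautifulSoup for the same Python that runs the script.
python -m pip install beautifulsoup4

# Verify the import now works.
python -c "import bs4; print(bs4.__name__)"
```

If you have several Pythons installed, run pip through the exact interpreter you use (e.g. the one configured in Thonny), otherwise the module lands in the wrong environment.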

  • @AllanYacaman · 27 days ago

    this seems so refreshing? Why did he stop uploading?

  • @DarthJeep · 2 years ago

    Davy504 fan? "Scrape it..." Just kinda reminded me of the ol' "SLAP IT!" line. lol

  • @havenurmom5375 · 18 days ago

    this is entertaining the first thirty seconds lol