The Rvest & RSelenium Tutorial - Web Scrape Dynamic Tables in R

  • Added 19. 08. 2024

Comments • 51

  • @imfm 1 year ago +2

    I need to automate pulling data from several websites with atrocious autogenerated spaghetti code. I was trying with Rvest alone and httr and other solutions. I was getting nowhere fast. Then I found this video and boom, I'm in. I can't thank you enough Samer.

  • 11 months ago

    Very well explained. I didn't know about {RSelenium}, looks really powerful. Thanks!

  • @delabungsu6817 2 years ago

    Thank you Samer.

  • @user-oy6vj6bz9z 1 year ago

    Many thanks. Great explanation, super clear!

  • @AngelFelizF 1 year ago

    Great video, thanks for sharing

  • @MrNachtduiker 2 years ago

    awesome, thanks

  • @respanol1970 1 year ago

    Amazing!!!

  • @tarasst6887 1 year ago

    Great!!!!

  • @haraldurkarlsson1147 1 year ago

    Neat. I was struggling with some dataset (tiny one) that has commas.

  • @arunrajesh5137 1 year ago +1

    Watching this tutorial immediately after your Introduction to RSelenium. Really enjoyed learning it from you, Samer. How do we navigate to a webpage that requires a username and password with RSelenium?

    • @SamerHijjazi 1 year ago

      Thank you, Arun! You can do so by identifying the username and password input boxes and sending the username and password to those boxes using the sendKeysToElement function from RSelenium.

    • @arunrajesh5137 1 year ago

      @SamerHijjazi thank you so much...
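
      A minimal sketch of the sendKeysToElement approach described in the reply above. The function name `log_in`, the URL, and both CSS selectors are hypothetical placeholders, and `remDr` is assumed to be an RSelenium client that is already open; inspect the real login form to find the right selectors.

      ```r
      # `remDr` is assumed to be an open RSelenium client, e.g. rsDriver(...)$client.
      # The selectors below are placeholders, not taken from any real site.
      log_in <- function(remDr, url, username, password) {
        remDr$navigate(url)

        # Locate the username box and type into it
        user_box <- remDr$findElement(using = "css selector", "input[name='username']")
        user_box$sendKeysToElement(list(username))

        # Locate the password box, type the password, then press Enter to submit
        pass_box <- remDr$findElement(using = "css selector", "input[name='password']")
        pass_box$sendKeysToElement(list(password, key = "enter"))

        invisible(remDr)
      }
      ```

      Usage would be something like `log_in(remDr, "https://example.com/login", "me", "secret")`. Some sites need a click on a separate submit button instead of the Enter key.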

  • @sarahsuzz 5 days ago

    I keep getting an error "element not found" when using xpath to locate my "nextpage" button - it is an aria-label and it's located in the div section of the DOM - not sure what I am doing wrong. I have checked my code for typos, very carefully. Can you help?

    • @sarahsuzz 5 days ago

      I found my issue - the element with my aria-label was not an "a" tag, it was a button
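
      The fix above comes down to which tag the XPath names. A small sketch, assuming `remDr` is an open RSelenium client; the helper names and the "Next page" label are hypothetical. Using `*` as the tag matches any element type, which sidesteps the a-versus-button mix-up.

      ```r
      # Build an XPath that matches on aria-label; tag = "*" matches any element type.
      aria_xpath <- function(label, tag = "*") {
        sprintf("//%s[@aria-label='%s']", tag, label)
      }

      # `remDr` is assumed to be an open RSelenium client.
      find_by_aria_label <- function(remDr, label, tag = "*") {
        remDr$findElement(using = "xpath", aria_xpath(label, tag))
      }

      aria_xpath("Next page", tag = "button")  # XPath restricted to <button> elements
      aria_xpath("Next page")                  # XPath that accepts any tag
      ```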

  • @huongheidinguyen337 1 year ago +1

    Thank you for the tutorial. I'm practicing scraping Sephora product reviews and ran into a problem. On my last page, there is still a Next page button (it is just disabled), so there was no error and my Next-page loop didn't end. Do you have any suggestions on how to end the loop in this case?

    • @SamerHijjazi 1 year ago

      If there is a way for you to determine how many pages there are, you can set that as your limit in the loop so that it does not go over that number.
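
      One way the page-limit idea above could be sketched. `scrape_pages`, `max_pages`, and the XPath are hypothetical, and `remDr` is assumed to be an open RSelenium client already on the first page. The `isElementEnabled()` check is an extra guard that is not part of the reply; it only helps if the site marks the button with a real `disabled` attribute rather than a CSS class.

      ```r
      # `remDr` is assumed to be an open RSelenium client sitting on page 1.
      scrape_pages <- function(remDr, max_pages,
                               next_xpath = "//button[@aria-label='Next page']",
                               pause = 2) {
        pages <- list()
        for (i in seq_len(max_pages)) {
          # Store the HTML of the current page
          pages[[i]] <- remDr$getPageSource()[[1]]

          # Hard limit: never click past the known number of pages
          if (i == max_pages) break

          next_btn <- remDr$findElement(using = "xpath", next_xpath)

          # Extra guard (assumption): stop early if the button reports as disabled
          if (!isTRUE(next_btn$isElementEnabled()[[1]])) break

          next_btn$clickElement()
          Sys.sleep(pause)  # give the next page time to load
        }
        pages
      }
      ```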

  • @user-qp3tb7gi1d 7 months ago

    Can you help please? Error in checkError(res) :
    Undefined error in httr call. httr output: Failed to connect to localhost port 4567 after 2254 ms: Connection refused
    What could be the problem?

  • @devypratiwi8103 8 months ago

    Hello, thanks for sharing the video!
    I've watched and followed all the steps, but I get an error saying
    Error in java_check() :
    PATH to JAVA not found. Please check JAVA is installed.
    What confuses me is that I've already installed Java completely, but the error keeps saying that JAVA is not found. Do you know how to solve this issue? Thank you.

  • @user-nu9tv5wo3x 8 months ago

    Hi,
    Thank you so much for this. I am not that big on coding and this solution is really easy to follow. Excuse me if I am being too dumb. I ran into a problem when you refer to the pagination command at 5:25 using the aria label. I am trying to scrape a transfermarkt table and that field looks pretty different for me:

    As you can see, it's an href and not an aria label. There is a link to the next page on every page and I do not know how to iterate over this. It works fine if I want to do the first two pages, but then it obviously stops working. Could you maybe help me out with what I should copy and paste into the findElement function? Or is this a whole different situation and I have to do something new? Thank you for your help in advance :)

  • @user-mw7do5sr6x 7 months ago

    How do I set up the server for the Firefox browser?

  • @yehitzmedapirc 1 year ago

    Hi! What can I do if my "Next button" is different every time?
    I do not have a "next" button, I have to click on the 1, then 2 etc. on the page.
    Thanks!

    • @SamerHijjazi 1 year ago

      Try to see if the different next buttons have a similar attribute that you can use.

  • @celmywall 1 year ago

    Thank you for your extraordinary tutorial. I'd like to have your opinion on this error:
    Error in rbindlist(list(all_data, df)) :
    Column 1 of item 2 is length 3 inconsistent with column 2 which is length 4. Only length-1 columns are recycled.
    Thank you so much.
    Hey, I solved the error easily. Thanks anyways.

  • @shoakromyusupov7297 1 year ago

    Really helpful video. I would like to ask if you can make a similar video on scraping data from social media sites like Instagram or LinkedIn, or another site of your choice?

    • @SamerHijjazi 1 year ago

      Thank you! I don't think I will. LinkedIn is very difficult to scrape (plus they can close your account for it), and Instagram has its own API.

  • @cameronl1434 1 year ago

    Sorry, I am very much a beginner with all this, so sorry if this is a stupid question. I have a data table which I want to extract the information from, but when I inspect the code it doesn't have an ID. How can I go about selecting the data table without an ID? Thank you in advance

    • @zahrarahmati8612 1 year ago

      Hello Samer, I have exactly the same problem. Would you please help with this?

    • @SamerHijjazi 1 year ago

      Not a stupid question at all! Try using a different attribute to identify your table by.
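
      To illustrate the "different attribute" suggestion above, here is a small rvest sketch on an inline page. The HTML, the class name `stats`, and the variable names are made up for the example; on a real site the page source would come from `remDr$getPageSource()[[1]]` passed to `read_html()`.

      ```r
      library(rvest)

      # A made-up page with two tables, neither of which has an id
      page <- minimal_html('
        <table class="nav"><tr><td>menu</td></tr></table>
        <table class="stats">
          <tr><th>player</th><th>goals</th></tr>
          <tr><td>A</td><td>3</td></tr>
          <tr><td>B</td><td>5</td></tr>
        </table>
      ')

      # Option 1: select the table by its class attribute
      by_class <- page %>% html_element("table.stats") %>% html_table()

      # Option 2: select the table by its position among all tables on the page
      tables <- html_elements(page, "table")
      by_position <- html_table(tables[[2]])
      ```

      Selecting by class or another attribute tends to survive layout changes better than selecting by position.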

  • @jasonnunez6066 1 year ago

    Would it be possible to hop on a Zoom call for help with a scraping project? I would really appreciate it

    • @SamerHijjazi 1 year ago

      I'm currently not offering that. But I might be in the future :)

  • @haraldurkarlsson1147 1 year ago

    Don't we have to check whether the site allows scraping first?

    • @SamerHijjazi 1 year ago

      Sure! This is only for demonstration purposes. But it's good practice to check first.

  • @eleonoras.2878 1 year ago

    Thank you very much for providing such a great explanation! I've encountered an issue in that I'm only seeing a limited selection of chromedriver versions. Unfortunately, none of these versions seem to be compatible with my current Google Chrome version. Would you by any chance have any suggestions on how I might go about resolving this problem? Your insights would be greatly appreciated. :)

    • @SamerHijjazi 1 year ago +1

      Thank you for the great feedback! I would suggest running the wdman::selenium function, which will download the latest drivers. Then when you run rsDriver, refer to the chromedriver version that corresponds to yours.

    • @eleonoras.2878 1 year ago

      @SamerHijjazi I appreciate your response and assistance. Thank you very much. :)

    • @SamerHijjazi 1 year ago

      @eleonoras.2878 My latest Selenium video might actually be able to solve your issue. czcams.com/video/BnY4PZyL9cg/video.htmlsi=RP74unOe8SvxWvPV
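
      A rough sketch of the wdman::selenium suggestion from earlier in this thread. The wrapper name `start_chrome` and the version string in the usage note are hypothetical; pick a version that `binman::list_versions()` actually prints on your machine and that matches your installed Chrome. The `retcommand = TRUE` behaviour described in the comment is my understanding of wdman, so treat it as an assumption.

      ```r
      library(RSelenium)

      start_chrome <- function(chromever, port = 4567L) {
        # Assumption: with retcommand = TRUE, wdman still runs its driver
        # check/download step but returns the launch command instead of
        # starting a server.
        wdman::selenium(retcommand = TRUE)

        # Show which chromedriver versions are now available locally
        print(binman::list_versions("chromedriver"))

        # Start the driver with the version that corresponds to the installed Chrome
        rsDriver(browser = "chrome", chromever = chromever, port = port)
      }
      ```

      Usage would be along the lines of `rs <- start_chrome("114.0.5735.90")` followed by `remDr <- rs$client`. Note the `L` suffix on the port, which makes it an integer.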

  • @ahmed007Jaber 2 years ago

    Thank you for this.
    I am getting the error below
    Error in java_check() :
    PATH to JAVA not found. Please check JAVA is installed.
    whenever I run
    rs_driver_object

    • @SamerHijjazi 2 years ago

      You need to make sure the JDK is properly installed on your machine. If you're on a Windows machine, this tutorial is useful: czcams.com/video/IJ-PJbvJBGs/video.html

  • @retobunzli2088 2 years ago +1

    Hey Samer, love the tutorial but ran into an issue I couldn't resolve yet. I am using RSelenium to click on a tab that contains the data I want, which works fine if I run the lines of code one after the other, but not in a for loop. I have a list of links the loop should iterate through. On some tries it didn't even click the tab for the first list item, other times it stopped after just a couple. After just adding a bunch of clickElement() commands it worked for a bit longer (but not in proportion to the number of commands added) and then stopped again. Any idea how to make it run more reliably? My R memory usage is kinda high, could it be due to that? I am a total noob at R, but it's confusing that it works manually but not in the loop.
    Edit: Also, the netstat free_port function always gives me an 'Error in strsplit(local, ":") : non-character argument'. I wrote it exactly as you have, so no idea why it doesn't work. If I define a port manually (e.g. 14415 or '14415') it says 'Error: port should be an integer value'. My knowledge of maths might be limited, but last time I checked 14415 was an integer lol

    • @SamerHijjazi 2 years ago +1

      Thank you for the great feedback. I'd have to look at your code to be able to see what's going wrong.

    • @retobunzli2088 2 years ago

      @SamerHijjazi Thanks for the quick response. I thought it might be a common or known issue. I posted the code in a reddit thread titled "Impossible to run RSelenium's clickElement() in a loop??" 6 days ago.
      Only if you have time and interest tho, don't wanna force you to look at my spaghetti code haha

    • @SamerHijjazi 2 years ago

      @retobunzli2088 Can't find it. Looks like the post was removed. Can you reply to this comment with your loop?

    • @retobunzli2088 2 years ago

      @SamerHijjazi Yeah, just saw it did get removed. The loop looks like this:
      for (link in links) {
      remDr$navigate(link)
      object = remDr$findElement(...)
      results_object$clickElement() issue here (?)
      table i need = remDr$findElement(...)
      same table html = (...)$getPageSource()
      and so on, exactly like you did in the video. It worked line by line, which means the CSS selectors should be fine, just that the click command doesn't reliably execute. Since the code above probably doesn't help much: the site is (google) 'iaaf 100m times men'. For every athlete I want to go to their profile, click the results tab (this is where it fails randomly) where all the 100m times from the current season are listed, and then extract these values via the HTML table (or similar). The links seem to be correct too, just something about the dynamic nature of the specific site confuses clickElement()

    • @SamerHijjazi 2 years ago

      @retobunzli2088 My guess is your loop is running too quickly, hence when it gets to the clickElement part, it's not able to locate the element because the web page is still loading. I would suggest you include a small break in your loop to create a pause long enough for the site to load. You can do so by using the Sys.sleep function.
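
      A sketch of the Sys.sleep suggestion above applied to a loop over links. `visit_and_click`, `tab_selector`, and the two-second default are hypothetical; `remDr` is assumed to be an open RSelenium client, and the right pause length depends on how slowly the site loads.

      ```r
      # `remDr` is assumed to be an open RSelenium client.
      visit_and_click <- function(remDr, links, tab_selector, pause = 2) {
        sources <- vector("list", length(links))
        for (i in seq_along(links)) {
          remDr$navigate(links[[i]])
          Sys.sleep(pause)  # wait for the page to load before looking for the tab

          tab <- remDr$findElement(using = "css selector", tab_selector)
          tab$clickElement()
          Sys.sleep(pause)  # wait for the tab content to render

          # Keep the page HTML so the table can be parsed afterwards
          sources[[i]] <- remDr$getPageSource()[[1]]
        }
        sources
      }
      ```

      A fixed pause is the simplest option. If pages load at very different speeds, retrying `findElement` inside `tryCatch` until it succeeds is a common alternative.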

  • @MohammadMohammad-mj6pc

    👌👌👌. Can you create a video tutorial for the chromote package?

    • @SamerHijjazi 2 years ago +1

      This is a good idea! I'd like to explore the package.

  • @glanegons 2 years ago

    Too good mate, is it possible to share your code? Thanks

    • @SamerHijjazi 2 years ago

      Thank you for your feedback. I've added the link to the code in the description. :)