Web Scraping Using PHP - Parse IMDB.com Movies HTML

  • Added 8 Sep 2024

Comments • 76

  • @clevertechie
    @clevertechie  7 years ago +8

    Download source code here:
    clevertechie.com/php/97/web-scraping-php-parse-imdb.com-movies-html

    • @hanesmitter1469
      @hanesmitter1469 5 years ago +2

      THAT code is not working for me

    • @8ack2Lobby
      @8ack2Lobby 4 years ago +5

      @hanesmitter1469 you can use mine, here it is, working fine (Note: changing the page number does not do anything, so you need to change the start parameter's value when you want to go to the next page)

    • @ItsAlkhanza
      @ItsAlkhanza 3 years ago +1

      @8ack2Lobby Thanks for the code

    • @user-jm6hu4gz7x
      @user-jm6hu4gz7x 3 years ago

      @8ack2Lobby can you put it on GitHub?

    • @ShawnRitch
      @ShawnRitch 11 months ago +1

      @8ack2Lobby Thank you sir!!! Still works after all these years. Gotta love PHP :)

  • @tanjimrahat9674
    @tanjimrahat9674 3 years ago

    I feel this channel should have more subscribers than it has. Thumbs up!

  • @anthonymcevans8191
    @anthonymcevans8191 6 years ago +7

    The most useful content I’ve ever ever seen.
    I definitely will subscribe.

  • @chesterleespencer5808
    @chesterleespencer5808 2 years ago +1

    It's Enlightening, thank you so much...

  • @BorislavJordanov
    @BorislavJordanov 4 years ago +1

    You just earned a sub! Love the clarity of your voice. Thanks for all your explanations!

  • @amadeuszdobies9399
    @amadeuszdobies9399 7 years ago +5

    Regular expressions are always a pain in the ass haha :D

  • @groovykeyz9262
    @groovykeyz9262 7 years ago +2

    When I view the page source I get a blank page, and I don't know why. I also get a blank page when I run parse_imdb.php, and whenever I call scrape_imdb(2000,2000,76,76); it loads endlessly.

  • @sudhagars641
    @sudhagars641 3 years ago

    Nice tutorial, Thanks sir

  • @redblue7733
    @redblue7733 5 years ago +1

    Thanks for the video and your time. Finally one good YT author.

  • @primepryme
    @primepryme 6 years ago +1

    I do love your way of explaining things

  • @mehranehsandoost2799
    @mehranehsandoost2799 7 years ago +1

    I have a problem with some pages.
    When I use:
    $ch = curl_init();
    curl_setopt($ch,CURLOPT_URL,$url);
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,True);
    $res = curl_exec($ch);
    echo $res;
    it shows a page like �w�Z�Ѹt���� d
    what should I do to solve this problem?
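
One possible cause, offered as a guess rather than a diagnosis: the server may be sending a compressed (gzip or deflate) body, which prints as unreadable bytes unless cURL decodes it. The sketch below adds `CURLOPT_ENCODING` to the options from the comment above. The helper names `build_fetch_options` and `fetch_page` are made up for illustration.

```php
<?php
// Sketch only: assumes the unreadable output is a compressed response body.

// Hypothetical helper: the options from the comment plus automatic decoding.
function build_fetch_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        // An empty string advertises every encoding libcurl supports
        // and makes cURL decompress the body before returning it.
        CURLOPT_ENCODING       => '',
    ];
}

// Hypothetical helper: fetches a page and returns its decoded body.
function fetch_page(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_fetch_options($url));
    $res = curl_exec($ch);
    // curl_exec() returns false on failure; normalise that to an empty string.
    return $res === false ? '' : $res;
}
```

Calling `echo fetch_page($url);` would then print readable HTML if compression was the issue. If the output is still garbled, the cause is something else, such as a character-set mismatch.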

  • @andrewmusholt9327
    @andrewmusholt9327 5 years ago +1

    Fun and useful skill!!! Thank you

  • @alexlopusy3744
    @alexlopusy3744 5 years ago +1

    Thanks so much for your tutorial sir, it helps a lot

  • @terminatortutorials
    @terminatortutorials 7 years ago

    I've never seen an attempt to parse XML using a regex that won't break on some content. The real trouble is nested tags, which are very difficult to handle with regular expressions. If you want to find a very specific pattern in a (ht|x)ml file, go on, regex is perfect for that. But if you are searching for something in every Foo tag, which could have attributes in different orders, be nested, or be malformed (and still valid), then use a parser, because that's not pattern matching anymore.

    • @zsoltoroszlany7172
      @zsoltoroszlany7172 6 years ago

      Arslan Hajdarevic C# XPath is a nightmare, it is even worse. Basically everything would be fine if (x)html were valid.

  • @darylzhang3124
    @darylzhang3124 6 years ago +1

    Could you explain what is the difference between starting with ! and starting with / when I use the regular expression functions of PHP?
    Because I used preg_match_all('/regular expression/',$result,$matcg) and it worked except the regular expression, '/Directors?:
    (.*?)
    /s',used for obtaining the names of directors.
    Thank you.

    • @zsoltoroszlany7172
      @zsoltoroszlany7172 6 years ago +1

      Daryl Zhang Think of the ! as a quote: your regex sits in between. Since you used ! as the delimiter, you have to escape any exclamation mark inside your regex; without that, the regex parser will think your regex ends there. For example, you can't do something like !.+?(.+?)!! because you have to escape it as \! first.
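
A minimal sketch of the delimiter rule described in this reply. The sample markup and variable names are made up for illustration, because the patterns quoted in the thread lost their HTML tags when the page was extracted.

```php
<?php
// Made-up sample markup for illustration only.
$html = '<a href="/title/tt0000001/">First!</a>';

// With "/" as the delimiter, every literal slash in the pattern must be escaped.
preg_match('/<a href="\/title\/(.*?)\/">/', $html, $withSlash);

// With "!" as the delimiter, the slashes can stay as they are.
preg_match('!<a href="/title/(.*?)/">!', $html, $withBang);

// A literal "!" inside a "!"-delimited pattern needs a backslash.
preg_match('!>(.*?)\!<!', $html, $title);

// preg_quote() escapes special characters, including the delimiter
// when it is passed as the second argument.
$pattern = '!' . preg_quote('First!', '!') . '!';
$found   = preg_match($pattern, $html);
```

Both delimiters yield the same capture. The choice only affects which character has to be escaped inside the pattern, which is why `!` is convenient for HTML full of slashes.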

  • @johnowen-jones6702
    @johnowen-jones6702 5 years ago

    Found a couple of issues. First, the URL needs to be https instead of http. Second, the pagination has changed: it now gives 50 results per page, and the second page starts at 51. I made a variable pageblock and incremented it by 50 in the for loop, so 1, 51, 101. Not quite perfect, and the strange thing is that the array is reversed: page 3 is 0-49, page 2 is 50-100 and page 1 is 100 to 150. I expected it to fill page 1, 2, 3 in order, but it is reversed.
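
The offset arithmetic from this comment (1, 51, 101) can be sketched as below. The function names and the base URL are hypothetical. Only the page size of 50 and the `start` parameter come from the comments in this thread.

```php
<?php
// Sketch only: computes the "start" offsets described in the comment.

// Hypothetical helper: first-result index of each page.
function page_starts(int $pages, int $perPage = 50): array
{
    $starts = [];
    for ($page = 0; $page < $pages; $page++) {
        $starts[] = $page * $perPage + 1; // 1, 51, 101, ...
    }
    return $starts;
}

// Hypothetical helper: appends the start offset to a base URL
// that already contains a query string.
function page_urls(string $baseUrl, int $pages): array
{
    $urls = [];
    foreach (page_starts($pages) as $start) {
        $urls[] = $baseUrl . '&start=' . $start;
    }
    return $urls;
}
```

Appending each page's results to one array in loop order, for example with `array_merge`, keeps page 1 first. Whether that explains the reversed array reported above cannot be told from the comment alone.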

  • @foxtechsb6048
    @foxtechsb6048 3 years ago

    Nice tutorial, thanks. How do I insert the scraped data into a database table?

  • @westjr5085
    @westjr5085 7 years ago

    Great vid... what about websites with user authentication that load the data from React JS? I have the login portion, but I can't load the element data because it is delivered from React.

  • @FaceBook-bd3xo
    @FaceBook-bd3xo 5 years ago

    Hi man, how do I parse data from different pages? For example, I want to parse content from the main page (date of concert, heading and image) and from the inner page (the page we land on after clicking the heading) with the preamble.

  • @yusufmzn4541
    @yusufmzn4541 6 years ago

    I need an answer to my question.
    Question: which is better for making a movie website, HTML or WordPress?
    I have learned HTML, CSS and JavaScript.

  • @andrewlong3073
    @andrewlong3073 7 years ago

    Hi Clever Techie, thanks for a great video. Just one thing : where does the $match variable come from?

  • @sergeyz4591
    @sergeyz4591 4 years ago

    Keep on doing stuff like this because it teaches newbies to write really stupid scrapers.

  • @kizitopedro8808
    @kizitopedro8808 5 years ago

    Hi, I liked your previous video, though I didn't quite understand how you get the movies from IMDb. I am working on a website for movies and also TV series, so is there any way you can show me how to also add TV series files from IMDb and how it works? Do I have to register with IMDb, or what? Please.

  • @8ack2Lobby
    @8ack2Lobby 4 years ago +1

    That code is not working anymore. So, if you need a working one, here it is (NOTE: credit always goes to my Teacher Clever Techie for teaching me all these things). So, enjoy!

  • @nguyenduyquang3533
    @nguyenduyquang3533 3 years ago

    I can't scrape data when the system requires a login. What should I do? Looking forward to your help.

  • @GauravSharma-cw1hf
    @GauravSharma-cw1hf 2 years ago

    The link you provided to download the source code does not work.

  • @nxson8727
    @nxson8727 2 years ago

    So is this like an IMDb website made with PHP and HTML?

  • @mohammedissam3651
    @mohammedissam3651 4 years ago

    How did you know that they use PHP?
    IMDb.com could be using C#, because when you load an HTML file it doesn't load the backend script with it.
    Did you find it out by trial and error,
    or did they say that they use PHP on their website?

  • @StoreRunDotCom
    @StoreRunDotCom 7 years ago

    so .*? is a wildcard and in parenthesis is the wildcard you want to capture?

  • @nazartjara9397
    @nazartjara9397 7 years ago +1

    Why don't you use XPath?

  • @mahmoudsamyessawy
    @mahmoudsamyessawy 6 years ago

    Thank you very much, but I am afraid you did it the wrong way. I think we have to use an HTML parsing library instead of regex, because regex can't parse element by element. For example, if a movie has neither Gross nor Votes and happens to be the last one on the current page (i.e. there is no next movie), how could we get this using the regex? With an HTML parsing library we could walk movie by movie and parse all its info. Anyway, I enjoyed the video, thank you again :)
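
The movie-by-movie approach suggested here could look roughly like the sketch below, which uses PHP's built-in `DOMDocument` and `DOMXPath`. The markup, the class names and the helpers `parse_movies` and `first_text` are invented for illustration. They are not IMDb's real HTML or the video's code.

```php
<?php
// Made-up markup: the second movie has neither votes nor gross.
$html = <<<'HTML'
<div class="movie"><h3>Movie One</h3><span class="votes">1,234</span><span class="gross">$5.0M</span></div>
<div class="movie"><h3>Movie Two</h3></div>
HTML;

// Hypothetical helper: text of the first node matching $query, or null if none.
function first_text(DOMXPath $xpath, string $query, DOMNode $context): ?string
{
    $nodes = $xpath->query($query, $context);
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

// Hypothetical helper: walks each movie block and reads its fields.
function parse_movies(string $html): array
{
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($doc);
    $movies = [];
    foreach ($xpath->query('//div[@class="movie"]') as $node) {
        // Each query is relative to the current movie, so a missing field
        // becomes null instead of borrowing a value from the next movie.
        $movies[] = [
            'title' => first_text($xpath, './/h3', $node),
            'votes' => first_text($xpath, './/span[@class="votes"]', $node),
            'gross' => first_text($xpath, './/span[@class="gross"]', $node),
        ];
    }
    return $movies;
}

$movies = parse_movies($html);
```

Because every lookup is scoped to one movie node, the last movie on a page with no Gross or Votes simply gets `null` for those keys.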

  • @kakatoji
    @kakatoji 4 years ago

    How do I scrape and bypass the reCAPTCHA?

  • @jcltradingcompany3558
    @jcltradingcompany3558 7 years ago

    Hey Clever Techie - where did you get the regular expression to match with regex101? How could I get that information from any website?

    • @jcltradingcompany3558
      @jcltradingcompany3558 7 years ago

      Hey Clever Techie, I went through your Regular Expression tutorial and understood. Thanks a ton. I was confused before, wondering where you got the regular expressions you created. Your tutorials are simply great.

  • @rahulsharma-wy3sy
    @rahulsharma-wy3sy 5 years ago

    I tried this but some content is not parsing. Can you please help

  • @VivekKumar-br9ie
    @VivekKumar-br9ie 6 years ago

    Useful Video

  • @johnjohnson7538
    @johnjohnson7538 7 years ago

    Great tutorial,
    In the regular expression used to collect the titles, at 2:37, you have
    " '!(.*?)!' " what is the reason for the first '?'
    That is,
    why do you start with '!

    • @clevertechie
      @clevertechie  7 years ago

      .*? is non-greedy, meaning it stops when it encounters the first occurrence of the character placed right after it, in this case a double quote. .* is greedy and will go on matching absolutely ANYTHING (spaces, line breaks, ALL the content....); you unleash the kraken with it.
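
A small sketch of the greedy versus non-greedy difference described in this reply, using made-up markup and variable names:

```php
<?php
// Made-up sample: two links on one line, each with a title attribute.
$html = '<a title="Movie One">x</a> <a title="Movie Two">y</a>';

// Greedy: .* runs to the LAST double quote it can still match against.
preg_match('!title="(.*)"!', $html, $greedy);

// Non-greedy: .*? stops at the FIRST double quote that follows.
preg_match('!title="(.*?)"!', $html, $lazy);

// With preg_match_all the non-greedy form captures each title separately.
preg_match_all('!title="(.*?)"!', $html, $all);
```

Here `$lazy[1]` holds only `Movie One`, while `$greedy[1]` swallows everything up to the end of the second title. `$all[1]` contains both titles as separate entries.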

  • @senghortkheang9831
    @senghortkheang9831 7 years ago

    Dear Sir,
    Have you ever scraped data from a JavaScript block?
    Do you have a solution and some simple code?
    Can you help me? I will wait for your help.

  • @shivam2153
    @shivam2153 6 years ago

    I am not getting the required output. Can you tell me everything that is required to run this PHP file and get the desired output?

  • @naveshkintali1219
    @naveshkintali1219 4 years ago

    Brother, can you please make a resume parser in PHP?
    There is no resume parser in PHP... none available on the internet or YouTube. I tried getting one on GitHub but nothing is working. If you make one, your video will be unique, and most people need it. Please make it in PHP.

  • @soraya7576
    @soraya7576 5 years ago

    I need to know how a scheduler, crawler, parser and indexer work... help

    • @clevertechie
      @clevertechie  5 years ago

      I already have videos on all those things you mentioned

  • @munauwarm7449
    @munauwarm7449 6 years ago

    How can I insert it into a database?

  • @NandamuriManikanta
    @NandamuriManikanta 7 years ago

    Hey @Clever Techie, how are you making GET calls to the IMDb website and how are you dealing with the CORS problem? Please reply, TIA.

    • @PeterParker-sy9bp
      @PeterParker-sy9bp 7 years ago +1

      He is not performing a GET request. He is performing a curl request and storing what is returned, and it returns the source code of the given URL. You can watch his video on curl.

  • @shivam2153
    @shivam2153 6 years ago

    What is needed to run this code?

  • @TechieUpgrader
    @TechieUpgrader 2 years ago

    How do I scrape the Amazon Prime Video website?

  • @vashantir
    @vashantir 5 years ago

    In case a newbie pops in... there are actually 50 elements. Don't forget arrays start at 0.

  • @amguruprasath8037
    @amguruprasath8037 6 years ago

    "Moved Permanently
    The document has moved here."
    I get this error.

    • @thisisnotok2100
      @thisisnotok2100 5 years ago

      It matters where you curl to; make sure there are no redirects.
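
Besides requesting the final URL directly, cURL can be told to follow redirects itself. This is a sketch under the assumption that the "Moved Permanently" page is an ordinary HTTP redirect (for example from http to https). The helper name `build_redirect_options` is made up for illustration.

```php
<?php
// Sketch only: assumes the "Moved Permanently" page is a plain HTTP redirect.

// Hypothetical helper: cURL options that follow redirects automatically.
function build_redirect_options(string $url): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        // Follow Location: headers instead of returning the redirect page.
        CURLOPT_FOLLOWLOCATION => true,
        // Give up after a few hops so a redirect loop cannot run forever.
        CURLOPT_MAXREDIRS      => 5,
    ];
}
```

The array can be applied with `curl_setopt_array($ch, build_redirect_options($url));` before `curl_exec($ch)`.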

  • @mehranehsandoost2799
    @mehranehsandoost2799 7 years ago

    Thank you for the tutorial

  • @chanlito4896
    @chanlito4896 7 years ago

    No offense but do yourself a favor and use node for web scraping.

  • @aristonia1991
    @aristonia1991 2 years ago

    This is such overkill. IMDb has an API, that's all you need...

  • @frosty1433
    @frosty1433 7 years ago

    You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML..........

    • @zsoltoroszlany7172
      @zsoltoroszlany7172 6 years ago

      Shea Sollars Tried XPath? I am thinking of going with regex too, since XPath is a nightmare in C#, believe me. Of course maybe there are better tools than HtmlAgilityPack, but let's say "hap" follows a strict path to an element and usually returns the wrong element although the XPath is correct.