YouTube Data Analysis | END TO END DATA ENGINEERING PROJECT | Part 2

Share
Embed
  • Published 26 Jun 2024
  • In this video, you will execute the END TO END DATA ENGINEERING PROJECT using the Kaggle YouTube Trending Dataset.
    If you are someone who wants to learn Data Engineering by doing hands-on projects, then this video is for you!
    👉🏻Watch Part 1 Of This Video Here - • YouTube Data Analysis ...
    ✨Visit ProjectPro for more projects - bit.ly/3uBzam5
    ✨ Tags ✨
    data engineering projects, big data project, data engineering project hands-on, hands-on data engineering projects, learn data engineering, data engineering roadmap, how to become data engineer, data engineering free projects, big data engineering, big data
    ✨ Hashtags ✨
    #dataengineer #project #darshil

Comments • 273

  • @youraverageguide
    @youraverageguide 2 years ago +3

    Darshil is a great teacher! Great project.

  • @Ujwalarao
    @Ujwalarao 7 months ago

    Thanks for bringing me close to a real use case scenario of Data Engineering.

  • @dsilvera1578
    @dsilvera1578 a year ago +1

    Darshil, I learned a lot. I believe this is helping many people. Thanks for all the effort you put into this.

  • @nitishkaushik2076
    @nitishkaushik2076 2 years ago +3

    Simple and to-the-point explanation. Great work bro 👍🏻

  • @kuldeep_garg
    @kuldeep_garg a year ago +1

    You are doing such great work; people should learn from you how to teach with this learning-by-doing method…
    Please do some more projects like this using real-time data and big data as well, so that we can learn that too.
    And thanks again, this tutorial is helping a lot🎉❤

  • @srinivasn4510
    @srinivasn4510 7 months ago

    Best project from scratch. Thanks bro☺☺

  • @ajitagalawe8028
    @ajitagalawe8028 a year ago

    Too good. Learned a lot. Thank you

  • @umerimran3833
    @umerimran3833 a year ago

    That was an awesome project, thank you!

  • @user-gh6gj8cp4f
    @user-gh6gj8cp4f 17 days ago

    Great work Darshil!
    I have only one suggestion after finishing the whole project along with the video, which took me a total of around 6-8 hours excluding the dashboard: take an extra minute to explain the code properly, so that we viewers can understand which transform actions we are taking in the ETL. That would add a lot to the video overall, and the reasoning behind the steps before and after the ETL step would become clearer.
    Anyway, thanks for this wonderful project; I am probably moving on to the Azure analytics project after this one.

  • @piyushpaikroy3579
    @piyushpaikroy3579 a year ago +1

    Hey Darshil.... I hope the project is complete!!

  • @assieneolivier5560
    @assieneolivier5560 6 months ago +1

    Finally got this project done. Great project to learn data engineering!!

  • @ericalbertobernal101
    @ericalbertobernal101 a year ago

    Great job !!!

  • @rajrockzz3797
    @rajrockzz3797 2 years ago

    Great video bro..

  • @shivakrishna1743
    @shivakrishna1743 a year ago

    Thanks for this!

  • @saitarun3246
    @saitarun3246 a year ago

    Great video Darshil, thank you so much!!

  • @user-wi6fk8hu6x
    @user-wi6fk8hu6x a month ago

    Thank you so much for the amazing video

  • @sharafmomen2460
    @sharafmomen2460 a year ago +2

    Really great project! Just wanted to ask: when more data lands in the landing area, will the rest of the processes afterwards automatically run through the pipeline you created? It seemed like some parts had to be done manually, like using AWS Lambda.

  • @pawandeore1656
    @pawandeore1656 6 months ago +1

    Informative

  • @prasannakusugal4333
    @prasannakusugal4333 a year ago

    Thanks for the great video Darshil !!! Learnt a lot of new things :)

  • @user-uu8su3xl1f
    @user-uu8su3xl1f 7 months ago

    Hey bro, your videos were very understandable. Could you make a more in-depth video about QuickSight?

  • @user-do4jn9ld9p
    @user-do4jn9ld9p 4 months ago +1

    Finally completed this project. Thank you so much for this! You're a gem :)

    • @SatishSharma-rh4su
      @SatishSharma-rh4su 4 months ago +1

      Heyy, I am facing an error while joining the cleaned and raw data. Can you please help?

    • @sreyassawant2205
      @sreyassawant2205 4 months ago +1

      Hey, I am facing the same issue. Can you help out?

  • @robertmoncriefglockrock8957

    This is a simple error I ran into; posting it here in case others hit the same thing.
    When trying to run the job at 21:25 I was getting "NameError: name 'gluecontext' is not defined".
    When adding the line "df_final_output = DynamicFrame.fromDF(datasink1, gluecontext, "df_final_output")" I had accidentally written gluecontext instead of glueContext.
    Thank you for this walkthrough. I start my new Data Engineering job tomorrow and the company uses AWS, so this has helped me tremendously. You are doing magic, my friend.
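
    For anyone hitting the same NameError, a minimal sketch of the relevant Glue boilerplate (the datasink1 DataFrame below is a stand-in for whatever the job script produced earlier):

        from pyspark.context import SparkContext
        from awsglue.context import GlueContext
        from awsglue.dynamicframe import DynamicFrame

        sc = SparkContext.getOrCreate()
        glueContext = GlueContext(sc)  # capital C: Python names are case-sensitive

        # stand-in for the job's datasink1 DataFrame
        datasink1 = glueContext.spark_session.createDataFrame(
            [("abc", 1)], ["video_id", "views"])

        # the line from the comment, with glueContext spelled consistently
        df_final_output = DynamicFrame.fromDF(datasink1, glueContext, "df_final_output")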

    • @vishnuvardhan9082
      @vishnuvardhan9082 4 months ago

      Hi Robert, hope you are doing well. It's been over a year since you posted and joined your new company. Just wanted to check: was this new job your first data engineering job, or were you already experienced in DE? And how are things at your new workplace?

  • @iamayuv
    @iamayuv 29 days ago

    00:03 Creating a crawler to understand and analyze data stored in AWS S3 buckets
    05:48 Query execution and data type casting
    11:43 Preprocessing and efficiency for querying
    17:16 Writing data into the target bucket and creating partitions
    22:05 Create a Glue crawler to clean and catalog data
    27:25 Data processing pipeline created using AWS Glue Studio
    32:23 Created an analytical pipeline using AWS Glue to transform and store data
    37:18 Building a reporting version of the data makes it easier for data scientists to analyze and query it
    42:01 Create a dashboard to visualize data from YouTube

    • @mrbcan7215
      @mrbcan7215 19 days ago

      Hi, can you explain how the "raw_statistic" table was created automatically after he created crawl_1? When I tried the same process, it didn't work for me.

  • @SCREENERA
    @SCREENERA a year ago

    Thanks a lot Darshil and ProjectPro

    • @N12SR48SLC
      @N12SR48SLC a year ago

      Not able to see the region column in my schema; also, all columns are showing string as the datatype (16:07)

  • @rahulgyani2965
    @rahulgyani2965 3 months ago +1

    Hi Darshil, thank you for this video. I have a question: when you created the cleaned version converting CSV to parquet, why didn't we use a Lambda function instead of a Glue job?

  • @jayasingh7810
    @jayasingh7810 4 months ago

    Thanks Darshil!! Finally made this cool project after overcoming all those errors. Really good explanation.

    • @GustavoFringe-dv2yg
      @GustavoFringe-dv2yg 3 months ago

      @aishwaryapatel7045 facing the same issue

  • @akshayrajput9049
    @akshayrajput9049 2 years ago +2

    Great work Darshil bro... can you send the PPT if possible?

  • @vishalkamlapure3344
    @vishalkamlapure3344 a year ago +1

    Thank you Darshil for this wonderful project.. I have been looking for such a project for a long time.

    • @ishan358
      @ishan358 10 months ago

      How did you solve the runtime error?

    • @chayanshrangraj4298
      @chayanshrangraj4298 10 months ago

      @ishan358 What kind of error are you getting?

    • @lguerrero17
      @lguerrero17 10 months ago

      @chayanshrangraj4298 Can you help me with an error? It's in the AWS Glue step that joins the tables.

    • @chayanshrangraj4298
      @chayanshrangraj4298 10 months ago

      @lguerrero17 Sure! What is the error that you are facing?

    • @lguerrero17
      @lguerrero17 10 months ago

      @chayanshrangraj4298 When I try to create the ETL to generate the analytics table, it creates the table but doesn't generate columns and rows.

  • @ajtam05
    @ajtam05 a year ago +6

    Another great video. Only thing is... AWS has updated the Glue console along with the other consoles. I believe I adapted accordingly, except for the schema datatypes (which it looks like I have to update after the job is run). But the script... it does look entirely different. Could you assist with an updated video on using the new Glue console?

    • @jenithmehta9603
      @jenithmehta9603 a year ago +5

      I am facing the same issue

    • @ajtam05
      @ajtam05 a year ago +1

      @Jenith Mehta If you scroll to the bottom of the navigation pane there are "LEGACY" versions. I realized this after I posted, but that's what I used. Hope that helps. 😀

    • @rohanchoudhary672
      @rohanchoudhary672 a year ago +2

      @ajtam05 can't find that

  • @ashutoshdixit2049
    @ashutoshdixit2049 4 months ago

    Good work, it's a great project; it helped me learn many things.

    • @sanikaapatil7279
      @sanikaapatil7279 2 months ago

      Can you help me? I don't understand the new interface of AWS Glue.

  • @ajtam05
    @ajtam05 a year ago

    Has anyone used ProjectPro before? I'm considering investing in it, but just wanted to see if anyone has experience with it yet. Looks promising.

  • @imenbenhassine9710
    @imenbenhassine9710 10 months ago +1

    @darshil thanks for the effort, great job!! I just finished the project and am so proud of myself; it's my very first project switching from DA to DE. Thanks a lot

    • @ybalasaireddy1248
      @ybalasaireddy1248 10 months ago

      Hey, did you get a Lambda timeout error by any chance?

    • @allenclement5672
      @allenclement5672 10 months ago +1

      Hey, I am seeing the new AWS Glue UI. How did you create the job there? I am facing a lot of confusion about what to select and how to navigate; the UI in the video is different.

    • @vighneshbuddhivant8353
      @vighneshbuddhivant8353 10 months ago

      @allenclement5672 hey, did you solve this issue?

    • @uditkapadia7104
      @uditkapadia7104 10 months ago

      @ybalasaireddy1248 if you are getting a timeout error... please increase the timeout.

    • @KomilMustaev
      @KomilMustaev 9 months ago

      @allenclement5672 same problems...
      Did you solve them? Can you help me?

  • @krishanwannigama3450
    @krishanwannigama3450 9 months ago

    Anyone who is struggling with the trigger: create the trigger on the S3 bucket. That will work perfectly.
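
    A hedged sketch of what that bucket-level trigger amounts to in boto3 (bucket name, prefix, and function ARN are placeholders; the Lambda also needs a resource policy allowing S3 to invoke it):

        import boto3

        s3 = boto3.client("s3")

        # The prefix must match exactly -- a stray trailing space here is the
        # classic reason the trigger never fires.
        s3.put_bucket_notification_configuration(
            Bucket="de-on-youtube-raw-useast1-dev",
            NotificationConfiguration={
                "LambdaFunctionConfigurations": [{
                    "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:your-cleaning-lambda",
                    "Events": ["s3:ObjectCreated:*"],
                    "Filter": {"Key": {"FilterRules": [
                        {"Name": "prefix", "Value": "youtube/raw_statistics_reference_data/"},
                        {"Name": "suffix", "Value": ".json"},
                    ]}},
                }]
            },
        )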

  • @pankajchandel1000
    @pankajchandel1000 8 months ago

    Is there a project where you used Python notebooks or EMR for processing data instead of Lambda functions?

  • @harshitjoshi2250
    @harshitjoshi2250 10 months ago +1

    To the folks struggling with the Glue script that filters regions out: try deleting the region files manually from S3 (make sure to enable bucket versioning so that objects are not permanently deleted). By doing this you can check whether the rest of your code is good, and even continue with the rest of the video if it's working.

    • @iamgdsclead4208
      @iamgdsclead4208 10 months ago

      I am getting a timeout error

    • @chayanshrangraj4298
      @chayanshrangraj4298 10 months ago

      I think it's better if you just move the folders somewhere else, so you won't have to upload them again in the future.
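
      A minimal boto3 sketch of that "move" (S3 has no real folders, so moving region=ru/ means copying each object under a backup prefix and deleting the original; bucket and prefix names are placeholders):

          import boto3

          s3 = boto3.resource("s3")
          bucket = s3.Bucket("de-on-youtube-raw-useast1-dev")

          # copy every object under region=ru/ to a backup prefix, then delete it
          for obj in bucket.objects.filter(Prefix="youtube/raw_statistics/region=ru/"):
              backup_key = obj.key.replace("youtube/", "backup/", 1)
              bucket.copy({"Bucket": bucket.name, "Key": obj.key}, backup_key)
              obj.delete()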

    • @lguerrero17
      @lguerrero17 10 months ago

      Which part of the video?

    • @snehakadam16
      @snehakadam16 8 months ago

      Hey, can you please share the PySpark code?

  • @vasanthkumar8120
    @vasanthkumar8120 3 months ago

    Hey, thanks for this great video. I want to know: how much does it cost to complete this entire project on AWS?

  • @Soulfulreader786
    @Soulfulreader786 a year ago

    Before the trigger, did you change the Lambda to take all records? Initially it was event["Records"][0].

  • @meenachir1167
    @meenachir1167 a year ago +1

    Hey, how do we convert from CSV to parquet for the other regions like Russia, Korea, etc.?

  • @observerXIII
    @observerXIII 5 months ago

    Why did you re-create the crawler at the start of the video?

  • @indra3054
    @indra3054 a year ago

    👌🏻🙏🏻

  • @neha4024
    @neha4024 a year ago +3

    Can you provide the ETL script shown in the video? I am getting an error even after adding the predicate_pushdown.
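
    A minimal sketch of the read side of that script, assuming the catalog names from the video (swap in your own database and table):

        from pyspark.context import SparkContext
        from awsglue.context import GlueContext

        sc = SparkContext.getOrCreate()
        glueContext = GlueContext(sc)

        # The predicate only resolves if the crawler registered `region` as a
        # partition column, i.e. the S3 layout is .../raw_statistics/region=ca/...
        predicate_pushdown = "region in ('ca','gb','us')"

        datasource0 = glueContext.create_dynamic_frame.from_catalog(
            database="db_youtube_raw",    # assumed name
            table_name="raw_statistics",  # assumed name
            push_down_predicate=predicate_pushdown,
        )
        print(datasource0.count())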

  • @merkarii
    @merkarii a year ago

    You go so fast

  • @merkarii
    @merkarii a year ago

    But good work

  • @zaheeruddinbaber6762
    @zaheeruddinbaber6762 a year ago +1

    Hi, where can I get the PPT you are using?

  • @Sdsatya
    @Sdsatya a year ago

    Excellent !!

  • @adib4361
    @adib4361 a year ago +2

    How do we showcase this project on our LinkedIn profile or in our resume?

  • @SCREENERA
    @SCREENERA a year ago +3

    Finally, after 100 times of disappointment... I did it... Great effort, and it's my very first project in the data engineering field... Thanks.
    ...Errors are challenging, but anyone with a real interest in data engineering will definitely get through them and complete this project.

    • @N12SR48SLC
      @N12SR48SLC a year ago +1

      An error occurred while calling o88.getDynamicFrame. User's pushdown predicate: region in ('ca','gb','us') can not be resolved against partition columns: [] (in my job, 23:00)

    • @SCREENERA
      @SCREENERA a year ago +1

      @N12SR48SLC sorry bro.

    • @SCREENERA
      @SCREENERA a year ago

      @N12SR48SLC yes bro

    • @SCREENERA
      @SCREENERA a year ago

      Don't forget to shut down the activated AWS services after the project is done.

    • @SCREENERA
      @SCREENERA a year ago

      @N12SR48SLC what's the error?

  • @nikhilrunku8877
    @nikhilrunku8877 6 days ago

    Hi Darshil, I have been trying to implement this project. At 13:28 you created a job, but I am not able to see that option in the current version. All I can see is the option to create the ETL job visually. Can you please help me with this?

  • @GauravKhanna-cl7gd
    @GauravKhanna-cl7gd a month ago

    The Athena query works the first time on the parquet file, and then I have to delete the unsaved folder in the cleansed bucket. Has anyone dealt with this? I am still at the 5-minute mark of this video. Really frustrating!!

  • @RizwanAnsari-lt3nf
    @RizwanAnsari-lt3nf 18 days ago

    Is anyone else facing an issue with the Lambda function? I have added the trigger, but no new file is created once I upload the JSON file to the bucket.

  • @shrutika6
    @shrutika6 a year ago +3

    Do I have to pay anything to complete this project, or is it completely free?

  • @ShaunDePonte
    @ShaunDePonte 9 months ago

    You didn't answer the initial question from video 1: how to categorise videos based on their comments and stats, and what factors affect how popular a YouTube video will be.

  • @satishmajji481
    @satishmajji481 2 years ago +2

    @Darshil Parmar - the "region=us/" folder is not created for me; only the ca and gb folders are created upon running the ETL job. PS: I added predicate_pushdown = "region in ('ca','gb','us')" as well, but the folder is missing for the "us" region. Can you please take a look at this?

    • @pdubocho
      @pdubocho a year ago

      Same thing happened to me. The error occurred when initially using the AWS CLI to load data into the S3 buckets: after executing the command to upload the csv files, I did not hit enter after the upload was "complete"; I just exited the cmd box. To fix it, I manually uploaded the data and re-ran the processes from both videos.
      Edit: this only applies if you go into your raw data S3 folder and don't find the folder "region=us".

    • @jarlezio2463
      @jarlezio2463 a year ago

      Because us is not present in the initial dataset

    • @Soulfulreader786
      @Soulfulreader786 a year ago

      Use the AWS CLI to create the folders, using the cp command.

  • @lifefacts7368
    @lifefacts7368 3 months ago

    I get a parquet file error when I cast string to bigint. I also deleted the file from the S3 bucket, but it's not working. Can anyone please help me?

  • @skateforlife3679
    @skateforlife3679 a year ago +1

    INTERESTING POINT HERE, at 4:56:
    How can you know which primary key to choose for the INNER JOIN? Before watching, I tried a.video_id = b.id,
    because it seems logical that each row is unique, so video_id should be used and compared to the id of the other table, which also uniquely identifies a video row.
    Am I wrong? Anyone have an idea? Thanks a lot

    • @avanishyadav3705
      @avanishyadav3705 a year ago +1

      Go and read the data column descriptions for that.
      When doing this type of join you will be given proper knowledge of the data.
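
      To make the join concrete, a sketch of the query the project builds towards, assuming the Kaggle dataset's column names (statistics rows carry a category_id that matches the id field of the category reference JSON; video_id repeats across trending days, so it is not the join key):

          import boto3

          athena = boto3.client("athena")

          # assumed database/table names; use your own catalog names
          query = """
          SELECT a.title, a.category_id, b.snippet_title
          FROM db_youtube_cleaned.raw_statistics a
          INNER JOIN db_youtube_cleaned.cleaned_statistics_reference_data b
              ON a.category_id = b.id
          LIMIT 10;
          """

          athena.start_query_execution(
              QueryString=query,
              ResultConfiguration={"OutputLocation": "s3://your-athena-results-bucket/"},
          )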

  • @DeepUpadhyay-gs1ky
    @DeepUpadhyay-gs1ky 20 days ago

    Where is the link for Discord?

  • @FRUXT
    @FRUXT a year ago

    Thanks! Why parquet files? Isn't it simpler to keep everything in JSON or CSV?

  • @ueeabhishekkrsahu
    @ueeabhishekkrsahu 11 months ago +1

    Where is the discord link?

  • @drcyrax
    @drcyrax a year ago

    Sorry, but I don't see any use of Kafka, Spark, Hadoop, etc.
    It's just AWS and Python and SQL

  • @user-om7lq4yk8s
    @user-om7lq4yk8s 6 months ago

    We created the ETL job to join the data so that when new data gets added to the bucket it is joined automatically instead of running an SQL query. But shouldn't we trigger this ETL job on the data-addition event in S3? Can anyone answer this?

    • @bhumikalalchandani321
      @bhumikalalchandani321 6 months ago

      No, I think the one-time Lambda trigger from S3 only covers the .json-to-parquet step; that fills the cleansed S3 bucket, and from there the analytics data is picked up. Please confirm this.

  • @isaacodeh
    @isaacodeh 2 years ago

    I did ask you a question on your channel about the wrangler, which didn't seem to be working for me. I don't know if it has to do with location?

    • @DarshilParmar
      @DarshilParmar 2 years ago

      Yes, it is only available in some locations

    • @isaacodeh
      @isaacodeh 2 years ago

      @DarshilParmar oh I see! Thanks for the work you do!! You have been very helpful!!!

    • @skateforlife3679
      @skateforlife3679 a year ago

      @DarshilParmar why is that?

  • @SCREENERA
    @SCREENERA a year ago +1

    Don't forget to shut down the activated AWS services.

    • @sivasahoo6980
      @sivasahoo6980 a year ago +1

      Can you tell us how to close it?
      Do we have to delete the bucket and ETL job, or something else?

    • @SCREENERA
      @SCREENERA a year ago

      @sivasahoo6980 Delete all the services

  • @madmonk0
    @madmonk0 4 months ago +1

    Is there an updated version of this? The Legacy Glue UI cannot be accessed now.

    • @florenceofori7930
      @florenceofori7930 3 months ago

      I was also searching for it. I'm wondering what to use now.

  • @herdata_eo4492
    @herdata_eo4492 2 years ago +4

    @projectpro please consider monthly subscriptions instead; billed 6-monthly/yearly is too much.

    • @ProjectProDataScienceProjects
      @ProjectProDataScienceProjects a year ago +1

      Hey, we have some discounts going on, valid only for a few days. Please share your email id and our team will get in touch with you. Thanks

  • @nguyentiensu4088
    @nguyentiensu4088 a year ago +1

    When my Lambda function is triggered by an S3 event, the cleaned_statistics_reference_data table is created. But when I check with the SQL command "SELECT * FROM cleaned_statistics_reference_data", the result is an empty table. I tested the Lambda function with a test event, and everything is OK (there is data in the cleaned_statistics_reference_data table). Please help me with a solution! Thank you!

    • @drishtihingar2160
      @drishtihingar2160 7 months ago

      Have you found the solution? I am facing the same issue. Please help me

    • @nandinisingh9217
      @nandinisingh9217 6 months ago

      @drishtihingar2160 Facing the same problem; have you found any solution?

    • @drishtihingar2160
      @drishtihingar2160 6 months ago

      @nandinisingh9217 no, not yet

    • @rohitmalviya8607
      @rohitmalviya8607 a month ago

      You have to upload the JSON files through the CLI AFTER creating the trigger; Lambda won't process already-existing JSON files.
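
      A small boto3 sketch of that re-upload, so the freshly created trigger fires for each file (bucket, prefix, and local paths are placeholders):

          import boto3
          from pathlib import Path

          s3 = boto3.client("s3")
          bucket = "de-on-youtube-raw-useast1-dev"           # placeholder
          prefix = "youtube/raw_statistics_reference_data/"  # placeholder

          # re-upload the category JSON files; each put fires the trigger once
          for path in Path("data").glob("*_category_id.json"):
              s3.upload_file(str(path), bucket, prefix + path.name)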

  • @ganeshb.v.s1679
    @ganeshb.v.s1679 2 months ago

    Hi, I am on the last step of building the ETL pipeline. I successfully created the Glue job named 'de-on-youtube-parquet-analytics-version'. The contents of the de-on-youtube-analytics bucket are getting added, but the 'final_analytics' table is not being created. Please help me resolve the issue. Thanks in advance

    • @086_AASTHASHUKLA
      @086_AASTHASHUKLA 12 days ago

      Hi, I created the Glue job but it isn't creating the same files under raw_statistics as shown in the video. How did you do it?

  • @ruchipadhiyar7764
    @ruchipadhiyar7764 a year ago

    Why have you done analytics on only these three regions? region in ('ca','gb','us')

    • @pdubocho
      @pdubocho a year ago +1

      He's testing his code on the English-language regions first, to ensure his ETL job works before going through the trouble of converting the foreign-language files.

  • @prafulbs7216
    @prafulbs7216 2 years ago +3

    The S3 trigger is not working for me; I tried many times. The data is not being written into the cleansed S3 bucket (JSON files)

    • @eduhilfe1886
      @eduhilfe1886 a year ago

      Please check whether there is any space when defining the prefix: youtube/raw_statistics_reference_data/ . If you copied it from S3, there may be a space after youtube/

    • @harshalshende69
      @harshalshende69 a year ago

      Hello brother, have you found the solution? I am getting the same error

    • @prafulbs7216
      @prafulbs7216 a year ago +2

      @harshalshende69 I actually did it manually, by uploading the files.

    • @robertmoncriefglockrock8957
      @robertmoncriefglockrock8957 a year ago

      @eduhilfe1886 This fixed it for me, thank you!!

    • @robertmoncriefglockrock8957
      @robertmoncriefglockrock8957 a year ago

      @harshalshende69 Check your S3 trigger; make sure youtube/ doesn't have a space after it

  • @prafulbs7216
    @prafulbs7216 2 years ago +2

    Adding the trigger to the Lambda function is not working for me. Tried many times; please suggest.

    • @divyakhiani1116
      @divyakhiani1116 9 months ago

      Facing the same issue. Did you find a solution?

    • @prafulbs7216
      @prafulbs7216 9 months ago

      @divyakhiani1116 I redid the same steps once again, I guess. I don't remember, though!

  • @kopalsoni4780
    @kopalsoni4780 a year ago +1

    Hey all, I am stuck at 40:35. I don't see the Database option for 'New Athena data source'. Not sure if QuickSight has had an update since this video was created. Any suggestions?

    • @kopalsoni4780
      @kopalsoni4780 a year ago +1

      Answering my own question: I had to change the region, which was a default selection.

    • @shrutika6
      @shrutika6 a year ago

      Thanks. Can you please tell me whether you need to pay anything to complete this project?

    • @duytrkhanh
      @duytrkhanh a year ago

      thank you so much

    • @jolaoduwole4523
      @jolaoduwole4523 a year ago

      I'm stuck at 12:55; I'm unable to get past the id type error... I deleted the parquet several times but it's still not working

    • @divyakhiani1116
      @divyakhiani1116 9 months ago

      @kopalsoni4780 I did everything in the us-west-1 (California) region, but this region is not available in QuickSight. Can you help, please?

  • @kaushiksarmah999
    @kaushiksarmah999 a year ago +1

    Hello sir, I am not able to convert the id field type to bigint.
    I tried the steps according to the video multiple times,
    and even looked online for the procedure, but found nothing.
    Can you help me, sir?
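
    For the recurring string-to-bigint issue, a hedged sketch of one way to regenerate the reference table with the id pinned to bigint, using the awswrangler library the video's Lambda is built on (bucket path and catalog names are assumptions; adjust to your setup):

        import awswrangler as wr
        import pandas as pd

        # toy stand-in for the category reference data
        df = pd.DataFrame({"id": ["1", "2"],
                           "snippet_title": ["Film & Animation", "Autos & Vehicles"]})
        df["id"] = pd.to_numeric(df["id"])  # cast in pandas first

        wr.s3.to_parquet(
            df=df,
            path="s3://your-cleansed-bucket/youtube/raw_statistics_reference_data/",
            dataset=True,
            database="db_youtube_cleaned",             # assumed name
            table="cleaned_statistics_reference_data",
            mode="overwrite",                          # replaces the mistyped files
            dtype={"id": "bigint"},                    # pins the Athena type
        )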

  • @TheAINoobxoxo
    @TheAINoobxoxo 3 months ago

    What should I do about creating the jobs?
    @darshit #darshil
    Should I use the script that is given, now that AWS has moved to Visual ETL and simple job creation has become complex for someone who doesn't know how to work with Visual ETL?

  • @jenithmehta9603
    @jenithmehta9603 a year ago +4

    The job creation UI has completely changed. I am stuck at that step.

    • @russophile9874
      @russophile9874 a year ago +2

      Go to the script tab and click edit. Paste the Spark code from the GitHub repo; it will work.

    • @subashpandey518
      @subashpandey518 6 months ago

      @russophile9874 Could you please explain precisely which script tab and which edit? I am stuck on this step. Thanks

  • @N12SR48SLC
    @N12SR48SLC a year ago +1

    Not able to see the region column in my schema; also, all columns are showing string as the datatype

  • @ebubeonuegbu3467
    @ebubeonuegbu3467 a month ago

    I added this code: predicate_pushdown = "region in ('ca','gb','us')"
    and got this error:
    "Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o103.getDynamicFrame. User's pushdown predicate: region in ('ca','gb','us') can not be resolved against partition columns: []"
    The source S3 data in my setup is partitioned by the "region" column.
    How do I resolve this, please?
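
    That error usually means the Glue catalog table has no partition columns at all: the crawler only registers region as a partition if the files live under region=xx/ prefixes. A quick boto3 check (database/table names assumed):

        import boto3

        glue = boto3.client("glue")

        # If PartitionKeys prints as [], re-upload the CSVs under
        # .../raw_statistics/region=ca/ etc. and re-run the crawler before
        # using any push_down_predicate on region.
        table = glue.get_table(DatabaseName="db_youtube_raw", Name="raw_statistics")
        print(table["Table"]["PartitionKeys"])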

    • @iamayuv
      @iamayuv 29 days ago

      Bro, have you found the solution?

  • @vijayarana8208
    @vijayarana8208 a year ago +1

    Hello Darshil, I am kind of stuck at (22:52) in the video. My job runs successfully but the raw_statistics folder is not created. I have specified the region correctly in the code.
    Any suggestion would be helpful.

    • @anupammathur918
      @anupammathur918 a year ago +3

      Check the S3 trigger; remove the space after youtube/

    • @ajitagalawe8028
      @ajitagalawe8028 a year ago +1

      Caught the same issue. In my case, files were created directly in raw_statistics; there are no region= sub-folders. Could you please help me? Thanks

    • @ueeabhishekkrsahu
      @ueeabhishekkrsahu 11 months ago

      Can you please share your script? I have created my job but it is not executing. Please share, it will be a great help.

    • @user-nu5vb4hx7o
      @user-nu5vb4hx7o 8 months ago

      Actually, in my case I am getting confused when creating the job, because the current AWS UI directly shows Visual ETL; there is no option for target and data transform and no option to add a job manually. If anyone could please help me with that.

  • @BobbyCambos
    @BobbyCambos 11 months ago +1

    It seems that for me, at 28:26, the parquet files didn't get transformed. I checked the trigger and the region but still found no solution. Does anyone have any idea?

    • @user-zu3yp3bu5s
      @user-zu3yp3bu5s 11 months ago +1

      Did you remove the extra white space in the prefix and retry? I solved the same problem that way.

    • @BobbyCambos
      @BobbyCambos 11 months ago

      @user-zu3yp3bu5s Yes, and I had the same result: only a blank database with just the column names, and no parquet file uploaded.

    • @ahmedopeyemi2980
      @ahmedopeyemi2980 7 months ago

      Thank you so much @user-zu3yp3bu5s. After days of asking, I finally found a solution.

  • @subashpandey518
    @subashpandey518 6 months ago +1

    Someone please help: the UI for creating the job has completely changed. I am not able to create new jobs.

  • @schrodinders_douchebag
    @schrodinders_douchebag 11 months ago

    Just finished the project; amazing work, man!!!

    • @ybalasaireddy1248
      @ybalasaireddy1248 10 months ago +1

      Hi, I need some help with the project; will you be able to help?

    • @ishan358
      @ishan358 10 months ago

      @ybalasaireddy1248 How did you solve the runtime error in Lambda?

    • @ishan358
      @ishan358 10 months ago

      How did you solve the Lambda runtime error?

    • @lguerrero17
      @lguerrero17 10 months ago

      @ybalasaireddy1248 Hi, I am trying to do the project; we could support each other.

    • @KomilMustaev
      @KomilMustaev 9 months ago +1

      @lguerrero17 hi, can you help me too, please?

  • @mayurkumar23
    @mayurkumar23 4 months ago

    Somebody help, I am getting this error:
    TYPE_MISMATCH: Unable to read parquet data. This is most likely caused by a mismatch between the parquet and metastore schema
    This query ran against the "de-yt-clean" database, unless qualified by the query.
    I have changed the schema but still no progress.

    • @adarsharora6097
      @adarsharora6097 3 months ago

      You need to create your parquet file again now by running the Lambda function again; this was covered in the video too.
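
      A sketch of that reset, assuming boto3 and placeholder names (delete the parquet written under the old string schema, then re-invoke the cleaning Lambda with a stored S3-put test event so fresh files are written under the corrected schema):

          import boto3

          s3 = boto3.client("s3")
          lam = boto3.client("lambda")

          bucket = "your-cleansed-bucket"                    # placeholder
          prefix = "youtube/raw_statistics_reference_data/"  # placeholder

          # 1) remove the stale parquet files
          listed = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
          for obj in listed.get("Contents", []):
              s3.delete_object(Bucket=bucket, Key=obj["Key"])

          # 2) re-run the cleaning Lambda with a saved s3-put event payload
          with open("s3_put_event.json", "rb") as f:
              lam.invoke(FunctionName="your-cleaning-lambda", Payload=f.read())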

  • @atharvasankhe1153
    @atharvasankhe1153 a month ago

    How do I get to Jobs (13:30)? Apparently the Glue console has changed, so I'm not sure how to go ahead.

  • @shantanuumrani9163
    @shantanuumrani9163 a year ago +3

    At 16:54, I'm not able to see the region source key in my output schema. What should I do?

  • @princenath3211
    @princenath3211 a month ago

    Can anyone please explain how to set up the ETL job? The AWS Glue UI has changed; there is no option like the one shown in the video. Instead there are Visual ETL, Notebook, and Script editor. Many students are facing the same issue but no one is replying. Can anyone please help and write what needs to be done?

    • @dhruvingandhi1114
      @dhruvingandhi1114 12 days ago

      Hello, facing the same issue.
      If you have solved it, please share.

  • @jatin7089
    @jatin7089 3 months ago +1

    I am stuck on creating the Glue job, as the UI is different. Please, anyone, help here: where do I change the data types? I am able to add the source and target.

    • @shubhamnikam4759
      @shubhamnikam4759 3 months ago

      Stuck on the same thing.
      I added the data target and source but am not able to figure out how to change the data type.

    • @rohitmalviya8607
      @rohitmalviya8607 a month ago

      @shubhamnikam4759 use Google Gemini

  • @vishwajithsubhash6269
    @vishwajithsubhash6269 a year ago +4

    I understand why we need to convert JSON to parquet, but why do we convert CSV to parquet? It's already clean, right?

    • @vineetsrivastava4906
      @vineetsrivastava4906 a year ago

      The parquet file format is more optimized and faster; read more about it on the internet.
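
      To make that concrete, a small sketch of the conversion (pandas with pyarrow assumed installed); parquet is columnar and compressed, so engines like Athena can read just the columns a query touches instead of scanning whole CSV files:

          import pandas as pd

          # one region's trending file from the Kaggle dataset
          df = pd.read_csv("CAvideos.csv", encoding="utf-8")

          # columnar, compressed, schema-carrying output
          df.to_parquet("CAvideos.parquet", engine="pyarrow", compression="snappy")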

    • @sanikaapatil7279
      @sanikaapatil7279 2 months ago

      @vineetsrivastava4906 Actually, in my case I am getting confused when creating the job, because the current AWS UI directly shows Visual ETL; there is no option for target and data transform and no option to add a job manually. If anyone could please help me with that.

  • @mahendrapatil8709
    @mahendrapatil8709 a year ago

    Can we transform JSON data into parquet through Glue?

    • @suchitranair683
      @suchitranair683 a year ago +1

      Heyya, are you currently doing the project hands-on? I am looking for someone to start the project with.

    • @mahendrapatil8709
      @mahendrapatil8709 a year ago

      No, I am also looking for someone to do the project with.

  • @user-qq3xg5qn2v
    @user-qq3xg5qn2v 8 months ago +4

    Hey, is there someone who can help me? The UI of ETL Jobs has changed a lot and I cannot add a job successfully.

    • @chinmaymaganur7133
      @chinmaymaganur7133 6 months ago

      Were you able to figure this out?

    • @gabilinguas
      @gabilinguas 5 months ago +3

      Hello! I faced this issue and figured it out.
      Go to ETL Jobs, click the button "create job from a blank graph", and go to "Job details", the third item in the menu.

    • @gabilinguas
      @gabilinguas 5 months ago +1

      For the second part, when you click next, you have to go to Visual (the first item on the same tab where you clicked Job details) and add a node: first choose the S3 bucket as the source, then add a new node for the schema, and then a third node from the target tab.

    • @RonitSagar
      @RonitSagar 5 months ago

      Hey @gabilinguas, I am not able to follow what you said in the last comment. Can you please explain a little more?😊

    • @gabilinguas
      @gabilinguas 5 months ago

      Hey @RonitSagar!
      You can basically follow the same steps described in the "Build ETL Pipeline" section at 30:33 in the video.
      The process is almost the same; you just have to pay attention to the details that differ.

  • @RonitSagar
    @RonitSagar 5 months ago

    While creating the job I am not able to get region in the options. Can you please help me, at 16:10 in the video?

    • @adarsharora6097
      @adarsharora6097 3 months ago +1

      Because we are getting the data from S3; instead, we need to select our source from the Data Catalog.

  • @parakh.17
    @parakh.17 7 months ago +1

    Hi, I am getting an error while creating the Glue ETL job (17:00); the UI is completely different and I cannot proceed further. Any help?

    • @Chandu_Art
      @Chandu_Art 7 months ago +1

      Same here.. stuck there

    • @srihariraman9409
      @srihariraman9409 7 months ago +3

      @Chandu_Art I've set up the job pipeline using the new UI, but the script editing is mismatched

    • @saiganesh5702
      @saiganesh5702 7 months ago +1

      Hey Parakh, did your issue with the ETL get resolved? If yes, can you please help me with it?

    • @yashiyengar7366
      @yashiyengar7366 6 months ago +1

      Same for me; stuck at the ETL job creation section

    • @dhruvingandhi1114
      @dhruvingandhi1114 12 days ago

      @srihariraman9409 How did you set up the pipeline in the new UI?
      Please mention a few steps.

  • @banarasi91
    @banarasi91 a year ago +1

    Why is my trigger not invoked when a file is uploaded to S3? My test works properly in the Lambda function, and it is not showing any error either. I am not able to understand the issue.

    • @sivasahoo6980
      @sivasahoo6980 a year ago

      Did you get any solution?

    • @banarasi91
      @banarasi91 a year ago +1

      @sivasahoo6980 It's been a while since I posted, but if I remember correctly it was something with naming or syntax: there was an extra space which I was not able to find, and I caught it by rewatching everything. I don't remember exactly where, but it may have been in some path.

    • @banarasi91
      @banarasi91 a year ago

      Hello guys, you might be getting an error at the testing step because the DB name has not been changed in the environment variables. Please take care: he forgot to change the DB name. If you notice, in Athena the database name is db_youtube_cleaned but it should be de_youtube_cleaned, which causes the "Entity not found" error in the final Lambda test.

    • @sivasahoo6980
      @sivasahoo6980 a year ago

      @banarasi91 Thanks a lot.
      Yeah, there was an extra space in the path.

    • @N12SR48SLC
      @N12SR48SLC a year ago +1

      @banarasi91 Not able to see the region column in my schema; also, all columns show string as the datatype (16:07). My ETL job is also failing.

  • @gnanu3530
    @gnanu3530 4 months ago

    From part 1 of this project I'm facing the error below;
    let me know the solution for it.
    Test Event Name
    db_amazon
    Response
    Calling the invoke API action failed with this message: Failed to fetch
    Function Logs
    Request ID

  • @jayitankar7919
    @jayitankar7919 a year ago +1

    Hi, my Lambda trigger for the JSON files is not getting fired; I don't know what's wrong.

    • @aneeqbokhari4611
      @aneeqbokhari4611 a year ago

      Yeah, same. Have you figured it out?

    • @robertmoncriefglockrock8957
      @robertmoncriefglockrock8957 a year ago

      @aneeqbokhari4611 Same here

    • @bukunmiadebanjo9684
      @bukunmiadebanjo9684 a year ago +1

      Had to stop here too. After deleting all files and re-uploading, the trigger does nothing.

    • @shantanuumrani9163
      @shantanuumrani9163 a year ago +1

      @bukunmiadebanjo9684 The same thing happened to me. Has anyone figured out how to solve it?

    • @bukunmiadebanjo9684
      @bukunmiadebanjo9684 a year ago

      @shantanuumrani9163 Didn't find a solution. The whole UI also looks different, as AWS has already made changes, so I decided to move to a different course and abandoned this.

  • @SankarJankoti
    @SankarJankoti 2 years ago +1

    Thank you so much for the wonderful project. I am getting the response below while testing the Lambda function; can you please advise?
    Test Event Name
    s3-put
    Response
    {
    "statusCode": 200,
    "body": "\"Hello from Lambda!\""
    }

    • @shaikanishmib8391
      @shaikanishmib8391 2 years ago +1

      You need to deploy first, then test the Lambda function.

    • @SankarJankoti
      @SankarJankoti 2 years ago

      @shaikanishmib8391 deploy is disabled

    • @prafulbs7216
      @prafulbs7216 2 years ago

      Did the trigger work for you?

    • @SankarJankoti
      @SankarJankoti 2 years ago

      @prafulbs7216 I am stuck there. It is not working

    • @prafulbs7216
      @prafulbs7216 2 years ago +1

      @SankarJankoti Yeah, so I just did it manually, one by one for the 3 regions (ca, gb, us), with the Lambda function only, and continued.

  • @GustavoFringe-dv2yg
    @GustavoFringe-dv2yg 3 months ago

    I am facing an error in that PySpark code; please help me out

    • @vidishasharma6795
      @vidishasharma6795 3 months ago

      Try changing the bucket name and database name in the PySpark script to match the naming conventions you have used.

  • @anshulnegi1822
    @anshulnegi1822 a month ago

    7:10 correction: the characters are not in Russian but in Korean script. My gawwddd, Indians and their obsession with Russian.

  • @samidhashah542
    @samidhashah542 a year ago

    very fast

  • @skateforlife3679
    @skateforlife3679 a year ago +2

    Thank you a lot for this project!
    It helped me understand which tools we generally use as data engineers to build data pipelines, etc. But I don't feel like I have learned how to do it myself. I mean, I followed along and understand what we made, but I need more explanation of how you process the data, e.g. how you get your bucket in AWS Lambda (the code is not obvious: "bucket = event['Records'][0]['s3']['bucket']['name']; key = urllib.parse.unquote_plus(event['Records'][0]['s3']['object']['key'], encoding='utf-8')"). I need exercises myself.

    • @mananyadav6401
      @mananyadav6401 a year ago +1

      You can go through the test event that we generated: there is a JSON in the test event that we use to test the function. Try to navigate it and you will understand how the bucket name is captured, etc. Hope it helps.
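
      For reference, a trimmed sketch of the standard S3 put event shape that those two lines walk (bucket name and key below are made-up examples):

          import urllib.parse

          event = {
              "Records": [{
                  "s3": {
                      "bucket": {"name": "my-raw-bucket"},
                      "object": {"key": "youtube/raw_statistics_reference_data/CA_category_id.json"},
                  }
              }]
          }

          bucket = event["Records"][0]["s3"]["bucket"]["name"]
          key = urllib.parse.unquote_plus(
              event["Records"][0]["s3"]["object"]["key"], encoding="utf-8")
          print(bucket, key)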

    • @skateforlife3679
      @skateforlife3679 a year ago

      Oh yeah, I'll do that, good idea. Thanks for your answer :)!

  • @manigowdas7781
    @manigowdas7781 7 months ago +1

    Just completed this project. Thanks for the content; understanding AWS services and using them for our use case is a really crazy thing! @DarshilParmar ❤
    #AWS CLI
    #S3
    #Lambda
    #Glue
    #Crawler
    #Glue Studio
    #Glue ETL
    #Athena
    #Database
    #Quicksight

    • @saiganesh5702
      @saiganesh5702 7 months ago

      Hey Manoj, can you please help me with the new ETL job visual editor scripts? I am having trouble understanding them.

    • @yashiyengar7366
      @yashiyengar7366 6 months ago

      @saiganesh5702 Even I am facing issues in the ETL job creation section due to the new UI

    • @chinmaymaganur7133
      @chinmaymaganur7133 6 months ago

      How did you set up the ETL Glue job?

    • @chinmaymaganur7133
      @chinmaymaganur7133 6 months ago

      @saiganesh5702 Were you able to figure this out?

  • @bishop9168
    @bishop9168 6 months ago +1

    @Darshil KINDLY ASSIST.
    Great job Darshil!!
    So far so good. I got stuck on running the de-youtube-parquet-analytics-version job in part 2 (minute 35:00) of the tutorial; I keep getting the error below:
    Error Category: UNCLASSIFIED_ERROR; An error occurred while calling o114.pyWriteDynamicFrame. Unable to parse file: RUvideos.csv

    • @gabilinguas
      @gabilinguas 5 months ago

      How did you solve this problem?

    • @gabilinguas
      @gabilinguas 5 months ago +1

      I found the solution. This error happens when your CSV files contain characters that are not valid UTF-8. What you have to do is save the files to the buckets again in UTF-8 format. If you open your CSVs in Google Sheets or Excel, you can re-save them with UTF-8 encoding.
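
      A minimal sketch of doing that re-encode in Python before re-uploading (the source encoding is an assumption; the non-English Kaggle files vary, so adjust it to whatever your file actually uses):

          # read with the file's original encoding, write back out as UTF-8
          with open("RUvideos.csv", encoding="latin-1") as src:
              data = src.read()

          with open("RUvideos_utf8.csv", "w", encoding="utf-8") as dst:
              dst.write(data)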

    • @reddynaveen1841
      @reddynaveen1841 4 months ago

      @gabilinguas Hi, I'm stuck at the same spot (35:00). Can you help me out?

  • @JEETKUMAR8908612303
    @JEETKUMAR8908612303 9 months ago

    I have set the Lambda function timeout to 10 minutes, but it still gives me a timeout error.
    I tried increasing the duration to 15 minutes as well, and again it failed.
    Before timing out, the function created the parquet file in the destination folder, but no table was created in the Glue catalog.
    Can someone help me fix this issue?
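
    One thing worth trying for these timeouts, sketched with boto3 (function name is a placeholder): Lambda CPU scales with memory, so raising MemorySize often fixes "timeouts" that are really just slow runs of the pandas/awswrangler work.

        import boto3

        lam = boto3.client("lambda")

        lam.update_function_configuration(
            FunctionName="your-cleaning-lambda",  # placeholder
            Timeout=900,      # the 15-minute maximum
            MemorySize=1024,  # more memory = more CPU share
        )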