Designing a Data Pipeline | What is Data Pipeline | Big Data | Data Engineering | SCALER

  • Added 6 Sep 2024

Comments • 57

  • @SCALER
    @SCALER  2 years ago +3

    Check out our FREE masterclasses by leading industry experts now: bit.ly/3Apojjv

    • @ankitKumar-js1ow
      @ankitKumar-js1ow 2 years ago +2

      I think Scaler should have a separate course for data engineering, with DSA, system design, and industry-level coursework, since most people work as data engineers rather than as data scientists.
      Waiting for such a quality course to move into a product-based company.

    • @sandeepdash5652
      @sandeepdash5652 6 months ago

      @ankitKumar-js1ow So far they do not have a plan or module for Data Engineering. They are simply not interested. And what they do have for DE is just not digestible.

  • @akhilcoder
    @akhilcoder 2 years ago +26

    Regular content. It can easily be found on the internet.

  • @ArunSingh-rk7mm
    @ArunSingh-rk7mm 2 years ago +3

    Thank you for talking about a demo pipeline, this could come in handy in interviews.

  • @NasimKhan-vu8oi
    @NasimKhan-vu8oi 2 months ago +1

    Excellent presentation. Presented very nicely, concisely, and to the point.

  • @TheSoumyakole
    @TheSoumyakole 9 months ago +1

    How can NoSQL (specifically Cassandra, MongoDB) be good for ad-hoc analytical queries, as mentioned at 12:05?

  • @StartDataLate
    @StartDataLate 3 months ago

    Here is a summary:
    00:57 - Understand the data domain (example: finance data terminology, the relationships, primary keys, foreign keys; give the business side a clear picture of what data engineers can provide)
    02:57 - Choose the data sources (examples: SQL database, distributed file system, API, sensor data, web-application-generated data)
    04:43 - Determine the data ingestion strategy (full load or incremental load)
    08:37 - Design the data processing plan (real-time or batch processing)
    11:11 - Set up storage for the pipeline output (Amazon S3 or HDFS for a data lake; AWS Redshift or Hive for a data warehouse; or dump back into transactional databases)
    13:19 - Plan the data workflow (schedulers: Apache Airflow, Apache NiFi, Azkaban)
    14:42 - Set up monitoring and governance tools (alerts for pipeline failures; tools: Kibana, Grafana, DataDog, PagerDuty)
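
For the ingestion-strategy step at 04:43, the difference between a full load and an incremental load can be sketched roughly as below. This is only an illustration and is not taken from the video: the `SOURCE_ROWS` data, the `updated_at` column, and the function names are all hypothetical, and the "watermark" here is assumed to be the latest change timestamp already ingested.

```python
from datetime import datetime

# Hypothetical source rows; in practice these would come from a database or API.
SOURCE_ROWS = [
    {"id": 1, "amount": 100, "updated_at": datetime(2024, 1, 1)},
    {"id": 2, "amount": 250, "updated_at": datetime(2024, 1, 5)},
    {"id": 3, "amount": 75, "updated_at": datetime(2024, 1, 9)},
]


def full_load(rows):
    """Full load: copy every source row on each run."""
    return list(rows)


def incremental_load(rows, watermark):
    """Incremental load: copy only rows changed after the last watermark.

    Returns the changed rows and the advanced watermark.
    """
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    # If nothing changed, keep the old watermark.
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark


all_rows = full_load(SOURCE_ROWS)
changed, watermark = incremental_load(SOURCE_ROWS, datetime(2024, 1, 3))
print(len(all_rows), [r["id"] for r in changed], watermark.date())
```

A full load is simpler but re-reads everything on each run; the incremental variant has to persist the watermark between runs so the next run picks up only newer changes.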

  • @arunsundar3739
    @arunsundar3739 4 months ago +1

    helps to see the big picture, thank you very much :)

  • @AkashKumar-kx9vj
    @AkashKumar-kx9vj 2 years ago +1

    Shashank just makes everything so easy to understand

  • @umakantyadav9972
    @umakantyadav9972 2 years ago +1

    Thanks, Shashank, for explaining in a very understandable manner.
    But I have one question: why was the staging area not discussed?

  • @NehaSingh-wp4mf
    @NehaSingh-wp4mf 6 months ago

    Very well explained, and all the important topics were covered. Thank you for your efforts. Very helpful.

    • @SCALER
      @SCALER  6 months ago

      Thanks! Glad this was helpful! 😃

  • @shaistaqureshi8408
    @shaistaqureshi8408 2 years ago +1

    I just wanna say thank you for this video

  • @FaizanKhan-ct7pc
    @FaizanKhan-ct7pc 2 years ago +1

    As a data engineer, should you know all of these technologies before getting a job, or are they picked up on the job?

    • @Watson22j
      @Watson22j 1 year ago

      You can easily get an entry-level job in data engineering if you know SQL well, plus basic Python, basic cloud, and Hadoop architecture.

  • @MarkyGoldstein
    @MarkyGoldstein 1 month ago

    Well presented, thanks

  • @AmitSharma-xv6sh
    @AmitSharma-xv6sh 10 months ago

    This is a really detailed and great explanation of end-to-end data pipeline architecture. Hats off to your hard work in putting this video out there for us, brother. It will definitely clear up doubts and give a clear picture of how pipelines work for data migration/ingestion/integration projects.
    Thanks a lot. 🙏

    • @SCALER
      @SCALER  10 months ago

      Thanks! Glad this was helpful! 😃

  • @daniyaqureshi6201
    @daniyaqureshi6201 2 years ago +1

    Thank you for the brilliant video

  • @ramangupta6159
    @ramangupta6159 2 years ago +1

    Grafana is a really good monitoring tool

  • @it3374
    @it3374 1 year ago +1

    Please demonstrate one pipeline hands-on... There is not a single video on YouTube that shows a big data pipeline actually being built...

  • @marksun6420
    @marksun6420 1 year ago +1

    Thanks

  • @Rk-mv8sz
    @Rk-mv8sz 2 years ago +1

    Good content. Thank you 🙏

  • @ruthmk
    @ruthmk 5 months ago

    Double like 👍🏽
    Thank you

  • @divyanshtayal5077
    @divyanshtayal5077 2 years ago

    Make more videos, Gurudev. Thank you very much.

  • @avshekraj
    @avshekraj 1 year ago

    Thank you for the nice explanation

    • @SCALER
      @SCALER  1 year ago +1

      Happy to hear that! 🙌🏼

  • @panktikhurana8906
    @panktikhurana8906 2 years ago +1

    Awesome content 🙂

  • @justdataengineer3138
    @justdataengineer3138 2 years ago

    When will Scaler launch a complete Data Engineering course?

  • @tamannamam3563
    @tamannamam3563 2 years ago

    I understood this video easily

  • @endpermia
    @endpermia 1 year ago

    Thank you! This was really helpful and well-explained.

    • @SCALER
      @SCALER  1 year ago

      Happy to hear that! 🙌🏼

  • @shrutiikarla1055
    @shrutiikarla1055 2 years ago

    Thank you scaler

  • @nandlaljaiswal7217
    @nandlaljaiswal7217 2 years ago +1

    Need a full course for data engineers

  • @obiradaniel
    @obiradaniel 2 years ago

    Thank you.

  • @asishjoshi5774
    @asishjoshi5774 2 years ago

    very nice.. thanks a ton!

  • @shanayakhan839
    @shanayakhan839 2 years ago

    Redshift is already set up on the cloud; what about Hive?

  • @healthificteam8465
    @healthificteam8465 2 years ago

    Can't wait!

  • @saniyasharif9861
    @saniyasharif9861 2 years ago

    Brilliant video again

  • @krishnasaksena2364
    @krishnasaksena2364 2 years ago

    Thanks scaler! 🔥

  • @abhisekchowdhury8584
    @abhisekchowdhury8584 2 years ago

    Awesome Video

  • @cutipy433
    @cutipy433 2 years ago

    Very nice content

  • @saibabatelagamsetty2538

    Really good Content

  • @Sameerkhan-kt5jj
    @Sameerkhan-kt5jj 2 years ago

    More Data engineering related content please

  • @saniyapoetry8386
    @saniyapoetry8386 2 years ago

    Very nice 🙂

  • @PankajKumar-vv5db
    @PankajKumar-vv5db 2 years ago

    Here the data source is MySQL. What if data were coming in from multiple sources?

  • @prachiipandeyy
    @prachiipandeyy 2 years ago

    🔥🔥🔥

  • @parisreview4651
    @parisreview4651 2 years ago

    You guys did a great job.

  • @piyushjain419
    @piyushjain419 2 years ago +1

    Scaler knows what we students are searching for on Google before an exam lol

  • @bangalibangalore2404
    @bangalibangalore2404 1 year ago

    The data modelling part was missed, I guess

  • @ashutoshrai5342
    @ashutoshrai5342 1 year ago +1

    Bumb explanation. What he is explaining is based on his experience. It's not at all generic. He himself needs to improve.

  • @nemodbuniversity
    @nemodbuniversity 1 year ago

    Half-baked, incomplete knowledge

  • @sheenagupta896
    @sheenagupta896 2 years ago +1

    Thank you for talking about a demo pipeline, this could come in handy in interviews.

  • @fazaila2047
    @fazaila2047 2 years ago

    Grafana is a really good monitoring tool