Johnny Chivers
  • 112
  • 1 179 974
The Top AWS Services A Data Engineer Should Know In 2024
In this video we take a look at the top AWS services you should know as a data engineer. We cover a use case from ingestion through to analytics, looking at the best ways to orchestrate our pipelines.
SUPPORT THE CHANNEL:
ℹ️ Udemy Practice Exams: www.udemy.com/course/practice-exams-aws-certified-data-analytics-specialty-o/?referralCode=484C33C8FCA5C93803A5
☕ Buy Me A Coffee: www.buymeacoffee.com/johnnychivers
🖥️ My VPN: go.nordvpn.net/aff_c?offer_id=612&aff_id=74288&url_id=14830
▬▬▬▬▬▬ T I M E S T A M P S ⏰ ▬▬▬▬▬▬
00:43 - Ingest
02:22 - Storage
03:28 - Analytics
04:28 - Orchestration
05:29 - Monitoring & Discoverability
06:08 - AI/ML
06:49 - Outro
The video covers real-time ingestion using Amazon Kinesis as well as batch ingestion with AWS Lambda, AWS Glue and Amazon EMR. We look at how we can store this data in Amazon S3 and Amazon DynamoDB before using Amazon QuickSight to build dashboards.
😎 About me
I have spent the last decade immersed in the world of big data, working as a consultant for some of the globe's biggest companies. My journey into the world of data was not the most conventional. I started my career working as a performance analyst in professional sport at the top levels of both rugby and football. I then transitioned into a career in data and computing. This journey culminated in the study of a Masters degree in Software
Views: 2,656

Video

Amazon Bedrock on AWS [AWS TUTORIAL IN 10MINS]
2.8K views · 8 months ago
LINKS ℹ️ aws.amazon.com/bedrock/ ℹ️ proceedings.neurips.cc/paper_files/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf In this video we take a look at Amazon Bedrock on AWS. We cover the basics of GenAI and how you can get started on Amazon Bedrock using the AWS console to send text prompts to Foundation Models, which are exposed via the Amazon Bedrock API. SUPPORT THE CHANNEL: ℹ️ Ude...
My Top 5 Tips For Passing The AWS Certified Data Analytics - Specialty Exam (DAS-C01)
3.3K views · 10 months ago
LINKS ℹ️ Udemy Practice Exams: www.udemy.com/course/practice-exams-aws-certified-data-analytics-specialty-o/?referralCode=484C33C8FCA5C93803A5 ℹ️ AWS Exam Guide and Questions Downloads: aws.amazon.com/certification/certified-data-analytics-specialty/ ℹ️ AWS Documentation: docs.aws.amazon.com/ In this video I share my top 5 tips for studying and passing the AWS Certified Data Analytics - Special...
What is Amazon DataZone? [AWS TUTORIAL in 12MINS]
3.4K views · 10 months ago
LINKS ℹ️ docs.aws.amazon.com/datazone/latest/userguide/produce-data-gs.html In this video we take a look at the Amazon DataZone Service available in AWS. Amazon DataZone is a data management service that enables you to catalog, discover, govern, share, and analyze your data. With Amazon DataZone, you can share and access your data across accounts and supported regions. Amazon DataZone simplifie...
AWS Glue Crawler [AWS Console 2023 Full Demo]
3.3K views · 11 months ago
LINKS ℹ️ GitHub: github.com/johnny-chivers/aws-glue-crawlers ℹ️ AWS Docs: docs.aws.amazon.com/glue/latest/dg/crawler-running.html In this video we cover what an AWS Glue Crawler is and how you can use it to populate the AWS Glue Data Catalog. We cover the basics of the AWS Glue Crawler before diving into a full demo where we register data in S3 with the AWS Glue Data Catalog using a crawler we defin...
What Table Format Should I Choose For My Data Lake? Hudi | Iceberg | Delta Lake
7K views · 11 months ago
LINKS TO FULL BLOG: ℹ️ AWS Blog: aws.amazon.com/blogs/big-data/choosing-an-open-table-format-for-your-transactional-data-lake-on-aws/ Using a blog recently posted on AWS I break down and discuss the key considerations when deciding on an open source format for your transactional data lake tables in AWS. We look at the general considerations you should factor into your decision making process be...
Run Spark Jobs On Amazon Athena [FULL TUTORIAL IN 12MINS]
3.5K views · 1 year ago
Have you ever been in a situation where you want to run Spark code to analyse data, but don’t want to manage the underlying resources? Then using Amazon Athena’s Spark engine could be the solution for you. Amazon Athena allows you to submit Spark code via a fully managed Spark engine in the form of a notebook. This allows you to carry out data analytics and exploration using Apache Spark without th...
Build Your Own Search Using Amazon OpenSearch Service [FULL COURSE in 15MIN]
27K views · 1 year ago
Want to build your own search solution? The Amazon OpenSearch Service on AWS could be the solution for you. OpenSearch is a distributed, community-driven, Apache 2.0-licensed, 100% open-source search and analytics suite used for a broad set of use cases like real-time application monitoring, log analytics, and website search. OpenSearch provides a highly scalable system for providing fast acces...
Apache Iceberg on AWS with S3 and Athena [FULL COURSE IN 30MIN]
18K views · 1 year ago
Do you face a situation on a daily basis where your data lake queries are slow, updates to the data are nearly impossible, and end users face issues reading or updating data? Then Apache Iceberg could be the solution you are looking for. Iceberg is an open source table format, originally created by Netflix and later handed over to the Apache Software Foundation, that allows for fast querying reg...
SQL For AWS Athena [FULL COURSE IN 40mins]
16K views · 1 year ago
In this video I cover how to use SQL with AWS Athena. Using the resources I have uploaded to GitHub, we carry out a full tutorial on how to manipulate data and carry out data analytics tasks within the AWS Athena ecosystem. Don't worry if you are new to SQL, AWS, or Athena; I guide you through everything step by step. LINK TO GITHUB TUTORIAL RESOURCES: 💾 Code Repo: github.com/johnny-chivers/sql-for...
PySpark For AWS Glue Tutorial [FULL COURSE in 100min]
79K views · 1 year ago
In this video I cover how to use PySpark with AWS Glue. Using the resources I have uploaded to GitHub, we carry out a full tutorial on how to manipulate data and carry out ETL tasks within the AWS Glue ecosystem. Don't worry if you are new to PySpark, AWS, or Glue; I guide you through everything step by step. LINK TO GITHUB TUTORIAL RESOURCES: 💾 Code Repo: github.com/johnny-chivers/pyspark-glue-tu...
AWS EMR Serverless - What is it? [FULL TUTORIAL in 25mins]
14K views · 1 year ago
ℹ️ johnnychivers.co.uk 📁 github.com/johnny-chivers/emr-serverless ☕ www.buymeacoffee.com/johnnychivers 📹czcams.com/video/ygccJS_58jE/video.html (AWS CZcams Video EMR Serverless) 00:37 - What is EMR Serverless? Part 1 00:58 - What is EMR? 01:34 - What is EMR Serverless? Part 2 02:30 - EMR Vs EMR Serverless 03:21 - Glue Vs EMR Serverless 04:40 - Tutorial: Setup Work 13:52 - Tutorial: Create EMR S...
Build An AWS Streaming Fraud Detection App [Full Tutorial using MSK and Kinesis]
3.1K views · 1 year ago
ℹ️ johnnychivers.co.uk 📁 fraud-detection.workshop.aws/en/intro.html 📁 github.com/johnny-chivers/tutorial-kafka-flink-dynamodb ☕ www.buymeacoffee.com/johnnychivers 00:00 - Intro 01:15 - What is the data context 02:43 - Flow of data 04:43 - Main services we are using 04:58 - What are we building 06:41 - Tutorial In this video we build a real time fraud detection app using AWS MSK and AWS Kinesis ...
AWS EMR Tutorial [FULL COURSE in 60mins]
57K views · 2 years ago
ℹ️ johnnychivers.co.uk 📁 emr-etl.workshop.aws/setup.html ☕ www.buymeacoffee.com/johnnychivers/e/70388 📁 github.com/johnny-chivers/emrZeroToHero ☕ www.buymeacoffee.com/johnnychivers 01:11 - Set Up Work 07:21 - What Is EMR? 10:29 - Spin Up A Cluster 15:00 - Spark ETL 32:21 - Hive 41:15 - PIG 45:43 - AWS Step Functions 52:09 - EMR Auto Scaling In this video we take a look at AWS EMR and work throu...
AWS Kinesis Tutorial for Beginners [FULL COURSE in 65 mins]
59K views · 2 years ago
ℹ️ johnnychivers.co.uk ☕www.buymeacoffee.com/johnnychivers/e/56915 📁 github.com/johnny-chivers/kinesisZeroToHero ☕ www.buymeacoffee.com/johnnychivers 00:09 - What the course will cover 00:54 - Set Up Work 05:43 - Kinesis Streams Theory 09:01 - SDK Vs KPL Theory 10:31 - Kinesis Data Streams Practical 12:03 - Kinesis SDK 15:54 - KPL Practical 22:26 - Lambda Consumer Theory 23:19 - Lambda Consumer...
AWS Glue Tutorial for Beginners [FULL COURSE in 45 mins]
244K views · 2 years ago
AWS MySQL Aurora Vs RDS - What one should I chose?
16K views · 2 years ago
Top 5 Trends For Data Engineering In 2022
3.8K views · 2 years ago
AWS EMR vs AWS SageMaker - What One Should I use?
2.1K views · 2 years ago
AWS Glue ETL Vs EMR - Which one should I use?
36K views · 2 years ago
Realtime Streaming With AWS Glue Studio
4.4K views · 2 years ago
AWS Glue Studio - Lets Get Hands On!
17K views · 2 years ago
Using AWS Aurora For Full Text Search - Complete Tutorial
1.5K views · 2 years ago
AWS Postgres Aurora Vs RDS - What one should I chose?
15K views · 2 years ago
My Top 5 Linux Commands On AWS For Data Engineering - Using Cloud9!
801 views · 2 years ago
What Do Cloud Data Engineers Do In AWS?
660 views · 2 years ago
AWS Data Engineering Tutorial for Beginners [FULL COURSE in 90 mins]
88K views · 2 years ago
How I Architected A Start Up WebApp Using AWS Amplify
522 views · 2 years ago
Beginners Guide To AWS CloudSearch
7K views · 2 years ago
Beginners Guide To AWS SQS
591 views · 2 years ago

Comments

  • @milogodo100pre
    @milogodo100pre 1 day ago

    Hi, I've been trying to do an exercise which consists of ingesting data from a website (currencies), storing it, and then showing the collected data in a graph. That's very simple to say but very difficult for me to do. If you have any information, I would really appreciate it. I have the API key for the data source.

  • @mickyman753
    @mickyman753 1 day ago

    Johnny, does the speed come from the partition-by column we use when creating the table? If I used a different column instead of date and ran date-related queries, would it still be faster or not?

  • @bruh_1283
    @bruh_1283 2 days ago

    Is this legit

  • @philippesantossimoes9241

    Thanks! You helped me a lot! 😁

  • @smoocher
    @smoocher 9 days ago

    Thank you for this, and you have a most delightful accent.

  • @Chuukwudi
    @Chuukwudi 11 days ago

    Is the Gold folder redundant? It seems like it is not needed. Or will it only be used if data in silver still requires further transformation?

  • @Chuukwudi
    @Chuukwudi 11 days ago

    Sometimes you hide your owner account ID, other times you can't be bothered 😆. Thank you very much for your tutorials. You are the best!

  • @viewermm1588
    @viewermm1588 13 days ago

    Hi all, when creating an Iceberg table in Athena, I get "Exception encountered when executing query, this query ran against ...... database, unless qualified by the query. please post the error message on our forum .....". Does anyone know the solution?

  • @DeepakkumarSLatentView

    just amazing 🥳

  • @spencerfunk6697
    @spencerfunk6697 21 days ago

    !!!

  • @deepg6139
    @deepg6139 21 days ago

    For a very large dataset (around 15 billion rows overall), is it going to give good performance if we use Iceberg to select/delete/update?

  • @siddharthasahu7205
    @siddharthasahu7205 23 days ago

    Is there a way to overwrite the already present table? I cannot find this option anywhere at all.

  • @RahulSinghPatel-st6yb

    line 3:5: mismatched input 'SYSTEM_TIME'. Expecting: 'TIMESTAMP', 'VERSION' I'm getting this error while running the timestamp query. Can you please tell me why?

  • @pmdevengineer
    @pmdevengineer 26 days ago

    thanks man

  • @harivigsp7934
    @harivigsp7934 1 month ago

    Can we create an Iceberg table in S3 using a multi-region access point?

  • @kila_whale
    @kila_whale 1 month ago

    7:06

  • @TheodoreRavindranath
    @TheodoreRavindranath 1 month ago

    Today, Aurora is costlier and Aurora serverless is even costlier!!

  • @wiseman9960
    @wiseman9960 1 month ago

    Liked and subscribed 🤟

  • @sayeedahmad7400
    @sayeedahmad7400 1 month ago

    The Lambda function is not accepting the Python code, as it was written for a previous version of Python. What should I do?

  • @priyankajindal6545
    @priyankajindal6545 1 month ago

    Hi Johnny, I really appreciate your video. But when I created a crawler with free trial access, I got the error below. Is there anything you can do to help me with this? "One crawler failed to create The following crawler failed to create: "crawler_customer_csv" Here is the most recent error message: Account *************** is denied access."

  • @rahulsood81
    @rahulsood81 1 month ago

    Another good video from the Chiverse.. :)

  • @crade47
    @crade47 1 month ago

    amazing

  • @venkatrao7868
    @venkatrao7868 1 month ago

    You are amazing and a natural teacher !!

  • @HikarusVibrator
    @HikarusVibrator 1 month ago

    Is this guy Scottish or Jamaican? Never heard an accent like this before it’s wild

  • @whocares_today
    @whocares_today 1 month ago

    amazing work

  • @danilomenoli
    @danilomenoli 1 month ago

    You are amazing❤

  • @alekhprasadsahu6705
    @alekhprasadsahu6705 1 month ago

    best video

  • @abhishekprakash4793
    @abhishekprakash4793 1 month ago

    thanks for the easy-to-follow video ... looking forward to more such content on azure

  • @chamila.fernando.us2fernan663

    You are awesome. Watching in 2024... The ETL steps need minor updating, but I was still able to follow! Keep up the great work!

  • @ajprasad6865
    @ajprasad6865 1 month ago

    thank you so much

  • @mikebrown5142
    @mikebrown5142 1 month ago

    Very good video, thank you! God bless - Matthew 11:28

  • @streethawk2503
    @streethawk2503 2 months ago

    Very thoroughly described. Thank you

  • @LeisDawut
    @LeisDawut 2 months ago

    Hey Johnny, there were only 2 rows in the bronze/ingest object which you pulled using Firehose. How come there are so many rows after the Glue job to the silver layer?

  • @Essentialliv52
    @Essentialliv52 2 months ago

    Amazing.

  • @phambinhchau8188
    @phambinhchau8188 2 months ago

    Hi, thanks for your content. I got the following error while creating the CFN stack: "Please check the role provided or validity of S3 location you provided. We are unable to get the specified fileKey: modules/599e7c685a254c2b892cdbf58a7b3b4f/v1/flink-sql-connector-elasticsearch7_2.11-1.13.2.jar in the specified bucket: ee-assets-prod-us-east-1" Do we need to download the .jar file and upload it manually to S3 to make it work?

  • @ashsaksena
    @ashsaksena 2 months ago

    This was an amazing tutorial. I understood every bit of it because of the way it was explained hands-on. I loved the hand typing of all the commands, which felt like a very real-world scenario. Thank you so much Johnny!

  • @kjewelson
    @kjewelson 2 months ago

    happy new year

  • @nebolos
    @nebolos 2 months ago

    Thanks @Johnny Chivers. This video cleared up a lot of the confusion I had about RDS and Aurora. But don't Aurora global databases provide fault tolerance against a Region outage?

  • @deltapulse
    @deltapulse 2 months ago

    Great tutorial - but a pro tip. You *totally* need a keyboard. Ideally one that really fits your finger size, pressing force and such. I mean it as advice, not as a rude comment though. Give some a try. :)

  • @shared_xp
    @shared_xp 2 months ago

    I have not heard PIG in forever, really enjoyed that language.

  • @maxpayne6625
    @maxpayne6625 2 months ago

    I feel like now I am zero to a noob. It will take some time to be a hero :)

  • @alecbg919
    @alecbg919 2 months ago

    Around 26 minutes after you queried the deleted data it said it scanned 5.76MB. That seems like a lot for just metadata!

  • @sanooosai
    @sanooosai 3 months ago

    thank you sir

  • @benitinmagnate4937
    @benitinmagnate4937 3 months ago

    @25:00 "I'll talk about connections quickly", LOL! That's what AWS Glue, Azure Data Factory, SSIS, Informatica, are all about: CONNECTIONS! You are moving data from a source to a target, and to do that, you need to be connected to both, the source and the target. Basically, you are an S3 guy, LOL!

  • @cuatrofour4
    @cuatrofour4 3 months ago

    Thanks for the video! What do you think about using ECS/EKS to run your Python ETLs inside Docker containers, so you can then execute your tasks/pods from MWAA? In case you don't need Spark, it could be an alternative to EMR and cheaper than Glue.

  • @AlexXavier
    @AlexXavier 3 months ago

    So clear! Thank you!

  • @user-ch1hk5ii8p
    @user-ch1hk5ii8p 3 months ago

    Awesome !!! Johnny...

  • @LoveisHell85
    @LoveisHell85 3 months ago

    Love your vids. Could you maybe do a vid on airflow hosting on fargate + simple pipeline? Something practical

  • @SharequeSRQ
    @SharequeSRQ 3 months ago

    Dude, your videos are so helpful, I got a Data Engineer job after practicing with your videos and they are still helpful.. More power to you man, I hope you get more success.

  • @ccc_ccc789
    @ccc_ccc789 3 months ago

    Thanks!