ETL from AWS DynamoDB to Amazon Redshift Using Amazon Kinesis Firehose Delivery Stream & AWS Lambda

  • Added 7 Sep 2024
    This video shows how to perform ETL from an AWS DynamoDB stream to Amazon Redshift using an Amazon Kinesis Firehose delivery stream and AWS Lambda.
    It walks through the flow with a pictorial overview and explains the service-by-service connections used.
    It shows how to create, configure, and connect the required services to achieve this ETL operation.
    It also includes a Lambda code walkthrough and a final demo that executes the ETL pipeline.
    This video is useful for AWS SMEs, data engineers, architects, and others.
    The files used in the demo can be found at the repo link: github.com/Rek...
    #dynamodb #redshift #kinesisfirehose #etl #aws #awslambda
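
    As a rough illustration only (the actual demo code lives in the repo linked above), here is a minimal sketch of the kind of transform Lambda this pipeline uses, assuming boto3 and a hypothetical delivery-stream name:

    import json
    import boto3

    firehose = boto3.client('firehose')
    DELIVERY_STREAM = 'ddb-to-redshift-stream'  # hypothetical name

    def convert_to_firehose_record(ddb_record):
        # Flatten the DynamoDB-typed NewImage ({'id': {'S': '1'}, ...}) into
        # plain JSON, one newline-terminated object per Firehose record.
        new_image = ddb_record['NewImage']
        row = {key: list(value.values())[0] for key, value in new_image.items()}
        return {'Data': (json.dumps(row) + '\n').encode('utf-8')}

    def lambda_handler(event, context):
        records = [convert_to_firehose_record(r['dynamodb'])
                   for r in event['Records'] if r['eventName'] == 'INSERT']
        if records:
            firehose.put_record_batch(DeliveryStreamName=DELIVERY_STREAM,
                                      Records=records)
        return {'processed': len(records)}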

Comments • 27

  • @SandeepSingh-hn6it
    @SandeepSingh-hn6it 5 months ago +1

    Really clear and easy to understand. I have some doubts: how does incremental syncing happen, how do you avoid duplicate records when syncing to Redshift, and how much delay is there to replicate a unique record to Redshift?

    • @cloudquicklabs
      @cloudquicklabs  5 months ago

      Thank you for watching my videos.
      Glad that it helped you.
      We can do an incremental sync without duplication using an AWS Glue job; one common dedup pattern is sketched below.
      I shall create a new video on this topic soon.
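
      For reference, this is not the Glue job mentioned above, and the cluster, database, user, and table names are placeholders: one common Redshift-side dedup pattern is a staging table merged with delete-then-insert, e.g. via the Redshift Data API:

      import boto3

      rsd = boto3.client('redshift-data')

      # Delete rows that are about to be re-inserted, reload from staging,
      # then clear staging (note: TRUNCATE commits immediately in Redshift).
      rsd.batch_execute_statement(
          ClusterIdentifier='redshift-cluster-1',  # placeholder
          Database='dev',                          # placeholder
          DbUser='awsuser',                        # placeholder
          Sqls=[
              "DELETE FROM target_table USING staging_table"
              " WHERE target_table.id = staging_table.id",
              "INSERT INTO target_table SELECT * FROM staging_table",
              "TRUNCATE staging_table",
          ],
      )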

  • @anujsaraswat864
    @anujsaraswat864 4 months ago +1

    If I am putting JSON-format sample data into Firehose, do I need to put JSON in the COPY command section, or what?

    • @cloudquicklabs
      @cloudquicklabs  4 months ago

      Thank you for watching my videos.
      You would need to specify the JSON format in the COPY options; see the sketch below.
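
      As a sketch of what that looks like in the delivery-stream configuration (all ARNs, names, and credentials here are placeholders, not the values from the video), the CopyOptions field carries the JSON hint:

      import boto3

      boto3.client('firehose').create_delivery_stream(
          DeliveryStreamName='ddb-to-redshift-stream',  # placeholder
          RedshiftDestinationConfiguration={
              'RoleARN': 'arn:aws:iam::123456789012:role/firehose-role',
              'ClusterJDBCURL': 'jdbc:redshift://redshift-cluster-1'
                                '.example.us-east-1.redshift.amazonaws.com:5439/dev',
              'CopyCommand': {
                  'DataTableName': 'target_table',
                  'CopyOptions': "json 'auto'",  # tells COPY the staged S3 objects are JSON
              },
              'Username': 'awsuser',
              'Password': 'REPLACE_ME',
              'S3Configuration': {
                  'RoleARN': 'arn:aws:iam::123456789012:role/firehose-role',
                  'BucketARN': 'arn:aws:s3:::my-firehose-staging-bucket',
              },
          },
      )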

  • @anuragbond913
    @anuragbond913 a year ago +1

    Has AWS stopped giving a free trial of Redshift? I could not find it for my Redshift cluster. Does anyone have any idea about this?

    • @cloudquicklabs
      @cloudquicklabs  a year ago

      Thank you for watching my videos.
      I have not heard anything about the free tier, but you could use the low-cost Dev/Test options here. For more details about the free trial, see aws.amazon.com/redshift/free-trial/

    • @anuragbond913
      @anuragbond913 a year ago

      @@cloudquicklabs In this video you used a free-tier Redshift cluster. I think AWS stopped the free tier and we have to use a low-cost Redshift cluster instead.
      Just one more thing: can I use Redshift Serverless instead of a Redshift cluster? AWS provides $300 worth of free serverless Redshift.
      Your videos are very helpful, thanks for the good work 👍😊

  • @khandoor7228
    @khandoor7228 2 years ago +1

    great content on this channel!!

    • @cloudquicklabs
      @cloudquicklabs  2 years ago

      Thank you for watching my videos.
      Appreciate your encouragement here.
      Keep watching and keep learning.

  • @theskygivesusreasons
    @theskygivesusreasons a year ago +1

    Hello! Do you know if you would be able to use Redshift Serverless with Kinesis Firehose instead of Redshift Provisioned Clusters? Thank you for the wonderful video!

    • @cloudquicklabs
      @cloudquicklabs  a year ago +1

      Thank you for watching my videos.
      As per my reading, Redshift Serverless currently does not support public endpoints, and Kinesis Firehose needs a public endpoint, so it may not be supported at the moment. But I shall create a video on it once support arrives. Thank you.

  • @ansh1ta
    @ansh1ta a year ago +1

    How do you handle updates to records in DynamoDB tables so that they get reflected in Redshift??

    • @cloudquicklabs
      @cloudquicklabs  a year ago

      Thank you for watching my videos.
      This would require customization: perhaps another pipeline in which a Lambda updates the record in Redshift whenever it is updated in DynamoDB, along the lines of the sketch below.
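
      A minimal sketch of what such a pipeline's Lambda might do for MODIFY stream events, assuming the Redshift Data API and placeholder cluster, table, and column names:

      import boto3

      rsd = boto3.client('redshift-data')

      def handle_modify(ddb_record):
          # Mirror a DynamoDB update into Redshift with a parameterized
          # UPDATE keyed on the table's primary key.
          new_image = ddb_record['NewImage']
          rsd.execute_statement(
              ClusterIdentifier='redshift-cluster-1',  # placeholder
              Database='dev',                          # placeholder
              DbUser='awsuser',                        # placeholder
              Sql="UPDATE target_table SET name = :name WHERE id = :id",
              Parameters=[
                  {'name': 'name', 'value': new_image['name']['S']},
                  {'name': 'id', 'value': new_image['id']['S']},
              ],
          )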

    • @ansh1ta
      @ansh1ta a year ago +1

      But can Lambda write to a Redshift table? My impression is that it can only query the tables.

    • @cloudquicklabs
      @cloudquicklabs  a year ago

      Thank you for watching my videos.
      Yes, Lambda can: under the hood it executes SQL statements against the Redshift database tables, as in the sketch below.
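
      For illustration, a minimal sketch of a Lambda issuing a write through the Redshift Data API and polling until it completes (all identifiers are placeholders):

      import time
      import boto3

      rsd = boto3.client('redshift-data')

      def lambda_handler(event, context):
          resp = rsd.execute_statement(
              ClusterIdentifier='redshift-cluster-1',  # placeholder
              Database='dev',                          # placeholder
              DbUser='awsuser',                        # placeholder
              Sql="INSERT INTO target_table (id, name) VALUES ('42', 'demo')",
          )
          # The Data API is asynchronous; poll the statement until it settles.
          while True:
              status = rsd.describe_statement(Id=resp['Id'])['Status']
              if status in ('FINISHED', 'FAILED', 'ABORTED'):
                  return {'status': status}
              time.sleep(0.5)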

  • @ansh1ta
    @ansh1ta a year ago +1

    Can you please share what roles and permissions are needed? I am getting an error when Firehose tries to connect to my Redshift cluster. I have opened the security groups to allow all traffic, but I am still facing an issue.

    • @cloudquicklabs
      @cloudquicklabs  a year ago

      Thank you for watching my videos.
      I have given blanket (admin) access to the role I am using in the video. If you can share the error from Firehose, I may be able to help you there. A tighter starting point is sketched below.
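
      As a tighter alternative to a blanket admin role, a sketch of the S3 permissions the Firehose role needs for its intermediate bucket; Redshift itself is reached with the cluster username/password, and the cluster must also be publicly accessible with its security group allowing Firehose's IP range for your region. Bucket and role names here are placeholders:

      import json
      import boto3

      policy = {
          "Version": "2012-10-17",
          "Statement": [{
              "Effect": "Allow",
              "Action": [
                  "s3:AbortMultipartUpload",
                  "s3:GetBucketLocation",
                  "s3:GetObject",
                  "s3:ListBucket",
                  "s3:ListBucketMultipartUploads",
                  "s3:PutObject",
              ],
              "Resource": [
                  "arn:aws:s3:::my-firehose-staging-bucket",
                  "arn:aws:s3:::my-firehose-staging-bucket/*",
              ],
          }],
      }

      boto3.client('iam').put_role_policy(
          RoleName='firehose-role',            # placeholder
          PolicyName='firehose-s3-staging',
          PolicyDocument=json.dumps(policy),
      )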

  • @liumx31
    @liumx31 a year ago +1

    Can this workflow be done in step function? Or could the Lambda directly write to Redshift?

    • @cloudquicklabs
      @cloudquicklabs  a year ago

      Indeed, this scenario could be achieved in many ways with serverless functions. You are right, we could do that.

    • @prashanthm2446
      @prashanthm2446 3 months ago

      @liumx31, I had the same question in my mind, glad you have already asked. Thanks @cloudquicklabs for answering.

  • @keane26mar30
    @keane26mar30 a year ago +1

    File "/var/task/lambda_function.py", line 22, in lambda_handler
        firehoseRecord = convertToFirehoseRecord(ddbRecord)
    File "/var/task/lambda_function.py", line 8, in convertToFirehoseRecord
        newImage = ddbRecord['NewImage']
    Hi sir, do you know why I'm getting such an error?

    • @cloudquicklabs
      @cloudquicklabs  a year ago

      Thank you for watching my videos.
      Did you check whether your DynamoDB column names are the same as those mentioned in the Python code?

    • @keane26mar30
      @keane26mar30 a year ago +1

      @@cloudquicklabs Okay, but may I know what policies you used for your IAM roles, especially the Redshift ones?

    • @cloudquicklabs
      @cloudquicklabs  a year ago

      Thank you for coming back on this.
      I have given 'Administrator' access to this role as it is a demo, but in production you should fine-grain it.

    • @keane26mar30
      @keane26mar30 a year ago

      @@cloudquicklabs AccessDenied
      User: arn:aws:sts::880387018372:assumed-role/voclabs/user2209860=KEANE_LOO_JUN_XIAN is not authorized to perform: redshift:DescribeClusterSubnetGroups on resource: arn:aws:redshift:us-east-1:880387018372:subnetgroup:* because no identity-based policy allows the redshift:DescribeClusterSubnetGroups action
      AccessDenied
      User: arn:aws:sts::880387018372:assumed-role/voclabs/user2209860=KEANE_LOO_JUN_XIAN is not authorized to perform: redshift:DescribeEvents on resource: arn:aws:redshift:us-east-1:880387018372:event:* because no identity-based policy allows the redshift:DescribeEvents action
      AccessDenied
      User: arn:aws:sts::880387018372:assumed-role/voclabs/user2209860=KEANE_LOO_JUN_XIAN is not authorized to perform: redshift:DescribeClusters on resource: arn:aws:redshift:us-east-1:880387018372:cluster:* because no identity-based policy allows the redshift:DescribeClusters action

    • @tusharmalhan2206
      @tusharmalhan2206 a year ago +1

      @@cloudquicklabs Hi, it's because the Lambda code expects the key "NewImage", which is the cause of the error; your input JSON also needs that key, from which the code extracts the id, name, phone number, etc. A defensive version is sketched below.
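
      Building on that, a defensive rewrite of convertToFirehoseRecord (the body is a guess at the demo code's intent): REMOVE events, and streams whose view type is not NEW_IMAGE or NEW_AND_OLD_IMAGES, carry no 'NewImage', so guard for it instead of indexing unconditionally:

      import json

      def convertToFirehoseRecord(ddbRecord):
          newImage = ddbRecord.get('NewImage')
          if newImage is None:
              # REMOVE event, or the stream is not configured to emit new
              # images: skip the record rather than raising a KeyError.
              return None
          row = {key: list(value.values())[0] for key, value in newImage.items()}
          return {'Data': (json.dumps(row) + '\n').encode('utf-8')}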