10 Spark Streaming Read from Kafka | Real-time streaming from Kafka

  • Published 27. 07. 2024
  • Video covers: How to read streaming data from Kafka? How to read real-time data from Kafka? How to use Kafka as a source for real-time Spark Streaming? (A minimal illustrative sketch follows this description.)
    Chapters:
    00:00 - Introduction
    00:34 - Example Device JSON Payload
    01:09 - Import Kafka JAR Libraries
    03:08 - Read from Kafka Source
    06:27 - Extract JSON data from column using from_json
    URLs:
    Github Code - github.com/subhamkharwal/spar...
    Device data samples - github.com/subhamkharwal/spar...
    To setup Kafka with Spark in Local environment - • 03 Spark Streaming Loc...
    JSON data Flattening and reading from files - • 07 Spark Streaming Rea...
    Keywords: Apache Spark, PySpark, Spark Streaming, Real-time Data Processing, Data Streaming, Big Data Analytics, PySpark Tutorial, Apache Spark Tutorial, Streaming Analytics, Spark Structured Streaming, PySpark Streaming, Big Data Processing.
    New video every 3 days ❤️
    Make sure to like and Subscribe.
  • Science & Technology
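  • A minimal illustrative sketch of the flow the chapters describe (import the Kafka package, read from the Kafka source, parse the payload with from_json). The broker address localhost:9092, the topic name device-data, and the simplified device schema are assumptions for local testing, not taken from the video:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, from_json
    from pyspark.sql.types import StringType, StructField, StructType

    # The Kafka package version must match the installed Spark version (3.3.0 assumed here)
    spark = (
        SparkSession.builder
        .appName("Streaming from Kafka")
        .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0")
        .master("local[*]")
        .getOrCreate()
    )

    # Read from the Kafka source (broker and topic are assumptions for a local setup)
    kafka_df = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "device-data")
        .option("startingOffsets", "earliest")
        .load()
    )

    # A guessed, simplified schema for the device JSON payload
    device_schema = StructType([
        StructField("deviceId", StringType()),
        StructField("eventType", StringType()),
        StructField("eventTime", StringType()),
    ])

    # Kafka delivers the value as bytes: cast to string, then parse with from_json
    parsed_df = (
        kafka_df
        .withColumn("value_str", col("value").cast("string"))
        .withColumn("json", from_json(col("value_str"), device_schema))
        .select("json.*")
    )

    # Write the parsed stream to the console for inspection
    query = (
        parsed_df.writeStream
        .format("console")
        .outputMode("append")
        .option("checkpointLocation", "checkpoint_dir_kafka")
        .start()
    )
    query.awaitTermination()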

Comments • 21

  • @VenkatesanVenkat-fd4hg (5 months ago)

    Superb playlist

  • @burak3941 (3 months ago)

    Thanks for sharing this fantastic list.
    I only noticed one thing: you run the Kafka broker on port 9092, but you could also retrieve the data from port 29092. What did I miss? :)
    Many thanks

    • @easewithdata (3 months ago)

      Thanks. For applications outside Docker the 9092 port is exposed, while containers on the internal Docker network can use 29092. Both are correct.
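      For illustration, a hedged sketch of how the bootstrap address typically differs depending on where the Spark job runs; the hostnames and ports are assumptions based on a common docker-compose Kafka setup with two advertised listeners:

      # `spark` is assumed to be a SparkSession with the Kafka package attached
      # (see the sketch under the video description above).

      # Spark running on the host machine (outside Docker) -> host-mapped listener
      bootstrap_from_host = "localhost:9092"

      # Spark running in another container on the same Docker network ->
      # internal listener (Kafka service name + internal port)
      bootstrap_from_container = "kafka:29092"

      kafka_df = (
          spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", bootstrap_from_host)  # or bootstrap_from_container
          .option("subscribe", "device-data")
          .load()
      )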

  • @worldthroughmyvisor (5 months ago)

    Excellent tutorial!
    Here are the questions I faced in a Hewlett Packard interview on Spark Streaming applications; perhaps you could create a video on these too (a sketch touching on both questions follows this thread).
    1. Suppose you read a message from Kafka and your application fails to process it. How do you ensure that the same message is processed again successfully? What he was getting at: out of thousands of messages read from Kafka, how do we make sure the ones that failed are eventually processed, given that the application keeps reading the new messages coming in and the unsuccessful message was already read once and failed?
    2. How do you ensure parallelism between a Kafka producer and the Spark Streaming read API? There will be hundreds of messages arriving at any given point, and naturally the Spark application cannot process them one at a time; instead, Spark can process them in parallel by reading from multiple partitions. How do you configure your app to do that?
    I think these questions give a much better idea of how a production-grade Spark Streaming application works. I'd appreciate it if you could create some content on them. Thanks!

    • @easewithdata (4 months ago)

      Thank you for posting the questions. The answer to Question 2 can be found in today's video.

    • @worldthroughmyvisor (4 months ago)

      Thank you @easewithdata
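      An editorial sketch touching on both interview questions above. The option names are real Structured Streaming Kafka source options, but the values, the topic name, and the checkpoint path are illustrative assumptions:

      # `spark` is assumed to be a SparkSession with the Kafka package attached.
      #
      # Question 1 (reprocessing failures): Structured Streaming stores Kafka offsets
      # in the checkpoint. If a micro-batch fails, the query stops; on restart it
      # resumes from the last committed offsets, so the failed messages are read and
      # processed again (at-least-once by default).
      # Question 2 (parallelism): each Kafka topic partition maps to a Spark task;
      # minPartitions can split partitions further, and maxOffsetsPerTrigger bounds
      # how many records each micro-batch pulls.

      kafka_df = (
          spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "device-data")
          .option("minPartitions", 12)             # ask for at least 12 input splits
          .option("maxOffsetsPerTrigger", 10000)   # cap records per micro-batch
          .load()
      )

      query = (
          kafka_df.writeStream
          .format("console")
          .option("checkpointLocation", "checkpoint_dir_kafka")  # offsets survive restarts
          .start()
      )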

  • @nikschuetz4112 (a month ago)

    If you can cast it to a string, why can't you cast it to JSON, or map it out and use a JSON parser?

    • @easewithdata (a month ago)

      Yes, you can definitely cast it however you like.

  • @unknown_fact1586 (2 months ago)

    I can't see the output on the Docker console, but I started the consumer console and can see the data. Thanks for the video. Can you also make a video on how to read streaming job details in the Spark UI?

    • @easewithdata (2 months ago)

      Please make sure to share it with your network on LinkedIn.

  • @somyaranjankar5804 (3 months ago)

    I'm getting the error message below while reading from Kafka:
    Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".
    How do I fix this?

    • @easewithdata (3 months ago)

      Did you include the Kafka JAR package in the SparkSession?

    • @somyaranjankar5804 (3 months ago)

      @easewithdata Yes, I did; here is the sample:
      from pyspark.sql import SparkSession

      spark = (
          SparkSession
          .builder
          .appName("Streaming from Kafka")
          .config("spark.streaming.stopGracefullyOnShutdown", True)
          .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.12:3.3.0")
          .config("spark.sql.shuffle.partitions", 3)
          .master("local[*]")
          .getOrCreate()
      )

    • @easewithdata (3 months ago)

      No comments are deleted. Once your Spark session is created, check the Environment section of the Spark UI to confirm that the Kafka JAR was downloaded and attached to the cluster.
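      A small hedged check (an alternative to the Spark UI) to confirm which packages the session was started with:

      # Prints the Maven coordinates requested via spark.jars.packages; if this is
      # empty or does not list spark-sql-kafka, the "Failed to find data source:
      # kafka" error is expected.
      print(spark.sparkContext.getConf().get("spark.jars.packages", "not set"))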

  • @prathamesh_a_k (4 months ago)

    I'm getting the error message below while reading from Kafka:
    Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".

    • @easewithdata (4 months ago)

      Hello Prathamesh,
      Did you include the JAR that adds Kafka support?

    • @somyaranjankar5804 (3 months ago)

      By import, do you mean it needs to be imported separately?

    • @easewithdata (3 months ago)

      You need to include the Kafka JAR package in the SparkSession configuration.

    • @hamedtamadon6520 (a month ago)

      It sometimes occurs because of a mismatch between the downloaded JAR file and the Spark version. You should find the version of the JAR that is compatible with your Spark version.
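      A minimal sketch of picking a matching package coordinate, assuming the Scala 2.12 build that PySpark 3.x wheels ship with:

      import pyspark

      # The spark-sql-kafka package version should match the installed Spark version exactly
      spark_version = pyspark.__version__  # e.g. "3.3.0"
      kafka_package = f"org.apache.spark:spark-sql-kafka-0-10_2.12:{spark_version}"
      print(kafka_package)                 # pass this string to spark.jars.packages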