Install Apache PySpark on Windows PC | Apache Spark Installation Guide

  • Published: Feb 6, 2023
  • In this lecture, we're going to set up Apache Spark (PySpark) on a Windows PC by installing the JDK, Python, Hadoop and Apache Spark. Please find the installation links/steps below:
    PySpark installation steps on Mac: sparkbyexamples.com/pyspark/h...
    Apache Spark Installation links:
    1. Download JDK: www.oracle.com/in/java/techno...
    2. Download Python: www.python.org/downloads/
    3. Download Spark: spark.apache.org/downloads.html
    Winutils repo link: github.com/steveloughran/winu...
    Environment Variables:
    HADOOP_HOME = C:\hadoop
    JAVA_HOME = C:\java\jdk
    SPARK_HOME = C:\spark\spark-3.3.1-bin-hadoop2
    PYTHONPATH = %SPARK_HOME%\python;%SPARK_HOME%\python\lib\py4j-0.10.9-src.zip;%PYTHONPATH% (the py4j entry is a .zip archive in Spark's python\lib folder; match the exact version shipped with your download)
    Required Path entries (a quick sanity-check sketch follows this list):
    %SPARK_HOME%\bin
    %HADOOP_HOME%\bin
    %JAVA_HOME%\bin
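    A minimal sanity check (a sketch, assuming the variables above are saved and a new terminal was opened afterwards; PySpark must be importable, e.g. via the PYTHONPATH entry above):

        import os

        # Print each variable from this guide; "<not set>" usually means
        # the terminal/IDE was opened before the variable was saved.
        for var in ("JAVA_HOME", "HADOOP_HOME", "SPARK_HOME", "PYTHONPATH"):
            print(var, "=", os.environ.get(var, "<not set>"))

        # If everything resolves, a local SparkSession should start.
        from pyspark.sql import SparkSession

        spark = SparkSession.builder.master("local[*]").appName("smoke-test").getOrCreate()
        print("Spark version:", spark.version)
        spark.stop()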
    Also check out our full Apache Hadoop course:
    • Big Data Hadoop Full C...
    ----------------------------------------------------------------------------------------------------------------------
    Also check out similar informative videos in the field of cloud computing:
    What is Big Data: • What is Big Data? | Bi...
    How Cloud Computing changed the world: • How Cloud Computing ch...
    What is Cloud? • What is Cloud Computing?
    Top 10 facts about Cloud Computing that will blow your mind! • Top 10 facts about Clo...
    Audience
    This tutorial has been prepared for professionals and students aspiring to gain deep knowledge of Big Data analytics using Apache Spark and to become Spark Developers and Data Engineers. It is also useful for analytics professionals and ETL developers.
    Prerequisites
    Before proceeding with this full course, it is good to have prior exposure to Python programming, database concepts, and any flavor of the Linux operating system.
    -----------------------------------------------------------------------------------------------------------------------
    Check out our full course topic wise playlist on some of the most popular technologies:
    SQL Full Course Playlist-
    • SQL Full Course
    PYTHON Full Course Playlist-
    • Python Full Course
    Data Warehouse Playlist-
    • Data Warehouse Full Co...
    Unix Shell Scripting Full Course Playlist-
    • Unix Shell Scripting F...
    -----------------------------------------------------------------------------------------------------------------------
    Don't forget to like and follow us on our social media accounts:
    Facebook-
    / ampcode
    Instagram-
    / ampcode_tutorials
    Twitter-
    / ampcodetutorial
    Tumblr-
    ampcode.tumblr.com
    -----------------------------------------------------------------------------------------------------------------------
    Channel Description-
    AmpCode provides an e-learning platform with a mission of making education accessible to every student. AmpCode offers tutorials and full courses on some of the best technologies in the world today. By subscribing to this channel, you will never miss out on high-quality videos on trending topics in the areas of Big Data & Hadoop, DevOps, Machine Learning, Artificial Intelligence, Angular, Data Science, Apache Spark, Python, Selenium, Tableau, AWS, Digital Marketing and many more.
    #pyspark #bigdata #datascience #dataanalytics #datascientist #spark #dataengineering #apachespark

Comments • 375

  • @ipheiman3658
    @ipheiman3658 1 year ago +3

    This worked so well for me :-) The pace is great and your explanations are clear. I am so glad I came across this, thanks a million! 😄 I have subscribed to your channel!!

  • @yashusachdeva
    @yashusachdeva 5 months ago

    It worked, my friend. The instructions were concise and straightforward.

  • @sisterkeys
    @sisterkeys 10 months ago +3

    What I was doing for 2 days, you narrowed down to 30 minutes!! Thank you!!

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @eloscarc5782
    @eloscarc5782 3 months ago

    Your video helped me understand it better than other videos; now the other videos make sense. This was not as convoluted as I thought.

  • @susmayonzon9198
    @susmayonzon9198 1 year ago +2

    Excellent! Thank you for making this helpful lecture! You relieved my headache, and I did not give up.

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!

    • @moathmtour1798
      @moathmtour1798 1 year ago +1

      Hey, which version of Hadoop did you install? The 2.7 wasn't available.

  • @neeleshgaikwad6387
    @neeleshgaikwad6387 1 year ago +2

    Very helpful video. Just by following the steps you mentioned, I could run Spark on my Windows laptop. Thanks a lot for making this video!!

    • @ampcode
      @ampcode  1 year ago

      Thank you so much!😊

    • @iniyaninba489
      @iniyaninba489 8 months ago

      @@ampcode Bro, I followed every step you said, but in CMD when I typed "spark-shell", it displayed "'spark-shell' is not recognized as an internal or external command,
      operable program or batch file." Do you know how to solve this?

    • @sssssshreyas
      @sssssshreyas 1 month ago

      @@iniyaninba489 Add the same path in the User Variables Path as well, just like you added it in the System Variables Path.

  • @ragisatyasai2469
    @ragisatyasai2469 1 year ago +1

    Thanks for sharing this. Beautifully explained.

  • @alulatafere6008
    @alulatafere6008 1 month ago

    Thank you! It is clear and very helpful!! From Ethiopia.

  • @nedvy
    @nedvy 1 year ago +1

    Great video! It helped me a lot. Thank you ❤

  • @cloudandsqlwithpython
    @cloudandsqlwithpython 11 months ago +1

    Great! Got Spark working on Windows 10 -- good work!

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @saswatarakshit9488
    @saswatarakshit9488 10 months ago

    Great Video, awesome comments for fixing issues

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @HamsiniRamesh-ig6ih
    @HamsiniRamesh-ig6ih 3 months ago

    This video was great! Thanks a lot

  • @juanmiguelvargascortes9933
    @juanmiguelvargascortes9933 11 months ago

    Excellent video!!! Thanks for your help!!!

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @joshizic6917
    @joshizic6917 8 months ago +6

    How is your spark-shell running from your Users directory?
    It's not running for me.

  • @pratikshyapriyadarshini4677
    @pratikshyapriyadarshini4677 5 months ago

    Very helpful. Thank you!

  • @veerabadrappas3158
    @veerabadrappas3158 1 year ago +1

    Excellent video. Sincere thank you!

  • @chrominux5272
    @chrominux5272 5 months ago

    Very useful, thanks :D

  • @davidk7212
    @davidk7212 1 year ago +1

    Very helpful, thank you.

  • @user-tr9pz1je7g
    @user-tr9pz1je7g 1 year ago

    Very helpful, thanks!

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @indianintrovert281
    @indianintrovert281 2 months ago +16

    For those facing problems like 'spark-shell' is not recognized as an internal or external command:
    on the command prompt, run 'cd C:\Spark\spark-3.5.1-bin-hadoop3\bin' (use your own Spark file path, including bin),
    and then run spark-shell or pyspark. (It finally worked for me; hope it works for you too.)
    If it worked, like this so that more people benefit from it.
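    (A quick diagnostic sketch for this failure, assuming the SPARK_HOME value from the description; if the check below prints False, spark-shell will only run from inside the bin folder.)

        import os

        # Is %SPARK_HOME%\bin actually on PATH in this environment?
        spark_bin = os.path.join(
            os.environ.get("SPARK_HOME", r"C:\spark\spark-3.3.1-bin-hadoop2"), "bin")
        entries = [os.path.normcase(p.strip().rstrip("\\"))
                   for p in os.environ["PATH"].split(os.pathsep)]
        print("Spark bin on PATH:", os.path.normcase(spark_bin.rstrip("\\")) in entries)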

  • @ramnisanthsimhadri3161
    @ramnisanthsimhadri3161 2 months ago +3

    I am not able to find the package type 'Pre-built for Apache Hadoop 2.7' in the drop-down. FYI, the Spark release versions I can see are 3.4.3 and 3.5.1.

  • @sanchitabhattacharya353
    @sanchitabhattacharya353 4 months ago +1

    While launching spark-shell I get the following error. Any idea??
    WARN jline: Failed to load history
    java.nio.file.AccessDeniedException: C:\Users\sanch\.scala_history_jline3

  • @gosmart_always
    @gosmart_always 9 months ago

    Every now and then we receive an alert from Oracle to upgrade the JDK. Do we need to upgrade our JDK version? If we upgrade, will it impact running Spark?

  • @metaviation
    @metaviation 1 year ago +1

    Very clear one, thank you.

  • @Saravanan_G_Official
    @Saravanan_G_Official 2 months ago +2

    Is there anything wrong with the latest version of Python and Spark 3.3.1?
    I am still getting the error.

  • @danieljosephs
    @danieljosephs 5 months ago

    Very helpful video

  • @user-vq4oz9oc5o
    @user-vq4oz9oc5o 1 year ago

    Brilliant, Thanks a ton

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @pooja1899
    @pooja1899 1 year ago +1

    Thank you for sharing this video

  • @prashanthnm3406
    @prashanthnm3406 1 month ago

    Thanks bro, fixed it after struggling for 2 days, 2 nights, 2 hours and 9 minutes.

    • @nickcheruiyot9069
      @nickcheruiyot9069 27 days ago

      Hello, I have been trying to install it for some days too. I keep getting an error when I try to run spark-shell: the command is not recognized. Any suggestions?

  • @somanathking4694
    @somanathking4694 3 months ago

    This works as smooth as butter. Be patient, that's it! Once the setup is done, there's no looking back.

    • @SUDARSANCHAKRADHARAkula
      @SUDARSANCHAKRADHARAkula 2 months ago

      Bro, which version of Spark and winutils did you download? I took 3.5.1 and hadoop-3.0.0/bin/winutils, but it didn't work.

    • @meriemmouzai2147
      @meriemmouzai2147 2 months ago

      @@SUDARSANCHAKRADHARAkula same for me!

  • @sicelovilane5391
    @sicelovilane5391 1 year ago +1

    You are the best. Thanks!

  • @Adhikash015
    @Adhikash015 1 year ago +1

    Bhai, bro, Brother, Thank you so much for this video

  • @user-vb7im1jb1b
    @user-vb7im1jb1b 11 months ago

    Thanks for this video. For learning purposes on my own computer, do I need to install Apache Spark (spark-3.4.1-bin-hadoop3.tgz) to be able to run Spark scripts/notebooks, or just pip install pyspark in my Python environment?

    • @practicemail3227
      @practicemail3227 2 months ago

      Hi, I'm in the same boat. Can you tell me what you did? I'm also learning currently and have no idea.
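      (For purely local learning, 'pip install pyspark' is generally enough, since the PyPI package bundles Spark itself; winutils/HADOOP_HOME only matters for some Windows filesystem operations. A minimal sketch, assuming pip install pyspark has been run:)

          # pip install pyspark
          from pyspark.sql import SparkSession

          spark = SparkSession.builder.master("local[*]").appName("learning").getOrCreate()
          df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
          df.show()
          spark.stop()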

  • @nikhilupmanyu8804
    @nikhilupmanyu8804 5 months ago

    Hi, thanks for the steps. I am unable to see the Web UI after installing pyspark; it gives 'This URL can't be reached'. Kindly help.

  • @shankarikarunamoorthy4391
    @shankarikarunamoorthy4391 1 month ago

    Sir, the Spark version is available with Hadoop 3.0 only. spark-shell is not recognized as an internal or external command. Please do help.

  • @NileshKumar9975
    @NileshKumar9975 1 year ago +1

    very helpful video

  • @rayudusunkavalli2318
    @rayudusunkavalli2318 5 months ago +4

    I did every step you said, but Spark is still not working.

  • @Kartik-vy1rh
    @Kartik-vy1rh 1 year ago +1

    Video is very helpful. Thanks for sharing

  • @manasa3097
    @manasa3097 11 months ago

    This really worked for me. I have completed the Spark installation, but when I try to quit Scala, the cmd stops working and shows a 'not found' error. Can you please help me with this?

  • @ashwinnair2325
    @ashwinnair2325 1 month ago

    Thanks a lot, pyspark is opening, but when executing the df.show() command on a DataFrame I get the error below:
    Cannot run program "python3": CreateProcess error=2, The system cannot find the file specified
    Is there any way to rectify it?
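    (This usually means the workers are launching a "python3" executable that doesn't exist on Windows. A workaround sketch, not from the video: point PYSPARK_PYTHON at the driver's own interpreter before the session starts.)

        import os
        import sys

        # Make Spark workers use the same interpreter as the driver
        # instead of a nonexistent "python3" binary.
        os.environ["PYSPARK_PYTHON"] = sys.executable

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.master("local[*]").getOrCreate()
        spark.createDataFrame([(1,)], ["x"]).show()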

  • @edu_tech7594
    @edu_tech7594 11 months ago +1

    The Apache Hadoop I downloaded previously is version 3.3.4, even though I should choose pre-built for Apache Hadoop 2.7?

    • @sriram_L
      @sriram_L 10 months ago

      Same doubt, bro.
      Did you install it now?

  • @ed_oliveira
    @ed_oliveira 6 months ago +1

    Thank you!
    👍

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @pulkitdikshit9474
    @pulkitdikshit9474 3 months ago

    Hi, I installed it, but when I restarted my PC it no longer runs from cmd. What might be the issue?

  • @Manapoker1
    @Manapoker1 1 year ago +1

    the only tutorial that worked for me.....

  • @theeewebdev
    @theeewebdev 8 months ago

    I have followed all these steps, installed those 3, and created the paths too, but when I check in the command prompt it's not working; an error came up. Can anyone help me correct this?

  • @amitkumarpatel7762
    @amitkumarpatel7762 4 months ago +2

    I have followed the whole instruction, but when I run it, spark-shell is not recognised.

  • @sibrajbanerjee6297
    @sibrajbanerjee6297 29 days ago +1

    I am getting a message that 'spark-version' is not recognized as an internal or external command,
    operable program or batch file. This is after setting up the path in the environment variables for PYSPARK_HOME.

  • @gangadharg7
    @gangadharg7 5 months ago

    This worked perfectly for me. Thank you very much.

  • @theeewebdev
    @theeewebdev 8 months ago

    And when downloading Spark, a set of files came to download, not the tar file.

  • @AmreenKhan-dd3lf
    @AmreenKhan-dd3lf 6 days ago

    The Apache Hadoop 2.7 option is not available during the Spark download. Can we choose 'Apache Hadoop 3.3 and later (Scala 2.13)' as the package type during download?

  • @juliocesarcabanillas2433
    @juliocesarcabanillas2433 11 months ago

    Love you dude

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @matheswaranp9574
    @matheswaranp9574 1 month ago

    Thanks a Lot.

  • @nftmobilegameshindi8392
    @nftmobilegameshindi8392 3 months ago +4

    spark-shell is not working.

  • @jeremychaves2269
    @jeremychaves2269 1 year ago

    thanks dude!

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @vennilagunasekhar5460

    Thank you so much

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @user-uc7qf6uf5c
    @user-uc7qf6uf5c 8 months ago +1

    Great thanks

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @basanthaider3238
    @basanthaider3238 8 months ago

    I have an issue with pyspark: it's not working, and it's related to a Java class. I can't really understand what is wrong???

  • @bramhanaskari3152
    @bramhanaskari3152 1 year ago +1

    You haven't given a solution for the WARN ProcfsMetricsGetter exception. Is there any solution for that?

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. This can happen on Windows only and can be safely ignored. Could you please confirm whether you're able to start spark-shell and pyspark?

  • @prateektripathi3834
    @prateektripathi3834 7 months ago +4

    Did everything as per the video; still getting this error on using spark-shell: The system cannot find the path specified.

    • @srishtimadaan03
      @srishtimadaan03 2 months ago

      On the command prompt, run 'cd C:\Spark\spark-3.5.1-bin-hadoop3\bin' (use your own Spark file path, including bin),
      and then run spark-shell or pyspark. (It finally worked for me; hope it works for you too.)

  • @Manoj-ed3lj
    @Manoj-ed3lj 1 month ago

    Installed successfully, but when I check the Hadoop version I get an error like 'hadoop is not recognized as an internal or external command'.

  • @Jerriehomie
    @Jerriehomie 1 year ago +2

    Getting this error: WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped. People have mentioned using the Python folder path, which I have set as you mentioned, but still.

    • @bukunmiadebanjo9684
      @bukunmiadebanjo9684 1 year ago +1

      I found a fix for this. Change your Python path to that of Anaconda (within the environment variable section of this video) and use your Anaconda command prompt instead. No errors will pop up again.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please let me know if you are still facing this issue, and also confirm whether you're able to open spark-shell?

    • @shivalipurwar7205
      @shivalipurwar7205 1 year ago +1

      @@bukunmiadebanjo9684 Hi Adebanjo, my error got resolved with your solution. Thanks for your help!

  • @Cardinal_Seen
    @Cardinal_Seen 10 months ago

    Thank you. :D

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @manikantaperumalla2197
    @manikantaperumalla2197 1 month ago

    Should Java, Python and Spark be in the same directory?

  • @user-oy8gu5cs9j
    @user-oy8gu5cs9j 1 year ago +1

    ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
    I am getting the above error while running a spark or pyspark session.
    I have ensured that the winutils file is present in C:\hadoop\bin.

    • @ampcode
      @ampcode  1 year ago

      Could you please let me know if all your env variables are set properly?
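      (A diagnostic sketch, assuming the HADOOP_HOME layout from the description: verify, from the same environment Spark runs in, that HADOOP_HOME resolves and that bin\winutils.exe really exists.)

          import os

          # Spark derives the winutils location from %HADOOP_HOME%\bin.
          hadoop_home = os.environ.get("HADOOP_HOME")
          print("HADOOP_HOME =", hadoop_home)
          if hadoop_home:
              winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
              print(winutils, "exists:", os.path.exists(winutils))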

  • @moathmtour1798
    @moathmtour1798 1 year ago +1

    Hello, which Hadoop version should I install, since 2.7 is not available anymore? Thanks in advance.

    • @ampcode
      @ampcode  1 year ago

      You can go ahead and install the latest one as well, no issues!

    • @venkatramnagarajan2302
      @venkatramnagarajan2302 9 months ago

      @@ampcode Will the winutils file still be the 2.7 version?

  • @nagarajgotur
    @nagarajgotur 1 year ago +2

    spark-shell is working for me, but pyspark is not working from the home directory; I'm getting the error 'C:\Users\Sana>pyspark
    '#' is not recognized as an internal or external command,
    operable program or batch file.'
    But when I go to the Python path and run cmd, pyspark works. I have set up the SPARK_HOME and PYSPARK_HOME environment variables. Could you please help me? Thanks.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please also set PYSPARK_HOME to your python.exe path? I hope this will solve the issue 😅👍

    • @bintujose1981
      @bintujose1981 1 year ago

      @@ampcode nope. Same error

  • @nagalakshmip8725
    @nagalakshmip8725 2 months ago

    I'm getting 'spark-shell is not recognised as an internal or external command, operable program or batch file'.

  • @ganeshkalaivani6250
    @ganeshkalaivani6250 1 year ago +1

    Can anyone please help? For the last two days I have tried to install Spark and set the correct variable path, but I'm still getting 'system path not specified'.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late reply. Could you please check whether spark-shell runs properly from the bin folder? If yes, I guess there are some issues with your env variables only. Please let me know.

  • @rakeshkandula2318
    @rakeshkandula2318 7 months ago +2

    Hi, I followed the exact steps (installed Spark 3.2.4, as that is the only version available for Hadoop 2.7). The spark-shell command is working, but pyspark is throwing errors.
    If anyone has a fix for this, please help me.
    Thanks

    • @thedataguyfromB
      @thedataguyfromB 7 months ago

      Step by step solution
      czcams.com/video/jO9wZGEsPRo/video.htmlsi=aaITbbN7ggnczQTc

  • @harshithareddy5087
    @harshithareddy5087 6 months ago +3

    I don't have the option for Hadoop 2.7. What should I choose now???

    • @LLM_np
      @LLM_np 5 months ago

      Did you get any solution?
      Please let me know.

  • @badnaambalak364
    @badnaambalak364 6 months ago +1

    I followed the steps and installed JDK 17, Spark 3.5 and Python 3.12. When I try to use the map function I get 'Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe'. Please, someone help me.

  • @khushboojain3883
    @khushboojain3883 11 months ago +1

    Hi, I have installed Hadoop 3.3 (the latest one), as 2.7 was not available. But for downloading winutils, there is none for Hadoop 3.3 in the repository. Where do I get it from?

    • @sriram_L
      @sriram_L 10 months ago

      Same here. Did you get it now?

    • @khushboojain3883
      @khushboojain3883 10 months ago

      @@sriram_L Yes, you can find it directly via Google by simply mentioning the Hadoop version for which you want winutils. I hope this helps.

    • @hritwikbhaumik5622
      @hritwikbhaumik5622 9 months ago

      @@sriram_L It's still not working for me, though.

  • @sanketraut8462
    @sanketraut8462 5 months ago

    How do I set up com.jdbc.mysql.connector using a jar file? I am getting the error that it's not found while working in pyspark.
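    (One common way to attach a JDBC driver in PySpark is spark.jars.packages, which pulls the jar from Maven at session start; a sketch, where the coordinate, URL, table and credentials are assumptions to adapt:)

        from pyspark.sql import SparkSession

        # Fetch the MySQL JDBC driver from Maven Central at session start
        # (coordinate/version are an assumption; pick the one you need).
        spark = (SparkSession.builder
                 .config("spark.jars.packages", "mysql:mysql-connector-java:8.0.33")
                 .getOrCreate())

        df = (spark.read.format("jdbc")
              .option("url", "jdbc:mysql://localhost:3306/testdb")  # hypothetical URL
              .option("dbtable", "my_table")                        # hypothetical table
              .option("user", "root")
              .option("password", "secret")                         # hypothetical credentials
              .load())
        df.show()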

  • @Karansingh-xw2ss
    @Karansingh-xw2ss 9 months ago +2

    I'm facing this issue; can anyone help me fix it: 'spark-shell' is not recognized as an internal or external command,
    operable program or batch file.

    • @nikhilupmanyu8804
      @nikhilupmanyu8804 5 months ago

      Try adding the direct path in the System Environment variables. It will fix the issue.

  • @abhinavtiwari6186
    @abhinavtiwari6186 1 year ago +1

    Where is that Git repository link? It's not there in the description box below.

    • @ampcode
      @ampcode  1 year ago +1

      Extremely sorry for that. I have added it to the description and am pasting it here as well.
      GitHub: github.com/steveloughran/winutils
      Hope this is helpful! :)

  • @akira.19.9
    @akira.19.9 10 months ago

    Very useful!!

    • @ampcode
      @ampcode  6 months ago

      Thank you so much! Subscribe for more content 😊

  • @saikrishnareddy3474
    @saikrishnareddy3474 Před 10 měsíci +2

    I'm a little confused about how to set up the PYTHONHOME environment variable.

    • @thedataguyfromB
      @thedataguyfromB 7 months ago

      Step by step
      czcams.com/video/jO9wZGEsPRo/video.htmlsi=aaITbbN7ggnczQTc

  • @sriramsivaraman4100
    @sriramsivaraman4100 1 year ago +2

    Hello, when I try to run the spark-shell command as a local user it's not working (not recognized as an internal or external command); it only works if I run it as an administrator. Can you please help me solve this? Thanks.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please try running the same command from the spark/bin directory and let me know? I guess there might be some issues with your environment variables 🤔

    • @dishantgupta1489
      @dishantgupta1489 1 year ago

      @@ampcode I followed each and every step of the video and am still getting the 'not recognised as an internal or external command' error.

    • @ayonbanerjee1969
      @ayonbanerjee1969 1 year ago

      @@dishantgupta1489 Open a fresh cmd prompt window and try after you save the environment variables.

    • @obulureddy7519
      @obulureddy7519 1 year ago

      In Environment Variables, put the paths under the user variables for Admin, NOT under System variables.

  • @antonstsezhkin6578
    @antonstsezhkin6578 1 year ago +6

    Excellent tutorial! I followed along and nothing worked in the end :)
    StackOverflow told me that "C:\Windows\system32" is also required in the PATH variable for Spark to work. I added it and Spark started working.

  • @ismailcute1584
    @ismailcute1584 5 months ago +3

    Thank you so much for this video. Unfortunately, I couldn't complete it; I'm getting this error: C:\Users\Ismahil>spark-shell
    'cmd' is not recognized as an internal or external command,
    operable program or batch file. Please help.

  • @syafiq3420
    @syafiq3420 1 year ago +1

    How did you download Apache Spark as a zipped file? Mine was downloaded as a tgz file.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. You'll get both options on their official website. Could you please check whether you are using the right link?

    • @georgematies2521
      @georgematies2521 1 year ago

      @@ampcode There is no way now to download the zip file, only tgz.

  • @Mralbersan
    @Mralbersan 2 months ago

    I can't see 'Pre-built for Apache Hadoop 2.7' on the Spark website.

    • @meriemmouzai2147
      @meriemmouzai2147 2 months ago

      Same problem for me! I tried the '3.3 and later' version with winutils/hadoop-3.0.0/bin, but it didn't work.

  • @anastariq1310
    @anastariq1310 1 year ago +1

    After entering pyspark in cmd, it shows "The system cannot find the path specified. Files\Python310\python.exe was unexpected at this time". Please help me resolve it.

    • @mahamudullah_yt
      @mahamudullah_yt 11 months ago

      I face the same problem. Is there any solution?

  • @rakeshd3250
    @rakeshd3250 7 months ago

    Not working for me. I set everything up, except the Hadoop version came as 3.0.

  • @user-gc6ku9mp3d
    @user-gc6ku9mp3d 1 year ago +6

    Hi, I completed the process step by step and everything else is working, but when I run 'spark-shell' it shows: 'spark-shell' is not recognized as an internal or external command,
    operable program or batch file. Do you know what went wrong?

    • @viniciusfigueiredo6740
      @viniciusfigueiredo6740 1 year ago +1

      I'm having this same problem; the command only works if I run CMD as an administrator. Did you manage to solve it?

    • @hulkbaiyo8512
      @hulkbaiyo8512 11 months ago

      @@viniciusfigueiredo6740 Same as you; running as administrator works.

    • @shivamsrivastava4337
      @shivamsrivastava4337 11 months ago

      @@viniciusfigueiredo6740 same issue is happening with me

    • @RohitRajKodimala
      @RohitRajKodimala 11 months ago

      @@viniciusfigueiredo6740 Same issue for me. Did you fix it?

    • @santaw
      @santaw 8 months ago +1

      Anyone solved this?

  • @itsshehri
    @itsshehri 1 year ago +1

    Hey, pyspark isn't working on my PC. I did everything as you asked. Can you please help?

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late response. Could you please also set the PYSPARK_HOME env variable to the python.exe path? I guess this'll do the trick 😅👍

  • @BasitAIi
    @BasitAIi 10 months ago +3

    In cmd, the spark-shell command runs only under the C:\Spark\spark-3.5.0-bin-hadoop3\bin directory, not globally;
    the same goes for pyspark.

    • @s_a_i5809
      @s_a_i5809 9 months ago +2

      Yeah man, same for me. Did you find any fixes? If so, let me know :)

    • @BasitAIi
      @BasitAIi 9 months ago

      @@s_a_i5809 Add your environment variables under System variables, not User variables.

    • @ankitgupta5446
      @ankitgupta5446 7 months ago

      100 % working solution
      czcams.com/video/jO9wZGEsPRo/video.htmlsi=lzXq4Ts7ywqG-vZg

    • @lucaswolff5504
      @lucaswolff5504 3 months ago

      I added C:\Program Files\spark\spark-3.5.1-bin-hadoop3\bin to the system variables and it worked

    • @BasitAIi
      @BasitAIi 3 months ago

      @@lucaswolff5504 yes

  • @DevSharma_31
    @DevSharma_31 11 months ago

    I am getting this error while running spark-shell or pyspark: "java.lang.IllegalAccessError: class org.apache.spark.storage.StorageUtils$ (in unnamed module @0x46fa7c39) cannot access class sun.nio.ch.DirectBuffer (in module java.base) because module java.base does not export sun.nio.ch to unnamed module @0x46fa7c39". I tried all versions of Java as well as Spark. Please help.
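    (This IllegalAccessError is typical of running an older Spark build on JDK 16/17, whose module system blocks access to sun.nio.ch. Using JDK 8 or 11 is the simplest fix for Spark 3.2 and earlier; as a workaround sketch, not from the video, the module export can be passed to the driver JVM before it launches:)

        import os

        # Must be set before the first SparkSession, because the driver JVM
        # is launched with these arguments (workaround sketch for newer JDKs).
        os.environ["PYSPARK_SUBMIT_ARGS"] = (
            '--driver-java-options="--add-exports=java.base/sun.nio.ch=ALL-UNNAMED" '
            "pyspark-shell")

        from pyspark.sql import SparkSession

        spark = SparkSession.builder.master("local[*]").getOrCreate()
        print(spark.version)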

  • @varunkumar5942
    @varunkumar5942 1 year ago +1

    Hello bro, all your steps worked perfectly,
    but when I try to create a Spark session in a Jupyter notebook it shows the error 'Java gateway process exited before sending its port number'.
    The Java home path is set correctly.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late reply. Could you please let me know if you have the JDK and Python installed on your PC and the environment variables perfectly set? If yes, we can discuss this further to solve your issue. Please let me know.

    • @varunkumar5942
      @varunkumar5942 1 year ago

      Yes, both are installed and the variables are set.
      I tried with Java version 8 too, as suggested by someone, but it didn't work.

    • @swaroop7021
      @swaroop7021 11 months ago

      @@ampcode I did every step perfectly and ran commands on the command prompt to check the versions of Python and Java, which were correct, but when I ran the spark-shell command it showed 'not recognized'.

    • @nithishprabhu
      @nithishprabhu 8 months ago

      @varunkumar5942 @@ampcode Did you figure out how to resolve this issue?
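      ('Java gateway process exited before sending its port number' means PySpark could not start the JVM at all. A sketch of one common cause, assuming the JAVA_HOME path from the description: a Jupyter kernel only sees variables that existed when it was launched, so set them in the notebook itself and restart the session.)

          import os

          # Make JAVA_HOME visible to this kernel if it was started before
          # the environment variables were saved (path from the description).
          os.environ.setdefault("JAVA_HOME", r"C:\java\jdk")
          os.environ["PATH"] = (os.path.join(os.environ["JAVA_HOME"], "bin")
                                + os.pathsep + os.environ["PATH"])

          from pyspark.sql import SparkSession

          spark = SparkSession.builder.master("local[*]").getOrCreate()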

  • @Nathisri
    @Nathisri 8 months ago +1

    I have some issues launching Python and pyspark. I need some help. Can you please help me?

  • @syamprasad8295
    @syamprasad8295 10 months ago +1

    While selecting a package type for Spark, Hadoop 2.7 is not available now; only Hadoop 3.3 and later is available. And winutils 3.3 is not available at the link provided in the Git repo. What to do now? Can I download the Hadoop 3.3 version and proceed with winutils 2.7? Please help. Thanks in advance.

    • @ShivamRai-xo8fu
      @ShivamRai-xo8fu 9 months ago

      I got the same issue.

    • @ankitgupta5446
      @ankitgupta5446 7 months ago

      100 % working solution
      czcams.com/video/jO9wZGEsPRo/video.htmlsi=lzXq4Ts7ywqG-vZg

  • @viniciusfigueiredo6740
    @viniciusfigueiredo6740 1 year ago +1

    I followed the steps, and when I run spark-shell at the command prompt I get the message ('spark-shell' is not recognized as a built-in command or external, an operable program or a batch file). I installed Windows on another HD and did everything right. There are more people with this problem; can you help us? I've been trying to use pyspark on Windows since January.

    • @letsexplorewithzak3614
      @letsexplorewithzak3614 1 year ago +1

      You need to add this to the env var Path:
      path >> C:\Spark\spark-3.3.1-bin-hadoop2\bin\

    • @kiranmore29
      @kiranmore29 10 months ago

      @@letsexplorewithzak3614 Thanks, worked for me.

    • @nayanagrawal9878
      @nayanagrawal9878 8 months ago

      Do everything he said, but in System variables instead of User Variables. I was facing the same problem, but then I did it in System variables and my Spark started running.

    • @jayakrishnayashwanth7358
      @jayakrishnayashwanth7358 8 months ago

      @@nayanagrawal9878 Even I'm facing the same issue. Can you tell in more detail what to add in System variables? We already added Java, Hadoop, Spark and PYSPARK_HOME in the user variables as said in the video.

    • @penninahgathu7956
      @penninahgathu7956 5 months ago

      @@nayanagrawal9878 Thank you!!! I did this and it solved my problem.

  • @James-br9cu
    @James-br9cu 5 months ago

    Nice

  • @kchavan67
    @kchavan67 8 months ago +1

    Hi, following all the steps given in the video, I am still getting the error "cannot recognize spark-shell as an internal or external command". @Ampcode

    • @psychoticgoldphish5797
      @psychoticgoldphish5797 7 months ago

      I was having this issue as well; when I added %SPARK_HOME%\bin, %HADOOP_HOME%\bin and %JAVA_HOME%\bin to the User variables (top box; in the video he shows doing System, the bottom box), it worked. Good luck.

    • @thedataguyfromB
      @thedataguyfromB 7 months ago

      Step by step spark + PySpark in pycharm solution video
      czcams.com/video/jO9wZGEsPRo/video.htmlsi=aaITbbN7ggnczQTc

  • @user-ef9vh7qz9h
    @user-ef9vh7qz9h 1 year ago

    java.lang.IllegalAccessException: final field has no write access
    I'm getting this error while running the code.
    When I run the same code on another system, it executes fine.
    Any idea?

  • @shahrahul5872
    @shahrahul5872 1 year ago +1

    On Apache Spark's installation page, under 'Choose a package type', the 2.7 version seems to no longer be an option as of 04/28/2023. What to do?

    • @shahrahul5872
      @shahrahul5872 1 year ago +2

      I was able to get around this by manually copying the URL of the page that opened after selecting the 2.7 version from the dropdown. It seems they have archived it.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late reply. I hope your issue is resolved. If not, we can discuss it further!

  • @ankushv2642
    @ankushv2642 7 months ago

    It did not work for me. At the end, when I typed pyspark in the command prompt, it did not work.

  • @ganeshkalaivani6250
    @ganeshkalaivani6250 1 year ago +1

    I'm getting 'FileNotFoundError: [WinError 2] The system cannot find the file specified' even though I have installed everything required.

    • @ampcode
      @ampcode  1 year ago

      Sorry for the late reply. I hope your issue is resolved. If not, we can connect and discuss it further!

  • @laxman0457
    @laxman0457 10 months ago +2

    I have followed all your steps; still, I'm facing an issue:
    'spark2-shell' is not recognized as an internal or external command.

    • @nayanagrawal9878
      @nayanagrawal9878 8 months ago

      Do everything he said, but in System variables instead of User Variables. I was facing the same problem, but then I did it in System variables and my Spark started running.

    • @thedataguyfromB
      @thedataguyfromB 7 months ago

      Step by step spark + PySpark in pycharm solution video
      czcams.com/video/jO9wZGEsPRo/video.htmlsi=aaITbbN7ggnczQTc

  • @karthikeyinikarthikeyini380
    @karthikeyinikarthikeyini380 9 months ago +1

    The Hadoop 2.7 tar file is not available at the link.

    • @ankitgupta5446
      @ankitgupta5446 7 months ago

      100 % working solution
      czcams.com/video/jO9wZGEsPRo/video.htmlsi=lzXq4Ts7ywqG-vZg