Principal Component Analysis in R: Example with Predictive Model & Biplot Interpretation

  • Added 1 June 2024
  • Provides steps for carrying out principal component analysis in R and for using the principal components to develop a predictive model.
    R code: github.com/bkrai/Top-10-Machi...
    00:00 Introduction - Principal Component Analysis in R
    00:05 Iris Data
    01:16 Partition Data
    02:06 Scatter Plots Correlation Coefficients
    05:02 Principal Component Analysis
    10:17 Orthogonality of Principal Components
    11:38 Biplot Interpretation
    18:31 Prediction with Principal Components
    19:50 Multinomial Logistic Regression Model with First Two PCs
    21:07 Confusion Matrix & Misclassification Error ‘Training Data’
    22:25 Confusion Matrix & Misclassification Error ‘Testing Data’
    22:48 PCA Advantage
    23:24 PCA Disadvantage
    What is Principal Component Analysis?
    - Principal Component Analysis (PCA) is a statistical technique widely used for dimensionality reduction in data analysis and visualization. It transforms a dataset consisting of possibly correlated variables into a set of linearly uncorrelated variables known as principal components. These components are ordered so that the first few retain most of the variation present in the original dataset. This makes PCA a powerful tool for extracting the most important features from a dataset, simplifying the complexity in high-dimensional data while preserving as much information as possible. The process involves calculating the eigenvalues and eigenvectors of the data's covariance matrix, which help in identifying the directions of maximum variance in high-dimensional data. By projecting the original data onto these new axes, PCA facilitates data compression, noise reduction, and the identification of underlying patterns, making it invaluable for exploratory data analysis, predictive modeling, and visualizing genetic data, among other applications.
    Principal component analysis is an important statistical tool for analyzing big data and for work in the data science field.
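
    The eigenvalue and eigenvector description above can be checked with a minimal sketch. It assumes the built-in iris data and base R only; the names X, e, and pc are illustrative and not taken from the video's script.

```r
# PCA "by hand" on the four numeric iris columns
X  <- scale(iris[, 1:4])        # center and scale each variable
e  <- eigen(cov(X))             # covariance of scaled data = correlation matrix
pc <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Eigenvalues are the variances of the principal components
all.equal(e$values, pc$sdev^2)

# Eigenvectors match the loadings up to sign
all.equal(abs(unname(e$vectors)), abs(unname(pc$rotation)))
```

    The sign of each component is arbitrary, which is why the comparison uses absolute values.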
    Machine Learning videos: goo.gl/WHHqWP
    Becoming Data Scientist: goo.gl/JWyyQc
    Introductory R Videos: goo.gl/NZ55SJ
    Deep Learning with TensorFlow: goo.gl/5VtSuC
    Image Analysis & Classification: goo.gl/Md3fMi
    Text mining: goo.gl/7FJGmd
    Data Visualization: goo.gl/Q7Q2A8
    Playlist: goo.gl/iwbhnE
    R is a free software environment for statistical computing and graphics, and is widely used in both academia and industry. R runs on both Windows and macOS. It was ranked no. 1 in a KDnuggets poll on top languages for analytics, data mining, and data science. RStudio is a user-friendly environment for R that has become popular.

Comments • 356

  • @flamboyantperson5936
    @flamboyantperson5936 7 years ago +2

    This is great. I was looking for PCA and you have done it. Many many thanks to you sir.

  • @modelmichael1972
    @modelmichael1972 7 years ago +6

    Awesome video. Every R enthusiast needs to keep an eye on your channel. Thank you and keep up with great work!

  • @jonm7272
    @jonm7272 3 years ago +3

    Thank you for this extremely helpful, and easily understood tutorial, particularly the clear interpretation of the Bi-Plot. Much appreciated

    • @bkrai
      @bkrai 3 years ago

      You're very welcome!

  • @galk32
    @galk32 5 years ago +1

    One of the best PCA videos i ever seen, Thank you Mr. Rai.

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @theeoddname
    @theeoddname 6 years ago +2

    Great Video! Excellent walk though on PCA and how it can be useful for actual classifications. Thanks for the upload.

    • @bkrai
      @bkrai 6 years ago

      +theeoddname thanks for the feedback!

  • @ramram2utube
    @ramram2utube 6 months ago +1

    I revisited your video for interpretation of biplots in PCA. Many thanks.

    • @bkrai
      @bkrai 6 months ago

      You are welcome!

  • @philipabraham5600
    @philipabraham5600 6 years ago +2

    This is the best PCA explanation I have seen anywhere so far. Thank you for sharing your knowledge.

    • @bkrai
      @bkrai 6 years ago

      Thanks for the feedback!

  • @jonimatix
    @jonimatix 7 years ago +2

    I really like your explanations in your videos. Keep them coming! Thanks

    • @bkrai
      @bkrai 7 years ago

      Thanks for the feedback!

  • @jacklu1611
    @jacklu1611 2 years ago +1

    The biplot was explained very clearly, thank you Dr. Rai!

    • @bkrai
      @bkrai 2 years ago

      You are welcome!

  • @Rutvi_patel_1111
    @Rutvi_patel_1111 7 years ago +2

    Fabulous work in PCA ! Keep it up

    • @bkrai
      @bkrai 7 years ago +1

      Thanks for the feedback!

  • @Dejia_Space
    @Dejia_Space 4 years ago +1

    Thank you!! Best explanation of the biplot on YouTube.

    • @bkrai
      @bkrai 4 years ago

      Glad it was helpful!

  • @srujananeelam6547
    @srujananeelam6547 4 years ago +1

    Fantastic session.Perfectly understood Biplot

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @siddharthadas86
    @siddharthadas86 6 years ago +2

    Seriously awesome explanations! Thank you again.

  • @eldrigeampong8573
    @eldrigeampong8573 4 years ago +1

    Thank you so much Dr. Rai. Detailed teaching

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @saurabhkhodake
    @saurabhkhodake 6 years ago +2

    This video is worth its weight in gold

  • @NIKHILESHMNAIK
    @NIKHILESHMNAIK 4 years ago +1

    You are too good sir. An absolute treat for ML enthusiasts.

    • @bkrai
      @bkrai 4 years ago +1

      Thanks for your comments!

  • @bucklasek1
    @bucklasek1 2 years ago +1

    Thanks for the video! It helped me a lot doing the forecasting for future values using PCA.

    • @bkrai
      @bkrai 2 years ago

      Very welcome!

  • @upskillwithchetan
    @upskillwithchetan 4 years ago +2

    Really really great explanation sir, Thank you so much for making it very simple

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @donne4real
    @donne4real 4 years ago +1

    Wonderful job explaining the material.

    • @bkrai
      @bkrai 4 years ago

      Thanks for your comments and finding it useful!

  • @nyatonkitnya4267
    @nyatonkitnya4267 3 years ago +1

    one really good video i have found. After watching few of your video now your videos are becoming a "turn to" when require. thanks

    • @bkrai
      @bkrai 3 years ago

      Glad to hear that!

  • @shawnmckenzie8699
    @shawnmckenzie8699 4 years ago +1

    To install ggbiplot, the code is now (17, Jan, 2020):
    library(devtools)
    install_github("vqv/ggbiplot")
    source: github.com/vqv/ggbiplot
    Excellent video and well explained these concepts. Thanks.

    • @bkrai
      @bkrai 4 years ago +1

      Thanks for the update!

  • @LlamaFina
    @LlamaFina 5 years ago +1

    Great video! Thanks for sharing your knowledge.

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @ramram2utube
    @ramram2utube 2 years ago +1

    Thanks a lot Sir for your nice presentation. You saved my time. Earlier I used your R codes on Kohonen NN and now for PCA for my training lectures. Your explanation is so lucid. I appreciate your noble service of sharing knowledge

    • @bkrai
      @bkrai 2 years ago

      You are most welcome!

  • @samdavepollard
    @samdavepollard 7 years ago +2

    Thank You - this was extremely useful.
    Very nice channel you have here - easy sub.

    • @bkrai
      @bkrai 4 years ago

      Thanks for comments!

  • @kashgarinn
    @kashgarinn 5 years ago +1

    Great video, thanks for uploading.

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @sebvangeli
    @sebvangeli 7 years ago +2

    Great work! Thank you

  • @asiangg
    @asiangg 6 years ago +2

    Thank you. Learned a lot from your channel

  • @andreafiore8373
    @andreafiore8373 3 years ago +1

    Thank you, this video will be really helpful to complete my thesis :)

  • @ConeliusC33
    @ConeliusC33 6 years ago +3

    Your videos have been constant companions during the last months of my master's thesis. It seemed as if every time I had to switch to another analysis technique, you were already waiting here. So thank you a lot for your guidance and clear explanations!
    The only thing I would appreciate is if you could provide the basic R scripts. Even though typing each command step by step might help with understanding it, copying text from a tiny YouTube window on one half of my monitor into RStudio on the other half is troublesome. Thanks!

    • @bkrai
      @bkrai 6 years ago

      Thanks for the feedback!

  • @adityapatnaik7078
    @adityapatnaik7078 6 years ago +2

    too good!! plz make more such videos...plz!

    • @bkrai
      @bkrai 6 years ago

      Thanks for comments! You may find this useful too:
      czcams.com/play/PL34t5iLfZddu8M0jd7pjSVUjvjBOBdYZ1.html

  • @affyy04
    @affyy04 2 years ago +1

    Thank you for this amazing video. Better than my university lectures

    • @bkrai
      @bkrai 2 years ago

      Thanks for comments!

  • @siddharthabingi
    @siddharthabingi 7 years ago +2

    Great lecture. Thanks.

  • @koparka112
    @koparka112 1 year ago

    Thank you for the material. It is very clear and actually very relevant to my current work.
    As I understand it, the conversion of the data consists of summing the products of the normalized predictors and the loadings.
    Maybe you would have time to post a PLS regression video, please? The intriguing part is the explanation of the model itself

  • @rainbowdu509
    @rainbowdu509 7 years ago +2

    Thanks much appreciated..
    it worked

  • @abdullahmohammed8521
    @abdullahmohammed8521 3 years ago +1

    Many thanks for you Dr. God bless you.

    • @bkrai
      @bkrai 3 years ago

      You are most welcome!

  • @ashishsangwan5925
    @ashishsangwan5925 5 years ago +1

    Awesome Explanation

    • @bkrai
      @bkrai 5 years ago

      make sure you run following before installing:
      library(devtools)

  • @MinhasA
    @MinhasA 5 years ago +1

    thank you for the amazing video!

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @abhiagni242
    @abhiagni242 6 years ago +2

    thanks for the video sir... helped a lot :)

    • @bkrai
      @bkrai 6 years ago

      Thanks for the feedback!

  • @jinnythomas9815
    @jinnythomas9815 3 years ago +1

    Great Explanation....

  • @bindumadhavi6259
    @bindumadhavi6259 5 years ago +1

    Sooo much love you sir.This helped me perfect

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments!

  • @saifsplaka
    @saifsplaka 7 years ago +1

    Hi Sir,Could you take one session on SVD in R and also some theoretical explanation on it. I m finding it very difficult to understand it with most of the material available on the net.

  • @numitayogesh9280
    @numitayogesh9280 6 years ago +2

    great lecture..please share your thoughts on machine learning introduction too

    • @bkrai
      @bkrai 6 years ago

      For machine learning methods such as random forest, neural networks, support vector machines, and extreme gradient boosting, you can refer to the following:
      czcams.com/play/PL34t5iLfZddu8M0jd7pjSVUjvjBOBdYZ1.html

  • @anigov
    @anigov 6 years ago

    Dear Sir..thanks for a wonderful video. I have some questions.
    1) At 20:18, why did u choose to reorder by setosa?
    2)Why did you choose to use data as trg and not training to build mymodel given that trg has predictions from training
    3) Can PCA be used to choose k in kmeans. If so, how to go about it?
    Thanks again.
    Regards

  • @katherinechau5594
    @katherinechau5594 2 years ago +1

    your videos are great :)

  • @karimkardous5555
    @karimkardous5555 6 years ago +1

    Hello great video as always! However one question i had (even though you warned against hard interpretability of results) relates to how to interpret the coefficients. If we look at the coefficient table and read the first line (after the intercept), does that mean that with every increase of Sepal.Length there is a log odd increase of 14.05 in the probability of categorizing the specie as Versicolor, relative to a Setosa? Thanks!

    • @bkrai
      @bkrai 6 years ago

      Your interpretation is correct.

    • @karimkardous5555
      @karimkardous5555 6 years ago +1

      Thank you! Keep up the good work! Your r videos are great!

    • @VenkateshDataScientistFarmer
      @VenkateshDataScientistFarmer 6 years ago

      Sir ..ggbiplot is not installed hence cant work on this ..though i followed the video throughly

  • @sainandankandikattu9077
    @sainandankandikattu9077 5 years ago +1

    Awesome video! Could you plz add Partial least squares regression and principal components regression to your playlist! That would be of great help. Thanks in advance!

    • @bkrai
      @bkrai 4 years ago

      Thanks for suggestions!

  • @PrimoSchnevi
    @PrimoSchnevi 3 years ago +1

    Hello. I dont know anything about Principal Component Analysis in R: Example with Predictive Model & Biplot Interpretation and i will never need to since thats not in my line of work. I Appreciate your Intromusic though. You are a true champ Bharatendra and enrich this world with your presence. Also that intro music fucking slaps.

    • @bkrai
      @bkrai 3 years ago

      Thanks for comments!

  • @BbakMs
    @BbakMs 6 years ago

    Sir, I am doing PCA analysis on DJ 30 Stocks and when I view pca$loadings for 30 variables, I noticed that some were not displayed. For example, Component 1 has -0.218 for Apple but then shows none for JPM, what does this mean?

  • @babadrammeh656
    @babadrammeh656 2 years ago +1

    R PCA IS VERY GOOD PACKAGE AND VERY HELPFULL

    • @bkrai
      @bkrai 2 years ago

      Yes, I agree!

  • @murilocintra180
    @murilocintra180 6 years ago +2

    Excellent demonstration of PCA, really helpful​. I just don't understand why in pc object, you use only training data instead of the entire data.

    • @bkrai
      @bkrai 6 years ago

      We only use training data so that we can later use test data to assess prediction model.
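
The point about fitting PCA on the training data only can be sketched as follows. This is a minimal illustration, assuming the built-in iris data; the seed and the 80/20 split probabilities are assumptions, not necessarily the values used in the video.

```r
set.seed(111)                                   # assumed seed, for reproducibility
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
training <- iris[ind == 1, ]
testing  <- iris[ind == 2, ]

# Center, scale, and loadings are estimated from the training rows only
pc <- prcomp(training[, -5], center = TRUE, scale. = TRUE)

# Held-out rows are projected with the TRAINING center, scale, and loadings
tst_pcs <- predict(pc, testing[, -5])

# The same projection written out by hand
manual <- scale(testing[, -5], center = pc$center, scale = pc$scale) %*% pc$rotation
all.equal(unname(tst_pcs), unname(manual))
```

Because nothing from the test rows enters prcomp(), the test set remains a fair check of the prediction model built on the components.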

  • @maf4421
    @maf4421 3 years ago +1

    Thank you Dr. Bharatendra Rai for explaining PCA in detail. Can you please explain how to find weights of a variable by PCA for making a composite index? Is it rotation values that are for PC1, PC2, etc.? For example, if I have (I=w1*X+w2*Y+w3*Z) then how to find w1, w2, w3 by PCA.

    • @bkrai
      @bkrai 2 years ago

      For calculations you can refer to any textbook.

  • @mukeshchoudhary2842
    @mukeshchoudhary2842 3 years ago +1

    Great video.. What if we want to include factor-like "Control and Heat" for genotypes? Please suggest

    • @bkrai
      @bkrai 2 years ago

      It should work fine.

  • @WahranRai
    @WahranRai 2 years ago +2

    19:12 It is only for purpose to show another way to get the principal component related to training because :
    identical(pc$x, predict(pc,training)) gives TRUE meaning that pc$x is same as predict(pc,training).

    • @bkrai
      @bkrai 2 years ago

      That's correct!
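
The equivalence noted in this thread can be tried on the full iris data. This is a small sketch, not the video's script; it uses all.equal(), which tolerates floating-point noise, rather than identical().

```r
pc <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Scores stored by prcomp() versus scores recomputed by predict()
scores <- predict(pc, iris[, 1:4])
all.equal(pc$x, scores)
```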

  • @tesfayewoldesemayate4506
    @tesfayewoldesemayate4506 11 months ago +1

    Nice presentation. when you are coding line 8 you said a sample of size 2, which size are you referring to? Thanks

    • @bkrai
      @bkrai 11 months ago

      For partitioning the data in to two, training and testing.

  • @anuraratnasiri5516
    @anuraratnasiri5516 4 years ago +1

    Thank you so... much!

    • @bkrai
      @bkrai 4 years ago +1

      Thanks for comments!

  • @wani212
    @wani212 5 years ago +1

    Thank you so much for this video. Will you please make a video on Broken-line regression in R?

    • @bkrai
      @bkrai 5 years ago

      Thanks for the suggestion, I've added this to my list.

  • @md.tabibulislam9740
    @md.tabibulislam9740 6 years ago

    Firstly thank you for your helpful video. I have problem to add ellipse in the plot. I have 30 variables, first 29 is the numeric and last one is the factor variables. But i can,t plot the ellipse in the PCA plot. How can i solve this? Please help.

  • @nahalhoghooghi8575
    @nahalhoghooghi8575 5 years ago +1

    Great job, same as always. Can I use PCA for 2 or more categorical variables? Can I define those variables as 0 and 1 in PCA?

    • @bkrai
      @bkrai 5 years ago

      You can only use numeric variables. You can try using 0 and 1 and see if it works ok.

  • @johnstevenson6458
    @johnstevenson6458 2 years ago +1

    Great video. Do you have a suggested package for running binary logistic regression? From a brief scan of nnet it appears to only have arguments for multinomial response variables. Thank you.

    • @bkrai
      @bkrai 2 years ago

      You can refer to this:
      czcams.com/video/AVx7Wc1CQ7Y/video.html

    • @johnstevenson6458
      @johnstevenson6458 2 years ago +1

      @@bkrai sorry I was unclear in my message. I was hoping for a suggested package to run a binary logistic regression using PCA components as predictors - similar to what you have done here with multinomial. Any suggestions are welcome.

    • @bkrai
      @bkrai 2 years ago

      Yes, you can use the PCA components as predictors and run binary logistic regression as shown in the link that I sent earlier.
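
One way to do what this thread describes is base R's glm() with a binomial family, which needs no extra package. The sketch below is an illustration under assumptions: it uses a two-species subset of iris as a stand-in binary problem and, for brevity, skips the training/testing split shown in the video. The names d, dat, and fit are hypothetical.

```r
# Two-class stand-in: versicolor vs virginica
d  <- droplevels(subset(iris, Species != "setosa"))
pc <- prcomp(d[, 1:4], center = TRUE, scale. = TRUE)

# First two principal components plus the binary response
dat <- data.frame(Species = d$Species, pc$x[, 1:2])

# Binary logistic regression on the components
fit <- glm(Species ~ PC1 + PC2, data = dat, family = binomial)

# Predicted probability of the second factor level (virginica)
p    <- predict(fit, type = "response")
pred <- ifelse(p > 0.5, levels(dat$Species)[2], levels(dat$Species)[1])
table(Predicted = pred, Actual = dat$Species)
```

With a proper split, the components for the test rows would come from predict(pc, testing) before being passed to predict(fit, ...).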

  • @SaranathenArunE
    @SaranathenArunE 5 years ago +2

    brilliant sir..simple and sweet..thanks...nice music....if i have 10 DISCRETE VARIABLES, how do I reduce them to 2 or 3 components? please explain

    • @bkrai
      @bkrai 5 years ago

      Thanks for comments! Note that this method is only for numeric variables.

  • @indranipal8131
    @indranipal8131 4 years ago +1

    Do you have a video on PCA for unsupervised learning via clustering and similarity ranking?

  • @garykuleck1320
    @garykuleck1320 2 years ago +1

    Dr. Rai,
    Thanks for this informative video. I am having a problem getting the predict function to work with the model created on the training dataset. I am getting two errors(paraphrased): 1. NAs not allowed in subscripted assignments; 2. newdata has 1900 rows but variables found have 8100 rows. I think it is looking for the same number of rows in the test dataset. Is there something I am doing wrong? Appreciate any feedback.

    • @bkrai
      @bkrai 2 years ago

      NAs occur when there is missing data. For handling missing values, refer to:
      czcams.com/video/An7nPLJ0fsg/video.html

  • @ainli4125466
    @ainli4125466 2 years ago

    Thank you for sharing, I get an error "Error in plot_label(p = p, data = plot.data, label = label, label.label = label.label, : Unsupported class: prcomp"", when I try to run the ggbiplot. Would you please advise how to fix it?

  • @nyadav378
    @nyadav378 8 months ago

    Very informative and nice presentation sir, sir can we estimate PCA for factor (for eg species) with unequal no. of observation.
    And we want to see the correlations in terms of each species viz for setosa or other two, how to do it? Please explain...Thank You

  • @abhishek894
    @abhishek894 2 years ago +1

    Thank you for this nice video Dr. Rai.
    I have a doubt. Why the predict function was used multiple times. After the prcomp function, all the data of Principle components were available in:
    pc$x.
    Why do we have to do:
    trg

    • @bkrai
      @bkrai 2 years ago

      In R you can get same thing in multiple ways. This is just for illustration.

    • @abhishek894
      @abhishek894 2 years ago +1

      @@bkrai Thank you Sir. That makes it clear.

    • @bkrai
      @bkrai 2 years ago

      @@abhishek894 You are welcome!

  • @inesceciliacardonadevoz5072

    Thanks for this video sir, very good class but I can´t get it. because Error ... could not find function "ggbiplot". Excuse me, which is your R version ?

    • @bkrai
      @bkrai 3 years ago

      Try this:
      library(devtools)
      install_github("vqv/ggbiplot")

  • @Pankajjadwal
    @Pankajjadwal 6 years ago +2

    It was a fruitful video.Can you please share the code.

  • @Miichaelk
    @Miichaelk 4 years ago +1

    Thanks for this video sir,
    Unfortunately, I have a problem with downloading the ggbiplot package, I tried the code you used in the video and I also googled, but I can not get it to work...
    Do you have any suggestions on how to download the package??
    Thanks in advance

    • @bkrai
      @bkrai 4 years ago +2

      Try this:
      library(devtools)
      install_github("vqv/ggbiplot")

    • @arsalanriaz7784
      @arsalanriaz7784 3 years ago

      @@bkrai thank you

  • @safeeqahmed3306
    @safeeqahmed3306 5 years ago +1

    Great video. I have one doubt. What does the stddev attribute of PC contain? Standard deviations of the variables are already in scale..so what does stddev represent? Thanks a lot

    • @bkrai
      @bkrai 5 years ago

      At what point in time do you see this?

    • @safeeqahmed3306
      @safeeqahmed3306 5 years ago +1

      Bharatendra Rai sorry it’s sdev attribute of pc and in 9:48 while showing the summary of pc, I would like to know what the standard deviation row denote..thanks a lot

    • @bkrai
      @bkrai 5 years ago

      It is standard deviation related to principal components. It helps to estimate what percentage of variability is captured by each principal component.

    • @safeeqahmed3306
      @safeeqahmed3306 5 years ago +1

      Bharatendra Rai thanks a lot. I understand this now
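
The relationship between the sdev values and the percentage of variability discussed in this thread can be reproduced directly. This is a minimal sketch on the full iris data, with illustrative variable names.

```r
pc <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Variance of each component is its standard deviation squared
pc_var <- pc$sdev^2

# Share of total variance captured by each component
var_explained <- pc_var / sum(pc_var)
round(var_explained, 3)

# The same figures appear in the "Proportion of Variance" row of the summary
summary(pc)$importance
```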

  • @ramp2011
    @ramp2011 7 years ago

    Another great video. Thank you..
    If your data has number of categorical columns, PCA will miss out the dependence of the target variable on these columns. Correct? In such cases what other technique can one use?
    Thx

    • @bkrai
      @bkrai 7 years ago +1

      PCA doesn't include target variable. At the time of developing a predictive model where we use a few principal components, we can include categorical variables. So, we don't miss out on the usefulness of any categorical variable that was excluded from PCA.

    • @vishnukowndinya
      @vishnukowndinya 6 years ago

      hi sir, dose it mean that, if i have x1 x2 c3 x4 (c3 is a categorical) . At 1st i have to use all the numeric (x1 2 4) variables to build pca, then adding the cat var c3 my data changes as (pc1,pc2, c3) ???
      or
      in the begining itself i have to convert the categorical c3 into (1, 0) and then include all my input vars to form a pcs????

  • @prithvivasireddy5564
    @prithvivasireddy5564 4 years ago +1

    Awesome video sir...kudos... :)
    1 doubt though .... 20:48 - why are we using 2 components only? How do we know how many principal components to use?(species ~ PC1 + PC2)

    • @bkrai
      @bkrai 4 years ago

      2 PCs capture more than 95% of the variability in the data. Other 2 only add about 5%. So you can choose to have PCs that capture over 80% or 90% of the variability.
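
The rule of thumb in this reply can be turned into a small selection step. In this sketch the 90% threshold is an assumption chosen for illustration, and the full iris data is used instead of the training partition.

```r
pc <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Cumulative share of variance after 1, 2, 3, ... components
cum_var <- cumsum(pc$sdev^2) / sum(pc$sdev^2)

# Smallest number of components whose cumulative share reaches the threshold
threshold <- 0.90
k <- which(cum_var >= threshold)[1]
k
```

Raising or lowering the threshold trades model simplicity against retained variability; there is no single correct value.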

  • @golumworks
    @golumworks 2 years ago

    If I just use addEllipses =TRUE, what determines the size of those ellipses? Also, if I specify ellipse.type = “confidence”, what confidence level is used to generate the ellipses? I used factoextra if that helps.

  • @dioagusnofrizal9773
    @dioagusnofrizal9773 3 years ago +1

    Thanks sir, why in this video use linear regression? Can i use k means to clustering from pc1 and pc2?

  • @azzeddinereghais7494
    @azzeddinereghais7494 3 years ago

    Good evening
    If you want to show the first dimension (Dim1) and the third dimension (Dim3)
    What to do or if you can provide the code for that
    Thanks
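
One possible answer, sketched with base R only: biplot() for prcomp objects takes a choices argument that selects which components go on the axes. The example assumes a prcomp fit on iris; ggbiplot() has a choices argument as well, though that is not shown here.

```r
pc <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Biplot with PC1 on the x-axis and PC3 on the y-axis
biplot(pc, choices = c(1, 3))

# Or plot the scores for those two components directly, colored by species
sel <- pc$x[, c(1, 3)]
plot(sel, col = iris$Species, pch = 19)
```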

  • @mamadououattara210
    @mamadououattara210 2 years ago

    Hi Dr, How to I use PCA to generate a score based on several variables? Regards

  • @ramp2011
    @ramp2011 7 years ago +1

    Awesome video. Thank you. As time permits can you do a video on use of caret package? thank you

    • @bkrai
      @bkrai 4 years ago

      Saw this today. Thanks for comments!

  • @safezonesharing914
    @safezonesharing914 5 years ago +1

    Thank you for your VDO.
    My R version is 3.5.1 and it cannot allow ggbiplot.
    Do you have any package instead of ggbiplot ?

    • @bkrai
      @bkrai 5 years ago +1

      Try installing it by running this line:
      install_github("ggbiplot", "vqv")

    • @safezonesharing914
      @safezonesharing914 5 years ago

      @@bkrai Thank you for your kindly replying
      When I ran it, it would shown like this.
      Error in install_github("ggbiplot", "vqv") :
      could not find function "install_github"

    • @ashishsangwan5925
      @ashishsangwan5925 5 years ago

      @@safezonesharing914 I'm also getting the same error

    • @ashishsangwan5925
      @ashishsangwan5925 5 years ago +1

      @@safezonesharing914 try below command. It worked for me
      library(devtools)
      install_github("vqv/ggbiplot")

    • @alexandrec.2939
      @alexandrec.2939 5 years ago +1

      @@ashishsangwan5925 Arf, for few seconds I believed you were my saver ^^. But nope, your alternative didn't work as well

  • @jinnythomas9815
    @jinnythomas9815 3 years ago +1

    Thanks for the video
    Please publish video on Exploratory Factor Analysis,Confirmatory Factor Analysis application in a model
    Also please explain the difference from PCA

    • @bkrai
      @bkrai 3 years ago

      Thanks for the suggestion, I've added this to my list.

  • @mukhtaradamuabubakar370

    Nice video and very helpful. I have problems installing the ggbiplot and nnet packages (I am using R version 3.6.3). Any advice on how to overcome this?

    • @mukhtaradamuabubakar370
      @mukhtaradamuabubakar370 2 years ago

      OK for the nnet package it was successfully installed. but still struggling with the ggbiplot (despite using your codes). thanks

  • @alessandrorosati969
    @alessandrorosati969 11 months ago +1

    can a dataset consisting of the principal components and the target variable be used to perform machine learning techniques?

    • @bkrai
      @bkrai 11 months ago

      Yes, this video shows an example of doing it.
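
The workflow the video outlines (partition, PCA on training data, multinomial logistic regression on the first two PCs, confusion matrix on test data) can be sketched roughly as below. The seed and split probabilities are assumptions, and the object names follow the ones mentioned in these comments (trg, tst, mymodel) without claiming to reproduce the author's script line for line.

```r
library(nnet)   # provides multinom(); a recommended package shipped with R

set.seed(111)   # assumed seed
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.8, 0.2))
training <- iris[ind == 1, ]
testing  <- iris[ind == 2, ]

# PCA on the numeric training columns only
pc <- prcomp(training[, -5], center = TRUE, scale. = TRUE)

# Data sets made of principal components plus the target variable
trg <- data.frame(predict(pc, training[, -5]), Species = training$Species)
tst <- data.frame(predict(pc, testing[, -5]),  Species = testing$Species)

# Multinomial logistic regression with setosa as the reference level
trg$Species <- relevel(trg$Species, ref = "setosa")
mymodel <- multinom(Species ~ PC1 + PC2, data = trg, trace = FALSE)

# Confusion matrix and misclassification error on the test data
p   <- predict(mymodel, tst)
tab <- table(Predicted = p, Actual = tst$Species)
tab
1 - sum(diag(tab)) / sum(tab)
```

Any other classifier could take the place of multinom() here, since trg and tst are ordinary data frames.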

  • @rainbowdu509
    @rainbowdu509 7 years ago +1

    Hi..good day bharatendra..I want to replace one my columns with value 1 for all its elements,what is the code in R studio..thanks for your time?

    • @bkrai
      @bkrai 7 years ago

      suppose you are using following data:
      data(iris)
      To add what you indicated to a "new" column, you can use:
      iris$new <- 1

    • @rainbowdu509
      @rainbowdu509 7 years ago

      thanx for ur ans ..I do already have a column with different values,I wanna replace all values on that column with just 1

    • @bkrai
      @bkrai 7 years ago +1

      So for iris data if you want to change all values for Sepal.Length variable to 1, you can use:
      iris$Sepal.Length <- 1

  • @vincentroy2044
    @vincentroy2044 6 years ago

    Hi ! Your video helped me a lot, thank you! However, there is also the "princomp" command, why do you use the "prcomp" command? Can I have an example of when to use "princomp" and when to use "prcomp"? thank you very much

    • @dhavalpatel1843
      @dhavalpatel1843 4 years ago

      If your number of features is more than the number of samples, use prcomp(). If your number of samples is more than the number of features, use princomp(). princomp() cannot handle data in which the number of features exceeds the number of samples.
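
The difference described in this reply can be demonstrated with a short sketch. It assumes the iris data for the comparison and a small random matrix (hypothetical name wide) for the case with more columns than rows.

```r
X  <- iris[, 1:4]
p1 <- prcomp(X, center = TRUE, scale. = TRUE)   # computed via SVD of the data
p2 <- princomp(X, cor = TRUE)                   # computed via eigen() on the correlation matrix

# On correlation-scaled data the loadings agree up to sign
all.equal(abs(unname(p1$rotation)), abs(unname(unclass(p2$loadings))))

# A matrix with 3 rows and 10 columns: more variables than observations
set.seed(1)
wide <- matrix(rnorm(3 * 10), nrow = 3)

res <- try(princomp(wide), silent = TRUE)
inherits(res, "try-error")    # princomp() stops with an error here

dim(prcomp(wide)$x)           # prcomp() still returns scores
```

When there are more rows than columns, either function works.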

  • @harishnagpal21
    @harishnagpal21 5 years ago +1

    Hi Bharatendra, nice video. I have got couple queries. If there are large no of numeric variables and through PCA we find that they are highly correlated then before going for model building
    1) Do we need to remove highly correlated variables !
    2) which one to remove ! Thanks

    • @bkrai
      @bkrai 5 years ago +1

      You don't need to remove if you are using the components for developing a prediction model. This video provides a similar example.

    • @harishnagpal21
      @harishnagpal21 5 years ago

      thanks

    • @desert00200
      @desert00200 5 years ago +2

      Principal components are orthogonal to each other, saying differently they are uncorrelated and can be used as is in model building.

    • @bkrai
      @bkrai 4 years ago

      Thanks!
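
The orthogonality point made in this thread is easy to verify numerically. This is a minimal sketch on the iris data.

```r
pc <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

# Original variables: several sizeable correlations
round(cor(iris[, 1:4]), 2)

# Principal component scores: off-diagonal correlations are zero
pc_cor <- cor(pc$x)
round(pc_cor, 2)

# Loading vectors are orthonormal, so their cross-product is the identity
ortho <- crossprod(pc$rotation)
round(ortho, 2)
```

This is why highly correlated predictors do not need to be dropped before the components are used in a model.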

  • @mohammadj.shamim9342
    @mohammadj.shamim9342 6 years ago +1

    Dear Respected Sir,
    I wanted to install ggbiplot using the command you provided with us. but it gives me another message. The message is (Installation failed: SSL certificate problem: self signed certificate in certificate chain
    Warning message:
    Username parameter is deprecated. Please use vqv/ggbiplot) I used vqv/ggbiplot as well, but no good results.
    please guide me what shall I do?

    • @bkrai
      @bkrai 6 years ago

      Not sure what went wrong. May be some typo or something else. Probably you can try running commands using my R file.

  • @jayashriraghunath3210
    @jayashriraghunath3210 4 years ago +1

    Awesome explanation sir...👍👍can you make a video for independent component analysis using r in the same way sir?

    • @bkrai
      @bkrai 4 years ago

      Thanks, I've added it to my list.

  • @seaatm
    @seaatm 5 years ago +2

    Cool video! Can you do a video about Multiple Correspondance Analysis(MCA) for cualitative data? It would help me a lot

    • @bkrai
      @bkrai 5 years ago

      Thanks, I've added this to my list.

  • @dejunli6417
    @dejunli6417 1 year ago +1

    Hi, I want to know from where can I get the iris example data ? thank you!

    • @bkrai
      @bkrai 1 year ago

      It's inbuilt in R itself. You can access it by running first 3 lines shown in the video.

  • @soumyanayak445
    @soumyanayak445 5 years ago +1

    Sir why have you predicted the training and test data with respect to PC? can use trg data for making neural model and test using tst data set? and find correlation b/w act and predicted values?

    • @bkrai
      @bkrai 5 years ago +1

      When there are many variables, the chance of having a multicollinearity problem increases, and PCA helps to solve that problem. And yes, you can use a neural network model.

    • @soumyanayak445
      @soumyanayak445 4 years ago +1

      @@bkrai sir can you please explain me the significance of the lines under the heading: prediction with principle components.As I am unable to understand why we are predicting twice on test data set. Please explain sir

    • @bkrai
      @bkrai 4 years ago

      To avoid over-fitting where you get very good result from training data but not so from testing.

  • @joujoumilor2898
    @joujoumilor2898 5 years ago

    As usual your videos are the best explained , Sir please I have a question actually my data contains both numeric and categorical variables, so what is the best method to reduce demension and what is the name of the pckage because I'm new in R .

    • @dhavalpatel1843
      @dhavalpatel1843 4 years ago

      You can binarize your categorical variables. Search for one-of-k (one-hot) coding to learn more. That way you end up with only numerical data.
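
The one-of-k coding suggested here can be sketched with base R's model.matrix(). The toy data frame and its column names (x1, c3) are hypothetical. Whether PCA on indicator columns is appropriate is a judgment call; elsewhere in this thread the author recommends using numeric variables only.

```r
# Toy data: one numeric column and one categorical column
df <- data.frame(
  x1 = c(1.2, 3.4, 2.2, 5.1, 4.0, 2.9),
  c3 = factor(c("a", "b", "c", "a", "b", "c"))
)

# One indicator (0/1) column per level; "- 1" drops the intercept column
dummies <- model.matrix(~ c3 - 1, data = df)
dummies

# Combine with the numeric column so every input is numeric
num <- cbind(x1 = df$x1, dummies)
pc  <- prcomp(num, center = TRUE, scale. = TRUE)
summary(pc)
```

Because the indicator columns always sum to one, the last component carries essentially no variance.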

  • @aparnakanduri1111
    @aparnakanduri1111 6 years ago

    Hi Sir,
    How can we detect outliers in PCA

  • @deepikachandrasekaran3554

    Very useful video sir. Could you explain me what is the need to partition the data into training and testing data?

  • @diaraofany7053
    @diaraofany7053 4 years ago +1

    thank you so much for the video, this helped me a litle bit, because i still dont know how to use R to produce scatter plot for EOF QBO (Quasi Biennial Oscillation). Excuse me Sir, May you help me with the script?

    • @bkrai
      @bkrai 4 years ago

      For data visualization you can find this link useful:
      czcams.com/play/PL34t5iLfZddskPZVTm03hed8K93RsyP24.html

  • @rashmisajwan1724
    @rashmisajwan1724 6 years ago

    I'm using stata, are there any specific commands for principal component analysis PCA in PANEL DATA Or Just simply run PCA after standardizing variables?

    • @bkrai
      @bkrai 6 years ago

      I've not used stata, so difficult to say what command will be correct.

  • @raisulalam6051
    @raisulalam6051 4 years ago +1

    Thank you

  • @sunilbobb
    @sunilbobb 6 years ago +1

    Sir - Requesting you to kindly give a lecture advanced r programming like on H20 packages etc..

    • @bkrai
      @bkrai 6 years ago

      Thanks for the suggestion, I've added this to my list.

  • @parametersofstatistics2145

    Thanks sir .....can u please tell me how start learning on R from beginning?

    • @bkrai
      @bkrai 4 years ago

      You can start with this playlist:
      czcams.com/play/PL34t5iLfZddv8tJkZboegN6tmyh2-zr_T.html

  • @vishnukowndinya
    @vishnukowndinya 6 years ago

    hi sir, dose it mean that, if i have x1 x2 c3 x4 (c3 is a categorical) . At 1st i have to use all the numeric (x1 2 4) variables to build pca, then adding the cat var c3 my data changes as (pc1,pc2, c3) ???
    or
    in the begining itself i have to convert the categorical c3 into (1, 0) and then include all my input vars to form a pcs????

    • @bkrai
      @bkrai 6 years ago +1

      You cannot use categorical variables to do PCA. You should only use numeric variables.

  • @Jubo256
    @Jubo256 5 years ago +1

    Hello, you put training [5] to reference the column on trg variable....
    shouldn't it be training[ , 5]?

    • @bkrai
      @bkrai 4 years ago

      It is training[ , 5] in the video.

  • @dpk13071979
    @dpk13071979 5 years ago +1

    Hello sir, I have been a regular follower of ur videos on R. Must appreciate the content and the ease with which you explain the concept.I have a small query. In PCA I am not able to create a biplot as I am not able to run the command - install_github("ggbiplot", "vqv"). I am getting the following message - Error in parse_repo_spec(repo) :
    Invalid git repo specification: 'ggbiplot'.Your help will be highly appreciated. Thanks.

    • @dhavalpatel1843
      @dhavalpatel1843 4 years ago

      library(devtools)
      install_github("vqv/ggbiplot")
      Try this!!!!!

    • @bkrai
      @bkrai 4 years ago

      Thanks for the update!

  • @francisattahegwumah2047
    @francisattahegwumah2047 5 years ago +1

    Thank you very much for the video... I am in interested in learning R program from the basic. Please, can you teach me using some of your videos?

    • @bkrai
      @bkrai 5 years ago

      Here are some playlists that you can choose from based on your interest:
      Machine Learning videos: goo.gl/WHHqWP
      Becoming Data Scientist: goo.gl/JWyyQc
      Introductory R Videos: goo.gl/NZ55SJ
      Deep Learning with TensorFlow: goo.gl/5VtSuC
      Image Analysis & Classification: goo.gl/Md3fMi
      Text mining: goo.gl/7FJGmd
      Data Visualization: goo.gl/Q7Q2A8
      Playlist: goo.gl/iwbhnE