Weighted Gene Co-expression Network Analysis (WGCNA) Step-by-step Tutorial - Part 1

  • Added 29 June 2024
  • This is part 1 of a step-by-step tutorial on Weighted Gene Co-expression Network Analysis (WGCNA).
    In this video I demonstrate how to perform Weighted Gene Co-expression Network Analysis (WGCNA) using an RNA-Seq dataset. I go over data manipulation, methods to detect outlier genes and samples in the dataset, normalization, picking a soft threshold, identifying modules and visualizing modules as a dendrogram. I hope you find this video helpful! I look forward to your comments in the comment section below!
    Part 2 of this tutorial:
    • Weighted Gene Co-expre...
    Data:
    www.ncbi.nlm.nih.gov/geo/quer...
    Code:
    github.com/kpatel427/CZcamsT...
    WGCNA Tutorial:
    horvath.genetics.ucla.edu/htm...
    Chapters
    0:00 Intro
    0:40 WGCNA Workflow steps at a glance
    1:09 Study Design
    1:57 Fetch Data and read data in R
    2:56 Get metadata using GEOquery package
    5:00 Manipulate expression data
    8:53 Quality Control - Remove outlier samples and genes using goodSamplesGenes()
    11:27 Detecting outliers using hierarchical clustering
    12:22 Detecting outliers using Principal Component Analysis (PCA)
    17:16 Data Normalization using vst() from DESeq2 package
    20:51 Filtering out genes with low counts
    22:38 Pick soft threshold
    28:48 Identify Modules
    31:15 maxBlockSize parameter
    33:35 Get module eigengenes
    34:34 Visualize modules as dendrogram
    You can show your support and encouragement by buying me a coffee:
    www.buymeacoffee.com/bioinfor...
    To get in touch:
    Website: bioinformagician.org/
    Github: github.com/kpatel427
    Email: khushbu_p@hotmail.com
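The "picking soft threshold" step listed above chooses a power β at which the co-expression network is approximately scale-free. Below is a minimal base-R sketch of that criterion, assuming an unsigned adjacency |cor|^β and an expression matrix with samples in rows and genes in columns. The helper names are hypothetical, and the binning is a simplification of what WGCNA::pickSoftThreshold() computes, so the numbers will not match the package exactly:

```r
# Simplified scale-free topology fit (hypothetical helper; not WGCNA's exact code)
scale_free_fit <- function(k, n_breaks = 10) {
  bins <- cut(k, n_breaks)                                 # bin the connectivities
  dk <- as.vector(tapply(k, bins, mean))                   # mean k per bin
  p_dk <- as.vector(tapply(k, bins, length)) / length(k)   # fraction of genes per bin
  keep <- !is.na(dk)                                       # empty bins give NA; drop them
  fit <- lm(log10(p_dk[keep]) ~ log10(dk[keep]))
  slope <- unname(coef(fit)[2])
  # Signed R^2: positive when log p(k) falls as log k rises
  -sign(slope) * summary(fit)$r.squared
}

# Fit index and mean connectivity for each candidate power (unsigned network)
soft_threshold_table <- function(expr, powers = 1:10) {
  abs_cor <- abs(cor(expr))                      # gene-gene correlations; genes in columns
  t(sapply(powers, function(beta) {
    adj <- abs_cor^beta                          # soft-thresholded adjacency
    diag(adj) <- 0                               # ignore self-connections
    k <- rowSums(adj)                            # connectivity of each gene
    c(power = beta, signed_R2 = scale_free_fit(k), mean_k = mean(k))
  }))
}

set.seed(1)
expr <- matrix(rnorm(20 * 200), nrow = 20)       # hypothetical: 20 samples x 200 genes
tab <- soft_threshold_table(expr, powers = c(1, 2, 4, 6, 8))
print(round(tab, 3))
```

On real data one usually looks for the lowest power at which the signed R² levels off at a high value; pickSoftThreshold() suggests one through its RsquaredCut argument (0.85 by default).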

Comments • 96

  • @asiyazhao3820 · 1 year ago

    This is absolutely AMAZING! GREAT job and Many thanks!

  • @mocabeentrill · 1 year ago +5

    You're so smart! You made it look so easy! It took me 2 full weeks to complete this analysis. Picking the right soft threshold for SFT was helpful for me. I thank you profusely.

  • @sonaaritra · 1 year ago

    Thanks! Your suggestions were very helpful.

  • @RamakrishnanRS · 9 months ago +1

    Great tutorial - I'm working through it slowly. One piece of advice: avoid the tidyverse route when you rename columns. If you used a simple indexed `match(gsub())` call instead of pivoting longer, inner joining, then pivoting back wider, you wouldn't touch the data at all, only the vector of column names. That saves a lot of memory.
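The renaming approach suggested above can be sketched in base R as follows. The column names, the "_counts" suffix and the GSM-style accessions are hypothetical stand-ins, not the tutorial's actual data:

```r
# Hypothetical inputs: count-matrix column names carry a suffix the metadata lacks
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("geneA", "geneB"),
                                 c("patient1_counts", "patient3_counts", "patient2_counts")))
meta <- data.frame(title = c("patient1", "patient2", "patient3"),
                   geo_accession = c("GSM0001", "GSM0002", "GSM0003"))

# Strip the suffix, then look up each cleaned name in the metadata
idx <- match(gsub("_counts$", "", colnames(counts)), meta$title)
stopifnot(!anyNA(idx))                       # fail loudly if a column has no metadata match
colnames(counts) <- meta$geo_accession[idx]  # only the name vector changes, not the data
colnames(counts)
```

Because only `colnames(counts)` is rewritten, the count values are never reshaped or copied into a long table.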

  • @amarjeetyadav5661 · 1 year ago

    Thank you very much for making these tutorial videos.

  • @nataliagarcia5404 · 1 year ago +1

    Amazing!! I was struggling with this.

  • @dennisscheper1 · 23 days ago

    Excellent. Thank you!

  • @user-mh7iv1rb9m · 1 month ago

    Your videos are so amazing

  • @mocabeentrill · 1 year ago +5

    Thanks!

  • @jaykishansolanki2935 · 1 year ago

    Happy Teachers' Day, ma'am! Thank you for providing these amazing tutorials that help me a lot 🎉🎉🎉

    • @Bioinformagician · 1 year ago

      I am really glad to hear my videos have been helpful! Thank you!

  • @saraalidadiani5881 · 1 year ago +1

    Thank you again for an excellent video. Could you please explain how to choose the values of minModuleSize and maxBlockSize in blockwiseModules? Thank you in advance, looking forward to hearing from you!

  • @PortleyPortions · 1 year ago +1

    The "janitor" package is excellent for cleaning up column names if you don't want to do it manually (as at 19:20).

  • @hemangininaik0998 · 1 year ago +8

    Please make a tutorial on WGCNA with TCGA samples.

    • @bobby5625 · 1 year ago

      This would be great! Please make one!

    • @gem_game12 · 4 months ago

      Please make one. It would be really helpful.

  • @MasMariusb · 1 year ago

    Hi, nice video. May I ask why you used vst(counts) rather than the actual DESeq2 normalization process?
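On the vst() question: vst() does not skip normalization. It accounts for the same median-of-ratios size factors (estimating them if they are missing) and then applies a variance-stabilizing transformation on a log2-like scale, which the WGCNA FAQ recommends for RNA-seq counts before correlation-based network analysis. A base-R sketch of the median-of-ratios idea only, with hypothetical toy counts; DESeq2's own estimateSizeFactors() handles more cases:

```r
# Median-of-ratios size factors (the idea behind DESeq2's default normalization)
median_ratio_size_factors <- function(counts) {
  # counts: raw counts, genes in rows, samples in columns
  log_geo_means <- rowMeans(log(counts))       # per-gene log geometric mean; -Inf if any zero
  use <- is.finite(log_geo_means)              # keep genes with no zero counts
  apply(counts[use, , drop = FALSE], 2, function(col)
    exp(median(log(col) - log_geo_means[use])))  # median ratio to the reference, per sample
}

counts <- matrix(c(10, 20, 30, 0,
                   20, 40, 60, 5), ncol = 2,
                 dimnames = list(paste0("g", 1:4), c("S1", "S2")))  # S2 sequenced twice as deep
sf <- median_ratio_size_factors(counts)
normalized <- sweep(counts, 2, sf, "/")        # divide each sample's counts by its size factor
sf
normalized
```

Normalized counts like these still have a variance that grows with the mean, which is what the extra transformation in vst() addresses.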

  • @pariaalipour61 · 1 month ago

    Thank you for the amazing video. If I want to start from a Seurat object of single-cell data, how should I process the data to follow your tutorial?

  • @ps_scholar3407 · 1 year ago

    Kindly make a tutorial on GWAS and eQTL analysis.

  • @mehwishwahid183 · 10 months ago

    Very nice video. I have a couple of quick questions:
    1) Is finding the module-trait relationship compulsory for WGCNA? If yes, what is a trait file, i.e., what information should be included in it?
    2) How do I find/identify the hub genes after network modeling?
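On the trait file and hub genes: module detection itself needs only the expression matrix; a trait table is needed only for relating modules to traits. Such a table typically has one row per sample, in the same order as the expression matrix, and one numeric (or 0/1) column per trait. Hub genes are commonly taken as the genes with the highest module membership (kME, the correlation of a gene with its module eigengene). The sketch below computes an eigengene, its correlation with a trait, and kME in base R on simulated data; all names are hypothetical, and WGCNA provides moduleEigengenes() and signedKME() for the real analysis:

```r
set.seed(3)
n <- 12
driver <- rnorm(n)                                   # hidden signal shared by the module
expr <- cbind(sapply(1:4, function(i) driver + rnorm(n, sd = 0.3)),  # co-expressed genes
              matrix(rnorm(n * 2), n))                               # unrelated genes
dimnames(expr) <- list(paste0("S", 1:n), paste0("g", 1:6))

# Trait table: one row per sample (same order as expr), one numeric column per trait
traits <- data.frame(disease_score = driver + rnorm(n, sd = 0.5),
                     row.names = rownames(expr))
stopifnot(identical(rownames(traits), rownames(expr)))

module_genes <- c("g1", "g2", "g3", "g4")            # e.g. the genes of one module color
scaled <- scale(expr[, module_genes])
me <- prcomp(scaled)$x[, 1]                           # eigengene = first principal component
if (cor(me, rowMeans(scaled)) < 0) me <- -me          # orient along average expression

module_trait_r <- cor(me, traits$disease_score)       # module-trait relationship
kME <- cor(expr[, module_genes], me)[, 1]             # module membership of each gene
hubs <- names(sort(kME, decreasing = TRUE))           # highest kME first
module_trait_r
hubs
```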

  • @merajulislam6179 · 10 months ago

    Effective video

  • @drgutharajasekar6275 · 10 months ago +2

    Madam, I am getting this error, please help me:
    Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
    Length of colors vector not compatible with number of objects in 'order'.

  • @akshayavs3776 · 1 year ago

    During my analysis I am getting a lot of genes in ME0 (they are not in any module), and when I compare to a trait I get the maximum correlation with this group. But I also have good correlation with other modules. I am tweaking the number of genes selected for the analysis and the threshold parameters. Is there anything obvious I am missing?
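On ME0: with numericLabels = TRUE, label 0 (grey when colors are used) collects genes that were not assigned to any module, so its eigengene is usually left out when interpreting module-trait correlations. A minimal sketch of dropping it from an eigengene table, with made-up values:

```r
# Hypothetical eigengene table: rows are samples, columns are modules
MEs <- data.frame(ME0 = c(0.1, -0.2, 0.3),     # unassigned genes
                  ME1 = c(0.5, 0.1, -0.6),
                  ME2 = c(-0.3, 0.4, -0.1))
# Drop the unassigned "module" under either naming scheme before correlating with traits
MEs_assigned <- MEs[, setdiff(colnames(MEs), c("ME0", "MEgrey")), drop = FALSE]
colnames(MEs_assigned)
```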

  • @freezingtolerance7493

    Hello, thanks for providing a good video. I just wonder: I have raw RNA-seq data, but no metadata, because the company that generated the data did not provide any. Does this mean I should create the metadata myself?

  • @namratasahu4247 · 11 months ago

    Hello, thanks for the amazing tutorial. But I am getting an error when running WGCNA. Could you please help me solve "too few genes with valid expression levels in the required number of samples"?

  • @sanjaisrao484 · 1 year ago

    Thanks for providing link of the tutorial, it was very useful

  • @athenanguyen442 · 1 year ago

    Thank you so much for this! Do you recommend doing anything different for longitudinal data?

    • @Bioinformagician · 1 year ago

      It depends on the question you are asking. If you are interested in identifying genes that are significantly associated with a particular time point, then building a network for each time point individually would make sense. Otherwise, analyzing them all together would be the right way to go about it.
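The per-time-point option in the reply above amounts to splitting the samples and building one network per subset. A base-R sketch, assuming an expression matrix with samples in rows and a hypothetical metadata column `time` in the same sample order; each resulting matrix would then go through network construction separately. Keep in mind that the WGCNA FAQ advises against networks built from fewer than about 15 samples, so this split only makes sense when each time point has enough samples:

```r
set.seed(42)
expr <- matrix(rnorm(12 * 5), nrow = 12,
               dimnames = list(paste0("S", 1:12), paste0("g", 1:5)))   # samples x genes
meta <- data.frame(sample = rownames(expr),
                   time = rep(c("T0", "T1", "T2"), each = 4))          # hypothetical design

stopifnot(identical(meta$sample, rownames(expr)))   # metadata rows must match expression rows
# One expression matrix per time point, each ready for its own network
expr_by_time <- lapply(split(rownames(expr), meta$time),
                       function(ids) expr[ids, , drop = FALSE])
sapply(expr_by_time, nrow)
```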

  • @marziyehsalehi2290 · 5 months ago

    It is really helpful, thank you. I have a question: what if maxBlockSize is 5000? How should I change the rest of the code?

  • @learnersseekers904 · 1 year ago +3

    Can you please make a tutorial video on de novo RNA-seq assembly and its annotation?

    • @Bioinformagician · 1 year ago

      I will surely plan a video covering this. Thanks for the suggestion!

  • @grace-426 · 19 days ago

    Thank you, ma'am. I want to know: is it essential to have phenotypic data for using this on my transcriptomics data?

  • @harshitasharma3675 · 1 year ago

    Hello ma'am, can you please explain how to download data from GEO and convert the read counts to log fold changes and p-values?

  • @nazifahumaira4762 · 7 months ago

    Hello ma'am, I am facing a problem. In my case, the authors provided a normalized count matrix with decimal values. Should I work with that, since they did not provide any raw data?

  • @AAK00419 · 11 months ago

    Ma'am, in the hclust plot my height axis reads 20, 40, 60. Is there any parameter to set the height scale to 200000, 600000?
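On the height axis: hclust() heights are the distances at which clusters merge, so their scale follows the scale of the input data, and there is no plotting parameter that converts one into the other. Clustering samples on raw counts produces far larger heights than clustering normalized or log-scale data. A small base-R illustration with simulated counts (hypothetical data):

```r
set.seed(10)
raw <- matrix(rpois(100 * 6, lambda = 1000), nrow = 100)     # hypothetical: 100 genes x 6 samples
h_raw <- hclust(dist(t(raw)), method = "average")            # samples clustered on raw counts
h_log <- hclust(dist(t(log2(raw + 1))), method = "average")  # same samples on a log2 scale
c(raw = max(h_raw$height), log2 = max(h_log$height))
```

Both trees describe the same samples; only the units of the height axis differ.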

  • @amaliamurgueitio473 · 7 months ago +3

    Hi, thanks for this tutorial and your other videos. I followed your tutorial step by step; the only difference was that at the point where you used 14000 genes, I had to use 7000 because of RAM. Now I get this error, any idea how to fix it?
    plotDendroAndColors(bwnet$dendrograms[[1]], cbind(bwnet$unmergedColors, bwnet$colors),
                        c("unmerged", "merged"),
                        dendroLabels = FALSE,
                        addGuide = TRUE,
                        hang = 0.03,
                        guideHang = 0.05)
    Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
    Length of colors vector not compatible with number of objects in 'order'.

    • @user-gf4qt9mt4r · 7 months ago

      I encountered the same error as you, did you solve it?

  • @aytacoksuzoglu2975 · 1 year ago +2

    I used maxBlockSize = 7000, and when I tried to plot the last dendrogram I got this error:
    "Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
    Length of colors vector not compatible with number of objects in 'order'."
    Any idea why?

    • @aytacoksuzoglu2975 · 1 year ago

      I solved it (I guess). I have little RAM, so the data get divided into blocks, and when I try to color the dendrogram it doesn't work: I have something like 3 parts of data but only 1 set of colors (for all genes), so they don't match. That's as far as I've figured out; let's see how I can fix it.

    • @amaliamurgueitio473 · 7 months ago

      @aytacoksuzoglu2975 Hi, I have the same issue, may I ask how you fixed it?
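The diagnosis above points the right way: when the number of genes exceeds maxBlockSize, blockwiseModules() builds one dendrogram per block, and bwnet$dendrograms[[b]] covers only the genes indexed by bwnet$blockGenes[[b]], whereas bwnet$colors covers all genes. Subsetting the color rows by block, as the blockwise section of the official WGCNA tutorial does, avoids the length mismatch. A sketch; `block_colors()` and `plot_all_blocks()` are hypothetical helpers, and the mock object only imitates the relevant fields of a real result:

```r
# Color rows for one block: keep only the genes that this block's dendrogram contains
block_colors <- function(bwnet, block) {
  colors <- cbind(unmerged = bwnet$unmergedColors, merged = bwnet$colors)
  colors[bwnet$blockGenes[[block]], , drop = FALSE]
}

# One plot per block (needs the WGCNA package and a real blockwiseModules() result)
plot_all_blocks <- function(bwnet) {
  for (b in seq_along(bwnet$dendrograms)) {
    WGCNA::plotDendroAndColors(bwnet$dendrograms[[b]], block_colors(bwnet, b),
                               c("unmerged", "merged"),
                               dendroLabels = FALSE, addGuide = TRUE,
                               hang = 0.03, guideHang = 0.05,
                               main = paste("Block", b))
  }
}

# Tiny mock standing in for bwnet: 4 genes split into 2 blocks
mock <- list(unmergedColors = c("blue", "red", "red", "grey"),
             colors         = c("blue", "blue", "blue", "grey"),
             blockGenes     = list(c(1L, 3L), c(2L, 4L)))
block_colors(mock, 2)
```

With a real result, `plot_all_blocks(bwnet)` draws each block's dendrogram with color rows of matching length.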

  • @sonialamba2767 · 1 year ago

    Thank you so much for this video. I have a question: the dataset you selected has a single text file, but if a dataset has multiple text files, how should we deal with that? Please help, as I am new to this field.

  • @suhasinivr5614 · 1 year ago

    Hello ma'am, it would be really useful if you made a video on how to interpret the results (images) obtained from WGCNA.

    • @Bioinformagician · 1 year ago

      I will surely plan on making a video on this. Thanks for the suggestion :)

  • @drgutharajasekar6275 · 3 months ago

    Hi madam, are the significant genes taken from all modules or only from the modules associated with the trait?

  • @abdullahaltulea142 · 1 year ago

    Thanks for your effort. Do we have to batch-correct before DESeq2? I read that DESeq2 handles batch correction like this: design = ~ condition + batch.

    • @Bioinformagician · 1 year ago

      Batch effects need to be corrected for before DESeq2. If you have batch information in your colData in a column called "batch", then you could provide it in your design like you mentioned.

    • @user-ej1lh5wl8f · 1 year ago

      @Bioinformagician If I have used design = ~condition + batch, does that mean I don't need to use ComBat to remove the batch effect?
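On the design formula: `~ batch + condition` only adds a batch coefficient to the model, as the base-R design matrix below shows, so the condition effect is tested within batches. It does not change the counts or the vst() values, so data passed to WGCNA still carry the batch effect; the DESeq2 vignette suggests limma::removeBatchEffect() on transformed values for such downstream uses (ComBat is another option). Also, results() reports the last variable in the design by default, which is why the variable of interest is usually placed last. The toy colData below is hypothetical:

```r
coldata <- data.frame(condition = factor(c("ctrl", "ctrl", "treat", "treat")),
                      batch     = factor(c("b1", "b2", "b1", "b2")))
# Design matrix encoded by ~ batch + condition: intercept, batch shift, condition effect
mm <- model.matrix(~ batch + condition, data = coldata)
colnames(mm)
mm

# Not run (needs DESeq2/limma and a real vsd object), following the DESeq2 vignette:
# mat <- limma::removeBatchEffect(assay(vsd), batch = vsd$batch,
#                                 design = model.matrix(~ condition, colData(vsd)))
```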

  • @yipan3694 · 1 year ago

    Hi, thanks for your video. It's really helpful! I have a question, however: what is randomSeed and what is the effect of changing it? I see the WGCNA manual also uses 54321. What's the difference between that and 1234? Thanks very much.

    • @Bioinformagician · 1 year ago

      A random seed makes the output of our R code reproducible. By setting a specific seed, the random processes in our script always start at the same point and hence lead to the same result. The result will not change if the seed is changed. You can set a different seed for your analysis; however, to ensure your results are reproducible, always use the same seed for a given analysis.

    • @yipan3694 · 1 year ago

      @Bioinformagician Okay. Thanks very much.
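The effect of a seed can be seen with any randomized base-R function. In this sketch kmeans() picks random starting centers, and resetting the same seed before each run reproduces the same clustering. The data are made up, and 54321 is simply the value mentioned above. The same principle is behind the randomSeed argument of blockwiseModules(); in general, wherever a random step is involved, a different seed may give slightly different output.

```r
# Fixed toy data (hypothetical): 30 points in 2 dimensions, roughly 3 groups
x <- cbind(rep(c(0, 5, 10), each = 10) + sin(1:30),
           rep(c(0, 5, 0), each = 10) + cos(1:30))

# kmeans() draws random starting centers, so its output depends on the RNG state
set.seed(54321); run1 <- kmeans(x, centers = 3)$cluster
set.seed(54321); run2 <- kmeans(x, centers = 3)$cluster
identical(run1, run2)   # same seed, same starting point, same result
```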

  • @adampassman · 1 year ago

    Thank you so much - can you recommend any packages for batch correction?

  • @user-pz5cb4zx3t · 1 year ago +2

    Hi,
    Thank you for the informative videos.
    Due to my RAM (4 GB), I had to set maxBlockSize to 5000 instead of the 14000 you used for one block. As a result, I'm having problems with plotDendroAndColors, which does not show the merged & unmerged color rows under the dendrogram. I've searched and could not find a solution. Do you have any suggestions?
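Why 4 GB forces a smaller maxBlockSize: within a block, WGCNA works with gene-by-gene matrices (such as the adjacency and the TOM), whose memory grows with the square of the block size. A back-of-the-envelope sketch for a single dense matrix of doubles; treat the figures as a lower bound, since several such matrices may be held at once (an assumption; the actual overhead depends on the WGCNA version and settings):

```r
# Memory of one dense n x n matrix of doubles (8 bytes per entry), in GiB
dense_matrix_gib <- function(n_genes) 8 * n_genes^2 / 1024^3

block_sizes <- c(5000, 7000, 14000)
round(setNames(dense_matrix_gib(block_sizes), block_sizes), 2)
# 5000: ~0.19 GiB, 7000: ~0.37 GiB, 14000: ~1.46 GiB per matrix
```

Halving the block size cuts the per-matrix memory by a factor of four. When there are more genes than maxBlockSize, the result contains one dendrogram per block, and each must be plotted with its own subset of module colors.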

  • @amrsalaheldinabdallahhammo663

    Thank you, genius! Can you please make a video about mitch and how to use it in R :)

  • @saadzaheer3451 · 2 months ago

    Hi there, does WGCNA work with TPM values? How should one proceed if all they have is TPM values? Regards

  • @ramachandran8106 · 1 year ago

    Please release "GWAS" tutorial videos...

  • @user-cz4qr4ot9x · 6 months ago

    When I read the data I downloaded into R, it says it is empty. Any solution? The data is a tar file. I also have it as a TXT file, but when I read it in R using read.table or read.delim, it reads it into only two variables, i.e., all the details end up in two columns. I am a beginner at R and not good with coding; any kind of help will be appreciated.
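On files split per sample: GEO supplementary *_RAW.tar archives often contain one file per sample with just two columns (gene ID and count), which would explain the "two columns" result. A base-R sketch that reads such files and combines them into one genes × samples matrix, assuming tab-separated files without headers; the helper name, the file names and the archive name are hypothetical. Base R's untar() extracts the archive first:

```r
# Read per-sample two-column files (gene ID, count) into one genes x samples matrix
read_count_files <- function(paths) {
  per_sample <- lapply(paths, function(p) {
    x <- read.delim(p, header = FALSE, col.names = c("gene", "count"))
    setNames(x$count, x$gene)                            # named vector: gene -> count
  })
  genes <- Reduce(intersect, lapply(per_sample, names))  # genes present in every file
  counts <- sapply(per_sample, function(v) v[genes])     # align all samples on those genes
  dimnames(counts) <- list(genes,
                           tools::file_path_sans_ext(basename(paths), compression = TRUE))
  counts
}

# For real data (not run; archive name hypothetical):
# untar("GSExxxxx_RAW.tar", exdir = "raw_counts")
# counts <- read_count_files(list.files("raw_counts", full.names = TRUE))

# Demo with two small files written to a temporary folder
dir <- tempfile("counts_demo"); dir.create(dir)
writeLines(c("g1\t5", "g2\t0", "g3\t2"), file.path(dir, "S1.txt"))
writeLines(c("g2\t3", "g1\t7"), file.path(dir, "S2.txt"))
counts <- read_count_files(file.path(dir, c("S1.txt", "S2.txt")))
counts
```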

  • @divyaagrawal6740 · 1 year ago

    Does having equal numbers of (or matched) healthy and diseased patients matter for this analysis, scientifically?

  • @nataliagarcia5404 · 1 year ago

    Can you perform WGCNA on a pre-filtered set of differentially expressed genes, as a more downstream analysis approach?

    • @Bioinformagician · 1 year ago

      WGCNA is an unsupervised method. It is NOT recommended for data that have been pre-filtered for differentially expressed genes.
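If the gene set must still be reduced (for example, because of RAM), the WGCNA FAQ suggests filtering by expression level or variance rather than by differential expression. A minimal sketch that keeps the most variable genes, assuming samples in rows and genes in columns; the helper and toy data are hypothetical:

```r
# Keep the n genes with the highest variance across samples (genes in columns)
top_variable_genes <- function(expr, n) {
  v <- apply(expr, 2, var)                                   # per-gene variance
  keep <- sort(order(v, decreasing = TRUE)[seq_len(min(n, ncol(expr)))])
  expr[, keep, drop = FALSE]                                 # original gene order kept
}

expr <- cbind(flat = c(5, 5, 5, 5), noisy = c(1, 9, 2, 8), mid = c(3, 5, 4, 6))
colnames(top_variable_genes(expr, 2))   # "noisy" "mid"
```

Unlike a DE filter, this choice uses no group labels, so module detection stays unsupervised.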

  • @anithabavikatte192 · 1 year ago

    In my analysis, 0.4 is the highest R² value I found. Can I go with that value to choose the power and mean connectivity?

    • @PoulomiChatterjee-me7oc · 3 months ago

      I was going through the same problem. Check whether your expression matrix is in the right format. It should have samples in rows and genes in columns.
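DESeq2's assay() returns genes in rows and samples in columns, so the matrix needs transposing before WGCNA functions such as pickSoftThreshold(); otherwise correlations are computed between samples instead of genes, which is one common cause of a poor scale-free fit. A minimal sketch with a made-up matrix:

```r
set.seed(7)
vst_like <- matrix(rnorm(6 * 3), nrow = 6,
                   dimnames = list(paste0("gene", 1:6), paste0("S", 1:3)))  # genes x samples
datExpr <- t(vst_like)    # samples x genes, the layout WGCNA functions expect
dim(datExpr)              # 3 samples, 6 genes
```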

  • @abelardnsangou2794 · 1 year ago

    Could you please do a tutorial on Gene Set Enrichment Analysis (the idea behind it), like you did for WGCNA?

    • @Bioinformagician · 1 year ago

      Sure, I'll definitely plan a video covering GSEA.

    • @abelardnsangou2794 · 1 year ago

      @Bioinformagician OK, thank you very much.

    • @SaniyaKhullar · 1 year ago

      I also have some videos on my channel related to that. Please do check out and see :)

  • @user-hb5zf7ze4q · 3 months ago

    Please make a video on WGCNA with a microarray dataset, please please please!

  • @sonaaritra · 1 year ago +1

    Hello, thank you very much for making these tutorial videos. However, I have encountered an error while plotting the dendrogram with module colors shown at the end of this video. When I previously tried with the same dataset you used in your analysis, it worked fine. But now I'm trying with one of my microarray datasets and I got the following error:
    Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
    Length of colors vector not compatible with number of objects in 'order'.
    Because of this error, the panel of colors at the bottom of the dendrogram is not generated. Please help me sort out this problem.

    • @Bioinformagician · 1 year ago +1

      It is hard for me to recreate this error and troubleshoot it without the code and data.
      I can look into this if you send me your code and normalized data.

    • @sonaaritra · 1 year ago

      @Bioinformagician Thanks. Should I email it to you?

    • @Bioinformagician · 1 year ago

      @sonaaritra Yes, please.

    • @kartiksachdeva4323 · 1 year ago +1

      Were you able to fix the error? If yes, could you please share the solution?

    • @SwedishRagers · 1 year ago

      I encountered the same error. How was this solved??

  • @narens8511 · 6 months ago

    It says: Error in data %>% gather(key = "samples", value = "counts") %>% data % : could not find function "%>%"

  • @bobby5625 · 1 year ago

    Hi! Can I also use RSEM normalized gene expression data for WGCNA?

  • @Kaaaaaaaam · 1 year ago

    Why did you set TOMType = "signed"? I am trying to understand the difference between the adjacency type and the TOM type. See the technical report "Signed vs. Unsigned Topological Overlap Matrix" by Langfelder: "The take-home message from these notes is this: signed TOM takes into account possible anti-reinforcing connection strengths that may occur in unsigned networks. Since the anti-reinforcing connection strengths (practically) cannot occur in signed networks, in signed networks the signed and unsigned TOM are (practically) identical".
    Since you are using blockwiseModules() instead of constructing the network step by step, I believe the adjacency type (networkType) is "unsigned" by default. I think you want networkType = "signed".
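The distinction raised here lies in how correlations become adjacencies. A base-R sketch of the three adjacency definitions given in the WGCNA documentation (unsigned |r|^β, signed ((1 + r)/2)^β, signed hybrid r^β for r > 0 and 0 otherwise); in blockwiseModules() the adjacency is chosen with networkType and the TOM variant with TOMType. The helper name is hypothetical, and β = 2 is used only to keep the numbers readable:

```r
# Adjacency from a correlation r and soft-threshold power beta (sketch, not package code)
adjacency_from_cor <- function(r, beta, type = c("unsigned", "signed", "signed hybrid")) {
  type <- match.arg(type)
  switch(type,
         "unsigned"      = abs(r)^beta,             # sign of r ignored
         "signed"        = ((1 + r) / 2)^beta,      # r = -1 maps to 0, r = 1 maps to 1
         "signed hybrid" = ifelse(r > 0, r^beta, 0))  # negative correlations dropped
}

r <- c(strong_negative = -0.9, none = 0, strong_positive = 0.9)
sapply(c("unsigned", "signed", "signed hybrid"),
       function(tp) adjacency_from_cor(r, beta = 2, type = tp))
```

In the unsigned network, strongly negative and strongly positive correlations give the same adjacency (0.81 here), while the signed network keeps them far apart (0.0025 vs 0.9025).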

  • @quinattasneemrafique536 · 7 months ago

    Hello ma'am! It would be so helpful if you could provide your WGCNA script as a file. It is difficult to note down every command.

    • @Bioinformagician · 7 months ago

      You can get all my scripts from github: github.com/kpatel427/CZcamsTutorials/blob/main/WGCNA.R

  • @athenanguyen442 · 1 year ago

    What do you mean by merged and unmerged? Do you mean data merged with phenodata?
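Merged and unmerged refer to module colors, not to phenotype data: bwnet$unmergedColors are the modules as first cut from the gene dendrogram, and bwnet$colors are the modules after merging those whose eigengenes are highly correlated, with the threshold set by mergeCutHeight. The sketch below mirrors the manual merging step of the WGCNA tutorials (eigengene dissimilarity 1 − cor, average-linkage clustering, cut at the threshold) on simulated eigengenes; it is a simplification of what blockwiseModules() does internally:

```r
set.seed(5)
shared <- rnorm(10)
MEs <- data.frame(MEblue      = shared + rnorm(10, sd = 0.1),
                  MEturquoise = shared + rnorm(10, sd = 0.1),   # nearly the same profile as blue
                  MEbrown     = -shared + rnorm(10, sd = 0.1))  # opposite profile
merge_cut_height <- 0.25                       # merge if eigengene correlation > 0.75
ME_tree <- hclust(as.dist(1 - cor(MEs)), method = "average")
merged_groups <- cutree(ME_tree, h = merge_cut_height)
merged_groups                                  # blue and turquoise share a group
```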

  • @ritikasingh8809 · 1 year ago

    Is it necessary for the supplementary file to be rawcounts.txt.gz? Please reply. Can I do co-expression analysis if the file is a raw.tar?

  • @fatimafarhan531 · 1 year ago

    Thank you for this very informative video! I was applying your tutorial to my dataset; however, I kept receiving this error when running blockwiseModules:
    Error in colSums(!is.na(datExpr[useSamples, useGenes])) :
    'x' must be an array of at least two dimensions
    I searched for it online but couldn't find an explanation. Could you please help me?

    • @nanditapuri1916 · 1 year ago

      I got the same error! Maybe it is because norm.counts is not 2-dimensional, as in a list of lists.

    • @nanditapuri1916 · 1 year ago

      So I removed the previous step that converts them to numeric, and it worked for me.
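The error above is what colSums() reports when it receives a plain vector instead of a matrix. A likely cause, consistent with the fix described here, is a numeric conversion that dropped the matrix dimensions. A base-R sketch of conversions that lose or keep them (toy data hypothetical):

```r
m <- matrix(c("1.5", "2", "3", "4.25"), nrow = 2,
            dimnames = list(c("S1", "S2"), c("g1", "g2")))   # values read in as text

v <- as.numeric(m)              # dim and dimnames are dropped: plain vector of length 4
is.matrix(v)                    # FALSE, so colSums(v) fails with the error above

storage.mode(m) <- "double"     # converts in place and keeps dim and dimnames
is.matrix(m)                    # TRUE
m["S2", "g2"]                   # 4.25

df <- data.frame(g1 = c("1.5", "2"), g2 = c("3", "4.25"),
                 row.names = c("S1", "S2"), stringsAsFactors = FALSE)
df[] <- lapply(df, as.numeric)  # for data frames: convert each column, keep the structure
str(df)
```

If the matrix is already numeric (as vst() output is), the conversion step can simply be skipped.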

  • @RajeshKumarDutta · 9 months ago

    Thanks!

  • @emilyzhang2755 · 6 months ago

    Thanks!