Weighted Gene Co-expression Network Analysis (WGCNA) Step-by-step Tutorial - Part 1

Bioinformagician

37,083 views

Added 29 June 2024
This is part 1 of a step-by-step tutorial on Weighted Gene Co-expression Network Analysis (WGCNA).
In this video I demonstrate how to perform Weighted Gene Co-expression Network Analysis (WGCNA) using an RNA-Seq dataset. I go over data manipulation, methods to detect outlier genes and samples in the dataset, normalization, picking a soft threshold, identifying modules, and visualizing modules as a dendrogram. I hope you find this video helpful! I look forward to your comments in the comment section below!
Part 2 of this tutorial:
• Weighted Gene Co-expre...
Data:
www.ncbi.nlm.nih.gov/geo/quer...
Code:
github.com/kpatel427/YouTubeT...
WGCNA Tutorial:
horvath.genetics.ucla.edu/htm...
Chapters
0:00 Intro
0:40 WGCNA Workflow steps at a glance
1:09 Study Design
1:57 Fetch Data and read data in R
2:56 Get metadata using GEOquery package
5:00 Manipulate expression data
8:53 Quality Control - Remove outlier samples and genes; using goodSamplesGenes()
11:27 Detecting outliers using hierarchical clustering
12:22 Detecting outliers using Principal Component Analysis (PCA)
17:16 Data Normalization using vst() from DESeq2 package
20:51 Filtering out genes with low counts
22:38 Pick soft threshold
28:48 Identify Modules
31:15 maxBlockSize parameter
33:35 Get module eigengenes
34:34 Visualize modules as dendrogram
You can show your support and encouragement by buying me a coffee:
www.buymeacoffee.com/bioinfor...
To get in touch:
Website: bioinformagician.org/
Github: github.com/kpatel427
Email: khushbu_p@hotmail.com
#bioinformagician #bioinformatics #wgcna #coexpressionnetworks #geneexpression #scalefreenetworks #proteinproteininteractionnetworks #sequencing #coverage #samtools #depthofsequencing #samflag #sam #bam #alignment #phred #fasta #fastq #singlecell #10X #ensembl #biomart #annotationdbi #annotables #affymetrix #microarray #affy #ncbi #genomics #beginners #tutorial #howto #omics #research #biology #GEO #rnaseq #ngs

Comments • 96

@asiyazhao3820 1 year ago
This is absolutely AMAZING! GREAT job and Many thanks!
@mocabeentrill 1 year ago ⁺⁵
You're so smart! You made it look so easy! It took me 2 full weeks to complete this analysis. Picking the right soft threshold for SFT was helpful for me. I thank you profusely.
@Bioinformagician 1 year ago ⁺¹
I am glad my video was helpful! Thank you!
@sonaaritra 1 year ago
Thanks! Your suggestions were very helpful.
@RamakrishnanRS 9 months ago ⁺¹
Great tutorial - I'm working through it slowly. One piece of advice: avoid the tidyverse route when you rename columns. If you used a simple indexed `match(gsub())` call instead of pivoting longer, inner joining, then pivoting back wider, you wouldn't touch the data at all, just the vector of colnames. Saves a lot of memory that way.
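The `match(gsub())` idea from the comment above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the toy count matrix, the `_counts` suffix, and the `lookup` table (with its `title` and `id` columns) are hypothetical and are not taken from the tutorial's dataset.

```r
# Toy count matrix: genes in rows, samples in columns (made-up values).
counts <- matrix(1:6, nrow = 2,
                 dimnames = list(c("geneA", "geneB"),
                                 c("S1_counts", "S2_counts", "S3_counts")))

# Hypothetical lookup table mapping sample titles to the IDs wanted as column names.
lookup <- data.frame(title = c("S3", "S1", "S2"),
                     id    = c("GSM003", "GSM001", "GSM002"),
                     stringsAsFactors = FALSE)

# Strip the suffix from the column names, then find each name's row in the lookup.
clean <- gsub("_counts$", "", colnames(counts))
idx <- match(clean, lookup$title)
stopifnot(!anyNA(idx))  # fail loudly if a column has no match

# Only the vector of column names is modified; the count values are untouched.
colnames(counts) <- lookup$id[idx]
print(colnames(counts))
```

Because only the names vector changes, no reshaped copy of the expression data is created, which is the memory saving the comment refers to.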
@amarjeetyadav5661 1 year ago
Thank you very much for making these tutorial videos.
@nataliagarcia5404 1 year ago ⁺¹
Amazing!! I was struggling with this.
@dennisscheper1 23 days ago
Excellent. Thank you!
@user-mh7iv1rb9m 1 month ago
Your videos are so amazing
@mocabeentrill 1 year ago ⁺⁵
Thanks!
@jaykishansolanki2935 1 year ago
Happy Teachers' Day, ma'am! Thank you for providing these amazing tutorials that help me a lot 🎉🎉🎉
@Bioinformagician 1 year ago
I am really glad to hear my videos have been helpful! Thank you!
@saraalidadiani5881 1 year ago ⁺¹
Thank you again for an excellent video. Could you please explain how to choose the values for minModuleSize and maxBlockSize in blockwiseModules? Thank you in advance, looking forward to hearing from you!
@PortleyPortions 1 year ago ⁺¹
The package "janitor" is excellent for cleaning up column names if you do not want to do it manually at 19:20
@hemangininaik0998 1 year ago ⁺⁸
Please make a tutorial on WGCNA with TCGA samples.
@bobby5625 1 year ago
This would be great! Please make one!
@gem_game12 4 months ago
Please make one. It would be really helpful.
@MasMariusb 1 year ago
Hi, nice video. May I ask why you use vst(counts) rather than the actual DESeq2 normalization process?
@pariaalipour61 1 month ago
Thank you for the amazing video. If I want to start from a Seurat object of single-cell data, how should I process the data to follow your tutorial?
@ps_scholar3407 1 year ago
Kindly make a tutorial on GWAS and eQTL analysis.
@mehwishwahid183 10 months ago
Very nice video. I have a couple of quick questions:
1) Is finding the trait-module relationship compulsory for WGCNA? If yes, what is a trait file, i.e. what information should be included in it?
2) How do I identify the hub genes after network modeling?
@merajulislam6179 10 months ago
Effective video
@drgutharajasekar6275 10 months ago ⁺²
Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
Length of colors vector not compatible with number of objects in 'order'.
Madam, I am getting this error. Please help me.
@akshayavs3776 1 year ago
During my analysis I am getting a lot of genes in ME0 (they are not in any network), and when I compare to a trait I get the maximum correlation with this group. But I also have good correlation with other modules. I am tweaking the number of genes selected for the analysis and the threshold params. Is there anything I am blatantly missing?
@freezingtolerance7493 1 year ago
Hello, thanks for the good video. I have a question: I have raw RNA-seq data, but the company that generated it did not provide any metadata. Does this mean I should create the metadata myself?
@namratasahu4247 11 months ago
Hello, thanks for the amazing tutorial. I am getting an error after running WGCNA: "too few genes with valid expression levels in the required number of samples". Could you please help me solve it?
@sanjaisrao484 1 year ago
Thanks for providing the link to the tutorial, it was very useful.
@athenanguyen442 1 year ago
Thank you so much for this! Do you recommend doing anything different for longitudinal data?
@Bioinformagician 1 year ago
It depends on the question you are asking. If you are interested in identifying genes that are significantly associated with a particular time point, then building a network for each time point individually would make sense. Otherwise, analyzing them all together would be the right way to go about it.
@marziyehsalehi2290 5 months ago
It is really helpful, thank you. I have a question: what if maxBlockSize is 5000? How should I change the rest of the code?
@learnersseekers904 1 year ago ⁺³
Can you please make a tutorial video on de novo RNA-seq assembly and its annotation?
@Bioinformagician 1 year ago
I will surely plan a video covering this. Thanks for the suggestion!
@grace-426 19 days ago
Thank you, ma'am. Is it essential to have phenotypic data to use this on my transcriptomics data?
@harshitasharma3675 1 year ago
Hello ma'am, can you please explain how we can download data from GEO and convert the read count values to log fold change and p-value?
@nazifahumaira4762 7 months ago
Hello ma'am, I am facing a problem. In my case, the authors provided a normalized count matrix that has decimal points. Should I work with that one, since they did not provide any raw data?
@AAK00419 11 months ago
Ma'am, in the hclust plot I am getting a height scale of 20, 40, 60. Is there any parameter to set the height scale to 200000, 600000?
@amaliamurgueitio473 7 months ago ⁺³
Hi, thanks for this tutorial and your other videos. I followed your tutorial step by step; the only difference was that at the point where you used 14000 genes, I had to use 7000 because of RAM. Now I get this error, any idea how to fix it?
plotDendroAndColors(bwnet$dendrograms[[1]], cbind(bwnet$unmergedColors, bwnet$colors),
+ c("unmerged", "merged"),
+ dendroLabels = FALSE,
+ addGuide = TRUE,
+ hang= 0.03,
+ guideHang = 0.05)
Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
Length of colors vector not compatible with number of objects in 'order'.
@user-gf4qt9mt4r 7 months ago
I encountered the same error as you, did you solve it?
@aytacoksuzoglu2975 1 year ago ⁺²
I used maxBlockSize = 7000, and when I tried to plot the last dendrogram I got this error:
"Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
Length of colors vector not compatible with number of objects in 'order'."
Any idea why?
@aytacoksuzoglu2975 1 year ago
I solved it (I guess). I have little RAM, so it divides the data into blocks. When I try to color the dendrogram, it fails because I have about 3 blocks of data but only 1 color vector (for the whole dataset), so the lengths don't match. I figured out that much; let's see how I can fix it.
@amaliamurgueitio473 7 months ago
@@aytacoksuzoglu2975 Hi, I have the same issue, may I ask how you fixed it?
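The length mismatch described in this thread can be reproduced without WGCNA. The sketch below builds a mock object that only mimics the relevant parts of a `blockwiseModules()` result (`dendrograms`, `blockGenes`, `colors`, `unmergedColors`); the data are simulated and the two-block split is an assumption made for illustration.

```r
set.seed(1)
n_genes <- 12
expr <- matrix(rnorm(6 * n_genes), nrow = 6)  # 6 samples x 12 genes, simulated

# Mock of a two-block result: each block gets its own dendrogram,
# while the color vectors have one entry per gene across ALL blocks.
block_genes <- list(1:7, 8:12)
dendrograms <- lapply(block_genes, function(ix) {
  hclust(as.dist(1 - abs(cor(expr[, ix]))), method = "average")
})
bw <- list(
  blockGenes     = block_genes,
  dendrograms    = dendrograms,
  colors         = rep(c("blue", "grey", "turquoise"), length.out = n_genes),
  unmergedColors = rep(c("blue", "brown", "turquoise"), length.out = n_genes)
)

# Dendrogram 1 covers only the 7 genes of block 1; the colors cover all 12 genes.
length(bw$dendrograms[[1]]$order)
length(bw$colors)

# Subset the colors to the genes of the block being plotted so the lengths agree.
ix <- bw$blockGenes[[1]]
block_colors <- cbind(unmerged = bw$unmergedColors[ix], merged = bw$colors[ix])
stopifnot(nrow(block_colors) == length(bw$dendrograms[[1]]$order))
```

With a real multi-block result, the same subsetting would presumably look like `plotDendroAndColors(bwnet$dendrograms[[1]], cbind(bwnet$unmergedColors[bwnet$blockGenes[[1]]], bwnet$colors[bwnet$blockGenes[[1]]]), c("unmerged", "merged"), dendroLabels = FALSE, addGuide = TRUE, hang = 0.03, guideHang = 0.05)`, repeated for each block index. This has not been run against the commenters' data.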
@sonialamba2767 1 year ago
Thank you so much for providing this video. I have a query: the dataset you selected has a single text file, but if we have a dataset with multiple text files, how do we deal with it? Please help, as I am new to this field.
@Bioinformagician 1 year ago
Does each text file represent an individual sample?
@suhasinivr5614 1 year ago
Hello ma'am, it would be really useful if you made a video on how to interpret the results (images) obtained from WGCNA.
@Bioinformagician 1 year ago
I will surely plan on making a video on this. Thanks for the suggestion :)
@drgutharajasekar6275 3 months ago
Hi madam, are the significant genes from all the modules or only from the modules associated with the trait?
@abdullahaltulea142 1 year ago
Thanks for your effort. Do we have to batch correct before DESeq2? I read that DESeq2 does batch correction like this: design = ~ condition + batch.
@Bioinformagician 1 year ago
Batch effects need to be corrected for before DESeq2. If you have batch information in your colData in a column called "batch", then you could provide it in your design like you mentioned.
@user-ej1lh5wl8f 1 year ago
@@Bioinformagician If I have done "design = ~condition + batch", then I don't need to use ComBat to remove batch effect?
@yipan3694 1 year ago
Hi, thanks for your video. It's really helpful! I have a question, however: what is randomSeed and what's the effect of changing it? I see the WGCNA manual also uses 54321. What's the difference between that and 1234? Thanks very much.
@Bioinformagician 1 year ago
A random seed makes the output of our R code reproducible. By setting a specific seed, the random processes in our script always start at the same point and hence lead to the same result. The result will not change if the seed is changed. You might want to set a different seed for your analysis; however, to ensure your results are reproducible, you should always use the same seed for a particular analysis.
@yipan3694 1 year ago
@@Bioinformagician Okay. Thanks very much.
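The reproducibility point in the reply above can be checked with base R's `set.seed()`. This is only a sketch of the general property (same seed, same random draws). It makes no claim about which steps of `blockwiseModules()` consume random numbers via `randomSeed`. The seeds 1234 and 54321 are the ones mentioned in the question.

```r
# Same seed twice: the random draws are identical.
set.seed(1234)
a <- sample(1:100, 5)
set.seed(1234)
b <- sample(1:100, 5)
identical(a, b)

# A different seed starts the generator from a different state.
set.seed(54321)
c2 <- sample(1:100, 5)
print(rbind(seed_1234 = a, seed_1234_again = b, seed_54321 = c2))
```

The specific seed value carries no meaning on its own; what matters for reproducibility is recording it and reusing it for the same analysis.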
@adampassman 1 year ago
Thank you so much - can you recommend any packages for batch correction?
@Bioinformagician 1 year ago
You can use ComBat-seq for batch correction.
@user-pz5cb4zx3t 1 year ago ⁺²
Hi,
Thank you for the informative videos.
Due to my RAM (4 GB) I had to set '5000' instead of the '14000' that you used for one block. As a result I'm having problems with plotDendroAndColors, which does not show the merged and unmerged panels under the dendrogram. I've searched and could not find a solution. Do you have any suggestions?
@sarahmohammed515 7 months ago
I have similar issues! Did you figure it out? 😢
@marziyehsalehi2290 5 months ago
The same question
@marziyehsalehi2290 5 months ago
Please let me know if you could solve it.
@amrsalaheldinabdallahhammo663 1 year ago
Thank you genius, can you please make a video about mitch and how to use it in R :)
@Bioinformagician 1 year ago ⁺¹
I will surely plan a video covering this :)
@amrsalaheldinabdallahhammo663 1 year ago
@@Bioinformagician Thank you so much, really can't wait to watch it!!!
@saadzaheer3451 2 months ago
Hi there, does WGCNA work with TPM values? How should one proceed if all they have is TPM values? Regards
@ramachandran8106 1 year ago
Please release "GWAS" tutorial videos....
@user-cz4qr4ot9x 6 months ago
When I read the data I downloaded into R, it says it is empty. Any solution? The data is a tar file. I also have it as a TXT file; when I read it into R using read.table or read.delim, it reads everything into only two variables, with all the details in two columns. I am a beginner at R and not good with coding; any kind of help will be appreciated.
@divyaagrawal6740 1 year ago
Does an equal number or matched Healthy and diseased patients matter for this analysis? Scientifically?
@nataliagarcia5404 1 year ago
Can you perform WGCNA analysis on a pre-filtered set of differentially expressed genes, in a more downstream analysis approach?
@Bioinformagician 1 year ago
WGCNA is an unsupervised method. It is NOT recommended for data that has been pre-filtered for differentially expressed genes.
@anithabavikatte192 1 year ago
For my analysis, 0.4 is the highest R² value that I found, so can I go with that value to choose the power and mean connectivity?
@PoulomiChatterjee-me7oc 3 months ago
I was going through the same problem. Check if your expression matrix is in the right format. It should have samples in rows and genes in columns.
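The orientation check suggested in the reply above can be sketched in base R. The matrix below is a made-up toy, and the names `norm_counts` and `dat_expr` are hypothetical. The only point is the transpose from genes-in-rows to samples-in-rows before the data go into the network functions.

```r
# Toy normalized counts as they usually come out of a count table:
# genes in rows, samples in columns (values are made up).
norm_counts <- matrix(c(5.1, 6.3, 7.2,
                        2.0, 2.4, 1.9,
                        9.8, 9.1, 8.7,
                        4.4, 4.0, 4.9),
                      nrow = 4, byrow = TRUE,
                      dimnames = list(paste0("gene", 1:4),
                                      paste0("sample", 1:3)))

# Transpose so that rows are samples and columns are genes.
dat_expr <- t(norm_counts)
dim(dat_expr)

# Quick sanity check on orientation before any network construction.
stopifnot(all(grepl("^sample", rownames(dat_expr))),
          all(grepl("^gene", colnames(dat_expr))))
```

In a real dataset there are usually far more genes than samples, so a matrix with more rows than columns is a hint that it still needs transposing.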
@abelardnsangou2794 1 year ago
Could you please do a tutorial on Gene Set Enrichment Analysis (the idea behind it), like you did for WGCNA?
@Bioinformagician 1 year ago
Sure, I'll definitely plan a video covering GSEA.
@abelardnsangou2794 1 year ago
@@Bioinformagician Ok Thank you very much
@SaniyaKhullar 1 year ago
I also have some videos on my channel related to that. Please do check out and see :)
@user-hb5zf7ze4q 3 months ago
Plz, make a video on WGCNA with a microarray dataset. plz plz plz
@sonaaritra 1 year ago ⁺¹
Hello, thank you very much for making these tutorial videos. However, I have encountered an error while plotting the dendrogram with module colors shown at the end of this video. Previously, when I tried with the same dataset that you used in your analysis, it worked fine. But now I'm trying with one of my microarray datasets and I got the following error:
Error in .plotOrderedColorSubplot(order = order, colors = colors, rowLabels = rowLabels, :
Length of colors vector not compatible with number of objects in 'order'.
Due to this error, it is not generating the panel of colors at the bottom of the dendrogram. Please help me to sort out this problem.
@Bioinformagician 1 year ago ⁺¹
It is hard for me to recreate this error and troubleshoot it without code and data.
I can look into this if you can send me your code and normalized data.
@sonaaritra 1 year ago
@@Bioinformagician Thanks. Should I email it to you?
@Bioinformagician 1 year ago
@@sonaaritra yes please
@kartiksachdeva4323 1 year ago ⁺¹
Were you able to fix the error? If yes, could you please share the solution?
@SwedishRagers 1 year ago
I encountered the same error. How was this solved??
@narens8511 6 months ago
It says: Error in data %>% gather(key = "samples", value = "counts") %>% data % : could not find function "%>%"
@bobby5625 1 year ago
Hi! Can I also use RSEM normalized gene expression data for WGCNA?
@Bioinformagician 1 year ago
You mean RPKM normalized gene expression data?
@Kaaaaaaaam 1 year ago
Why did you set TOMType = "signed"? I am trying to understand the difference between adjacency type and TOM type. See the technical report "Signed vs. Unsigned Topological Overlap Matrix" by Langfelder: "The take-home message from these notes is this: signed TOM takes into account possible anti-reinforcing connection strengths that may occur in unsigned networks. Since the anti-reinforcing connection strengths (practically) cannot occur in signed networks, in signed networks the signed and unsigned TOM are (practically) identical".
Since you are using blockwiseModules instead of constructing the network step by step, I believe the adjacency type is "unsigned" by default. I think you want networkType to equal "signed".
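The network-type distinction raised in the comment above comes down to how a correlation is turned into an adjacency. A minimal base-R sketch, using the standard WGCNA definitions (unsigned: |cor|^β; signed: ((1 + cor) / 2)^β), is given below. The correlation values and the power β = 6 are arbitrary choices for illustration. This shows adjacency only, not the TOM calculation that `TOMType` controls.

```r
beta <- 6                            # soft-threshold power, arbitrary here
r <- c(-0.9, -0.5, 0, 0.5, 0.9)      # example pairwise correlations

# Unsigned network: strong negative and strong positive correlations
# both give a high adjacency.
unsigned_adj <- abs(r)^beta

# Signed network: negative correlations are pushed toward zero adjacency.
signed_adj <- ((1 + r) / 2)^beta

print(data.frame(cor = r,
                 unsigned = round(unsigned_adj, 4),
                 signed   = round(signed_adj, 4)))
```

In the unsigned case, correlations of -0.9 and 0.9 are treated identically, whereas in the signed case only the positive one yields a strong connection. That is why the choice of `networkType` affects which genes end up in the same module.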
@quinattasneemrafique536 7 months ago
Hello ma'am! It would be so helpful if you would provide your script for WGCNA as a file. It is difficult to note down every command.
@Bioinformagician 7 months ago
You can get all my scripts from github: github.com/kpatel427/YouTubeTutorials/blob/main/WGCNA.R
@athenanguyen442 1 year ago
What do you mean by merged and unmerged? Do you mean data merged with phenodata?
@Bioinformagician 1 year ago
Can you provide a timestamp?
@athenanguyen442 1 year ago
@@Bioinformagician 35:05. Thank you!
@Bioinformagician 1 year ago
@@athenanguyen442 Oh I meant modules before merging and modules after merging.
@ritikasingh8809 1 year ago
Is it necessary for the supplementary file to be rawcounts.txt.gz? Please reply. Also, can I do co-expression analysis if the file is a raw.tar?
@fatimafarhan531 1 year ago
Thank you for this very informative video! I was applying your tutorial to my dataset; however, I kept receiving this error when running blockwiseModules:
Error in colSums(!is.na(datExpr[useSamples, useGenes])) :
'x' must be an array of at least two dimensions
I searched for it online but couldn't find an explanation. Could you help me please?
@nanditapuri1916 1 year ago
I got the same error! It may be because norm.counts is not 2-dimensional, as in lists within lists.
@nanditapuri1916 1 year ago
So I removed the previous step that converts them to numeric, and it worked for me.
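One way a numeric-conversion step can lead to a "must be an array of at least two dimensions" error is by dropping the matrix dimensions. Whether this is exactly what happened in the commenters' scripts is an assumption. The toy character matrix below is made up and only illustrates the mechanism in base R.

```r
# Toy expression table read in as text: samples in rows, genes in columns.
m <- matrix(c("10", "20", "30", "40", "50", "60"), nrow = 2,
            dimnames = list(c("sample1", "sample2"),
                            c("geneA", "geneB", "geneC")))

# as.numeric() returns a plain vector: the dim attribute is lost,
# so functions like colSums() would reject the result.
flat <- as.numeric(m)
is.null(dim(flat))

# Changing the storage mode converts the values but keeps dims and dimnames.
num <- m
storage.mode(num) <- "double"
dim(num)
colSums(num)
```

Checking `dim()` and `is.numeric()` on the expression object right before calling `blockwiseModules()` is a cheap way to catch this kind of problem early.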
@RajeshKumarDutta 9 months ago
Thanks!
@emilyzhang2755 6 months ago
Thanks!
