RNAseq analysis | Gene ontology (GO) in R

Sanbomics

54,651 views

Added 24 Jul 2024
GO enrichment is one of the most basic but important steps when analyzing bulk or single-cell transcriptomics output. It lets you interpret the results and see which biological pathways or processes are enriched in them. Here I do it in R with output from DESeq2, but all that is required is a list of gene symbols, Entrez IDs, or Ensembl IDs.
Notebook: github.com/mousepixels/sanbom...
0:00 Intro
0:43 Running enrichGO
3:25 Understanding output
4:10 Plotting output
Science & Technology

Comments • 126

@capita007 1 year ago ⁺¹
Thank you for this straight-to-the-point video! Besides explaining in an accessible way, it is very direct and informative!
@sanbomics 1 year ago
No problem! Thank you for letting me know!
@user-td7sh6db6c 5 days ago
Great, accessible video. Making my beginner's path into RNAseq easier.
@rahulramekar1373 1 year ago
Thank you for the video; the part with gene symbols and mouse data really helped me, cheers for your good work
@sanbomics 1 year ago
Glad it helped!
@philipkirianki1033 8 months ago
For a bacterium, how can one convert locus tags into Ensembl IDs?
@ganeshmuthugangadhar 1 year ago
Really great video, and thanks for sharing it :)
@sanbomics 1 year ago
Thanks! No problem!
@user-cj1sh8qu5h 1 month ago
I love this, thank you!
@erinbiggar3344 1 year ago ⁺¹
Hey! Thank you for the video. Question: is it possible to remove the grid lines from the plot?
@user-nw3zn8nw3t 6 months ago
Thank you for your lesson. Honestly, as a beginner I spent 3 days getting a result from my gene list. I did not know that the gene list (the values) from my data frame had to be converted into a vector. After that, enrichGO recognized my gene list.
@sanbomics 6 months ago
Glad it helped!
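A minimal base R sketch of the vector point in this thread. The data frame, column names and gene IDs are made up for illustration; the only claim is that single-bracket indexing keeps a data frame, while `$` returns the plain vector that enrichGO's `gene` argument expects.

```r
# Toy DESeq2-like table; gene IDs and column names are hypothetical.
sigs <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC"),
  log2FoldChange = c(1.8, -2.1, 0.9),
  stringsAsFactors = FALSE
)

# Single-bracket indexing by name keeps a one-column data frame.
as_df <- sigs["gene"]

# `$` (or `[[ ]]`) extracts the column as a plain character vector.
genes <- sigs$gene

is.data.frame(as_df)  # TRUE: still a data frame, not a gene vector
is.character(genes)   # TRUE: a vector that can be passed as the gene list
```

If the IDs are stored as row names instead, `rownames(sigs)` also returns a character vector.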
@philipkirianki1033 8 months ago
Thanks for the informative video.
My DESeq output Excel file (from a bacterium) has locus tags instead of gene IDs. How can I convert locus tags into Ensembl IDs?
@hozhoz2 2 years ago
Thanks dude you're legit
@fgeuna 1 year ago ⁺¹
Thanks a lot for the inspiring video! What about GO annotation of a gene list from a plant genome (e.g. Triticum turgidum subsp. durum), as for the "org" database? Is there a way to select a species-specific resource?
@sanbomics 1 year ago
I've never worked with plants before. This method might not work. You can try checking out something like DAVID to see if they have resources for your species.
@olgastepanova9336 1 year ago ⁺¹
Very easy to follow, thank you! I was previously using ShinyGO and there was an option to supply a background list. Is it possible to do that with enrichGO?
@sanbomics 1 year ago
Yup! Check out the docs for the command. I forget the argument off the top of my head, but there is one.
@natureabioros8686 3 months ago
Best GO video out there lol. 5 minutes is record time.
@sanbomics 3 months ago ⁺¹
I don't mess around xD
@ahmedal-mammari9639 2 years ago ⁺³
Please, we need more simple videos like this one
@sanbomics 2 years ago ⁺²
Hope to keep releasing at least one a week!
@danielasturm4836 1 year ago
Thanks for the helpful video! I'm super confused about the database though, since I'm working with the coccolithophore Coccolithus braarudii, for which we don't even have a genome. What should I do?
@sanbomics 1 year ago
I don't have much experience working with non-human or non-model organisms. To do enrichment analysis you need annotated gene sets. I'm not sure if they exist for your organism or not. You could check if DAVID has anything: david.ncifcrf.gov/
@Viralworldremix 1 year ago ⁺¹
Very informative and helpful. Please suggest how to prepare the database for non-model organisms. Can I use all the gene IDs and their corresponding GO numbers for this purpose? Thanks
@sanbomics 1 year ago
Check out topGO: bioconductor.org/packages/release/bioc/html/topGO.html
I haven't done it in a long time, but I remember it being straightforward
@johnyijaq536 10 months ago
Thank you for the video. May I know which database I have to select if I want to analyze plant and bacterial genes?
@sanbomics 10 months ago
Sorry, never done it so I don't know the answer off the top of my head. Good luck!
@MaheshPaintings 1 year ago
Thanks a lot for this descriptive and informative content. Could you also please provide some content on deconvolution of bulk RNASeq data?
@sanbomics 1 year ago
Good idea, I can keep that in mind for the future!
@srivatsanparthasarathy1745 1 year ago
Thanks a lot for this amazing video. I am trying to do the same analysis but with GenBank accession numbers instead of Ensembl IDs as the keyType. Even though I only get 271 genes for enrichGO after applying the log2FC and p-adj filters, when I use "ACCNUM" as keyType, R ran for over 30 minutes, threw an exception and crashed. I use the Mus musculus (Mm) database.
@sanbomics 1 year ago
Hmm, I've never tried accession numbers. Maybe try converting them to Entrez IDs first and see if that fixes it?
@anmolpardeshi3138 4 months ago
Great video! This is helpful. In the comment at 3:45, did you mean 366 of 402 (instead of 403)? When you printed the genes to be tested, there were 402 out of the initial 1192 that crossed the significance threshold.
@ahmedal-mammari9639 2 years ago ⁺²
thank you so much
@sanbomics 2 years ago
You're welcome!
@user-ej1lh5wl8f 1 year ago
Thank you for your video. Should I drop NA values from the DESeq2 result so that I can run the GO analysis?
@sanbomics 1 year ago
Yeah, any NA values should be dropped, since in GO enrichment you only include significant genes.
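A small base R sketch of why the NA values matter when filtering. The table below is invented and only mimics the column names of a DESeq2 result (DESeq2 can report NA in padj); it is not the video's data.

```r
# Hypothetical DESeq2-style result; values are made up for illustration.
res <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD"),
  log2FoldChange = c(2.0, -1.5, 0.2, 3.1),
  padj = c(0.001, 0.03, 0.8, NA),
  stringsAsFactors = FALSE
)

# Comparing NA gives NA, and indexing rows with NA returns an all-NA row,
# so a bare filter silently keeps a junk row.
naive <- res[res$padj < 0.05, ]

# Excluding NA explicitly keeps only genuinely significant genes.
sigs <- res[!is.na(res$padj) & res$padj < 0.05, ]

nrow(naive)  # 3: two real genes plus one NA row
nrow(sigs)   # 2
```

`which(res$padj < 0.05)` is another common way to get the same rows, since `which()` ignores NA.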
@maruthiram5523 10 months ago
What should we do if we have versioned Ensembl gene IDs, e.g. ENSG00000003436.16? How do we change the key type so that we can continue with the analysis?
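One possible approach, sketched in base R: rather than changing the key type, strip the version suffix so the IDs match the unversioned Ensembl IDs that the "ENSEMBL" key type of the org databases uses. The first ID is the one from the question; the rest of the setup is an assumption.

```r
# Versioned ID from the question, plus an already unversioned one.
ids <- c("ENSG00000003436.16", "ENSG00000003436")

# Remove a trailing ".<digits>" if present; other IDs are left untouched.
stripped <- sub("\\.[0-9]+$", "", ids)

stripped
```

The stripped vector can then be used as the gene list with keyType = "ENSEMBL". Stripping versions can create duplicates, so `unique(stripped)` may be worth applying first.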
@adria12vc 8 months ago
In my R version on Mac, when I type rownames it prints the row number that the sample corresponds to. How can I choose the column that I want it to show?
@user-xq6cr5ul8p 8 months ago
Thank you very much for this! I am just wondering how you determined the cutoff for baseMean. In this case it's 50; is this a commonly used threshold value?
@sanbomics 8 months ago
Good question. There isn't a definitive answer. A lot of people use arbitrary thresholds, but you can also base it on the variability of your dataset at lower values or use a filter based on the distribution of gene abundances.
@Jungjis 1 year ago
Huge thanks for the informative tutorial. I have a question: I got DEGs from snRNA-seq data with Seurat, and my marker.csv doesn't have a stat column like your dataset. What does the stat column mean, and how can I calculate it?
@sanbomics 1 year ago ⁺¹
Are you trying to do GSEA? For simple GO enrichment you don't need it. For GSEA you don't need that stat column if you can use something else. Does the DE test you use provide any statistic? If not, you can use the log fold change to rank them.
@Jungjis 1 year ago
Exactly, I wanted to run GSEA, and my DEG.csv has columns for gene symbol, log2FC, p-value, adj-pval, pct.1 and pct.2. It is just the cluster markers extracted from Seurat. So can I run GSEA with my DEG.csv if I sort the data by p-val or log2FC in descending order?
@lst595991 2 years ago ⁺¹
Great video! I would like to know how to do such an analysis in a non-model organism.
@sanbomics 2 years ago ⁺¹
That is a great question. Basically all you need is a background list of genes, target lists of genes (e.g., genes that belong to a GO term), and your enriched set of genes. You can do a hypergeometric enrichment analysis. I have a video that goes over this. But you still need a list of target genes, and if it is not a well-characterized organism you may have to be creative in coming up with these lists.
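The hypergeometric test mentioned in this reply can be sketched in base R for a single term. All three gene lists below are invented, and the variable names are illustrative rather than taken from the video.

```r
# Hypothetical inputs for one GO term.
background <- paste0("g", 1:100)            # all genes detected in the experiment
term_genes <- paste0("g", 1:10)             # genes annotated to the term
de_genes   <- paste0("g", c(1:5, 51:65))    # significant genes

N <- length(background)                            # background size
K <- length(intersect(term_genes, background))     # term genes in the background
n <- length(intersect(de_genes, background))       # significant genes
k <- length(intersect(de_genes, term_genes))       # overlap

# P(overlap >= k). phyper() returns P(X <= q) by default, so use q = k - 1
# together with lower.tail = FALSE to include k itself.
p_value <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

p_value
```

With many terms, the same calculation is repeated per term and the p-values are corrected for multiple testing, for example with `p.adjust(p_values, method = "BH")`.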
@layakalita9018 1 year ago
Thank you so much for this informative video. I am doing exactly the same thing as you did in this video, but I am unable to draw the barplot and dotplot. I have also tried with the enrichplot package, but it is still not showing the plots. It shows the error "Error in barplot.default(go_analysis) : 'height' must be a vector or a matrix". Can you please advise me on this issue?
@sanbomics 1 year ago
I'm sorry, but it is very hard to troubleshoot without more information. I hope you were able to figure it out!
@yijingwang7308 1 year ago ⁺¹
Hi, thank you for your video. I have a question about the genes to test: normally the FC cutoff is 2, which means |log2FC| >= 1, right? And besides |log2FC| >= 1, padj should also be less than 0.05, right?
@sanbomics 1 year ago ⁺¹
The cutoffs you choose are always arbitrary. But abs(log2FC) >= 1 AND padj < 0.05 is pretty typical. Without looking at what I did again, I most likely filtered based on both LFC and padj, maybe on different lines of code if you didn't see both.
@yijingwang7308 1 year ago ⁺¹
@@sanbomics Thank you so much for your reply!
@GabrielleWidjaja-te5pm 1 year ago ⁺¹
Hi Sanbomics, I did two GO plots and my PI wants the two p adj keys to be on the same scale/range. I have been searching for how to achieve this, but no beans. Do you have any suggestions? Thank you for the informative video!
@sanbomics 1 year ago ⁺¹
You could convert them to -log10 values, which may make it easier to put them on the same scale if they are far apart. Alternatively (harder), you can make your own color mapper that spans the whole range of values, color the bars based on that, and make a legend bar. There might be an easier way I don't know of. (I am much better at plotting in Python than R.)
@GabrielleWidjaja-te5pm 1 year ago
@@sanbomics I appreciate your swift response! That is a smart solution, thank you!
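A base R sketch of the -log10 suggestion: compute one shared range from both result sets and reuse it for both plots. The adjusted p-values are invented. Passing the range as the `limits` of a ggplot2 colour scale is an assumption about how the plots are drawn, not something shown in the video.

```r
# Hypothetical adjusted p-values from two separate GO analyses.
padj_a <- c(1e-8, 1e-4, 0.01)
padj_b <- c(1e-3, 0.02, 0.04)

# Transform so that far-apart p-values sit on a comparable scale.
score_a <- -log10(padj_a)
score_b <- -log10(padj_b)

# One range spanning both analyses; using it for both plots makes the
# colour keys directly comparable.
shared_limits <- range(c(score_a, score_b))

shared_limits
```

For ggplot2-based plots this range could be supplied as, for example, `scale_fill_continuous(limits = shared_limits)` on each plot, provided the fill is mapped to the transformed values.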
@NisarAhmed-it5hp 2 years ago
Hi, thanks for these amazing videos. I am unable to create the plot of this go_result. Is there a specific package that needs to be installed, or something else?
@sanbomics 2 years ago
Hi, what error are you getting?
@NisarAhmed-it5hp 2 years ago ⁺¹
@@sanbomics Hey, thank you for asking. It worked after installing the "enrichplot" package.
@layakalita9018 1 year ago
Hey, I am facing the same problem as you did earlier, i.e. I am unable to plot the go_result. I have also tried the enrichplot package, but it is showing some commands. Can you please help me with this?
@user-rn3vh1ff3m 1 year ago
Hi, cheers for the super informative video that saves a bunch of grad students like me.
Anyway, I have one question.
It seems that enrichGO only works in CC mode, which is not so useful, as you mentioned in the video. It doesn't matter if I change the keyType or any other options.
Can you recommend an alternative function? Apparently gseGO and groupGO don't work in the same way.
@sanbomics 1 year ago
Hi, sorry for the late reply. Were you able to figure it out?
@user-rn3vh1ff3m 1 year ago
@@sanbomics Nope unfortunately, not yet😅 Any ideas?
@carolinejuery437 1 year ago
Hi!
Thanks for this very clear video.
I am working with a non-model organism that does not have an annotation file prepared like those for human or mouse. Is there a package to do this?
Thanks in advance
@sanbomics 1 year ago
Do you have a list of genes and categories you want to test enrichment for?
@carolinejuery437 1 year ago
@@sanbomics Thanks for your answer, yes. I have the GO file for the genome and a set of differentially expressed genes.
@sanbomics 1 year ago
I've never had to do it, but I think you can with topGO: bioconductor.org/packages/release/bioc/html/topGO.html
I've had this question multiple times, so I might figure it out and make a video down the line.
@carolinejuery437 1 year ago
@@sanbomics Yes, thanks, I am trying to use topGO! All the best
@atlma2 1 year ago ⁺¹
Hi, where can I find the database/file that you are using for this script? I'm trying to follow along, but as a beginner it is difficult if I cannot view the file.
@sanbomics 1 year ago ⁺¹
If you want to start from the beginning, I have a complete walkthrough of the RNAseq process leading up to this point. Check out my RNAseq section.
@barbarainb 1 year ago ⁺¹
Hi, I am using Jaculus jaculus and I cannot find the "org" database for it, but I can retrieve the GO info from biomart in R. How can I adapt that data to make the graph that you explain so nicely here, or is there a Jaculus jaculus org database that you know of?
Thank you so much
@sanbomics 1 year ago ⁺¹
Aww, I had to look up Jaculus jaculus. Cute little ones. You may have to use a different tool for enrichment, maybe something like an EnrichR wrapper. Or you can use the DAVID web tool, save the output table, and make a graph from it.
@barbarainb 1 year ago
@@sanbomics Thank you so much for your answer and for your awesome videos, they really make science come to life 🥰.
And yeah, jaculus are the cutest 😁 Do you happen to have a tutorial on some of these steps?
Thank you so much, very grateful!
@sanbomics 1 year ago ⁺²
Thank you! 😊 I don't, unfortunately, but it is a common enough question that I may make one in the future.
@researcher7410 1 year ago
How can we perform GO enrichment analysis on genomic data, and how do we separate the gene list from a plethora of genomes?
@sanbomics 1 year ago
Hi, I am sorry but I am not sure I understood the question. But GO enrichment requires a gene list. Theoretically, whatever gives you a list of genes can also be used for GO analysis.
@jacobb5342 3 months ago
thanks for the video! have you had any luck using
fit
@1smorenoc 2 months ago
use ggplot2
# p is plot
p
@elifsukartal2840 1 year ago
Hello, first of all thank you for the video and the effort. However, when I try to implement it, R gives an "object 'sigs' not found" error. I don't know how to resolve the issue and I am fairly new to R.
@sanbomics 1 year ago
sigs is a data frame I have that only contains the significant DE genes. You will need something like that, but it doesn't have to be called sigs.
@kitony 1 year ago
Is it possible to create your own database with proteins/gene modes to use clusterprofiler?
@sanbomics 1 year ago
Yup! Except I use the EnrichR wrapper when I do it. It might work with clusterprofiler too, but I haven't tried
@miladsabzevary 3 months ago
Hi. With your command {[sigs$log2FoldChange>0.5,]}, you only find genes that have higher expression in A vs B. What is the command for both downregulated and upregulated genes in A?
@sanbomics 3 months ago
You can filter on the absolute value > 0.5
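The absolute-value filter from this reply, sketched in base R on an invented `sigs` table. The 0.5 cutoff is the one quoted in the question; the genes and fold changes are made up.

```r
# Hypothetical table of significant genes.
sigs <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD"),
  log2FoldChange = c(1.2, -0.9, 0.3, -0.1),
  stringsAsFactors = FALSE
)

# Upregulated only (the filter quoted in the question).
up <- sigs[sigs$log2FoldChange > 0.5, ]

# Downregulated only.
down <- sigs[sigs$log2FoldChange < -0.5, ]

# Both directions at once, using the absolute value.
both <- sigs[abs(sigs$log2FoldChange) > 0.5, ]

both$gene
```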
@hebamohammed2517 1 year ago
And if the genes are from rhesus macaque, which library should we install?
@sanbomics 1 year ago
Try this:
bioconductor.org/packages/release/data/annotation/html/org.Mmu.eg.db.html
@sakibsarkerii514 7 days ago
How to perform KEGG pathway analysis?
@siddharthadas86 1 year ago
I was wondering, should the gene list for enrichment be all the genes tested or all the genes in the genome?
@sanbomics 1 year ago
You mean the background gene list? This is a good question. It should be all the genes you detected in your analysis, not all the genes in the genome. I wish I had specified that clearly in the video.
@freezingtolerance7493 1 year ago
If I have GO IDs as rownames instead of Ensembl IDs, can I also do GO term analysis using the enrichGO function?
@sanbomics 1 year ago
GO ID for individual genes or pathways?
@freezingtolerance7493 1 year ago
@@sanbomics GO IDs for individual genes. Since my data is from a non-model species, I could not use the org.Hs database. So I extracted the GO IDs for each gene_id with InterProScan. Now I have DESeq data and the GO IDs corresponding to each gene. With only this information, can I perform GO analysis as you do in the video?
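One way this might be done, as a hedged sketch and not the method from the video: clusterProfiler's general `enricher()` function takes a custom term-to-gene table through `TERM2GENE`, so no org database is needed, and its `universe` argument takes the background gene list. The annotation table, gene IDs and term IDs below are placeholders. The call is guarded so the block still runs when clusterProfiler is not installed.

```r
# Placeholder annotation, e.g. parsed from InterProScan output.
# enricher() expects the term in column 1 and the gene in column 2.
term2gene <- data.frame(
  term = c(rep("GO:0000001", 4), rep("GO:0000002", 4)),
  gene = paste0("g", 1:8),
  stringsAsFactors = FALSE
)

# Placeholder background (all detected genes) and significant genes.
background <- paste0("g", 1:20)
de_genes   <- c("g1", "g2", "g3", "g9")

if (requireNamespace("clusterProfiler", quietly = TRUE)) {
  custom_res <- clusterProfiler::enricher(
    gene      = de_genes,
    universe  = background,
    TERM2GENE = term2gene,
    minGSSize = 1   # the default of 10 would drop these tiny toy gene sets
  )
}
```

With real data the `minGSSize` override is usually unnecessary. An optional `TERM2NAME` table can map the term IDs to readable descriptions.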
@fuad1245 1 year ago
Hi, you used the human database here, which is available in Bioconductor. But what if I have, for example, goldfish, whose database is not in Bioconductor? How do I proceed then? Please let me know. Thank you
@sanbomics 1 year ago
Goldfish! That is awesome. I've never heard of someone doing goldfish. Yeah, unfortunately you will need to build your own reference database. This is a common question and I go into it in a little more depth in other responses if you want to look through them. I may make a video doing this in the future.
@sanjaisrao484 1 year ago
Sir, for this analysis can we take all DEGs (up- and down-regulated), or should we take only one set?
@sanbomics 1 year ago
That's a good question. Typically, you pick up OR down. But depending on your question there might be some instances where you want to include both. Unless you know for sure, I would pick the former.
@sanjaisrao484 1 year ago
@@sanbomics thankss
@jujajuja742 10 months ago
Hello, I am trying to use enrichGO but I am running into an error, Expected input gene ID: ENSMUSG00000020191,ENSMUSG00000063281,ENSMUSG00000030898,ENSMUSG00000028294,ENSMUSG00000038651,ENSMUSG00000021822. My genes have Ensembl IDs, so I do not know why it is giving me this error.
@sanbomics 10 months ago
Hard to say without seeing the code. I'm guessing it is a small typo or mistake.
@zeinabbahari 2 months ago
Hi, thanks for your good video. How can I get in touch with you, Dr.? I need some urgent help with my data analysis. Please help me
@sanbomics 1 month ago
Hi, you can reach me through sanbomics.com
@veki2630 1 year ago
How to do functional analysis for miRNA?
@sanbomics 1 year ago
You need to find a database that has functional terms associated with miRNA. I am not sure which ones exist because I have not done much with miRNA. You can change the database that you use in the function from the default one.
@hyyyui 1 year ago ⁺¹
Could you please upload the script? Thank you!!
@sanbomics 1 year ago
Sure! Here it is: github.com/mousepixels/sanbomics_scripts/blob/main/GO_in_R.Rmd
@khanmohdsarim 1 year ago
Thanks for this nice video. Please advise on the following:
1. What if the org....eg.db database for a bacterium has been removed from Bioconductor?
2. How do I input the data if I have 8 treatments with 3 replicates each? Can I put in all treatments simultaneously, or in groups of 2/3?
3. After DESeq2, is GO/GSEA the next step, or what should be done to complete the analysis?
Please advise
Thanks in advance
@sanbomics 1 year ago
1) You can make a custom database, although it will be a bit more involved. I am not familiar with bacterial work so I can't assist much more than that.
2) Do you mean for DE analysis? It depends on the questions you are trying to answer. You can only do pairwise comparisons, but whether one group is one treatment and the other group is a combination of the other 7 treatments is up to you. Usually people would do a pairwise comparison of all treatments, but 8 groups is a lot of comparisons. I would pick the comparisons that make biological sense. For example, you might want to just compare the treatments individually to the control, for 8 DE analyses in total. It's a hard question to answer without knowing more though.
3) Again, this is highly dependent on what you are trying to answer. But GO/GSEA are almost always done after DE analysis and I would highly recommend doing that at the minimum. People usually like to see the top DE genes in something like a volcano plot or heatmap, even though IMO those don't add much to the analysis that a CSV of the output doesn't already tell you. If you are interested in how similar the treatments are, you can do some sort of clustering (PCA/hierarchical/etc). If you have 24 samples you can theoretically do a co-expression analysis.
Good luck!
@khanmohdsarim 1 year ago
@@sanbomics
Thank you for the detailed description
1. I understand, but in almost every information source people take humans, Rattus, and Arabidopsis as the example, and I am unable to follow their instructions. Could you please elaborate on the custom database if possible?
2. Yes, I have one control with 3 treatments in one condition and another control with 3 treatments in the second condition. (1+3 = condition 1) (1+3 = condition 2)
3. I am looking at how such treatments change the morphology of organisms and which genes are important for it. So I was looking for advice on GSEA or pathway analysis.
@sanbomics 1 year ago
Hi! Sorry for the delay, I don't get notified when people respond after I respond.
1) Several of the R GO packages allow you to input custom gene lists for enrichment. It's gonna take a little trial and error on your part, likely.
2) I'm sorry, I am still a little confused about the layout
3) I think both are important. You should try both. GSEA is just a method to test gene set enrichment. Pathway analysis usually means the gene sets are specific to pathways, as opposed to gene ontology where the gene sets are broader. You can use GSEA on both pathways and GO.
@Stop-and-listen 1 year ago
I am trying to reproduce your results, but I cannot find the file "count_table.csv" on your website.
@sanbomics 1 year ago
Try this: github.com/mousepixels/sanbomics_scripts/blob/main/count_table_for_deseq_example.csv
@MM-fj7ym 1 year ago
Hi, can you teach Gene Ontology enrichment analysis with the GOhyperGALL function?
@sanbomics 1 year ago
Hi, I've never used that function. But all GO enrichment is basically the same idea.
@MM-fj7ym 1 year ago
@@sanbomics Thanks, and I have a question: I don't understand GO enrichment analysis vs GSEA. Could you explain this?
@excelobiageli9446 2 years ago
Can I use these DEGs to carry out gene co-expression analysis?
@sudeeris7294 2 years ago ⁺¹
As far as I know, you need a sufficient number of samples to carry out co-expression analysis.
@sanbomics 2 years ago
Hi. Co-expression analysis is different from just DE/GO. You need a large sample size for it in order to find correlations between gene expression. Usually the minimum is around 20 samples, but that varies based on which model you are using. If you are using humans you need a lot more than if you are using cell culture from one cell line or genetically identical mice.
@excelobiageli9446 2 years ago
Oh, thank you. But what I really want to know is: do I have to use differentially expressed genes for co-expression analysis? Or can I just use whatever dataset for the analysis without finding DEGs?
@sanbomics 2 years ago
You don't necessarily need to do DE analysis to do co-expression. Co-expression finds correlations between genes, irrespective of DE testing. However, DE analysis can help you focus in on specific co-expression pathways or genes that are different between conditions.
@excelobiageli9446 2 years ago
@@sanbomics Thank you very much. And I love your videos. Please keep doing more🙏🏼
@sreehariap655 4 months ago
❤
@claudiaferreira6325 4 months ago
And plant genomes?? Not Hs...or mouse...?
@sanbomics 3 months ago
Nobody cares about plants... JK. I'm not really familiar with working with them, but at the end of the day the algorithm is the same; you just have to find and use the right database.
@zeinabbahari 2 months ago
My name is Zeinab Bahari. You can find me on ResearchGate... I need help with RNA-seq data analysis.
@sanbomics 1 month ago
If you need help you can check out sanbomics.com
@user-jz4bw1bj9g 11 months ago ⁺¹
Hi, thanks a lot for this video, it is very well explained and looks sooooo simple! However, it is not working for me.
First, to view the data frame, I cannot just do as.data.frame(GO_results); I have to do as.data.frame(GO_results@result), otherwise it gives me an empty df.
Second, I cannot make the plot, since I get this error message:
> fit
@sanbomics 11 months ago
Hmm, the video is starting to age. It is possible things changed a little in the packages. Were you able to figure it out?
@yavorjordanov7416 2 months ago
Regarding your second problem: I generated the plots without creating a data frame from the enrichGO output. Before that I kept getting the same "height" error.
