HOW TO PERFORM GSEA - A tutorial on gene set enrichment analysis for RNA-seq
- Added 24. 06. 2024
- In this tutorial, we explain what gene set enrichment analysis (GSEA) is and what it offers you. We show you how to run the analysis on your computer and take you through how to interpret the outputs. The tutorial also covers leading edge analysis and analysis of gene networks with Cytoscape.
*** Editor's note: If you are having trouble loading your gct file, save it as .gct.txt rather than just .txt
The timestamps for the following sections are:
0:09 What is GSEA?
9:09 What software do I need?
9:55 What input files do I need to create?
14:12 What gene set files are available?
19:04 How do I use GSEA software?
35:57 What is leading edge analysis?
40:05 Visualising enriched gene sets as a network
GSEA website (download free software here)
www.gsea-msigdb.org/gsea/inde...
GSEA User Guide
www.gsea-msigdb.org/gsea/doc/...
The Gene Set Database
www.gsea-msigdb.org/gsea/msig...
Cytoscape network analysis software
cytoscape.org/
This tutorial on GSEA is brought to you by Dr Katherine West in the College of Medical, Veterinary and Life Sciences at the University of Glasgow, Scotland. Look out for the other videos in this tutorial series, which will help you get the most out of your gene expression analysis.
We hope you found this video useful. Please support us by liking the video and consider subscribing for more informative content. Leave us a comment if you thought this video was helpful or if there is further information you would like to share with us and the community. Thank you.
www.gla.ac.uk/people/katherin...
/ genomicsgurus - Science and technology
This was a really well done, in-depth walkthrough of GSEA!
Thanks Anthony - glad you liked it
Absolutely invaluable tutorial! Thank you for creating this!
The tutorial was amazing and easy to follow. Well done and looking forward to future videos.
Excellent tutorial! Thank you very much! Would love to see more like this!
Super helpful and clear tutorial! I appreciate that you saved me a lot of time figuring out how to do this analysis!
Can't thank you enough for such an invaluable video!
What a wonderful course! I watched several before this one, and this is the best!
Thank you, this was a great tutorial! I was struggling with multiple errors before, but now everything runs smoothly. Good job :)
Great! Glad it helped you sort things out :)
Thanks for such a great tutorial! I've been struggling a little to analyze my RNAseq data, but I hope with this info I'll be able to do it.
I hope you get some interesting results!
Thank you! This was a great introduction to GSEA. I found it extremely helpful. I wish there were more tutorials like this for other software!
Glad it was helpful!
Thanks so much again for helping me with fixing my files. That was a huge support.
Thank you for this helpful tutorial! I like how you explained all the output metrics in detail. I had zero encounters with RNA seq data analysis and within a few hours, I managed to compare my gene sets of interest in my experimental groups.
Great! Hope you found something interesting!
This video is amazing, so far my favorite. Really clear and straightforward, I truly appreciate it! Great job! many thanks :)
Thanks!
Great Introduction to GSEA, Thank you very much
It was about time that I was searching for a "real" GSEA tutorial. Thanks very much!
Hope you get some useful information from it!
Such a wonderful and informative Illustration of GSEA. Thank you so much.
Glad you found it useful!
Awesome! This was extremely well done. I am a novice at NGS analysis and found this very understandable and helpful.
Glad you found it helpful. Good luck!
Great video. Clear and easy to follow tutorial. Great job Doctor!
Glad it was helpful!
Thank you so much for the detailed tutorial. It's a lot easier to understand than the user guide, which misses out details on the input and ranking.
Glad it's helpful. Hope you get some useful results!
Thank you so much for the detailed tutorial!! Love it!
Glad to help!
Extremely helpful. Thank you very much.
Clear and highly helpful tutorial.
This is brilliant! A thousand thanks!
Glad it was helpful!
Amazing tutorial! Thank you very much!
Glad you found it helpful!
Very nicely done. Thank you for making this video.
We are pleased you found it useful Mo
Great tutorial, many thanks, Dr. Katherine West.
Glad you found it useful :)
It's a very very very good tutorial to introduce GSEA!!
Thanks :) glad to have helped!
Thank you so much for making the tutorial. It is really helpful.
Great!
Thanks for sharing. Very helpful!
Glad it was helpful!
Thank you for sharing your wisdom with us.
You're welcome. Do you have any questions about it?
No thanks, this is great.
Indeed very useful and well explained. Thank you!
Glad you found it helpful!
this is gold. thank you very much
Amazingly well done, thank you!
Glad you liked it!
Thanks, this tutorial's really helpful for my work!
Very nice tutorial. Thanks!
Glad it was helpful!
Very great explanation, thank god you made this video!!
Glad you found it useful!
Nice and detailed presentation, fully understandable. I really loved this GSEA tutorial/introduction. +1 subscriber
Thanks for your kind feedback. Glad you found it useful
Excellent talk Katherine
Excellent, thank you!
Thank you very much for this, very useful!
Amazing tutorial! Congrats!
Glad it was helpful!
Really great tutorial... A lot of info, worth the time 😍
Glad you found it useful!
Thank you so much, it's an enriching and fruitful video. Thanks, genius :)
This was a very helpful tutorial, Thank you.
Glad to hear that!
A great tutorial! Thank you so much it is really helpful :)
Thanks! Hope your data is interesting :)
Thank you. That was concise and wonderfully analyzed .
Meanwhile your British accent is super!
Thanks for your kind feedback. Glad you found it useful!
This really helped me. Thanks.
Wow, great! All the concepts are clear now.
Glad it was helpful.
Amazing tutorial! thank you!
Thanks Ariel. Glad you found it useful.
So much help. Thank you.
Glad it helped!
Amazing video, thanks!
Thank you!
Useful presentation. Thanks
Glad it was helpful!
I have a note about what you said at 3:33
The genes are ranked based on their P. Value and fold change, so saying based on counts isn't entirely true.
Thank you so much for the video it's really helpful.
Thank you so much!
Great class!!!
Glad you found it useful!
Brilliant! Big thanks :)
Glad you liked it!
Really helped me out! Many thanks!
Glad you found it useful!
A great help. Thank you, ma'am.
An excellent presentation that made GSEA quick to understand. I recommended this to my colleagues and co-researchers. Very well done.
Glad you found it helpful!
@GenomicsGurus Madam, please let me know how to generate heat maps with this software, and which options should be used. Thank you.
Hi Magaraju-India, heatmaps are included in the outputs, but they don't show all the genes. This is a better tool if you just want heatmaps: www.heatmapper.ca/
@GenomicsGurus Madam, I would like to produce the output shown from 32:49 to 33:30 in your video. Please let me know the process and the options. Thank you.
Hi, you don't have to select any options - the heat maps appear automatically underneath the first table. However, sometimes they don't display, because the HTML file has to be able to access the image files. For example, I had trouble when I saved the output to an online location (OneDrive), and this was solved when I saved the outputs to my C: drive. If they still don't load, the heat maps are saved as individual pictures in the folder where you save the output, so go to your file manager, find the folder, and view them from there.
Thank you, this is very clear
Glad it was helpful Jeannie!
I literally didn't like analysing the RNA-seq data for my project samples for the past year. Seeing your video was eye-opening.
Glad you found it helpful!
@GenomicsGurus Oh yes! Definitely :)
Good talk.. appreciate your efforts to help
Thanks for your kind feedback Raj!
very helpful, thanks!
Glad it was helpful!
Just leaving a comment hopefully for people that are trying to use it recently.
The Expression Dataset File is no longer like that by default: just remove the first 2 rows, so that the file starts with the row: Name "tab" Description "tab" ...
I did that and everything ran smoothly! You can also see it as the last example on the user guide web page.
(did they change the default standard?)
It would be great if example files were available to learn from, including the changes you indicated.
Good stuff, thanks!
Glad you liked it!
Great video!
Thanks a lot. Glad you found it useful!
Nice video, very useful!
Many thanks Furong. Pleased you found it useful!
Thank you so much, well done 🌹✨✔👌
Glad you found it useful!
THANK YOU!!!!
Nice Presentation 👍 Crystal Clear explanations Thanks a lot 😊, Would be Great to learn about time course analysis also!
Thanks for your feedback. We're pleased that you found it useful. There are more videos on this topic to follow so subscribe and look out for them 😀
There is a mistake in the explanation. Do not add the #1.2 row and the row giving the number of genes and columns when saving the file as TXT; those two rows only work in a GCT file.
Oh, that's good to know - thanks very much!
Very important comment. When I removed the gene and column counts, it worked with a TXT file.
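For anyone comparing the two layouts discussed in this thread, here is a minimal Python sketch that writes the same expression table either as GCT (with the #1.2 row and the row of gene and sample counts) or as plain tab-delimited TXT (without them). The function name, the "na" description placeholder and the example values are illustrative assumptions, not something taken from the video.

```python
def format_expression_table(genes, samples, values, as_gct=True):
    """Return the text of a GSEA expression dataset file.

    genes   : list of gene identifiers, one per data row
    samples : list of sample names, one per data column
    values  : list of rows, each a list of numbers matching `samples`
    as_gct  : True  -> GCT layout (two extra rows at the top)
              False -> plain tab-delimited TXT layout (no extra rows)
    """
    if len(genes) != len(values):
        raise ValueError("need exactly one row of values per gene")

    lines = []
    if as_gct:
        # GCT only: version row, then <number of genes> TAB <number of samples>
        lines.append("#1.2")
        lines.append(f"{len(genes)}\t{len(samples)}")

    # Column header row, shared by both layouts
    lines.append("\t".join(["NAME", "DESCRIPTION"] + list(samples)))

    for gene, row in zip(genes, values):
        if len(row) != len(samples):
            raise ValueError(f"row for {gene} does not match the sample list")
        # "na" is a placeholder description (an assumption for this sketch)
        lines.append("\t".join([gene, "na"] + [str(v) for v in row]))

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B"]            # hypothetical gene names
    samples = ["ctrl_1", "ctrl_2", "trt_1"]  # hypothetical sample names
    values = [[10.5, 11.0, 20.1], [3.2, 2.9, 0.4]]
    print(format_expression_table(genes, samples, values, as_gct=True))
    print(format_expression_table(genes, samples, values, as_gct=False))
```

Save the first output with a .gct extension and the second with .txt; mixing the two (GCT rows inside a .txt file) is the combination this thread reports as failing.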
I have a wish 🙏 god, please let her return to youtube to make her awesome videos 🙏 Amen 🙏 Love from Turkey My Teacher 🙏
This video is really helpful. I learned a lot from it. I was wondering whether you have an example of time series analysis. Since the GSEA website doesn't say much about it, I have no idea how to start a time series analysis.
Great
Thank you
You're welcome, Noor
Thank you Dr. West! Good job, great tutorial. I even liked your warm voice and your accent... it sounds like you are American, but maybe it is the Scottish accent... I've never been in Scotland, I couldn't say.
I'm glad you found the tutorial useful! I think my accent is mainly Scottish, but there's probably a twinge of the eastern USA and north west England as well ;)
Thank you, was helpful to understand and perform GSEA. I would like you to cover network construction between miRNA-mRNA expression profiles using Cytoscape
Glad you found it useful. We hope to cover Cytoscape when we get some time!
comfortable voice
Thanks a lot
Glad you found it useful
An excellent tutorial and easy to follow. Thank you so much. Can you please give a tutorial on EaSeq open source software too ?
Thanks Pedram. Glad you found it useful. We will be covering ChIP-seq soon
@GenomicsGurus Thank you, it would be great to see an integrative analysis of RNA-seq and ChIP-seq. EaSeq would be a good option for such an analysis, but unfortunately I am not an expert in the bioinformatics field.
Amazing tutorial.
My question is this: I have RNA-seq results giving differentially expressed genes for different cell lines, and I want to compare them to check, for example, which cell line expresses genes (significant upregulated enrichment) associated with angiogenesis. In my expression dataset file, the first column would contain the Ensembl IDs for these genes, but I sometimes have completely different genes in different cell lines (with 3 samples per cell line). What is the appropriate way to organize this data?
Thanks for this tutorial….
Could you please make a video on Cytoscape?
Glad it was helpful. Cytoscape is on our list
Hi Katherine, excellent explanation of GSEA for beginners.
Can you please cover using the ClueGO plugin in Cytoscape for building PPI maps?
Thanks Harish. We hope to cover Cytoscape in future
Great video! Thanks a lot for the explanation. I was wondering if you know how to use a continuous phenotype label for a time course actually. I tried following the user guide for that but I couldn't really do it. Thanks :)
I've not tried it myself, but it doesn't look too complicated. There are lots of ways to make mistakes with these input files, though, as my students will testify! If you want to email your files to me, I can have a look at them: katherine.west at glasgow.ac.uk
Thank you, Dr. West, for such a great tutorial!! Is there any way to see the actual FDR value if it's at 0.000? Thank you for your help.
Hi Justin, If you click on the "details" link for a particular enriched geneset, you will get the scores and p values to more decimal places. However, if the P or FDR value was 0 in the first place, it often doesn't give you any more information - sorry!
Hello, thanks for the awesome tutorial.
Is there any way to get hold of some mock data similar to the data you used?
I am asking because it could be some time until I get my hands on proper .cls and .gct files, and I want to use the software and follow the tutorial exactly as you do.
Thanks Calin. Email me and we can arrange something. The address can be found in the About section on our channel homepage
Thank you Dr. West for the clear and thorough tutorial. I just have one question: is it possible to do GSEA on the interaction term as in DESeq2 design formula? Thank you again.
You can do GSEA on any list of genes that have a ranking "score" associated with them. It's most powerful when you are looking at a long list of genes that have small changes that don't all cross a significance threshold - otherwise you could just filter your list and use gene ontology analysis to find out what the significantly changed genes have in common. I'm not familiar with the output of the DESeq2 interaction term - just make sure that your list of genes and the score you choose to use are biologically meaningful.
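To make "any list of genes with a ranking score" a bit more concrete, here is a simplified Python sketch of the weighted running-sum statistic that underlies the enrichment score (ES). It walks down the ranked list, steps up at genes in the set and down at genes outside it, and reports the largest deviation from zero. This is an illustration under stated assumptions, not the GSEA software's own code: it leaves out the permutation test, the normalisation to NES and the FDR, and the function and variable names are hypothetical.

```python
def enrichment_score(ranked_genes, scores, gene_set, weight=1.0):
    """Running-sum enrichment score for one gene set.

    ranked_genes : genes ordered from the top to the bottom of the ranking
    scores       : ranking score for each gene, in the same order
    gene_set     : collection of genes in the set being tested
    weight       : exponent applied to |score| at each hit (1.0 = weighted)
    """
    members = set(gene_set)
    hits = [gene in members for gene in ranked_genes]
    n_total = len(ranked_genes)
    n_hits = sum(hits)
    if n_hits == 0 or n_hits == n_total:
        raise ValueError("gene set must overlap the list without covering all of it")

    # Total weight of the hits, used so the upward steps sum to 1
    hit_norm = sum(abs(s) ** weight for s, h in zip(scores, hits) if h)
    if hit_norm == 0:
        raise ValueError("all genes in the set have a score of zero")
    # Each miss steps down by the same amount, so the downward steps sum to 1
    miss_step = 1.0 / (n_total - n_hits)

    running = 0.0
    best = 0.0
    for score, is_hit in zip(scores, hits):
        if is_hit:
            running += abs(score) ** weight / hit_norm
        else:
            running -= miss_step
        # Keep the deviation from zero with the largest magnitude
        if abs(running) > abs(best):
            best = running
    return best


if __name__ == "__main__":
    genes = ["a", "b", "c", "d"]       # hypothetical ranked gene list
    scores = [4.0, 3.0, -1.0, -2.0]    # hypothetical ranking scores
    print(enrichment_score(genes, scores, {"a", "b"}))  # set at the top: positive ES
    print(enrichment_score(genes, scores, {"c", "d"}))  # set at the bottom: negative ES
```

A positive value means the set sits mostly near the top of the ranking and a negative value means it sits near the bottom, which is why the choice of ranking score matters so much.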
Thank you very much for your tutorial. I have one question: Have you previously filtered the expression data? I mean, for example, applying a given Fold Change and filtering it by P-value.
Sorry for the slow reply. No, for GSEA you should use the whole data set, do not filter it.
Thank you for this informative tutorial! I would like to analyze Nanostring gene expression data (800 genes) using this GSEA software, but since these 800 genes are all immune-related, they are 'pre-enriched' and I assume this would bias the GSEA. In this software, is there any way to correct for this, i.e. curate a custom background list? Thanks so much for your help!
You can use the search function on the website to pull out specific gene sets, and you can also create your own in Excel, e.g. based on papers that show expression of certain genes in certain conditions. You need to think carefully about what question you want to ask and what you expect the data to show you. For example, if you want to look at specific subsets of immune-related genes, e.g. pro-inflammatory cytokines, I guess GSEA would then show you whether these are particularly enriched near the start or end of your ranked gene list. You could also think about gene ontology analysis (see my video on ToppGene) or pathway analysis - see my video on DAVID, or try the Reactome app within the Cytoscape software.
Hi, thanks for the tutorial. I had some trouble downloading the GSEA software so I ended up using the GenePattern UI. Is there any downside to using this website instead of the GSEA desktop application? Thanks
I have a question after going through your video: do I need to remove the expression values of the normal samples from my own dataset when comparing with the hallmark gene sets? I have to compare between high expression and low expression of a set of genes. Please reply.
Thank you so much for your tutorial! In case I don't have the gene ID, just the gene symbol, how can I find their respective ID?
You shouldn't need the gene ID - choose a gene symbol chip platform instead of a gene ID chip platform when running GSEA. The long answer to your question is that you can download a file from Ensembl that lists gene IDs and gene symbols, and you can use VLOOKUP in Excel to look up IDs for known gene symbols and vice versa.
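If you would rather script the lookup than use a spreadsheet formula, the same idea can be sketched in Python. The sketch assumes a tab-delimited mapping table with columns named gene_id and gene_symbol; those column names and the identifiers in the example are hypothetical, so adjust them to whatever your downloaded file actually contains.

```python
import csv
import io


def load_symbol_to_id(tsv_text, id_column="gene_id", symbol_column="gene_symbol"):
    """Build a symbol -> list of IDs mapping from tab-delimited text.

    A list is kept per symbol because one symbol can map to several IDs.
    The column names are assumptions for this sketch.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mapping = {}
    for row in reader:
        mapping.setdefault(row[symbol_column], []).append(row[id_column])
    return mapping


def lookup_ids(symbols, mapping):
    """Return the IDs for each symbol; unknown symbols get an empty list."""
    return {symbol: mapping.get(symbol, []) for symbol in symbols}


if __name__ == "__main__":
    # Hypothetical mapping table; real files would be read with open(...).read()
    table = (
        "gene_id\tgene_symbol\n"
        "ID_0001\tGENE_A\n"
        "ID_0002\tGENE_B\n"
        "ID_0003\tGENE_A\n"
    )
    mapping = load_symbol_to_id(table)
    print(lookup_ids(["GENE_A", "GENE_B", "GENE_X"], mapping))
```

Returning every matching ID, instead of only the first one as a spreadsheet lookup would, makes one-to-many mappings visible so you can decide how to handle them.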
This content is a treasure. Thank you so much Dr. West.
Just in case someone reads my comment: I have doubts about which values I need to put in the Expression data set file (10:37). I just have two groups and I have the "Raw comparison" values and other corrected values such as reads per kilobase per million (RPKM) and Transcripts Per Kilobase Million (TPM). Which one should I use?
Another question: My data comes from a RNA-seq analysis of Mus musculus cells. Which Chip platform (23:40) should I choose?
Many thanks! : )
Hi Joan, sorry for the slow reply. TPM is probably the best dataset to use. Your genes are probably named with Ensembl gene IDs (ENSMUSGXxxx), so "mouse ensembl gene ID human orthologs" with the latest number would be the right chip platform to use.
@GenomicsGurus Many many thanks! : )
Thanks a lot. Wonderful presentation. Can you do one on Preranked GSEA?
Glad you found it useful. I'm not sure when I'll get the time to do one on pre-ranked GSEA. Is that something you want to try? There's only a couple of things that are different, I think - I can write them down for you.
@GenomicsGurus Yes. I have tried it a few times without success. The major issue is preparing a preranked list.
@sumitpaliwal1540 This is the link describing the rnk format your file needs to be in: software.broadinstitute.org/cancer/software/gsea/wiki/index.php/Data_formats#RNK:_Ranked_list_file_format_.28.2A.rnk.29
This method doesn't tolerate duplicate gene names in your list, though, which is a problem when I use Ensembl IDs, as each gene may have several different Ensembl gene IDs. If you are using Ensembl IDs, I suggest you convert to gene symbols first, then sort by name (in Excel) to identify any duplicates, which you can then remove. If you're still having trouble, email me: katherine.west at glasgow.ac.uk and I'll have a look at your file.
@GenomicsGurus I do not have any duplicates in the list. The error I get is "After pruning, none of the gene sets passed size threshold". I can send you the screenshot and the file.
Ok, send the file and screenshot and I'll have a look.
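The duplicate-removal step suggested earlier in this thread can also be scripted. The Python sketch below builds the text of a two-column, tab-delimited ranked list (gene, then score) and reports which gene names were duplicated. Keeping the entry with the largest absolute score is an assumption made for this sketch; the thread only says that duplicates should be removed, so choose whichever rule makes biological sense for your data. The function name and example genes are hypothetical.

```python
def build_rnk(rows):
    """Build ranked-list text from (gene, score) pairs.

    Returns (rnk_text, duplicated_genes). For a duplicated gene, the entry
    with the largest absolute score is kept (an assumption of this sketch).
    """
    best = {}
    duplicated = set()
    for gene, score in rows:
        if gene in best:
            duplicated.add(gene)
            if abs(score) <= abs(best[gene]):
                continue  # keep the entry already stored
        best[gene] = score

    # Order from the highest to the lowest score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    text = "".join(f"{gene}\t{score}\n" for gene, score in ranked)
    return text, sorted(duplicated)


if __name__ == "__main__":
    # Hypothetical gene names and scores
    rows = [("GENE_A", 2.5), ("GENE_B", -1.0), ("GENE_A", -3.0), ("GENE_C", 0.1)]
    text, duplicated = build_rnk(rows)
    print(text)
    print("duplicated gene names:", duplicated)
```

Note that this only addresses duplicates; it does not explain the "none of the gene sets passed size threshold" message reported above.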
I need to add one important point here: the expression data file in the format shown here should have a .gct extension. It shouldn't have a .txt extension.
A tab-delimited .txt file doesn't take the two rows at the top.
Thanks for the feedback. This issue has come up already and is mentioned in the video description
Firstly, thank you for the tutorial, I have found it really helpful.
I have a question about what can be concluded from the enrichment. I used the preranked method and the results showed enrichment of the pathway; however, one of the highly expressed genes has an inhibitory function in the pathway. So my question is whether the ES shows only the enrichment of the genes (either activators or inhibitors) or also the directionality of the pathway.
Thank you in advance.
I think it only tells you about the pathway in general and not about the individual genes in the pathway. If you want to know the directionality for individual genes, you would probably have to check the fold change for each gene in your DE table.
I'm doing my RNA Seq data analysis, and I've got the differentially expressed genes in an excel sheet. I would like to do pathway analysis to see which pathways are differentially regulated now. I don't know how to do that, I hope this tutorial helps me! Thanks!
Great! Let us know how you get on!
Excellent tutorial and this made my concepts so clear. Just wondering if I can use GSEA for RNA seq analysis of any other organism. I am struggling with some Mycobacterium tuberculosis RNA seq data and could not find the GMT files for it. Is there any database from where I can download the GMT files for Mycobacterium tuberculosis.
I'm not sure if there are M. tuberculosis databases out there, but it's easy enough to make your own GMT files - it's just a list of gene IDs with a couple of rows at the top. You can download any current GMT file to see the format. I suggest you use the literature to find the genes associated with the pathway/phenotype you are interested in and make your own list. Good luck!
@@GenomicsGurus Thanks
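As a rough illustration of making your own gene set file, here is a Python sketch that writes gene sets in the GMT layout, where each gene set is one tab-separated row: set name, a description, then the genes. (A column-style layout with the name and description in the top rows and one gene per row underneath corresponds to the related GMX format.) The function name, the "na" description and the gene identifiers are placeholders assumed for this sketch; use the IDs that appear in your own expression file.

```python
def format_gmt(gene_sets, description="na"):
    """Return GMT text for a dict of {set_name: [gene, gene, ...]}.

    Each gene set becomes one row: name TAB description TAB gene1 TAB gene2 ...
    Repeated genes within a set are dropped, keeping the first occurrence.
    """
    lines = []
    for name, genes in gene_sets.items():
        unique_genes = list(dict.fromkeys(genes))  # de-duplicate, keep order
        if not unique_genes:
            raise ValueError(f"gene set {name} is empty")
        lines.append("\t".join([name, description] + unique_genes))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical set names and gene identifiers
    sets = {
        "MY_PATHWAY_UP": ["GENE_A", "GENE_B", "GENE_A"],
        "MY_PATHWAY_DOWN": ["GENE_C", "GENE_D"],
    }
    with open("custom_sets.gmt", "w") as handle:  # hypothetical file name
        handle.write(format_gmt(sets))
```

The gene identifiers in the file need to match the identifiers in your expression dataset, otherwise every set will be empty after matching.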
How do you save the phenotype groups as a cls file? There doesn't seem to be an option to save files in this format in Excel.
Great video, thank you so much! Could you explain how you created the .cls file? On Windows Excel doesn't have the option to save as .cls. Is there a way to convert my tab-delimited text file to a .cls file?
Just type .cls at the end of your file name and save as tab-delimited text. Excel will add .txt on the end, but it should work fine.
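If you prefer to generate the phenotype file without a spreadsheet, the categorical CLS layout is short enough to write with a few lines of Python. The sketch below assumes the usual three-row layout: a row with the number of samples, the number of classes and a 1; a row starting with # that names the classes; and a row with one class label per sample, in the same order as the sample columns of the expression file. The function name and the labels are hypothetical.

```python
def format_cls(labels):
    """Return categorical CLS text for a list of per-sample class labels.

    labels must be in the same order as the sample columns in the
    expression dataset, and must not contain spaces.
    """
    if any(" " in label or "\t" in label for label in labels):
        raise ValueError("class labels must not contain whitespace")

    classes = list(dict.fromkeys(labels))  # unique labels, first-seen order
    if len(classes) < 2:
        raise ValueError("need at least two phenotype classes")

    header = f"{len(labels)} {len(classes)} 1"
    class_row = "# " + " ".join(classes)
    label_row = " ".join(labels)
    return "\n".join([header, class_row, label_row]) + "\n"


if __name__ == "__main__":
    # Hypothetical phenotype labels, one per sample
    text = format_cls(["WT", "WT", "WT", "KO", "KO", "KO"])
    with open("phenotypes.cls", "w") as handle:  # hypothetical file name
        handle.write(text)
    print(text)
```

Writing the file directly also avoids the extra .txt that Excel appends to the file name.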