Survival analysis with TCGA data in R | Create Kaplan-Meier Curves

Bioinformagician

16,187 views

Added on 29 June 2024
In this video I talk about the concept of survival analysis, which questions it helps to answer, and what data we need to perform it. I also discuss important concepts like censoring and how it is performed, and explain how to interpret Kaplan-Meier curves. Lastly, I demonstrate how to perform survival analysis in R using the survival and survminer packages.
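To make the idea of censoring and the Kaplan-Meier estimate concrete, here is a minimal sketch in base R that computes the product-limit estimate by hand. The follow-up times and statuses are made up for illustration and are not from the video or from TCGA.

```r
# Toy data (made up for illustration): follow-up time in days and status.
# status = 1 means the event (death) was observed, 0 means the patient was censored.
time   <- c(5, 8, 8, 12, 15, 20)
status <- c(1, 0, 1, 1, 0, 1)

# Distinct times at which at least one event was observed
event_times <- sort(unique(time[status == 1]))

# Number of patients still at risk just before each event time,
# and number of events observed at that time
n_risk  <- sapply(event_times, function(t) sum(time >= t))
n_event <- sapply(event_times, function(t) sum(time == t & status == 1))

# Kaplan-Meier product-limit estimate:
# S(t) = product over event times t_i <= t of (1 - d_i / n_i)
km <- data.frame(
  time     = event_times,
  n_risk   = n_risk,
  n_event  = n_event,
  survival = cumprod(1 - n_event / n_risk)
)
print(km)
```

Censored patients never count as events, but they stay in the at-risk count until their last follow-up, which is why dropping them outright would bias the curve.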
I hope you find this video helpful! Leave your thoughts in the comment section below!
Link to Code:
github.com/kpatel427/CZcamsT...
How to download data from GDC portal?
• Download data from GDC...
How to convert gene IDs to symbols?
• 3 ways to convert Ense...
Chapters:
0:00 Intro
0:35 Intuition behind survival analysis
2:21 Why do we perform survival analysis?
3:57 What is Censoring and why is it important?
6:14 What is considered as an event?
6:35 Methods for survival analysis
8:03 How to read a Kaplan-Meier curve?
10:31 Question to answer using survival analysis
10:53 3 things required for survival analysis
12:08 Download clinical data from GDC portal
15:57 Getting status information and censoring data
17:31 Set up an “overall survival” (i.e. time) for each patient in the cohort
19:01 For event/strata information for each patient, fetch gene expression data from GDC portal
19:33 Build query using GDCquery()
22:45 Download data using GDCdownload()
23:14 Extract counts using GDCprepare()
25:07 Perform variance stabilizing transformation (vst) on counts before further analysis
27:38 Wrangle the data to keep the relevant fields and get them into the right shape
33:11 Approaches to divide cohort into 2 groups based on expression
34:41 Bifurcating patients into low and high TP53 expression groups
34:57 Define strata for each patient
38:41 Compute a survival curve using survfit() and create a Kaplan-Meier curve using ggsurvplot()
41:30 survfit() vs survdiff()
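The analysis steps in the chapters above (status and censoring, overall survival time, splitting the cohort, survfit() and survdiff()) can be sketched roughly as follows. This is not the code from the video: the clinical table is made up, the column names are assumed to mimic GDC clinical fields, `expr` is a hypothetical transformed expression value for one gene, and a median split is only one possible way to form two groups.

```r
library(survival)

# Toy clinical table (made up); column names are assumed to mimic GDC clinical fields
clin <- data.frame(
  case_id                = paste0("P", 1:8),
  vital_status           = c("Dead", "Alive", "Dead", "Alive", "Dead", "Dead", "Alive", "Dead"),
  days_to_death          = c(300, NA, 850, NA, 120, 1500, NA, 640),
  days_to_last_follow_up = c(NA, 1200, NA, 400, NA, NA, 2000, NA),
  expr                   = c(5.1, 9.3, 6.0, 8.7, 4.2, 9.9, 10.4, 5.5)  # hypothetical expression
)

# Event indicator: TRUE = death observed, FALSE = censored (alive at last follow-up)
clin$deceased <- clin$vital_status != "Alive"

# Overall survival time: days to death for deceased patients,
# days to last follow-up for censored patients
clin$overall_survival <- ifelse(clin$deceased,
                                clin$days_to_death,
                                clin$days_to_last_follow_up)

# Split the cohort into two groups at the median expression (one possible approach)
median_expr <- median(clin$expr)
clin$strata <- ifelse(clin$expr >= median_expr, "HIGH", "LOW")

# Kaplan-Meier estimate per group
fit <- survfit(Surv(overall_survival, deceased) ~ strata, data = clin)
print(fit)

# Log-rank test for a difference between the two survival curves
lr <- survdiff(Surv(overall_survival, deceased) ~ strata, data = clin)
print(lr)
```

With survminer installed, `ggsurvplot(fit, data = clin, pval = TRUE, risk.table = TRUE)` would draw the two curves from this fit. survfit() estimates the curves, while survdiff() tests whether they differ.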
You can show your support and encouragement by buying me a coffee:
www.buymeacoffee.com/bioinfor...
To get in touch:
Website: bioinformagician.org/
Github: github.com/kpatel427
Email: khushbu_p@hotmail.com
#bioinformagician #bioinformatics #survival #survminer #survivalanalysis #kaplanmeier #tcga #gdcportal #tcgaportal #nci #cran #bioconductor #funcotator #variantcalling #variants #gatk #vcf #gvcf #haplotype #alleles #geneticvariants #mutations #gff3 #gff #gtf #sam #bam #phred #fasta #fastq #singlecell #10X #ensembl #biomart #annotationdbi #annotables #affymetrix #microarray #affy #ncbi #genomics #beginners #tutorial #howto #omics #research #biology #GEO #rnaseq #ngs

Comments • 33

@shivanirai3626 10 days ago
Best channel for any bioinformatician ❤❤
@preeti97rox 1 year ago ⁺⁴
As someone who doesn't have a degree in Bioinformatics I am truly able to appreciate these things. Never stop making these videos!!
@jordanfredette5090 1 year ago
This is literally exactly the resource I was looking for several months ago. Glad to finally have it now. It's so nice to have example code and clear explanation.
@MsZhang666 1 year ago ⁺²
I'm going to do survival analysis tomorrow, and I found you updated this video, it's so so so helpful! You're my Goddess😍😘
@codewithme_1988 1 year ago ⁺¹
Hi, I appreciate your work. Thanks for making these videos
@amitrupani9898 1 year ago ⁺¹
Thank you very much for this very informative tutorial. Very helpful indeed.
@PsycheSnacks657 1 year ago ⁺³
You are the best! Thanks
@MsZhang666 1 year ago
I can't agree more
@prakrithi.p7033 10 months ago
Thank you so much for your amazing content. I just wanted to know how I could extract the TCGA counts for some non-coding regions specified in a bed file. Suggestions would be really helpful. Thanks!
@user-mv7uw3dh5d 1 year ago ⁺¹
Thanks so much. This video is really useful. Besides, how can we prepare data to combine different factors to draw a forest plot or to construct risk models? Could you please share similar R code? Thanks again!
@ezra47986 10 days ago
Thank you for your video! I just have a question: why did you extract the unstranded counts and not any other count type?
@madushanfernando6495 7 months ago
Thank you very much for the excellent presentation. I am relatively new to TCGA-based R analysis. I was wondering if I can apply the same process to plot survival curves for a particular mutation using SNV data, such as the effect of BRCA1 mutation on the overall survival of ovarian cancer patients. Are there any significant changes that I need to make in the workflow to achieve this?
@BilalAhmad-gb7ui 1 year ago ⁺⁶
Could you please make a video on integration of ChIP-seq and RNA-seq data?
@Bioinformagician 1 year ago ⁺²
I definitely plan to! Please stay tuned :)
@BilalAhmad-gb7ui 1 year ago
@@Bioinformagician Thank you! I appreciate that.
@skim4901 10 months ago
Thank you for this very helpful video.
If I want to know the correlation (Pearson R-value) between some genes in TCGA-Breast Cancer, do I have to use fpkm_unstrand? Could you make a video about this?
Again, I really appreciate your effort!!
@AyrodsGamgam 1 year ago
Thanks. Could you please run a tutorial on combining Machine Learning in R and TCGA or cbioportal or Gdac or others? Thank you.
@stefanodidonato1284 8 months ago ⁺¹
If you ever write a book, let me know cause I'll pay 2000 euro to get it hands down!
@reflections86 1 year ago ⁺¹
Greetings Miss Khusbu! Again a powerful video and it was really comprehensive. I have one question and will appreciate your guidance on it.
Suppose we perform survival analysis on RNA-seq data from TCGA, and the expression matrix has 20K genes and 200 patients. After survival analysis I found 30 genes that have a significant survival difference, so I want to go further and perform a multivariate Cox regression on these 30 genes. My confusion is about which expression matrix we should use in the multivariate Cox model. Should we reduce the initial expression matrix to only the 30 genes as variables (columns) and 200 patients (rows), or should we use the original expression matrix (20K genes and 200 patients) and only put the 30 genes in the Cox equation:
coxph(Surv(time, event) ~ gene1 + gene2 + gene3 + ... + gene30, data)?
Will highly appreciate your comment on that.
Thanks and keep doing the great work.
@Bioinformagician 1 year ago ⁺¹
I don't recommend reducing the matrix to 30 genes. Use the entire dataset and include only the 30 genes in the Cox equation. Also check for multicollinearity among the 30 genes, as correlations between genes can make the model estimates unstable. If collinearity is found, use feature selection methods to keep the most relevant and independent predictors in the model.
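A minimal sketch of the setup described in this reply, on simulated data: the full table is kept as-is, only the selected genes enter the Cox formula, and pairwise correlations among them are checked first. The gene names, the data, and the choice of three selected genes (standing in for the 30) are all hypothetical.

```r
library(survival)

set.seed(1)
n <- 200

# Simulated data (hypothetical): survival time, event indicator, and 6 gene columns.
# The full table has more genes than the model will use.
dat <- data.frame(
  time  = rexp(n, rate = 1 / 1000),
  event = rbinom(n, 1, 0.6),
  matrix(rnorm(n * 6), nrow = n, dimnames = list(NULL, paste0("gene", 1:6)))
)

# Genes chosen for the multivariate model (stand-in for the 30 significant genes)
selected <- c("gene1", "gene2", "gene3")

# Quick multicollinearity check: largest absolute pairwise correlation
cor_mat <- cor(dat[, selected])
max_abs_cor <- max(abs(cor_mat[upper.tri(cor_mat)]))
print(max_abs_cor)

# Build the formula from the selected genes only; the data argument is the full table
fml <- as.formula(paste("Surv(time, event) ~", paste(selected, collapse = " + ")))
fit <- coxph(fml, data = dat)
print(summary(fit)$coefficients)
```

Because the formula names the covariates explicitly, the unused gene columns in `dat` have no effect on the fit.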
@reflections86 1 year ago
@@Bioinformagician Many Thanks. Highly appreciate your reply.
@ShubhamMaurya-ws5ly 1 year ago
Can you please make a video on the top colleges for MSc Bioinformatics in India?
@mugomuiruri2313 7 months ago
good
@saeedjaanz 1 year ago
Have you ever heard of or done MFA & mixOmics DIABLO analysis on TCGA data?
@user-yf4pn8bw9c 1 year ago
How do we change the number of days up to which follow-up is done? Say, instead of 8000 days, I want the data only up to 4000 days.
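One common way to restrict follow-up, sketched here on made-up numbers, is administrative censoring: cap every time at the cutoff and treat anyone still being followed beyond it as censored at the cutoff. This is an assumption about what the question is after, not something shown in the video.

```r
# Made-up follow-up times (days) and event indicators (1 = event, 0 = censored)
time  <- c(500, 3500, 4200, 6000, 7900)
event <- c(1, 0, 1, 1, 0)

cutoff <- 4000

# Cap follow-up at the cutoff
time_trunc <- pmin(time, cutoff)

# Events that happened after the cutoff are not observed within the window,
# so those patients become censored at the cutoff
event_trunc <- ifelse(time > cutoff, 0, event)

print(data.frame(time, event, time_trunc, event_trunc))
```

The truncated columns can then be passed to `Surv(time_trunc, event_trunc)`. If the goal is only to shorten the plotted x-axis, ggsurvplot() also accepts an `xlim` argument, which leaves the underlying data unchanged.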
@raresciencesimple5626 1 year ago
risk.table is showing the following error: Error: 'yaml_body' is not an exported object from 'namespace:xfun'. Can you please help?
@shreyasharma8063 1 year ago
Hello ma'am, I am getting p-value = 47.07. The results are not significant. How can I solve this? What could be the reason?
@dwitiroy2700 1 year ago
Hello didi, I need to talk to you. Can you please send your contact details? It's about my current project. I have some questions about bioinformatics.
@arpitmathur2933 11 months ago
Dividing into groups is not good practice. Regression should be used. I did my whole thesis on this debate.
@divyaagrawal6740 1 year ago
Why do we usually choose “unstranded data” for analysis? @bioinformagician @khushbu. Please answer this query.
@Bioinformagician 1 year ago ⁺²
I chose unstranded data for demonstration purposes. If your data is generated using a stranded protocol, you should choose stranded or reverse stranded accordingly.
@divyaagrawal6740 1 year ago
@@Bioinformagician thank you
@saeedjaanz 1 year ago ⁺¹
@@Bioinformagician I had the same question as @Divya and I got my answer.
