Bioinformatics Q&A - PART 2 | In collaboration with

  • Added 29 June 2024
  • Welcome to our exclusive two-part Bioinformatics Collaboration Q&A series with the expert Ming "Tommy" Tang! In this video, we dive deep into the world of bioinformatics, tackling your burning questions and unraveling the mysteries of this fascinating field.
    Have a burning bioinformatics question for our next Q&A? Drop it in the comments below, and Tommy might answer it in the next session!
    Check out the comments section for the pinned comment with relevant links to the questions mentioned in the video.
    PART 1 of this video: • Bioinformatics Q&A - P...
    Download data from GDC Portal: • Download data from GDC...
    Visualize gene expression data in R using ggplot2: • Visualize gene express...
    Chapters:
    0:00 Violin Plots vs Boxplots - What’s the difference?
    2:35 Materials to help understand statistical models
    5:45 How to identify contamination or sample mix-up in WGS/WES?
    9:01 Difference between nextflow/snakemake and normal R code?
    12:25 Building robust code and reusing code for next analysis
    13:55 Building a bioinformatics portfolio: How to demonstrate skills needed to transition to Bioinformatics?
    19:46 When to integrate your scRNA-Seq data?
    22:33 How to ask questions?
    Tommy's YouTube Channel: @chatomics
    You can show your support and encouragement by buying me a coffee:
    www.buymeacoffee.com/bioinfor...
    To get in touch:
    Website: bioinformagician.org/
    Github: github.com/kpatel427
    Email: khushbu_p@hotmail.com
    #bioinformagician #bioinformatics #illumina #bridgeamplification #sequencingbysynthesis #multiplex #alleles #10x #oxfordnanopore #pacbio #affymetrix #barcode #setseed #reproducibility #pseudorandom #singleR #singlecell #annotationdbi #reversestranded #directstranded #strandedness #survival #survminer #survivalanalysis #kaplanmeier #tcga #gdcportal #tcgaportal #nci #cran #bioconductor #funcotator #variantcalling #variants #gatk #vcf #gvcf #haplotype #alleles #geneticvariants #mutations #gff3 #gff #gtf #sam #bam #phred #fasta #fastq #singlecell #10X #ensembl #biomart #annotationdbi #annotables #affymetrix #microarray #affy #ncbi #genomics #beginners #tutorial #howto #omics #research #biology #GEO #rnaseq #ngs

Comments • 7

  • @Bioinformagician
    @Bioinformagician  4 months ago

    5. Q. What are your opinions on violin plots and heatmaps?
    To me, violins display the same info as a boxplot. And when heatmaps are large (>30 genes, 30 samples), I find that they're overloaded with info and never really tell a good story on their own.
    - Violin plots show both the shape of the distribution and key boxplot statistics, providing a comprehensive view of data distribution.
    - Heatmaps are effective for observing patterns and trends in your data, but may become overwhelming with information, especially in larger datasets (>30 genes, 30 samples), and might not convey a clear narrative on their own.
    - Think about ways to narrow down your targets, or consider a different type of visualization, so that the trends and patterns in the data come through clearly.
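The point about violins showing the shape of the distribution can be illustrated with a small sketch. It uses two made-up datasets (an assumption for illustration, not data from the video) that share the same five-number summary, so their boxplots would look identical, while crude bin counts (a stand-in for the density a violin draws) differ clearly.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max), the numbers a boxplot draws."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return (min(values), q1, median, q3, max(values))


def bin_counts(values, low, high, n_bins):
    """Count values per equal-width bin, a crude stand-in for a violin's density."""
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        # Clamp so that the maximum value falls into the last bin.
        index = min(int((v - low) / width), n_bins - 1)
        counts[index] += 1
    return counts


# Made-up expression values, for illustration only.
two_clusters = [0, 2, 2, 2, 2, 2, 8, 8, 8, 8, 8, 10]
spread_out = [0, 1, 2, 2, 4, 5, 5, 6, 8, 8, 9, 10]

for name, values in [("two_clusters", two_clusters), ("spread_out", spread_out)]:
    print(name, five_number_summary(values), bin_counts(values, 0, 10, 5))
```

Both datasets print the same summary tuple, but only the bin counts reveal that one of them has an empty middle, which is the kind of structure a boxplot hides and a violin plot shows.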
    6. Q. Intermediate materials to help understand statistical models. I'm currently watching @Bioinformagician's video on DESeq2, where she explained the negative binomial and how the choice of that statistical model was arrived at. I would love to know more about how models work and how to select them (beginner-friendly material will do too).
    - StatQuest (www.youtube.com/@statquest): Offers beginner-friendly tutorials on statistical concepts and models.
    7. Q. In the context of using Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS) tests to identify germline rare diseases, at what stage can the FASTQ data be considered failed or unqualified for variant calling?
    - At the FastQC step.
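As a rough illustration of the kind of per-read quality information a FASTQ file carries, here is a minimal sketch that decodes quality strings and reports the fraction of low-quality reads. It assumes Phred+33 encoding; the threshold `min_mean_q=20` and the function names are hypothetical choices for this example, and this is not how FastQC itself computes its reports.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_line]


def mean_read_quality(quality_line):
    """Mean Phred score of one read."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores)


def fraction_low_quality(quality_lines, min_mean_q=20):
    """Fraction of reads whose mean quality is below min_mean_q (hypothetical cutoff)."""
    low = sum(1 for q in quality_lines if mean_read_quality(q) < min_mean_q)
    return low / len(quality_lines)


# Made-up quality strings, for illustration only.
quality_lines = ["IIIIIIII", "IIII5555", "!!!!####"]
print([mean_read_quality(q) for q in quality_lines])
print(fraction_low_quality(quality_lines))
```

What counts as "failed" depends on the assay and the lab's own acceptance criteria, so any cutoff here should be treated as a placeholder.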
    Q. How can one detect sample mix-up or cross-contamination in genomic data, specifically in Fastq, BAM, or VCF files?
    - Compare each sample's reported sex with its coverage of chromosomes X and Y (for example, a sample labeled female should show almost no reads on chromosome Y). More generally, have sanity checks at every step.
    - Explore tools and repositories for quality control and contamination detection, such as:
    - Somalier: github.com/brentp/somalier
    - FastQ-Screen: stevenwingett.github.io/FastQ-Screen/
    - BBMap: github.com/brentp?tab=repositories
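The X/Y sanity check mentioned above can be sketched as follows. This is a minimal example under stated assumptions: the per-chromosome read counts are made up (in practice they might come from something like `samtools idxstats` on a BAM file), and the cutoff `max_female_y_fraction=0.01` is a hypothetical value that would need tuning for real data.

```python
def infer_sex(chrx_reads, chry_reads, max_female_y_fraction=0.01):
    """Guess sex from the share of X+Y reads that map to chrY (hypothetical cutoff)."""
    total = chrx_reads + chry_reads
    if total == 0:
        return "unknown"
    y_fraction = chry_reads / total
    return "female" if y_fraction < max_female_y_fraction else "male"


def flag_sex_mismatches(samples):
    """Return IDs of samples whose inferred sex disagrees with the reported sex."""
    flagged = []
    for sample in samples:
        inferred = infer_sex(sample["chrX"], sample["chrY"])
        if inferred != "unknown" and inferred != sample["reported_sex"]:
            flagged.append(sample["id"])
    return flagged


# Made-up read counts, for illustration only.
samples = [
    {"id": "S1", "reported_sex": "female", "chrX": 100_000, "chrY": 50},
    {"id": "S2", "reported_sex": "female", "chrX": 60_000, "chrY": 9_000},
    {"id": "S3", "reported_sex": "male", "chrX": 55_000, "chrY": 8_000},
]
print(flag_sex_mismatches(samples))
```

A flagged sample is only a prompt to look closer: it could be a label swap, contamination, or a genuine biological exception, and dedicated tools such as the ones listed above do a far more thorough job.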
    8. Q. What is the difference between using pipelines such as nextflow or snakemake and writing a normal R code for a workflow?
    - When considering whether to convert a traditional pipeline into a Snakemake pipeline, carefully weigh the pros and cons.
    - Pros: HPC deployment, error logs, troubleshooting, resuming from failure points, convenient for Python users, supports integration with conda and docker/singularity, automatically parallelizes tasks across available computational resources.
    - Cons: Learning curve, more suitable for regular pipeline runs rather than one-sample or experimental scenarios.
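To make "resuming from failure points" concrete, here is a toy sketch of the underlying idea: each step declares its output file, and a step is skipped when that output already exists. This is not Snakemake's or Nextflow's API; the rule dictionaries, `run_rules`, and the step functions are all hypothetical, and real workflow managers also handle dependency graphs, timestamps, logging, and parallel execution.

```python
import tempfile
from pathlib import Path


def run_rules(rules, log):
    """Run each rule in order unless its declared output file already exists."""
    for rule in rules:
        output = rule["output"]
        if output.exists():
            log.append(f"skip {rule['name']}")
            continue
        rule["action"](rule["input"], output)
        log.append(f"run {rule['name']}")


def trim(_input_path, output):
    """Stand-in for a read-trimming step."""
    output.write_text("trimmed reads\n")


def align(input_path, output):
    """Stand-in for an alignment step that consumes the trimmed reads."""
    output.write_text(input_path.read_text() + "aligned\n")


with tempfile.TemporaryDirectory() as tmp:
    workdir = Path(tmp)
    rules = [
        {"name": "trim", "input": None,
         "output": workdir / "trimmed.txt", "action": trim},
        {"name": "align", "input": workdir / "trimmed.txt",
         "output": workdir / "aligned.txt", "action": align},
    ]
    first_log, second_log = [], []
    run_rules(rules, first_log)
    # Simulate a failure that lost only the final output.
    (workdir / "aligned.txt").unlink()
    run_rules(rules, second_log)
    print(first_log, second_log)
```

On the second pass only the missing step is rerun, which is the behavior a plain top-to-bottom R or Python script does not give you for free.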
    Q. How to build flexible and robust code? For example, if I want to reuse one of my scripts for another analysis.
    - Implement functions to encapsulate specific tasks.
    - Consider creating packages for modular and reusable code.
    - Develop workflows using tools like Snakemake for a structured and reproducible analysis.
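As a small illustration of encapsulating a task in a function, the sketch below wraps a common filtering step so that the thresholds are parameters rather than values hard-coded in a script. The function name, the default thresholds, and the gene counts are all hypothetical and chosen only for this example.

```python
def filter_low_counts(counts, min_count=10, min_samples=2):
    """Keep genes with at least min_count reads in at least min_samples samples."""
    kept = {}
    for gene, per_sample in counts.items():
        n_passing = sum(1 for c in per_sample if c >= min_count)
        if n_passing >= min_samples:
            kept[gene] = per_sample
    return kept


# Made-up counts (gene -> reads per sample), for illustration only.
counts = {
    "GENE_A": [50, 40, 0],
    "GENE_B": [3, 0, 1],
    "GENE_C": [12, 9, 30],
}

# The same function serves two analyses with different stringency.
print(sorted(filter_low_counts(counts)))
print(sorted(filter_low_counts(counts, min_count=1, min_samples=3)))
```

Because the inputs and thresholds are explicit arguments, the function can be moved into a package or called from a workflow rule without editing its body.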
    9. Q. How does one go about demonstrating that they have the skills needed to transition into bioinformatics? For instance, if someone is coming from a more traditional statistics or social sciences background (and hence already has knowledge of R and Python) but is missing skills like RNA-seq analysis, what is the best way to demonstrate that they have filled that knowledge gap? I know the general advice is to 'do a project', but I'd like to know what the 'ingredients for a successful portfolio project' are. Thanks!
    Q. What projects can I pick to start building a genomics portfolio, possibly leading to a multi-omics one?
    - Showcase adaptability and a strong learning ability.
    - Reproduce a published RNA-Seq analysis from a scientific paper, then repeat it with a newer or different package or tool and compare the results to demonstrate your understanding of the data, tools, and workflows.
    - Demonstrate an understanding of omics data by conducting a thorough analysis.
    - Include a sanity check of results and an interpretation of biological implications.
    - Use Git/GitHub to host your scripts and demonstrate that you understand version control.
    10. Q. In the integration of scRNA-Seq data, what justifies the need for integration? How can we determine when it is appropriate or not to integrate the data?
    - divingintogeneticsandgenomics.com/talk/2024-pythia/: Explore this resource for insights into the justification and appropriateness of integrating scRNA-Seq data.
    11. Q. How to ask questions?
    - czcams.com/video/UrammW0bmHI/video.html: Watch this video for guidance on effective question-asking techniques.

  • @sriswathi8426
    @sriswathi8426 3 months ago

    Could you make videos on GWAS analysis, please?

  • @vetlove4056
    @vetlove4056 1 month ago

    My question, please, this is very important: I'm stuck on R 4.1.2, where installing the package DelayedArray shows an error saying it is not available for this version.

  • @fishfish20
    @fishfish20 4 months ago

    Where can I take an online course and get a certificate in bioinformatics?

  • @zaidshaikh9245
    @zaidshaikh9245 4 months ago

    Excellent session. I have a question, though. I have previously coded in Pine Script. I'm new to RStudio and want to learn by going through the basics of each function and package (as in, what do they do). Is there a manual where I can find all the functions listed? For example, all the functions of the base R package with their use cases or explanations.

  • @Siilvia
    @Siilvia 4 months ago

    Hi, I'm interested in multiomics data analysis, could you do a video about it please? Thank you in advance!

  • @shivalaxmi9873
    @shivalaxmi9873 4 months ago

    How can OMIM, HPO, and Orphanet data be integrated into a VCF file in a whole exome sequencing pipeline?