Understanding File Formats in Bioinformatics: VCF and gVCF

  • Added 29 June 2024
  • This is a quick video going over a very commonly used file format in variant calling analysis: the VCF file. In this video, I go over the various fields in a VCF file using an example VCF, explaining how the data is organized and what information each field stores. In addition, I explain what genotypes are, the difference between phased and unphased genotypes, how to calculate alternate allele frequency, and how DNA variations are recorded. Lastly, I discuss what a gVCF file is and how it differs from a VCF file.
    I hope you find this video helpful! Leave your thoughts in the comment section below!
    FASTA/FASTQ format:
    • Understanding Bioinfor...
    SAM/BAM file format:
    • Understanding Bioinfor...
    Chapters:
    0:00 Intro
    0:40 What is a VCF file and how is it generated?
    2:38 Main sections of a VCF file
    3:27 Metadata section
    5:51 Header line
    6:51 Data lines - description of fields
    13:13 Genes and alleles
    14:30 Understanding genotype
    15:33 What does genotype 2/0 or 1/2 mean?
    17:02 Difference between GT:0/1 and GT:0|1 - phased vs unphased genotype
    10:05 How are variants recorded in a VCF file?
    22:01 Interpreting a record in VCF
    24:45 Genomic VCF (gVCF)
    Like the videos I create? Show your support and encouragement by buying me a coffee:
    www.buymeacoffee.com/bioinfor...
    To get in touch:
    Website: bioinformagician.org/
    Github: github.com/kpatel427
    Email: khushbu_p@hotmail.com
    #bioinformagician #bioinformatics #vcf #gvcf #gatk #haplotype #alleles #variantcalling #geneticvariants #mutations #gff3 #gff #gtf #sam #bam #phred #fasta #fastq #singlecell #10X #ensembl #biomart #annotationdbi #annotables #affymetrix #microarray #affy #ncbi #genomics #beginners #tutorial #howto #omics #research #biology #GEO #rnaseq #ngs
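The record layout described above (eight fixed columns, a FORMAT column naming the per-sample sub-fields, and GT allele indices where 0 is REF and 1, 2, ... point into the comma-separated ALT list) can be illustrated with a minimal Python sketch. It assumes an uncompressed, tab-separated, single-sample data line with the header lines already skipped. `parse_record` and `decode_genotype` are hypothetical helper names, not part of any library; real pipelines would more likely use an established VCF parser such as pysam or cyvcf2.

```python
# Minimal sketch (hypothetical helpers, not a library API): parse one
# tab-separated, single-sample VCF data line and decode its GT field.

FIXED = ["CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]


def parse_record(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    record = dict(zip(FIXED, cols[:8]))
    record["POS"] = int(record["POS"])        # 1-based position on the CHROM contig
    record["ALT"] = record["ALT"].split(",")  # multi-allelic sites list several ALTs
    info = {}
    if record["INFO"] != ".":                 # '.' means no INFO values
        for item in record["INFO"].split(";"):
            key, _, value = item.partition("=")
            info[key] = value if value else True  # flags such as DB carry no value
    record["INFO"] = info
    if len(cols) > 9:
        # FORMAT (column 9) names the ':'-separated sub-fields of the sample column
        record["SAMPLE"] = dict(zip(cols[8].split(":"), cols[9].split(":")))
    return record


def decode_genotype(record: dict) -> tuple[list[str], bool]:
    """Map GT allele indices to bases: 0 = REF, 1 = first ALT, 2 = second ALT, ..."""
    gt = record["SAMPLE"]["GT"]
    phased = "|" in gt                        # '|' = phased, '/' = unphased
    alleles = [record["REF"]] + record["ALT"]
    indices = gt.replace("|", "/").split("/")
    bases = [alleles[int(i)] if i != "." else "." for i in indices]
    return bases, phased


if __name__ == "__main__":
    # Hypothetical record: REF T, two ALT alleles (C and A), genotype 1/2
    line = "chr20\t1000\t.\tT\tC,A\t50\tPASS\tDP=20\tGT:AD:DP\t1/2:2,9,9:20"
    print(decode_genotype(parse_record(line)))  # (['C', 'A'], False)
```

With REF `T` and ALT `C,A`, a GT of `1/2` decodes to C and A, while `2/0` would decode to A and T; writing `|` instead of `/` changes only the phased flag, not which alleles are called.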

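The gVCF topic from the description can be sketched the same way. Unlike a plain VCF, which lists only variant sites, a GATK-style gVCF typically also records non-variant stretches as reference blocks. The sketch below assumes GATK conventions (the symbolic ALT allele `<NON_REF>` and an `END` key in INFO giving the last position of a block); `classify` and `lookup` are hypothetical helpers, and other variant callers may encode gVCF blocks differently.

```python
# Minimal sketch: distinguish gVCF reference blocks from variant records.
# Assumes GATK-style conventions: <NON_REF> in ALT and END=<last position> in INFO.

NON_REF = "<NON_REF>"  # symbolic placeholder for any allele not otherwise listed


def parse_info(info: str) -> dict:
    """Split an INFO string such as 'END=1049;DP=20' into a dict."""
    fields = {}
    if info != ".":
        for item in info.split(";"):
            key, _, value = item.partition("=")
            fields[key] = value
    return fields


def classify(line: str) -> dict:
    cols = line.rstrip("\n").split("\t")
    chrom, pos, ref = cols[0], int(cols[1]), cols[3]
    alts = [a for a in cols[4].split(",") if a != NON_REF]  # drop the placeholder
    info = parse_info(cols[7])
    # A reference block runs from POS to INFO/END; otherwise the record spans its REF bases.
    end = int(info["END"]) if "END" in info else pos + len(ref) - 1
    kind = "variant" if alts else "reference_block"
    return {"chrom": chrom, "start": pos, "end": end, "kind": kind, "alts": alts}


def lookup(records: list[dict], chrom: str, position: int):
    """Return the record whose span covers chrom:position, or None if nothing does."""
    for rec in records:
        if rec["chrom"] == chrom and rec["start"] <= position <= rec["end"]:
            return rec
    return None


if __name__ == "__main__":
    # Hypothetical gVCF data lines (header omitted)
    gvcf_lines = [
        "chr20\t1000\t.\tT\t<NON_REF>\t.\t.\tEND=1049\tGT\t0/0",
        "chr20\t1050\t.\tG\tA,<NON_REF>\t320.6\t.\tDP=22\tGT\t0/1",
    ]
    records = [classify(line) for line in gvcf_lines]
    print(lookup(records, "chr20", 1020)["kind"])  # reference_block
    print(lookup(records, "chr20", 1050)["alts"])  # ['A']
    print(lookup(records, "chr20", 2000))          # None
```

The linear scan keeps the sketch short; for whole-genome gVCFs an indexed lookup (for example via a tabix index) would be the practical choice.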
Comments • 43

  • @magdalineakinyi5928 · 6 months ago +1

    I am a bioinformatics student who just began my studies, and I have really learnt a lot from your content 😊

  • @mosesbaraza3369 · 2 months ago +1

    Quite an explicit and detailed explanation, very chronologically arranged. Looking forward to learning in subsequent lessons.

  • @josephinecudjoe3207 · 1 month ago

    I have been blessed by your videos. Thank you.

  • @isadoramachadoghilardi3168

    Excellent video! I'm in love with your channel!! Congratulations!! I'm starting in this world of bioinformatics, and your videos have helped me a lot! Thank you!

  • @hubijohn7451 · 5 months ago

    Am I glad I found this channel. Great stuff!

  • @Tekofilic · 1 year ago

    I had always been looking for such a video. Thank you so much :D

  • @alexandrakassis3525 · 1 year ago

    Thank you so much for sharing this information and your knowledge! Very much appreciated. Could you please make a video on joint variant calling? And also, what would you do for joint calling on RNA-seq data?

  • @seetarajpara7626 · 1 year ago +4

    I love your channel!! Your content is so well organized, thank you so much!

  • @yuxiang4218 · 9 months ago

    Very helpful! Thanks for sharing.

  • @user-ur6nm1fn6w · 7 months ago

    Thanks - great teaching.

  • @giovannapg7532 · 1 year ago +1

    OMG such a good video!!! You explain everything so amazingly ❤ Could you please one day make a tutorial about dataset integration in Seurat, such as integrating 10X Genomics and Smart-seq2 data??? Thank you!!

    • @Bioinformagician · 1 year ago

      Definitely have plans to make a video covering this. Thanks for the suggestion!

  • @tapanbaral8939 · 1 year ago

    Really informative tutorial. Could you please make a video on TMB and MSI?

  • @user-zv7cg1mn2i · 7 months ago

    Thanks a lot. It was very useful.

  • @faezedarbaniyan1787 · 3 days ago

    Thank you so much for elaborating on this. I can't relate the definition of allele frequency that you mentioned here to rows 2 and 3 in your example (at 23:44). Can you please explain it for those?

  • @abebemisganaw7377 · 1 month ago

    Exciting video. Could you upload another video about how to analyze data using VCF tools in a Linux environment?

  • @minxie2210 · 1 year ago

    Thank you for the great video. One quick question regarding the "What does genotype 2/0 or 1/2 mean?" section. In the 4 examples you gave, shouldn't the second one be C/T instead of C/A, based on the genotype numbers? Thanks again, I really appreciate your effort in making all these great videos!!

  • @biomagician · 3 months ago

    Absolutely fantastic video! Thank you! Does a gVCF always respect the VCF format or is there a distinct gVCF format? Can you tell us more about the multi-sample VCF formats jVCF and MSVCF? Thanks!

  • @jattpigeonscorner9368

    Thank you!

  • @user-up1sm2uh2r · 1 year ago +1

    Such a great lecture! I am just wondering if there is a typo at 17:00, in the second row of the table at position 332470. It should be C/T, not C/A, or is there anything I missed?

  • @alexandrakassis3525 · 1 year ago

    Where can I find your power points you use in your videos?

  • @humarafique3093 · 4 months ago

    Really, really amazing and informative video for beginners. At 16:40, for position 491520 where the GT is 1/2, shouldn't it be C/CAC instead of CAC/C?

  • @kajalpanchal8239 · 1 year ago +1

    Everything is so good, but am I the only one facing a sound issue? Please consider that your sound level is really low. Otherwise, you are a saviour.

    • @Bioinformagician · 1 year ago

      Thank you for pointing it out. I will try to maintain optimal sound levels for my future videos :)

  • @AshishKumar-el8sb · 1 year ago

    If I have inserted part of the same genome into a genome, how can I find it?

  • @nabildhifallah361 · 7 months ago

    YES, I FOUND THIS VIDEO HELPFUL, because I can use all the information about the chromosome and the position of the single nucleotide on that chromosome (ALT) compared with the reference DNA sequence. With that, I can clearly see whether I have an insertion, a conversion or a deletion in the DNA sample. Thank you for your excellent explanation of the metadata lines, the header and the format. Thank you.

  • @stemcell1167 · 1 year ago

    Is there a way to get the allele frequency for each sample in a multi-sample VCF file, OR is there a way to get AO and RO?

    • @sauravroy3420 · 1 year ago

      You can split the samples using bcftools and then use them accordingly.

  • @mostafaismail4253 · 1 year ago

    Can you make a tutorial on BS-seq and copy number variations (CNV)?
    It would be great if you did 💛
    Thanks so much.

    • @mostafaismail4253 · 1 year ago

      You are really a life saver for my tasks.

    • @Bioinformagician · 1 year ago

      Thanks for the suggestion, I will surely consider covering these topics in future videos :)

  • @AshishKumar-el8sb · 1 year ago

    How do I extract all genes from the genome files?

  • @sonalvishwakarma30 · 1 year ago

    I want to make a request. Could you please make videos on RepeatMasker? It would be really helpful.

  • @anmolpardeshi3138 · 1 year ago +1

    16:59 - 332470 - shouldn't that be C/T or T/C? For that position, T is the reference allele (0) and C is the 1st alternate allele (1), so how did you get C/A?

    • @Bioinformagician · 1 year ago +1

      It’s a typo. It should be T.

    • @anmolpardeshi3138 · 1 year ago

      @Bioinformagician thanks for the clarification and the wonderful videos. I'm trying to make a similar effort too. One suggestion would be to pin such clarifications so that they are not lost among a myriad of comments.

  • @vinaydeep26 · 1 year ago

    Is the position of the variant relative to the chromosome or to the whole reference? If it says chr 20, position 1000, is the variant counted from the start of the reference or from the start of the chromosome?

  • @njagimwaniki4321 · 1 month ago

    How can a VCF record exist where the genotype is 0|0? Doesn’t that mean that both chromosomes match the reference?

  • @AshishKumar-el8sb · 1 year ago

    What does chrM denote?

  • @MuhammadFaizan-mi9yo · 1 year ago

    I have a very serious query: I got stuck at a point, and as a result all my projects are halted. I know you can answer my query. If you are willing to help, please reply and I will post my query, madam. I would be obliged to you; please take this as a request.

  • @jeetnanshi4357 · 4 months ago

    I'm sorry, but the tone is very monotonous. Use a marker or please take a break :(
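Several comments above ask how allele frequency is obtained, including per sample in a multi-sample VCF and from AO/RO. Below is a minimal sketch of three common calculations, not necessarily the one used in the video: a per-sample alternate allele fraction from FORMAT/AD (reference depth first, then one depth per ALT allele), the same ratio from FreeBayes-style AO (per-ALT observation counts) and RO (reference observation count), and a cohort-level AC/AN computed from called genotypes. An INFO/AF value written by a variant caller may be estimated differently, so it need not match these numbers. The function names are hypothetical.

```python
# Minimal sketch (hypothetical helpers): three common allele-frequency calculations.


def af_from_ad(ad: str) -> list[float]:
    """Per-sample fraction of reads supporting each ALT, from FORMAT/AD = 'ref,alt1,alt2,...'."""
    depths = [int(x) for x in ad.split(",")]
    total = sum(depths)
    return [d / total if total else 0.0 for d in depths[1:]]


def af_from_ao_ro(ao: str, ro: str) -> list[float]:
    """Same ratio from FreeBayes-style AO (comma-separated, one per ALT) and RO."""
    alt_counts = [int(x) for x in ao.split(",")]
    total = int(ro) + sum(alt_counts)
    return [a / total if total else 0.0 for a in alt_counts]


def cohort_af(genotypes: list[str], n_alts: int) -> list[float]:
    """AC/AN for each ALT: copies of that ALT divided by all called alleles ('.' is skipped)."""
    counts = [0] * (n_alts + 1)  # index 0 = REF, 1..n_alts = ALT alleles
    called = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                counts[int(allele)] += 1
                called += 1
    return [c / called if called else 0.0 for c in counts[1:]]


if __name__ == "__main__":
    print(af_from_ad("12,8"))        # [0.4]
    print(af_from_ao_ro("8", "12"))  # [0.4]
    # Hypothetical genotypes of four samples at one bi-allelic site
    print(cohort_af(["0/1", "1|1", "0/0", "./."], n_alts=1))  # [0.5]
```

Because every called allele counts toward AN, a sample with genotype 0/0 (or 0|0) still contributes two reference alleles. In a multi-sample VCF, such reference-only genotypes appear whenever another sample at the same site carries an ALT allele.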