2. Local Alignment (BLAST) and Statistics

Sdílet
Vložit
  • čas přidán 11. 09. 2024
  • MIT 7.91J Foundations of Computational and Systems Biology, Spring 2014
    View the complete course: ocw.mit.edu/7-9...
    Instructor: Christopher Burge
    In this lecture, Professor Burge reviews classical and next-generation sequencing. He then introduces local alignment (BLAST) and some of the associated statistics.
    License: Creative Commons BY-NC-SA
    More information at ocw.mit.edu/terms
    More courses at ocw.mit.edu

Komentáře • 45

  • @sfmambero
    @sfmambero Před 4 lety +9

    56:45 statistics of alignment

  • @DeanJordan-qp8kw
    @DeanJordan-qp8kw Před 3 měsíci +1

    Timestamp: 33:01
    Goal by End of Session: 33:01
    edit: complete! :) coming back in one hour to finish the lecture

  • @ProfBeckmann
    @ProfBeckmann Před 3 lety +3

    Thanks! Awesome lecture. Nice to see the dideoxy ntp structures shown - super helpful!
    I would like to hear more about the errors prone to each methodology. would also be nice to see a read length table for each technology.
    Can you make a video explaining how to setup and sort the barcoding reactions referenced? How long are the barcodes?

  • @hgz11
    @hgz11 Před rokem +2

    good question by the young lady...helped me clarify things in my head.

  • @phartatmisassa5035
    @phartatmisassa5035 Před 9 lety +23

    I was thinking about ways to optimize the matching algorithm in assembly, and this is what I came up with.
    Let Mx, Nx, My, and Ny be CPU registers of 64bits each.
    Recall base pairs A-T, G-C, and (A,G) are purines, and (T,C) are pyrimidines.
    Let registers Mx and Nx represent whether the base at bit location i is an A-T pair as 0, or a G-C pair as 1
    Let registers My and Ny represent whether the base at bit location i is a purine (A or G) as 0, or a pyrimidine (T or G) as 1
    We can now represent a base at bits i of registers Mx and My, (ie both regs Mx and My are required to represent a sequence, the N registers represent another sequence we are comparing to):
    Let sequence M be: ACTCCGAT... (just to keep things short)
    Reg Mx would equal: 01011100...
    Reg My would equal: 01111001...
    I hope that makes sense, if not just ask and I will try to clarify
    Let sequence N be: GCTAAGAT...
    Reg Nx would equal: 11000100...
    Reg Ny would equal: 01100001...
    Now the fun part, you can compare a sequence (or sub-sequence) of 64 bases at once with these instructions (MMX might make things faster if you can do bitwise intructions with them, haven't got there yet in my studies):
    This is the formula I worked out to get the matches in a sequence of 1's for match, 0 for mismatch at that base/bit position: (~ is not, ^ is xor, | is or)
    ~ ( (Mx ^ Nx) | ( My ^ Ny) )
    that would be...
    -----------
    xor r8, r9 ;; 1 at bit i if miss-match, r8 = Sx = Mx ^ Nx
    xor r10, r11;; 1 at bit i if miss-match, r10 = Sy = My ^ Ny
    or r8, r10 ;; 1 at bit i if miss-match in either Sx, or Sy, or both
    not rax, r8 ;; rax now has 1 in bit i if there was a match at bit/base i
    ------------
    ...in pseudo x64 assembly
    if SSE4.2 is available on the CPU, then the popcnt instruction is available, it returns the number of 1's in a register, something like:
    ------------
    popcnt rbx, rax ;; would put the number of matches in Seq M and N
    -------------
    So you can then divide the number of matches (in rbx) divided by 64 times 100 to get the percentage of matching bases in Seq M and Seq N, you can also do bit scanning or masking to evaluate exactly which bases matched, that is represented in rax already.
    You can "clap" together 64 bases at once using this method, though it would take some preprocessing of a sequence represented as chars (bytes in ASCII). You can also speed it up by parallelizing over many sub-sequences on different cores, or maybe use GPGPU with OpenGL/CL.
    The same thing can be done in C, just use unsigned long integers, or uint64_t, don't use signed, it get confusing and the two's compliment will most probably screw things up.

    • @5050view
      @5050view Před 7 lety

      @Rohan Patgiri: What's your problem dude? People are here to learn and discuss the lecture. You are not in the gossip section!

    • @ProfBeckmann
      @ProfBeckmann Před 3 lety

      Is it much faster, coding it in assembly?

    • @Mahi-nw5vh
      @Mahi-nw5vh Před rokem

      Be my sensei

  • @coltonhenry3636
    @coltonhenry3636 Před 10 měsíci +1

    The girl from 43:14 with her hand up is so f☠️ing relatable 😂😂

  • @coconutade111
    @coconutade111 Před 8 lety +5

    You guys are awesome! thank you for providing this class!

  • @sfmambero
    @sfmambero Před 4 lety +3

    31:30 Local alignment

  • @rabiafidan1502
    @rabiafidan1502 Před 5 lety +1

    In llumina sequencing, we add four dNTPs and one of them is attached then we reverse termination and add four dNTPs again. Think about a situation where there are C repeats for example CCCCACCCTCCCCCGCCCCCG now, in this situation concentration of dGTPs will be lower than other dNTPs. Doesn't this increase the risk of error on Cs and misincorporation? each time you need dGTP, its concentration is lower.

  • @soheilh
    @soheilh Před 3 lety +2

    Thank you

  • @luisheribertogarciaislas1785

    Thank you for sharing! I'm new on bioinformatics and I find it really interesting!

    • @JRush374
      @JRush374 Před 6 lety +1

      Luis Heriberto Garcia Islas I just got done with this bioinformatics lecture series from UC Davis and it was really good.
      Fundamental Algorithms in Bioinformatics: czcams.com/play/PL_w_qWAQZtAbh8bHfzXYpdnVKCGCDmmoL.html

    • @luisheribertogarciaislas1785
      @luisheribertogarciaislas1785 Před 6 lety

      Thank you Josh!
      I'll start to take a look at it right away!
      Regards!

  • @EDUARDO12348
    @EDUARDO12348 Před 9 lety

    Break the query into 2-3 letters and find where the best matches lie. Then evaluate based on best fits

  • @kishorkumarsarker8270
    @kishorkumarsarker8270 Před 4 lety +3

    Could you please help to find out previous lecture of it. Means the first lecture.

    • @Ilya_Sss
      @Ilya_Sss Před 3 lety

      Here are the first lecture czcams.com/video/lJzybEXmIj0/video.html

  • @MrAntihumanism
    @MrAntihumanism Před 9 lety +2

    Very good to listen to.

  • @phartatmisassa5035
    @phartatmisassa5035 Před 9 lety +4

    Thank you!

  • @Yzhang250
    @Yzhang250 Před 5 lety +1

    When the lady asked can we do something better than m times n. I was thinking about KMP. I know KMP is for exact match, but can we make some modification and make it work?

  • @marwanmohamed6575
    @marwanmohamed6575 Před 5 lety +3

    what kind of statistics i should learn to understand this ... i know nothing about statistics

    • @mitocw
      @mitocw  Před 5 lety +5

      One of these two prerequisites for this course might help: 18.440 Probability and Random Variables, 6.041 Probabilistic Systems Analysis and Applied Probability. See the course on MIT OpenCourseWare for more details: ocw.mit.edu/7-91JS14. Best wishes on your studies!

  • @NirvanSengupta
    @NirvanSengupta Před 9 lety +1

    Good overview of traditional Sanger sequencing and next generation sequencing technologies like roche 454, illumina, ...

  • @maheribrahim685
    @maheribrahim685 Před 2 lety

    i'm a biology major with no cs background and very little stat and math knowledge, is it okay if i get nothing at all, and how to change that?!

  • @sumitkumar-el3kc
    @sumitkumar-el3kc Před 4 lety

    1:01:39 does Pi and Rj being 1/4 means we're assuming there's the equal distribution of all four bases in query as well as subject?

  • @sumitkumar-el3kc
    @sumitkumar-el3kc Před 4 lety +1

    How is it O(mn) when the last bases of the query are never touching the first bases of the subject? Am I missing something??

    • @sumitkumar-el3kc
      @sumitkumar-el3kc Před 4 lety +1

      Or do we also go right to left in the alignment after left to right? Please help anyone??

    • @turnerburchard5272
      @turnerburchard5272 Před 4 lety +3

      @@sumitkumar-el3kc O notation isn't that precise. You could say it is something like O(mn-n) but at that point the difference is inconsequential so we estimate it simply to O(mn). In practice, the algorithm won't take exactly mn computations, but it will at least be on the same scale so estimation is easy.

    • @sumitkumar-el3kc
      @sumitkumar-el3kc Před 4 lety

      @@turnerburchard5272 understood, thank you, I'm new to these concepts.

  • @bikidas6690
    @bikidas6690 Před 3 lety +2

    Any book for bioinformatics?

    • @Joeythegoats
      @Joeythegoats Před 3 lety

      "bioinformatics and functional genomics" is best so far

  • @michaelchoi5880
    @michaelchoi5880 Před 6 lety +1

    Thank you very much

  • @AhmedIbrahim-co9rw
    @AhmedIbrahim-co9rw Před 5 lety

    what book did the professor use in his lectures ?
    are there any textbook or slides available to facilitate studying ? thanks

    • @mitocw
      @mitocw  Před 5 lety +1

      Visit the course on MIT OpenCourseWare to see all the materials (readings, lectures notes, assignments, projects) available for the course at: ocw.mit.edu/7-91JS14. Best wishes on your studies!

  • @shubhrabhattacharya223

    Where can I get the pdf of the book he referred to?

    • @mitocw
      @mitocw  Před 6 lety

      Visit the course on MIT OpenCourseWare for the materials at: ocw.mit.edu/7-91JS14.

  • @gurpchirp
    @gurpchirp Před 2 lety

    yo i am fucking lost. can someone list prereqs?

  • @ZortLF2
    @ZortLF2 Před 6 lety +6

    What a dismally bad lecture. Blasted through the biology stuff dropping jargon in the first lecture of the semester, wasted time splitting hairs for the algorithm stuff. Why spend 10 minutes talking about lambda just to conclude with saying it's not important???

  • @solomongrundy2211
    @solomongrundy2211 Před 4 lety +2

    im a nigga

  • @tartanhandbag
    @tartanhandbag Před 7 lety +17

    while i'm sure this presenter has great knowledge, his presentation style is so dull and dry that even though the material is mind blowing, i have once again failed to make it all the way through the video on my 3rd attempt.

    • @ProfBeckmann
      @ProfBeckmann Před 3 lety

      just up the speed of playback bruh. 😎

  • @fahlihamza7937
    @fahlihamza7937 Před 8 lety

    address address searched sddcdse the same time as the