Bitesize Bioinformatics: Downloading sequencing data from GEO and SRA

  • Added 27 June 2024
  • In this video we're going to go through some of the different options you have for downloading raw sequence data in fastq format from the big public sequencing databases, GEO and SRA. We'll look at a couple of different database interfaces and other accessory tools which can help unlock the valuable data that these systems provide.
    Some of the sites and tools which we mention specifically are:
    GEO: www.ncbi.nlm.nih.gov/geo/
    ENA: www.ebi.ac.uk/ena
    SRA Explorer: sra-explorer.info/
    SRA Toolkit: ncbi.github.io/sra-tools/inst...
    SRA Downloader: github.com/s-andrews/sradownl...
  • Science & Technology
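The ENA route listed above can also be scripted. Below is a minimal Python sketch, not taken from the video, which assumes the ENA Portal API `filereport` endpoint (https://www.ebi.ac.uk/ena/portal/api/filereport) with `result=read_run` and the `run_accession`/`fastq_ftp` fields. The parsing step runs offline; only `fetch_fastq_urls` touches the network. The function names are illustrative, not part of any tool.

```python
"""Sketch: look up ENA fastq download URLs for an accession (assumed ENA Portal API)."""
import csv
import io
import urllib.parse
import urllib.request

# Assumption: ENA Portal API filereport endpoint.
ENA_FILEREPORT = "https://www.ebi.ac.uk/ena/portal/api/filereport"


def filereport_url(accession: str) -> str:
    """Build a query asking ENA for run accessions and fastq paths as TSV."""
    params = {
        "accession": accession,
        "result": "read_run",
        "fields": "run_accession,fastq_ftp",
        "format": "tsv",
    }
    return ENA_FILEREPORT + "?" + urllib.parse.urlencode(params)


def parse_fastq_urls(tsv_text: str) -> dict[str, list[str]]:
    """Map each run accession to its fastq URLs (two per run for paired-end)."""
    urls: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # fastq_ftp holds semicolon-separated host/path entries without a scheme.
        paths = [p for p in (row.get("fastq_ftp") or "").split(";") if p]
        urls[row["run_accession"]] = ["ftp://" + p for p in paths]
    return urls


def fetch_fastq_urls(accession: str) -> dict[str, list[str]]:
    """Query ENA over the network and parse the TSV response."""
    with urllib.request.urlopen(filereport_url(accession)) as resp:
        return parse_fastq_urls(resp.read().decode("utf-8"))
```

Calling `fetch_fastq_urls` with a run accession should give one URL for single-end data and two for paired-end, which can then be handed to a downloader such as wget or curl.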

Comments • 28

  • @sahanamuthukumar6732 · 3 years ago · +5

    Thank you soooo much... After cracking my head for 3 days over the raw sequence data, I finally understood. Thanks to you!

  • @ishasingh9809 · 4 years ago · +4

    Thank you for all these amazing videos. They are highly recommended for biologists who lack bioinformatics skills. Could you please make videos on histone ChIP-seq analysis, from FastQC through to GO analysis, since all the available ones are really old? Thanks again, and please keep up the great work 👍

  • @MrJonathanU · 11 months ago

    This is an excellent video. It clarifies so much! Cheers!

  • @Mr2009johnsteele · 4 years ago · +1

    Thanks, really helpful video. Keep them coming!

  • @shravastimisra6793 · 1 year ago

    Very useful resource for us dummies. Thank you!

  • @berrydp · 1 year ago

    Great video and walk-through. Thank you!

  • @vitortarghetta418 · 3 years ago

    This video is amazing! Thanks for uploading it.

  • @fmetaller · 3 years ago · +1

    Thanks. It's a great guide!

  • @sarahaghani7663 · 2 years ago

    This was super useful! Thank you so much!!!

  • @antoniarosaneta · 2 years ago

    Thank you very much for this valuable content. Helped me a looooot

  • @CoCo-bv4mn · 3 years ago

    What invaluable material!

  • @nesibesebnem2685 · 3 years ago

    Such valuable content, mashallah. Thanks a lot.

  • @taifshah9003 · 3 years ago

    Thank you for your quick response...

    • @BabrahamBioinf · 3 years ago

      Please contact us at babraham.bioinformatics@babraham.ac.uk with any questions. Thanks.

  • @romanatorx3949 · 3 years ago

    Amazing video - I love the --cantspell :D

    • @simonandrews5604 · 3 years ago

      Thanks! Fortunately they've fixed that so it's not an issue with newer releases. We've also expanded support in SRAdownloader - it can now also download fastq files directly from ENA which seems quicker and more reliable, and it can also just be fed a list of SRR accessions (or a single SRR name) to get the corresponding data.
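Simon mentions that sradownloader can be fed a list of SRR accessions. The exact input format it expects isn't stated here, so this Python sketch simply assumes a plain text file with one run accession per line. It tidies a pasted list by stripping whitespace, skipping blank lines and `#` comments, rejecting anything that isn't a run accession (SRR, ERR or DRR followed by digits) and dropping duplicates. The helper names and file names are hypothetical; check the sradownloader README for the format it actually accepts.

```python
import re
from pathlib import Path

# Run accessions: SRR (NCBI SRA), ERR (ENA) or DRR (DDBJ), followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def clean_run_list(lines) -> list[str]:
    """Return unique run accessions, keeping their original order."""
    seen: set[str] = set()
    runs: list[str] = []
    for line in lines:
        accession = line.strip()
        if not accession or accession.startswith("#"):
            continue  # skip blank lines and comment lines
        if not RUN_ACCESSION.fullmatch(accession):
            # e.g. an SRX (experiment) or GSM (GEO sample) ID pasted by mistake
            raise ValueError(f"not a run accession: {accession!r}")
        if accession not in seen:
            seen.add(accession)
            runs.append(accession)
    return runs


def write_run_list(runs: list[str], path: Path) -> None:
    """Write one accession per line (an assumed input format)."""
    path.write_text("".join(f"{run}\n" for run in runs))
```

For example, `write_run_list(clean_run_list(open("pasted.txt")), Path("runs.txt"))` would turn a messy hypothetical `pasted.txt` into a clean `runs.txt`.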

  • @yajinghe1092 · 1 year ago

    Great lesson! I have a question: at around 15:02, when you switch from the downloaded fastq data to the analysis screen, what is that analysis app? I can't get past this step. Thank you very much!

    • @BabrahamBioinf · 1 year ago

      Hi, the software download at 15:02 is the route (directly from NCBI) that we do NOT recommend. Other options are covered from 17:04.

  • @guruprasadh7928 · 3 years ago

    Thank you for the informative video. I have a question and would appreciate your help. Should we treat SRR files as technical replicates, or should we pool the SRR files for further analysis?

    • @simonandrews5604 · 3 years ago

      If you have multiple SRR accessions for a single SRX, then the structure of the database implies that these are technical replicates of the same sample, i.e. the same library split across multiple sequencing lanes. I would therefore look at merging them before analysing them. I'd also recommend reading the associated paper and metadata, though, in case the submitters have done something strange within the constraints provided by GEO/SRA.

    • @guruprasadh7928 · 3 years ago

      @@simonandrews5604 Thank you so much for helping me out.
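Following Simon's advice, merging the technical-replicate runs of one SRX can be as simple as concatenating their gzipped fastq files, since back-to-back gzip files form a valid multi-member gzip stream and need no decompression. This Python sketch is not from the video. It assumes a hypothetical naming scheme `<RUN>_1.fastq.gz` / `<RUN>_2.fastq.gz` and that you already have a run-to-experiment mapping (e.g. built from the SRA or ENA run metadata). Read 1 and read 2 are merged separately, in the same run order, so mates stay paired. As Simon says, check the paper and metadata before merging.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def group_runs(run_to_experiment: dict[str, str]) -> dict[str, list[str]]:
    """Group run accessions (SRR) under their experiment (SRX).

    Runs are sorted so read 1 and read 2 files are merged in the same order;
    the order itself is arbitrary, it only has to be consistent.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for run, experiment in run_to_experiment.items():
        groups[experiment].append(run)
    return {exp: sorted(runs) for exp, runs in groups.items()}


def concatenate_gz(inputs: list[Path], output: Path) -> None:
    """Byte-level concatenation; the result is a valid multi-member gzip file."""
    with open(output, "wb") as out:
        for path in inputs:
            with open(path, "rb") as src:
                shutil.copyfileobj(src, out)


def merge_experiment(experiment: str, runs: list[str], fastq_dir: Path,
                     suffixes: tuple[str, ...] = ("_1", "_2")) -> list[Path]:
    """Merge each read file separately, e.g. SRX_1 from all RUN_1 files.

    Hypothetical file names <RUN><suffix>.fastq.gz are assumed; pass
    suffixes=("",) for single-end data named <RUN>.fastq.gz.
    """
    written = []
    for suffix in suffixes:
        inputs = [fastq_dir / f"{run}{suffix}.fastq.gz" for run in runs]
        missing = [p for p in inputs if not p.exists()]
        if missing:
            raise FileNotFoundError(f"missing input files: {missing}")
        output = fastq_dir / f"{experiment}{suffix}.fastq.gz"
        concatenate_gz(inputs, output)
        written.append(output)
    return written
```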

  • @abdullahimuhammadsirajo9647

    Great and informative video. Thanks for this. But is it possible to convert the downloaded data to, say, Excel/CSV format unaltered?

    • @simonandrews5604 · 1 year ago

      The raw data you get from these databases are not going to be suitable to put into something like excel - they'd be way too big. If you're after quantitations generated from the data then every entry in GEO has to have a quantitated data file with it. There are no fixed rules for what this file has to be so the contents vary wildly from sample to sample. In some cases you would have a file which would be compatible with a spreadsheet (a matrix of samples vs counts or normalised expression for example), but in many cases even this won't be suitable (bigWig files for whole genome quantitation for example). The quantitated data will appear as a supplementary file at the bottom of the sample's GEO page. The metadata on the sample will describe what the quantitation is and how it was generated.
      A more reliable way to deal with this is to process the data to generate the quantitations you want. It's more work but at least you know what you're getting and you can do it consistently.

    • @abdullahimuhammadsirajo9647 · 1 year ago

      @@simonandrews5604 Thank you for your detailed clarification. I really appreciate it.
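To put "way too big" into numbers: a fastq file stores each read as a four-line record, while an Excel worksheet holds at most 1,048,576 rows. This Python sketch (not from the video, names illustrative) counts reads in a plain or gzipped fastq file and checks whether even one spreadsheet row per read would fit. It assumes standard four-line records with no wrapped sequence lines.

```python
import gzip
from pathlib import Path

EXCEL_MAX_ROWS = 1_048_576  # maximum rows per worksheet in current Excel versions


def count_fastq_reads(path: Path) -> int:
    """Count reads, assuming every record is exactly four lines."""
    opener = gzip.open if str(path).endswith(".gz") else open
    lines = 0
    with opener(path, "rt") as fh:
        for _ in fh:
            lines += 1
    if lines % 4:
        raise ValueError(f"{path}: {lines} lines is not a multiple of 4")
    return lines // 4


def fits_in_excel(n_reads: int, rows_per_read: int = 1) -> bool:
    """True if the reads would fit in one worksheet at the given rows per read."""
    return n_reads * rows_per_read <= EXCEL_MAX_ROWS
```

A modest run of 20 million reads already gives `fits_in_excel(20_000_000) == False`, before even considering the sequence and quality strings, which is why processed quantitations rather than raw reads are the realistic spreadsheet input.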

  • @michaelagronah · 2 years ago

    Thanks so much for this video. I am running Ubuntu 20.04 and can't get sradownloader installed on it. Does sradownloader work on Ubuntu 20.04?

    • @simonandrews5604 · 2 years ago

      It should do - anywhere with a recent Python should work. If you're having problems, could you open an issue in the sradownloader issue tracker and post the full output of the command you ran?

    • @michaelagronah · 2 years ago

      @@simonandrews5604 Thanks so much for the quick response. I will update my python and reinstall it. Once again thanks so much

  • @kennyday8767 · 2 years ago

    Hmm "Bitesize" Bioinformatics...this video is 45m long...and goes well outside the scope of SRA.