Pseudobulk single-cell analysis in Python with Scanpy and pyDeseq2

  • Added 5 Sep 2024
  • It is now possible to do pseudobulk analysis directly in Python on your Scanpy object. I create the pseudobulk from single-cell data, then analyze it with the Python port of DESeq2.
    Notebook:
    github.com/mou...
  • Science & Technology

Comments • 30

  • @sanbomics  1 year ago +3

    Important typo in code when making pseudo-replicates:
    Need to add [indices[i]]. It should be as follows:
    rep_adata = sc.AnnData(X = samp_cell_subset[indices[i]].X.sum(axis = 0),
    var = samp_cell_subset[indices[i]].var[[]])
    Also, if you get an error about the shape, you will have to add .reshape(1, -1) to the end of sum(axis = 0).
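
The indexing fix and the shape fix above can be sketched with plain NumPy. This is only an illustration under assumptions: the toy `counts` array stands in for `samp_cell_subset.X`, and the shuffle-and-split scheme and the `n_reps` value are hypothetical choices, not necessarily what the notebook does. It shows that summing a dense 2D array over cells returns a 1D result, which is why the reshape to a single row is needed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in (assumption) for samp_cell_subset.X: 10 cells x 4 genes of raw counts.
counts = rng.poisson(lam=3.0, size=(10, 4))

# Shuffle the cell positions and split them into pseudo-replicates.
n_reps = 3  # hypothetical number of pseudo-replicates
indices = np.array_split(rng.permutation(counts.shape[0]), n_reps)

pseudo_reps = []
for i in range(n_reps):
    # Index with indices[i] so each replicate sums only its own cells.
    summed = counts[indices[i]].sum(axis=0)
    # A dense array sums to shape (n_genes,); reshape it to one row (1, n_genes).
    pseudo_reps.append(summed.reshape(1, -1))

pseudobulk = np.vstack(pseudo_reps)
print(pseudobulk.shape)  # (3, 4)
```

Without the `[indices[i]]` indexing, every pseudo-replicate would be the sum over all cells of the sample, so the replicates would be identical.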

  • @Brickkzz  1 year ago +12

    Eternally grateful for this channel - the most useful resource on scRNAseq analysis in Python on the internet!

    • @sanbomics  1 year ago +4

      Thank you :) ... Born of my avoidance of R at all costs xD

  • @ramadatta7046  1 month ago

    Hi, great channel and videos. May I know if we can use SoupX-corrected counts instead of raw counts?

  • @lly6115  1 year ago

    My gratitude. Thank you for your time.

  • @sjorsmaassen3764  7 months ago +1

    Thanks a lot for the tutorial. You are really doing a great service for anyone who is trying to learn more about scRNA-seq analysis. I have a question that I hope someone here can answer:
    For making a pseudobulk, wouldn't it make more sense to take the mean of your counts instead of the sum? I would say the sum can be influenced by the total number of cells in a condition. So if by random chance you have outliers from a batch, or you simply have more of a certain cell type in your tissue (which I would imagine to be the case for macrophages during a COVID infection), this would influence your results.

    • @sanbomics  7 months ago

      Good question. Later, the counts are normalized by size factors, which accounts for differences due to the total number of cells.
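
To make the size-factor argument concrete, here is a small NumPy sketch of median-of-ratios size factors, the normalization used by DESeq2. The numbers are made up, and the example assumes no zero counts (the real method has to handle genes with zeros). Two samples share the same per-cell expression but contain different numbers of cells, so their summed counts differ by a constant factor that the size factors remove.

```python
import numpy as np

# Mean raw counts per cell for 4 genes (made-up numbers).
per_cell_profile = np.array([2.0, 5.0, 10.0, 1.0])

# Two samples with identical per-cell expression but different cell numbers.
sample_a = per_cell_profile * 100  # sum over 100 cells
sample_b = per_cell_profile * 300  # sum over 300 cells
counts = np.vstack([sample_a, sample_b])  # samples x genes

# Median-of-ratios: compare each sample to the per-gene geometric mean.
log_counts = np.log(counts)
log_geo_means = log_counts.mean(axis=0)
size_factors = np.exp(np.median(log_counts - log_geo_means, axis=1))

# Dividing by the size factors puts both samples on the same scale.
normalized = counts / size_factors[:, None]
print(np.allclose(normalized[0], normalized[1]))  # True
```

The ratio of the two size factors equals the ratio of the cell numbers here, which is why summing and then normalizing does not simply favor the sample with more cells.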

  • @neishajmoments  5 months ago

    You are a lifesaver! 😊 Thanks

  • @gracegregory4846  3 months ago

    Not sure if the DeseqDataSet parameters have changed since this tutorial but I had to change clinical to metadata when running:
    dds = DeseqDataSet(
    counts = counts,
    metadata=pb.obs,
    design_factors="tumour")

    • @sanbomics  3 months ago

      Yup, it's changed a lot. I'll be remaking it soon!
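
One way to cope with the `clinical` to `metadata` rename described above is to check which keyword the installed class accepts before calling it. The helper below is a hypothetical sketch, not part of PyDESeq2: `metadata_keyword`, `OldStyle` and `NewStyle` are illustrative names, and the two stand-in classes only mimic the old and new constructor signatures so the example runs without the library installed.

```python
import inspect


def metadata_keyword(dataset_cls):
    """Return the constructor keyword used for the sample table (hypothetical helper)."""
    params = inspect.signature(dataset_cls.__init__).parameters
    for name in ("metadata", "clinical"):
        if name in params:
            return name
    raise TypeError("constructor accepts neither 'metadata' nor 'clinical'")


# Stand-in classes (assumptions) that mimic the old and new signatures.
class OldStyle:
    def __init__(self, counts=None, clinical=None, design_factors="condition"):
        pass


class NewStyle:
    def __init__(self, counts=None, metadata=None, design_factors="condition"):
        pass


print(metadata_keyword(OldStyle))  # clinical
print(metadata_keyword(NewStyle))  # metadata
```

With the real class, this could be used as `kw = metadata_keyword(DeseqDataSet)` followed by `DeseqDataSet(counts=counts, design_factors="tumour", **{kw: pb.obs})`. Other arguments may also have changed between releases, so checking the documentation for the installed version is still the safer route.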

  • @estebanelias6958  7 months ago

    Hi. Firstly, thank you very much for these tutorials. Very useful. I have 3 questions: 1. How can I check whether I saved my raw data after normalization? 2. Can pseudoreplicates be applied in an experiment with 2 conditions that contains pools of cells from 2-3 different samples? 3. How can differences in the number of cells in a cluster between 2 conditions affect DGE results with this method? Thanks

    • @sanbomics  7 months ago

      1) Make sure to save the raw data in a layer before you normalize or it won't be there. 2) Yes, this should be OK. 3) Theoretically, the counts are normalized by size factors, but if the numbers of cells are vastly different, some lowly expressed genes may show up in the larger population just because it's larger. It shouldn't affect the genes with higher expression.
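
For the first question, a quick heuristic is to test whether a matrix still holds non-negative whole numbers, since normalized and log-transformed values almost never do. The function name `looks_like_raw_counts` is hypothetical, and the toy arrays stand in for a saved layer and for normalized data. This is a sanity check under those assumptions, not a guarantee.

```python
import numpy as np


def looks_like_raw_counts(matrix):
    """Heuristic: raw counts are non-negative and integer-valued."""
    values = np.asarray(matrix, dtype=float)
    return bool(np.all(values >= 0) and np.allclose(values, np.round(values)))


# Toy raw counts: 2 cells x 3 genes.
raw = np.array([[0, 3, 12], [1, 0, 7]])

# The same data after scaling each cell to 10,000 counts and taking log1p.
normalized = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)

print(looks_like_raw_counts(raw))         # True
print(looks_like_raw_counts(normalized))  # False
```

For a sparse matrix, the same check can be applied to its stored non-zero values (the `.data` attribute of a SciPy sparse matrix) instead of converting the whole matrix to dense.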

  • @qhawenid  1 year ago +1

    Thanks much for such a concise and informative tutorial. One question. Is there a way to do pseudobulk DGE analysis between cell types? Thanks in advance.

    • @sanbomics  1 year ago +2

      You could just subset the cells by cell type, similarly to what we do here. You can pseudobulk any set of cells you can subset from your data. Although, usually cell type differences are so apparent that you don't really have to worry about pseudobulk. Maybe useful if you are comparing cell type subpopulations

    • @qhawenid  1 year ago

      @sanbomics Thanks for the timely response. You're doing God's work!

    • @sanbomics  1 year ago +1

      Thanks :) You're too kind.. It wasn't that timely xD

    • @stefisjustthebest  2 months ago

      Have you come across omicverse, which uses pydeg to compare two cell types, and do you think that's a valid way of doing it? I'm not sure they even aggregate the cells by sample origin, but I would be interested to hear your thoughts!

  • @ZnaniumTV  6 months ago

    Thank you very much for this very helpful video. I have a question regarding batch correction before using DESeq2. I obtained 6 samples using hashing; however, they were sequenced in 2 lanes, leading to a significant batch effect that can be observed. Usually, this is corrected with integration methods in Scanpy or Seurat. However, if we pseudobulk based on our hashing and obtain the raw data needed for DESeq2, we lose this batch correction step. Would you have any ideas on how to address this? I've checked that some of the options are RUVSeq or SVA. Thank you very much.

    • @marwanmohamed3844  5 months ago

      I have a similar issue with batch effects in my libraries: if I use pseudobulk raw counts for DESeq2, I see a strong batch effect. Did you manage to solve this?
      Thanks, I would appreciate your advice on this.

  • @jalv1499  11 months ago

    Thank you very much! This is very helpful! I have one question: can you clarify the difference between differential abundance analysis and this pseudobulk approach for studying the difference between two conditions?

    • @sanbomics  10 months ago

      They are similar, but pseudobulk looks at the summed expression of a population of cells, while other methods might look at the distribution of expression across all cells in a population. One issue with the latter, among others, is that the large sample size from many cells inflates significance.

  • @carlahamilcaro6457  4 months ago

    Hello, thank you so much. I was wondering, could I do differential expression analysis (control vs treatment) on all cell types at the same time?

    • @sanbomics  3 months ago

      I would put each cell type in a loop and do them separately but you can put all the results back together in the end. I'll have an example posted in the next couple of weeks.
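
The loop described in this reply could look roughly like the NumPy sketch below. Everything in it is an assumption for illustration: the toy `counts`, `cell_type` and `sample` arrays, and the helper name `pseudobulk_by_sample`. It only builds one pseudobulk matrix per cell type; the differential expression step would run inside the loop where the comment indicates.

```python
import numpy as np

# Toy data (assumption): 8 cells x 3 genes with a cell type and sample per cell.
counts = np.arange(24).reshape(8, 3)
cell_type = np.array(["T", "T", "B", "B", "T", "T", "B", "B"])
sample = np.array(["s1", "s1", "s1", "s1", "s2", "s2", "s2", "s2"])


def pseudobulk_by_sample(cell_counts, sample_labels):
    """Sum raw counts over the cells of each sample (hypothetical helper)."""
    samples = np.unique(sample_labels)
    summed = np.vstack(
        [cell_counts[sample_labels == s].sum(axis=0) for s in samples]
    )
    return samples, summed


results = {}
for ct in np.unique(cell_type):
    mask = cell_type == ct
    samples, summed = pseudobulk_by_sample(counts[mask], sample[mask])
    # A per-cell-type DE analysis would run here on `summed`.
    results[ct] = (samples, summed)

print(results["T"][1])
```

Keeping the per-cell-type outputs in one dictionary makes it straightforward to combine the results at the end, as suggested in the reply.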

    • @carlahamilcaro6457  3 months ago

      @sanbomics Oh, that is amazing, thank you so much! Another question: would it also be possible to do DE on 3 categories at the same time? Say I want control vs samples that responded to treatment vs samples that did not respond to treatment.
      Thank you for all the help!

  • @qhawenid  5 months ago

    How can I randomly partition samples (for a scRNA-seq dataset with one sample per condition) to obtain pseudo-replicate samples, and annotate these in the metadata of the main adata object? Or is there a way to map the newly generated pseudo-replicates to the main adata object?

    • @leoburgy  5 months ago +1

      You can insert the partition (described in the video) as a column (e.g., "replicate") of the adata.obs dataframe (of the main adata).
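
A minimal sketch of building such a "replicate" column, assuming one label per cell in the same order as the cells of the main object. The `sample` array, the number of replicates and the label format are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy per-cell sample labels (assumption): one sample per condition.
sample = np.array(["ctrl"] * 7 + ["treated"] * 5)
n_reps = 3  # hypothetical number of pseudo-replicates per sample

replicate = np.empty(sample.shape[0], dtype=object)
for s in np.unique(sample):
    # Positions of this sample's cells, in random order.
    shuffled = rng.permutation(np.flatnonzero(sample == s))
    # Split the shuffled positions into n_reps nearly equal groups.
    for rep_idx, chunk in enumerate(np.array_split(shuffled, n_reps)):
        replicate[chunk] = f"{s}_rep{rep_idx + 1}"

print(sorted(set(replicate)))
```

The resulting array could then be stored as `adata.obs["replicate"] = replicate`, which works as long as its order matches the rows of `adata.obs`.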

    • @qhawenid  5 months ago

      @leoburgy Thank you for this

  • @emilynwo4254  1 year ago

    Could you do a video on other RNA-seq analyses such as SLAM-seq?

    • @sanbomics  1 year ago

      Sure, I'll keep that in mind for a future video