Crack A/B Testing Problems for Data Science Interviews | Product Sense Interviews

  • Published 26. 06. 2024
  • A/B testing questions are very commonly asked in data science interviews, together with metric ("case") problems. In this video, we will go over everything you need to know about A/B testing. Make sure you stay till the end, where I share other A/B testing resources to help with your interview preparation.
    Read a More Comprehensive Article on A/B Testing
    towardsdatascience.com/7-a-b-...
    Step by Step Guide on Calculating Sample Sizes for A/B Tests
    • Sample Size Estimation...
    Cracking Product Sense Problems in Data Science Interviews
    • Crack Metric/Business ...
    Udacity's A/B Testing Course www.udacity.com/course/ab-tes...
    My friend Kelly's post on Towards Data Science towardsdatascience.com/a-summ...
    Book: Trustworthy Online Controlled Experiments www.amazon.com/Trustworthy-On...
    LinkedIn's Ego Cluster Paper
    arxiv.org/pdf/1903.08755.pdf
    🟢Get all my free data science interview resources
    www.emmading.com/resources
    🟡 Product Case Interview Cheatsheet www.emmading.com/product-case...
    🟠 Statistics Interview Cheatsheet www.emmading.com/statistics-i...
    🟣 Behavioral Interview Cheatsheet www.emmading.com/behavioral-i...
    🔵 Data Science Resume Checklist www.emmading.com/data-science...
    ✅ We work with Experienced Data Scientists to help them land their next dream jobs. Apply now: www.emmading.com/coaching
    // Comment
    Got any questions? Something to add?
    Write a comment below to chat.
    // Let's connect on LinkedIn:
    / emmading001
    ====================
    Contents of this video:
    ====================
    0:00 Intro
    1:26 What is A/B testing
    2:30 Designing an A/B test
    4:46 Multiple testing problem
    7:47 Novelty and primacy effect
    9:38 Interference between groups
    12:51 Dealing with interference
    15:37 Resources

Comments • 124

  • @emma_ding
    @emma_ding  3 years ago +79

    FAQ:
    1. 7:46: it should be 10 rather than 1 false positive in 200 metrics. Thanks, Ayank, for pointing it out!
    2. Running one A/B test with 10 variants vs running 10 A/B tests
    Testing 10 variants means you have 1 control and 9 treatments. For example, you want to test 10 colors of a button, so each group of users sees a different color. It's different from 10 A/B tests (each with 2 variants). For the 10-color example, you could run 10 A/B tests, each with two variants (1 control and 1 treatment), but it's less efficient. This article may help you understand the multiple-testing concept: home.uchicago.edu/amshaikh/webfiles/palgrave.pdf
    3. 5:50 Probability of "no false positive"
    For details of how it's computed as (1 - alpha) ^ n, you can read more from home.uchicago.edu/amshaikh/webfiles/palgrave.pdf
    4. 12:20 in two-sided markets the treatment effect would be overestimated. Why is that?
    For example, if a small group of Uber users receives incentives to take more rides, there will be enough drivers to accommodate the additional demand. However, if the incentives extend to all users, there likely won't be enough drivers to meet the large increase in demand (in the short term). Therefore, the treatment effect would likely be overestimated.
    Feel free to ask questions below. Your questions may help others as well!
    If you have specific questions in your job search, feel free to reach out to me here data-interview-questions.web.app/.

    • @rachitsingh3299
      @rachitsingh3299 3 years ago

      5:50
      Why is 0.05 subtracted?
      I understand the no-false-positive part.
      But why 0.05?

    • @jasonchen3062
      @jasonchen3062 3 years ago

      @@rachitsingh3299 A 5% Type 1 error rate is a commonly used value.

    • @leoyuanluo
      @leoyuanluo 3 years ago +1

      Hey Emma, at 12:06 you said, "...a new product that attracts more drivers in the treatment group...". Is the objective of the treatment group to attract more drivers, or to get Uber users to request more rides?

    • @sitongchen6688
      @sitongchen6688 3 years ago +2

      Hi Emma, thanks for sharing! Regarding point 4 above, I feel this is a comparison between pre- and post-launch of a new feature. What about bias between the control and treatment groups during the A/B test? I think that should also overestimate the true treatment effect, since there will be fewer available drivers for the control group, causing fewer completed rides than control-group riders would see in the normal scenario.

    • @oliverxu5134
      @oliverxu5134 a year ago

      For the false positive rate, I want to know how you can tell a rejection is a false positive. Unlike classification, where we know the true label and the prediction and so can tell whether a prediction is a false positive, here we don't know whether the null hypothesis is true or not. Then how do we know a rejection is a false positive? Also, for each rejection, should we use the original criterion (p = 0.05) to reject?

  • @goodjuju2132
    @goodjuju2132 3 years ago +7

    I was really struggling with A/B testing. This video + your friend Kelly's post just helped me ace an interview on it! You are a treasure

  • @lexichen4131
    @lexichen4131 3 years ago +17

    These 16 minutes saved me at least 3 hours, thanks so much!

  • @abtestingvideos2259
    @abtestingvideos2259 3 years ago +40

    This is more helpful than a paid A/B testing course on Udemy! Emma, you are so awesome!

  • @jeoffleonora4612
    @jeoffleonora4612 3 years ago +1

    This is the best A/B testing video. Period.

  • @aspark47
    @aspark47 3 years ago +5

    Awesome content. I appreciate the structured walk-through of potential problems in designing A/B testing. I also like the idea of summarizing "trustworthy online controlled experiments." Looking forward to it!

  • @taylorlee8196
    @taylorlee8196 3 years ago

    Best video ever! Very organized and to the point! Looking forward to seeing more!

  • @1-person-startup
    @1-person-startup 3 years ago +1

    this channel is a goldmine

  • @jieyuwang5120
    @jieyuwang5120 3 years ago

    Really great video! Thanks for making it available to everyone!

  • @klimmy.
    @klimmy. 2 years ago +1

    Hey Emma, thank you, that's really helpful!
    Please note, for the multiple testing problem there is a common confusion between the p-value and the false positive risk, and what you calculated at 5:58 is, I believe, not exactly a false positive rate. They are related, but not the same (for reference, see pages 41 and 186 of Trustworthy Online Controlled Experiments, or the article "A dirty dozen: twelve p-value misconceptions"). The false positive risk depends on the p-value and on the prior belief in the hypothesis.
    This example helped me: if you are trying to convert steel into gold, you may get p-value = 0.05 in an experiment. But our prior belief is that this is impossible from a chemical perspective, so 100% of rejections will be false, i.e., the false positive risk is 1.00 for our experiment (not 0.05).
    In probability terms (H0 means the null hypothesis is true, D means the data observed):
    False positive risk = P(H0 | D)
    p-value = P(D | H0), by definition
    Their relation (Bayes' rule): P(H0 | D) = P(D | H0) * P(H0) / P(D)
    Hope that helps :)
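
    A tiny numeric illustration of this point; the prior and power values below are purely made-up assumptions, not numbers from the video or the comment.

```python
# p-value threshold vs. false positive risk P(H0 | rejection), via Bayes.
alpha = 0.05      # P(reject | H0 true)
power = 0.80      # P(reject | H1 true)
prior_h1 = 0.10   # assumed prior belief that the effect is real

p_reject = power * prior_h1 + alpha * (1 - prior_h1)
false_positive_risk = alpha * (1 - prior_h1) / p_reject
print(round(false_positive_risk, 2))  # ~0.36, far above 0.05
```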

  • @alanzhu7538
    @alanzhu7538 2 years ago

    Love the content! Keep going!

  • @miamamia354
    @miamamia354 3 years ago

    Great! I am also reading the book you recommended. Looking forward to the next video.

  • @Theartsygalslays
    @Theartsygalslays 2 years ago +2

    So well articulated and enlightening! This is the vocabulary I wish I had to explain A/B testing stats to less technical folks in the past. Thank you!

    • @emma_ding
      @emma_ding  2 years ago

      Thank you for your kind words Veronica! :)

  • @MrBlackitalian
    @MrBlackitalian 2 years ago

    Thank you so much for the resources!!

  • @halflearned2190
    @halflearned2190 3 years ago

    Excellent content, thanks!

  • @minma1987
    @minma1987 2 years ago

    This was very helpful, thank you!

  • @kellypeng9026
    @kellypeng9026 3 years ago +8

    Very comprehensive content! Honored to be mentioned in Emma’s video! 😄😄😊

  • @hameddadgour
    @hameddadgour a year ago

    Great content! Thank you for sharing.

  • @linhe5896
    @linhe5896 3 years ago +5

    I enjoyed this one a lot, Emma. You are becoming a pro at YouTube content and style; you show more facial expression, which drives user engagement. The second part I like is how relevant it is to real interview questions. Please keep going, and perhaps consider a case study combining product sense and A/B testing as a future topic.

  • @afridmondal3454
    @afridmondal3454 a year ago

    Amazing Explanation! Loved it ☺

  • @poopah4497
    @poopah4497 2 years ago

    Thank you. I'll watch this multiple times.

  • @halflearned2190
    @halflearned2190 3 years ago

    Nice video, thanks!

  • @goelnikhils
    @goelnikhils 2 years ago

    Thanks a lot. Amazing content

  • @user-er7sn7ef2p
    @user-er7sn7ef2p 3 years ago

    Brilliant!

  • @hasantao
    @hasantao a year ago

    Very well done.

  • @xingchenwang1471
    @xingchenwang1471 3 years ago +4

    I just read the summary article by Kelly a few days ago

  • @tinos0330
    @tinos0330 2 months ago

    Wow, it's very informative, Emma!

  • @mussdroid
    @mussdroid 3 years ago

    I want to be a data scientist. Emma rocks the industry 🙏

  • @shelllu6888
    @shelllu6888 2 years ago +1

    Hey Emma, thanks a lot for creating the video. To be honest, this is the most applicable A/B testing video I've watched on YouTube! Great job, and thanks for helping the data science community grow.
    1. A quick question on determining the number of days to run an A/B test: you mentioned dividing the sample size by the number of users in each group. If we have multiple groups with unequal numbers of users, how do we decide the number of days to run the test?
    2. About FDR: I'm still a bit confused by the definition. Why does the formula involve an expectation; is FDR a random variable? (If I'm lagging far behind, could you throw me a link so I can read more to catch up?)
    Thanks so much again!

  • @ceciliaxu
    @ceciliaxu 3 years ago +3

    This is very helpful. Your voice is like that of one of my teachers at Bittiger. Her name is also Emma. 😊😊

  • @carloschavez9740
    @carloschavez9740 2 years ago

    I've read a lot of articles, and this video is amazing.

  • @zzzs5545
    @zzzs5545 3 years ago

    Great! Looking forward to more A/B testing content.

  • @judyhe686
    @judyhe686 3 years ago +3

    Hi Emma, thanks for this video; it's super helpful! I have a question about ego-network randomization for addressing the network effect. I don't understand how it works: even if a user is not assigned the feature, aren't they still likely affected by users in the treatment group when the effect spills over? Can you elaborate on that? Thanks!

  • @Alexandra-he8ol
    @Alexandra-he8ol 3 years ago

    Thank you very much🙏🏻

  • @diegozpulido
    @diegozpulido 3 years ago

    Hi Emma. Thank you very much for your videos. Thanks to them, I got a Senior Data Scientist position at Facebook. I will forever be thankful for your exceedingly good work.

    • @emma_ding
      @emma_ding  3 years ago

      Congrats! I'm so glad to hear it, best of luck with your new job!

  • @RobertoAnzaldua
    @RobertoAnzaldua 2 years ago

    Great video, thanks for posting :D

    • @emma_ding
      @emma_ding  2 years ago

      My pleasure! So happy it was helpful for you Roberto!

  • @iOSGamingDynasties
    @iOSGamingDynasties 3 years ago +4

    Great video, Emma; some of the best A/B testing material I've seen, I have to say. However, I have some questions. When we say sample size, does it mean the control + treatment groups combined? I read somewhere that it is just the number of participants in a single group. Also, when we calculate the time it takes to run an A/B test, why do we use the formula (sample size / # of users in a group)? Does "group" here mean control/treatment, or just a batch of users we show the experiment to at a single time? Do you think it is a good idea to expose all users at the same time when the required sample size is small? Thanks!

  • @sophial.4488
    @sophial.4488 2 years ago

    Quality content in each and every video. Emma, you are great at condensing information into a digestible format.

  • @timhsu87
    @timhsu87 2 years ago

    Thank you so much 😊

  • @zhefeijin9627
    @zhefeijin9627 3 years ago +4

    Hi Emma, another very useful video!! I have a question about "splitting the control and treatment groups by cluster". I know clustering by geo-location can introduce selection bias; for example, we do not know whether something works in the U.S. when we test it in Spain. That's why Facebook and LinkedIn build clusters from the social graph. My question is: if we randomly take some of these clusters (social graphs) for testing, will that also have selection bias? Thank you so much!

  • @ARJUN-op2dh
    @ARJUN-op2dh 3 years ago

    Amazing........!!!!!!!!!!!

  • @omid9422
    @omid9422 2 years ago

    Excellent

  • @kelseyarthur6421
    @kelseyarthur6421 2 years ago

    Great video

  • @jfjoubertquebec
    @jfjoubertquebec 2 years ago

    Subscribed, liked. Finally, someone who talks like an adult.
    Thank you for your professionalism!

  • @santoshbiswal6567
    @santoshbiswal6567 a year ago +1

    Thanks, Emma, for putting this up. One question: if we want to compare total revenue/acquisition between the test and control groups, what test (z-test, chi-square, etc.) can be used to test the hypothesis? Population size > 1M.

  • @thegreatlazydazz
    @thegreatlazydazz 3 years ago

    I would like to say that I wholeheartedly support the idea of making a video on the book with the hippo on the cover. I am from a statistics background, but I never quite understood how statistics were being used in this A/B testing setting. Thanks a ton!!!!!

  • @SerenaKong
    @SerenaKong a year ago

    Thanks for sharing these videos. They are really clear and helpful! I have a question: how can we know whether there is a spillover effect between the control group and the treatment group? Is there any way to detect it?

  • @zenofall4455
    @zenofall4455 3 years ago +2

    Emma, your channel is brilliant. Thanks for creating this content. I had a quick follow-up question:
    Let's say we make a small format change to posts at FB and want to measure whether this has any effect on user interaction.
    We choose the metric:
    #UsersWhoEngagedInAction / #TotalUsers
    Based on your A/B testing video, where you used the approximate formula N = 16 * var / d^2 to determine sample size:
    typically, for a binomially distributed metric like the one we chose, var = p * (1 - p); say p = 0.2 and dmin = 2%, so the sample size comes to ~6400.
    For a big company like FB with 2.5B DAU, assume approx 30K users active per minute
    (assumption: ignoring any other splitting of users by characteristics or time of day).
    If we decide to use only 1% of our active users per minute (30K * 1%) and split them into two groups of 150 each, the minimum sample required would be collected in ~21 minutes (6400/300).
    Is that correct? Are experiment durations this short for a problem like this on a high-traffic platform?

    • @emma_ding
      @emma_ding  3 years ago +4

      You are right on the math. But in reality, companies don't assign all users to either the control or treatment group of a single test, for a few reasons: 1. They may run hundreds (if not thousands) of experiments in parallel (especially at companies like FB), so each test doesn't get that many users. 2. It's more common to have a "ramping" process to control risk rather than splitting all users at once, so the duration will be longer than the calculated value.

    • @lanaherman
      @lanaherman 3 years ago

      Why did you take var = p*(1-p) instead of var = n*p*(1-p), and why dmin = 2%?

  • @yihongsui4525
    @yihongsui4525 11 months ago

    Hey Emma, thanks so much for the great video!
    9:34 When the test is already running and you want to deal with the novelty and primacy effects, would it be better to compare "first-time users in treatment" vs. "first-time users in control"? Or even compare "first-time in treatment vs. first-time in control" against "old in treatment vs. old in control"?

  • @plttji2615
    @plttji2615 2 years ago

    Hi Emma, thank you for the video. What if we want to decide between two features: how should we design the A/B test? Or is that multivariate testing?
    Thank you

  • @teddy911
      @teddy911 2 years ago

      Your videos are really great and very useful!

  • @jonathanloganmoran
    @jonathanloganmoran 3 years ago

    Fantastic video, thank you, Emma, for your help! Just an FYI, you forgot to reference LinkedIn's ego-cluster paper in the description (14:45).

    • @emma_ding
      @emma_ding  3 years ago

      I added a link to the paper in the description. Thanks!

  • @karundeep07
    @karundeep07 3 years ago +12

    Hey Emma,
    One more quick question: at 3:50, when we are calculating the sample size, it is said that we can get the variance from the sample. I'm wondering how we can get the variance while we are still in the design phase of the A/B test, before we've run the experiment, when we don't have the sample yet. How will we get the sample variance?
    Please help me here as well.

    • @tejashshah5202
      @tejashshah5202 a year ago

      Hi @Karundeep Yadav, did you find the answer to your question? Would love to hear it in that case. I had the same question too!

  • @tejas5872
    @tejas5872 3 years ago

    Hey Emma, thank you for the valuable content. I've been following your channel, and it's helping me understand what to expect in interviews! I just have a question: you mentioned a coding round will be conducted first. Will the coding round be based on data structures (linked lists, queues, stacks, dynamic programming, etc.) or basic coding challenges like printing a palindrome? Please help.

    • @emma_ding
      @emma_ding  3 years ago

      Good question! This blog summarizes all the different kinds of coding interviews, and I think it may clarify things: towardsdatascience.com/the-ultimate-guide-to-acing-coding-interviews-for-data-scientists-d45c99d6bddc

  • @lisawenyingliu3801
    @lisawenyingliu3801 2 years ago +1

    Hi Emma, thanks a lot for making these high-quality tutorial videos; very helpful. But they are hard for me to understand because I don't have the basic knowledge. Can I ask whether you have any books to recommend so that I can better understand your videos?

    • @emma_ding
      @emma_ding  2 years ago

      Hi Lisa, please check out this blog!
      towardsdatascience.com/how-i-got-4-data-science-offers-and-doubled-my-income-2-months-after-being-laid-off-b3b6d2de6938#6f86

  • @neeru1196
    @neeru1196 3 years ago +1

    It would help if you explained the variables and talked about "parameters" in detail. Thanks for the video!

    • @emma_ding
      @emma_ding  3 years ago +1

      Noted! Thanks for the feedback!

  • @yidanhu7889
    @yidanhu7889 2 years ago

    Hi Emma, I do not understand ego-network randomization. What is the difference between it and the "create network effect" method? I also do not understand your sentence in the video, "meaning the effect of my immediate connections' treatment on me". Could you please help? The paper is too long. Thank you!!

  • @cl2hanovastar
    @cl2hanovastar 2 years ago +1

    At 7:43, what does "200 metrics" mean? According to the definition of FDR, it should be 200 rejected null hypotheses, not 200 tests. Could you please clarify?

  • @amneymnr6455
    @amneymnr6455 3 years ago

    Thanks, Emma! I got this question in a previous interview and would love your thoughts:
    "What methods can you use when an A/B test cannot be, or has not been, conducted?"

    • @emma_ding
      @emma_ding  3 years ago

      Ideas could be comparing before and after, or implementing and comparing variants in different geo-regions (or based on other user-segmentation methods). You can google and explore more ideas.
      Depending on the problem, the downside of not using an A/B test is that you may need more effort on analysis and/or bias correction.
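
      One concrete version of the "before and after with a geo control" idea is a difference-in-differences comparison; this sketch and all its numbers are purely hypothetical, not from the video.

```python
# Difference-in-differences: compare the launched region's before/after
# change against an untreated region's change over the same period.
launched_before, launched_after = 10.0, 12.0  # hypothetical metric means
control_before, control_after = 10.0, 10.5

effect = (launched_after - launched_before) - (control_after - control_before)
print(effect)  # 1.5 = estimated lift net of the shared time trend
```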

  • @nope4881
    @nope4881 3 years ago +1

    Hi, great topic! I have a question! You mentioned the "difference between treatment and control" = "delta" can be obtained from the MDE. How do we get it? How do we estimate "delta" from the MDE? Also, can you show an example of using the formula sample size = 16 * sample variance / delta^2, obtaining "delta" from the MDE, and getting a value for the sample size? Hope you understand the question :)

    • @emma_ding
      @emma_ding  3 years ago

      You can refer to this video czcams.com/video/JEAsoUrX6KQ/video.html for the derivation of the sample size.

  • @allison-hd1fg
    @allison-hd1fg 2 years ago

    Is minimum detectable effect the same thing as practical significance?

  • @lingli8999
    @lingli8999 3 years ago

    Emma, another great video, thank you! I had a question. You mentioned in this video that referral programs are usually considered long-term. I understand that for referral programs for things like housing, it takes a long time. What about other referral programs, like Uber Eats or Robinhood's new-user program with a random stock? Can those be tested with A/B testing?

    • @emma_ding
      @emma_ding  3 years ago

      Even Uber Eats and Robinhood referral programs take longer compared with an instantaneous change, e.g., a feature update. You can A/B test those, but with a longer feedback loop.

    • @lingli8999
      @lingli8999 3 years ago

      @@emma_ding Thanks a lot Emma!

  • @nplgwnm
    @nplgwnm 3 months ago

    The video was made in 2021, and I burst into laughter when "company X" was mentioned 😂 who would have known, right? 😂😂😂

  • @rachitsingh3299
    @rachitsingh3299 3 years ago

    Hey Emma! Can you explain the difference between A/B testing and experimental design?

    • @emma_ding
      @emma_ding  3 years ago

      A/B testing is the same as an online controlled experiment.

  • @janeli2487
    @janeli2487 3 years ago

    Hi Emma,
    Thanks for your video; it's very comprehensive. I am wondering what you would do, or how you would communicate with PMs, if the p-value just barely misses, e.g., you got 0.051 while you defined your significance level at 0.05? Thanks

    • @emma_ding
      @emma_ding  3 years ago +2

      The situation is debatable. An option could be to run the experiment a little longer to see if the p-value changes. The bottom line is that you don't want to compromise the criteria (i.e., the significance level) after seeing the results.

    • @janeli2487
      @janeli2487 3 years ago

      @@emma_ding Thanks!

  • @XuJiBoY
    @XuJiBoY 3 years ago

    Hi Emma, thank you very much for the great, informative video! I have a question: at 12:20 you mentioned that in two-sided markets the treatment effect would be overestimated. May I ask why that is? I can't quite figure it out.

    • @emma_ding
      @emma_ding  3 years ago +3

      For example, if a small group of Uber users receives incentives to take more rides, there will be enough drivers to accommodate the additional demand. However, if the incentives extend to all users, there likely won't be enough drivers to meet the large increase in demand (in the short term). Therefore, the treatment effect would likely be overestimated.

    • @XuJiBoY
      @XuJiBoY 3 years ago

      @@emma_ding Thank you very much for the explanation. This makes sense. So it's the resource competition in the population of all users, which was not an issue in the sub-population of the experiment. I guess it's probably assumed that the treatment effect is focusing on the increase in successful ride transactions, instead of pure ride demand from users (regardless of fulfillment of the demand).

  • @alanzhu7538
    @alanzhu7538 2 years ago

    14:40 When you talked about splitting the clusters, do you mean randomly splitting people within a cluster into treatment and control groups?

    • @nipundiwan
      @nipundiwan 2 years ago

      Let's say there are a total of n clusters in the entire sample. You randomly assign n/2 clusters to the treatment group and the remaining n/2 clusters to the control group.
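
      A minimal sketch of that cluster-level assignment, with hypothetical cluster IDs; it only illustrates the randomization step described above.

```python
import random

clusters = list(range(100))     # e.g., 100 social-graph clusters
random.shuffle(clusters)
treatment = set(clusters[:50])  # half the clusters get the treatment
control = set(clusters[50:])    # the other half serve as control

def assign(cluster_id: int) -> str:
    # Every user inherits their whole cluster's assignment, which limits
    # spillover across the treatment/control boundary.
    return "treatment" if cluster_id in treatment else "control"

print(assign(7))
```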

  • @haowu6918
    @haowu6918 2 years ago

    How do we estimate the variance from the data?

  • @LauraLigmail
    @LauraLigmail 2 years ago +4

    Hey Emma, for a 5% FDR, would you mind helping me understand how you got to "at least 1 false positive for 200 metrics"?

    • @jessesong9546
      @jessesong9546 2 years ago

      I think she meant that the probability of observing at least 1 false positive among 200 metrics is 0.05; hope this makes sense.

  • @roshanpatnaik1902
    @roshanpatnaik1902 2 years ago

    Hi Emma,
    In the sample size discussion, i.e., where you mentioned that the sample size is 16 sigma^2 / delta^2, is sigma the sample variance of the test group or the control group?

    • @emma_ding
      @emma_ding  2 years ago

      Hi Roshan, thank you for your question. Have you checked out my video -> czcams.com/video/VpTlNRUcIDo/video.html, where I discuss the basics of A/B testing? Have a look and let me know if you still have questions! Thanks for watching and sharing!

  • @yingyingxu9926
    @yingyingxu9926 3 years ago +2

    Question: when you talk about the multiple testing problem, does that require the exact same test among 10 groups, like 10 A/A tests? If we have 10 different variants, can we think of it as 10 different A/B tests conducted simultaneously? Am I missing something here?

    • @emma_ding
      @emma_ding  3 years ago +5

      No, multiple testing means you have 10 variants, i.e., 1 control and 9 treatments. For example, you want to test 10 colors of a button, so each group of users sees a different color. It's different from 10 A/B tests (each with 2 variants). For the 10-color example, you could run 10 A/B tests, each with two variants (1 control and 1 treatment), but there's no need to do it.

  • @jaysun2654
    @jaysun2654 2 years ago

    I found a typo at 8:23: the word "lager" should be "larger".

  • @karencao1538
    @karencao1538 3 years ago

    Hi Emma, one question on the sample variance used when calculating sample size: are we referring to the sample variance of the treatment group before the experiment? I'm just a bit confused about what we're actually referring to here...

    • @emma_ding
      @emma_ding  3 years ago

      The statistic we are testing is delta (the difference) so the "variance" is the variance of delta.

    • @rogerzhao1158
      @rogerzhao1158 3 years ago

      @@emma_ding Hi Emma, the video is super helpful. I have one question: if the sample variance is the variance of the delta, can we only calculate the sample size after the experiment has started and data has been collected? But shouldn't we decide the sample size before we start the experiment? I'm confused about the order and hope you can clarify. Thank you.
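
      On the ordering question: in practice, the variance is usually estimated from pre-experiment (historical) data and then plugged into the rule of thumb before the test starts. A minimal sketch with simulated stand-in "historical" data; the numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for historical metric values pulled from logs before the test.
historical = rng.binomial(1, 0.2, size=100_000)  # e.g., a 0/1 engagement flag

sigma2 = historical.var()   # ~ p(1-p) for a binary metric
delta = 0.02                # minimum detectable effect (chosen, not measured)
n = 16 * sigma2 / delta**2  # rule-of-thumb sample size, ~6400 here
print(round(n))
```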

  • @nathannguyen2041
    @nathannguyen2041 3 years ago

    Informative video!
    What is the difference between A/B testing and analysis of variance (design of experiments)? All of these topics are essentially the same, e.g., treatments/factors, randomisation, experiment design, the Bonferroni/Kimball inequality, etc. Is there a particular reason why A/B testing is distinguished from the general ANOVA framework?
    I may have just answered my own question ("the general ANOVA framework"), but it doesn't hurt to ask someone with more education and work experience than me.

  • @Han-ve8uh
    @Han-ve8uh 3 years ago

    At 5:50 it shows (1-0.05)^3 for 3 groups (I assume that means variants as well); is the formula for 2 groups then (1-0.05)^2? That seems wrong, because the probability of no false positive for 2 groups should just be 0.95?
    Something confusing here is the concept of the number of tests vs. the number of variants within a test; I'm not sure whether these two are the same thing. At 5:30 I interpret it as 2 variants in a single test, but at 5:50 the word "variant" disappears and changes to 3 groups, making me think it's 3 variants in a single test. It also looks like 3 tests, each containing 1 group/variant plus the "no change" null group?

    • @emma_ding
      @emma_ding  3 years ago +2

      Sorry for the confusion; I should've made it clearer. Group refers to the treatment group, so 3 groups at 5:50 means there are 4 variants in total. The multiple testing problem is about more than two variants in a single test; it does not relate to multiple A/B tests (each with two variants). This may help you understand the concept better: home.uchicago.edu/amshaikh/webfiles/palgrave.pdf
      "But this seems wrong because no false positive for 2 groups should just be 0.95?" - Why? If we have 2 variants (i.e., one control and one treatment), the false positive rate (Type 1 error, or significance level) is exactly 0.05, so the probability of seeing no false positive is 0.95.

  • @karundeep07
    @karundeep07 3 years ago

    Thanks a lot, Emma.
    One quick question:
    At 3:45, since we haven't run the test yet, how can we get the values of sigma and delta?
    Delta we can get from the minimum detectable effect, but what about sigma? Please help me understand this.
    Thanks again.

    • @emma_ding
      @emma_ding  3 years ago

      Both sigma and delta are predetermined. They should be known before running the experiment.

    • @guancan
      @guancan 2 years ago

      @@emma_ding I wonder how we can know what the samples are if the sample size is not determined. If we don't know the samples, how can we observe the sample variance? Could you please explain further, Emma?

  • @ayankgupta4796
    @ayankgupta4796 3 years ago +6

    7:46: should it not be 10 false positives in 200 metrics? Am I missing something?

  • @dunjianxiao4105
    @dunjianxiao4105 3 years ago

    LIFESAVER

  • @freya_yuen
    @freya_yuen 9 months ago

    Why can't I save this video to my playlist /.\

  • @seant7907
    @seant7907 3 years ago

    Emma, I don't mean any offense, but could you add subtitles to your videos? I find it hard to follow what you're saying because I am not a native English speaker myself. Thank you!!

    • @emma_ding
      @emma_ding  3 years ago +2

      Thanks for the feedback! YouTube has a subtitles function (a "CC" icon at the bottom right of the video) that may help with understanding the content. It may have some errors, though; I'll try to upload subtitles as soon as I can.

  • @enlightenment9834
    @enlightenment9834 3 years ago +2

    You are so cute!
