Two-sample t test for sample sizes of 30,000 and 150,000
I have two samples, one of 30,000 customers and the other of 150,000. I have to perform a two-sample t test (on the conversion rates of the two groups). My question is: will the t test in this case be biased towards the smaller sample? If so, what is the correct approach to performing the test?







hypothesis-testing statistical-significance t-test ab-test
asked 6 hours ago by Shivam Tiwari (edited 6 hours ago)
Samples of that size will almost certainly result in statistically significant findings, but the differences may not be of any practical significance. See here for another discussion about this: stats.stackexchange.com/questions/4075/…. What are the actual goals of your analysis too?
– StatsStudent
4 hours ago
2 Answers
          I can hardly imagine any worthwhile effect size that would require such a large sample for decent power. There is no "bias" from having unequal sample sizes$^1$. The only disadvantage is that the power of the test tends to be limited by the smaller group. Even for very small effects, 30,000 observations may give quite a powerful test.
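To make the point about the smaller group concrete, here is a rough sketch in R (the same language used in the other answer). It approximates the power of a two-sided Welch test with a normal approximation, which is reasonable when the degrees of freedom are this large. The conversion-rate numbers (a 5% baseline, a 0.5 percentage-point difference, a common SD of sqrt(0.05 * 0.95)) and the helper name `approx_welch_power` are hypothetical, chosen only for illustration.

```r
# Rough power of a two-sided Welch two-sample test via the normal approximation.
# approx_welch_power is a hypothetical helper, not a base R function.
approx_welch_power <- function(n1, n2, delta, sd1, sd2, alpha = 0.05) {
  se <- sqrt(sd1^2 / n1 + sd2^2 / n2)  # SE of the difference in means
  z_crit <- qnorm(1 - alpha / 2)       # two-sided critical value
  z <- abs(delta) / se                 # expected standardized difference
  # probability of landing beyond either critical value
  pnorm(z - z_crit) + pnorm(-z - z_crit)
}

# Hypothetical conversion-rate scenario: 5% baseline, 0.5 point difference
sd_conv <- sqrt(0.05 * 0.95)
delta <- 0.005

p_equal_small <- approx_welch_power(30000, 30000, delta, sd_conv, sd_conv)
p_unequal     <- approx_welch_power(30000, 150000, delta, sd_conv, sd_conv)
p_cap         <- approx_welch_power(30000, Inf, delta, sd_conv, sd_conv)  # n2 -> infinity
p_equal_large <- approx_welch_power(150000, 150000, delta, sd_conv, sd_conv)

round(c(p_equal_small, p_unequal, p_cap, p_equal_large), 3)
```

Under these assumptions the four powers come out at roughly 0.80, 0.95, 0.98 and essentially 1: adding 120,000 observations to the second group helps, but no amount of data in the larger group pushes power past the ceiling set by the 30,000 in the smaller one.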



          $^1$ Except if you inappropriately use the "equal variance" assumption, in which case the "pooled variance" estimate is weighted more heavily toward the larger group (not toward the smaller one, as you suggested).
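A small sketch of the footnote's point, again in R: the pooled variance used by the equal-variance t test (`t.test(..., var.equal = TRUE)`) weights each group's sample variance by its degrees of freedom, so with 30,000 versus 150,000 the larger group gets about five sixths of the weight. The SDs 30 and 20 and the helper names below are hypothetical; when the smaller group happens to have the larger variance, the pooled standard error comes out smaller than the Welch one.

```r
# Pooled variance for the equal-variance (Student) two-sample t test:
#   s_p^2 = ((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2)
pooled_weights <- function(n1, n2) {
  c(small = n1 - 1, large = n2 - 1) / (n1 + n2 - 2)
}
pooled_var <- function(n1, s1, n2, s2) {
  w <- pooled_weights(n1, n2)
  unname(w[1] * s1^2 + w[2] * s2^2)
}

n1 <- 30000; n2 <- 150000
s1 <- 30; s2 <- 20          # hypothetical SDs: smaller group is more variable

w <- pooled_weights(n1, n2)
round(w, 3)                  # roughly 1/6 and 5/6

se_pooled <- sqrt(pooled_var(n1, s1, n2, s2) * (1 / n1 + 1 / n2))
se_welch  <- sqrt(s1^2 / n1 + s2^2 / n2)
round(c(pooled = se_pooled, welch = se_welch), 4)
```

That is the sense in which the pooled estimate leans toward the larger group; Welch's standard error (the default in R's `t.test`) does not pool, and so avoids the problem.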
          answered 6 hours ago by AdamO (edited 4 hours ago)
                  Maybe a couple of examples will help to illustrate some of the issues.



                  Suppose the two populations are $X \sim \mathsf{Norm}(\mu = 500, \sigma = 30)$
                  and $Y \sim \mathsf{Norm}(\mu = 501, \sigma = 20)$.



                  If both sample sizes are 150,000, then there is sufficient power to detect
                  the small difference in means.



                  set.seed(422)
                  x = rnorm(150000, 500, 30)
                  y = rnorm(150000, 501, 20)
                  t.test(x, y)

                  Welch Two Sample t-test

                  data: x and y
                  t = -10.983, df = 261530, p-value < 2.2e-16
                  alternative hypothesis: true difference in means is not equal to 0
                  95 percent confidence interval:
                  -1.2042715 -0.8395487
                  sample estimates:
                  mean of x mean of y
                  499.9804 501.0023


                  If we use only the first 30,000 values in the first sample, results are
                  very nearly the same for most practical purposes.



                  t.test(x[1:30000], y)

                  Welch Two Sample t-test

                  data: x[1:30000] and y
                  t = -6.3728, df = 35463, p-value = 1.879e-10
                  alternative hypothesis: true difference in means is not equal to 0
                  95 percent confidence interval:
                  -1.5126269 -0.8010336
                  sample estimates:
                  mean of x mean of y
                  499.8455 501.0023
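As a back-of-the-envelope check (using the population values assumed above rather than the sample estimates), the Welch statistic is roughly the true difference in means divided by its standard error:

$$
\begin{aligned}
\mathrm{SE} &= \sqrt{\frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y}}, \qquad t \approx \frac{\mu_X - \mu_Y}{\mathrm{SE}},\\
n_X = n_Y = 150{,}000: \quad \mathrm{SE} &= \sqrt{\frac{900}{150{,}000} + \frac{400}{150{,}000}} \approx 0.0931, \qquad t \approx \frac{-1}{0.0931} \approx -10.7,\\
n_X = 30{,}000,\ n_Y = 150{,}000: \quad \mathrm{SE} &= \sqrt{\frac{900}{30{,}000} + \frac{400}{150{,}000}} \approx 0.1807, \qquad t \approx \frac{-1}{0.1807} \approx -5.5.
\end{aligned}
$$

These are in line with the reported t = -10.983 and t = -6.3728 once sampling variation in the sample means is allowed for, and the confidence-interval half-widths (about 1.96 × SE, i.e. 0.18 and 0.35) match the printed intervals. Cutting the first sample from 150,000 to 30,000 roughly doubles the standard error, yet the difference stays far beyond any conventional critical value.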


                  Here is a boxplot of the data used in the second t test (the wider box indicates a larger sample):



                  [Image: boxplot of x[1:30000] and y; box width reflects sample size]



                  Issues of minimal concern:




                  • Even though they are labeled 'Welch t tests', the sample sizes are large enough
                    that these are essentially z tests. Unless the data are very far from normal,
                    we would still detect the small difference in means.


                  • The power of the test is heavily dependent on the smaller sample size. But
                    power is not a concern here.
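To see why the 'Welch' label matters little here, the Welch–Satterthwaite degrees of freedom can be computed directly. A minimal R sketch, plugging in the population SDs assumed in the example (the printed df use sample SDs, so they differ slightly); `welch_df` is a hypothetical helper name:

```r
# Welch–Satterthwaite approximate degrees of freedom for the Welch t test.
welch_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1   # squared standard error of the first mean
  v2 <- s2^2 / n2   # squared standard error of the second mean
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

df_both  <- welch_df(30, 150000, 20, 150000)  # both samples of 150,000
df_small <- welch_df(30, 30000, 20, 150000)   # first sample cut to 30,000
round(c(df_both, df_small))

# At these df the t critical value is practically the normal one
qt(0.975, df_small) - qnorm(0.975)
```

The two values come out near 261,000 and 35,500, close to the df in the printed output, and at that many degrees of freedom the t and standard normal critical values agree to well within 0.001.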



                  Issues warranting attention:




                  • With such large samples
                    in the real world (not the simulation world),
                    one is entitled to wonder whether data are truly simple random samples from
                    their respective populations. Could smaller, more carefully collected samples provide better information?


                  • Although we did not do a formal test to confirm that variances differ, it seems clear from the boxplot that they do. In the Welch test,
                    it is OK for variances to differ. But would different variances have important practical implications?


                  • Although the null hypothesis that the two population means are equal is soundly rejected with minuscule P-values, it is important to realize that "statistically significant" differences (by whatever definition) are not necessarily differences of practical importance or interest. For what purpose are you
                    taking the effort to check whether the means differ? And what do the results
                    of the t test actually contribute to that purpose?
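One way to put a number on practical importance is an effect-size summary rather than a P-value. A minimal R sketch using the population values from the example above; the choice of summaries and the helper names are assumptions (one of several conventions), not something the t test reports: a standardized mean difference scaled by the root mean square of the two SDs, and, since both populations are normal here, the probability that a randomly chosen Y exceeds a randomly chosen X.

```r
# Standardized mean difference; scale = root mean square of the two SDs,
# one common convention when variances differ.
std_mean_diff <- function(m1, s1, m2, s2) {
  (m2 - m1) / sqrt((s1^2 + s2^2) / 2)
}

# P(Y > X) for independent normal X and Y: Y - X ~ Norm(m2 - m1, sqrt(s1^2 + s2^2))
prob_superiority <- function(m1, s1, m2, s2) {
  pnorm((m2 - m1) / sqrt(s1^2 + s2^2))
}

d  <- std_mean_diff(500, 30, 501, 20)
ps <- prob_superiority(500, 30, 501, 20)
round(c(d = d, p_superiority = ps), 3)
```

Both summaries are small (a standardized difference of about 0.04, and Y exceeds X only about 51% of the time) even though the P-values above are minuscule; whether a one-point difference matters remains a subject-matter question.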
                  answered 5 hours ago by BruceET (edited 4 hours ago)