01-29-2012, 08:48 PM  #1
Simple stats question - sample size calculation
It has been 20+ years since I have done statistical calculations...so please bear with me for the simplicity of this.
For my son's 6th grade science class, I need to calculate the sample size. The experiment is: swab household surfaces (toilets, doorknobs, etc.), then wipe with bleach and reswab. Plate on agar and grow bacteria. The hypothesis is that the bleach will kill 99.9% of the bacteria on the treated surfaces (the manufacturer's claim). I would like the confidence level to be 95% and the test powered to 80%. This is really simple, but I have spent enough time online searching and left my stats books at the office. Payment...
80% probability to reject the null hypothesis...I am embarrassed that I can't do this...I appreciate the effort!!
How are you even measuring the null hypothesis? Do you have a way to measure exactly what percentage of bacteria is killed per sample?
The hypothesis is that it kills 99.9% of bacteria, so I would expect no growth on the treatment group. But we can go to the lab and do colony counts if needed. The microbiology part is easy. I was trying to do three things at once last night and was hoping that someone could get this easily. I just need the sample size to figure out how many petri dishes to get.
Sorry for the multiple posts...the crazy iPhone app likes to hang and crash...
Okay you say your null hypothesis is the manufacturer's claim of 99.9% killing efficiency. For this you really need to have each dish tested to determine the actual percentage of bacteria that was killed. Then you take the average of the results (say 95% of bacteria killed on average), compute the standard deviation, and then you can come up with your confidence intervals. (Also if you want to be pedantic you have to first assume the results will be normally distributed.)
The problem with this is that you don't really have a way to accurately measure how many bacteria are killed. If you're going to reduce the result to 2 outcomes instead of an infinite set of outcomes, then it's a different problem to consider. For 2 outcomes (either bacteria grew or it didn't) you wouldn't really be able to get a confidence interval because your outcomes aren't a continuous and infinite set (such as the percentage range between 0 and 100). You would just have to get 1000 petri dishes, and if bacteria grew on 1 of them then the average killing rate would indeed be 99.9%. If bacteria grew on half of them then you would say the killing rate is 50%. This is probably not feasible for you since buying that many petri dishes is impractical. Also, the claim that 99.9% of bacteria are killed refers to how much the product will kill on contact. What it doesn't guarantee is that the surviving bacteria won't reproduce over time if they are not totally eradicated. So the real test should be to get a sample, test that there is indeed bacteria in that sample, then use the product, and immediately test if the product was successful in killing the advertised rate of 99.9% of the bacteria in the sample. Sorry about being so annoying with my answer, but I don't think you can feasibly test the null hypothesis by letting the bacteria grow on a petri dish unless you constrain your outcomes to a binary set.
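For a rough feel of the dish counts involved in the yes/no setup above, here is a minimal sketch. It assumes each dish is an independent trial with the same chance of showing growth, and it uses an exact binomial bound (a swapped-in technique, not something taken from the post above): if none of n dishes show growth, how large must n be before a per-dish growth rate as high as 0.1% can be ruled out at 95% confidence? The function name and its parameters are made up for illustration.

```python
import math


def dishes_needed_for_zero_growth(max_growth_rate, confidence=0.95):
    """Smallest n such that zero growth in n dishes rules out a per-dish
    growth probability of max_growth_rate (or higher) at the given confidence.

    Assumption (for illustration only): dishes are independent yes/no
    trials that all share the same growth probability.
    """
    alpha = 1.0 - confidence
    # P(no growth in any of n dishes) = (1 - p)^n; we need this <= alpha,
    # so n >= ln(alpha) / ln(1 - p).
    return math.ceil(math.log(alpha) / math.log(1.0 - max_growth_rate))


# 0.1% growth rate, i.e. the complement of the 99.9% claim.
n = dishes_needed_for_zero_growth(0.001)
print(n)
```

With the 0.1% figure this comes out to roughly 3,000 dishes, in line with the usual "rule of three" (about 3 divided by the rate), so a few dozen dishes cannot confirm a 99.9% claim on their own.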
"For 2 outcomes (either bacteria grew or it didn't) you wouldn't really be able to get a confidence interval because your outcomes aren't a continuous and infinite set (such as the percentage range between 0 and 100)"
I am fuzzy here, however I think a sample size calculation can be done...
"So the real test should be to get a sample, test that there is indeed bacteria in that sample, then use the product, and immediately test if the product was successful in killing the advertised rate of 99.9% of the bacteria in the sample"
This is what we will be doing....but I will be testing some nasty stuff that will likely have bacteria and fungi...like toilet seats in public restrooms. I really appreciate the help...
If I did it correctly, you would need 15k+ petri dishes. Lol
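For anyone curious how a number of that order can come about, here is a rough sketch of a textbook normal-approximation sample size for a one-sample test of a proportion. It is not necessarily the calculation used above. The null growth rate of 0.1% comes from the manufacturer's claim; the alternative rate of 0.2% is purely an assumed value for illustration (the thread never states one), and the one-sided 5% test with 80% power follows the OP's request. The normal approximation is crude at rates this small, so treat the result as an order of magnitude only.

```python
import math
from statistics import NormalDist


def sample_size_one_proportion(p0, p1, alpha=0.05, power=0.80):
    """Approximate number of dishes for a one-sided test of H0: p = p0
    against the alternative p = p1 (with p1 > p0).

    p0, p1 are per-dish growth probabilities. p1 is an ASSUMED value
    that has to be chosen by whoever designs the experiment.
    """
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # one-sided critical value
    z_beta = NormalDist().inv_cdf(power)         # quantile for the desired power
    numerator = (z_alpha * math.sqrt(p0 * (1.0 - p0))
                 + z_beta * math.sqrt(p1 * (1.0 - p1)))
    return math.ceil((numerator / (p1 - p0)) ** 2)


# 0.1% under the claim, 0.2% as a hypothetical "claim is false" rate.
n = sample_size_one_proportion(p0=0.001, p1=0.002)
print(n)
```

With these assumed inputs the answer lands in the thousands of dishes, and it climbs quickly as the assumed alternative gets closer to 0.1%.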
A confidence interval can be calculated at any percentage provided you have a mean and a standard deviation, which can be done with as few as 3 observations. The confidence interval will be very wide and not very informative, but it can be calculated nonetheless.
To the OP: Let's say each "observation" is a petri dish. Each observation has an outcome of 0 or 1: 0 meaning no growth, 1 for any kind of growth. You run this test with n trials and compute the mean and standard deviation. You can then calculate the 95% confidence interval. To calculate the confidence interval you need to use a Student's t-distribution table to look up the value corresponding to the percentage you want given how many observations you have, n. For example, if you had 100 petri dishes and only 1 of them had growth while the other 99 had no growth, the mean would simply be 1/n = 0.01 and the standard deviation would be 0.1. The null hypothesis is that the mean should be 0.001, meaning 1 out of every 1000 petri dishes will have growth according to the manufacturer's claim of 99.9%. To calculate the test statistic you need to subtract the hypothesized mean from the sample mean and then divide by the standard error, which is the standard deviation divided by the square root of n (here 0.1/10 = 0.01). In this example it would be (0.01 - 0.001)/0.01 = 0.9. Look up what percentage 0.9 falls under on the Student's t table for n-1 = 99 degrees of freedom. This would correspond to roughly 70% confidence, so at the 95% confidence level you cannot reject the null hypothesis of a mean of 0.001 (manufacturer's claim). What you're most likely going to find is that none of your petri dishes with the treatment will have growth, so the standard deviation will be 0 and you will have to conclude that the mean is in fact 0, and you'll be unable to calculate any intervals since there was no variation in the results. I might've led you down a totally incorrect path, so sorry if I did, but I hope it helps a little. (There are some more pedantic points like calculating the sample mean and sample standard deviation to take into account degrees of freedom, but I think that's beyond the scope of this...) Actually on second thought this wouldn't even work... as I said before you need a non-binary continuous set of outcomes for this to work...
Ideally you'd have each sample report a percentage of bacteria killed, so your data would look something like [99.1, 99.3, 98.7, etc.], and then you'd take the mean and standard deviation of that to compute the confidence interval.
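The 100-dish example and the percentage data above can be checked with a few lines of Python. This is only a sketch: the function names are made up, the three percentages are just the illustrative values from the post, and the critical value 4.303 is the standard two-sided 95% t-table entry for 2 degrees of freedom, passed in by hand because the post describes a table lookup. Note that the test statistic divides by the standard error (standard deviation over the square root of n), which is what gives 0.9 for the 100-dish case.

```python
import math
import statistics


def one_sample_t(data, hypothesized_mean):
    """Return (mean, sample sd, standard error, t statistic)."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)      # sample standard deviation (n - 1)
    se = sd / math.sqrt(n)           # standard error of the mean
    t = (mean - hypothesized_mean) / se
    return mean, sd, se, t


def t_interval(data, t_critical):
    """Confidence interval: mean +/- t_critical * standard error.

    t_critical must be looked up in a Student's t table for
    len(data) - 1 degrees of freedom (supplied by the caller).
    """
    mean = statistics.mean(data)
    se = statistics.stdev(data) / math.sqrt(len(data))
    return mean - t_critical * se, mean + t_critical * se


# 100 dishes: 1 with growth (1), 99 without (0); H0 mean = 0.001.
dishes = [1] + [0] * 99
mean, sd, se, t = one_sample_t(dishes, 0.001)
print(mean, sd, se, t)  # t is approximately 0.9

# Percent-killed data from the post; 4.303 = t table value, df = 2, 95%.
percent_killed = [99.1, 99.3, 98.7]
low, high = t_interval(percent_killed, t_critical=4.303)
print(low, high)
```

With only three observations the interval is wide, which matches the earlier point that a confidence interval can be computed from very few samples but will not be very informative.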
