import random random.seed(1) import matplotlib.pyplotPMC points out that a slightly easier way of getting the same value would be to use an equation: variance = n(1-p)p where n = 1000 and p is 0.5. The standard deviation is then the square root of this (i.e. sqrt of 250), 15.8.asplt from collections import defaultdict import scipy import scipy.optimize import numpyasnp values = [0, 1] ans = defaultdict(int)forNinrange(100000): tot = 0foriinrange(1000): tot += random.choice(values) ans[tot] += 1 start, end = 350, 650 x = np.arange(start, end) y = []foriinrange(start, end):ifiinans: y.append(ans[i])else:ifi > start: y.append(y[-1])else: y.append(0) n = len(x) mean = 500. sigma = 10.defgaus(x,a,sigma):returna*scipy.exp(-(x-mean)**2/(2*sigma**2)) popt,pcov = scipy.optimize.curve_fit(gaus,x,y,p0=[1.0,sigma]) print popt plt.plot(x,y,'b+:',label='data') plt.plot(x,gaus(x,*popt),'ro:',label='fit') plt.legend() plt.show()

However, the proper way of handling the original question is a power calculation. Typically power calculations are associated with calculating the necessary sample size. Here I have a fixed sample size (n=1000) and I want to figure out, given that the alternative hypothesis is true (biased coin), what number of heads must I observe to be 99.9% sure (power=0.999) that it would be show up as significant at the 0.1% level (alpha=0.001). The y = 0.5 below refers to the proportion of heads expected under the null hypothesis (unbiased coin). Since n=1000, we can use the normal approximation to the binomial distribution.

import math import scipy.stats...and the answer? 642. Thanks to ALC for help with this, though all errors are my own.defqnorm(p):returnscipy.stats.norm.ppf(p)defsolvequadratic(a, b, c):return[(-b+math.sqrt((b**2)-(4*(a*c))))/(2.*a), (-b-math.sqrt((b**2)-(4*(a*c))))/(2.*a)]if__name__ == "__main__": alpha = 0.001 # Significance level gamma = 0.999 # Power qgamma = qnorm(gamma) qalpha = qnorm(1.-(alpha/2.)) p = (qalpha+qgamma)**2 n = 1000 y = 0.5 a = 2*n+p b = (2*p-4*n)*y - 2*p # Note that if y is 0.5, then, b = -2*n-p (i.e. -a) c = -2*p*y + (2*n+p)*y*y print solvequadratic(a, b, c)

**Notes:**

*(added 04/12/2015)*

Both the power and the significance level are conditional probabilities:

- The significance level (alpha) is the probability of rejecting the null hypothesis given that it is true, ie. prob(rejecting null hypothesis | null hypothesis is correct).
- The power (gamma) is the probability of detecting a difference, if indeed the difference does exist, ie. prob(rejecting null hypothesis | null hypothesis is incorrect).

## 2 comments:

Nice post.

As an aside you may want to reconsider your quadratic solver. Here's a link to explain the danger of the simple approach. http://rosettacode.org/wiki/Roots_of_a_quadratic_function

Good to know.

Post a Comment