## Wednesday, 2 December 2015

### Coin-tossing power

I've been doing some coin tossing. Well, actually it's cheminformatics, but the problem reduces to coin tossing. Given that I toss a coin 1000 times, how many heads must I observe to believe that it is a biased coin? My usual ramshackle approach would be to estimate the standard deviation of the distribution under the null hypothesis (have my computer do some coin tossing) and then say something about being two sigmas away from the mean. I've done that just to be sure - the sigma is just under 16.
```
import random
from collections import defaultdict

import matplotlib.pyplot as plt
import numpy as np
import scipy.optimize

random.seed(1)

values = [0, 1]

# Simulate 100,000 experiments of 1000 coin tosses each,
# recording how often each total head count occurs
ans = defaultdict(int)
for N in range(100000):
    tot = 0
    for i in range(1000):
        tot += random.choice(values)
    ans[tot] += 1

start, end = 350, 650
x = np.arange(start, end)
y = []
for i in range(start, end):
    if i in ans:
        y.append(ans[i])
    else:
        if i > start:
            y.append(y[-1])
        else:
            y.append(0)

mean = 500.
sigma = 10.  # initial guess for the fit

def gaus(x, a, sigma):
    return a * np.exp(-(x - mean)**2 / (2 * sigma**2))

popt, pcov = scipy.optimize.curve_fit(gaus, x, y, p0=[1.0, sigma])
print(popt)

plt.plot(x, y, 'b+:', label='data')
plt.plot(x, gaus(x, *popt), 'ro:', label='fit')
plt.legend()
plt.show()
```
PMC points out that a slightly easier way of getting the same value would be to use an equation: variance = n(1-p)p where n = 1000 and p is 0.5. The standard deviation is then the square root of this (i.e. sqrt of 250), 15.8.
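That formula can be checked in a couple of lines (the numbers here are just the ones from the text):

```python
import math

n, p = 1000, 0.5
variance = n * p * (1 - p)   # 1000 * 0.5 * 0.5 = 250
sigma = math.sqrt(variance)
print(sigma)                 # 15.81..., matching the fitted value
```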

However, the proper way of handling the original question is a power calculation. Typically power calculations are associated with calculating the necessary sample size. Here I have a fixed sample size (n=1000) and I want to figure out, given that the alternative hypothesis is true (a biased coin), what number of heads I must observe to be 99.9% sure (power=0.999) that it would show up as significant at the 0.1% level (alpha=0.001). The y = 0.5 below refers to the proportion of heads expected under the null hypothesis (unbiased coin). Since n=1000, we can use the normal approximation to the binomial distribution.
```
import math

import scipy.stats

def qnorm(p):
    return scipy.stats.norm.ppf(p)

def solvequadratic(a, b, c):
    # The two roots of ax^2 + bx + c = 0
    return [(-b + math.sqrt(b**2 - 4*a*c)) / (2.*a),
            (-b - math.sqrt(b**2 - 4*a*c)) / (2.*a)]

if __name__ == "__main__":
    alpha = 0.001  # Significance level
    gamma = 0.999  # Power
    qgamma = qnorm(gamma)
    qalpha = qnorm(1. - (alpha/2.))
    p = (qalpha + qgamma)**2
    n = 1000
    y = 0.5  # Proportion of heads under the null hypothesis
    a = 2*n + p
    b = (2*p - 4*n)*y - 2*p
    # Note that if y is 0.5, then b = -2*n-p (i.e. -a)
    c = -2*p*y + (2*n + p)*y*y
    print(solvequadratic(a, b, c))
```
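
For these inputs the quadratic works out to roots of roughly 0.359 and 0.641, i.e. about 641 heads. As a quick sanity check in the same simulation spirit as above (this is my own sketch, not part of the derivation), a coin biased to a head proportion of 0.641 should give a result significant at alpha = 0.001 in at least 99.9% of experiments:

```python
import random

import scipy.stats

random.seed(1)
n = 1000
null_sd = (n * 0.5 * 0.5) ** 0.5  # 15.81, as computed earlier
# Two-sided critical head count at alpha = 0.001, about 552 heads
critical = 500 + scipy.stats.norm.ppf(1 - 0.001/2) * null_sd

trials = 2000
hits = 0
for _ in range(trials):
    heads = sum(random.random() < 0.641 for _ in range(n))
    if heads > critical:
        hits += 1
print(hits / float(trials))  # should be at least 0.999
```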