python - pandas: Parametric statistical test for two proportions (with H0: p1=p2). How to use scipy.stats?
Based on my source data, I create a DataFrame of aggregated counts per share that looks like this:

share  count_cm  total_cm  count_pm  total_pm
a           117       395       309      1176
b            38       395       158      1176
c             4       395        22      1176
d             4       395         9      1176
e             8       395        19      1176
f            44       395       175      1176
g            37       395       110      1176
h            35       395       111      1176
i            11       395        21      1176
j            39       395        92      1176
k            16       395        48      1176
l            31       395        72      1176
m            11       395        30      1176
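In order to run proportion tests between the current month and the previous month, I calculate the z-score using the following function:

    import numpy as np

    def z_prop(c1, c2, n1, n2):
        # difference between the two sample proportions
        numerator = c1/n1 - c2/n2
        # pooled proportion under H0: p1 = p2
        p = (c1 + c2)/(n1 + n2)
        q = 1 - p
        # standard error of the difference under H0
        denominator = np.sqrt(p*q*(1/n1 + 1/n2))
        return numerator/denominator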
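For reference, the function above appears to implement the usual pooled two-proportion z statistic. Written out, with the symbols chosen here to mirror the function's arguments (c1, c2 are the counts and n1, n2 the totals):

```latex
\hat{p}_1 = \frac{c_1}{n_1}, \qquad
\hat{p}_2 = \frac{c_2}{n_2}, \qquad
\hat{p} = \frac{c_1 + c_2}{n_1 + n_2}

z = \frac{\hat{p}_1 - \hat{p}_2}
         {\sqrt{\hat{p}\,(1 - \hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}
```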
So the z-score would be

    df['z'] = z_prop(df['count_cm'], df['count_pm'], df['total_cm'], df['total_pm'])

and the p-value

    import scipy.stats as st

    df['pvalue'] = np.abs(1 - 2*st.norm.sf(df['z']))
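One thing worth checking here: the expression `np.abs(1 - 2*st.norm.sf(z))` equals one minus the conventional two-sided p-value, which is usually written as `2*norm.sf(|z|)`. The following is only a minimal sketch that computes both side by side on three rows copied from the table in the post (shares c, d and e). The column name `p_two_sided` is made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Three rows copied from the aggregated table in the post.
df = pd.DataFrame({
    "share": ["c", "d", "e"],
    "count_cm": [4, 4, 8],
    "total_cm": [395, 395, 395],
    "count_pm": [22, 9, 19],
    "total_pm": [1176, 1176, 1176],
})

def z_prop(c1, c2, n1, n2):
    """Pooled two-proportion z statistic (same formula as in the post)."""
    p = (c1 + c2) / (n1 + n2)
    se = np.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (c1 / n1 - c2 / n2) / se

df["z"] = z_prop(df["count_cm"], df["count_pm"], df["total_cm"], df["total_pm"])

# Expression used in the post.
df["pvalue"] = np.abs(1 - 2 * stats.norm.sf(df["z"]))

# Conventional two-sided p-value: P(|Z| >= |z|) under H0 (hypothetical column name).
df["p_two_sided"] = 2 * stats.norm.sf(np.abs(df["z"]))

print(df.round(2))
```

If the goal is to reject H0 when the p-value is small, `p_two_sided` is the quantity to compare against the significance level; the `pvalue` column behaves more like a confidence level.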
In the end, with some additional rounding, I get the following:

    share  count_cm  total_cm  count_pm  total_pm      z  pvalue
1   a           117       395       309      1176   1.29    0.80
2   b            38       395       158      1176  -1.96    0.95
3   c             4       395        22      1176  -1.16    0.75
4   d             4       395         9      1176   0.47    0.36
5   e             8       395        19      1176   0.54    0.41
6   f            44       395       175      1176  -1.86    0.94
7   g            37       395       110      1176   0.01    0.01
8   h            35       395       111      1176  -0.34    0.27
9   i            11       395        21      1176   1.22    0.78
10  j            39       395        92      1176   1.28    0.80
11  k            16       395        48      1176  -0.03    0.02
12  l            31       395        72      1176   1.20    0.77
13  m            11       395        30      1176   0.25    0.20
I hope someone can suggest a solution that computes the p-values per share directly from the source file, using pandas' groupby function and a statistical package such as scipy.stats, instead of creating the aggregated DataFrame and defining the z_prop function myself.
Does anyone know of such a solution?
Thank you very much.
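One possible direction, sketched under assumptions: for a 2x2 table, Pearson's chi-square test without continuity correction gives a statistic equal to z squared and the same p-value as the two-sided pooled z-test, so `scipy.stats.chi2_contingency(..., correction=False)` can stand in for a hand-written z function. The layout of the source file is not shown in the post, so the `raw` frame below (one row per record, with hypothetical columns `share` and `month`) and the helper `share_test` are illustrative assumptions only.

```python
import pandas as pd
from scipy import stats

# Hypothetical raw data: one row per record, labelled with its share and
# month ("cm" = current month, "pm" = previous month). Counts for shares
# c and d follow the post; everything else is lumped into "other".
raw = pd.DataFrame({
    "share": ["c"] * 4 + ["d"] * 4 + ["other"] * 387
             + ["c"] * 22 + ["d"] * 9 + ["other"] * 1145,
    "month": ["cm"] * 395 + ["pm"] * 1176,
})

# Counts per share and month via groupby; columns are "cm" and "pm".
counts = raw.groupby(["share", "month"]).size().unstack(fill_value=0)
totals = counts.sum()  # number of records per month

def share_test(row):
    # 2x2 table: rows are months, columns are (in share, not in share).
    table = [[row["cm"], totals["cm"] - row["cm"]],
             [row["pm"], totals["pm"] - row["pm"]]]
    chi2, p, _, _ = stats.chi2_contingency(table, correction=False)
    return pd.Series({"chi2": chi2, "pvalue": p})

result = counts.join(counts.apply(share_test, axis=1))
print(result.round(3))
```

Here `pvalue` is the conventional two-sided p-value, so small values indicate evidence against H0: p1 = p2. The sign of the difference is lost in the chi-square statistic, so if the direction matters it would have to be recovered from the counts themselves.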