python - pandas: Parametric statistical test for two proportions (with H0: p1=p2). How to use scipy.stats?


Based on my source data, I create a dataframe of aggregated counts per share that looks like this:

    share  count_cm  total_cm  count_pm  total_pm
    a           117       395       309      1176
    b            38       395       158      1176
    c             4       395        22      1176
    d             4       395         9      1176
    e             8       395        19      1176
    f            44       395       175      1176
    g            37       395       110      1176
    h            35       395       111      1176
    i            11       395        21      1176
    j            39       395        92      1176
    k            16       395        48      1176
    l            31       395        72      1176
    m            11       395        30      1176

In order to run proportion tests between the current month and the previous month, I calculate the z-score using the following function:

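    import numpy as np

    def z_prop(c1, c2, n1, n2):
        # Difference between the two sample proportions
        numerator = c1/n1 - c2/n2
        # Pooled proportion under H0: p1 = p2
        p = (c1 + c2)/(n1 + n2)
        q = 1 - p
        # Standard error of the difference under H0
        denominator = np.sqrt(p*q*(1/n1 + 1/n2))
        return numerator/denominator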

So the z-score would be

    df['z'] = z_prop(df['count_cm'], df['count_pm'], df['total_cm'], df['total_pm'])

and the p-value (with scipy.stats imported as st) is

    df['pvalue'] = np.abs(1 - 2*st.norm.sf(df['z']))

At the end, with some additional rounding, I get the following:

        share  count_cm  total_cm  count_pm  total_pm      z  pvalue
    1   a           117       395       309      1176   1.29    0.80
    2   b            38       395       158      1176  -1.96    0.95
    3   c             4       395        22      1176  -1.16    0.75
    4   d             4       395         9      1176   0.47    0.36
    5   e             8       395        19      1176   0.54    0.41
    6   f            44       395       175      1176  -1.86    0.94
    7   g            37       395       110      1176   0.01    0.01
    8   h            35       395       111      1176  -0.34    0.27
    9   i            11       395        21      1176   1.22    0.78
    10  j            39       395        92      1176   1.28    0.80
    11  k            16       395        48      1176  -0.03    0.02
    12  l            31       395        72      1176   1.20    0.77
    13  m            11       395        30      1176   0.25    0.20
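A side note on the pvalue column: the expression np.abs(1 - 2*st.norm.sf(z)) works out to one minus the conventional two-sided p-value, which is usually written as 2*st.norm.sf(np.abs(z)). The sketch below compares the two for a few z-scores copied from the table. The variable names as_written and two_sided are only illustrative.

```python
import numpy as np
from scipy import stats as st

# A few z-scores copied from the table in the question
z = np.array([1.29, -1.96, 0.01])

# Expression used in the question
as_written = np.abs(1 - 2 * st.norm.sf(z))

# Conventional two-sided p-value for a z statistic
two_sided = 2 * st.norm.sf(np.abs(z))

# For every z the two quantities add up to 1
print(np.round(as_written, 2))
print(np.round(two_sided, 2))
```

Under the conventional definition a small value means evidence against H0: p1=p2, so the row with z = 0.01 would have a p-value close to 1 rather than close to 0.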

I hope someone can suggest a solution that gets the p-values per share directly from the source file, using pandas' groupby function and a statistical package such as scipy.stats, instead of creating the aggregated dataframe and defining the z_prop function.

Does anyone know of such a solution?

Thank you very much.
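One possible direction, offered as a minimal sketch rather than a definitive answer: count rows per share and period with groupby, then pass each share's 2x2 table to scipy.stats.chi2_contingency with correction=False. Without Yates' correction the chi-square statistic of a 2x2 table equals the square of the pooled z-score, so its p-value matches the conventional two-sided p-value of the z-test. The layout of the source file is not shown in the question, so the sketch assumes one row per record with a period column (values "cm" and "pm") and a share column. The helper name share_pvalues, the column names, and the sample data are all hypothetical.

```python
import pandas as pd
from scipy import stats


def share_pvalues(raw, period_col="period", share_col="share",
                  current="cm", previous="pm"):
    """Two-sided p-value of H0: p1 == p2 for each share (hypothetical helper)."""
    # Count rows per (share, period) with groupby, one column per period
    counts = raw.groupby([share_col, period_col]).size().unstack(fill_value=0)
    # Rows per period, the equivalent of total_cm and total_pm
    totals = counts.sum()

    def p_for_share(row):
        # 2x2 table: [in this share, not in this share] for each period
        table = [
            [row[current], totals[current] - row[current]],
            [row[previous], totals[previous] - row[previous]],
        ]
        # Index 1 of the result is the p-value; Yates' correction is disabled
        return stats.chi2_contingency(table, correction=False)[1]

    return counts.apply(p_for_share, axis=1).rename("pvalue")


# Made-up raw data (assumption): one row per record with its period and share
raw = pd.DataFrame({
    "period": ["cm"] * 10 + ["pm"] * 20,
    "share": ["a"] * 6 + ["b"] * 4 + ["a"] * 8 + ["b"] * 12,
})
print(share_pvalues(raw))
```

The sign of the difference is lost with this approach, since the chi-square statistic is z squared. If the direction matters, the z-score still has to be computed separately. chi2_contingency also raises an error when an expected count is zero, for example when a single share accounts for every row.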

