Statistical functions
This page describes the statistical functions that are available in Phonometrica.
Global functions
-
chi2_test(X)
Computes Pearson’s chi-squared (\(\chi^2\)) test on X, which must be a two-dimensional array. The m rows in the array represent
the m levels of a categorical variable, and the n columns represent the n levels of another categorical variable.
Each cell contains the raw (unnormalized) frequency count for the corresponding combination of levels of the two variables. This test evaluates the
null hypothesis that the two variables are independent.
This function returns an object with the following fields:
chi2: the \(\chi^2\) value
df: the number of degrees of freedom
p: the p-value
See also: report_chi2()
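As an illustration of the quantities reported by this test, the following Python sketch computes the \(\chi^2\) statistic and the degrees of freedom for a small contingency table. It is not Phonometrica's implementation: the function name chi2_statistic is hypothetical, no continuity correction is applied (whether Phonometrica applies one is not stated here), and the p-value is omitted because it requires the \(\chi^2\) distribution function.

```python
def chi2_statistic(table):
    """Return (chi2, df) for a contingency table given as a list of equal-length rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # (m - 1) * (n - 1) degrees of freedom for an m by n table
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Made-up counts: 2 levels of one variable (rows) by 2 levels of another (columns)
chi2, df = chi2_statistic([[10, 20], [30, 40]])
print(chi2, df)
```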
-
corr(x, y)
Calculates Pearson’s correlation coefficient between samples x and y, which must be one-dimensional arrays with the same size.
-
cov(x, y)
Calculates the covariance between samples x and y, which must be one-dimensional arrays with the same size.
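The relationship between covariance and Pearson's correlation coefficient can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, and the sketch assumes the sample covariance with an N - 1 denominator, which this page does not confirm for cov. The correlation coefficient itself does not depend on that choice, since the denominator cancels out.

```python
import math


def covariance(x, y):
    """Sample covariance of two equal-length sequences (N - 1 denominator, assumed)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


def correlation(x, y):
    """Pearson's r: the covariance scaled by the two standard deviations."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


# Made-up samples of the same size
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 5.0]
print(covariance(x, y), correlation(x, y))
```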
-
f_test(x, y[, alternative])
Computes the F-test on x and y, which must be one-dimensional arrays. This test evaluates the null hypothesis that samples x and y have the same variance.
If alternative is specified, it must be one of the following strings: "two-tailed" performs a two-tailed test (default), "less" performs a left-tailed test and "greater" performs a right-tailed test.
This function returns an object with the following fields:
f: the F statistic, which is the ratio between the variance of x and the variance of y
df: the number of degrees of freedom
p: the p-value
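The F statistic described above can be sketched in Python as a ratio of sample variances. The function names are hypothetical and the sketch assumes the usual N - 1 denominator for each variance. It returns the degrees of freedom of each sample separately, which is the textbook convention and not necessarily how Phonometrica fills its df field. The p-value is omitted because it requires the F distribution function.

```python
def sample_variance(x):
    """Unbiased sample variance (N - 1 denominator)."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)


def f_statistic(x, y):
    """Return (f, df_x, df_y), where f is var(x) / var(y)."""
    return sample_variance(x) / sample_variance(y), len(x) - 1, len(y) - 1


# Made-up samples: y is twice as spread out as x, so its variance is four times larger
f, df_x, df_y = f_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f, df_x, df_y)
```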
-
lm(y, X)
Fits a linear regression model. y is a set of N observations for a continuous outcome, and X is an N by M matrix for a model with M regression
coefficients, including the intercept, which must be the first coefficient. (In general, the corresponding column of X should be a column of 1’s.)
This function returns an object with the following fields:
beta: an array of estimates for the regression coefficients. The first entry is the intercept
se: an array representing the standard errors of the regression coefficients
t: an array of t-values for the regression coefficients (t[i] is the t-value for beta[i])
p: an array of p-values for a t-test which evaluates the null hypothesis that each regression coefficient is equal to 0 (p[i] is the p-value for beta[i])
r2: the \(R^2\) value, which is the proportion of variance explained by the model
adj_r2: the adjusted \(R^2\) value, which takes into account the number of predictors in the model.
Note: the model is estimated by minimizing the sum of squared errors. It is fitted analytically using Singular Value Decomposition.
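To make the layout of X concrete, the following Python sketch builds a design matrix whose first column is all 1's and computes the least-squares estimates together with \(R^2\). It is a simplified illustration with hypothetical function names: it solves the normal equations by Gaussian elimination, not the Singular Value Decomposition mentioned in the note above, and it does not compute standard errors, t-values or p-values.

```python
def solve(a, b):
    """Solve the square linear system a * beta = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    beta = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * beta[j] for j in range(i + 1, n))
        beta[i] = (m[i][n] - s) / m[i][i]
    return beta


def fit_ols(y, X):
    """Return (beta, r2) for the least-squares fit of y on the columns of X."""
    cols = list(zip(*X))
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


# Made-up data: one predictor plus an intercept column of 1's (first column)
predictor = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.0]
X = [[1.0, v] for v in predictor]
beta, r2 = fit_ols(y, X)
print(beta, r2)  # beta[0] is the intercept, beta[1] the slope
```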
-
logit(y, X[, max_iter])
Fits a logistic regression model. y is a set of N binary observations (either 0 or 1), and X is an N by M matrix for a model with M regression
coefficients, including the intercept, which must be the first coefficient. (In general, the corresponding column of X should be a column of 1’s.)
If max_iter is provided, it indicates the maximum number of iterations that the solver should perform to estimate the coefficients (200 by default).
This function returns an object with the following fields:
beta: an array of estimates for the regression coefficients. The first entry is the intercept
se: an array representing the standard errors of the regression coefficients
z: an array of z-values for the regression coefficients (z[i] is the z-value for beta[i])
p: an array of p-values for a Wald test which evaluates the null hypothesis that each regression coefficient is equal to 0 (p[i] is the p-value for beta[i])
niter: the number of iterations performed by the numerical solver
converged: a Boolean value indicating whether the solver has converged to a solution. It is true if niter < max_iter
Note: the model is fitted numerically using the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) approximation method.
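The relationship between the beta, se, z and p fields can be sketched in Python as follows. The sketch assumes the standard form of the Wald test, in which each z-value is the estimate divided by its standard error and the p-value is two-sided under the standard normal distribution. The function name wald_test and the input numbers are made up for illustration and do not come from a fitted model.

```python
import math


def wald_test(beta, se):
    """Return (z, p) arrays for coefficient estimates and their standard errors."""
    z = [b / s for b, s in zip(beta, se)]
    # Two-sided p-value under the standard normal: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    p = [math.erfc(abs(v) / math.sqrt(2.0)) for v in z]
    return z, p


# Hypothetical estimates (intercept first) and standard errors
z, p = wald_test([-0.5, 1.96], [0.4, 1.0])
print(z, p)
```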
-
mean(x[, dim])
Returns the mean of the array x. If dim is specified, returns an Array in which each element
represents the mean over the given dimension of a two-dimensional array. If dim is equal to 1, the calculation is performed
over rows. If it is equal to 2, it is performed over columns.
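The two kinds of reduction over a two-dimensional array can be sketched in Python as follows. The sketch does not call Phonometrica and makes no claim about which of the two results corresponds to dim equal to 1 or to 2; it only shows what a mean computed for each row and for each column looks like on a made-up 2 by 3 array. The helper name mean_of is hypothetical.

```python
def mean_of(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


matrix = [[1.0, 2.0, 3.0],
          [4.0, 5.0, 6.0]]

# One mean per row: the result has as many elements as there are rows
per_row = [mean_of(row) for row in matrix]

# One mean per column: the result has as many elements as there are columns
per_column = [mean_of(col) for col in zip(*matrix)]

print(per_row, per_column)
```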
-
poisson(y, X[, robust[, max_iter]])
Fits a Poisson regression model. y is a set of N observations which represent count data (i.e. non-negative integers), and X is an N by M matrix for a model with M regression
coefficients, including the intercept, which must be the first coefficient. (In general, the corresponding column of X should be a column of 1’s.) If robust is true (it is false by default), Phonometrica will use the so-called “robust variance sandwich estimator” to adjust the standard errors for mild violations of the assumption that the mean is equal to the variance.
If max_iter is provided, it indicates the maximum number of iterations that the solver should perform to estimate the coefficients (200 by default).
This function returns an object with the following fields:
beta: an array of estimates for the regression coefficients. The first entry is the intercept
se: an array representing the standard errors of the regression coefficients
z: an array of z-values for the regression coefficients (z[i] is the z-value for beta[i])
p: an array of p-values for a Wald test which evaluates the null hypothesis that each regression coefficient is equal to 0 (p[i] is the p-value for beta[i])
niter: the number of iterations performed by the numerical solver
converged: a Boolean value indicating whether the solver has converged to a solution. It is true if niter < max_iter
Note: the model is fitted numerically using the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) approximation method.
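To help interpret the beta field, the following Python sketch turns a set of coefficients into predicted mean counts. It assumes the canonical log link, which is the usual choice for Poisson regression but is not stated explicitly on this page. The function name predicted_counts and the coefficient values are hypothetical.

```python
import math


def predicted_counts(beta, X):
    """Predicted mean count for each row of X under a log link: exp(X * beta)."""
    return [math.exp(sum(b * v for b, v in zip(beta, row))) for row in X]


# Hypothetical coefficients: intercept 0, slope log(2),
# so that each unit increase in the predictor doubles the expected count
beta = [0.0, math.log(2.0)]
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]  # first column of 1's for the intercept
print(predicted_counts(beta, X))
```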
-
report_chi2(X)
Computes and reports Pearson’s chi-squared test on X, which must be a two-dimensional array. This is a convenience wrapper
over chi2_test().
See also: chi2_test()
-
std(x[, dim])
Returns the standard deviation of the array x. If dim is specified, returns an Array in which each element
represents the standard deviation over the given dimension of a two-dimensional array. If dim is equal to 1, the calculation is performed
over rows. If it is equal to 2, it is performed over columns.
-
sum(x[, dim])
Returns the sum of the elements in the array x. If dim is specified, returns an Array in which each element
represents the sum over the given dimension of a two-dimensional array. If dim is equal to 1, the summation is performed
over rows. If it is equal to 2, it is performed over columns.
-
t_test(x, y[, equal_variance[, alternative]])
Computes an independent two-sample t-test comparing the means of samples x and y, which must be one-dimensional
arrays. This test evaluates the null hypothesis that samples x and y have equal means.
If equal_variance is true, the variance of the two samples is assumed to be equal and Student’s t-test is calculated,
using the pooled standard error. If equal_variance is false (default), Welch’s t-test is used instead.
If alternative is specified, it must be one of the following strings: "two-tailed" performs a two-tailed test (default),
"less" performs a left-tailed test and "greater" performs a right-tailed test.
This function returns an object with the following fields:
t: the t statistic
df1: the number of degrees of freedom of x
df2: the number of degrees of freedom of y
p: the p-value
See also: t_test1()
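The difference between Student's and Welch's variants can be sketched in Python as follows. This is a textbook illustration with a hypothetical function name. It returns a single degrees-of-freedom value (pooled for Student, Welch-Satterthwaite for Welch) and does not reproduce the df1 and df2 fields or the p-value returned by Phonometrica.

```python
import math


def mean_and_variance(x):
    """Return the mean and the unbiased sample variance (N - 1 denominator)."""
    n = len(x)
    mean = sum(x) / n
    return mean, sum((v - mean) ** 2 for v in x) / (n - 1)


def two_sample_t(x, y, equal_variance=False):
    """Return (t, df) for Student's test (equal_variance=True) or Welch's test."""
    nx, ny = len(x), len(y)
    mean_x, var_x = mean_and_variance(x)
    mean_y, var_y = mean_and_variance(y)
    if equal_variance:
        # Student: pooled variance, N_x + N_y - 2 degrees of freedom
        pooled = ((nx - 1) * var_x + (ny - 1) * var_y) / (nx + ny - 2)
        se = math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
        df = nx + ny - 2
    else:
        # Welch: separate variances, Welch-Satterthwaite degrees of freedom
        a, b = var_x / nx, var_y / ny
        se = math.sqrt(a + b)
        df = (a + b) ** 2 / (a ** 2 / (nx - 1) + b ** 2 / (ny - 1))
    return (mean_x - mean_y) / se, df


# Made-up samples with equal sizes but unequal variances
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(two_sample_t(x, y, equal_variance=True))
print(two_sample_t(x, y, equal_variance=False))
```

With equal sample sizes, as here, the two variants yield the same t statistic and differ only in their degrees of freedom.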
-
t_test1(x, mu[, alternative])
Computes a one-sample t-test for the sample x, which must be a one-dimensional array. This test evaluates the null hypothesis that the mean of sample x is equal to the theoretical mean mu.
If alternative is specified, it must be one of the following strings: "two-tailed" performs a two-tailed test (default),
"less" performs a left-tailed test and "greater" performs a right-tailed test.
This function returns an object with the following fields:
t: the t statistic
df: the number of degrees of freedom
p: the p-value
See also: t_test()
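The t statistic and degrees of freedom of a one-sample test can be sketched in Python as follows, using the textbook formula (difference between the sample mean and mu, divided by the standard error of the mean). The function name one_sample_t is hypothetical, and the p-value is omitted because it requires the t distribution function.

```python
import math


def one_sample_t(x, mu):
    """Return (t, df) for the null hypothesis that the mean of x equals mu."""
    n = len(x)
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / (n - 1)  # sample variance
    standard_error = math.sqrt(variance / n)
    return (mean - mu) / standard_error, n - 1


# Made-up sample tested against a theoretical mean of 2
t, df = one_sample_t([1, 2, 3, 4, 5], 2)
print(t, df)
```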
-
vrc(x[, dim])
Returns the sample variance of the array x. If dim is specified, returns an Array in which each element
represents the variance over the given dimension of a two-dimensional array. If dim is equal to 1, the calculation is performed
over rows. If it is equal to 2, it is performed over columns.
See also: std()
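The relationship between the sample variance and the standard deviation can be sketched in Python as follows. The sketch uses the N - 1 denominator for the sample variance and assumes that the standard deviation is its square root; this page does not state which denominator std uses. The function names are hypothetical.

```python
import math


def sample_variance(x):
    """Sample variance: sum of squared deviations from the mean, divided by N - 1."""
    n = len(x)
    mean = sum(x) / n
    return sum((v - mean) ** 2 for v in x) / (n - 1)


def sample_std(x):
    """Standard deviation, taken as the square root of the sample variance (assumed)."""
    return math.sqrt(sample_variance(x))


# Made-up sample
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_variance(data), sample_std(data))
```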