Sandbox
This sandbox contains code that is, for various reasons, not ready to be included in statsmodels proper. It contains modules from the old stats.models code that have not yet been tested, verified and updated to the new statsmodels structure: the Cox survival model, the mixed effects model with repeated measures, the generalized additive model and the formula framework. The sandbox also contains code that is still being worked on until it fits the pattern of statsmodels or is sufficiently tested.
All sandbox modules have to be imported explicitly to indicate that they are not yet part of the core of statsmodels. The quality and test coverage of the sandbox code vary widely.
Examples
There are some examples in the sandbox.examples folder. Additional examples are directly included in the modules and in subfolders of the sandbox.
Module Reference
Time Series Analysis (tsa)
In this part we develop models and functions that are useful for time series analysis. Most of the models and functions have been moved to statsmodels.tsa. Currently, the GARCH models remain at the development stage in sandbox.tsa.
Moving Window Statistics
Most moving window statistics, such as the rolling mean, variance, moments (up to 4th order), min and max, are covered by the functions for moving (rolling) statistics/moments in pandas.
movstat.movorder(x[, order, windsize, lag]): moving order statistics
movstat.movmean(x[, windowsize, lag]): moving window mean
movstat.movvar(x[, windowsize, lag]): moving window variance
movstat.movmoment(x, k[, windowsize, lag]): noncentral moment
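As a rough illustration of what these functions compute, here is a minimal NumPy sketch. The helper names (moving_mean, moving_var, moving_moment, moving_order) are hypothetical and not part of movstat. The sketch returns values only for windows that fit entirely inside the data; the sandbox functions additionally control window alignment through the lag argument, so their output length and alignment may differ.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(x, windowsize):
    # Row i holds x[i : i + windowsize]; only complete windows are returned
    return sliding_window_view(np.asarray(x, dtype=float), windowsize)


def moving_mean(x, windowsize=3):
    return _windows(x, windowsize).mean(axis=1)


def moving_var(x, windowsize=3):
    # Population variance (ddof=0) computed directly on each window,
    # which is numerically safer than E[x**2] - E[x]**2
    return _windows(x, windowsize).var(axis=1)


def moving_moment(x, k, windowsize=3):
    # Noncentral moment of order k: the mean of x**k over each window
    return (_windows(x, windowsize) ** k).mean(axis=1)


def moving_order(x, q=0.5, windowsize=3):
    # Order statistic as a quantile of each window; q=0.5 gives the moving median
    return np.quantile(_windows(x, windowsize), q, axis=1)


x = np.array([1.0, 2.0, 4.0, 7.0, 11.0])
print(moving_mean(x))    # means of the windows [1,2,4], [2,4,7], [4,7,11]
print(moving_order(x))   # [2. 4. 7.]
```

For comparison, pandas' pd.Series(x).rolling(3).mean() gives the same three values aligned at each window's right end, preceded by NaN for the incomplete windows; note that pandas' rolling .var() uses ddof=1 by default, unlike moving_var above.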
Regression and ANOVA
The following two ANOVA functions are fully tested against the NIST test data for balanced one-way ANOVA. anova_oneway follows the same pattern as the one-way ANOVA function in scipy.stats (f_oneway), but with higher precision for badly scaled problems. anova_ols produces the same results as the one-way ANOVA, but uses the OLS model class. It also passes the NIST tests, although with some problems in the worst-scaled cases. It shows how to do a simple ANOVA with statsmodels in three lines and is best taken as a recipe.
anova_oneway(y, x[, seq])
anova_ols(y, x)
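The two routes described above can be sketched in plain NumPy. oneway_f computes the F statistic from between- and within-group sums of squares after centering on the grand mean (one common way to reduce cancellation error on badly scaled data), while oneway_f_ols obtains the same number by comparing the residual sum of squares of a dummy-variable least-squares fit with that of an intercept-only model. Both function names are hypothetical; this is not the code of anova_oneway or anova_ols, and it uses np.linalg.lstsq instead of the statsmodels OLS class.

```python
import numpy as np


def oneway_f(y, groups):
    """One-way ANOVA F statistic from between/within sums of squares (sketch)."""
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(groups, return_inverse=True)
    k, n = len(labels), len(y)
    # Center on the grand mean first; the centered grand mean is then 0
    yc = y - y.mean()
    counts = np.bincount(idx)
    group_means = np.bincount(idx, weights=yc) / counts
    ss_between = np.sum(counts * group_means**2)
    ss_within = np.sum((yc - group_means[idx]) ** 2)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def oneway_f_ols(y, groups):
    """Same F statistic as a restricted-vs-full least-squares comparison (sketch)."""
    y = np.asarray(y, dtype=float)
    labels, idx = np.unique(groups, return_inverse=True)
    k, n = len(labels), len(y)
    X = (idx[:, None] == np.arange(k)).astype(float)  # one dummy per group, no constant
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    ssr_full = np.sum((y - X @ beta) ** 2)
    ssr_const = np.sum((y - y.mean()) ** 2)  # restricted model: intercept only
    return ((ssr_const - ssr_full) / (k - 1)) / (ssr_full / (n - k))


y = [1, 2, 3, 4, 5, 6, 7, 8, 9]
g = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
print(oneway_f(y, g), oneway_f_ols(y, g))  # both approximately 27.0
```

With group means 2, 5 and 8, the between-group sum of squares is 54 on 2 degrees of freedom and the within-group sum of squares is 6 on 6 degrees of freedom, giving F = 27.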
The following are helper functions for working with dummy variables and generating ANOVA results with OLS. They are best considered as recipes, since they were written with a specific use in mind. These functions will eventually be rewritten or reorganized.
try_ols_anova.data2dummy(x[, returnall]): convert array of categories to dummy variables
try_ols_anova.data2groupcont(x1, x2): create dummy continuous variable
try_ols_anova.data2proddummy(x): create product dummy variables from 2 columns of a 2d array
try_ols_anova.dropname(ss, li): drop names from a list of strings
try_ols_anova.form2design(ss, data): convert string formula to data dictionary
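The idea behind dummy and product dummy variables can be sketched as follows. to_dummies and product_dummies are hypothetical names; the sandbox helpers may differ in details such as whether a reference column is dropped or which input layout they accept (data2proddummy takes a single 2d array, this sketch takes two 1d arrays).

```python
import numpy as np


def to_dummies(x):
    """One 0/1 indicator column per distinct category, in sorted category order."""
    levels, codes = np.unique(x, return_inverse=True)
    codes = codes.ravel()  # keep the integer codes 1-D
    return levels, (codes[:, None] == np.arange(len(levels))).astype(int)


def product_dummies(x1, x2):
    """Indicators for every (level of x1, level of x2) cell, as products of dummies."""
    _, d1 = to_dummies(x1)
    _, d2 = to_dummies(x2)
    n = d1.shape[0]
    # Column i * d2.shape[1] + j is 1 where x1 has level i AND x2 has level j
    return (d1[:, :, None] * d2[:, None, :]).reshape(n, -1)


sex = np.array(["f", "m", "f", "m"])
treat = np.array([0, 0, 1, 1])
levels, d = to_dummies(sex)        # levels ['f' 'm'], d rows [1 0], [0 1], [1 0], [0 1]
p = product_dummies(sex, treat)    # one 1 per row, in the column of that row's cell
print(levels, d, p, sep="\n")
```

Each row of the product dummies has exactly one 1, so the columns partition the observations into interaction cells; cells that never occur simply produce all-zero columns.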
The following are helper functions for group statistics, where groups are defined by a label array. The qualifying comments for the previous group also apply to this group of functions.
try_catdata.cat2dummy(y[, nonseq])
try_catdata.convertlabels(ys[, indices]): convert labels based on multiple variables or string labels to unique
try_catdata.groupsstats_1d(y, x, labelsunique): use ndimage to get fast mean and variance
try_catdata.groupsstats_dummy(y, x[, nonseq])
try_catdata.groupstatsbin(factors, values): uses np.bincount, assumes factors/labels are integers
try_catdata.labelmeanfilter(y, x)
try_catdata.labelmeanfilter_nd(y, x)
try_catdata.labelmeanfilter_str(ys, x)
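The np.bincount approach mentioned for groupstatsbin can be sketched as below, together with a label mean filter that replaces each observation by its group mean. The names group_stats_bincount and label_mean_filter are hypothetical and this is not the try_catdata code. As with groupstatsbin, labels must be non-negative integers; string labels can first be mapped to integer codes with np.unique(labels, return_inverse=True).

```python
import numpy as np


def group_stats_bincount(labels, values):
    """Per-group count, mean and population variance using np.bincount (sketch).

    Assumes labels are integers 0..k-1; an empty group would divide by zero.
    """
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    counts = np.bincount(labels)
    means = np.bincount(labels, weights=values) / counts
    # Second pass: squared deviations from each observation's own group mean
    dev2 = (values - means[labels]) ** 2
    variances = np.bincount(labels, weights=dev2) / counts
    return counts, means, variances


def label_mean_filter(labels, values):
    """Replace every observation by the mean of its group."""
    _, means, _ = group_stats_bincount(labels, values)
    return means[np.asarray(labels)]


labels = np.array([0, 0, 1, 1, 1])
values = np.array([1.0, 3.0, 2.0, 4.0, 6.0])
counts, means, variances = group_stats_bincount(labels, values)
print(counts, means, variances)           # group means are 2 and 4
print(label_mean_filter(labels, values))  # [2. 2. 4. 4. 4.]
```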
In addition to these functions, sandbox.regression still contains several examples that illustrate the use of the statsmodels regression models.
Systems of Regression Equations and Simultaneous Equations
The following models are for fitting systems of equations. Though the returned parameters have been verified as accurate, this code is still very experimental, and the usage of the models will very likely change significantly before they are added to the main codebase.
SUR(sys[, sigma, dfk]): Seemingly Unrelated Regression
Sem2SLS(sys[, indep_endog, instruments]): Two-Stage Least Squares for simultaneous equations
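To show the idea behind two-stage least squares, here is a minimal single-equation sketch in NumPy: first project the regressors onto the instruments, then regress the dependent variable on the projections. two_stage_least_squares is a hypothetical name; Sem2SLS works on a whole system and its interface differs, and SUR is not covered here. The simulated data are made up for illustration only.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """2SLS estimate of b in y = X b + u with instruments Z (sketch).

    X: (n, p) regressors, possibly endogenous. Z: (n, m) instruments with m >= p,
    including the exogenous columns of X such as the constant.
    """
    # Stage 1: project every column of X onto the instrument space
    gamma, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ gamma
    # Stage 2: OLS of y on the projected regressors
    beta, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
    return beta


rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                # instrument, independent of the error u
u = rng.normal(size=n)
x = z + rng.normal(size=n) + u        # x is endogenous: it is correlated with u
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
beta_2sls = two_stage_least_squares(y, X, Z)
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_2sls)  # close to the true coefficients [1, 2]
print(beta_ols)   # slope biased upward, since cov(x, u) > 0
```

Only the point estimates are meaningful here: standard errors taken directly from the second-stage regression would be wrong, because the residuals must be computed with the original X rather than X_hat.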
Miscellaneous
Tools for Time Series Analysis
Nothing is left in here.
Tools: Principal Component Analysis
pca(data[, keepdim, normalize, demean]): principal components with eigenvector decomposition
pcasvd(data[, keepdim, demean]): principal components with SVD
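The two routes to principal components listed above can be compared in a short NumPy sketch: an eigendecomposition of the covariance matrix versus an SVD of the demeaned data, which avoids forming the covariance matrix. pca_eig and pca_svd are hypothetical names with a keepdim argument borrowed loosely from the listing; the sandbox pca and pcasvd may return different objects and, with normalize, a different scaling.

```python
import numpy as np


def pca_eig(data, keepdim=None):
    """Principal components from the eigendecomposition of the covariance matrix."""
    xc = np.asarray(data, dtype=float)
    xc = xc - xc.mean(axis=0)                     # demean each column
    evals, evecs = np.linalg.eigh(np.cov(xc, rowvar=False))
    order = np.argsort(evals)[::-1]               # eigh sorts ascending; put largest first
    # keepdim=None keeps all components, since a[:None] is the whole array
    evals, evecs = evals[order][:keepdim], evecs[:, order][:, :keepdim]
    return xc @ evecs, evals, evecs


def pca_svd(data, keepdim=None):
    """Same components from the SVD of the demeaned data."""
    xc = np.asarray(data, dtype=float)
    xc = xc - xc.mean(axis=0)
    _, s, vt = np.linalg.svd(xc, full_matrices=False)  # s is sorted descending
    evals = s**2 / (xc.shape[0] - 1)              # same ddof=1 scaling as np.cov
    evecs = vt.T[:, :keepdim]
    return xc @ evecs, evals[:keepdim], evecs


rng = np.random.default_rng(42)
data = rng.normal(size=(200, 3)) * [3.0, 2.0, 1.0]
factors_e, evals_e, evecs_e = pca_eig(data)
factors_s, evals_s, evecs_s = pca_svd(data)
print(evals_e, evals_s)  # the same eigenvalues from both routes
```

The eigenvalues agree, while eigenvectors (and hence the factor scores) agree only up to the sign of each column, since an eigenvector is defined only up to sign.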
Descriptive Statistics Printing
descstats.descstats(data[, cols, axis]): prints descriptive statistics for one or multiple variables
Original stats.models
None of these are fully working. The formula framework is used by the cox and mixed modules.
Mixed Effects Model with Repeated Measures using an EM Algorithm: statsmodels.sandbox.mixed
Cox Proportional Hazards Model: statsmodels.sandbox.cox
Generalized Additive Models: statsmodels.sandbox.gam
Formula: statsmodels.sandbox.formula