resistics.regression.robust module

The source for these functions is Robust Statistics (Huber, 2009).

In general, linear regression has observations y and predictors A. Here y holds the multiple observations (the response) and is a linear function of x, which is unknown, so that y = Ax. The sizes are: y has nobs entries, A is nobs * nregressors and x has nregressors entries.

resistics.regression.robust.andrewsWaveLocationWeights(r: numpy.ndarray, k: float) → numpy.ndarray[source]

Andrews Wave location weights

Parameters
r : np.ndarray

Residuals

k : float

Tuning parameter

Returns
weights : np.ndarray

The robust weights
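
As an illustration, the textbook Andrews wave weight is sin(u)/u with u = r/k for |u| <= pi, and zero beyond that. The sketch below assumes that definition; the function name is hypothetical and the resistics implementation may scale or threshold differently.

```python
import numpy as np


def andrews_wave_weights_sketch(r, k):
    """Textbook Andrews wave weights (illustrative, not the resistics code)."""
    u = np.asarray(r, dtype=float) / k
    # np.sinc(x) is sin(pi x) / (pi x), so np.sinc(u / pi) equals sin(u) / u
    # and evaluates to 1 at u = 0 without a division by zero
    return np.where(np.abs(u) <= np.pi, np.sinc(u / np.pi), 0.0)
```

Residuals larger than k * pi in magnitude receive zero weight, so they are rejected outright rather than merely down-weighted.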

resistics.regression.robust.bisquareLocationWeights(r: numpy.ndarray, k: float) → numpy.ndarray[source]

Bisquare location weights

Parameters
r : np.ndarray

Residuals

k : float

Tuning parameter

Returns
weights : np.ndarray

The robust weights
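
For reference, the standard Tukey bisquare weight is (1 - (r/k)^2)^2 for |r| <= k and zero otherwise. The following is a minimal sketch under that assumption, with a hypothetical function name; it is not taken from the resistics source.

```python
import numpy as np


def bisquare_weights_sketch(r, k):
    """Standard Tukey bisquare weights (illustrative only)."""
    u = np.asarray(r, dtype=float) / k
    # Smoothly decreasing weight inside the threshold, zero outside it
    return np.where(np.abs(u) <= 1.0, (1.0 - u**2) ** 2, 0.0)
```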

resistics.regression.robust.calculateDistCMH(n, x, mean, covariance)[source]
resistics.regression.robust.chatterjeeMachler(A: numpy.ndarray, y: numpy.ndarray, **kwargs) → Tuple[source]

Robust bounded influence solver

Solves for $$x$$ where,

$y = Ax .$

As a bounded influence method, it should be robust against outliers in both the dependent and independent variables.

Parameters
A : np.ndarray

Predictors, size nobs*nregressors

y : np.ndarray

Observations, size nobs

intercept : bool, optional

True or False for adding an intercept term

Returns
params : np.ndarray

Values in x

resids : np.ndarray

Residuals = y - Ax

weights : np.ndarray

Weights used in robust regression

resistics.regression.robust.chatterjeeMachlerHadi(X, y, **kwargs)[source]

Another regression method based on Hadi distances, implemented from the paper "A Re-Weighted Least Squares Method for Robust Regression Estimation" (Billor, Hadi).

resistics.regression.robust.chatterjeeMachlerMod(A, y, **kwargs)[source]
resistics.regression.robust.defaultDictionary() → Dict[source]

Robust regression defaults

Returns
Dict

Default regression options

resistics.regression.robust.eps() → float[source]

Small number

Returns
float

A small number for quitting robust regression

resistics.regression.robust.getRobustLocationWeights(r: numpy.ndarray, weight: str) → numpy.ndarray[source]

Robust weighting schemes

Parameters
r : np.ndarray

Residuals

weight : str

The type of weighting to use

Returns
weights : np.ndarray

The robust weights

resistics.regression.robust.hampelLocationWeights(r: numpy.ndarray, k: float) → numpy.ndarray[source]

Hampel location weights

Parameters
r : np.ndarray

Residuals

k : float

Tuning parameter

Returns
weights : np.ndarray

The robust weights

resistics.regression.robust.hermitianTranspose(mat: numpy.ndarray) → numpy.ndarray[source]

Hermitian transpose (transpose and complex conjugation)

Parameters
mat : np.ndarray

Vector or matrix to Hermitian transpose

Returns
np.ndarray

Hermitian transpose
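
The operation described above can be sketched with NumPy as a transpose followed by complex conjugation. The function name here is hypothetical.

```python
import numpy as np


def hermitian_transpose_sketch(mat):
    """Transpose and complex-conjugate an array (illustrative only)."""
    return np.conjugate(np.transpose(mat))
```

For real-valued input this reduces to an ordinary transpose.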

resistics.regression.robust.huberLocationWeights(r: numpy.ndarray, k: float) → numpy.ndarray[source]

Huber location weights

Parameters
r : np.ndarray

Residuals

k : float

Tuning parameter

Returns
weights : np.ndarray

The robust weights
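
As a point of reference, the standard Huber weight is 1 for |r| <= k and k/|r| otherwise. The sketch below assumes that definition and uses a hypothetical function name; the resistics implementation may differ in detail.

```python
import numpy as np


def huber_weights_sketch(r, k):
    """Standard Huber weights (illustrative only)."""
    abs_r = np.abs(np.asarray(r, dtype=float))
    # np.maximum keeps the denominator at least k, avoiding division by zero
    return np.where(abs_r <= k, 1.0, k / np.maximum(abs_r, k))
```

Unlike the bisquare or Andrews wave schemes, large residuals are never given zero weight; their weight only decays as 1/|r|.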

resistics.regression.robust.initialFromDict(initDict: Dict) → Tuple[source]

Returns initial model from provided initial model dictionary

Useful for two-stage robust regression.

Parameters
initDict : Dict

Initial model to use for robust regression with the parameters, residuals and scale estimate

Returns
parameters : np.ndarray

The parameters

resids : np.ndarray

The residuals

scale : float

Initial estimate of scale

resistics.regression.robust.leastSquaresLocationWeights(r: numpy.ndarray)[source]

Least squares weights, which are all equal to 1

Parameters
r : np.ndarray

Residuals

Returns
weights : np.ndarray

The robust weights

resistics.regression.robust.maxIter() → int[source]

Maximum number of iterations

Returns
int

The maximum number of iterations

resistics.regression.robust.mestimateModel(A: numpy.ndarray, y: numpy.ndarray, **kwargs) → Tuple[source]

M-estimate robust least squares

Solves for $$x$$ where,

$y = Ax .$

A good method when the outliers are in the dependent variable (in $$y$$). Not robust against outliers in the independent variables (leverage points).

Parameters
A : np.ndarray

Predictors, size nobs*nregressors

y : np.ndarray

Observations, size nobs

initial :

scale : optional

A scale estimate

intercept : bool, optional

True or False for adding an intercept term

Returns
params : np.ndarray

Values in x

resids : np.ndarray

Residuals = y - Ax

scale : float

Robust measure of variance

weights : np.ndarray

Weights used in robust regression
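
M-estimates of this kind are commonly computed by iteratively reweighted least squares (IRLS). The sketch below is a generic illustration of that idea and not the resistics algorithm: the function name, the choice of Huber weights, the tuning constant 1.345, the 0.6745 scale normalisation and the stopping rule are all assumptions, and real-valued data is assumed.

```python
import numpy as np


def irls_huber_sketch(A, y, k=1.345, max_iter=50, tol=1e-8):
    """Generic Huber M-estimate via IRLS (illustrative only)."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    # Start from the ordinary least squares solution
    params = np.linalg.lstsq(A, y, rcond=None)[0]
    weights = np.ones(y.size)
    scale = 0.0
    for _ in range(max_iter):
        resids = y - A @ params
        # Robust scale: median absolute residual about zero, normalised
        scale = np.median(np.abs(resids)) / 0.6745
        if scale < tol:
            break  # at least half the residuals are essentially zero
        u = np.abs(resids) / scale
        # Huber weights: 1 inside the threshold, k / |u| outside
        weights = np.where(u <= k, 1.0, k / np.maximum(u, k))
        # Weighted least squares by scaling rows with sqrt(weights)
        sw = np.sqrt(weights)
        new_params = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)[0]
        change = np.linalg.norm(new_params - params)
        params = new_params
        if change < tol * (1.0 + np.linalg.norm(params)):
            break
    resids = y - A @ params
    return params, resids, scale, weights
```

On a straight line with a single gross outlier in y, the outlier ends up with a weight close to zero and the fit recovers the underlying slope and intercept. Because the weights depend only on residuals, a leverage point in A can still pull the fit, which matches the caveat above.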

resistics.regression.robust.mmestimateModel(A: numpy.ndarray, y: numpy.ndarray, **kwargs)[source]

Two-stage M-estimate

Solves for $$x$$ where,

$y = Ax .$
Parameters
A : np.ndarray

Predictors, size nobs*nregressors

y : np.ndarray

Observations, size nobs

initial : Dict

Initial solution with parameters, scale and residuals

scale : optional

A scale estimate

intercept : bool, optional

True or False for adding an intercept term

Returns
params : np.ndarray

Values in x

resids : np.ndarray

Residuals = y - Ax

scale : float

Robust measure of variance

weights : np.ndarray

Weights used in robust regression

resistics.regression.robust.olsModel(A, y, **kwargs) → Tuple[source]

Ordinary least squares

Solves for $$x$$ where,

$y = Ax .$
Parameters
A : np.ndarray

Predictors, size nobs*nregressors

y : np.ndarray

Observations, size nobs

intercept : bool, optional

True or False for adding an intercept term

Returns
params : np.ndarray

Least squares solution

resids : np.ndarray

Residuals

squareResid : np.ndarray

Squared residuals

rank : int

Rank of matrix A

s : np.ndarray

Singular values of A
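
The listed return values mirror what numpy.linalg.lstsq provides, so a comparable result can be sketched as below. This is an assumption about the general approach, not a copy of the resistics code; the function name is hypothetical, and appending the intercept as a trailing column of ones is also an assumption.

```python
import numpy as np


def ols_sketch(A, y, intercept=False):
    """Ordinary least squares via numpy.linalg.lstsq (illustrative only)."""
    A = np.asarray(A)
    y = np.asarray(y)
    if intercept:
        # Assumption: the intercept is modelled by appending a column of ones
        A = np.column_stack([A, np.ones(A.shape[0])])
    # lstsq returns the solution, the sum of squared residuals,
    # the rank of A and the singular values of A
    params, square_resid, rank, s = np.linalg.lstsq(A, y, rcond=None)
    resids = y - A @ params
    return params, resids, square_resid, rank, s
```

Note that numpy.linalg.lstsq returns an empty array for the sum of squared residuals when A is rank deficient or has no more rows than columns.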

resistics.regression.robust.sampleMAD(data)[source]

Median absolute deviation

The standard deviation is not robust against outliers, so the MAD is used instead.

Parameters
data : np.ndarray

Data for which to calculate MAD

Returns
float
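
A minimal sketch of the usual definition, the median of the absolute deviations from the median, is given below. The function name is hypothetical, real-valued data is assumed, and whether resistics applies a normalising constant (such as 1.4826) is not stated here, so none is applied.

```python
import numpy as np


def sample_mad_sketch(data):
    """Median absolute deviation about the median (illustrative only)."""
    data = np.asarray(data, dtype=float)
    centre = np.median(data)
    return np.median(np.abs(data - centre))
```

For the data 1, 2, 3, 4, 100 the median is 3 and the absolute deviations are 2, 1, 0, 1, 97, so the MAD is 1 despite the outlier.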

resistics.regression.robust.sampleMAD0(data)[source]

Median absolute deviation using an estimate of the location as 0

When the location estimate is zero (rather than the median), the MAD essentially reduces to the median of the absolute values. This should be calculated over non-zero data. Useful for estimating the variance of residuals.

Parameters
data : np.ndarray

Data for which to calculate MAD. This is often residuals when using 0 as an estimate of location.

Returns
float

The MAD using zero as an estimate of location
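
The description above can be sketched as the median of the non-zero absolute values. The function name is hypothetical, and returning 0.0 when every value is zero is an assumption made so that the example is well defined.

```python
import numpy as np


def sample_mad0_sketch(data):
    """MAD with location fixed at zero, over non-zero data (illustrative only)."""
    abs_data = np.abs(np.asarray(data))
    nonzero = abs_data[abs_data > 0]
    # Assumption: fall back to 0.0 if there is no non-zero data
    return float(np.median(nonzero)) if nonzero.size else 0.0
```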

resistics.regression.robust.sampleMedian(data)[source]

Calculate the median of an array

The mean is not a robust estimator of location, as it can be broken by a single outlying value. The median is a more robust choice.

Parameters
data : np.ndarray

Data for which to calculate median

Returns
float

The median
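
A short example, using plain NumPy calls rather than the resistics function, shows how a single outlying value moves the mean but not the median.

```python
import numpy as np

# Four well-behaved values and one gross outlier
data = np.array([1.0, 2.0, 3.0, 4.0, 1000.0])

mean_value = np.mean(data)      # 202.0, dominated by the outlier
median_value = np.median(data)  # 3.0, unaffected by the outlier
```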

resistics.regression.robust.trimmedMeanLocationWeights(r: numpy.ndarray, k: float) → numpy.ndarray[source]

Trimmed mean location weights

Parameters
r : np.ndarray

Residuals

k : float

Tuning parameter

Returns
weights : np.ndarray

The robust weights
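
A trimmed mean weighting is usually a hard cut-off: weight 1 for |r| <= k and 0 otherwise. The sketch below assumes that convention and uses a hypothetical function name; how resistics treats the boundary value is not documented here.

```python
import numpy as np


def trimmed_mean_weights_sketch(r, k):
    """Hard-threshold (trimmed mean) weights (illustrative only)."""
    abs_r = np.abs(np.asarray(r, dtype=float))
    return np.where(abs_r <= k, 1.0, 0.0)
```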

resistics.regression.robust.weightLS(A: numpy.ndarray, y: numpy.ndarray, weights: numpy.ndarray) → Tuple[numpy.ndarray][source]

Transform A and y using the weights to perform a weighted least squares

$\sqrt{weights} y = \sqrt{weights} A x ,$

is equivalent to,

$A^H weights y = A^H weights A x ,$

where $$A^H$$ is the Hermitian transpose.

In this method, both y and A are multiplied by the square root of the weights and then returned.

Parameters
y : np.ndarray

Observations

A : np.ndarray

Regressors

weights : np.ndarray

Weights to apply

Returns
y : np.ndarray

Observations multiplied by the square root of the weights

A : np.ndarray

Regressors multiplied by the square root of the weights
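
The transformation described above can be sketched as a row-wise scaling by the square root of the weights. The function name is hypothetical, weights is assumed to be a one-dimensional array of length nobs, and the return order (y, then A) follows the Returns listing.

```python
import numpy as np


def weight_ls_sketch(A, y, weights):
    """Scale the rows of A and y by sqrt(weights) (illustrative only)."""
    A = np.asarray(A)
    y = np.asarray(y)
    sw = np.sqrt(np.asarray(weights, dtype=float))
    # Each observation (row of A, entry of y) is scaled by its own weight
    return y * sw, A * sw[:, None]
```

Solving an ordinary least squares problem on the returned pair gives the same solution as the weighted normal equations $A^H weights A x = A^H weights y$.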