resistics.regression.robust module

The source for these functions is Robust Statistics, Huber, 2009. In general, linear regression has observations \(y\) and predictors \(A\), and solves for the unknown \(x\), where \(y\) is a linear function of \(x\), i.e. \(y = Ax\). Here \(y\) holds the observations (responses) and has size nobs, \(A\) holds the predictors (independent variables) and has size nobs * nregressors, and \(x\) holds the unknown regression parameters and has size nregressors.
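As a concrete illustration of the notation, the following is a minimal sketch that builds a small noise-free system \(y = Ax\) with NumPy and recovers \(x\) by least squares. The variable names (`x_true`, `x_est`) and the sizes are assumptions chosen for the example; they are not part of the module.

```python
import numpy as np

rng = np.random.default_rng(0)
nobs, nregressors = 50, 2

# A holds the predictors, one row per observation
A = rng.normal(size=(nobs, nregressors))
# x_true is the parameter vector we pretend not to know
x_true = np.array([2.0, -1.0])
# y is a linear function of x
y = A @ x_true

# Recover x from A and y with ordinary least squares
x_est, *_ = np.linalg.lstsq(A, y, rcond=None)
print(A.shape, y.shape, x_est)
```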

resistics.regression.robust.andrewsWaveLocationWeights(r: numpy.ndarray, k: float) → numpy.ndarray

Andrews Wave location weights

Parameters

r (np.ndarray): Residuals
k (float): Tuning parameter

Returns

weights (np.ndarray): The robust weights
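For orientation, the textbook Andrews wave weight is \(\sin(r/k)/(r/k)\) for \(|r| \le k\pi\) and zero otherwise. The sketch below implements that textbook form under a hypothetical name, `andrews_wave_weights`; whether the module applies exactly this form, or scales the residuals first, is not stated here and should be checked against the source.

```python
import numpy as np

def andrews_wave_weights(r, k):
    """Textbook Andrews wave weights (hypothetical helper, not the module's code)."""
    u = np.asarray(r, dtype=float) / k
    # np.sinc(t) is sin(pi*t)/(pi*t), so np.sinc(u/pi) equals sin(u)/u
    # and evaluates to 1 at u = 0 without a division by zero
    w = np.sinc(u / np.pi)
    # Residuals beyond k*pi get zero weight
    return np.where(np.abs(u) <= np.pi, w, 0.0)

w = andrews_wave_weights(np.array([0.0, 1.0, 10.0]), k=1.0)
print(w)
```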

resistics.regression.robust.bisquareLocationWeights(r: numpy.ndarray, k: float) → numpy.ndarray

Bisquare location weights

Parameters

r (np.ndarray): Residuals
k (float): Tuning parameter

Returns

weights (np.ndarray): The robust weights
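The standard bisquare (Tukey biweight) weight is \((1 - (r/k)^2)^2\) for \(|r| \le k\) and zero otherwise. The following sketch shows that standard form with a hypothetical function name; it is an illustration of the weighting scheme, not a copy of the module's implementation.

```python
import numpy as np

def bisquare_weights(r, k):
    """Standard bisquare weights (hypothetical helper, not the module's code)."""
    u = np.asarray(r, dtype=float) / k
    # Smoothly decreasing weight inside the threshold, zero outside it
    return np.where(np.abs(u) <= 1.0, (1.0 - u**2) ** 2, 0.0)

w = bisquare_weights(np.array([0.0, 1.0, 2.0, 5.0]), k=2.0)
print(w)
```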

resistics.regression.robust.calculateDistCMH(n, x, mean, covariance)

resistics.regression.robust.chatterjeeMachler(A: numpy.ndarray, y: numpy.ndarray, **kwargs) → Tuple

Robust bounded influence solver

Solves for \(x\) where,

\[y = Ax .\]

As a bounded influence method, it should be robust against outliers in both the dependent and the independent variables.

Parameters

A (np.ndarray): Predictors, size nobs*nregressors
y (np.ndarray): Observations, size nobs
intercept (bool, optional): True or False for adding an intercept term

Returns

params (np.ndarray): Values in x
resids (np.ndarray): Residuals = y - Ax
weights (np.ndarray): Weights used in robust regression

resistics.regression.robust.chatterjeeMachlerHadi(X, y, **kwargs)

Regression based on Hadi distances

Another regression method based on Hadi distances, implemented from the paper A Re-Weighted Least Squares Method for Robust Regression Estimation (Billor, Hadi).

resistics.regression.robust.chatterjeeMachlerMod(A, y, **kwargs)

resistics.regression.robust.defaultDictionary() → Dict

Robust regression defaults

Returns

Dict: Default regression options

resistics.regression.robust.eps() → float

Small number

Returns

float: A small number for quitting robust regression

resistics.regression.robust.getRobustLocationWeights(r: numpy.ndarray, weight: str) → numpy.ndarray

Robust weighting schemes

Parameters

r (np.ndarray): Residuals
weight (str): The type of weighting to use

Returns

weights (np.ndarray): The robust weights

resistics.regression.robust.hampelLocationWeights(r: numpy.ndarray, k: float) → numpy.ndarray

Hampel location weights

Parameters

r (np.ndarray): Residuals
k (float): Tuning parameter

Returns

weights (np.ndarray): The robust weights

resistics.regression.robust.hermitianTranspose(mat: numpy.ndarray) → numpy.ndarray

Hermitian transpose (transpose and complex conjugation)

Parameters

mat (np.ndarray): Vector or matrix to Hermitian transpose

Returns

np.ndarray: Hermitian transpose
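The operation described here, transposition combined with complex conjugation, can be written directly in NumPy. The snippet below is a small sketch on a made-up 2x2 matrix; the variable names are illustrative only.

```python
import numpy as np

mat = np.array([[1 + 2j, 3 - 1j],
                [0 + 1j, 4 + 0j]])

# Conjugate every entry, then swap rows and columns
mat_h = np.conjugate(mat).T
print(mat_h)
```

Applying the same operation twice returns the original matrix, which is a quick sanity check when working with complex data.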

resistics.regression.robust.huberLocationWeights(r: numpy.ndarray, k: float) → numpy.ndarray

Huber location weights

Parameters

r (np.ndarray): Residuals
k (float): Tuning parameter

Returns

weights (np.ndarray): The robust weights
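In its standard form, the Huber weight is 1 for \(|r| \le k\) and \(k/|r|\) otherwise, so large residuals are downweighted but never discarded completely. The sketch below shows that standard form under a hypothetical name; it is not taken from the module's source.

```python
import numpy as np

def huber_weights(r, k):
    """Standard Huber weights (hypothetical helper, not the module's code)."""
    a = np.abs(np.asarray(r, dtype=float))
    # np.maximum keeps the denominator at least k, so there is no division
    # by zero; entries with a <= k take the weight 1 branch anyway
    return np.where(a <= k, 1.0, k / np.maximum(a, k))

w = huber_weights(np.array([0.0, -1.0, 3.0]), k=1.5)
print(w)
```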

resistics.regression.robust.initialFromDict(initDict: Dict) → Tuple

Returns initial model from provided initial model dictionary

Helper for two-stage robust regression.

Parameters

initDict (Dict): Initial model to use for robust regression with the parameters, residuals and scale estimate

Returns

parameters (np.ndarray): The initial model parameters
resids (np.ndarray): The residuals
scale (float): Initial estimate of scale

resistics.regression.robust.leastSquaresLocationWeights(r: numpy.ndarray)

Least squares weights, which are all equal to 1

Parameters

r (np.ndarray): Residuals

Returns

weights (np.ndarray): The robust weights

resistics.regression.robust.maxIter() → int

Maximum number of iterations

Returns

int: The maximum number of iterations

resistics.regression.robust.mestimateModel(A: numpy.ndarray, y: numpy.ndarray, **kwargs) → Tuple

M-estimate robust least squares

Solves for \(x\) where,

\[y = Ax .\]

A good method when the outliers are in the dependent variable (\(y\)). Not robust against outliers in the independent variables (leverage points).

Parameters

A (np.ndarray): Predictors, size nobs*nregressors
y (np.ndarray): Observations, size nobs
initial :
scale (optional): A scale estimate
intercept (bool, optional): True or False for adding an intercept term

Returns

params (np.ndarray): Values in x
resids (np.ndarray): Residuals = y - Ax
scale (float): Robust measure of variance
weights (np.ndarray): Weights used in robust regression
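M-estimates of this kind are commonly computed by iteratively reweighted least squares: fit, compute residuals, estimate a robust scale, derive weights, and refit with those weights until the parameters settle. The sketch below illustrates that loop for real-valued data. Everything in it is an assumption made for the example rather than the module's implementation: the function name `irls_huber`, the choice of Huber weights with `k=1.345`, the scale estimate (median absolute residual about zero divided by 0.6745), and the stopping rule.

```python
import numpy as np

def irls_huber(A, y, k=1.345, max_iter=50, tol=1e-8):
    """Hypothetical IRLS sketch with Huber weights (not the module's code)."""
    # Start from the ordinary least squares solution
    params, *_ = np.linalg.lstsq(A, y, rcond=None)
    weights = np.ones(len(y))
    scale = 0.0
    for _ in range(max_iter):
        resids = y - A @ params
        # Robust scale: median absolute residual about zero, rescaled so it
        # matches the standard deviation for normally distributed residuals
        scale = np.median(np.abs(resids)) / 0.6745
        if scale < tol:
            break  # essentially a perfect fit, nothing left to reweight
        u = np.abs(resids) / scale
        # Huber weights on the standardised residuals
        weights = np.where(u <= k, 1.0, k / np.maximum(u, k))
        # Weighted least squares via square-root weights
        sqrt_w = np.sqrt(weights)
        new_params, *_ = np.linalg.lstsq(
            A * sqrt_w[:, np.newaxis], y * sqrt_w, rcond=None
        )
        step = np.linalg.norm(new_params - params)
        params = new_params
        if step < tol:
            break
    resids = y - A @ params
    return params, resids, scale, weights

# Straight line y = 1 + 2t with small noise and two gross outliers in y
rng = np.random.default_rng(0)
t = np.arange(20.0)
A = np.column_stack([np.ones_like(t), t])
y = 1.0 + 2.0 * t + rng.normal(scale=0.1, size=t.size)
y[3] += 50.0
y[15] -= 40.0

ols_params, *_ = np.linalg.lstsq(A, y, rcond=None)
params, resids, scale, weights = irls_huber(A, y)
print("OLS:", ols_params, "robust:", params)
```

In this example the outliers sit in \(y\) only, which is the situation the method is described as handling well; the two contaminated observations end up with weights close to zero.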

resistics.regression.robust.mmestimateModel(A: numpy.ndarray, y: numpy.ndarray, **kwargs)

Two-stage M-estimate

Solves for \(x\) where,

\[y = Ax .\]

Parameters

A (np.ndarray): Predictors, size nobs*nregressors
y (np.ndarray): Observations, size nobs
initial (Dict): Initial solution with parameters, scale and residuals
scale (optional): A scale estimate
intercept (bool, optional): True or False for adding an intercept term

Returns

params (np.ndarray): Values in x
resids (np.ndarray): Residuals = y - Ax
scale (float): Robust measure of variance
weights (np.ndarray): Weights used in robust regression

resistics.regression.robust.olsModel(A, y, **kwargs) → Tuple

Ordinary least squares

Solves for \(x\) where,

\[y = Ax .\]

Parameters

A (np.ndarray): Predictors, size nobs*nregressors
y (np.ndarray): Observations, size nobs
intercept (bool, optional): True or False for adding an intercept term

Returns

params (np.ndarray): Least squares solution
resids (np.ndarray): Residuals
squareResid (np.ndarray): Squared residuals
rank (int): Rank of matrix A
s (np.ndarray): Singular values of A
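The returned quantities resemble what `numpy.linalg.lstsq` provides (solution, sum of squared residuals, rank and singular values), with the residual vector computed on top. The sketch below shows how such outputs can be obtained with NumPy alone. That the function is built on `lstsq`, and that an intercept is handled by adding a column of ones, are assumptions made for illustration.

```python
import numpy as np

t = np.arange(5.0)
# Assumed intercept handling: a leading column of ones in A
A = np.column_stack([np.ones_like(t), t])
y = 1.0 + 2.0 * t

# lstsq returns the solution, the sum of squared residuals,
# the rank of A and the singular values of A
params, square_resid, rank, s = np.linalg.lstsq(A, y, rcond=None)
# The residual vector itself is computed separately
resids = y - A @ params
print(params, rank, s)
```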

resistics.regression.robust.sampleMAD(data)

Median absolute deviation

The standard deviation is not robust against outliers, hence use the MAD.

Parameters

data (np.ndarray): Data for which to calculate MAD

Returns

float: The MAD
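The median absolute deviation is the median of the absolute deviations from the median. The sketch below computes it under a hypothetical name and compares it with the standard deviation on data containing one outlier. Whether the module additionally rescales the result (for example by the normal consistency constant) is not stated here, so the sketch returns the unscaled value.

```python
import numpy as np

def mad(data):
    """Unscaled median absolute deviation (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    return np.median(np.abs(data - np.median(data)))

data = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
# The single outlier inflates the standard deviation but barely moves the MAD
print("MAD:", mad(data), "std:", np.std(data))
```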

resistics.regression.robust.sampleMAD0(data)

Median absolute deviation using an estimate of the location as 0

When the location estimate is zero (rather than the median), the MAD essentially reduces to the median of the absolute values of the data. This should be over non-zero data. Useful for calculating variance of residuals.

Parameters

data (np.ndarray): Data for which to calculate MAD. This is often residuals when using 0 as an estimate of location.

Returns

float: The MAD using zero as an estimate of location
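With the location fixed at zero, the calculation is just the median of the absolute values. The sketch below reads the remark about non-zero data as meaning that exact zeros are excluded before taking the median; that reading, and the function name `mad0`, are assumptions.

```python
import numpy as np

def mad0(resids):
    """Median absolute value about zero (hypothetical helper)."""
    resids = np.asarray(resids, dtype=float)
    # Assumed interpretation: drop exact zeros before taking the median
    nonzero = resids[resids != 0]
    return np.median(np.abs(nonzero))

value = mad0(np.array([0.0, -1.0, 2.0, -3.0, 0.0]))
print(value)
```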

resistics.regression.robust.sampleMedian(data)

Calculate the median of an array

The mean is not a robust estimator of location, as it can be broken by a single outlying value. The median is a more robust choice.

Parameters

data (np.ndarray): Data for which to calculate median

Returns

float: The median

resistics.regression.robust.trimmedMeanLocationWeights(r: numpy.ndarray, k: float) → numpy.ndarray

Trimmed mean location weights

Parameters

r (np.ndarray): Residuals
k (float): Tuning parameter

Returns

weights (np.ndarray): The robust weights
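A trimmed mean weighting is usually a hard cut-off: residuals within the threshold keep full weight and anything beyond it is discarded. The sketch below shows that usual form, assuming the threshold is applied as \(|r| \le k\); the function name is hypothetical and the module may treat the boundary differently.

```python
import numpy as np

def trimmed_mean_weights(r, k):
    """Hard-threshold weights (hypothetical helper, not the module's code)."""
    a = np.abs(np.asarray(r, dtype=float))
    # Full weight inside the threshold, zero weight outside
    return np.where(a <= k, 1.0, 0.0)

w = trimmed_mean_weights(np.array([0.5, -2.0, 2.5]), k=2.0)
print(w)
```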

resistics.regression.robust.weightLS(A: numpy.ndarray, y: numpy.ndarray, weights: numpy.ndarray) → Tuple[numpy.ndarray]

Transform A and y using the weights to perform a weighted least squares

\[\sqrt{weights} y = \sqrt{weights} A x ,\]

is equivalent to,

\[A^H weights y = A^H weights A x ,\]

where \(A^H\) is the Hermitian transpose.

In this method, both y and A are multiplied by the square root of the weights and then returned.

Parameters

y (np.ndarray): Observations
A (np.ndarray): Regressors

Returns

y (np.ndarray): Observations multiplied by the square root of the weights
A (np.ndarray): Regressors multiplied by the square root of the weights
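The equivalence between the square-root-weighted system and the weighted normal equations can be checked numerically. The sketch below uses a hypothetical helper, `weight_ls`, whose return order `(A, y)` is a choice made for this example, and compares the least squares solution of the transformed system with a direct solve of \(A^H W A x = A^H W y\) on random complex data.

```python
import numpy as np

def weight_ls(A, y, weights):
    """Scale rows of A and entries of y by sqrt(weights) (hypothetical helper)."""
    sqrt_w = np.sqrt(weights)
    return A * sqrt_w[:, np.newaxis], y * sqrt_w

rng = np.random.default_rng(1)
A = rng.normal(size=(20, 2)) + 1j * rng.normal(size=(20, 2))
y = rng.normal(size=20) + 1j * rng.normal(size=20)
weights = rng.uniform(0.1, 1.0, size=20)

# Route 1: ordinary least squares on the transformed system
Aw, yw = weight_ls(A, y, weights)
x_sqrt, *_ = np.linalg.lstsq(Aw, yw, rcond=None)

# Route 2: solve the weighted normal equations directly
AH = np.conjugate(A).T
x_normal = np.linalg.solve(AH @ (weights[:, np.newaxis] * A), AH @ (weights * y))

print(np.allclose(x_sqrt, x_normal))
```

Working with the transformed system lets an unweighted solver be reused without forming \(A^H W A\) explicitly.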