# Linear Regression

This document describes the Ordinary Least Squares (OLS) linear regression implemented in the evaluation tool of the visualisation platform.

## Model

The regression used in our evaluation follows the linear model:

* y = __X__β + ε

where y is the vector of scores (e.g. community engagement, information knowledge) and __X__ is the matrix of independent demographic variables (e.g. gender, ethnicity) selected by the user. β is the vector of fixed-effect parameters (the regression coefficients) and ε is the vector of random errors.
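
As an illustration of how these quantities might be assembled in code, here is a minimal sketch assuming Python with NumPy; the variable names and values are purely hypothetical and not taken from the platform:

```python
import numpy as np

# Hypothetical example data for n respondents (names and values are illustrative only).
n = 5
scores = np.array([3.2, 4.1, 2.8, 3.9, 4.5])     # y: e.g. a community engagement score
gender = np.array([0, 1, 0, 1, 1], dtype=float)   # a demographic selected by the user
age = np.array([25, 34, 41, 29, 52], dtype=float) # another hypothetical demographic

y = scores
# Design matrix X: a leading column of 1s provides the intercept (coefficient of x^0).
X = np.column_stack([np.ones(n), gender, age])
```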
## Estimating the Coefficients

The estimate of β using Ordinary Least Squares is given by:

* β<sup>_e_</sup> = (__X'X__)<sup>-1</sup> __X'__ y

where __X'__ is the transpose of __X__. We assume normally distributed errors, ε ~ N(0, σ<sup>2</sup>_I_<sub>n</sub>). Since ε is the only random quantity (the regressors are deterministic) and the regressor matrix contains a column of 1s (the coefficient of x<sup>0</sup>, i.e. the intercept, which would absorb any non-zero error mean), we take _E_(ε) = 0, where _E_ denotes the expected value.
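
The estimator can be computed directly from this formula. Below is a minimal sketch, assuming NumPy and simulated data, that evaluates the closed form (via a linear solve rather than an explicit inverse) and cross-checks it against NumPy's own least-squares routine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical design matrix with an intercept column and two predictors.
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)   # y = Xβ + ε

# Closed-form OLS estimate (X'X)^{-1} X'y, computed as a linear solve.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against NumPy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)
```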

Substituting y = __X__β + ε into the estimator, we get:

* β<sup>_e_</sup> = (__X'X__)<sup>-1</sup> __X'__ y = (__X'X__)<sup>-1</sup> __X'__ (__X__β + ε) = β + (__X'X__)<sup>-1</sup> __X'__ ε

leaving us with β<sup>_e_</sup> - β = (__X'X__)<sup>-1</sup> __X'__ ε. As β is a constant, we have Var(β<sup>_e_</sup> - β) = Var(β<sup>_e_</sup>),

where the variance (the covariance matrix of the estimate, whose diagonal entries are the squared standard errors of the coefficients) is given by

* Var(β<sup>_e_</sup> - β) = _E_[(β<sup>_e_</sup> - β)(β<sup>_e_</sup> - β)'] - _E_[(β<sup>_e_</sup> - β)] _E_[(β<sup>_e_</sup> - β)]'

(primes denote the transpose of that particular matrix)
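
As a quick numerical sanity check of this variance identity (illustrative only, assuming NumPy; not part of the tool itself), both sides can be compared on a simulated random vector:

```python
import numpy as np

rng = np.random.default_rng(1)

# Samples of a random 3-dimensional vector Z (one draw per row).
Z = rng.normal(loc=[1.0, -2.0, 0.5], scale=[1.0, 2.0, 0.7], size=(100_000, 3))

# E[ZZ'] - E[Z]E[Z]', estimated from the sample ...
EZZ = (Z.T @ Z) / len(Z)
EZ = Z.mean(axis=0)
cov_identity = EZZ - np.outer(EZ, EZ)

# ... agrees with the sample covariance matrix (population form, dividing by n).
assert np.allclose(cov_identity, np.cov(Z.T, bias=True))
```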

Since we assume __X__ to be deterministic, the expected value only applies to ε:

* Var(β<sup>_e_</sup>) = (__X'X__)<sup>-1</sup>__X'__ _E_(εε')__X__(__X'X__)<sup>-1</sup> - (__X'X__)<sup>-1</sup>__X'__ _E_(ε)_E_(ε)'__X__(__X'X__)<sup>-1</sup>

With _E_(ε) = 0, this becomes:

* Var(β<sup>_e_</sup>) = (__X'X__)<sup>-1</sup>__X'__ _E_(εε')__X__(__X'X__)<sup>-1</sup>

Under the normality assumption, Var(ε) = _E_(εε') = σ<sup>2</sup>_I_, where σ<sup>2</sup> > 0 is the common variance of each element of the error vector (estimated in practice by the mean squared error). This gives us:

* Var(β<sup>_e_</sup>) = (__X'X__)<sup>-1</sup>__X'__ σ<sup>2</sup>_I_ __X__(__X'X__)<sup>-1</sup>

Simplifying, we obtain the variance of each coefficient, given by

* Var(β<sup>_e_</sup>) = σ<sup>2</sup>(__X'X__)<sup>-1</sup>

where the variance of each coefficient in β<sup>_e_</sup> is the corresponding diagonal element of the matrix above, and its standard error is the square root of that element.
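
Putting the pieces together, the sketch below (again assuming NumPy and simulated data, not the platform's actual code) estimates σ<sup>2</sup> from the residuals as RSS/(n − p) and reads the coefficient standard errors off the diagonal of σ<sup>2</sup>(__X'X__)<sup>-1</sup>:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: intercept plus two predictors.
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.4, size=n)

# OLS fit via the closed form.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# Unbiased estimate of sigma^2 from the residuals: RSS / (n - p).
residuals = y - X @ beta_hat
p = X.shape[1]
sigma2_hat = residuals @ residuals / (n - p)

# Coefficient variances are the diagonal of sigma^2 (X'X)^{-1};
# their square roots are the standard errors reported per coefficient.
std_errors = np.sqrt(sigma2_hat * np.diag(XtX_inv))
print(beta_hat, std_errors)
```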