PCA In Depth
- crystal0108wong
- Jan 8, 2019
- 2 min read
What is PCA?
PCA is fundamentally a dimensionality reduction algorithm, but it can also be useful as a tool for visualization, noise filtering, feature extraction and engineering, and much more. It combines the input variables in a specific way and then drops the "least important" new variables while retaining the most valuable parts of all of the original variables. The "new" variables produced by PCA are all independent of one another.
PCA finds a linear projection of high-dimensional data into a lower-dimensional subspace such that:
The variance retained is maximized
The least-squares reconstruction error is minimized (both properties are illustrated in the sketch below)
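To make those two criteria concrete, here is a minimal sketch (assuming NumPy and scikit-learn, with made-up toy data) that projects 2-D data onto its first principal component and reports both the variance retained and the reconstruction error:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
# Correlated 2-D toy data (illustrative only)
X = rng.randn(200, 2) @ np.array([[1.0, 0.6], [0.0, 0.4]])

pca = PCA(n_components=1).fit(X)
X_proj = pca.transform(X)               # 1-D projection along the maximum-variance direction
X_back = pca.inverse_transform(X_proj)  # reconstruction from that projection

print("variance retained:", pca.explained_variance_ratio_[0])
print("mean squared reconstruction error:", np.mean((X - X_back) ** 2))
```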
When to Use?
When you want to reduce the number of variables, but aren’t able to identify variables to completely remove from consideration.
When you want to ensure the resulting variables are independent of one another
When you are comfortable making your independent variables less interpretable
How does PCA work?
Calculate a matrix that summarizes how your variables all relate to each other.
Break that matrix down into two components: directions (eigenvectors) and magnitudes (eigenvalues). The new variables defined along those directions are independent of one another, which means Cov(A, B) = 0 for any pair of them.
Transform the data to align with those directions (a short NumPy sketch of these three steps follows this list).
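A minimal NumPy sketch of the three steps on made-up toy data (the array shapes are illustrative only):

```python
import numpy as np

rng = np.random.RandomState(1)
X = rng.randn(100, 3)                   # toy data: 100 samples x 3 variables
Xc = X - X.mean(axis=0)                 # center each variable

cov = np.cov(Xc, rowvar=False)          # step 1: how the variables relate to each other
eigvals, eigvecs = np.linalg.eigh(cov)  # step 2: magnitudes (eigenvalues) and directions (eigenvectors)
order = np.argsort(eigvals)[::-1]       # order the directions by how much variance they carry
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

X_rotated = Xc @ eigvecs                # step 3: align the data with the new directions
# Off-diagonal entries are ~0: the new variables are uncorrelated
print(np.round(np.cov(X_rotated, rowvar=False), 3))
```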
What to do with PCA?
PCA Analysis - Principal components analysis
PCA as dimensionality reduction
PCA as noise filtering (see the sketch below)
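As one illustration of the noise-filtering use, here is a hedged sketch assuming scikit-learn's digits dataset: fit PCA on noisy data, keep only the high-variance components, and reconstruct. The discarded low-variance components carry mostly noise, so the reconstruction is a denoised version of the input.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

digits = load_digits()
rng = np.random.RandomState(42)
noisy = digits.data + rng.normal(scale=4.0, size=digits.data.shape)  # add Gaussian noise

# Keep enough components to explain ~50% of the variance of the noisy data,
# then project back to the original space.
pca = PCA(n_components=0.50).fit(noisy)
denoised = pca.inverse_transform(pca.transform(noisy))
print(pca.n_components_, "components kept out of", noisy.shape[1])
```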
Calculate PCA step by step
Center and normalize the data.
Obtain the eigenvectors and eigenvalues from the covariance matrix or correlation matrix, or perform Singular Value Decomposition (SVD).
The eigendecomposition of the covariance matrix (if the input data was standardized) yields the same results as an eigendecomposition of the correlation matrix, since the correlation matrix can be understood as the normalized covariance matrix. All four approaches yield the same eigenvector and eigenvalue pairs:
Eigendecomposition of the covariance matrix after standardizing the data.
Eigendecomposition of the correlation matrix.
Eigendecomposition of the correlation matrix after standardizing the data.
SVD (Singular Value Decomposition) of the data matrix, which is typically used to improve computational efficiency.
Sort the eigenvalues in descending order and choose the k eigenvectors that correspond to the k largest eigenvalues, where k is the number of dimensions of the new feature subspace (k ≤ d).
Construct the projection matrix W from the selected k eigenvectors.
Transform the original dataset X via W to obtain a k-dimensional feature subspace Y (a NumPy walk-through of these steps follows this list).
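A NumPy walk-through of the steps above on made-up data (the shapes and the choice k = 2 are purely illustrative); the final line checks that the SVD route reaches the same subspace, up to the sign of each component:

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(150, 5)   # toy data: 150 samples, d = 5 variables
k = 2                   # target dimensionality

# 1. Center/normalize (here: standardize each variable)
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# 2. Eigendecomposition of the covariance matrix
eigvals, eigvecs = np.linalg.eigh(np.cov(Xs, rowvar=False))

# 3. Sort eigenpairs by eigenvalue, descending, and keep the top k
order = np.argsort(eigvals)[::-1][:k]

# 4. Projection matrix W (d x k) from the selected eigenvectors
W = eigvecs[:, order]

# 5. Transform X into the k-dimensional subspace Y
Y = Xs @ W

# The same subspace from SVD of the standardized data (right singular vectors)
_, _, Vt = np.linalg.svd(Xs, full_matrices=False)
print(np.allclose(np.abs(Y), np.abs(Xs @ Vt[:k].T)))  # True, up to the sign of each component
```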
Calculate PCA in Python – sklearn


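A typical way to do this with scikit-learn might look like the following (the Iris data and the choice of two components are just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = load_iris().data                       # 150 samples x 4 features

X_std = StandardScaler().fit_transform(X)  # standardize before PCA
pca = PCA(n_components=2)                  # keep the 2 largest principal components
X_pca = pca.fit_transform(X_std)

print("components (rows are principal components):\n", pca.components_)
print("explained variance ratio:", pca.explained_variance_ratio_)
print("reduced shape:", X_pca.shape)       # (150, 2)
```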
PCA vs. LDA
Both Linear Discriminant Analysis (LDA) and PCA are linear transformation methods. PCA yields the directions (principal components) that maximize the variance of the data, whereas LDA finds the directions that maximize the separation (or discrimination) between different classes, which can be useful in pattern classification problems (PCA "ignores" class labels).
In other words, PCA projects the entire dataset onto a different feature (sub)space, and LDA tries to determine a suitable feature (sub)space in order to distinguish between patterns that belong to different classes.
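A short side-by-side sketch, assuming scikit-learn and the Iris data: the only practical difference in the calls is that LDA is given the class labels y, while PCA is not.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# PCA ignores the class labels y ...
X_pca = PCA(n_components=2).fit_transform(X)

# ... while LDA uses them to find directions that separate the classes
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)

print(X_pca.shape, X_lda.shape)  # both (150, 2), but the axes mean different things
```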
Key Points
Correlation is a normalized measure of the amount and direction (positive or negative) in which two columns change together. Covariance is a generalized, unnormalized version of correlation across multiple columns. A covariance matrix collects the covariance of every column of a given matrix with every other column, including itself.
$$\mathrm{COV}(X, Y) = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \qquad \mathrm{CORR}(X, Y) = \frac{\mathrm{COV}(X, Y)}{S_x \, S_y}$$
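A quick NumPy check of the relationship between the two formulas (the toy data here is made up):

```python
import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(100)
y = 0.5 * x + rng.randn(100)

cov_xy = np.cov(x, y)[0, 1]                                   # unnormalized: depends on the units
corr_xy = cov_xy / (np.std(x, ddof=1) * np.std(y, ddof=1))    # normalized to [-1, 1]

print(np.isclose(corr_xy, np.corrcoef(x, y)[0, 1]))           # True
```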