On why PCA requires centering
Because without zero-meaning the data you simply cannot do PCA. In terms of linear transformations, PCA performs a rotation of the linear space (right-multiplying the data matrix by the matrix of eigenvectors of the covariance matrix) and then replaces each point by its projection onto a low-dimensional subspace (the subspace spanned by the first n_components eigenvectors) to achieve dimensionality reduction. Note that this is only a rotation, with no translation, so you must first make sure the points in the original space are distributed around the origin; that is the purpose of zero-meaning. Also, anyone who has implemented PCA by hand knows how explained_variance and explained_variance_ratio are computed: explained_variance is each eigenvalue of the covariance matrix divided by the number of samples, and explained_variance_ratio is each eigenvalue divided by the sum of all eigenvalues. Why is it this simple? That, too, comes down to the zero mean: if you derive PCA by maximizing the projected length, this falls out naturally along the way, so I will not expand on it here.
Link: https://www.zhihu.com/question/40956812/answer/848527057
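A quick way to sanity-check the claim above about explained_variance and explained_variance_ratio is to eigendecompose the covariance of the centered data and compare against scikit-learn; a minimal sketch on synthetic data (note that scikit-learn divides by n_samples - 1 rather than by the number of samples):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4)) @ rng.normal(size=(4, 4)) + 10.0  # rows are samples, nonzero mean

# Hand-rolled PCA: center, then eigendecompose the covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (X.shape[0] - 1)          # scikit-learn divides by n_samples - 1
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

pca = PCA(n_components=2).fit(X)
print(np.allclose(pca.explained_variance_, eigvals[:2]))                        # True
print(np.allclose(pca.explained_variance_ratio_, eigvals[:2] / eigvals.sum()))  # True
```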
Centering is not strictly required for PCA, but this conclusion only holds if you strictly use the definitional form of the covariance matrix, $S = XX^T - n\mu\mu^T$, rather than treating $XX^T$ itself as the covariance matrix.
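Put differently, explicit centering and the $n\mu\mu^T$ correction are two routes to the same scatter matrix. A minimal NumPy check of that identity (columns of $X$ are samples here, matching the formula above; the data are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 3, 200
# Columns are samples; the mean is deliberately far from the origin.
X = rng.normal(size=(d, n)) + np.array([[5.0], [-2.0], [8.0]])
mu = X.mean(axis=1, keepdims=True)

S_def = X @ X.T - n * (mu @ mu.T)   # covariance definition applied to the raw data
Xc = X - mu
S_centered = Xc @ Xc.T              # same quantity from explicitly centered data

print(np.allclose(S_def, S_centered))  # True
```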
Centering is an important pre-processing step because it ensures that the resulting components are only looking at the variance within the dataset, and not capturing the overall mean of the dataset as an important variable (dimension). Without mean-centering, the first principal component found by PCA might correspond with the mean of the data instead of the direction of maximum variance.
https://towardsdatascience.com/tidying-up-with-pca-an-introduction-to-principal-components-analysis-f876599af383
Why is minimizing squared residuals equivalent to maximizing variance?
Consider a datapoint $\mathbf{x}_i$ (row of $\mathbf{X}$). Then the contribution of that datapoint to the variance is $\mathbf{x}_i \mathbf{x}_i^T$, or equivalently the squared Euclidean length $\|\mathbf{x}_i\|^2$. Applying the Pythagorean theorem shows that this total variance equals the sum of variance lost (the squared residual) and variance remaining. Thus, it is equivalent to either maximize remaining variance or minimize lost variance to find the principal components.
http://alexhwilliams.info/itsneuronalblog/2016/03/27/pca/
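The Pythagorean step in that argument is easy to verify numerically; a minimal sketch with a single point and an arbitrary one-dimensional subspace:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5)                      # one (centered) datapoint
w = rng.normal(size=5)
w /= np.linalg.norm(w)                      # unit vector spanning a 1-D subspace

proj = (x @ w) * w                          # projection of x onto the subspace
resid = x - proj                            # what the projection loses

total = np.dot(x, x)                        # total variance contributed by x
kept, lost = np.dot(proj, proj), np.dot(resid, resid)
print(np.isclose(total, kept + lost))       # True: Pythagorean decomposition
```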
PCA can only be interpreted as the singular value decomposition of a data matrix when the columns have first been centered by their means.
https://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia
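Concretely, the right singular vectors of the column-centered data matrix should match scikit-learn's PCA components; a sketch on synthetic data (signs of singular vectors are arbitrary, so the comparison is up to sign):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Rows are samples; distinct column scales give well-separated eigenvalues.
X = rng.normal(size=(300, 5)) * np.array([3.0, 2.0, 1.5, 1.0, 0.5]) + 7.0

Xc = X - X.mean(axis=0)                      # center the columns first
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

pca = PCA(n_components=3).fit(X)
# Components match the right singular vectors up to sign.
for k in range(3):
    print(np.allclose(np.abs(Vt[k]), np.abs(pca.components_[k])))
```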
PCA focuses on “explaining” the data matrix using the sample means plus the eigencomponents. When the column mean is far from the origin, the first right singular vector is usually quite highly correlated with the column mean - thus using PCA concentrates on the second, third and sometimes higher order singular vectors. This is a loss of information when the mean is informative for the process under study. On the other hand, when the scatterplot of the data is roughly elliptical, the PCs typically align with the major axes of the ellipse. Due to the uncorrelatedness constraint, if the mean is far from the origin, the first singular vector will be close to the mean and the others will be tilted away from the major axes of the ellipse. Thus the first singular vector will not be informative about the spread of the data, and the second and third singular vectors will not be in the most informative directions. Generally, PCA will be more informative, particularly as a method for plotting the data, than uncentered SVD.
https://online.stat.psu.edu/stat555/node/94/
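The point about the first uncentered singular vector chasing the column mean is easy to reproduce on synthetic data whose cloud sits far from the origin; a minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
mean = np.array([10.0, -6.0, 3.0])
X = rng.normal(size=(500, 3)) + mean               # column mean far from the origin

_, _, Vt = np.linalg.svd(X, full_matrices=False)   # no centering
v1 = Vt[0]

mu_dir = X.mean(axis=0)
mu_dir /= np.linalg.norm(mu_dir)
print(abs(v1 @ mu_dir))   # close to 1: the first singular vector tracks the mean
```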
Since X is zero centered we can think of them as capturing the spread of the data around the mean in a sense reminiscent of PCA.
https://intoli.com/blog/pca-and-svd/
The reconstruction error is minimized by taking as columns of $W$ some $k$ orthonormal vectors maximizing the total variance of the projection.
https://stats.stackexchange.com/questions/130721/what-norm-of-the-reconstruction-error-is-minimized-by-the-low-rank-approximation
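As a small numerical illustration of that statement, projecting onto the top-$k$ right singular vectors never does worse than projecting onto an arbitrary orthonormal $k$-dimensional basis; a sketch with a random comparison basis:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10)) @ rng.normal(size=(10, 10))
Xc = X - X.mean(axis=0)
k = 2

def recon_error(W):
    """Squared reconstruction error after projecting onto the columns of W."""
    return np.linalg.norm(Xc - Xc @ W @ W.T) ** 2

# Top-k right singular vectors (the PCA choice for W).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
W_pca = Vt[:k].T

# An arbitrary orthonormal k-dimensional basis for comparison.
W_rand, _ = np.linalg.qr(rng.normal(size=(10, k)))

print(recon_error(W_pca) <= recon_error(W_rand))   # True
```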
PCA is a regressional model without intercept. Thus, principal components inevitably come through the origin. If you forget to center your data, the 1st principal component may pierce the cloud not along the main direction of the cloud, and will be (for statistics purposes) misleading.
https://stats.stackexchange.com/questions/22329/how-does-centering-the-data-get-rid-of-the-intercept-in-regression-and-pca
Centering brings in a big difference. PCA with centering maximizes the sum-of-squared (SS) deviations from the mean (i.e. variance); PCA on raw data maximizes the SS deviations from the zero point.
https://stats.stackexchange.com/questions/489037/principal-components-with-and-without-centering?noredirect=1&lq=1
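That distinction can be checked directly: the uncentered first direction maximizes the sum of squared projections measured from the origin, while the centered one maximizes it measured from the mean. A minimal sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3)) + np.array([8.0, -5.0, 2.0])
Xc = X - X.mean(axis=0)

v_raw = np.linalg.svd(X, full_matrices=False)[2][0]    # first direction, no centering
v_cen = np.linalg.svd(Xc, full_matrices=False)[2][0]   # first direction, centered

ss_from_zero = lambda v: np.sum((X @ v) ** 2)    # SS of projections of the raw data
ss_from_mean = lambda v: np.sum((Xc @ v) ** 2)   # SS of projections around the mean

print(ss_from_zero(v_raw) >= ss_from_zero(v_cen))   # raw SVD wins on SS from the origin
print(ss_from_mean(v_cen) >= ss_from_mean(v_raw))   # centered PCA wins on variance
```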
SVD and PCA
Singular Value Decomposition
SVD is a matrix factorization technique that decomposes any matrix into three generic and familiar matrices: $U$, $\Sigma$, and $V^T$.
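For reference, the three factors are an orthonormal $U$, a diagonal $\Sigma$ of singular values, and an orthonormal $V^T$; a minimal NumPy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))                 # any rectangular matrix works

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# U and Vt have orthonormal columns/rows; s holds the singular values.
print(np.allclose(A, U @ np.diag(s) @ Vt))  # True: A = U Sigma V^T
```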
Eigenvalues and Eigenvectors
The concept of eigenvectors applies only to square matrices. The vector space spanned by the eigenvectors associated with a given eigenvalue is called its eigenspace.
A square matrix is called diagonalizable if it can be written in the form $ A=PDP^{-1} $, where $D$ is a diagonal matrix whose diagonal entries are the eigenvalues and the columns of $P$ are the corresponding eigenvectors.
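A short numerical check of $A = PDP^{-1}$ (the example matrix is arbitrary, chosen to have distinct eigenvalues):

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])                 # arbitrary diagonalizable matrix

eigvals, P = np.linalg.eig(A)              # columns of P are eigenvectors
D = np.diag(eigvals)
print(np.allclose(A, P @ D @ np.linalg.inv(P)))  # True: A = P D P^{-1}
```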
A Symmetric Matrix is a matrix that is equal to its own transpose ($A = A^T$).
Special properties of a Symmetric Matrix with respect to eigenvalues and eigenvectors: it has only real eigenvalues; it is always diagonalizable; it has orthogonal eigenvectors.
A matrix is called an Orthogonal Matrix if the transpose of the matrix is the inverse of that matrix.
Since the eigenvectors of a Symmetric Matrix are orthogonal to each other, the matrix $P$ in the diagonalization $A = PDP^{-1}$ is an orthogonal matrix. So we say that any Symmetric Matrix is Orthogonally Diagonalizable.
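These properties can be verified together on a random symmetric matrix, for instance with np.linalg.eigh, which is intended for symmetric/Hermitian matrices; a sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
A = B + B.T                                  # a symmetric matrix

w, Q = np.linalg.eigh(A)                     # eigh is for symmetric/Hermitian matrices
print(np.all(np.isreal(w)))                  # eigenvalues are real
print(np.allclose(Q @ Q.T, np.eye(4)))       # eigenvectors are orthonormal
print(np.allclose(A, Q @ np.diag(w) @ Q.T))  # A = Q D Q^T: orthogonally diagonalizable
```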
% For the PCA derived from maximal preserved variance \cite{lee2007nonlinear}, we have the covariance
% of $\mathbf{y}$, which is
% \begin{equation}
% \mathbf{C}_{\mathbf{y} \mathbf{y}}=E\left\{\mathbf{y} \mathbf{y}^T\right\}
% \end{equation}
% This equation is valid only when $\mathbf{y}$ is centered.
The goal of PCA is to maximize the variance of the data along each of the principal components. Centering is an important step because it ensures that the resulting components only capture the variance of the features, rather than treating the feature means as an important direction. Without mean-centering, the first principal component found by PCA might correspond to the mean of the data instead of the direction of maximum variance.
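This behavior is easy to reproduce with scikit-learn, since PCA centers internally while TruncatedSVD does not; a minimal sketch on synthetic data whose mean direction and direction of maximum variance deliberately differ:

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

rng = np.random.default_rng(0)
# Most of the spread is along the second axis, but the mean points along the first.
X = rng.normal(size=(500, 2)) * np.array([0.5, 3.0]) + np.array([20.0, 0.0])

pc1 = PCA(n_components=1).fit(X).components_[0]            # centers internally
svd1 = TruncatedSVD(n_components=1).fit(X).components_[0]  # no centering

print(np.round(np.abs(pc1), 3))    # ~ [0, 1]: the true direction of maximum variance
print(np.round(np.abs(svd1), 3))   # ~ [1, 0]: dominated by the mean of the data
```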