speech

signal

spectrogram: A spectrogram of a time signal is a special two-dimensional representation that displays time on its horizontal axis and frequency on its vertical axis.
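
A quick sketch (mine, not from the notes) of computing such a spectrogram, assuming SciPy is available:

import numpy as np
from scipy.signal import spectrogram   # assumed dependency

fs = 16000                              # assumed sampling rate
t = np.arange(fs) / fs                  # one second of samples
x = np.sin(2 * np.pi * 440 * t)         # toy tone standing in for speech
f, times, Sxx = spectrogram(x, fs=fs, nperseg=400, noverlap=240)
# Sxx[i, j] is the energy at frequency f[i] and time times[j]; plotting
# 10*log10(Sxx) with time on the horizontal axis and frequency on the
# vertical axis gives the spectrogram described above.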

short-time Fourier analysis

Why use it?
Regions of speech signals shorter than about 100 milliseconds often appear to be periodic, so over such short segments we can use the exact definition of the Fourier transform.

spectral leakage

This phenomenon is called spectral leakage because the amplitude of one harmonic leaks into the others and masks their values.

feature extraction

Representation of speech signals in the frequency domain is especially useful because the frequency structure of a phoneme is generally unique.

Sinusoids are important because speech signals can be decomposed as sums of sinusoids.

For voiced sounds there is typically more energy at low frequencies
than at high frequencies, a property called roll-off. To make spectrograms easier to read, the signal is sometimes first preemphasized (typically with a first-order difference FIR filter) to boost the high frequencies
and counter the roll-off of natural speech.
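
A minimal preemphasis sketch (mine); the coefficient 0.97 is a common choice but an assumption, not stated above:

import numpy as np

def preemphasize(x, alpha=0.97):
    # First-order difference FIR filter: y[n] = x[n] - alpha * x[n-1]
    x = np.asarray(x, dtype=float)
    return np.append(x[0], x[1:] - alpha * x[:-1])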

Digital Systems

Linear Time-Invariant Systems and Linear Time-Varying Systems.

The Fourier Transform

Z-Transform

digital filter

filterbank

A filterbank is a collection of filters that span the whole frequency spectrum.

short-time analysis

NP and reduction

P

P is the set of all decision problems that can be solved in polynomial time.

Extended Church-Turing Thesis

Any realistic model of computation can be efficiently
simulated by a Turing machine.
(The ECT is probably false! A probable counterexample is quantum computing. However, the ECT is true for the classical models of computation underlying our laptops.)

NP problem

We do not know of polynomial-time algorithms for these problems, and we cannot prove that no
polynomial-time algorithm exists.

NP stands for Nondeterministic Polynomial time.

NP is the set of decision problems for which there is a polynomial-time algorithm that can verify whether a proposed solution is correct.
Note that showing that a problem is in NP does not mean that the problem can be solved in polynomial time. It only means that a proposed solution can be verified in polynomial time.
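
As an illustration (my own sketch, not from the notes), here is a polynomial-time verifier for Vertex Cover, a problem defined later in this section; writing such a verifier is exactly what showing membership in NP amounts to:

def verify_vertex_cover(edges, k, S):
    # Accept iff S has at most k nodes and every edge has an endpoint in S.
    S = set(S)
    return len(S) <= k and all(u in S or v in S for (u, v) in edges)

# verify_vertex_cover([(1, 2), (2, 3)], 1, {2})  ->  True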

All problems in P are also in NP, but we do not know whether P = NP.

NP-hardness

We say that a problem Y is NP-hard if every problem in NP can be Karp-reduced to Y. To prove NP-hardness for Y, it suffices to find
one other NP-hard problem X and reduce X to Y.

NP-complete problem

A large class of problems in this “gray area” has been characterized,
and it has been proved that they are equivalent in the following sense: a polynomial-time algorithm for any one of them would imply the existence of a polynomial-time algorithm for all of them. These are the NP-complete problems.

Every problem in NP can be reduced to X. If, in addition, X itself is in NP, we call such an X an NP-complete problem.

To prove that a problem is NP-complete, we need to prove that it lies in NP and that it is NP-hard. Membership in NP is proved by verifying a solution in polynomial time; NP-hardness is proved by a Karp reduction from a known NP-hard problem.

The NP-complete problems are the hardest problems in NP.

The Cook-Levin Theorem

The SAT problem is NP-hard.

Hamiltonian Cycle

Reduction from SAT to Hamiltonian Cycle.

Travelling Salesman

Reduction from Hamiltonian Cycle.
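
A sketch (my own) of the standard construction behind this reduction: graph edges get weight 1, non-edges weight 2, and we ask for a tour of cost at most n.

def hamcycle_to_tsp(n, edges):
    E = {frozenset(e) for e in edges}
    dist = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            if u != v:
                dist[u][v] = 1 if frozenset((u, v)) in E else 2
    budget = n   # the graph has a Hamiltonian cycle iff a tour of cost <= n exists
    return dist, budget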

Graph Coloring

Graph Coloring is NP-hard.
2-coloring, however, can be solved in polynomial time, so it is not NP-hard (assuming P != NP).
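
The reason is that 2-coloring can be decided in polynomial time by checking bipartiteness, for example with BFS (my sketch):

from collections import deque

def two_colorable(n, adj):            # adj: list of neighbour lists
    color = [None] * n
    for s in range(n):
        if color[s] is not None:
            continue
        color[s] = 0
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if color[v] is None:
                    color[v] = 1 - color[u]
                    q.append(v)
                elif color[v] == color[u]:
                    return False      # odd cycle: not 2-colorable
    return True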

Vertex Cover

We say that a set of nodes S is a vertex cover if every edge e has at least one end in S.

Set Cover

Given a set U of n elements, a collection S1,…, Sm of subsets of U, and
a number k, does there exist a collection of at most k of these sets whose
union is equal to all of U?
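
A sketch (my own) of the textbook Karp reduction from Vertex Cover to Set Cover that matches these definitions: the universe is the edge set and each vertex becomes the set of edges incident to it, so a vertex cover of size k exists iff a set cover of size k exists.

def vertex_cover_to_set_cover(n, edges, k):
    U = {frozenset(e) for e in edges}
    sets = [{frozenset(e) for e in edges if v in e} for v in range(n)]
    return U, sets, k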

CoNP

The complexity class CoNP consists of all problems where a “no” answer can be efficiently verified.
For every NP-complete problem there is a
corresponding CoNP-complete problem.

We do not know if NP=CoNP.

PSPACE

The complexity class PSPACE consists of all problems that can
be solved by an algorithm using at most a polynomial amount of
space.
It is strongly believed that NP != PSPACE, but we do not even
know with certainty whether P != PSPACE or not!

For many 2-player games like Geography, deciding whether there is a winning strategy from a given position is a PSPACE problem.

BPP and ZPP

BPP

BPP (Bounded-error Probabilistic Polynomial-time)
consists of all decision problems for which there is a polynomial-time randomized algorithm that is correct with probability at least 2/3 on every instance.

ZPP

ZPP (Zero-error Probabilistic Polynomial Time)
consists of all decision problems for which there is a Las Vegas
algorithm running in expected polynomial time.
It is widely believed that P = ZPP = BPP but all three could be
different.

toolnotes

Background

Notes on tools used in everyday study and development work.

Main content

When using vim, if right-clicking enters visual mode instead of pasting, fix it with: :set mouse-=a

Remote Jupyter configuration
https://juejin.cn/post/7026371559971520525

For Debian / Ubuntu: .deb packages installed by apt and dpkg
For Rocky / Fedora / RHEL: .rpm packages installed by yum
For FreeBSD: .txz packages installed by pkg

ssh-keygen -t rsa -b 4096 -C "your_email@example.com"

Kill PyTorch Distributed Training Processes:
kill $(ps aux | grep YOUR_TRAINING_SCRIPT.py | grep -v grep | awk '{print $2}')

git reset --soft HEAD^   # undo the last commit, keep changes staged
git reset --hard HEAD^   # undo the commit and the add as well (changes are discarded)

tokenizing in Unix: the "tr" command
Tokenize a file, then sort and count the tokens:
tr -sc 'A-Za-z' '\n' < $file_name | sort | uniq -c

conda:
conda create -n yourenv python=3.7 pip

find . -name "*.py" | xargs git add --

Export project dependencies:
pip install pipreqs
pipreqs ./ --encoding=utf-8 --force

Inside a docker container, apt install cannot locate packages:
echo "deb http://archive.debian.org/debian stretch main" > /etc/apt/sources.list

https://unix.stackexchange.com/questions/743839/apt-get-update-failed-to-fetch-debian-amd64-packages-while-building-dockerfile-f

PlotNeuralNet

https://pub.towardsai.net/creating-stunning-neural-network-visualizations-with-chatgpt-and-plotneuralnet-adab37589e5

remote develop

https://devblogs.microsoft.com/python/remote-python-development-in-visual-studio-code/

references

https://leimao.github.io/blog/Kill-PyTorch-Distributed-Training-Processes/

PCA

Why PCA needs centering (zero mean)

Without zero-meaning you simply cannot do PCA. Viewed as a linear transformation, PCA rotates the linear space (right-multiplying the data matrix by the matrix of eigenvectors of the covariance matrix) and then replaces each original point by its projection onto a low-dimensional subspace (the subspace spanned by the first n_components eigenvectors) to achieve the dimensionality reduction. Note that this is only a rotation, with no translation, so you first have to make sure the points are distributed around the origin; that is the purpose of zero-meaning. Moreover, if you have ever implemented PCA by hand, you know how explained_variance and explained_variance_ratio are computed: explained_variance is each eigenvalue of the covariance matrix divided by the number of samples, and explained_variance_ratio is each eigenvalue divided by the sum of all eigenvalues. Why can it be this simple? That, too, relies on the zero mean; if you derive PCA via the maximum-projection-length argument, this falls out naturally along the way, which I will not expand on here.

Link: https://www.zhihu.com/question/40956812/answer/848527057

Centering is not always strictly required for PCA, but this conclusion only holds if one strictly uses the definitional form of the covariance matrix, $S = XX^T - n\mu\mu^T$, rather than treating $XX^T$ itself as the covariance matrix.

Centering is an important pre-processing step because it ensures that the resulting components are only looking at the variance within the dataset, and not capturing the overall mean of the dataset as an important variable (dimension). Without mean-centering, the first principal component found by PCA might correspond with the mean of the data instead of the direction of maximum variance.
https://towardsdatascience.com/tidying-up-with-pca-an-introduction-to-principal-components-analysis-f876599af383

Why is minimizing squared residuals equivalent to maximizing variance?
Consider a datapoint $x_i$ (a row of $X$). Then the contribution of that datapoint to the variance is its squared Euclidean length $\|x_i\|^2$ (up to a constant factor $1/n$). Writing $\hat{x}_i$ for its projection onto the principal subspace, the Pythagorean theorem gives $\|x_i\|^2 = \|x_i - \hat{x}_i\|^2 + \|\hat{x}_i\|^2$, so this total variance equals the sum of the variance lost (the squared residual) and the variance remaining. Thus, it is equivalent to either maximize remaining variance or minimize lost variance to find the principal components.

http://alexhwilliams.info/itsneuronalblog/2016/03/27/pca/

PCA can only be interpreted as the singular value decomposition of a data matrix when the columns have first been centered by their means.
https://stats.stackexchange.com/questions/29781/when-conducting-multiple-regression-when-should-you-center-your-predictor-varia

PCA focuses on “explaining” the data matrix using the sample means plus the eigencomponents. When the column mean is far from the origin, the first right singular vector is usually quite highly correlated with the column mean - thus using PCA concentrates on the second, third and sometimes higher order singular vectors. This is a loss of information when the mean is informative for the process under study. On the other hand, when the scatterplot of the data is roughly elliptical, the PCs typically align with the major axes of the ellipse. Due to the uncorrelatedness constraint, if the mean is far from the origin, the first singular vector will be close to the mean and the others will be tilted away from the major axes of the ellipse. Thus the first singular vector will not be informative about the spread of the data, and the second and third singular values will not be in the most informative directions. Generally, PCA will be more informative, particularly as a method for plotting the data, than uncentered SVD.
https://online.stat.psu.edu/stat555/node/94/

Since X is zero centered we can think of them as capturing the spread of the data around the mean in a sense reminiscent of PCA.
https://intoli.com/blog/pca-and-svd/

The reconstruction error is minimized by taking as columns of W some k orthonormal vectors maximizing the total variance of the projection.
https://stats.stackexchange.com/questions/130721/what-norm-of-the-reconstruction-error-is-minimized-by-the-low-rank-approximation

PCA is a regression model without an intercept. Thus, principal components inevitably come through the origin. If you forget to center your data, the 1st principal component may pierce the cloud not along the main direction of the cloud, and will be (for statistics purposes) misleading.
https://stats.stackexchange.com/questions/22329/how-does-centering-the-data-get-rid-of-the-intercept-in-regression-and-pca

Centering brings in a big difference. PCA with centering maximizes SS deviations from the mean (i.e. variance); PCA on raw data maximizes SS deviations from the zero point.
https://stats.stackexchange.com/questions/489037/principal-components-with-and-without-centering?noredirect=1&lq=1

SVD and PCA

https://stats.stackexchange.com/questions/134282/relationship-between-svd-and-pca-how-to-use-svd-to-perform-pca
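
A small NumPy sketch (mine, following the convention in the linked answer, with samples as rows) of doing PCA through the SVD of the centered data matrix:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # toy data, n samples x d features
Xc = X - X.mean(axis=0)                       # centering: the crucial step
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
components = Vt                               # principal directions (rows)
explained_variance = s**2 / (X.shape[0] - 1)  # eigenvalues of the covariance matrix
scores = Xc @ Vt.T                            # projections onto the principal components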

singular value decomposition

SVD is basically a matrix factorization technique, which decomposes any matrix into 3 generic and familiar matrices.

Eigenvalues and Eigenvectors

The concept of eigenvectors applies only to square matrices. The vector space spanned by an eigenvector is called an eigenspace.
A square matrix is called diagonalizable if it can be written in the form $ A=PDP^{-1} $, where D is a diagonal matrix whose diagonal elements are the eigenvalues.
A symmetric matrix is a matrix that is equal to its own transpose.
Special properties of a symmetric matrix with respect to eigenvalues and eigenvectors: it has only real eigenvalues, it is always diagonalizable, and it has orthogonal eigenvectors.
A matrix is called an orthogonal matrix if its transpose is its inverse.
Since the eigenvectors of a symmetric matrix are orthogonal to each other, the matrix P in the diagonalization of A is an orthogonal matrix. So we say that any symmetric matrix is orthogonally diagonalizable.

https://towardsdatascience.com/singular-value-decomposition-and-its-applications-in-principal-component-analysis-5b7a5f08d0bd
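
A quick numerical check (my own sketch) of orthogonal diagonalization of a symmetric matrix with NumPy:

import numpy as np

rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2                             # construct a symmetric matrix
w, P = np.linalg.eigh(A)                      # real eigenvalues, orthonormal eigenvectors
assert np.allclose(P @ np.diag(w) @ P.T, A)   # A = P D P^{-1} with P^{-1} = P^T
assert np.allclose(P.T @ P, np.eye(4))        # the eigenvectors are orthonormal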

% For the PCA derived from maximal preserved variance \cite{lee2007nonlinear}, we have the covariance
% of $\mathbf{y}$, which is
% \begin{equation}
% \mathbf{C}_{\mathbf{y} \mathbf{y}}=E\left\{\mathbf{y} \mathbf{y}^T\right\}
% \end{equation}
% This equation is valid only when $\mathbf{y}$ is centered.
The goal of PCA is to maximize the variance of the data along each of the principal components. Centering is an important step because it ensures that the resulting components are only looking at the variance of features, and not capturing the means of the features as important. Without mean-centering, the first principal component found by PCA might correspond with the mean of the data instead of the direction of maximum variance.

Network Representation Learning

background

Notes recorded while studying at KTH. This is the first blog post about network representation learning, a project for the Machine Learning, Advanced Course.

LINE

Reproducing the paper “LINE: Large-scale Information Network Embedding”.

Alias Table Method

It is a method for efficiently drawing samples from a discrete distribution.
reference:
https://www.keithschwarz.com/darts-dice-coins/
https://blog.csdn.net/haolexiao/article/details/65157026
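
A compact sketch (mine, based on the first reference) of building an alias table in O(n) and drawing samples in O(1):

import random

def build_alias_table(probs):
    n = len(probs)
    scaled = [p * n for p in probs]
    prob, alias = [0.0] * n, [0] * n
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    for i in small + large:               # leftovers get probability 1
        prob[i] = 1.0
    return prob, alias

def alias_draw(prob, alias):
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]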

Negative Sampling

word2vec

Original paper:
Efficient estimation of word representations in vector space.
reference:
word2vec Explained: Deriving Mikolov et al.’s
Negative-Sampling Word-Embedding Method

Skip-Gram Model

Original paper: Distributed Representations of Words and Phrases
and their Compositionality.
The idea behind the word2vec models is that the words that appear in the same context (near each other) should have similar word vectors. Therefore, we should consider some notion of similarity in our objective when training the model. This is done using the dot product since when vectors are similar, their dot product is larger.
reference:
https://www.baeldung.com/cs/nlps-word2vec-negative-sampling
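
A small NumPy sketch (mine, not from the papers) of the skip-gram negative-sampling loss for one (center, context) pair with k sampled negative words:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_loss(v_center, u_context, u_negatives):
    # maximize log sigma(u_o . v_c) + sum_k log sigma(-u_k . v_c),
    # i.e. minimize the negative log-likelihood returned here.
    pos = np.log(sigmoid(u_context @ v_center))
    neg = np.sum(np.log(sigmoid(-(u_negatives @ v_center))))
    return -(pos + neg)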

GraphSAGE

thoughts

Part 22: Machine Learning (5) -- Imbalanced Classification

What is an imbalanced classification problem?

In machine learning, an imbalanced classification problem is one in which the samples are distributed unevenly across classes, with one class having far more samples than another. This imbalance can affect model training and performance evaluation, because the model may tend to predict the class with more samples and make poorer predictions for the class with fewer samples.

How to address it

To address imbalanced classification, the following approaches can be considered: increase the number of minority-class samples or reduce the number of majority-class samples to balance the class distribution, i.e. oversampling (adding minority-class samples) and undersampling (removing majority-class samples); or adjust the classification threshold so that the model is more inclined to identify the minority class, which can be done by tuning the probability threshold applied to the model output.
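
A small NumPy sketch (my own; it assumes class 1 is the minority class) of the two ideas above, random oversampling and threshold adjustment:

import numpy as np

def oversample_minority(X, y, seed=0):
    # Duplicate minority-class samples until both classes have the same size.
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == 1)
    majority = np.flatnonzero(y == 0)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    idx = np.concatenate([majority, minority, extra])
    return X[idx], y[idx]

def predict_with_threshold(probs, threshold=0.3):
    # probs: predicted probability of the minority class; a threshold below
    # 0.5 makes the model more willing to predict the minority class.
    return (probs >= threshold).astype(int)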

Part 21: Java (2) - Collections Framework

Collections framework

The Java collections framework consists of Collection and Map: a Collection stores a collection of objects, while a Map stores a mapping table of key-value pairs (two objects per entry).

Collection

Collection includes Set, List, and Queue, each of which in turn has implementations based on different underlying approaches.

Set

Set includes TreeSet, HashSet, and LinkedHashSet; TreeSet is implemented on top of a red-black tree.

Part 20: Machine Learning (4) - Uplift Models

Uplift model

The uplift model (增益模型 in Chinese) is one of the most mature combinations of causal inference and machine learning in industry. A traditional supervised model predicts some y from an input x, whereas an uplift model focuses on how a change in x affects y. Taking advertising as an example, traditional supervised learning predicts whether a user will click given the user's features, while an uplift model focuses on the effect that showing the ad to the user (or not) has on whether the user buys the advertised product.
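
A minimal sketch (my own; scikit-learn's LogisticRegression stands in for any outcome model) of the simple two-model (T-learner) approach to uplift: fit one model on the treated group and one on the control group, and estimate uplift as the difference of their predicted probabilities.

import numpy as np
from sklearn.linear_model import LogisticRegression   # assumed dependency

def fit_two_model_uplift(X, treatment, y):
    m_treated = LogisticRegression().fit(X[treatment == 1], y[treatment == 1])
    m_control = LogisticRegression().fit(X[treatment == 0], y[treatment == 0])
    def uplift(X_new):
        return (m_treated.predict_proba(X_new)[:, 1]
                - m_control.predict_proba(X_new)[:, 1])
    return uplift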

Causal inference

Causal inference is the process of inferring causal relationships between variables from observed data. In statistics and data science, causal inference involves trying to understand whether one event or behavior caused another. Unlike correlation or association, causal inference tries to determine whether a change in one variable directly causes a change in another.