Sparse Autoencoder
In 1989, a study examined the relationship between a one-layer autoencoder and principal component analysis (PCA) and found that such an autoencoder behaves very similarly to PCA. However, training the autoencoder is computationally more expensive, because PCA can be computed directly through matrix decomposition. This limitation narrowed the early applications of autoencoders. Later, non-linear activation functions enabled autoencoders to learn more valuable features.
In the years since, numerous studies have built on this idea, and today autoencoders are clearly viable, with their popularity continuing to grow.
Supervised learning with neural networks on labeled data has been described widely. However, we do not always have labeled data to feed supervised models such as ANNs (Artificial Neural Networks); in those cases we turn to unsupervised learning algorithms that train on an unlabeled data set {x(1), x(2), x(3), …}, where each x(i) is real-valued. An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation while setting the targets equal to the inputs, i.e. y(i) = x(i).
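As a minimal sketch of this setup (the framework, layer sizes, and random data here are illustrative choices, not from the original text), the key point is that the reconstruction is compared against the input itself, so no labels are needed:

```python
import torch
from torch import nn

# A tiny fully connected autoencoder; 784 -> 64 -> 784 is just an illustrative shape.
model = nn.Sequential(
    nn.Linear(784, 64), nn.Sigmoid(),   # encoder
    nn.Linear(64, 784), nn.Sigmoid(),   # decoder
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.rand(32, 784)                            # an unlabeled mini-batch: no labels required
reconstruction = model(x)
loss = nn.functional.mse_loss(reconstruction, x)   # target y(i) = x(i) is the input itself
loss.backward()
optimizer.step()
```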
We can see an autoencoder in the figure below:
During training, the autoencoder tries to learn an approximation to the identity function, so that h_{W,b}(x) ≈ x. This looks like a trivial function to learn, but by placing limitations on the network we can still discover interesting structure in the data. Even if the number of hidden units is larger than the number of inputs, we can still uncover this structure; in other words, if a sparsity constraint is imposed on the hidden neurons, the algorithm remains productive.
Briefly, we say a neuron is active when its output is close to 1 and inactive when its output is close to 0. Generally, we are interested in constraining the neurons to be inactive (near 0) most of the time.
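To make "active" and "inactive" concrete, here is a small sketch (the layer sizes and the 0.5 threshold are illustrative) that measures each hidden unit's average activation over a batch, which is the quantity the sparsity constraint acts on:

```python
import torch
from torch import nn

# Even with more hidden units (1024) than inputs (784), a sparsity constraint on the
# hidden activations can keep the learned structure interesting.
encoder = nn.Sequential(nn.Linear(784, 1024), nn.Sigmoid())

x = torch.rand(32, 784)
activations = encoder(x)               # sigmoid outputs in (0, 1): near 1 = active, near 0 = inactive
rho_hat = activations.mean(dim=0)      # average activation of each hidden unit over the batch
print((rho_hat > 0.5).float().mean())  # fraction of units that are, on average, "active"
```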
An autoencoder offers simpler learning than other latent-variable approaches; in other words, it maps the original data into a low-dimensional space. The name combines two parts, auto + encoder: the first part reflects that the model is unsupervised, and the second that it learns another representation of the data. The autoencoder learns an encoded representation by minimizing the loss between the original data and the data decoded from this representation.
The autoencoder's objective function is to reconstruct the input. During training, each hidden neuron's weights combine the outputs of the previous layers, and these weights grow as the model becomes deeper. This growth makes the produced features reflect the network structure more than its input, which is not what we want. To prevent this, the sparse autoencoder imposes weight-decay regularization to keep the neurons' weights small.
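A hedged sketch of this regularization (the λ value and model shape are illustrative): the weight-decay term adds the squared magnitude of the weights to the reconstruction loss, which pushes them toward small values:

```python
import torch
from torch import nn

model = nn.Sequential(nn.Linear(784, 64), nn.Sigmoid(),
                      nn.Linear(64, 784), nn.Sigmoid())
x = torch.rand(32, 784)

reconstruction_loss = nn.functional.mse_loss(model(x), x)
# Sum of squared weights (biases are usually left out of weight decay).
weight_decay = sum((p ** 2).sum() for p in model.parameters() if p.dim() > 1)
lam = 1e-4                                        # illustrative regularization strength
loss = reconstruction_loss + (lam / 2) * weight_decay
```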
The objective function of the sparse autoencoder is:
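The equation image from the original post is not reproduced here; assuming the standard sparse autoencoder formulation (as in Andrew Ng's UFLDL notes), the objective has the form:

$$
J_{\text{sparse}}(W, b) = J(W, b) + \beta \sum_{j=1}^{s_2} \mathrm{KL}\big(\rho \,\|\, \hat{\rho}_j\big),
\qquad
\mathrm{KL}\big(\rho \,\|\, \hat{\rho}_j\big) = \rho \log\frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log\frac{1 - \rho}{1 - \hat{\rho}_j}
$$

where J(W, b) is the reconstruction cost plus the weight-decay term, s_2 is the number of hidden units, ρ is the target sparsity level, and ρ̂_j is the average activation of hidden unit j over the training set.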
Here, β controls the weight of the sparsity penalty term, and ρ̂_j (implicitly) depends on W and b as well, since it is the average activation of hidden unit j.
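As a sketch of how this penalty can be computed in code (the function name, ρ = 0.05, and β = 3 are illustrative choices, not from the article), the KL term compares the observed mean activations with the sparsity target and is added to the reconstruction cost; the weight-decay term from the previous sketch would be added in the same way:

```python
import torch
from torch import nn

def kl_sparsity_penalty(rho_hat, rho=0.05, eps=1e-8):
    """KL divergence between the target sparsity rho and the observed mean activations rho_hat."""
    rho_hat = rho_hat.clamp(eps, 1 - eps)          # avoid log(0)
    return (rho * torch.log(rho / rho_hat)
            + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()

encoder = nn.Sequential(nn.Linear(784, 64), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())

x = torch.rand(32, 784)
hidden = encoder(x)
reconstruction = decoder(hidden)

beta = 3.0                                         # weight of the sparsity penalty term
loss = (nn.functional.mse_loss(reconstruction, x)                 # reconstruction cost
        + beta * kl_sparsity_penalty(hidden.mean(dim=0)))         # sparsity penalty
```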
Here is a list of some popular applications of autoencoders:
- Dimensionality reduction
- Image denoising
- Feature extraction
- Recommendation system
- Image compression
In summary, an autoencoder is an FFNN (Feed-Forward Neural Network) that we use to transfer the input neurons to the output neurons through one or more hidden layers. It operates in two phases, encoding and decoding. In the encoding stage, the input data are mapped to a low-dimensional representation space to capture the most useful features; in the decoding stage, that representation is mapped back to the input space.
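Putting the two phases together, here is a hedged end-to-end sketch (the class and attribute names are my own, and the sizes are illustrative): the encoder maps the input to the low-dimensional representation, and the decoder maps it back to the input space; the encoder output is also what you would keep for applications such as dimensionality reduction or feature extraction.

```python
import torch
from torch import nn

class SparseAutoencoder(nn.Module):
    """Feed-forward autoencoder with an explicit encoding (compress) and decoding (reconstruct) phase."""

    def __init__(self, input_dim=784, hidden_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(hidden_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        code = self.encoder(x)      # encoding phase: map input to the low-dimensional space
        return self.decoder(code)   # decoding phase: map the representation back to the input space

model = SparseAutoencoder()
x = torch.rand(32, 784)
reconstruction = model(x)           # used for the reconstruction loss during training
features = model.encoder(x)         # the low-dimensional code, e.g. for feature extraction
```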
Please feel free to contact me on LinkedIn and have a chat.🙂