Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection

Anomalies are widespread when working with data, and they are especially consequential in time series, so it is crucial to have efficient methods to detect and deal with them. This article illustrates a state-of-the-art model called DGHL for anomaly detection. DGHL uses a ConvNet as a generator and, instead of relying on an encoder, maximizes the observed likelihood directly with the Alternating Back-Propagation algorithm.

Reza Yazdanfar
6 min read · Apr 18, 2022


As you may know, time series are everywhere, in any industry you can think of. We have all heard of outliers: data points that are far from their neighbors or from the rest of the data (local and global outliers) and likely to reduce our model’s accuracy and reliability (or sometimes useful🤞). These anomalies are unavoidable in the real world (sensors, …).

These anomalies can appear in both univariate and multivariate time series datasets, as you can see in Figures 1 and 2 (one for each type: point outliers and subsequence outliers).

Figure 1. Point outliers in time series data. | source
Figure 2. Subsequence outliers in time series data. | source

Now we know what an anomaly is! Great job!😉 Let’s have a general look at the common techniques for detecting anomalies: STL decomposition, CART (Classification and Regression Trees), clustering-based methods, distance-based methods, autoencoders, etc. (for more reading, see the review papers 1, 2, 3). However, while numerous methods deal with streaming time series, only a few can gradually adapt to the evolution of the time series.

This article presents a new model, DGHL: a Deep Generative Model based on a top-down Convolutional Network (ConvNet) that maps multivariate data windows to a hierarchical latent space. DGHL works by maximizing the observed likelihood directly with the Alternating Back-Propagation algorithm and short-run MCMC; consequently, it does not rely on auxiliary networks such as encoders or discriminators.

Please note that I left out most of the mathematical equations to keep the article from getting too long.

Deep Generative model with Hierarchical Latent Factors (DGHL)

Figure 3. Generator architecture, which maps a latent vector Z to a time series window. Each layer of the ConvNet increases the temporal dimension and reduces the filters by a factor of 2. A batch normalization layer and ReLU activations are included between each convolutional layer. | source

1. Hierarchical Latent Factors

Here, a concatenation layer and a top-down Convolutional Network (ConvNet) are used for the state model and the generator model, respectively (see Figure 3).

In Figure 3 we can see the hierarchical latent space with a = [1, 3, 6]. The main idea of this space is to leverage dynamics across time scales: sharing latent vectors across windows lets the model produce realistic time series of arbitrary length while keeping their long-term dynamics. The hierarchy structure can be incorporated as a hyper-parameter to be tuned or specified in advance.

We control the model’s flexibility through the relative size of the lowest-level state vector and the upper levels.
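To make Figure 3 more concrete, here is a minimal PyTorch sketch of such a top-down generator. The latent dimension, filter counts, and kernel settings are my own illustrative assumptions, not the exact DGHL configuration:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Sketch of a top-down ConvNet generator in the spirit of Figure 3:
    maps a latent vector z to a multivariate window. Each transposed-conv
    layer doubles the temporal length and halves the number of filters,
    with batch normalization and ReLU in between."""

    def __init__(self, z_dim=100, n_features=38, base_filters=256):
        super().__init__()
        self.net = nn.Sequential(
            # project z to a short sequence with many filters: length 1 -> 4
            nn.ConvTranspose1d(z_dim, base_filters, kernel_size=4, stride=1),
            nn.BatchNorm1d(base_filters),
            nn.ReLU(),
            # length 4 -> 8, filters 256 -> 128
            nn.ConvTranspose1d(base_filters, base_filters // 2, 4, stride=2, padding=1),
            nn.BatchNorm1d(base_filters // 2),
            nn.ReLU(),
            # length 8 -> 16, filters 128 -> 64
            nn.ConvTranspose1d(base_filters // 2, base_filters // 4, 4, stride=2, padding=1),
            nn.BatchNorm1d(base_filters // 4),
            nn.ReLU(),
            # length 16 -> 32, project down to the observed features
            nn.ConvTranspose1d(base_filters // 4, n_features, 4, stride=2, padding=1),
        )

    def forward(self, z):
        # z: (batch, z_dim) -> (batch, z_dim, 1) so 1-D transposed convs apply
        return self.net(z.unsqueeze(-1))  # (batch, n_features, 32)
```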

2. Training with Alternating Back-Propagation

We use the Alternating Back-Propagation algorithm to learn the parameters Θ of DGHL by maximizing the observed log-likelihood directly:
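The equation image did not survive here; for reference, the objective takes the standard latent-variable form from the ABP literature (my reconstruction, assuming a Gaussian prior on Z and a Gaussian observation model around the generator output G_Θ):

```latex
\log L(\Theta) = \sum_{i=1}^{n} \log p_\Theta(Y_i)
              = \sum_{i=1}^{n} \log \int p_\Theta(Y_i \mid Z_i)\, p(Z_i)\, dZ_i,
\quad \text{with } Y_i \mid Z_i \sim \mathcal{N}\big(G_\Theta(Z_i),\, \sigma^2 I\big),\; Z_i \sim \mathcal{N}(0, I).
```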

Algorithm 1 shows the alternating back-propagation algorithm with mini-batches. Here we have two distinct steps: (1) inferential back-propagation and (2) learning back-propagation. You can see the procedure in Figure 4.

Figure 4. Algorithm 1: Mini-batch Alternating back-propagation (ABP Algorithm) | source

One drawback of MCMC methods is their computational cost. However, Langevin Dynamics relies only on the gradients of the generator function, which can be implemented easily with PyTorch or TensorFlow.
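Since the update only needs gradients of the generator output with respect to Z, autograd does the heavy lifting. Here is a minimal PyTorch sketch of one inference run (the step size, number of steps, and noise scale are my own assumptions, not the paper’s settings):

```python
import torch

def langevin_infer(generator, y, z, n_steps=25, step_size=0.1, sigma=1.0):
    """Sketch of inferential back-propagation: short-run Langevin dynamics
    that draws an approximate posterior sample of z given a window y."""
    z = z.clone().requires_grad_(True)
    for _ in range(n_steps):
        y_hat = generator(z)
        # log p(y, z) up to constants: Gaussian likelihood + Gaussian prior
        log_joint = (-(y - y_hat).pow(2).sum() / (2 * sigma ** 2)
                     - z.pow(2).sum() / 2)
        grad = torch.autograd.grad(log_joint, z)[0]
        with torch.no_grad():
            # gradient ascent on log p(y, z) plus injected Gaussian noise
            z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()
```

The learning back-propagation step then simply takes a gradient step on the generator parameters using the reconstruction error at the inferred z.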

3. Online Anomaly Detection

The DGHL model has learned how to generate time-series windows from our training dataset Y; this lets the model capture normal (non-anomalous) sequential patterns and the relations among the various time series. Here, I will describe the method used to reconstruct windows of unseen test data Yᵗᵉˢᵗ during anomaly detection.

In online anomaly detection, the test set Yᵗᵉˢᵗ is treated as a stream of m time series. As with the training dataset, we split Yᵗᵉˢᵗ into sequential windows with the same parameters Sw and S (window size and step). Anomaly scores are then computed one window at a time.

A novel strategy makes DGHL stand out here: the inferred latent factors correspond to the maximum-a-posteriori (MAP) mode, which is equivalent to minimizing the reconstruction error conditional on the learned models F and G.
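To make this concrete, here is a hedged sketch of the windowed scoring loop, reusing the langevin_infer helper from above (the window handling and the squared-error score are my assumptions):

```python
import torch

def anomaly_scores(generator, y_test, z_init, window_size, stride):
    """Sketch of online scoring: split the stream into windows, infer the
    latent vector per window, and score by reconstruction error."""
    scores = []
    for start in range(0, y_test.shape[-1] - window_size + 1, stride):
        window = y_test[..., start:start + window_size]
        z = langevin_infer(generator, window, z_init)  # MAP-like inference
        recon = generator(z)
        # anomaly score: squared reconstruction error, averaged over features
        scores.append((window - recon).pow(2).mean(dim=1))
    return torch.cat(scores, dim=-1)
```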

4. Online Anomaly Detection with missing data

From Figure 4 (the ABP algorithm), we can see that the first step is inferring latent vectors with Langevin Dynamics. The algorithm can handle missing data by inferring Z from residuals computed only on the observed signal Yₒᵦₛ. The inferred vectors are then samples of the posterior distribution conditional on the observed signal.
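A small variant of the earlier Langevin sketch shows the idea; the mask convention (1 = observed, 0 = occluded) is my assumption:

```python
import torch

def langevin_infer_masked(generator, y, mask, z,
                          n_steps=25, step_size=0.1, sigma=1.0):
    """Like langevin_infer, but the residual is computed only on the
    observed entries of y, so occluded points do not influence z."""
    z = z.clone().requires_grad_(True)
    for _ in range(n_steps):
        residual = (y - generator(z)) * mask  # zero out occluded entries
        log_joint = (-residual.pow(2).sum() / (2 * sigma ** 2)
                     - z.pow(2).sum() / 2)
        grad = torch.autograd.grad(log_joint, z)[0]
        with torch.no_grad():
            z = z + 0.5 * step_size ** 2 * grad + step_size * torch.randn_like(z)
        z.requires_grad_(True)
    return z.detach()
```

Running the generator on the inferred z then fills in the occluded segments, which is exactly the reconstruction shown in Figure 5.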

Interestingly, generative models trained with the ABP algorithm have shown better results than VAEs and GANs when the data has missing values, on both CV and NLP tasks.

Figure 5. Occlusion experiment on machine-1–1 of the SMD. Red lines show the time series reconstructed with DGHL. Gray areas correspond to the information occluded during training. | source

We can see an example of occluded data in Figure 5, where the occluded segments are marked in gray. DGHL reconstructed the data with considerable accuracy, even in periods where most of the data was missing. This is important for anomaly detection because the reconstruction is what the anomaly score is computed from.

The algorithm can also recover the missing data points, which is valuable in full pipelines with downstream applications.

Datasets

In this project, four datasets were used:

  1. Server Machine Dataset (SMD): a multivariate dataset with 38 features for 28 server machines, introduced in the OmniAnomaly paper (Su et al., 2019).

2 & 3. Soil Moisture Active Passive satellite (SMAP) and Mars Science Laboratory rover (MSL): SMAP includes 55 multivariate time series, each with 25 variables; MSL contains 27 time series, each with 55 variables.

4. Secure Water Treatment (SWaT): a public dataset from a water-treatment testbed. The dataset is available here.

Accuracy

Figure 6. F1 scores on SMD dataset using a single threshold across all machines | source

Figure 6 compares DGHL with a number of other methods; DGHL achieves an outstanding F1 score.

Table 1. F1 scores on benchmark datasets (larger is better). The benchmark models’ performance was taken from previous research in 2020 and 2021. First place is marked in bold and second place in bold italic. DGHL corresponds to a simpler model with fully independent latent vectors for each window. | source

Main Reference

  1. Challu, C., et al., Deep Generative model with Hierarchical Latent Factors for Time Series Anomaly Detection. arXiv preprint arXiv:2202.07586, 2022.

Hope you enjoyed this article. Please feel free to contact me via Twitter or LinkedIn for any reason. If you find any errors, please let me know. I wrote this article as a reference for my future projects.
