Paper Summary [Deep Deterministic Uncertainty for Semantic Segmentation]

Reza Yazdanfar
5 min read · Jan 9, 2022

Please note that I wrote this post so that, for possible future research, I can look back and review this topic without re-reading the full paper.

The source paper is here.

Deep Deterministic Uncertainty (DDU) makes it viable to compute and separate aleatoric and epistemic uncertainty with a single model. The main observation is that the feature representations of pixels at different locations are similar for the same class, so DDU can be applied independently of pixel location. This pixel-independent DDU results in a considerable reduction in memory consumption compared to a pixel-dependent formulation. The researchers used the DeepLab-v3+ architecture on Pascal VOC 2012 to show their improvements over MC Dropout and Deep Ensembles.

Introduction

In addition to accurate predictions, reliable uncertainty estimates are crucial when deploying deep learning models in safety-critical applications (e.g. autonomous driving, medical diagnosis, etc.). Numerous methods have been proposed for this, most of which demand several forward passes through the model.

There are several methods for obtaining uncertainty in a single forward pass, such as DUQ and ANP; however, although these two methods work, they need comprehensive alterations to the architecture and training setup, with additional hyperparameters that need to be fine-tuned.

DDU:

  • can utilize feature space density with appropriate inductive biases
  • prevents the feature collapse issue

Because of feature collapse, out-of-distribution (OoD) samples can be mapped to in-distribution regions of the feature space, making the model overconfident on such inputs. Therefore, proper inductive biases on the model are necessary for feature space density to capture uncertainty reliably.
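
As a rough illustration of such an inductive bias, here is a minimal PyTorch sketch (not the authors' code) that applies spectral normalization to the convolutions of a residual block; spectral normalization combined with residual connections is one common way to encourage an approximately bi-Lipschitz feature extractor and thus discourage feature collapse.

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Hypothetical tiny building block; the paper itself uses DeepLab-v3+.
# Spectral normalization upper-bounds each layer's Lipschitz constant,
# while the residual connection helps preserve input distances (sensitivity),
# which together discourage feature collapse.
class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = spectral_norm(nn.Conv2d(channels, channels, 3, padding=1))
        self.conv2 = spectral_norm(nn.Conv2d(channels, channels, 3, padding=1))
        self.act = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.act(self.conv1(x)))
        return self.act(x + out)  # residual connection
```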

There are two types of uncertainty:

  1. epistemic uncertainty:
    * captures what the model does not know
    * is high for unseen or OoD inputs and can be reduced with more training data
  2. aleatoric uncertainty:
    * captures ambiguity and observation noise in in-distribution samples

This research applies DDU to semantic segmentation, where the output has the same spatial dimensions as the classified input, with one prediction per pixel. Semantic segmentation was chosen because it naturally comes with significant class imbalance.

DDU in Semantic Segmentation

A brief introduction to DDU:

Once the model is trained (with a bi-Lipschitz constraint), we can compute the feature space means and covariances per class using a single pass over all the training samples. These two are then used to fit a Gaussian Discriminant Analysis (GDA).
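
A minimal sketch of this fitting step, assuming we already have an (N, D) matrix of feature vectors and their integer class labels (function and variable names are illustrative, not the authors' code):

```python
import numpy as np

def fit_gda(features: np.ndarray, labels: np.ndarray, num_classes: int, eps: float = 1e-6):
    """Fit one Gaussian per class (a GDA) from feature vectors.

    features: (N, D) array of feature vectors (e.g. one per labelled pixel).
    labels:   (N,) array of integer class ids in [0, num_classes).
    Returns class-wise means (C, D), covariances (C, D, D) and priors (C,).
    """
    dim = features.shape[1]
    means = np.zeros((num_classes, dim))
    covs = np.zeros((num_classes, dim, dim))
    priors = np.zeros(num_classes)
    for c in range(num_classes):
        feats_c = features[labels == c]
        means[c] = feats_c.mean(axis=0)
        # Small diagonal term keeps the covariance invertible.
        covs[c] = np.cov(feats_c, rowvar=False) + eps * np.eye(dim)
        priors[c] = len(feats_c) / len(features)
    return means, covs, priors
```

In the pixel-independent variant described below, these class-wise means and covariances are shared by every pixel location, which is what saves memory compared to fitting a separate GDA per pixel.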

Pixel-independent class-wise means and covariances:

In semantic segmentation, each pixel has its own prediction and a corresponding distribution in feature space. In this work, the means and covariance matrices are computed per class rather than per pixel, just as in multi-class classification.

Figure 1. L2 distances between the feature space means of different classes for pairs of distant pixels on the Pascal VOC 2012 val set: (left) pixels (10, 255) and (500, 225), (middle) pixels (234, 349) and (36, 22), and (right) pixels (300, 500) and (400, 255)

In this figure, the authors plot the L2 distances between the feature space means of all class pairs. The result is that means of the same class are much closer together than means of different classes. This is sensible because the convolution kernels are shared across all spatial locations of the feature map.
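
To reproduce the kind of comparison shown in Figure 1, one could compute pairwise L2 distances between the class-wise feature means estimated at two different pixel locations; a small sketch (names are mine, not the paper's):

```python
import numpy as np

def mean_distance_matrix(means_a: np.ndarray, means_b: np.ndarray) -> np.ndarray:
    """Pairwise L2 distances between two sets of class-wise feature means.

    means_a, means_b: (C, D) arrays of per-class means estimated at two
    different pixel locations. Entry (i, j) is ||means_a[i] - means_b[j]||_2.
    Small diagonal entries (same class) relative to off-diagonal ones indicate
    that class means are consistent across pixel locations.
    """
    diff = means_a[:, None, :] - means_b[None, :, :]  # (C, C, D)
    return np.linalg.norm(diff, axis=-1)              # (C, C)
```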

Computing Feature Density:

The authors fitted a GDA, treating pixels as independent samples. Two quantities are obtained:

  1. One mean and one covariance per class (not per pixel) are computed, and the GDA is fitted to them.
  2. The per-pixel softmax entropy is obtained from the model.

Consequently, the authors could disentangle aleatoric and epistemic uncertainty with a single deterministic model in semantic segmentation. This can be seen in the figure below:

Applying DDU in the context of semantic segmentation
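
Putting the two pieces together, here is a minimal sketch of how per-pixel epistemic and aleatoric scores could be computed at test time from the fitted class-wise Gaussians and the model's logits (an illustration under my own naming, not the authors' implementation):

```python
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

def pixel_uncertainties(features, logits, means, covs, priors):
    """Per-pixel epistemic and aleatoric uncertainty scores in the spirit of DDU.

    features: (P, D) feature vectors for P pixels.
    logits:   (P, C) classifier logits for the same pixels.
    means, covs, priors: class-wise GDA parameters shared across all pixels.
    """
    num_classes = means.shape[0]
    # Epistemic: negative log feature-space density under the class-wise GDA.
    log_dens = np.stack([
        np.log(priors[c] + 1e-12)
        + multivariate_normal.logpdf(features, mean=means[c], cov=covs[c])
        for c in range(num_classes)
    ], axis=1)                                # (P, C)
    epistemic = -logsumexp(log_dens, axis=1)  # low density => high uncertainty

    # Aleatoric: entropy of the per-pixel softmax distribution.
    probs = np.exp(logits - logits.max(axis=1, keepdims=True))
    probs /= probs.sum(axis=1, keepdims=True)
    aleatoric = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    return epistemic, aleatoric
```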

Experiments

To assess the reliability of DDU for semantic segmentation, the researchers used the Pascal VOC dataset and compared DDU with three uncertainty baselines (softmax entropy, MC Dropout, and Deep Ensembles).

Architecture and training setup:

The hyperparameters used for this study are listed below (see the optimizer sketch after this list):

  • epochs = 50
  • optimizer = SGD (momentum=0.9 and weight decay =5e-4)
  • lr = 0.007
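
In PyTorch terms, the reported setup roughly corresponds to the following (a minimal sketch with a placeholder model, not the authors' training script):

```python
import torch
import torch.nn as nn

# Placeholder model; the paper uses DeepLab-v3+ (Pascal VOC has 21 classes).
model = nn.Conv2d(3, 21, kernel_size=1)

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.007,            # reported initial learning rate
    momentum=0.9,        # reported momentum
    weight_decay=5e-4,   # reported weight decay
)
# Training then runs for 50 epochs over the Pascal VOC 2012 training set.
```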

Baselines and Uncertainty metrics:

  1. Softmax entropy
  2. MC Dropout (MCDO)
  3. Deep Ensembles

Metrics for Evaluation:

To evaluate each method, the authors used p(accurate | certain), p(uncertain | inaccurate), and PAvPU, which are defined below (a sketch of how they could be computed follows the list):

  • p(accurate | certain): the probability of a prediction being accurate given that the model is confident in it
  • p(uncertain | inaccurate): the probability of the model being uncertain on inaccurate predictions
  • PAvPU: the probability of the model being either confident on an accurate prediction or uncertain on an inaccurate one
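
A hedged sketch of how these three metrics could be computed from boolean masks of accuracy and certainty (computed per pixel here for simplicity; the certainty threshold and names are mine):

```python
import numpy as np

def uncertainty_metrics(accurate: np.ndarray, certain: np.ndarray):
    """p(accurate | certain), p(uncertain | inaccurate) and PAvPU.

    accurate: boolean array, True where the prediction matches the label.
    certain:  boolean array, True where the uncertainty is below a threshold.
    """
    n_ac = np.sum(accurate & certain)    # accurate and certain
    n_au = np.sum(accurate & ~certain)   # accurate but uncertain
    n_ic = np.sum(~accurate & certain)   # inaccurate but certain
    n_iu = np.sum(~accurate & ~certain)  # inaccurate and uncertain

    p_acc_given_cert = n_ac / max(n_ac + n_ic, 1)
    p_unc_given_inacc = n_iu / max(n_ic + n_iu, 1)
    pavpu = (n_ac + n_iu) / max(n_ac + n_au + n_ic + n_iu, 1)
    return p_acc_given_cert, p_unc_given_inacc, pavpu
```

Here the "certain" mask would come from thresholding either the feature density (for epistemic uncertainty) or the softmax entropy (for aleatoric uncertainty), depending on which measure is being evaluated.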

These three can be visualized as below:

Fig 3. Evaluation metrics on various baselines

The uncertainty estimates for four samples can be visualized as below:

Fig 4. (a) shows pixel-wise accuracy, with bright signifying accurate and dark inaccurate. (b) and (c) show the predictive entropy (PE) and mutual information (MI) obtained from the MC Dropout (MCDO) baseline, respectively; (d) and (e) show the PE and MI from Deep Ensembles; (f) maps per-pixel softmax entropy; and (g) shows the feature density.
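
As a rough sketch of how PE and MI are typically obtained from Monte Carlo samples (e.g. MC Dropout passes or ensemble members), assuming a stack of per-pixel softmax outputs:

```python
import numpy as np

def pe_and_mi(mc_probs: np.ndarray, eps: float = 1e-12):
    """Predictive entropy and mutual information from MC softmax samples.

    mc_probs: (S, P, C) array of softmax probabilities from S stochastic
    forward passes (or ensemble members), for P pixels and C classes.
    """
    mean_probs = mc_probs.mean(axis=0)  # (P, C) average predictive distribution
    predictive_entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)
    expected_entropy = -(mc_probs * np.log(mc_probs + eps)).sum(axis=2).mean(axis=0)
    mutual_information = predictive_entropy - expected_entropy
    return predictive_entropy, mutual_information
```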

The accuracy on the Pascal VOC 2012 validation set and the runtime of each method are provided in the table below:

Table 2. Pascal VOC validation set accuracy and runtime, in milliseconds, of a single forward pass for each of the above-mentioned baselines

NB: what counts as a single forward pass differs per method (see the sketch after this list):

  1. MC Dropout: it includes 5 stochastic forward passes.
  2. Deep Ensembles: it aggregates predictions from 3 ensemble components.
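
A minimal sketch of what this means in code (the model objects are placeholders, not the paper's implementation):

```python
import torch

@torch.no_grad()
def predict(method: str, x, model=None, ensemble=None, n_mc: int = 5):
    """Illustrative per-method prediction cost for one input batch x."""
    if method in ("softmax", "ddu"):
        return model(x)                      # one pass through one model
    if method == "mc_dropout":
        model.train()                        # keep dropout layers active
        return torch.stack([model(x) for _ in range(n_mc)]).mean(0)   # 5 passes
    if method == "ensemble":
        return torch.stack([m(x) for m in ensemble]).mean(0)          # 3 models
    raise ValueError(method)
```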

Observations:

  • The runtime of DDU and the plain softmax baseline is lower than that of the other methods. (Table 2)
  • DDU has higher values on all three metrics. (Figure 3)
  • DDU's feature density captures epistemic uncertainty, whereas softmax entropy captures aleatoric uncertainty. (Figure 4)

Conclusion:

In the end, we saw that DDU performs well on semantic segmentation with a fully convolutional architecture, and that it can be applied independently of pixel location.

Overall, DDU performed better than the other baseline methods.

NB: quantifying uncertainty in deep/machine learning models can help us debug them and make them more robust.

If you find any errors, please email me at rezayazdanfar1111@gmail.com. Meanwhile, follow me on Twitter here and visit my LinkedIn here. Finally, if you have any ideas or advice, I am open to them; just message me on LinkedIn. 🙂
