Binary Classification in Vanguard

[1]:
# © Crown Copyright GCHQ
#
# Licensed under the GNU General Public License, version 3 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.gnu.org/licenses/gpl-3.0.en.html
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

A showcase of the implementation of standard binary classification in Vanguard. The approach used for this implementation borrows heavily from this GPyTorch example.

[2]:
random_seed = 1_989
[3]:
import numpy as np
from gpytorch.likelihoods import BernoulliLikelihood
from gpytorch.mlls import VariationalELBO
from matplotlib import pyplot as plt

from vanguard.classification import BinaryClassification
from vanguard.datasets.classification import BinaryGaussianClassificationDataset
from vanguard.kernels import ScaledRBFKernel
from vanguard.vanilla import GaussianGPController
from vanguard.variational import VariationalInference

Introduction

A standard binary classification problem can be mapped to a regression problem straightforwardly. Instead of treating the labels as class indices, treat them as the extreme points of the interval $[0, 1]$. After regressing on those points, the model makes class predictions by thresholding its output: values in $[0, 0.5)$ denote one class, and values in $[0.5, 1]$ the other. The same value also indicates the model uncertainty, as values closer to the extremes imply more certainty.
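
The thresholding rule can be illustrated with a minimal sketch in plain Python. The function names and the example values are hypothetical and are not part of Vanguard; the certainty score is just one simple way to express "distance from the threshold" on a scale from 0 to 1.

```python
def threshold_predictions(values, threshold=0.5):
    """Assign class 1 to values in [threshold, 1] and class 0 to the rest."""
    return [int(v >= threshold) for v in values]


def certainty(values):
    """Score each value from 0 (at 0.5) to 1 (at either extreme, 0 or 1)."""
    return [2 * abs(v - 0.5) for v in values]


# Made-up regression outputs in the interval [0, 1].
values = [0.05, 0.45, 0.5, 0.93]
labels = threshold_predictions(values)
scores = certainty(values)
print(labels)
```

Here the second value (0.45) and the first value (0.05) receive the same class, but the first one sits much closer to an extreme and so gets a much higher certainty score.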

Because we are regressing on two classes, we use the BernoulliLikelihood to transform the latent posteriors into actual probabilities. The standard output from the model is a Gaussian distribution, so this likelihood employs probit regression, scaling via the standard Gaussian cumulative distribution function $\Phi$ to give us proper probabilities. In particular, the predictive probability under the probit likelihood is calculated in closed form by applying the following formula [Kuss05]:

\[q(y_*=1\mid\mathcal{D},{\pmb{\theta}},{\bf x_*}) = \int {\bf\Phi}(f_*)\mathcal{N}(f_*\mid\mu_*,\sigma_*^2)df_* = {\bf\Phi}\left( \frac{\mu_*}{\sqrt{1 + \sigma_*^2}} \right ).\]

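This means that the predictive uncertainty is properly taken into account: the variance $\sigma_*^2$ appears in the denominator, so a less certain latent prediction is pulled towards a probability of 0.5.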
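
As a sanity check on the formula above, the following sketch compares the closed form with a direct numerical approximation of the integral, using only the Python standard library. The helper names and the values chosen for $\mu_*$ and $\sigma_*^2$ are illustrative assumptions; this is not how GPyTorch or Vanguard compute the quantity internally.

```python
import math


def std_normal_cdf(z):
    """Standard Gaussian cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(x, mean, variance):
    """Density of a Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / variance) / math.sqrt(2.0 * math.pi * variance)


def probit_closed_form(mean, variance):
    """Right-hand side of the formula: Phi(mean / sqrt(1 + variance))."""
    return std_normal_cdf(mean / math.sqrt(1.0 + variance))


def probit_numerical(mean, variance, num_steps=20_000, width=10.0):
    """Midpoint-rule approximation of the integral over mean +/- width standard deviations."""
    std = math.sqrt(variance)
    lower = mean - width * std
    step = 2.0 * width * std / num_steps
    total = 0.0
    for i in range(num_steps):
        f = lower + (i + 0.5) * step
        total += std_normal_cdf(f) * normal_pdf(f, mean, variance) * step
    return total


mu, sigma_sq = 0.8, 2.0  # arbitrary illustrative latent mean and variance
closed = probit_closed_form(mu, sigma_sq)
numeric = probit_numerical(mu, sigma_sq)
plug_in = std_normal_cdf(mu)  # what we would get if the variance were ignored
print(f"closed form: {closed:.4f}, numerical: {numeric:.4f}, ignoring variance: {plug_in:.4f}")
```

The closed-form and numerical values should agree closely, and both should lie nearer to 0.5 than the value obtained by ignoring the variance.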

Data

We use the BinaryGaussianClassificationDataset for this experiment, which creates two classes based on the distance to the centre of a two-dimensional Gaussian distribution. We use relatively few training points to prevent the model overfitting from the outset.

[4]:
DATASET = BinaryGaussianClassificationDataset(
    num_train_points=10, num_test_points=100, rng=np.random.default_rng(random_seed)
)
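
For intuition only, data of this general kind could be generated along the following lines. This is a hypothetical sketch using the standard library, not the implementation behind BinaryGaussianClassificationDataset: the function name, the use of the median distance as the class boundary, and the unit-variance Gaussian are all assumptions made for illustration.

```python
import math
import random


def make_toy_dataset(num_points, seed):
    """Sample 2D standard Gaussian points and label them by distance to the centre."""
    rng = random.Random(seed)
    points = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(num_points)]
    distances = [math.hypot(x, y) for x, y in points]
    # Use the median distance as the class boundary so the classes are roughly balanced.
    radius = sorted(distances)[num_points // 2]
    # Class 0 lies inside the boundary, class 1 on or outside it.
    labels = [int(d >= radius) for d in distances]
    return points, labels


toy_points, toy_labels = make_toy_dataset(num_points=10, seed=1989)
```

The resulting classes form an inner disc and an outer ring, which cannot be separated by a straight line.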

We plot all of the truth data:

[5]:
plt.figure(figsize=(8, 8))
DATASET.plot()
plt.show()

Modelling

Preparing a controller for binary classification is as straightforward as applying the BinaryClassification decorator. Because the BernoulliLikelihood is non-Gaussian, the VariationalInference decorator is required in order to run approximate inference. It also has the added benefit of using a smaller number of inducing points, which enables inference on the larger datasets traditionally used in classification tasks.

[6]:
@BinaryClassification(ignore_all=True)
@VariationalInference(ignore_all=True)
class BinaryClassifier(GaussianGPController):
    pass

We choose a standard kernel: the ScaledRBFKernel. The likelihood class must also be a subclass of BernoulliLikelihood, and the marginal log likelihood class needs to accept a num_data parameter, so the safest bet is the VariationalELBO class.

[7]:
controller = BinaryClassifier(
    DATASET.train_x,
    DATASET.train_y,
    kernel_class=ScaledRBFKernel,
    y_std=0,
    likelihood_class=BernoulliLikelihood,
    marginal_log_likelihood_class=VariationalELBO,
    rng=np.random.default_rng(random_seed),
)

Before we try fitting, let’s see how well the classifier does without any hyperparameter training. We cannot use the posterior_over_point() method, as the model posteriors need to be passed through the likelihood to be properly scaled. Instead, classifiers in Vanguard have a special classify_points method to do this.

[8]:
predictions, probs = controller.classify_points(DATASET.test_x)

Recall that the likelihood scales the output from the model to the interval $[0, 1]$. The uncertainty from that output is ignored, as the means of those distributions already imply the model uncertainty through their distance from the extrema.

The plot below shows the prediction classes. A circle represents a correct prediction, whereas a cross represents an incorrect prediction.

[9]:
plt.figure(figsize=(8, 8))
DATASET.plot_prediction(predictions)
plt.show()

Now we actually try fitting, to see if this improves the performance.

[10]:
loss = controller.fit(100)
print(f"Loss: {loss:.5f}")
Loss: 0.60577
[11]:
predictions, probs = controller.classify_points(DATASET.test_x)
[12]:
plt.figure(figsize=(8, 8))
DATASET.plot_prediction(predictions)
plt.show()
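
The plots give a visual comparison; a single accuracy figure can make the before-and-after comparison more concrete. The sketch below is a plain-Python helper with made-up example labels. In this notebook the predicted classes would come from classify_points and would be compared against the held-out test labels, but the helper itself is hypothetical and not part of Vanguard.

```python
def accuracy(predicted, actual):
    """Return the fraction of predictions that match the true labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one label is required")
    correct = sum(int(p == a) for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels for illustration only.
example_predictions = [0, 1, 1, 0, 1]
example_truth = [0, 1, 0, 0, 1]
score = accuracy(example_predictions, example_truth)
print(f"Accuracy: {score:.2f}")
```

Computing this once before fitting and once after would quantify whether the hyperparameter training helped.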

Conclusions

We have successfully demonstrated binary classification in Vanguard. For classification tasks with more than two classes, check out the multiclass example notebook.