Basics of Machine Learning: Understanding Bayesian Decision Theory
Bayesian Decision Theory involves assessing the potential gains and losses associated with different classification decisions through the use of probabilities and costs. It is a probabilistic approach that uses prior knowledge, observations, and statistical inference to make decisions that minimize expected losses. In this article, we will explore the basics of Bayesian decision theory, including the state of nature, prior, likelihood, posterior, expected loss, and loss function. We will also work through a brief example and show how to implement Bayesian decision theory in Python.
State of Nature
The theory is based on the concept of the state of nature, which is the set of all possible outcomes or events that could occur in a particular situation. The state of nature can be represented by a set of features or variables, which are relevant to the decision-making process. For instance, in medical diagnosis, the state of nature might be the presence or absence of a particular disease, and the features might be the patient’s symptoms, medical history, and test results.
The features of the state of nature are the variables that describe it. They can be continuous or discrete and can follow different types of distributions. For example, in a medical diagnosis problem, the features might be blood pressure, cholesterol level, and age. Each feature can have its own distribution, such as a normal, binomial, or Poisson distribution.
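As an illustration, the short sketch below samples a few such features from different distributions using NumPy; the feature names and distribution parameters are hypothetical and chosen only for demonstration:
import numpy as np

rng = np.random.default_rng(seed=0)

# hypothetical patient features, each with its own distribution
blood_pressure = rng.normal(loc=120, scale=15, size=100)  # continuous, normal
cholesterol = rng.normal(loc=200, scale=30, size=100)     # continuous, normal
num_symptoms = rng.poisson(lam=2, size=100)               # discrete, Poisson
test_positive = rng.binomial(n=1, p=0.3, size=100)        # binary, Bernoulli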
Prior Probabilities
Prior Probability is the probability of a particular state of nature before any data or information is available. Prior probabilities can be based on previous knowledge or assumptions, and they may or may not be accurate. Prior probabilities are important because they serve as a starting point for the calculation of the posterior probability.
Posterior Probability
The posterior probability is the probability of each state of nature after taking into account any new information. This probability is calculated using Bayes’ theorem, which states that the posterior probability is proportional to the product of the prior probability and the likelihood of the data given the state of nature. The posterior probability is the updated probability of the state of nature, taking into account any new information that has been obtained.
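As a minimal sketch of this update rule (with made-up numbers), the posterior over a discrete set of states is obtained by multiplying each prior by the corresponding likelihood and normalizing so the results sum to one:
# posterior is proportional to likelihood times prior, normalized over all states
def bayes_update(priors, likelihoods):
    unnormalized = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(unnormalized)  # p(data), the normalizing constant
    return [u / evidence for u in unnormalized]

# two states with equal priors and made-up likelihoods
print(bayes_update([0.5, 0.5], [0.2, 0.6]))  # -> [0.25, 0.75]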
Loss Function and Expected Loss
In order to make a decision, a decision-maker must also consider the potential losses or costs associated with each possible outcome. The loss function assigns a numerical value to each possible outcome, representing the cost or loss associated with that outcome. The expected loss is the sum of the losses associated with each possible outcome, weighted by their respective probabilities. The expected loss associated with a particular decision is known as the risk of that decision.
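For instance, the short sketch below computes the expected loss (risk) of each possible decision from a loss matrix and a set of posterior probabilities; the numbers are illustrative only:
import numpy as np

# loss[i][j] = loss of deciding class i when the true class is j (zero-one loss here)
loss = np.array([[0.0, 1.0],
                 [1.0, 0.0]])
posteriors = np.array([0.3, 0.7])  # illustrative posterior probabilities of the two states

risk = loss @ posteriors              # expected loss (risk) of each decision
best_decision = int(np.argmin(risk))  # choose the decision with the smallest risk
print(risk, best_decision)            # -> [0.7 0.3] 1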
Example: Flipping a Coin
To illustrate the concepts of Bayesian Decision Theory, let us consider the example of flipping a coin. Suppose we have a coin that we want to test for fairness. We flip the coin 10 times and observe that it comes up heads 7 times and tails 3 times. We want to use Bayesian decision theory to determine whether the coin is fair or biased.
We start by assigning prior probabilities to the two possible states of nature: the coin is fair or the coin is biased. We can use a uniform prior, which means that we assign equal probabilities to both states of nature. Let’s denote the state of nature by θ, where θ=0 represents a fair coin and θ=1 represents a biased coin. Then, the prior probabilities are:
P(θ=0) = 0.5
P(θ=1) = 0.5
Next, we need to define the likelihood of the observed data given each state of nature. We assume that the coin flips are independent and identically distributed (i.i.d.) Bernoulli trials. Let p_θ denote the probability of heads under state θ: for the fair coin p₀ = 0.5, and for the biased coin we assume p₁ = 0.8 (the value used in the code below). Then, the likelihood function is:
p(X|θ) = p_θ⁷(1-p_θ)³
where X represents the observed data, which is 7 heads and 3 tails in this case.
Using Bayes’ theorem, we can calculate the posterior probabilities of the two states of nature:
P(θ=0|X) = p(X|θ=0)P(θ=0) / (p(X|θ=0)P(θ=0) + p(X|θ=1)P(θ=1))
P(θ=1|X) = p(X|θ=1)P(θ=1) / (p(X|θ=0)P(θ=0) + p(X|θ=1)P(θ=1))
Plugging in the prior probabilities and the likelihoods (0.5⁷·0.5³ ≈ 0.00098 for the fair coin and 0.8⁷·0.2³ ≈ 0.00168 for the biased coin), we get:
P(θ=0|X) ≈ 0.368
P(θ=1|X) ≈ 0.632
These are the posterior probabilities of the two states of nature given the observed data. The posterior probability of θ=1 (biased coin) is higher than the posterior probability of θ=0 (fair coin), suggesting that the coin is likely biased.
We can now use a loss function to make a decision. Let’s assume that a correct decision costs nothing and the cost of making a wrong decision is $1. If we decide that the coin is fair when it is actually biased, we incur a loss of $1. If we decide that the coin is biased when it is actually fair, we also incur a loss of $1. The expected loss for each decision is:
E(L(θ=0|X)) = P(θ=1|X) ≈ $0.632
E(L(θ=1|X)) = P(θ=0|X) ≈ $0.368
Therefore, the decision with the lowest expected loss is to choose θ=1 (biased coin).
Calculating and Visualizing the Classifier using Python
To calculate the classifier, we need to define a decision boundary that separates the two states of nature. The decision boundary is determined by the relative values of the posterior probabilities. In this example, the decision boundary is at P(θ=0|X)=0.5, which corresponds to P(θ=1|X)=0.5.
In Python, we can calculate and visualize the classifier as follows:
import numpy as np
import matplotlib.pyplot as plt

# prior probabilities
p0 = 0.5  # fair coin
p1 = 0.5  # biased coin

x = 7  # observed number of heads out of 10 flips

# likelihood of observing x heads in 10 flips for a coin with heads probability theta
def likelihood(x, theta):
    return theta**x * (1 - theta)**(10 - x)

# posterior probabilities of the fair coin (theta = 0.5) and the biased coin (theta = 0.8)
def posterior(x, p0, p1):
    likelihood_0 = likelihood(x, 0.5)
    likelihood_1 = likelihood(x, 0.8)
    numerator_0 = likelihood_0 * p0
    numerator_1 = likelihood_1 * p1
    denominator = numerator_0 + numerator_1
    posterior_0 = numerator_0 / denominator
    posterior_1 = numerator_1 / denominator
    return posterior_0, posterior_1

posterior_0, posterior_1 = posterior(x, p0, p1)
print(f"Posterior probability of fair coin: {posterior_0:.3f}")
print(f"Posterior probability of biased coin: {posterior_1:.3f}")
The output is:
Posterior probability of fair coin: 0.368
Posterior probability of biased coin: 0.632
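Given these posteriors and the $1 zero-one loss from above, we can also compute the expected loss of each decision directly (continuing the script above) and confirm that deciding the coin is biased is the lower-risk choice:
# expected loss (risk) of each decision under the zero-one loss
risk_decide_fair = posterior_1    # lose $1 if the coin is actually biased
risk_decide_biased = posterior_0  # lose $1 if the coin is actually fair
print(f"Risk of deciding fair: {risk_decide_fair:.3f}")      # 0.632
print(f"Risk of deciding biased: {risk_decide_biased:.3f}")  # 0.368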
To visualize the classifier, we can plot the posterior probability of the biased coin as a function of the number of heads. This gives a curve showing how the evidence for a biased coin grows with the number of heads. We can then plot the decision boundary at a posterior of 0.5 to show where we would switch from choosing the fair coin to the biased coin.
# plot the classifier: posterior probability of the biased coin for every possible number of heads
x_values = np.arange(11)  # 0 to 10 heads
posterior_1_values = [posterior(x, p0, p1)[1] for x in x_values]

plt.plot(x_values, posterior_1_values, label="Posterior probability of biased coin")
plt.axhline(y=0.5, color="r", linestyle="--", label="Decision boundary")
plt.xlabel("Number of heads")
plt.ylabel("Posterior probability")
plt.legend()
plt.show()
The resulting plot shows the posterior probability of the biased coin as a function of the number of heads, along with the decision boundary at P(θ=0|X)=0.5.
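We can also read the decision boundary off in terms of the observed data: the smallest number of heads for which the posterior of the biased coin exceeds 0.5. A short check (continuing the script above) shows this threshold is 7 heads out of 10 under our assumed parameters of 0.5 and 0.8:
# find the smallest number of heads for which we would declare the coin biased
threshold = next(x for x in x_values if posterior(x, p0, p1)[1] > 0.5)
print(f"Declare the coin biased when at least {threshold} of 10 flips are heads")  # 7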
Bayesian Classifier
After calculating the posterior probabilities for each class, we can use them to classify new observations. To do so, we simply choose the class with the highest posterior probability as our predicted class. Mathematically, we can express this as:
optimal decision: choose the class $\arg\max_{k \in \{0, 1, \dots, K-1\}} P(y = k \mid \mathbf{x})$
where $K$ is the total number of classes, and $P(y=k | \mathbf{x})$ is the posterior probability of class $k$ given observation $\mathbf{x}$.
In Python, we can calculate the classifier as follows:
def bayesian_classifier(x, p0, p1, theta_0, theta_1):
    # unnormalized posteriors (likelihood times prior); the shared normalizing
    # constant p(x) cancels out, so we can compare these values directly
    posterior_0 = likelihood(x, theta_0) * p0
    posterior_1 = likelihood(x, theta_1) * p1
    return 0 if posterior_0 > posterior_1 else 1
This function takes as input an observation x, the prior probabilities p0 and p1, and the parameters theta_0 and theta_1 of the likelihood function for the two classes. It calculates a quantity proportional to the posterior probability of each class (the normalizing constant cancels when comparing the two), and then returns the class with the highest posterior probability.
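For example, applying it to our coin data (7 heads, priors of 0.5 each, and heads probabilities of 0.5 for the fair coin and 0.8 for the assumed biased coin) predicts class 1, the biased coin:
prediction = bayesian_classifier(x=7, p0=0.5, p1=0.5, theta_0=0.5, theta_1=0.8)
print(prediction)  # -> 1 (biased coin)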
Conclusion
Bayesian Decision Theory provides a framework for making decisions under uncertainty by combining prior knowledge and observed data. It allows us to calculate the probability of different states of nature given the observed data and to choose the decision that minimizes the expected loss or risk. By using Bayes’ theorem, we can update our beliefs about the state of nature as we collect more data, and we can make decisions based on the most likely state of nature.
In this article, we have discussed the key concepts of Bayesian Decision Theory, including the state of nature, prior probability, likelihood function, posterior probability, loss function, and expected loss. We have also discussed the concept of the Bayesian classifier.
Finally, I have tried to condense a vast topic into a short overview. This should give the reader a basic insight into the maze of Bayesian Decision Theory.
For more comprehensive coverage, see:
Pattern Classification by Duda, Hart, and Stork, 2nd Edition
Code on my GitHub repository