https://pin.it/7ckeyCS

Deep Learning: Neural Networks for Classification with TensorFlow

Naveed Ul Mustafa


In the vast and evolving world of machine learning, building neural network models for classification problems presents its own unique challenges and distinctions, especially when compared to regression tasks. While the foundational steps in crafting a neural network remain consistent across problems, the nuances lie in the approach to certain hyperparameters and the overarching strategy.

Classification, at its core, is about assigning categories or labels to data points. Imagine sorting emails as ‘spam’ or ‘not-spam’, or categorizing images of fruits into their respective types; these are quintessential examples of classification. Delving deeper, we can categorize classification problems into three primary types:

  • Binary Classification: A classic ‘either-or’ scenario where data points are categorized into one of two classes, such as ‘0 or 1’ or ‘spam or not-spam’.
  • Multi-class Classification: Data points can belong to one of several distinct classes, but each data point is assigned to just one class. For instance, identifying a handwritten digit can fall into one of ten classes (0 through 9).
  • Multilabel Classification: Here, a single data sample can be tagged with multiple labels. Think of a movie that can simultaneously belong to genres like ‘action’, ‘comedy’, and ‘romance’.
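
To make the three types concrete, here is a minimal sketch of how their labels are usually laid out as arrays. The sample values and the `genres` list are made up for illustration and do not come from any dataset used in this article.

```python
import numpy as np

# Binary: one integer per sample, either 0 or 1 (e.g. not-spam / spam).
y_binary = np.array([0, 1, 1, 0])

# Multi-class: one integer per sample, chosen from n_classes possible values.
n_classes = 10                                   # e.g. handwritten digits 0-9
y_multiclass = np.array([3, 0, 9, 7])
# The same labels in one-hot form: exactly one 1 per row.
y_multiclass_onehot = np.eye(n_classes, dtype=int)[y_multiclass]

# Multilabel: one 0/1 flag per label, and several flags can be 1 at once.
genres = ["action", "comedy", "romance"]         # hypothetical label set
y_multilabel = np.array([[1, 1, 0],
                         [0, 1, 1],
                         [1, 1, 1]])

print(y_multiclass_onehot.sum(axis=1))  # every row sums to 1
print(y_multilabel.sum(axis=1))         # rows can sum to more than 1
```

The difference between the last two cases is what matters later: a multi-class row sums to exactly one, while a multilabel row can switch on any number of labels.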

In this article, we’ll embark on a journey where we demystify the model-building process for these classification problems. We’ll delve into the intricacies of binary classification, exploring how the neural network model is designed, trained, and evaluated. Following that, we’ll navigate the realm of multiclass classification, understanding its peculiarities and visualizations. By the end, you’ll have a comprehensive understanding of how neural network model-building for classification problems stands distinct from regression, and how to adeptly handle each classification type.

Architecture of a Classification Model

Neural network architectures for classification tasks may seem similar to those for regression on the surface, but there are pivotal differences that cater to the distinct nature of classification problems. As we traverse through binary to multilabel classification, subtle (yet crucial) adjustments are made in the model’s structure, which influences both its building and compilation.

snippet from lecture by Daniel
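
As a rough summary of those adjustments, the sketch below maps each problem type to the output-layer settings commonly paired with it in Keras. `classification_head` is a hypothetical helper written for this article, not part of TensorFlow, and the pairings are the usual defaults rather than a transcription of the lecture material.

```python
def classification_head(problem_type, n_classes=2):
    """Return commonly used output-layer settings for a classification problem.

    Hypothetical helper for illustration only; the strings are the names of
    Keras activations and losses, returned as plain text.
    """
    heads = {
        # One output neuron giving the probability of class 1.
        "binary": {"units": 1,
                   "activation": "sigmoid",
                   "loss": "BinaryCrossentropy"},
        # One neuron per class; softmax makes the outputs sum to 1.
        "multiclass": {"units": n_classes,
                       "activation": "softmax",
                       "loss": "SparseCategoricalCrossentropy (integer labels) "
                               "or CategoricalCrossentropy (one-hot labels)"},
        # One neuron per label; each label gets its own independent sigmoid.
        "multilabel": {"units": n_classes,
                       "activation": "sigmoid",
                       "loss": "BinaryCrossentropy"},
    }
    if problem_type not in heads:
        raise ValueError(f"unknown problem type: {problem_type!r}")
    return heads[problem_type]


print(classification_head("binary"))
print(classification_head("multiclass", n_classes=10))
```

The hidden layers can stay the same across all three cases; it is the output layer and the loss that change.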

Model building for Binary Classification

Binary classification stands distinct in the realm of machine learning tasks, primarily characterized by its black-and-white nature; the labels associated with the data either fall into the category of ‘0’ or ‘1’. These labels, in essence, describe the features of the data, acting as definitive tags that inform the outcome.

Understanding the binary structure is crucial because it influences several stages of model building, such as data preparation, architecture design, the loss function, and evaluation metrics.

Data Preparation:

I will generate circular data with sklearn. Wherever data is involved, visualization should be too, so my approach is to visualize as much as possible.

from sklearn.datasets import make_circles
import pandas as pd

# Make 1000 examples
n_samples = 1000

# Example classification data with sklearn
X,y = make_circles(n_samples,
                   noise = 0.03,
                   random_state = 42)

# convert the generated data to a pandas DataFrame
df = pd.DataFrame({"x_0": X[:,0], "x_1":X[:,1], "label": y})
df

# plot the data
import matplotlib.pyplot as plt

plt.scatter(df["x_0"], df["x_1"], c = y, cmap= plt.cm.RdYlBu)
Pandas dataFrame: df

Model Building:

Model building follows a process similar to the one used for regression; what changes is the hyperparameters.

import tensorflow as tf

# set random seed
tf.random.set_seed(42)

# build a model
model_1 = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_shape = (2,))
])

# Compile a model
model_1.compile(loss = tf.keras.losses.BinaryCrossentropy(),
                optimizer = tf.keras.optimizers.SGD(),
                metrics = ["accuracy"])

# fit the model
history = model_1.fit(X,y,epochs=100, verbose=0)

# model evaluation
model_1.evaluate(X,y)
32/32 [==============================] - 0s 2ms/step - loss: 0.7042 - accuracy: 0.5000
[0.70424884557724, 0.5]

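The model reaches an accuracy of 50%, which on a balanced two-class dataset is no better than guessing, so it has not learned any pattern. The loss of about 0.70 is a binary cross-entropy value, not a percentage; it measures how far the predicted probabilities are from the true labels, and it is close to what a model scores when it is maximally unsure. These numbers need more investigation, and visualizing the model’s predictions is a good way to do it.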
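
To put the loss of about 0.70 in context, the sketch below computes the binary cross-entropy of a "model" that outputs a probability of 0.5 for every sample. The `binary_cross_entropy` function is a plain NumPy re-implementation written for illustration, not the Keras class, and the labels are a balanced stand-in for the `make_circles` labels.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    losses = -(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))
    return float(np.mean(losses))

# Balanced 0/1 labels, 1000 samples (stand-in for the make_circles labels).
y_true = np.array([0, 1] * 500)

# A "model" that is maximally unsure: probability 0.5 for every sample.
y_guess = np.full(y_true.shape, 0.5)

baseline = binary_cross_entropy(y_true, y_guess)
print(round(baseline, 4))  # 0.6931, i.e. ln(2)
```

A trained model that lands at roughly this value, or slightly above it as `model_1` does, has extracted essentially no information from the features.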

import numpy as np

# plot predicted values against actual data
def plot_decision_boundary(model, X, y):
    """
    Plot the decision boundary created by a model predicting on X.
    """
    # coordinates
    x_min, x_max = X[:,0].min()-0.1, X[:,0].max()+0.1
    y_min, y_max = X[:,1].min()-0.1, X[:,1].max()+0.1

    # meshgrid
    xx, yy = np.meshgrid(np.linspace(x_min, x_max, 100), # 100 values evenly b/w x_min & x_max
                         np.linspace(y_min, y_max, 100))
    # Create X values
    x_in = np.c_[xx.ravel(), yy.ravel()] # stack 2D arrays together

    # Make prediction
    y_pred = model.predict(x_in)

    # check for multi-class
    if len(y_pred[0])>1:
        print("doing Multiclass classification")

        # Reshape the prediction
        y_pred = np.argmax(y_pred, axis=1).reshape(xx.shape)
    else:
        print("Binary classification")
        y_pred = np.round(y_pred).reshape(xx.shape)

    # plot the decision boundary
    plt.contourf(xx,yy, y_pred, cmap=plt.cm.RdYlBu)
    # plot the original Data
    plt.scatter(X[:,0], X[:,1], c=y, s=40, cmap = plt.cm.RdYlBu)
    plt.xlim(xx.min(), xx.max())
    plt.ylim(yy.min(), yy.max())


# check what prediction our model is making
plot_decision_boundary(model_1, X=X, y=y)

The function `plot_decision_boundary()` takes the model, the features (X) and the labels (y). It creates a meshgrid spanning the range of values in X, uses the model to make predictions on every point of that grid, and plots the predicted regions together with the boundary between them.
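
The meshgrid step is easier to follow on a tiny grid. The sketch below repeats the same `np.meshgrid`, `ravel` and `np.c_` calls on a 3-by-2 grid, with a made-up rule standing in for `model.predict`; the grid size and the rule are assumptions chosen only to keep the arrays small.

```python
import numpy as np

# 3 values along the x axis, 2 along the y axis (the function above uses 100 x 100).
xx, yy = np.meshgrid(np.linspace(0, 1, 3), np.linspace(0, 1, 2))
print(xx.shape)           # (2, 3): one row per y value, one column per x value

# Flatten both grids and pair them up: one (x, y) row per grid point.
grid_points = np.c_[xx.ravel(), yy.ravel()]
print(grid_points.shape)  # (6, 2): the shape a model with two input features expects

# Stand-in for model.predict: class 1 wherever x is greater than 0.5.
fake_pred = (grid_points[:, 0] > 0.5).astype(float)

# Reshape the flat predictions back onto the grid so contourf can colour regions.
regions = fake_pred.reshape(xx.shape)
print(regions)
```

Reshaping back to `xx.shape` is what lets `plt.contourf(xx, yy, y_pred)` line each prediction up with its grid coordinate.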

It looks like our model is drawing a straight-line (linear) boundary through a non-linear dataset. This calls for adjusting the hyperparameters. What I learned from tweaking, in order:

  1. Add the ‘relu’ activation function to the input layer, and use the Adam optimizer instead of SGD().
  2. Add 1 hidden layer with the ‘relu’ activation function and an output layer. Increase epochs to 600 to learn how long it takes to reach 99% accuracy and 0.3% loss. I achieved the required figures, but training for that long is not sustainable.
  3. Adjust the number of neurons in the input layer and hidden layer, add the ‘sigmoid’ activation function to the output layer, and return epochs to 100.

# Building NN with non-linear activation function + adding extra layers
tf.random.set_seed(42)

# create a model
model_7 = tf.keras.Sequential([
    tf.keras.layers.Dense(4,activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(4,activation=tf.keras.activations.relu),
    tf.keras.layers.Dense(1,activation=tf.keras.activations.sigmoid)
])

# compile (`learning_rate` replaces the deprecated `lr` argument)
model_7.compile(loss = tf.keras.losses.BinaryCrossentropy(),
                optimizer = tf.keras.optimizers.Adam(learning_rate=0.001),
                metrics=["accuracy"])

# fit
history = model_7.fit(X,y,epochs=100)

This time the model appears to draw a clear boundary. That is the power of re-evaluating the steps taken to improve model accuracy.
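
One way to see why the activation functions made the difference: without them, stacked Dense layers collapse into a single linear layer, so extra layers alone cannot bend the boundary. The NumPy sketch below checks this with random weights, which are an assumption standing in for trained ones.

```python
import numpy as np

rng = np.random.default_rng(42)
X_demo = rng.normal(size=(5, 2))            # 5 samples with 2 features each

# Two "Dense" layers without activation: 2 -> 4 -> 1 (random stand-in weights).
W1, b1 = rng.normal(size=(2, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 1)), rng.normal(size=1)
two_linear_layers = (X_demo @ W1 + b1) @ W2 + b2

# The same mapping written as ONE linear layer with merged weights and bias.
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
one_linear_layer = X_demo @ W_merged + b_merged

print(np.allclose(two_linear_layers, one_linear_layer))  # True

# With ReLU between the layers, the merged form no longer describes the
# network, because ReLU zeroes out negative values before the second layer.
def relu(z):
    return np.maximum(z, 0)

with_relu = relu(X_demo @ W1 + b1) @ W2 + b2
```

This is why the linear `model_1` could only split the circles with a straight line, while the ReLU layers in `model_7` let it fit a curved one.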

Model building for Multi-Class Classification

Distinct from binary classification where labels depict a simple ‘yes’ or ‘no’ dichotomy, multi-class classification grapples with data where each sample could belong to one of more than two classes. Consider, for instance, identifying handwritten digits where each image could represent any number from 0 to 9. This broader categorization presents its own set of challenges and nuances compared to both binary and regression model building.

Here the focus is more towards visualization and model improvement.

Data Preparation:

The data is imported from `tensorflow.keras.datasets`. The dataset is fashion_mnist, in which each greyscale image belongs to exactly 1 of 10 classes. It comes already split into a training set and a test set, so there is not much preparation to do. What remains is to identify the training and test sets and to match the unique labels with the class names given in the data description.

import tensorflow as tf
from tensorflow.keras.datasets import fashion_mnist

# The data is already split into training and test sets
(train_data, train_labels), (test_data, test_labels) = fashion_mnist.load_data()

# Create a small list, so we can index onto our training labels (to make it human readable)
class_name = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# plot an example with its label
index_of_choice = 17
plt.imshow(train_data[index_of_choice], cmap = plt.cm.binary)
plt.title(class_name[train_labels[index_of_choice]])

Model building:

It is important to understand why the data is flattened. Flattening turns each 28×28 image into a single vector of 784 values that the fully connected layers can consume directly, and it has advantages in terms of reduced computational complexity, memory efficiency, etc. However, it’s essential to be aware that while flattening is useful in specific scenarios (like transitioning from CNNs to fully connected layers), it’s not always the best approach. For example, at the beginning of a CNN, you’d want to maintain an image’s 2D or 3D structure to ensure convolutional filters can capture spatial features effectively. Flattening too early in such architectures would lead to a loss of valuable spatial information. I applied flattening here for those same reasons.
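
As a small illustration of what flattening does to the data, the NumPy sketch below reshapes a batch of 28×28 arrays into 784-value vectors and shows where two vertically adjacent pixels end up. The arrays are placeholders, not Fashion-MNIST images, and `reshape` is used here as a stand-in for what the `Flatten` layer does to each sample.

```python
import numpy as np

# Placeholder batch: 32 "images" of 28 x 28 pixels.
batch = np.zeros((32, 28, 28), dtype=np.float32)

# Flatten every image into one row, keeping the batch dimension.
flat_batch = batch.reshape(len(batch), -1)
print(flat_batch.shape)  # (32, 784)

# Where does a pixel go? Number the pixels 0..783 to track them.
image = np.arange(28 * 28).reshape(28, 28)
flat_image = image.reshape(-1)

row, col = 5, 10
same_pixel = flat_image[row * 28 + col] == image[row, col]

# Two pixels that touch vertically sit 28 positions apart after flattening,
# which is the spatial information a later layer can no longer see directly.
distance = flat_image[(row + 1) * 28 + col] - flat_image[row * 28 + col]
print(same_pixel, distance)  # True 28
```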

Steps taken before model building:

  1. Since our labels are integers rather than one-hot encoded vectors, I will be using `tf.keras.losses.SparseCategoricalCrossentropy()` instead of `tf.keras.losses.CategoricalCrossentropy()`.
  2. Normalize the data. The features are images with greyscale values from 0 to 255, and such a wide range makes it harder for the model to find appropriate patterns, so normalizing is the way to go.

# Normalization
train_data_norm = train_data/255.0
test_data_norm = test_data/255.0

# check the min & max of normalized data
train_data_norm.min(), train_data_norm.max()

# set random seed
tf.random.set_seed(42)

# create the model
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28,28)),
    tf.keras.layers.Dense(4,activation = "relu"),
    tf.keras.layers.Dense(4,activation = "relu"),
    tf.keras.layers.Dense(10,activation = "softmax"),
])

model.compile(loss = tf.keras.losses.SparseCategoricalCrossentropy(),
              optimizer = tf.keras.optimizers.Adam(),
              metrics = ["accuracy"])

# Fit the model
norm_history = model.fit(train_data_norm,
                         train_labels,
                         epochs=10,
                         validation_data = (test_data_norm, test_labels))

Normalizing pushes training accuracy up to 78.4% and test accuracy to almost 77%, an observable improvement in performance.
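
On the choice of loss in step 1: the sparse and the one-hot versions of categorical cross-entropy compute the same quantity and differ only in the label format they accept. The NumPy sketch below checks this on two made-up softmax outputs; it re-implements the formulas for illustration and does not call the Keras classes.

```python
import numpy as np

# Made-up softmax outputs for 2 samples and 3 classes (each row sums to 1).
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6]])

# Integer labels, the format fashion_mnist provides.
labels = np.array([0, 2])

# "Sparse" version: pick the predicted probability of the true class directly.
sparse_ce = -np.log(probs[np.arange(len(labels)), labels]).mean()

# "Categorical" version: one-hot encode first, then sum over the classes.
one_hot = np.eye(3)[labels]
categorical_ce = -(one_hot * np.log(probs)).sum(axis=1).mean()

print(np.isclose(sparse_ce, categorical_ce))  # True
```

Using the sparse form simply avoids building the one-hot matrix, which for 10 classes would mean ten values stored per label instead of one.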

# visualize the loss
import pandas as pd

# plot normalized data
pd.DataFrame(norm_history.history).plot(title = "Normalized data")

The plot shows a rapid decrease in loss. This sharp decline indicates that the model is effectively learning and optimizing its weights to fit the data better with each passing epoch. In tandem with the decrease in loss, the model’s accuracy on the training data climbs steeply, approaching 80% by the end of the 10th epoch, which reinforces the idea that the model is improving its predictive performance with each epoch.

The normalization of the data (dividing by 255) likely played a key role in helping the neural network converge more efficiently. Normalized data generally leads to faster convergence and can help the optimizer navigate the loss surface more effectively.

Even though there is a significant improvement over the epochs, an accuracy of about 80% and a loss of about 0.6 by the end suggest there is still room for further optimization. Depending on the complexity of the dataset and the problem at hand, further training, model tweaking, or data augmentation might boost performance.

Evaluating a Multi-Class Classification Model

Building a model is just the first step. The subsequent processes of evaluation, optimization, and deployment are equally critical in the journey from data to actionable insights or products.

One way to evaluate a model is to produce a confusion matrix.

import itertools
from sklearn.metrics import confusion_matrix

def plot_confusion_matrix(y_true, y_pred, classes=None, figsize = (10,10), text_size = 15):
    # Create confusion matrix
    cm = confusion_matrix(y_true, y_pred)
    cm_norm = cm.astype("float") / cm.sum(axis=1)[:, np.newaxis] # normalize the confusion matrix
    n_classes = cm.shape[0]

    # Make it more attractive
    fig, ax = plt.subplots(figsize=figsize)
    # Create a matrix plot
    cax = ax.matshow(cm, cmap=plt.cm.Blues)
    fig.colorbar(cax)

    # Label the axis
    if classes:
        labels = classes
    else:
        labels = np.arange(cm.shape[0])

    ax.set(title="Confusion matrix",
           xlabel="Predicted label",
           ylabel="True label",
           xticks=np.arange(n_classes),
           yticks=np.arange(n_classes),
           xticklabels=labels,
           yticklabels=labels)

    # Set x-axis labels to the bottom
    ax.xaxis.set_label_position("bottom")
    ax.xaxis.tick_bottom()

    # Adjust label size
    ax.yaxis.label.set_size(text_size)
    ax.xaxis.label.set_size(text_size)
    ax.title.set_size(text_size)

    # Set threshold for different colors
    threshold = (cm.max() + cm.min()) / 2

    # Plot the text on each cell
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, f"{cm[i, j]} ({cm_norm[i, j] * 100:.1f}%)",
                 horizontalalignment="center",
                 color="white" if cm[i, j] > threshold else "black",
                 size= text_size)

    plt.show()

# predicted labels
test_pred = model.predict(test_data_norm)

# convert all test_preds to integers
test_pred = test_pred.argmax(axis=1)


# plot confusion matrix
plot_confusion_matrix(y_true = test_labels, y_pred = test_pred,
                      classes = class_name,
                      figsize = (20,10),
                      text_size = 7.5)

I borrowed the confusion matrix code from Daniel. The confusion matrix shows where the model makes its errors: it confuses T-shirt/top with Shirt and Dress, Pullover with Coat and Shirt, and Sneaker with Ankle boot. One possible way to reduce the confusion is to merge T-shirt/top and Shirt into a single class.
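
Reading confused pairs off the plot can also be done numerically. The sketch below builds a confusion matrix by hand for three classes and looks up the most frequent mistake. The labels and predictions are toy values invented for illustration, not output from the model above, and `class_subset` is a hypothetical three-class subset.

```python
import numpy as np

class_subset = ["T-shirt/top", "Pullover", "Shirt"]  # hypothetical subset

# Toy true labels and predictions (indices into class_subset).
y_true_toy = np.array([0, 0, 0, 1, 1, 2, 2, 2])
y_pred_toy = np.array([0, 2, 0, 1, 2, 2, 0, 2])

# Rows are true classes, columns are predicted classes.
n = len(class_subset)
cm_toy = np.zeros((n, n), dtype=int)
np.add.at(cm_toy, (y_true_toy, y_pred_toy), 1)
print(cm_toy)

# Per-class recall: correct predictions divided by the row total.
recall = cm_toy.diagonal() / cm_toy.sum(axis=1)

# Most frequent mistake: the largest off-diagonal cell.
errors = cm_toy.copy()
np.fill_diagonal(errors, 0)
true_idx, pred_idx = np.unravel_index(errors.argmax(), errors.shape)
print(f"{class_subset[true_idx]} most often mistaken for {class_subset[pred_idx]}")
```

The same lookup applied to the full 10-class matrix would list confused pairs without having to scan the plot by eye.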

The realm of classification in deep learning offers a fascinating blend of challenges and opportunities. As we delved deep into the nuances of binary and multi-class classification using neural networks, it’s evident that while the foundational steps in constructing these models remain consistent, the intricacies lie in the careful calibration of hyperparameters, data handling, and architecture choices. Through visualization and methodical evaluation, we can continually refine our models, ensuring they not only understand the patterns in our data but also generalize well to unseen samples.

Follow me on Github & X.


Written by Naveed Ul Mustafa

Student, interested in Machine Learning & Gen AI, Computational Neuroscience & Computer Vision
