Neural networks are the building blocks of deep learning. A neural network is a mathematical model loosely inspired by the way the human brain works. Many deep learning problems can be solved with artificial neural networks, and we usually train them with the Python or R programming languages.
You can also think of an artificial neural network as a black box. Deep learning algorithms are generally quite easy to use: if you have a data set, libraries such as “Keras”, “TensorFlow”, and “PyTorch” let you solve many problems easily and quickly. But if you want to learn how neural networks work at a basic level, you shouldn’t start with these libraries. When we build an advanced project, problems will arise, and we need to understand the basics of artificial neural networks to work out how to solve them.
So in this article, I will try to show you, in a simple way, how to create a neural network and how the algorithm works. We will not rely on any programming language for this; we will create our artificial neural network using only matrices.
Artificial Neural Network Terms
We need to know a few terms to understand the working logic of artificial neural networks. Neural networks consist of the following components:
x —> input layer
y —> output layer
w —> weights
b —> bias
sigmoid —> activation function
z = x * w + b
a = sigmoid(z)
sigmoid(z) = 1 / (1 + e ^ (-z))
ŷ = a
e = error
d = derivative
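To make these terms concrete, here is a minimal Python sketch of a single neuron that computes z and a from the definitions above. The input, weight, and bias values are hypothetical, chosen only for illustration; they do not come from the article.

```python
import math

def sigmoid(z):
    """Logistic sigmoid: 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical values, chosen only for illustration.
x = [1.0, 0.0]    # two inputs
w = [0.5, -0.5]   # one weight per input
b = 0.1           # bias

# z = x * w + b, written out as a weighted sum
z = sum(xi * wi for xi, wi in zip(x, w)) + b

# a = sigmoid(z); for the output neuron this is the prediction ŷ
a = sigmoid(z)

print(z, a)
```

Whatever the weighted sum z is, the value of a always lies strictly between 0 and 1.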
Artificial neural networks consist of three parts: an input layer, one or more hidden layers, and an output layer. The input layer represents our dataset. A hidden layer is computed by multiplying the values of the previous layer by the weights, adding the bias, and applying the activation function. We can use as many hidden layers as we want, but the number should match the needs of the project. With too many hidden layers, training takes a very long time and the network may memorize the training data (overfitting). With too few, the network may not learn enough (underfitting). As the name suggests, the output layer gives us the result of our calculations. We can also use as many neurons as we want in each hidden layer.
Activation functions introduce non-linearity into the network. The sigmoid function, for example, brings the output values of each layer into the 0-1 range. Without an activation function, the output values are unbounded and can grow very large from layer to layer, which makes training slow and unstable. Several activation functions are available, such as sigmoid, tanh, and ReLU. We will use the sigmoid function in this tutorial.
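The three activation functions named above have different output ranges. The short sketch below evaluates each of them at a few sample points (the points themselves are arbitrary): sigmoid stays between 0 and 1, tanh stays between -1 and 1, and ReLU clips negative values to 0 but leaves positive values unbounded.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):
    return max(0.0, z)

# Arbitrary sample points, chosen only to show the shape of each function.
for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z), math.tanh(z), relu(z))
```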
Describe the Problem
In the artificial neural network that we will create, we will use a two-input “AND” logic gate. There will be two input values and two examples. There will be one hidden layer with two neurons. In addition, there will be four weights between the input layer and the hidden layer, and two weights between the hidden layer and the output layer.
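As a quick check of these numbers, the sketch below lists the AND truth table and counts the weights for a network with two inputs, two hidden neurons, and one output. The article does not say which two rows of the truth table serve as its two examples, so all four rows are listed here.

```python
# AND truth table: the output is 1 only when both inputs are 1.
and_gate = [((a, b), a & b) for a in (0, 1) for b in (0, 1)]

n_inputs, n_hidden, n_outputs = 2, 2, 1

# One weight per connection between consecutive layers.
n_w1 = n_inputs * n_hidden    # input layer -> hidden layer
n_w2 = n_hidden * n_outputs   # hidden layer -> output layer

print(and_gate)
print(n_w1, n_w2)
```

The counts match the description: four weights in the first matrix (2 x 2) and two in the second (2 x 1).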
Creating the Neural Network
Artificial neural network training consists of two main parts.
- Calculating the predicted output ŷ, known as feedforward
- Updating the weights and biases, known as backpropagation
The following image shows neural network training.
Step 1: Find the z1 values by multiplying the inputs by w1, then adding b1.
z1 = x * w1 + b1
Step 2: Find the a1 values by applying the sigmoid function to z1.
a1 = sigmoid(z1)
Step 3: Find the z2 values by multiplying a1 by w2, then adding b2.
z2 = a1 * w2 + b2
Step 4: Find the a2 values by applying the sigmoid function to z2.
a2 = sigmoid(z2)
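The four feedforward steps can be sketched in plain Python, with each matrix stored as a list of rows. The helper names (matmul, add_bias, apply_sigmoid, feedforward), the two input rows, and the starting weights are all assumptions made for illustration; the article does not specify them.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def add_bias(Z, b):
    """Add the bias vector b to every row of Z."""
    return [[z + bj for z, bj in zip(row, b)] for row in Z]

def apply_sigmoid(Z):
    return [[sigmoid(z) for z in row] for row in Z]

def feedforward(X, W1, b1, W2, b2):
    Z1 = add_bias(matmul(X, W1), b1)    # Step 1: z1 = x * w1 + b1
    A1 = apply_sigmoid(Z1)              # Step 2: a1 = sigmoid(z1)
    Z2 = add_bias(matmul(A1, W2), b2)   # Step 3: z2 = a1 * w2 + b2
    A2 = apply_sigmoid(Z2)              # Step 4: a2 = sigmoid(z2)
    return Z1, A1, Z2, A2

# Hypothetical inputs and starting parameters.
X = [[0.0, 1.0], [1.0, 1.0]]    # two examples, two inputs each
W1 = [[0.1, 0.2], [0.3, 0.4]]   # 2 x 2: input layer -> hidden layer
b1 = [0.0, 0.0]
W2 = [[0.5], [0.6]]             # 2 x 1: hidden layer -> output layer
b2 = [0.0]

Z1, A1, Z2, A2 = feedforward(X, W1, b1, W2, b2)
print(A2)   # one prediction per example
```

Each row of X is one example, so A2 has one row per example and a single column for the single output neuron.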
There are many loss functions that we can use in our artificial neural network. In this tutorial, we’ll use the sum-of-squares error as our loss function.
That is, the sum-of-squares error is the sum of the squared differences between each predicted value and the actual value. The difference can be positive or negative, so we square it to keep errors of opposite sign from cancelling each other out.
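Here is a small sketch of the sum-of-squares error as described above. The function name and the sample predictions are hypothetical.

```python
def sum_of_squares_error(y_true, y_pred):
    """Sum of squared differences between predictions and actual values."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred))

y_true = [0.0, 1.0]
y_pred = [0.5, 0.5]   # hypothetical predictions

error = sum_of_squares_error(y_true, y_pred)
print(error)   # 0.25 + 0.25 = 0.5
```

The two differences are +0.5 and -0.5. Without squaring they would sum to zero, even though both predictions are off.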
So far, we have found the error of our prediction. Now we need to update our weights and biases to reduce it. To do so, we need the derivative of the loss function with respect to the weights and biases. Recall that the derivative of a function is its slope. If we can calculate the derivative, we can update the weights and biases by increasing or decreasing them in the direction that lowers the error. This is called gradient descent.
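To see the idea on its own, the sketch below runs gradient descent on a one-variable function instead of a neural network. The loss function, the starting point, and the step size (usually called the learning rate) are all assumptions made for illustration.

```python
def loss(w):
    return (w - 3.0) ** 2     # hypothetical loss with its minimum at w = 3

def d_loss(w):
    return 2.0 * (w - 3.0)    # its derivative, i.e. the slope at w

w = 0.0                # arbitrary starting point
learning_rate = 0.1    # assumed step size

for _ in range(100):
    # Move against the slope: a positive slope lowers w, a negative slope raises it.
    w -= learning_rate * d_loss(w)

print(w, loss(w))
```

After enough steps, w settles very close to 3, where the slope is zero and the loss is smallest.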
However, we can’t calculate the derivative of the loss function with respect to the weights and biases directly, because the loss function does not contain the weights and biases explicitly; it depends on them only through the prediction ŷ. So we need to use the chain rule. Each factor in the chain rule is a partial derivative.
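Written out for our network, and using E for the error, the chain rule gives the following products. This is a sketch in the article's notation (z1, a1, z2, a2, w1, w2); the symbol E is introduced here for convenience.

```latex
\begin{align}
\frac{\partial E}{\partial w_2}
  &= \frac{\partial E}{\partial a_2}
     \cdot \frac{\partial a_2}{\partial z_2}
     \cdot \frac{\partial z_2}{\partial w_2} \\
\frac{\partial E}{\partial w_1}
  &= \frac{\partial E}{\partial a_2}
     \cdot \frac{\partial a_2}{\partial z_2}
     \cdot \frac{\partial z_2}{\partial a_1}
     \cdot \frac{\partial a_1}{\partial z_1}
     \cdot \frac{\partial z_1}{\partial w_1}
\end{align}
```

The expressions for b2 and b1 have the same form, with the last factor replaced by the derivative of z2 with respect to b2 and of z1 with respect to b1.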
Step 1: Find the derivative of the loss function with respect to the prediction. d_error = (ŷ – y) or (a2 – y). The constant factor 2 from the square is omitted, since it only rescales the update.
Step 2: Find the derivative of a2 with respect to z2.
Step 3: Find the derivative of z2 with respect to w2.
Step 4: Find the derivative of z2 with respect to b2.
Step 5: Find the derivative of the error with respect to w2.
Step 6: Find the derivative of the error with respect to b2.
Step 7: Update the w2 and b2 values.
Step 8: Find the derivative of z2 with respect to a1.
Step 9: Find the derivative of a1 with respect to z1.
Step 10: Find the derivative of z1 with respect to w1.
Step 11: Find the derivative of z1 with respect to b1.
Step 12: Find the derivative of the error with respect to w1.
Step 13: Find the derivative of the error with respect to b1.
Step 14: Update the w1 and b1 values.
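The backpropagation steps above can be sketched in plain Python as follows. Several things here are assumptions rather than the article's own choices: the helper names, the two training examples, the starting weights, the learning rate lr, and the number of iterations. The sketch also computes every gradient before applying any update, so that the derivative of z2 with respect to a1 uses the same w2 as the forward pass.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def elementwise(f, A, B):
    return [[f(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def forward(X, W1, b1, W2, b2):
    Z1 = [[z + b for z, b in zip(row, b1)] for row in matmul(X, W1)]
    A1 = [[sigmoid(z) for z in row] for row in Z1]
    Z2 = [[z + b for z, b in zip(row, b2)] for row in matmul(A1, W2)]
    A2 = [[sigmoid(z) for z in row] for row in Z2]
    return A1, A2

def sse(Y, A2):
    return sum((a - y) ** 2 for ra, ry in zip(A2, Y) for a, y in zip(ra, ry))

def train_step(X, Y, W1, b1, W2, b2, lr):
    A1, A2 = forward(X, W1, b1, W2, b2)
    # Steps 1-2: d_error = (a2 - y), times da2/dz2 = a2 * (1 - a2)
    d_z2 = elementwise(lambda a, y: (a - y) * a * (1.0 - a), A2, Y)
    # Steps 3-6: dz2/dw2 = a1 and dz2/db2 = 1
    d_W2 = matmul(transpose(A1), d_z2)
    d_b2 = [sum(col) for col in zip(*d_z2)]
    # Steps 8-9: dz2/da1 = w2, then da1/dz1 = a1 * (1 - a1)
    d_a1 = matmul(d_z2, transpose(W2))
    d_z1 = elementwise(lambda d, a: d * a * (1.0 - a), d_a1, A1)
    # Steps 10-13: dz1/dw1 = x and dz1/db1 = 1
    d_W1 = matmul(transpose(X), d_z1)
    d_b1 = [sum(col) for col in zip(*d_z1)]
    # Steps 7 and 14: gradient descent updates
    W2 = elementwise(lambda w, g: w - lr * g, W2, d_W2)
    b2 = [b - lr * g for b, g in zip(b2, d_b2)]
    W1 = elementwise(lambda w, g: w - lr * g, W1, d_W1)
    b1 = [b - lr * g for b, g in zip(b1, d_b1)]
    return W1, b1, W2, b2

# Hypothetical training data (two rows of the AND gate) and starting parameters.
X = [[0.0, 1.0], [1.0, 1.0]]
Y = [[0.0], [1.0]]
W1, b1 = [[0.1, 0.2], [0.3, 0.4]], [0.0, 0.0]
W2, b2 = [[0.5], [0.6]], [0.0]

initial_error = sse(Y, forward(X, W1, b1, W2, b2)[1])
for _ in range(1000):
    W1, b1, W2, b2 = train_step(X, Y, W1, b1, W2, b2, lr=0.5)
final_error = sse(Y, forward(X, W1, b1, W2, b2)[1])

print(initial_error, final_error)
```

Repeating the forward pass and the updates many times is what training means here: each iteration nudges the weights and biases in the direction that lowers the error.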
In this tutorial, we created an artificial neural network from scratch with matrices. In the next tutorial, we will see how to create an artificial neural network from scratch with Python.