In the previous tutorial, we built an artificial neural network from scratch using only matrices. In this tutorial, we'll build an artificial neural network in Python using only the NumPy library, moving through it step by step. You can follow along in any other programming language as well.


Describe The Network Structure

The data for our artificial neural network consists of three inputs and eight rows, but we will use only six rows for training; the remaining rows will serve as test data. The network itself has one hidden layer and an output layer. The hidden layer consists of five neurons. The weights of the network will be initialized randomly, and the biases will start at 1.

[Figure: structure of the neural network (3 inputs, 5 hidden neurons, 1 output)]

Define the Variables


In this part, we will define the variables that we will use. These variables are all matrices, so we can define them easily with the Python NumPy library.


First, we need to install the NumPy library, which we can do with the following command.

pip install numpy

Defining inputs and output:

import numpy as np
# Six training rows; each output value equals the third column of its input row
inputs = np.array([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1]])
output = np.array([[0],[1],[0],[1],[0],[1]])
print("inputs : ")
print(inputs)
print(".....................")
print("output : ")
print(output)

[Output: the inputs and output arrays]

Defining weights:

w1 = np.random.randn(inputs.shape[1],5)  # (3, 5): input layer -> hidden layer
w2 = np.random.randn(5,output.shape[1])  # (5, 1): hidden layer -> output layer
print("w1")
print(w1)
print(".......................")
print("w2")
print(w2)

Defining biases:

b1 = 1  # bias of the hidden layer
b2 = 1  # bias of the output layer
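
Before moving on, it can help to confirm the shapes of everything we have just defined. This is a quick sanity check, assuming the inputs, output, w1, and w2 arrays defined above:

print("inputs :", inputs.shape)  # (6, 3) - six training rows, three inputs
print("w1     :", w1.shape)      # (3, 5) - input layer to hidden layer
print("w2     :", w2.shape)      # (5, 1) - hidden layer to output layer
print("output :", output.shape)  # (6, 1) - one label per training row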

Activation Function

An activation function squashes the output of each neuron into a bounded range; the sigmoid maps every value into the 0-1 range. If we did our operations without an activation function, the values flowing through the network could keep growing after each layer, which would make training take much longer. There are many activation functions to choose from, such as the sigmoid, tanh, and ReLU functions. We will use the sigmoid function in this tutorial.

[Figure: the sigmoid function, sigmoid(x) = 1 / (1 + e^(-x))]
def sigmoid(x):
    return 1/(1 + np.exp(-x))
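
It is also worth noting that the slope of the sigmoid can be written using the function's own output, a * (1 - a); the backpropagation steps below rely on exactly this fact. A minimal sketch, building on the numpy import and the sigmoid function above (the helper name sigmoid_derivative is only for illustration and is not used later):

def sigmoid_derivative(a):
    # a is the sigmoid output; the slope of the sigmoid at that point is a * (1 - a)
    return a * (1 - a)

a = sigmoid(np.array([-5.0, 0.0, 5.0]))
print(a)                      # roughly [0.0067, 0.5, 0.9933]
print(sigmoid_derivative(a))  # roughly [0.0066, 0.25, 0.0066] - steepest at 0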

Build the Artificial Neural Network

Artificial neural network training consists of two main parts.

  • Calculating the predicted output ŷ, known as feedforward
  • Updating the weights and biases, known as backpropagation

The following image shows how the neural network is trained.


[Figure: the training loop of feedforward and backpropagation]

Feedforward

z = x * w + b

a = 1/(1 + e^(-z))

y_head = a2

error = loss function = (1/2) * (output - y_head)^2

z1 = np.dot(inputs,w1) + b1   # hidden layer pre-activation, shape (6, 5)
a1 = sigmoid(z1)              # hidden layer activation
z2 = np.dot(a1, w2) + b2      # output layer pre-activation, shape (6, 1)
a2 = sigmoid(z2)              # output layer activation, our prediction y_head
error = np.sum((1/2)*(output - a2)**2)
print("z1 : ",z1.shape)
print("a1 : ",a1.shape)
print("z2 : ",z2.shape)
print("a2 : ",a2.shape)
print("error : ", error)

z1 : (6, 5)

a1 : (6, 5)

z2 : (6, 1)

a2 : (6, 1)

error : 0.8829125977000097

Backpropagation

So far, we have found the error of our prediction, but we still need to update the weights and bias values. To do that, we need the derivative of the loss function with respect to the weights and biases. The derivative of a function is its slope, so once we can calculate it, we can update the weights and biases by increasing or reducing them in the direction that lowers the error. This is called gradient descent.

However, we can't calculate the derivative of the loss function with respect to the weights and biases directly, because the loss function doesn't contain the weights and biases explicitly. So we need to use the chain rule, and each factor in the chain is a partial derivative.

[Figure: the chain rule for the partial derivatives of the loss]
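
Written out with the variable names used in the code below, the chain rule for the output layer is:

error_d_w2 = error_d_a2 * a2_d_z2 * z2_d_w2

error_d_b2 = error_d_a2 * a2_d_z2 * z2_d_b2

The same pattern extends one more step backwards for w1 and b1; the matrix shapes are handled by the transposes and sums in the code.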

Step 1: Find the derivatives for layer two (the output layer).

# derivative of the error with respect to the a2
error_d_a2 = (a2 - output)
# derivative of the a2 with respect to the z2
a2_d_z2 = a2*(1 - a2)
# derivative of the z2 with respect to the w2
z2_d_w2 = a1
# derivative of the z2 with respect to the b2 (z2 = np.dot(a1, w2) + b2, so this is 1)
z2_d_b2 = 1
print("error_d_a2 : ", error_d_a2.shape)
print("a2_d_z2 : ", a2_d_z2.shape)
print("z2_d_w2 : ", z2_d_w2.shape)
print("z2_d_b2 : ", z2_d_b2)

error_d_a2 : (6, 1)

a2_d_z2 : (6, 1)

z2_d_w2 : (6, 5)

z2_d_b2 : 1


Step 2: Calculate the derivative of the error with respect to the w2.

delta_w2 =  error_d_a2 * a2_d_z2
delta_w2 = np.dot(z2_d_w2.T, delta_w2)
print("delta_w2 : ", delta_w2.shape)

delta_w2 : (5, 1)
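
The transpose in np.dot(z2_d_w2.T, delta_w2) is what makes the result have the same shape as w2: a (5, 6) matrix multiplied by a (6, 1) matrix gives (5, 1). A quick standalone check with dummy arrays (the names here are only for illustration, using the numpy import from above):

dummy_a1  = np.ones((6, 5))   # stands in for z2_d_w2 (which is a1)
dummy_err = np.ones((6, 1))   # stands in for error_d_a2 * a2_d_z2
print(np.dot(dummy_a1.T, dummy_err).shape)  # (5, 1), the same shape as w2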

Step 3: Calculate the derivative of the error with respect to the b2.

delta_b2 =  error_d_a2 * a2_d_z2
delta_b2 = delta_b2 * z2_d_b2
delta_b2 = np.sum(delta_b2)
print("delta_b2 : ", delta_b2)

delta_b2 : 0.34414836643934316


Step 4: Update w2 and b2.

w2 = w2 - delta_w2
b2 = b2 - delta_b2
print("new w2 : ")
print(w2)
print(".....................")
print("new b2 : ")
print(b2)

[Output: the updated w2 and b2 values]
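
Note that the update above subtracts the raw gradient from the weights. In practice, a learning rate is usually multiplied in so that each step stays small. A hedged variant of the same update would look like this (the value 0.1 is just an example and is not used anywhere else in this tutorial):

learning_rate = 0.1                  # example value only
w2 = w2 - learning_rate * delta_w2
b2 = b2 - learning_rate * delta_b2

For this small example the raw-gradient update works, so we will keep it.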


Step 5: Find the derivatives for layer one (the hidden layer).

# derivative of the z2 with respect to the a1
z2_d_a1 = w2
# derivative of the a1 with respect to the z1
a1_d_z1 = a1*(1 - a1)
# derivative of the z1 with respect to the w1
z1_d_w1 = inputs
# derivative of the z1 with respect to the b1 (z1 = np.dot(inputs, w1) + b1, so this is 1)
z1_d_b1_w = 1
print("z2_d_a1 : ", z2_d_a1.shape)
print("a1_d_z1 : ", a1_d_z1.shape)
print("z1_d_w1 : ", z1_d_w1.shape)
print("z1_d_b1_w : ", z1_d_b1_w)

z2_d_a1 : (5, 1)

a1_d_z1 : (6, 5)

z1_d_w1 : (6, 3)

z1_d_b1_w : 1


Step 6: Calculate the derivative of the error with respect to the w1.

delta_w1 =  error_d_a2 * a2_d_z2
delta_w1 = np.dot(delta_w1,z2_d_a1.T)
delta_w1 = delta_w1 * a1_d_z1
delta_w1 = np.dot(inputs.T,delta_w1)
print("delta_w1 : ", delta_w1.shape)

delta_w1 : (3, 5)


Step 7: Calculate the derivative of the error with respect to the b1.

delta_b1 =  error_d_a2 * a2_d_z2
delta_b1 = np.dot(delta_b1,z2_d_a1.T)
delta_b1 = delta_b1 * a1_d_z1
delta_b1 = delta_b1 * z1_d_b1_w
delta_b1 = np.sum(delta_b1)
print("delta_b1: ", delta_b1)

delta_b1: -0.0020394367989461973


Step 8: Update w1 and b1.

w1 = w1 - delta_w1
b1 = b1 - delta_b1
print("new w1 : ")
print(w1)
print(".....................")
print("new b1 : ")
print(b1)

[Output: the updated w1 and b1 values]


Training the Neural Network

So far, we have calculated all of the gradients for a single pass. Now we will train our neural network by repeating feedforward and backpropagation in a loop.

# Defining all variables
import numpy as np
error_list = list()
inputs = np.array([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1]])
output = np.array([[0],[1],[0],[1],[0],[1]])
# Weights
w1 = np.random.randn(inputs.shape[1],5)
w2 = np.random.randn(5,output.shape[1])
# Biases
b1 = 1
b2 = 1
# Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))
# Train the network for 100 iterations (feedforward + backpropagation).
for i in range(100):
    # Feedforward
    z1 = np.dot(inputs,w1) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(a1, w2) + b2
    a2 = sigmoid(z2)
    error = np.sum((1/2)*(output - a2)**2)
    # Backpropagation
    ## LAYER 2
    ### derivative of the error with respect to the a2
    error_d_a2 = (a2 - output)
    ### derivative of the a2 with respect to the z2
    a2_d_z2 = a2*(1 - a2)
    ### derivative of the z2 with respect to the w2
    z2_d_w2 = a1
    ### derivative of the z2 with respect to the b2 (z2 = np.dot(a1, w2) + b2, so this is 1)
    z2_d_b2 = 1
    ### delta weights 2
    delta_w2 =  error_d_a2 * a2_d_z2
    delta_w2 = np.dot(z2_d_w2.T, delta_w2)
    ### delta biases
    delta_b2 =  error_d_a2 * a2_d_z2
    delta_b2 = delta_b2 * z2_d_b2
    delta_b2 = np.sum(delta_b2)
    ### Update weights and bias
    w2 = w2 - delta_w2
    b2 = b2 - delta_b2
    ## LAYER 1
    ### derivative of the z2 with respect to the a1
    z2_d_a1 = w2
    ### derivative of the a1 with respect to the z1
    a1_d_z1 = a1*(1 - a1)
    ### derivative of the z1 with respect to the w1
    z1_d_w1 = inputs
    ### derivative of the z1 with respect to the b1 (z1 = np.dot(inputs, w1) + b1, so this is 1)
    z1_d_b1_w = 1
    ### delta weights 1
    delta_w1 =  error_d_a2 * a2_d_z2
    delta_w1 = np.dot(delta_w1,z2_d_a1.T)
    delta_w1 = delta_w1 * a1_d_z1
    delta_w1 = np.dot(inputs.T,delta_w1)
    ### delta bias 1
    delta_b1 =  error_d_a2 * a2_d_z2
    delta_b1 = np.dot(delta_b1,z2_d_a1.T)
    delta_b1 = delta_b1 * a1_d_z1
    delta_b1 = delta_b1 * z1_d_b1_w
    delta_b1 = np.sum(delta_b1)
    ### update w1 and b1
    w1 = w1 - delta_w1
    b1 = b1 - delta_b1
    error_list.append(error)
    print("error : ", error)
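
The network was trained on six of the eight rows described at the beginning, so the remaining two rows can now be pushed through the trained network as a quick check. This is a minimal sketch using the trained w1, w2, b1, and b2 from the loop above; the held-out rows [1,1,0] and [1,1,1] and their expected labels (0 and 1, following the pattern that the label equals the third input value) are an assumption based on the data description above:

# Held-out rows, never seen during training (assumed from the 8-row description)
test_inputs = np.array([[1,1,0],[1,1,1]])
test_a1 = sigmoid(np.dot(test_inputs, w1) + b1)
test_a2 = sigmoid(np.dot(test_a1, w2) + b2)
print("test predictions : ")
print(test_a2)  # ideally close to 0 for the first row and 1 for the second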

Show the Training Error With the Matplotlib Library

We trained our artificial neural network for 100 iterations. Now let's plot the error at each iteration using the Python Matplotlib library.

import matplotlib.pyplot as plt
x = np.arange(len(error_list))
y = error_list
plt.figure(figsize=(10,8))
plt.plot(x,y)
plt.xlabel("iteration")
plt.ylabel("error")
plt.title("Artificial Neural Networks Training")
plt.show()
[Figure: training error decreasing over the 100 iterations]