In the previous tutorial, we built an artificial neural network from scratch using only matrices. In this tutorial, we will build an artificial neural network in Python using only the NumPy library. We will build it step by step, but you can follow along in any programming language you like.
Describing the Network Structure
The data for our artificial neural network consists of three inputs and eight rows. We will use only six of those rows for training; the remaining rows will serve as test data. The network itself will have one hidden layer and one output layer, and the hidden layer will consist of five neurons. The weights of the neural network will be initialized randomly, and the biases will be set to 1.
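To keep the matrix shapes in mind while we code, this is what the architecture looks like:
- inputs: 6 rows x 3 columns (one column per input)
- w1: 3 x 5 (input layer to hidden layer), plus a single bias value b1
- hidden layer activations: 6 x 5
- w2: 5 x 1 (hidden layer to output layer), plus a single bias value b2
- output: 6 rows x 1 column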


Define the Variables
In this part, we will define the variables we are going to use. Since all of them are matrices, we can define them easily with the Python NumPy library.
First we need to install NumPy, which we can do with the following command:
pip install numpy
Defining inputs and output:
import numpy as np
inputs = np.array([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1]])
output = np.array([[0],[1],[0],[1],[0],[1]])
print("inputs : ")
print(inputs)
print(".....................")
print("output : ")
print(output)
Defining weights:
w1 = np.random.randn(inputs.shape[1],5)
w2 = np.random.randn(5,output.shape[1])
print("w1")
print(w1)
print(".......................")
print("w2")
print(w2)
Defining biases:
b1 = 1
b2 = 1
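The biases here are plain Python scalars; thanks to NumPy broadcasting, the same value is added to every neuron in the layer. If you would rather have one bias per neuron, a vector version is only a small change. This is shown purely as a sketch; the rest of the tutorial keeps the scalar form.
b1 = np.zeros((1, 5))   # one bias for each of the five hidden neurons
b2 = np.zeros((1, 1))   # one bias for the single output neuron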
Activation Function
The activation function squashes the output values of each layer into the 0-1 range. If we did our calculations without an activation function, the values flowing through the network would keep growing after each layer, and training the network would take a very long time. There are many activation functions that solve this: the sigmoid function, the tanh function, the ReLU function, and so on. We will use the sigmoid function in this tutorial.

def sigmoid(x):
    return 1/(1 + np.exp(-x))
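For comparison, the other activation functions mentioned above can be written in the same style. We will not use them in the rest of the tutorial; they are shown only as examples.
def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0, x)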
Build the Artificial Neural Network
Artificial neural network training consists of two main parts.
- Calculating the predicted output ŷ, known as feedforward
- Updating the weights and biases, known as backpropagation
The following image shows how the neural network is trained.

Feedforward
z = x * w + b
a = sigmoid(z) = 1/(1 + e^(-z))
ŷ = a2 (the activation of the output layer)
error = loss = (1/2) * (output - ŷ)^2
z1 = np.dot(inputs,w1) + b1
a1 = sigmoid(z1)
z2 = np.dot(a1, w2) + b2
a2 = sigmoid(z2)
error = np.sum((1/2)*(output - a2)**2)
print("z1 : ",z1.shape)
print("a1 : ",a1.shape)
print("z2 : ",z2.shape)
print("a2 : ",a2.shape)
print("error : ", error)
z1 : (6, 5)
a1 : (6, 5)
z2 : (6, 1)
a2 : (6, 1)
error : 0.8829125977000097
Backpropagation
So far, we have found the error of our prediction, but we still need to update our weights and biases. To do that, we need the derivative of the loss function with respect to each weight and bias. The derivative of a function is its slope, so once we can calculate it, we can update the weights and biases by moving them a small amount against that slope. This is called gradient descent.
However, we can't calculate the derivative of the loss function with respect to the weights and biases directly, because the loss function is not written in terms of the weights and biases; it only contains the network output. So we use the chain rule, which breaks the derivative into a product of partial derivatives.
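For example, to reach w2 we chain three partial derivatives together, and the same idea extends back to w1:
error_d_w2 = error_d_a2 * a2_d_z2 * z2_d_w2
error_d_b2 = error_d_a2 * a2_d_z2 * z2_d_b2
error_d_w1 = error_d_a2 * a2_d_z2 * z2_d_a1 * a1_d_z1 * z1_d_w1
These are exactly the quantities we compute, name by name, in the steps below, with dot products and transposes where the matrix shapes require them.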

Step 1: Find the derivatives for the output layer (layer two).
# derivative of the error with respect to the a2
error_d_a2 = (a2 - output)
# derivative of the a2 with respect to the z2
a2_d_z2 = a2*(1 - a2)
# derivative of the z2 with respect to the w2
z2_d_w2 = a1
# derivative of the z2 with respect to the b2 (z2 = a1.w2 + b2, so this derivative is 1)
z2_d_b2 = 1
print("error_d_a2 : ", error_d_a2.shape)
print("a2_d_z2 : ", a2_d_z2.shape)
print("z2_d_w2 : ", z2_d_w2.shape)
print("z2_d_b2 : ", z2_d_b2)
error_d_a2 : (6, 1)
a2_d_z2 : (6, 1)
z2_d_w2 : (6, 5)
z2_d_b2 : 1
Step 2: Calculate the derivative of the error with respect to the w2.
delta_w2 = error_d_a2 * a2_d_z2
delta_w2 = np.dot(z2_d_w2.T, delta_w2)
print("delta_w2 : ", delta_w2.shape)
delta_w2 : (5, 1)
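If you want to convince yourself that the transpose is doing the right thing, a quick sanity check (run right after the lines above, purely for illustration) is:
assert z2_d_w2.T.shape == (5, 6)
assert (error_d_a2 * a2_d_z2).shape == (6, 1)
assert delta_w2.shape == w2.shape == (5, 1)
The dot product both lines up the shapes and sums the gradient contribution of each of the six training rows.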
Step 3: Calculate the derivative of the error with respect to the b2.
delta_b2 = error_d_a2 * a2_d_z2
delta_b2 = delta_b2 * z2_d_b2
delta_b2 = np.sum(delta_b2)
print("delta_b2 : ", delta_b2)
delta_b2 : 0.34414836643934316
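The np.sum here plays the same role the dot product played for the weights: since the derivative of z2 with respect to b2 is 1 for every row, the bias gradient is simply the error terms of the six training rows added together:
delta_b2 = sum over the 6 rows of (a2 - output) * a2 * (1 - a2)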
Step 4: Update w2 and b2.
w2 = w2 - delta_w2
b2 = b2 - delta_b2
print("new w2 : ")
print(w2)
print(".....................")
print("new b2_w : ")
print(b2)
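One thing we leave out here to keep the code short is a learning rate. A common variant, not used in this tutorial and shown only as a sketch, scales the gradient before the update so each step is smaller and training is more stable:
learning_rate = 0.1   # example value, chosen only for illustration
w2 = w2 - learning_rate * delta_w2
b2 = b2 - learning_rate * delta_b2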
Step 5: Find the derivatives for the hidden layer (layer one).
# derivative of the z2 with respect to the a1
z2_d_a1 = w2
# derivative of the a1 with respect to the z1
a1_d_z1 = a1*(1 - a1)
# derivative of the z1 with respect to the w1
z1_d_w1 = inputs
# derivative of the z1 with respect to the b1 (z1 = inputs.w1 + b1, so this derivative is 1)
z1_d_b1_w = 1
print("z2_d_a1 : ", z2_d_a1.shape)
print("a1_d_z1 : ", a1_d_z1.shape)
print("z1_d_w1 : ", z1_d_w1.shape)
print("z1_d_b1_w : ", z1_d_b1_w)
z2_d_a1 : (5, 1)
a1_d_z1 : (6, 5)
z1_d_w1 : (6, 3)
z1_d_b1_w : 1
Step 6: Calculate the derivative of the error with respect to the w1.
delta_w1 = error_d_a2 * a2_d_z2
delta_w1 = np.dot(delta_w1,z2_d_a1.T)
delta_w1 = delta_w1 * a1_d_z1
delta_w1 = np.dot(inputs.T,delta_w1)
print("delta_w1 : ", delta_w1.shape)
delta_w1 : (3, 5)
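Following the shapes line by line: error_d_a2 * a2_d_z2 is (6, 1); the dot product with z2_d_a1.T, which is (1, 5), gives (6, 5); multiplying elementwise by a1_d_z1 keeps it (6, 5); and the final dot product with inputs.T, which is (3, 6), gives (3, 5), the same shape as w1.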
Step 7: Calculate the derivative of the error with respect to the b1.
delta_b1 = error_d_a2 * a2_d_z2
delta_b1 = np.dot(delta_b1,z2_d_a1.T)
delta_b1 = delta_b1 * a1_d_z1
delta_b1 = delta_b1 * z1_d_b1_w
delta_b1 = np.sum(delta_b1)
print("delta_b1: ", delta_b1)
delta_b1: -0.0020394367989461973
Step 8: Update w1 and b1.
w1 = w1 - delta_w1
b1 = b1 - delta_b1
print("new w1 : ")
print(w1)
print(".....................")
print("new b1 : ")
print(b1)
Training the neural network
So far, we calculated every gradient one step at a time. Now we will put all of those steps together and train our neural network.
# Defining all variables
import numpy as np
error_list = list()
inputs = np.array([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1]])
output = np.array([[0],[1],[0],[1],[0],[1]])
# Weights
w1 = np.random.randn(inputs.shape[1],5)
w2 = np.random.randn(5,output.shape[1])
# Biases
b1 = 1
b2 = 1
# Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))
# Train for 100 iterations (update the weights and biases 100 times).
for i in range(100):
    # Feedforward
    z1 = np.dot(inputs,w1) + b1
    a1 = sigmoid(z1)
    z2 = np.dot(a1, w2) + b2
    a2 = sigmoid(z2)
    error = np.sum((1/2)*(output - a2)**2)
    # Backpropagation
    ## LAYER 2
    ### derivative of the error with respect to the a2
    error_d_a2 = (a2 - output)
    ### derivative of the a2 with respect to the z2
    a2_d_z2 = a2*(1 - a2)
    ### derivative of the z2 with respect to the w2
    z2_d_w2 = a1
    ### derivative of the z2 with respect to the b2 (z2 = a1.w2 + b2, so this derivative is 1)
    z2_d_b2 = 1
    ### delta weights 2
    delta_w2 = error_d_a2 * a2_d_z2
    delta_w2 = np.dot(z2_d_w2.T, delta_w2)
    ### delta bias 2
    delta_b2 = error_d_a2 * a2_d_z2
    delta_b2 = delta_b2 * z2_d_b2
    delta_b2 = np.sum(delta_b2)
    ### update w2 and b2
    w2 = w2 - delta_w2
    b2 = b2 - delta_b2
    ## LAYER 1
    ### derivative of the z2 with respect to the a1
    z2_d_a1 = w2
    ### derivative of the a1 with respect to the z1
    a1_d_z1 = a1*(1 - a1)
    ### derivative of the z1 with respect to the w1
    z1_d_w1 = inputs
    ### derivative of the z1 with respect to the b1 (z1 = inputs.w1 + b1, so this derivative is 1)
    z1_d_b1_w = 1
    ### delta weights 1
    delta_w1 = error_d_a2 * a2_d_z2
    delta_w1 = np.dot(delta_w1,z2_d_a1.T)
    delta_w1 = delta_w1 * a1_d_z1
    delta_w1 = np.dot(inputs.T,delta_w1)
    ### delta bias 1
    delta_b1 = error_d_a2 * a2_d_z2
    delta_b1 = np.dot(delta_b1,z2_d_a1.T)
    delta_b1 = delta_b1 * a1_d_z1
    delta_b1 = delta_b1 * z1_d_b1_w
    delta_b1 = np.sum(delta_b1)
    ### update w1 and b1
    w1 = w1 - delta_w1
    b1 = b1 - delta_b1
    error_list.append(error)
print("error : ", error)
Show the Training Error With the Matplotlib Library
We trained our artificial neural network for 100 iterations. Now let's plot the error at each iteration using the Python matplotlib library.
import matplotlib.pyplot as plt
x = np.arange(len(error_list))
y = error_list
plt.figure(figsize=(10,8))
plt.plot(x,y)
plt.xlabel("iteration")
plt.ylabel("error")
plt.title("Artificial Neural Networks Training")
plt.show()
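At the start of the tutorial we set two rows of the data aside as test data. As a final check, here is a minimal sketch of feeding those remaining rows through the trained network, reusing the final w1, w2, b1 and b2 from the training loop above:
# the two input rows we did not train on
test_inputs = np.array([[1,1,0],[1,1,1]])
test_z1 = np.dot(test_inputs, w1) + b1
test_a1 = sigmoid(test_z1)
test_z2 = np.dot(test_a1, w2) + b2
test_a2 = sigmoid(test_z2)
print("test predictions : ")
print(test_a2)  # values close to 0 and 1 here would follow the same pattern as the training output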
