Artificial neural networks (ANNs) are among the most actively studied topics in computer science and artificial intelligence, owing to their power in analyzing real-world problems across disciplines including, but not limited to, physics, biology, chemistry, and engineering. However, ANNs lack several key characteristics of biological neural networks, such as sparsity, scale-freeness, and small-worldness. The concept of sparse and scale-free neural networks has been introduced to fill this gap. In such schemes, network sparsity is achieved by removing weak weights between neurons during the learning process and replacing them with random weights; at initialization, however, the network is fully connected, so the number of weights grows quadratically with the number of neurons. In this study, considering that a biological neural network has some degree of initial sparsity, we design an ANN with a prescribed level of initial sparsity. The neural network is tested on handwritten digits, Arabic characters, CIFAR-10, and Reuters newswire topics. Simulations show that the number of weights can be reduced by up to 50% without loss of prediction accuracy. Moreover, in all cases, the testing time is dramatically reduced compared with that of fully connected ANNs.

The powerful tools of artificial intelligence are increasingly attractive for the analysis of real-life problems. Due to the proven efficiency of deep learning, artificial neural networks (ANNs) are the most frequently used strategy, with substantial applications and results in areas such as physics [

To resolve this issue, various fast and efficient learning strategies have been developed, including the scaled [

A novel method inspired by biological neural networks was proposed by Le Cun et al. [

A similar approach [

This study aims to develop the idea of sparse connectivity by training networks with an initially reduced number of weights. Unlike Mocanu et al. [

The rest of the paper is organized as follows. Section 2 reviews related work. Section 3 presents the details and pseudocode of the algorithm. In Section 4, we apply TIS to the recognition of handwritten digits and Arabic characters and document the performance advantages of ANNs with reduced connectivity over the fully connected network. In Section 5, we compare TIS with similar methods. Conclusions and future research directions are discussed in Section 6.

Multi-layer neural networks have received significant attention and wide interest from scholars and scientists, as they provide a powerful and efficient tool in many fields, such as image processing [

Yu et al. [

Recently, researchers have applied metaheuristic algorithms to train multi-layer perceptrons (MLPs). These metaheuristics include the lightning search algorithm [

TIS is a training algorithm for an ANN with reduced connectivity. By reducing the number of weights, we aim to make the testing (validation) process computationally cheaper than that of a fully connected ANN, while the reduction should not substantially affect accuracy. Thus, we seek faster validation with no loss of accuracy, and possibly even a gain.

To illustrate, we construct a standard neural network with two hidden layers, where

The training process has two steps: forward and backward propagation. Forward propagation can be described as follows. Given an input vector

whose responses for the first and second hidden layers will be

The real output

In the BP stage, the standard quadratic error function is considered:

The gradient of
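The forward pass and the quadratic error above can be sketched in NumPy as follows. This is a minimal illustration in which the layer sizes, the sigmoid activations, and the names `W1`–`W3`, `b1`–`b3` are our own assumptions, not the exact configuration used in the experiments.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_h1, n_h2, n_out = 4, 5, 5, 3  # illustrative layer sizes

# weights and biases: input->hidden1, hidden1->hidden2, hidden2->output
W1 = rng.standard_normal((n_h1, n_in));  b1 = np.zeros(n_h1)
W2 = rng.standard_normal((n_h2, n_h1));  b2 = np.zeros(n_h2)
W3 = rng.standard_normal((n_out, n_h2)); b3 = np.zeros(n_out)

def forward(x):
    h1 = sigmoid(W1 @ x + b1)   # response of the first hidden layer
    h2 = sigmoid(W2 @ h1 + b2)  # response of the second hidden layer
    y = sigmoid(W3 @ h2 + b3)   # real (predicted) output of the network
    return h1, h2, y

def quadratic_error(y, t):
    # standard quadratic error considered in the BP stage
    return 0.5 * np.sum((y - t) ** 2)

x = rng.standard_normal(n_in)
t = np.array([1.0, 0.0, 0.0])   # one-hot target
_, _, y = forward(x)
err = quadratic_error(y, t)
```

In the BP stage, the gradient of this error with respect to each weight matrix is obtained by the chain rule layer by layer, exactly as in a standard fully connected network.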

At the NN initialization step, random weights and biases are set. A prescribed level of initial sparsity is set by randomly removing the corresponding portions of the weights. Standard gradient descent is then used to update the remaining weights and biases according to the iterative procedure defined by

where

At each epoch except the last, after all weights have been updated, the weak weights (those whose absolute values are close to zero) are removed and replaced by random weights. At the last epoch, the weak weights are removed but not replaced. The pseudocode of this process is presented as
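The scheme above can be sketched as follows. The mask-based implementation, the pruning threshold of 0.01, the synthetic stand-in gradient, and the helper names `init_sparse` and `prune_and_regrow` are illustrative assumptions, not the paper's exact pseudocode.

```python
import numpy as np

rng = np.random.default_rng(1)

def init_sparse(shape, init_sparsity):
    # prescribed initial sparsity: randomly remove that fraction of the weights
    W = 0.1 * rng.standard_normal(shape)
    mask = rng.random(shape) >= init_sparsity
    return W * mask, mask

def prune_and_regrow(W, mask, threshold, regrow=True):
    weak = mask & (np.abs(W) < threshold)   # weights close to zero in absolute value
    mask = mask & ~weak                     # remove the weak weights
    if regrow:                              # replace them with random weights elsewhere
        k = int(weak.sum())
        empty = np.argwhere(~mask)
        pick = empty[rng.choice(len(empty), size=min(k, len(empty)), replace=False)]
        for i, j in pick:
            mask[i, j] = True
            W[i, j] = 0.1 * rng.standard_normal()
    return W * mask, mask

W, mask = init_sparse((20, 30), init_sparsity=0.5)
n_epochs = 10
for epoch in range(n_epochs):
    grad = rng.standard_normal(W.shape)     # stand-in for the true BP gradient
    W = (W - 0.01 * grad) * mask            # gradient-descent update on active weights
    W, mask = prune_and_regrow(W, mask, threshold=0.01,
                               regrow=(epoch < n_epochs - 1))  # no regrowth at last epoch
```

After the final epoch the mask only shrinks, so the trained network ends with fewer connections than it started with.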

As a simple application of the algorithm, we compare the performance of ANNs with reduced connectivity to that of fully connected ANNs. The algorithm was implemented in MATLAB and run on an Intel Core i7-4700MQ CPU @ 2.40 GHz (8 CPUs).

First, we consider the publicly accessible MNIST dataset of handwritten digits, which contains 60,000 samples for training and 10,000 for testing^{1}

By experimenting with different levels of initial sparsity, we observed that it is possible to achieve accuracy comparable to that of a corresponding fully connected NN even with substantially fewer weights.

| Initial sparsity (%) | Total sparsity (%) | Accuracy (%) |
|---|---|---|
| 0 |  | 93.6 |
| 10 |  | 94.58 |
| 14.5 |  | 94.68 |
| 19.0 |  | 94.07 |
| 23.50 |  | 94.52 |
| 27.99 |  | 94.61 |
| 32.50 |  | 94.76 |
| 45.99 |  | 95.04 |
| 54.99 |  | 94.25 |

| Initial sparsity (%) | W^{I} | W^{H} | W^{O} | Total connections |
|---|---|---|---|---|
| 0 | 12544 | 8000 | 5000 | 25544 |
| 14.5 | 10726 | 6840 | 4275 | 21841 |
| 23.50 | 9597 | 6120 | 3825 | 19542 |
| 32.50 | 8468 | 5400 | 3375 | 17243 |
| 55 | 5646 | 3600 | 2251 | 11497 |

However, as expected, higher levels of initial sparsity affect convergence. As

The connectivity of visible neurons is plotted at different epochs of the training process in

The TIS algorithm was also applied to analyze the publicly accessible Arabic Handwritten Characters Dataset,^{2}

In this case also, initially sparse NNs provided accuracies comparable to those of fully connected NNs even with substantially fewer weights. It can be seen from

| Initial sparsity (%) | Total sparsity (%) | Accuracy (%) |
|---|---|---|
| 0 |  | 75.71 |
| 14.5 |  | 76.31 |
| 23.50 |  | 76.73 |
| 32.50 |  | 75.92 |
| 55 |  | 75.18 |

The number of remaining weights for different levels of initial sparsity is presented in

| Initial sparsity (%) | W^{I} | W^{H} | W^{O} | Total connections |
|---|---|---|---|---|
| 0 | 256,000 | 62,500 | 7,000 | 325,500 |
| 14.5 | 218,880 | 53,438 | 5,985 | 278,303 |
| 23.50 | 195,840 | 47,813 | 5,355 | 249,008 |
| 32.50 | 172,800 | 42,188 | 4,725 | 219,713 |
| 55 | 115,200 | 28,125 | 3,151 | 146,476 |

The connectivity of visible neurons at different epochs of the training process is plotted for this case in

A straightforward consequence of reduced weights is the dramatic reduction of the time spent testing the test samples (see
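The source of this speedup can be illustrated by a simple operation count: once the weak weights are gone, a compressed (CSR-style) layer product touches only the surviving connections, so the per-sample work is proportional to the number of remaining weights. The layer sizes and the 55% sparsity below are illustrative, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(2)

# an illustrative sparse layer: roughly 55% of the weights removed
n_out, n_in, sparsity = 64, 256, 0.55
W = rng.standard_normal((n_out, n_in)) * (rng.random((n_out, n_in)) >= sparsity)

# store only the surviving weights, row by row (CSR-style)
rows = []
for i in range(n_out):
    cols = np.flatnonzero(W[i])     # column indices of the remaining weights
    rows.append((cols, W[i, cols]))

def sparse_matvec(rows, x):
    # each row costs one multiply-add per surviving weight only
    return np.array([vals @ x[cols] for cols, vals in rows])

x = rng.standard_normal(n_in)
y_sparse = sparse_matvec(rows, x)
nnz = sum(len(cols) for cols, _ in rows)   # total remaining connections
```

The sparse product reproduces the dense result `W @ x` while performing `nnz` multiply-adds instead of `n_out * n_in`.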

It is easy to see that our algorithm does not depend on the choice of the BP algorithm: the removal and replacement of weak weights takes place after the weight update of each epoch, regardless of how that update is computed.
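To illustrate this independence, the pruning and regrowth step can be written as a wrapper around an arbitrary update rule. The function `tis_epoch`, the 0.01 threshold, and the synthetic gradient-descent step below are hypothetical sketches, not the reference implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def tis_epoch(W, mask, update_step, threshold=0.01, regrow=True):
    # update_step may be any BP update (gradient descent, conjugate gradient,
    # quasi-Newton, Levenberg-Marquardt, ...); the sparsity maintenance below
    # never looks inside it.
    W = update_step(W) * mask
    weak = mask & (np.abs(W) < threshold)   # weak weights to remove
    mask = mask & ~weak
    if regrow:
        # add roughly as many random weights as were just removed
        grow = ~mask & (rng.random(mask.shape) < weak.mean())
        mask = mask | grow
        W = np.where(grow, 0.1 * rng.standard_normal(W.shape), W)
    return W * mask, mask

# plug in plain gradient descent with a synthetic gradient as one example
gd_step = lambda W: W - 0.01 * rng.standard_normal(W.shape)
W = 0.1 * rng.standard_normal((10, 10))
mask = rng.random((10, 10)) >= 0.5   # 50% initial sparsity
for epoch in range(5):
    W, mask = tis_epoch(W, mask, gd_step, regrow=(epoch < 4))
```

Swapping `gd_step` for any other optimizer's per-epoch update leaves the sparsity logic unchanged.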

As mentioned above, we chose the simplest training algorithm for BP, namely, gradient descent. We compare the validation time versus initial sparsity with some other BP algorithms. These are the well-known conjugate gradient, quasi-Newton, and Levenberg–Marquardt algorithms (see

| Dataset | Conjugate gradient | Gradient descent | Quasi-Newton | Levenberg–Marquardt |
|---|---|---|---|---|
| MNIST | 94.25% | 94.8% | 94.85% | 95.3% |
| HAC | 75.18% | 75.46% | 75.8% | 76.5% |

We compare TIS with the SET [^{3}

Open-source implementations are freely available for both SET^{4} and DO.^{5}

We compare the performance of SET, DO, and TIS on the MNIST dataset considered in Section 4.1. For TIS with 50% initial sparsity (

We compare the performance of TIS, SET, and DO on the CIFAR-10 dataset.

We compare the performance of SET, DO, and TIS on the RNT dataset. For TIS with 50% initial sparsity (

As artificial neurons are designed to mimic the functioning of biological neurons, it is natural to expect that artificial neural networks should possess the key features of biological neural networks, which would lead to efficient learning. Features reported to have a significant impact on learning efficiency include sparsity [

In this study, we introduced the concept of initial sparsity, that is, the ANN is assumed to be sparse at the initial step, with the possibility to prescribe the level of initial sparsity. At each training epoch, weights that are close to zero in absolute value are removed, and random weights are added (see

The proposed method was also compared with other similar methods, namely, SET and DO. An analysis was carried out on the MNIST, CIFAR-10, and RNT datasets. The analysis showed that TIS outperforms both SET and DO in accuracy and convergence rate. These observations apply to the four tested BP algorithms: conjugate gradient, gradient descent, quasi-Newton, and Levenberg–Marquardt.

These observations motivate us to improve the general algorithm, which will be a focus of future work. In this study, we used gradient descent, one of the simplest BP methods, for error minimization. A priority of future work will be to implement the developed algorithm with more advanced minimization strategies, such as the modified conjugate gradient descent and distributed Newton methods, combined with a more efficient line search strategy. We also intend to test the algorithm on some variants of convolutional neural networks. Another challenging problem is the optimal choice of the level of initial sparsity in the context of the network structure and the particular dataset.

I express my gratitude to King Khalid University, Saudi Arabia, for administrative and technical support.
