Despite several advances in recent years, handwritten script recognition remains a challenging task in the pattern recognition domain. The field has attracted considerable interest because of its diverse potential applications, and various methods are now available for automatic script recognition. Among the reported script recognition techniques, deep neural networks have achieved impressive results and outperformed classical machine learning algorithms. However, designing such networks from scratch involves a significant amount of trial and error, which often makes the process impractical. It requires manual intervention and domain expertise, and it consumes substantial time and computational resources. To alleviate this shortcoming, this paper proposes a new neural architecture search approach based on metaheuristic quantum particle swarm optimization (QPSO), which is capable of automatically evolving meaningful convolutional neural network (CNN) topologies. Computational experiments have been conducted on eight different datasets belonging to three popular Indic scripts, namely Bangla, Devanagari, and Dogri, consisting of handwritten characters and digits. Empirically, the results indicate that the proposed QPSOCNN algorithm outperforms classical and state-of-the-art methods with faster prediction and higher accuracy.
With the rapid development and widespread use of imaging technology, digital cameras, and other intelligent devices, the need for automatic character recognition in document images has drawn the attention of many researchers. Extensive research is available on printed character recognition, which is largely considered a solved problem. However, the recognition of handwritten characters is still a challenging task in the field of pattern recognition. The main difficulty lies in the diversity of individual writing styles, patterns, sizes, and thicknesses of characters [
Deep neural networks (DNN) have demonstrated their remarkable performance in recent years to solve pattern recognition problems [
The rest of this article is organized as follows. Section 2 briefly describes the background of CNN, Binary Particle Swarm Optimization (BPSO), and Quantum Computing (QC), and summarizes the related work. Section 3 describes the proposed algorithm. Section 4 outlines the experimental design, which includes a brief description of the datasets, algorithm parameters, and implementation details. Section 5 presents the computational results and compares the performance of the proposed algorithm with traditional and state-of-the-art techniques. Finally, Section 6 concludes the article.
CNN was first introduced by LeCun et al. [
The BPSO suggested by Kennedy et al. [
If
The concept of QC is evolved from Quantum Physics. A quantum bit (Qbit) is the smallest amount of information in QC [
A Qbit is denoted with a pair of complex numbers
In literature, neuroevolution during its early inception has been applied to encode connection weights and topologies of artificial neural networks (ANN). Stanley et al. [
In this section, we describe the proposed quantum particle swarm optimization based convolutional neural network (QPSOCNN) technique in detail. To be specific, the framework of the proposed QPSOCNN is elucidated in Section 3.1.
The proposed QPSOCNN algorithm applies quantum PSO to automatically evolve meaningful CNN architectures. Algorithm 1 presents the overall framework of the proposed technique. First, the algorithm randomly initializes the position of each particle and the quantum bit (Qbit) individuals. The particles then evolve until a termination condition, for instance a given number of iterations, is met. Lastly, the global best solution is selected and decoded into the corresponding CNN architecture for the final deep training on the task at hand. During the evolution, each particle is evaluated, and the corresponding recognition accuracy is used as the fitness measure of that particle. Consequently, the
In the proposed QPSOCNN, a binary encoding strategy is used to encode the potential CNN architectures into particle vectors. Each particle is composed of a different number of convolutional, pooling, and fully-connected layers. Therefore, these layers must be encoded in a single particle vector for the evolution process to proceed. Each particle vector with D dimensions holds the details of the CNN layers. More specifically, in the binary encoding scheme, the particle vector consists of x fixed-length binary strings, where each string represents the configuration of a single CNN layer, i.e., the layer parameters. The parameters of the convolutional layer are the kernel size, stride size, and number of feature maps. The parameters of the pooling layer are the pooling window size, stride size, and pooling type (maximal pooling or average pooling). Finally, the only parameter of the fully-connected layer is the number of neurons.
Depending on the size of the chosen benchmark datasets and the conventions of the deep learning community, the range of each parameter is given in
| Type of layer | Parameter | Range | Number of bits | Example |
|---|---|---|---|---|
| Convolutional layer | Kernel size | [2,7] | 3 | 011 |
| | No. of feature maps | [3,256] | 8 | 00111111 |
| | Stride size | [1,2] | 1 | 1 |
| | Summary | | 12 | 011001111111 |
| Pooling layer | Window size | [2,4] | 2 | 10 |
| | Stride size | [2,4] | 2 | 10 |
| | Pooling type | [1,2] | 1 | 1 |
| | Summary | | 5 | 10101 |
| Fully-connected layer | Number of neurons | [1,1024] | 10 | 1111111111 |
| | Summary | | 10 | 1111111111 |
| Disabled layer | | | 12 | 000000000000 |
| | Summary | | 12 | 000000000000 |
Since the longest encoding of a single layer uses 12 bits, the binary string of every layer is padded with leading zeros until it is 12 bits long, as illustrated in
| Type of layer | Encoded information |
|---|---|
| Convolutional | 011001111111 |
| Pooling | 000000010101 |
| Fully-connected | 001111111111 |
| Disabled | 000000000000 |
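The padding scheme can be illustrated with a minimal sketch that reproduces the example bit strings listed in the two tables above. The function names `encode_layer` and `split_particle` are hypothetical, and the sketch deliberately works on the field bit strings themselves, since the mapping from a parameter value to its bits is not spelled out here.

```python
LAYER_BITS = 12  # longest per-layer encoding (the convolutional layer)


def encode_layer(*fields):
    """Concatenate the field bit strings and left-pad with zeros to 12 bits."""
    bits = "".join(fields)
    if len(bits) > LAYER_BITS or set(bits) - {"0", "1"}:
        raise ValueError(f"invalid layer encoding: {bits!r}")
    return bits.zfill(LAYER_BITS)


def split_particle(vector):
    """Cut a particle vector back into its fixed-length 12-bit layer strings."""
    if len(vector) % LAYER_BITS:
        raise ValueError("particle length must be a multiple of 12")
    return [vector[i:i + LAYER_BITS] for i in range(0, len(vector), LAYER_BITS)]


# Field order follows the summary rows of the range table.
conv = encode_layer("011", "00111111", "1")  # kernel size, feature maps, stride
pool = encode_layer("10", "10", "1")         # window size, stride, pooling type
fc = encode_layer("1111111111")              # number of neurons
disabled = encode_layer()                    # all zeros

# A particle with x = 4 layers has D = 4 * 12 = 48 dimensions.
particle = conv + pool + fc + disabled
print(split_particle(particle))
```

Because every layer occupies the same number of bits, a particle with x layers has D = 12x dimensions and can be split back into layers without any separator.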
The initialization of the swarm begins by creating individual particles according to the preceding encoding strategy until the predetermined population size is reached. This process will generate
Once the positions of the particles are obtained, fitness evaluation is performed by training the particles, each of which represents a full-fledged CNN architecture, on the training dataset (
The proposed QPSOCNN algorithm uses a rotation operator to update the Qbit individual. The rotation operator employs a rotation angle (
Finally, the rotation operator is applied to update the
The position vector of every
After the evolution of QPSOCNN is completed, the global best solution, i.e., the best particle in the swarm, is selected for the final deep training. The deep training process is similar to the fitness evaluation process discussed in Section 3.4, except that the optimal CNN architecture is trained for a much larger number of epochs, for example, 100 or 200.
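A Q-bit rotation update of the kind used during the evolution stage can be sketched as follows. This is a minimal illustration that assumes the standard rotation gate of quantum-inspired evolutionary algorithms, real-valued amplitudes, an equal-superposition start, and a fixed rotation angle of 0.05π; the function names are hypothetical, and the exact angle schedule and update rule of QPSOCNN may differ.

```python
import math
import random


def init_qbits(dim):
    """One (alpha, beta) pair per bit, starting in an equal superposition."""
    a = 1.0 / math.sqrt(2.0)
    return [(a, a) for _ in range(dim)]


def observe(qbits, rng):
    """Collapse each Q-bit to 0 or 1, assuming P(bit = 1) = beta ** 2."""
    return [1 if rng.random() < beta * beta else 0 for _, beta in qbits]


def rotate(qbits, position, best, theta):
    """Rotate each Q-bit by theta towards the matching bit of `best`.

    Bits on which `position` and `best` already agree are left unchanged.
    """
    updated = []
    for (alpha, beta), x, b in zip(qbits, position, best):
        delta = theta * (b - x)  # +theta, 0, or -theta
        c, s = math.cos(delta), math.sin(delta)
        updated.append((c * alpha - s * beta, s * alpha + c * beta))
    return updated


rng = random.Random(0)
qbits = init_qbits(4)
position = observe(qbits, rng)   # binary position of one particle
best = [1, 1, 0, 0]              # stand-in for the global best position
qbits = rotate(qbits, position, best, theta=0.05 * math.pi)
print([round(beta * beta, 3) for _, beta in qbits])  # updated P(bit = 1)
```

The rotation preserves the normalization alpha² + beta² = 1, so the squared amplitudes remain valid probabilities. A complete implementation would also bound the accumulated angle so that repeated rotations cannot overshoot the target state.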
The designed QPSOCNN algorithm has been evaluated on benchmark handwritten character datasets belonging to three popular Indic scripts (Devanagari, Bangla, and Dogri). These scripts are genealogically different from each other and highly used by the majority of people in India [
The original handwritten character images in the datasets are not normalized to a uniform size and come in varying pixel resolutions. Therefore, some preprocessing steps are applied to the datasets before training and testing. The grayscale isolated handwritten character images are first converted into binary format. Then the handwritten character images of datasets D2, D3, D4, and D7 are normalized to a size of 32 × 32 pixels with the aspect ratios preserved.
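The two preprocessing steps can be sketched in plain Python as follows. The fixed threshold of 128, the convention that dark ink maps to 1, nearest-neighbour sampling, and centring on a zero-filled canvas are all assumptions made for illustration; they are not specified above.

```python
def binarize(image, threshold=128):
    """Map a grayscale image (rows of 0-255 ints) to 0/1, with dark ink as 1."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def resize_keep_aspect(image, size=32):
    """Scale the longer side to `size` and centre the result on a square canvas."""
    h, w = len(image), len(image[0])
    longest = max(h, w)
    new_h = max(1, round(h * size / longest))
    new_w = max(1, round(w * size / longest))
    canvas = [[0] * size for _ in range(size)]
    top, left = (size - new_h) // 2, (size - new_w) // 2
    for y in range(new_h):
        src_y = min(h - 1, y * h // new_h)      # nearest-neighbour row
        for x in range(new_w):
            src_x = min(w - 1, x * w // new_w)  # nearest-neighbour column
            canvas[top + y][left + x] = image[src_y][src_x]
    return canvas


# A 10 x 20 white image with a single dark pixel.
gray = [[255] * 20 for _ in range(10)]
gray[5][10] = 0
normalized = resize_keep_aspect(binarize(gray))
print(len(normalized), len(normalized[0]))
```

Padding the shorter side instead of stretching it keeps the stroke proportions of the character intact, which is the point of preserving the aspect ratio.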
| Index | Dataset name [reference] | Script | Category | Training images | Test images |
|---|---|---|---|---|---|
| D1 | CMATERdb 3.1.1 [ | Bangla | Digits | 4,000 | 2,000 |
| D2 | CMATERdb 3.1.2 [ | Bangla | Basic | 12,000 | 3,000 |
| D3 | CMATERdb 3.1.3 [ | Bangla | Compound | 34,439 | 8,520 |
| D4 | BanglaLekhaIso [ | Bangla | Digits+Basic+Compound | 132,884 | 33,221 |
| D5 | CMATERdb 3.2.1 [ | Devanagari | Digits | 2,000 | 1,000 |
| D6 | DHCD [ | Devanagari | Digits+Basic | 78,200 | 13,800 |
| D7 | DOGRAC64 | Dogra | Digits+Basic+Modified | 8,192 | 2,048 |
| D8 | MNIST [ | | Digits | 60,000 | 10,000 |
In the proposed QPSOCNN algorithm, the parameters are primarily classified into three groups, i.e., parameters related to QPSO, CNN initialization, and CNN training. The parameters used in the present investigation have been compiled in
Parameters  Values 

Swarm size ( 
30 
Number of iterations to find 
30 
Minimum magnitude of rotation angle 

Maximum magnitude of rotation angle 

Initial values of 

Minimum number of layers  3 
Maximum number of layers  20 
Minimum number of feature maps  32 
Maximum number of feature maps  256 
Minimum size of convolutional filter  3 × 3 
Maximum size of convolutional filter  5 × 5 
Convolutional filter stride  (1, 1) 
Pooling window size  2 × 2 
Pooling window stride  (2, 2) 
Learning rate  0.001 
Dropout rate  0.5 
Batch size  50 
Number of training epochs for evaluating the particles  2 
Number of epochs for training 
100 
The parameters of the second group regulate the diversity of the initial particles' architectures. This group includes seven parameters, namely, the minimum number of feature maps, the maximum number of feature maps, the minimum size of a convolutional filter, the maximum size of a convolutional filter, the convolutional filter stride size, the pooling window size, and the pooling window stride size. Improper parameter settings in the convolutional and pooling layers would make the CNN architecture ineffective and result in unaffordable computational costs. Therefore, following deep learning conventions, the number of feature maps explored by each particle is restricted to the range [32, 256]. In line with state-of-the-art CNN conventions, square convolutional filters are used with a filter size ranging from 3 × 3 to 5 × 5. In the convolutional layer, the (width, height) of the stride is (1, 1). Analogously, the pooling layer uses square kernels with a pooling window of size 2 × 2. The (width, height) of the stride in the pooling layer is (2, 2).
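A short worked example shows how these settings bound the depth of a valid architecture. It uses the standard output-size formula for convolution and pooling and assumes a 32 × 32 input with no padding; the padding scheme is not stated above, so this is an illustration and not the configuration actually used.

```python
def output_size(n, window, stride, padding=0):
    """Spatial size after a convolution or pooling layer (standard formula)."""
    return (n + 2 * padding - window) // stride + 1


# Repeated 2 x 2 pooling with stride 2 halves a 32 x 32 feature map each time.
size = 32
trace = [size]
while size >= 2:
    size = output_size(size, window=2, stride=2)
    trace.append(size)

max_pooling_layers = len(trace) - 1
print(trace, max_pooling_layers)

# Unpadded convolutions with stride 1 shrink the map by (filter size - 1).
print(output_size(32, window=3, stride=1), output_size(32, window=5, stride=1))
```

Under these assumptions, a particle containing more than five pooling layers would reduce a 32 × 32 input below one pixel, which is one way an improperly configured particle becomes an invalid architecture.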
Finally, the parameters associated with the third group regulate the training process of each particle in the swarm. The third group includes five parameters, viz, the training epochs for particle evaluation, learning rate, dropout rate, batch size, and number of epochs for training optimal CNN architecture (
The experiments on the proposed QPSOCNN model have been performed on an NVIDIA Tesla V100 GPU with 16 GB of memory running the Ubuntu 16.04.6 LTS operating system. Due to the stochastic nature of the proposed QPSOCNN algorithm, 10 independent experimental runs are conducted on each handwritten dataset to ensure that the results are consistent.
The overall recognition performance of the proposed QPSOCNN algorithm in terms of the best recognition accuracies and the mean recognition accuracies obtained from 10 independent experimental runs on each chosen benchmark dataset is outlined in
| Algorithm (test accuracy, %) | D1 | D2 | D3 | D4 | D5 | D6 | D7 | D8 |
|---|---|---|---|---|---|---|---|---|
| QPSOCNN (best) | 98.95 | 99.16 | 98.39 | 98.77 | 99.60 | 99.49 | 97.07 | 99.69 |
| QPSOCNN (mean) | 98.51 | 98.63 | 98.13 | 97.24 | 99.18 | 99.25 | 95.94 | 99.56 |
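The gap between the best and the mean accuracy gives a rough sense of how much the 10 runs vary on each dataset. The following sketch computes it directly from the values reported above; the variable names are illustrative.

```python
datasets = ["D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8"]
best = [98.95, 99.16, 98.39, 98.77, 99.60, 99.49, 97.07, 99.69]
mean = [98.51, 98.63, 98.13, 97.24, 99.18, 99.25, 95.94, 99.56]

# Difference in percentage points, rounded to the precision of the table.
gaps = {d: round(b - m, 2) for d, b, m in zip(datasets, best, mean)}
widest = max(gaps, key=gaps.get)
narrowest = min(gaps, key=gaps.get)

for name, gap in gaps.items():
    print(f"{name}: {gap:.2f}")
print("widest:", widest, "narrowest:", narrowest)
```

The gap is largest on D4 (1.53 points) and D7 (1.13 points) and smallest on D8 (0.13 points), which may indicate that the evolved architectures vary more from run to run on the former two datasets.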
In this section, to demonstrate the effectiveness of the proposed algorithm, its overall performance is compared with the conventional techniques that are widely used for handwritten Indic script recognition. The existing state-of-the-art techniques that have reported promising recognition accuracies on the chosen benchmark Indic script datasets are considered as the peer competitors. The evaluation results of the QPSOCNN algorithm, along with those of the existing peer competitors on all the benchmark datasets, have been compiled in
Indices  Datasets  Work references  Recognition accuracies (%) 

D1  Bangla Digit  Keserwani et al. [ 
98.80 
CMATERdb 3.1.1  Gupta et al. [ 
97.33  
Dash et al. [ 
98.44  
D2  Bangla Basic 
Keserwani et al. [ 
98.56 
Alom et al. [ 
98.31  
Dash et al. [ 
94.78  
Gupta et al. [ 
86.10  
D3  Bangla Compound 
Keserwani et al. [ 
95.70 
Sarkhel et al. [ 
98.12  
Kibria et al. [ 
88.73  
D4  Bangla Digits+ 
Chatterjee et al. [ 
97.12 
Rabby et al. [ 
95.71  
Alif et al. [ 
95.10  
D5  Devanagari Digits 
Sarkhel et al. [ 
99.50 
Tushar et al. [ 
99.26  
Dash et al. [ 
98.96  
D6  Devanagari Basic 
Mhapsekar et al. [ 
99.35 
Aneja et al. [ 
99.00  
Gupta et al. [ 
94.15  
D8  MNIST  Wang et al. [ 
98.87 
Digits  Sun et al. [ 
99.51  
Gupta et al. [ 
98.92  
Chen et al. [ 
99.48  
Sun et al. [ 
98.82  
For the Bangla script datasets, viz, D1, D2, D3, and D4, the deep learning based techniques used to compare our results are unified CNN [
In this section, we show some typical examples of misclassified samples on each chosen benchmark dataset, as delineated in
From the experimental analysis, we have noticed that there are two prominent reasons that contribute to the failure cases for handwritten Indic scripts recognition.
The main reason is the confusion of similarly shaped characters, i.e., some character pairs are written in such a manner that they have similar structural constructs, which are quite challenging to recognize even for humans.
We also observed that cursive and nonstandard writing habits in hooks and circles are difficult to identify correctly. Furthermore, erroneous character samples that are degraded or severely polluted have broken structures, which introduce great ambiguity and lead directly to misclassification.
The experimental results in this paper indicate that the metaheuristic evolutionary approach is feasible for designing promising CNN architectures for handwritten Indic script recognition. The proposed algorithm provides competitive performance without using complicated architectures or data augmentation, and the results are comparable to those of the existing handcrafted models. In the past, most works have contributed hand-designed CNN architectures, each explicitly designed for solving a particular problem. This process of manually creating the architecture is expensive and entails a significant amount of trial and error to reach a good solution. Therefore, the proposed neuroevolution approach is simpler and more robust than the existing state-of-the-art techniques.
The proposed algorithm integrates traditional PSO with the principles of quantum computing. Unlike classical computing, in which a bit exists in either state 0 or state 1, in quantum computing a Qbit may exist in state 0, state 1, or a superposition of the two states. This ability to represent more than two states contributes to better and faster exploration and exploitation of the search space. Since exploration and exploitation should complement each other, tuning them appropriately can improve performance. For this purpose, the rotation angle is introduced for updating the position of particles. A proper choice of rotation angle maintains a good balance between exploitation and exploration of the search space and yields competitive solutions with shorter computation time and a smaller swarm size. As a consequence, this approach improves the computational efficiency of the proposed algorithm. Furthermore, QPSO with a D-dimensional Qbit representation provides better population diversity, as it covers the search space faster than traditional PSO. Thus, quantum computing adds substantially to the performance of PSO, which further strengthens the efficacy of the technique. In addition, the training curves of the global best solution representing the optimal CNN topology for dataset D3 are illustrated in
Moreover, in the existing neural architecture search approaches, the final recognition accuracy is used as the fitness measure when evaluating the particles. Obtaining the final recognition accuracy usually requires a large number of training epochs, so this process takes a considerable amount of time. Designing a complete architecture with such a fitness evaluation plan therefore demands substantial computational resources to speed up the process. It also demands further professional support, for example, task scheduling and synchronization, which is beyond the expertise of most researchers. Hence, during the evolution, the particles do not need to reach the final recognition accuracy; it is sufficient to predict the tendency that reveals the future quality of the solution. For this reason, the particles in the proposed scheme are trained for only a small number of epochs during the evaluation. In summary, the proposed technique, with its simple fitness evaluation scheme and well-designed encoding strategy, enables researchers to discover promising CNN architectures without prior domain knowledge.
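The saving from the short fitness evaluation can be estimated with simple arithmetic on the parameter values listed earlier (swarm size 30, 30 iterations, 2 evaluation epochs, 100 final epochs). The sketch assumes that every particle is trained once per iteration, which is an assumption about the schedule, and it counts epochs only, ignoring that architectures of different sizes take different time per epoch.

```python
swarm_size = 30      # particles in the swarm
iterations = 30      # iterations used to find the global best
eval_epochs = 2      # training epochs per fitness evaluation
final_epochs = 100   # epochs for the final deep training

# Budget of the proposed scheme: short evaluations plus one full training run.
search_epochs = swarm_size * iterations * eval_epochs
total_epochs = search_epochs + final_epochs

# Budget if every fitness evaluation used the full training schedule instead.
full_fitness_epochs = swarm_size * iterations * final_epochs

print(search_epochs, total_epochs, full_fitness_epochs)
print("reduction factor for the search stage:", full_fitness_epochs // search_epochs)
```

Under these assumptions the search stage costs 1,800 training epochs instead of 90,000, a 50-fold reduction, at the price of ranking particles by an early estimate of their accuracy.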
In this paper, a QPSOCNN algorithm has been proposed for the recognition of handwritten Indic scripts. The proposed hybrid neuroevolutionary approach integrates particle swarm optimization with the concept of quantum computing to automatically evolve promising CNN architectures. QPSO is an amended version of conventional PSO with a different operational procedure. It is strengthened by an additional operator, i.e., the rotation angle. A proper choice of rotation angle maintains a good balance between exploitation and exploration of the search space and yields competitive solutions, even with a smaller swarm size. We also deduce that, through the effective use of heuristics, the proposed algorithm avoids wasting computational time on unproductive search and hence provides enhanced search efficiency. The proposed QPSOCNN algorithm has been evaluated on a variety of Indic script datasets. The comprehensive experimental results demonstrate that the proposed algorithm performs significantly better than the existing state-of-the-art techniques.