<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">20827</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2022.020827</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Deep Learning Based Audio Assistive System for Visually Impaired People</article-title>
<alt-title alt-title-type="left-running-head">Deep Learning Based Audio Assistive System for Visually Impaired People</alt-title>
<alt-title alt-title-type="right-running-head">Deep Learning Based Audio Assistive System for Visually Impaired People</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes"><name name-style="western"><surname>Devi</surname><given-names>S. Kiruthika</given-names></name><email>kiruthis2@srmist.edu.in</email>
</contrib>
<contrib id="author-2" contrib-type="author"><name name-style="western"><surname>Subalalitha</surname><given-names>C. N. </given-names></name>
</contrib>
<aff><institution>Department of Computer Science and Engineering, SRM Institute of Science and Technology</institution>, <addr-line>Kattankulathur, 603203</addr-line>, <country>India</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: S. Kiruthika Devi. Email: <email>kiruthis2@srmist.edu.in</email></corresp>
</author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2021-10-18">
<day>18</day>
<month>10</month>
<year>2021</year>
</pub-date>
<volume>71</volume>
<issue>1</issue>
<fpage>1205</fpage>
<lpage>1219</lpage>
<history>
<date date-type="received">
<day>10</day>
<month>6</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>05</day>
<month>8</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Devi and Subalalitha</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Devi and Subalalitha</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_20827.pdf"></self-uri>
<abstract>
<p>Vision impairment is a prevalent problem that affects numerous people across the globe. Technological advancements, particularly the rise of computing power for Deep Learning (DL) models and the emergence of wearables, pave the way for assisting visually-impaired persons. The models developed earlier specifically for visually-impaired people work effectively on single-object detection in unconstrained environments. But, in real-time scenarios, these systems are inconsistent in providing effective guidance for visually-impaired people. In addition to object detection, extra information about the location of objects in the scene is essential for visually-impaired people. Keeping this in mind, the current research work presents an Efficient Object Detection Model with Audio Assistive System (EODM-AAS) using the DL-based YOLO v3 model for visually-impaired people. The aim of the research article is to construct a model that can provide a detailed description of the objects around visually-impaired people. The presented model involves a DL-based YOLO v3 model for multi-label object detection. Besides, the presented model determines the position of objects in the scene and finally generates an audio signal to notify the visually-impaired people. In order to validate the detection performance of the presented method, a detailed simulation analysis was conducted on four datasets. The simulation results established that the presented model produces better outcomes than existing methods.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Deep learning</kwd>
<kwd>visually impaired people</kwd>
<kwd>object detection</kwd>
<kwd>YOLO v3</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>In recent times, Artificial Intelligence (AI) models have started yielding better outcomes in voice-based virtual assistants like Siri and Alexa [<xref ref-type="bibr" rid="ref-1">1</xref>], autonomous vehicles (Tesla), robotics (car manufacturing), and automated translation (Google Translate). In line with this, AI-based solutions have been introduced in assistive techniques, especially in guiding visually-impaired or blind people. Mostly, the systems mentioned above overcome independent-navigation issues with the help of portable assistive tools such as infrared sensors, ultrasound sensors, Radio Frequency Identification (RFID), Bluetooth Low Energy (BLE) beacons and cameras. However, for autonomous navigation, visually-impaired people require other assistive models too. For example, computer vision methods can be unified with Machine Learning (ML) models to provide moderate solutions for the above-defined problem. For instance, a computer vision module was proposed to examine currency with the help of Speeded-Up Robust Features (SURF) [<xref ref-type="bibr" rid="ref-2">2</xref>]. The system is capable of recognizing US currencies with a high true recognition rate and a low false recognition rate. Alternatively, visually-impaired users can shop in departmental stores through prediction, whereas barcode analysis provides visually-impaired individuals with details about the product through voice communication.</p>
<p>Chen et al. [<xref ref-type="bibr" rid="ref-3">3</xref>] presented a model to guide visually-impaired people to locate and read text content. In this prediction model, the candidate regions containing text are initially predicted using special statistical features. Then, commercial Optical Character Recognition (OCR) software is applied to examine the content present inside the candidate regions. Another application is a travel assistant model that predicts and examines public transportation modes. The system predicts the text written on buses and at stations and informs visually-impaired users about station names, numbers, bus numbers, target locations, etc. Moreover, Jia et al. [<xref ref-type="bibr" rid="ref-4">4</xref>] addressed the issue of identifying staircases within buildings; the model informs the user only when they are within 5 m of the staircase. It depends upon an iterative preemptive Random Sample Consensus (RANSAC) method to predict the number of steps in the staircase. A related model detects doors within buildings by examining general as well as stable properties of doors, such as edges and corners. Consequently, a model was used for predicting restroom signage based on Scale-Invariant Feature Transform (SIFT) characteristics. Object prediction and analysis are highly studied problems in computer vision applications.</p>
<p>The object prediction models [<xref ref-type="bibr" rid="ref-5">5</xref>], developed earlier, were constructed on the basis of extracting hand-engineered attributes before implementing a classification method. Later, the resurgence of Neural Networks (NN) in 2012 and the advent of advanced architectures such as Convolutional Neural Network (CNN), Region CNN (RCNN), You Only Look Once (YOLO), Single Shot Multi-Box Detector (SSD), pyramid networks, and RetinaNet have simplified the process. In spite of their high efficiency, the high processing costs make these models difficult to implement on wearable devices. As a result, visually-impaired users make use of portable devices to predict objects. Hence, researchers have sought a cost-effective and efficient solution that can predict several objects. However, detecting the accurate location of objects is still a challenge.</p>
<p>In this background, the current research article presents an Efficient Object Detection Model with Audio Assistive System (EODM-AAS) using the DL-based YOLO v3 model for visually-impaired people. The aim of the research article is to develop a model that can generate a comprehensive description of the objects around visually-impaired people. The presented model includes a YOLO v3 model for multi-label object detection. Also, the presented model computes the position of each object in the scene and, lastly, creates an audio signal to inform visually-impaired persons. In order to validate the detection performance of the presented model, a comprehensive simulation analysis was conducted on four datasets, namely David3, Human4, Subway and Hall Monitor.</p>
<p>The rest of the paper is organized as follows: Section 2 presents a review of state-of-the-art techniques for object detection and classification for assisting visually-impaired people. Section 3 describes the proposed EODM-AAS model and its implementation details. This is followed by Section 4, in which the experimental analysis of the proposed model on four different datasets and comparisons with other models are discussed. Finally, Section 5 contains the conclusion and future enhancements of the work.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Works</title>
<p>The challenges involved in object classification include dynamic modifications in natural scenarios and the visual features of objects (color, shape and size). If effective models are to be deployed for object classification, they should consider scenarios from unseen conditions and correlate the object features as well. In this section, various state-of-the-art models for object prediction and classification, exclusively meant for helping visually-impaired people, are discussed. Lin et al. [<xref ref-type="bibr" rid="ref-6">6</xref>] employed FRCNN and YOLO models for real-time object detection to help visually-impaired people. These models alert the users about the objects around them. In this work, the researchers classified each detected object as either &#x2018;normal object&#x2019; or &#x2018;emergency object&#x2019; according to the class identified and the relative distance of the object from the visually-impaired person. Accordingly, the visually-impaired persons are alerted. Furthermore, the developers extended the work using bone conduction headphones so that the users can listen to the audio signals.</p>
<p>Lakshmanan et al. [<xref ref-type="bibr" rid="ref-7">7</xref>] presented a system that helps visually-impaired users by providing instructions with which they can perform collision-free navigation. This model applies a prototype with a Kinect sensor attached to the walking stick, which predicts the velocity of moving objects using an estimated depth map. Huang et al. [<xref ref-type="bibr" rid="ref-8">8</xref>] proposed a novel approach to predict static as well as dynamic objects with the help of depth information generated by a connected-component model designed based on the Microsoft Kinect sensor. Static classes such as rising stairs and steep stairs are identified, whereas dynamic objects are detected as dynamic and are not further classified. Poggi et al. [<xref ref-type="bibr" rid="ref-9">9</xref>] applied a DL method for the classification of objects with the help of a CNN.</p>
<p>Vlaminck et al. [<xref ref-type="bibr" rid="ref-10">10</xref>] utilized an RGB-D camera tracking method to localize objects with the help of color and depth information. After that, object classification is carried out by obtaining geometrical features of the object. The developers focused on three classes: staircases, room walls and doors. Vlaminck et al. [<xref ref-type="bibr" rid="ref-11">11</xref>] applied 3D sensors to detect objects in indoor environments, focusing on four classes: steps, walls, doors and bumpy floor surfaces.</p>
<p>Hoang et al. [<xref ref-type="bibr" rid="ref-12">12</xref>] applied a mobile Kinect mounted on the user&#x2019;s body. It predicts both static and dynamic objects and informs visually-impaired people about them. Moreover, it detects people with the help of the Kinect SDK and considers the depth image as input. Further, static objects such as the ground and walls are predicted using plane segmentation in this research work. A modern ultrasonic garment prototype was presented in the literature [<xref ref-type="bibr" rid="ref-13">13</xref>]. It is a real-time adaptive object classifier which applies acoustic echolocation to extract the features of objects in the navigational path of visually-impaired users.</p>
<p>Takizawa et al. [<xref ref-type="bibr" rid="ref-14">14</xref>] presented an object recognition model using a computer vision technique, edge detection, that can guide visually-impaired users to identify the type of object. Mandhala et al. [<xref ref-type="bibr" rid="ref-15">15</xref>] proposed a machine learning-based solution, namely a clustering technique, to classify multi-class objects. Bhole et al. [<xref ref-type="bibr" rid="ref-16">16</xref>] used deep learning techniques such as Single Shot Detector (SSD) and Inception V3 to classify bank currency notes in a real-time environment. Vaidya et al. [<xref ref-type="bibr" rid="ref-17">17</xref>] presented an image processing method with a machine learning approach to classify multi-class objects.</p>
<p>When reviewing the state-of-the-art techniques proposed so far to assist visually-impaired people, most of the models incorporate several sensors to detect objects. Sensor-based detection techniques have their own setbacks, such as high cost, power consumption and limited accuracy. These drawbacks are experienced when it comes to object detection with respect to distance from visually-impaired persons during navigation. A few research works have focused on the prediction of multi-class objects using machine learning models. Those models are heavy in terms of computation and memory resources and may not be suitable for embedding into real-time assistive tools. Recent advancements in computer vision and deep learning play an important role in the field of object detection. Though several deep learning algorithms have been proposed for object detection applications, it is still challenging to localize an object rather than merely recognize it. Hence, an efficient deep learning model is the need of the hour: one that can locate multiple objects, classify the multi-class objects found in the scene, remain lightweight, and attain maximum accuracy in a real-time environment at minimum time. The one-stage object detector, <italic>i.e.</italic>, the You Only Look Once (YOLO) [<xref ref-type="bibr" rid="ref-18">18</xref>] algorithm, is suitable to meet these requirements.</p>
<p>The key aim of the proposed approach is to guide visually-impaired users in unknown places by directing them through vocal messages regarding the position of each object identified in the scene and its class name. For object detection and classification, YOLO v3 is used due to its agility in predicting real-time objects [<xref ref-type="bibr" rid="ref-18">18</xref>]. The next section describes the working process of the proposed EODM-AAS model.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>The Proposed EODM-AAS Model</title>
<p><?A3B2 "fig1",5,"anchor"?><xref ref-type="fig" rid="fig-1">Fig. 1</xref> shows the workflow of the proposed EODM-AAS model. Initially, the input video undergoes a frame conversion process wherein the entire video is segregated into a set of frames. Then, object detection takes place using the YOLO v3 model to identify the set of objects in each frame. Next, the position of each object in the scene is determined. At last, an audio signal is generated using the Python package pyttsx to notify the visually-impaired people effectively.</p>
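The position-determination and audio-notification steps described above can be sketched in Python. The exact rule EODM-AAS uses to map a detection to a spoken position is not spelled out here, so the thirds-based split of the frame and the phrase format below are illustrative assumptions, not the authors' implementation:

```python
def position_phrase(center_x, frame_width, label):
    """Map a detected object's bounding-box center to a spoken phrase.

    Hypothetical rule: split the frame into left/center/right thirds.
    """
    third = frame_width / 3
    if center_x < third:
        where = "on your left"
    elif center_x < 2 * third:
        where = "ahead of you"
    else:
        where = "on your right"
    return f"{label} {where}"
```

In the actual system the returned string would then be voiced through a text-to-speech engine; with pyttsx3 (the maintained successor of pyttsx) that would be `engine = pyttsx3.init(); engine.say(phrase); engine.runAndWait()`.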
<sec id="s3_1">
<label>3.1</label>
<title>Object Detection</title>
<p>Primarily, every frame undergoes YOLO v3-based object detection to identify and classify multiple objects in the frame. YOLO v3, with an input image size of 416 &#x002217; 416 pixels, has been used in the current study. YOLO v3 is generally trained on the COCO dataset, which comprises a total of 80 object classes. However, in this work, the YOLO v3 model was used to predict 30 different object classes relevant to visually-impaired persons. YOLO v3 is the third generation of the YOLO object detection model [<xref ref-type="bibr" rid="ref-19">19</xref>]. It accomplishes both classification and regression tasks by detecting the classes of objects and their locations; hence, YOLO v3 is highly suitable for assisting visually-impaired persons. YOLO v3 follows the same classification-and-regression procedure as its predecessors YOLO v1, YOLO v2 and YOLO9000, the earlier variants of the YOLO family, and inherits most of their elements. In addition, it makes use of Darknet-53 [<xref ref-type="bibr" rid="ref-20">20</xref>], with convolutional layers and residual (ResNet-style) connections, to eliminate the issue of vanishing gradients. In the prediction stage, a Feature Pyramid Network (FPN) applies feature maps at three scales, in which the smallest feature maps offer semantic details whereas the largest feature maps offer fine-grained details. Then, instead of SoftMax, independent logistic classifiers are applied in the YOLO v3 structure, while binary cross-entropy loss for class prediction is applied during the training stage.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Work flow of EODM-AAS model</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-1.png"/>
</fig>
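The three detection scales mentioned above follow directly from YOLO v3's down-sampling strides of 32, 16 and 8: a minimal sketch, assuming the standard 416 &#x00D7; 416 input and three anchor boxes per grid cell.

```python
def yolo_v3_grid_sizes(input_size=416, strides=(32, 16, 8)):
    # Each detection scale divides the input by its cumulative stride,
    # yielding progressively finer grids.
    return [input_size // s for s in strides]

def total_predictions(input_size=416, anchors_per_scale=3):
    # YOLO v3 predicts a fixed number of anchor boxes per grid cell
    # at every scale; sum over all three grids.
    return sum(g * g * anchors_per_scale
               for g in yolo_v3_grid_sizes(input_size))
```

For a 416-pixel input, this gives 13 &#x00D7; 13, 26 &#x00D7; 26 and 52 &#x00D7; 52 grids, i.e., 10,647 candidate boxes per frame, which confidence thresholding and non-maximum suppression then prune to the final detections.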
<p>YOLO v3 comprises the Darknet-53 feature-extraction network and YOLO prediction layers, which incur lower processing costs and can be applied on embedded device platforms. The default input to Darknet is 416 &#x00D7; 416 pixels, and multi-scale targets are predicted by generating feature maps at different resolutions. A denser pixel grid limits the amount of down-sampling, which in turn enables the prediction of small targets. <?A3B2 "fig2",5,"anchor"?><xref ref-type="fig" rid="fig-2">Fig. 2</xref> shows the architecture of YOLO v3 [<xref ref-type="bibr" rid="ref-19">19</xref>].</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Architecture of YOLO v3</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-2.png"/>
</fig>
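The objectness score in Eq. (1) below multiplies a conditional probability by the Intersection Over Union (IOU) between a predicted box and its ground-truth box. A minimal sketch of IOU, assuming axis-aligned boxes given as (x1, y1, x2, y2) corners:

```python
def iou(box_a, box_b):
    """Intersection Over Union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Identical boxes score 1.0, disjoint boxes 0.0, and partially overlapping boxes fall in between, so the objectness value in Eq. (1) rewards boxes that tightly cover the ground truth.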
<p>The YOLO v3 model treats object detection as a regression problem. It forecasts class probabilities as well as bounding box offsets from complete images using a single feed-forward CNN. It eliminates region proposal generation and feature resampling, combining every step in a single network so as to form an end-to-end prediction approach. This method divides the input image into small grid cells. When the center of an object falls within a grid cell, that grid cell is responsible for predicting the object [<xref ref-type="bibr" rid="ref-21">21</xref>]. A grid cell detects the location details of <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>B</mml:mi></mml:math></inline-formula> bounding boxes and estimates the objectness values, equivalent to the bounding boxes, using <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>.</p>
<p><disp-formula id="eqn-1">
<label>(1)</label>
<mml:math id="mml-eqn-1" display="block"><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi mathvariant="italic">O</mml:mi><mml:mi mathvariant="italic">b</mml:mi><mml:mi mathvariant="italic">j</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">t</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mtext>&#xA0;</mml:mtext><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow><mml:mtext>&#xA0;</mml:mtext><mml:mi>I</mml:mi><mml:mi>O</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="italic">t</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">u</mml:mi><mml:mi mathvariant="italic">t</mml:mi><mml:mi mathvariant="italic">h</mml:mi></mml:mrow></mml:mrow></mml:msubsup></mml:math>
</disp-formula></p>
<p>where <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> implies the objectness value of the j<sup>th</sup> bounding box in the i<sup>th</sup> grid cell. <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo></mml:math></inline-formula>Object) refers to the probability that the grid cell contains an object. <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mi>I</mml:mi><mml:mi>O</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="italic">t</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">u</mml:mi><mml:mi mathvariant="italic">t</mml:mi><mml:mi mathvariant="italic">h</mml:mi></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> indicates the Intersection Over Union (IOU) between the predicted box and the ground truth box. Also, the YOLO v3 scheme applies binary cross-entropy between the estimated objectness value and the truth objectness value as the portion of the loss function depicted below.</p>
<p><disp-formula id="eqn-2">
<label>(2)</label>
<mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:munderover><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>[</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>C</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>C</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<p>where, <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> defines the count of grid cells, and <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mi>B</mml:mi></mml:math></inline-formula> denotes the count of bounding boxes. <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msubsup><mml:mrow><mml:mover><mml:mi>C</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> refer to the examined objectness value as well as truth objectness value, correspondingly. The location of the bounding box depends upon four predictions such as <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, when considering <inline-formula id="ieqn-13"><mml:math 
id="mml-ieqn-13"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> as the offset of grid cell, directed from top left corner of the image. The middle portion of the bounding boxes is referred to as offset from top left corner of the image using <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and is determined as given below:</p>
<p><disp-formula id="eqn-3">
<label>(3)</label>
<mml:math id="mml-eqn-3" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula></p>
<p>where <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> denotes the sigmoid function. The width and height of the detected bounding box are evaluated by the functions given below.</p>
<p><disp-formula id="eqn-4">
<label>(4)</label>
<mml:math id="mml-eqn-4" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula></p>
<p>whereby <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> define the width and height of a bounding box prior to what is accomplished by dimensional clustering. Ground truth box is composed of four attributes (<inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>) that corresponds to the detected attributes such as <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. 
According to <xref ref-type="disp-formula" rid="eqn-3">Eqs. (3)</xref> and <xref ref-type="disp-formula" rid="eqn-4">(4)</xref>, the true values of <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> are determined using the <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>:</p>
<p><disp-formula id="eqn-5">
<label>(5)</label>
<mml:math id="mml-eqn-5" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mi>&#x03C3;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo 
stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula></p>
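<p>As a minimal plain-Python sketch, the encoding in Eq. (5) can be applied directly: the sigmoid is inverted to recover the center-offset targets, and a logarithm gives the scale targets relative to the prior. The function name and the requirement that the center offsets lie strictly inside the cell are illustrative assumptions, not part of the original implementation.</p>

```python
import math


def encode_targets(gx, gy, gw, gh, cx, cy, pw, ph):
    """Encode a ground-truth box (gx, gy, gw, gh) as regression targets
    relative to the grid-cell offset (cx, cy) and prior size (pw, ph),
    following Eq. (5): sigma(t_x) = gx - cx, t_w = log(gw / pw), etc."""

    def logit(v):
        # Inverse of the sigmoid; v must lie in the open interval (0, 1).
        return math.log(v / (1.0 - v))

    tx = logit(gx - cx)      # center-x offset target, recovered via inverse sigmoid
    ty = logit(gy - cy)      # center-y offset target
    tw = math.log(gw / pw)   # width target relative to the prior
    th = math.log(gh / ph)   # height target relative to the prior
    return tx, ty, tw, th
```

For a box centered exactly mid-cell with the prior's dimensions, all four targets come out as zero, which matches the identity mapping implied by Eq. (5).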

<p>The YOLOv3 model applies the sum of squared coordinate errors as one component of its loss function, as illustrated below:</p>
<p><disp-formula id="eqn-6">
<label>(6)</label>
<mml:math id="mml-eqn-6" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mi></mml:mi><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:munderover><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>[</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo 
stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x2217;</mml:mo></mml:mtd><mml:mtd><mml:mi></mml:mi><mml:mo>+</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:msup><mml:mi>S</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:munderover><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo stretchy="false">[</mml:mo><mml:mo 
stretchy="false">(</mml:mo><mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo stretchy="false">]</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula></p>
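<p>The coordinate-error term in Eq. (6) can be sketched as a double sum over the S&#x00D7;S grid cells and the B boxes per cell, masked by the object indicator W_ij^obj. The nested-list data layout and function names below are assumptions made for illustration only.</p>

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def coord_loss(pred, target, obj_mask):
    """Sum-of-squared-errors coordinate loss over all grid cells and boxes,
    per Eq. (6). pred[i][j] and target[i][j] hold (tx, ty, tw, th) tuples;
    obj_mask[i][j] is 1 if box j of cell i is responsible for an object."""
    e2 = 0.0
    for i, row in enumerate(pred):
        for j, (tx, ty, tw, th) in enumerate(row):
            if not obj_mask[i][j]:
                continue  # only boxes responsible for an object contribute
            hx, hy, hw, hh = target[i][j]
            # Center offsets are compared after the sigmoid, as in Eq. (6)
            e2 += (sigmoid(tx) - sigmoid(hx)) ** 2
            e2 += (sigmoid(ty) - sigmoid(hy)) ** 2
            # Width/height targets are compared directly in log space
            e2 += (tw - hw) ** 2 + (th - hh) ** 2
    return e2
```
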
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Position Determination</title>
<p>Once the YOLO v3 model identifies the objects in a frame, the next step is to determine each object&#x2019;s position. For this purpose, every frame is divided into a 3-row &#x00D7; 3-column grid of cells as shown in <?A3B2 "fig3",5,"anchor"?><xref ref-type="fig" rid="fig-3">Fig. 3</xref>. The whole image is thus partitioned into three row-wise positions (top, center and bottom) and three column-wise positions (left, center and right). After that, the central location of every bounding box is calculated from the box coordinates, namely x, y, width (w) and height (h).</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Object position determination</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-3.png"/>
</fig>
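<p>The 3 &#x00D7; 3 position mapping described above can be sketched as follows. The function name and the coordinate convention (top-left image origin, box given as x, y, w, h) are assumptions made for illustration.</p>

```python
def grid_position(x, y, w, h, frame_w, frame_h):
    """Map a bounding box's center to one of nine grid positions:
    (top/center/bottom, left/center/right)."""
    cx = x + w / 2.0  # horizontal center of the box
    cy = y + h / 2.0  # vertical center of the box
    # Each axis is split into three equal bands; clamp to handle the edge case
    # where the center lies exactly on the frame boundary.
    col = ["left", "center", "right"][min(int(3 * cx / frame_w), 2)]
    row = ["top", "center", "bottom"][min(int(3 * cy / frame_h), 2)]
    return row, col
```

For example, a box whose center falls in the middle band of both axes is reported as ("center", "center"), which then feeds the audio message of the next stage.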
<fig id="fig-13"><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-13.png"/></fig>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Audio Signal Generation</title>
<p>In the audio signal generation stage, both the detected object and its position in the frame are converted into an audio signal with the help of the pyttsx Python library. pyttsx is a cross-platform text-to-speech conversion library, and a major benefit is that it also works offline. The Python code snippet of pyttsx usage is given below.</p>
<fig id="fig-14"><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-14.png"/></fig>
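<p>A minimal sketch of this stage is given below. The message format follows the examples shown in Fig. 8, and the snippet uses pyttsx3, the currently maintained successor of the pyttsx library named in the text; the helper names are illustrative, not the authors&#x2019; actual code.</p>

```python
def announcement(row, col, label):
    """Compose the spoken message for a detected object and its grid
    position, e.g. 'On your left center person found' (format assumed
    from the examples in Fig. 8)."""
    return f"On your {col} {row} {label} found"


def speak(text):
    """Convert text to speech offline using pyttsx3 (requires the
    pyttsx3 package and an available audio backend)."""
    import pyttsx3  # imported lazily so the rest of the module loads without it
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()
```

Usage would be, for instance, `speak(announcement("center", "left", "person"))` after the position determination stage.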
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experimental Results Analysis</title>
<p>This section discusses the results of the detailed experimentation conducted on the EODM-AAS model using four datasets, namely David3, Human4, Subway, and Hall Monitor [<xref ref-type="bibr" rid="ref-22">22</xref>]. The first dataset has a total of 252 frames, the second has 667 frames, and the third and fourth have 176 and 300 frames respectively. A few details related to the datasets are given in <?A3B2 "tbl1",5,"anchor"?><xref ref-type="table" rid="table-1">Tab. 1</xref> and some of the sample test images are shown in <?A3B2 "fig4",5,"anchor"?><xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Sample frames of dataset a) David3 b) Human4 c) Subway d) Hall monitor</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-4.png"/>
</fig>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Dataset descriptions</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Dataset</th>
<th>Number of frames</th>
</tr>
</thead>
<tbody>
<tr>
<td>David3</td>
<td>252</td>
</tr>
<tr>
<td>Human4</td>
<td>667</td>
</tr>
<tr>
<td>Subway</td>
<td>176</td>
</tr>
<tr>
<td>Hall monitor</td>
<td>300</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><?A3B2 "fig5",5,"anchor"?><xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows the results of the qualitative visualization analysis attained by the presented EODM-AAS model on the David3 dataset. <xref ref-type="fig" rid="fig-5">Fig. 5a</xref> depicts the input image whereas the output image is shown in <xref ref-type="fig" rid="fig-5">Fig. 5b</xref>. The figure shows that the proposed EODM-AAS model detected the objects as &#x2018;car&#x2019; and &#x2018;person&#x2019;.</p>
<p><?A3B2 "fig6",5,"anchor"?><xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows the results of the qualitative visualization analysis of the proposed EODM-AAS model on the Subway dataset. <xref ref-type="fig" rid="fig-6">Fig. 6a</xref> showcases the input image while the output image is illustrated in <xref ref-type="fig" rid="fig-6">Fig. 6b</xref>. The figure shows that the EODM-AAS model identified the object &#x2018;person&#x2019; proficiently.</p>
<p><?A3B2 "fig7",5,"anchor"?><xref ref-type="fig" rid="fig-7">Fig. 7</xref> portrays the results of the qualitative visualization analysis of the proposed EODM-AAS model on the Hall monitor dataset. <xref ref-type="fig" rid="fig-7">Fig. 7a</xref> depicts the input image while the output image is shown in <xref ref-type="fig" rid="fig-7">Fig. 7b</xref>. The figure shows that the EODM-AAS model identified the object &#x2018;suitcase&#x2019; correctly.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Visualization analysis results of EODM-AAS model on David3 dataset (a) Original image, (b) Output image</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-5.png"/>
</fig>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Visualization analysis results of EODM-AAS model on subway dataset (a) Original image, (b) Output image</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-6.png"/>
</fig>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Visualization analysis results of EODM-AAS model on hall monitor dataset (a) Original image, (b) Output image</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-7.png"/>
</fig>
<p><?A3B2 "fig8",5,"anchor"?><xref ref-type="fig" rid="fig-8">Fig. 8</xref> visualizes the results obtained by the proposed EODM-AAS model on the David3 dataset. From the figure, it is clear that the presented EODM-AAS model detected objects such as &#x2018;car&#x2019; and &#x2018;person&#x2019; along with position details such as &#x201C;on your left center person&#x201D; and &#x201C;on your center-center car&#x201D;.</p>
<p><?A3B2 "tbl2",5,"anchor"?><xref ref-type="table" rid="table-2">Tab. 2</xref> and <?A3B2 "fig9",5,"anchor"?><xref ref-type="fig" rid="fig-9">Fig. 9</xref> demonstrate the classification results accomplished by the EODM-AAS model. The experimental values confirm the effectual detection of the proposed EODM-AAS model on all the applied datasets in terms of the sensitivity and specificity metrics given in <xref ref-type="disp-formula" rid="eqn-7">Eqs. (7)</xref> and <xref ref-type="disp-formula" rid="eqn-8">(8)</xref>.</p>
<p><disp-formula id="eqn-7">
<label>(7)</label>
<mml:math id="mml-eqn-7" display="block"><mml:mtext>Sensitivity</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:math>
</disp-formula></p>
<p><disp-formula id="eqn-8">
<label>(8)</label>
<mml:math id="mml-eqn-8" display="block"><mml:mtext>Specificity</mml:mtext><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:math>
</disp-formula></p>
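<p>Eqs. (7) and (8) translate directly into code; a minimal sketch with illustrative function names is given below. Any confusion-matrix counts whose ratios match the reported percentages will reproduce the values in Tab. 2.</p>

```python
def sensitivity(tp, fn):
    """Eq. (7): true positive rate, TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Eq. (8): true negative rate, TN / (TN + FP)."""
    return tn / (tn + fp)
```
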
<p>For instance, on the David3 test dataset, the EODM-AAS model produced effective detection results, with a sensitivity of 98.16% and a specificity of 94.56%.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Audio assistive system: i) On your left center person found ii) On your center-center car found</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-8.png"/>
</fig>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Analysis results of the proposed EODM-AAS model</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Dataset</th>
<th>Sensitivity</th>
<th>Specificity</th>
<th>Average</th>
</tr>
</thead>
<tbody>
<tr>
<td>David3</td>
<td>98.16</td>
<td>94.56</td>
<td>96.36</td>
</tr>
<tr>
<td>Human4</td>
<td>98.19</td>
<td>93.12</td>
<td>95.66</td>
</tr>
<tr>
<td>Subway</td>
<td>97.98</td>
<td>93.76</td>
<td>95.87</td>
</tr>
<tr>
<td>Hall monitor</td>
<td>98.34</td>
<td>93.54</td>
<td>95.94</td>
</tr>
<tr>
<td>Average</td>
<td>98.17</td>
<td>93.75</td>
<td>95.96</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Analysis results of EODM-AAS model</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-9.png"/>
</fig>
<p>Moreover, the pre-trained ResNet model yielded reasonable results with a sensitivity of 89.54%, owing to the skip connections among its layers; however, it remains a heavy model. In addition, the Convolutional SVM Net and Fine-tuning SqueezeNet models demonstrated acceptable results, <italic>i.e</italic>., sensitivity values of 93.64% and 96.05% respectively, and the small size of the latter also favors a higher prediction speed.</p>
<p>Also, the pre-trained VGG16 and Fusion using OWA models exhibited close sensitivity values of 97.2% and 97.66% respectively. However, the VGG model, in spite of its accuracy, took more time for training due to its heavy architecture. The presented EODM-AAS model displayed a better performance than all the other methods and obtained a high sensitivity of 98.17%. It is highly suitable for real-time object detection because of its lightweight structure and high prediction accuracy, as shown in <?A3B2 "tbl3",5,"anchor"?><xref ref-type="table" rid="table-3">Tab. 3</xref>.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Comparison of the proposed EODM-AAS model with existing techniques [<xref ref-type="bibr" rid="ref-23">23</xref>,<xref ref-type="bibr" rid="ref-24">24</xref>]</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Methods</th>
<th>Sensitivity</th>
<th>Specificity</th>
<th>Accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td>SURF &#x002B; GPR</td>
<td>77.72</td>
<td>99.28</td>
<td>89.46</td>
</tr>
<tr>
<td>EDCS &#x002B; GPR</td>
<td>70.00</td>
<td>90.12</td>
<td>80.66</td>
</tr>
<tr>
<td>MR random projection</td>
<td>77.18</td>
<td>91.41</td>
<td>84.90</td>
</tr>
<tr>
<td>Pre-trained GoogLeNet</td>
<td>83.63</td>
<td>96.86</td>
<td>90.85</td>
</tr>
<tr>
<td>Pre-trained ResNet</td>
<td>89.54</td>
<td>96.38</td>
<td>93.56</td>
</tr>
<tr>
<td>Convolutional SVM Net</td>
<td>93.64</td>
<td>92.17</td>
<td>93.50</td>
</tr>
<tr>
<td>Fine-tuning SqueezeNet</td>
<td>96.05</td>
<td>89.14</td>
<td>93.19</td>
</tr>
<tr>
<td>Pre-trained VGG16</td>
<td>97.20</td>
<td>86.70</td>
<td>92.55</td>
</tr>
<tr>
<td>Fusion using OWA</td>
<td>97.66</td>
<td>89.86</td>
<td>94.36</td>
</tr>
<tr>
<td>Deep-MLP</td>
<td>82.00</td>
<td>89.00</td>
<td>86.10</td>
</tr>
<tr>
<td>Proposed EODM-AAS</td>
<td>98.17</td>
<td>93.75</td>
<td>96.45</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Comparative analysis of EODM-AAS model in terms of sensitivity</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-10.png"/>
</fig>
<fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Comparative analysis of EODM-AAS model in terms of specificity</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-11.png"/>
</fig>
<fig id="fig-12">
<label>Figure 12</label>
<caption>
<title>Comparative analysis of EODM-AAS model in terms of accuracy</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CMC_20827-fig-12.png"/>
</fig>
<p><?A3B2 "fig10",5,"anchor"?><xref ref-type="fig" rid="fig-10">Figs. 10</xref>&#x2013;<?A3B2 "fig12",5,"anchor"?><xref ref-type="fig" rid="fig-12">12</xref> showcase the sensitivity, specificity and accuracy analysis results of the proposed EODM-AAS model against existing methods respectively. From the above results, it is evident that the proposed EODM-AAS technique is more effective than the other techniques, thanks to the incorporation of the YOLOv3 model.</p>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>The current research article introduced an effective DL-based YOLO v3 model to perform the object detection process so as to assist visually-impaired people. The aim of the article is to derive a model that can provide a detailed description of the objects around visually-impaired people. The input video is initially transformed into a set of frames. Every frame undergoes a YOLO v3-based object detection process to identify and classify multiple objects in the frame. Once the YOLO v3 model identifies the objects in the frame, the next step is to determine the position of each object in the frame, such as on your left, on your right, on your center, etc. In the last stage, the detected object and its position in the frame are converted into an audio signal using the pyttsx tool. In order to investigate the detection performance of the presented model, a detailed simulation analysis was performed on four datasets. The simulation outcomes inferred that the proposed method achieves better performance than the existing methods. As a part of the future scope, the presented model can be implemented in a real-time environment.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="other">
<p><bold>Funding Statement:</bold> The authors received no specific funding for this study.</p>
</fn>
<fn fn-type="conflict">
<p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. B.</given-names> <surname>Hoy</surname></string-name></person-group>, &#x201C;<article-title>Alexa, siri, cortana, and more: An introduction to voice assistants</article-title>,&#x201D; <source>Medical Reference Services Quarterly</source>, vol. <volume>37</volume>, no. <issue>1</issue>, pp. <fpage>81</fpage>&#x2013;<lpage>88</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F. M.</given-names> <surname>Hasanuzzaman</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Yang</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Tian</surname></string-name></person-group>, &#x201C;<article-title>Robust and effective component-based banknote recognition for the blind</article-title>,&#x201D; <source>IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews)</source>, vol. <volume>42</volume>, no. <issue>6</issue>, pp. <fpage>1021</fpage>&#x2013;<lpage>1030</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>A. L.</given-names> <surname>Yuille</surname></string-name></person-group>, &#x201C;<article-title>Detecting and reading text in natural scenes</article-title>,&#x201D; in <conf-name>Proc. of the 2004 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, 2004. CVPR 2004</conf-name>, <publisher-loc>Washington, DC, USA</publisher-loc>, vol. <volume>2</volume>, pp. <fpage>366</fpage>&#x2013;<lpage>373</lpage>, <year>2004</year>. </mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Jia</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Tang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Lik</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Lui</surname></string-name> and <string-name><given-names>W. H.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Plane-based detection of staircases using inverse depth</article-title>,&#x201D; in <conf-name>Proc. of Australasian Conf. on Robotics and Automation</conf-name>, <publisher-loc>New Zealand</publisher-loc>, <publisher-name>Victoria University of Wellington</publisher-name>, pp. <fpage>1</fpage>&#x2013;<lpage>10</lpage>, <year>2012</year>. </mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Viola</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Jones</surname></string-name></person-group>, &#x201C;<article-title>Rapid object detection using a boosted cascade of simple features</article-title>,&#x201D; in <conf-name>Proc. of the 2001 IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, CVPR 2001</conf-name>, <publisher-loc>Kauai, HI, USA</publisher-loc>, vol. <volume>1</volume>, pp. <fpage>511</fpage>&#x2013;<lpage>518</lpage>, <year>2001</year>. </mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B. S.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>C. C.</given-names> <surname>Lee</surname></string-name> and <string-name><given-names>P. Y.</given-names> <surname>Chiang</surname></string-name></person-group>, &#x201C;<article-title>Simple smartphone-based guiding system for visually impaired people</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>17</volume>, no. <issue>6</issue>, pp. <fpage>1371</fpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Lakshmanan</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Senthilnathan</surname></string-name></person-group>, &#x201C;<article-title>Depth map based reactive planning to aid in navigation for visually challenged</article-title>,&#x201D; in <conf-name>2016 IEEE International Conf. on Engineering and Technology (ICETECH)</conf-name>, <publisher-loc>Coimbatore, India</publisher-loc>, pp. <fpage>1229</fpage>&#x2013;<lpage>1234</lpage>, <year>2016</year>. </mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H. C.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>C. T.</given-names> <surname>Hsieh</surname></string-name> and <string-name><given-names>C. H.</given-names> <surname>Yeh</surname></string-name></person-group>, &#x201C;<article-title>An indoor obstacle detection system using depth information and region growth</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>15</volume>, no. <issue>10</issue>, pp. <fpage>27116</fpage>&#x2013;<lpage>27141</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Poggi</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Mattoccia</surname></string-name></person-group>, &#x201C;<article-title>A wearable mobility aid for the visually impaired based on embedded 3D vision and deep learning</article-title>,&#x201D; in <conf-name>2016 IEEE Symp. on Computers and Communication (ISCC)</conf-name>, <publisher-loc>Messina, Italy</publisher-loc>, pp. <fpage>208</fpage>&#x2013;<lpage>213</lpage>, <year>2016</year>. </mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Vlaminck</surname></string-name>, <string-name><given-names>L. H.</given-names> <surname>Quang</surname></string-name>, <string-name><given-names>H. V.</given-names> <surname>Nam</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Vu</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Veelaert</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Indoor assistance for visually impaired people using a RGB-D camera</article-title>,&#x201D; in <conf-name>2016 IEEE Southwest Symp. on Image Analysis and Interpretation (SSIAI)</conf-name>, <publisher-loc>Santa Fe, NM</publisher-loc>, pp. <fpage>161</fpage>&#x2013;<lpage>164</lpage>, <year>2016</year>. </mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Vlaminck</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Jovanov</surname></string-name>, <string-name><given-names>P. V.</given-names> <surname>Hese</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Goossens</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Philips</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Obstacle detection for pedestrians with a visual impairment based on 3D imaging</article-title>,&#x201D; in <conf-name>2013 Int. Conf. on 3D Imaging</conf-name>, <publisher-loc>Liege, Belgium</publisher-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>7</lpage>, <year>2013</year>. </mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V. N.</given-names> <surname>Hoang</surname></string-name>, <string-name><given-names>T. H.</given-names> <surname>Nguyen</surname></string-name>, <string-name><given-names>T. L.</given-names> <surname>Le</surname></string-name>, <string-name><given-names>T. H.</given-names> <surname>Tran</surname></string-name>, <string-name><given-names>T. P.</given-names> <surname>Vuong</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Obstacle detection and warning system for visually impaired people based on electrode matrix and mobile kinect</article-title>,&#x201D; <source>Vietnam Journal of Computer Science</source>, vol. <volume>4</volume>, no. <issue>2</issue>, pp. <fpage>71</fpage>&#x2013;<lpage>83</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>D. Y. K.</given-names> <surname>Sampath</surname></string-name> and <string-name><given-names>G. D. S. P.</given-names> <surname>Wimalarathne</surname></string-name></person-group>, &#x201C;<article-title>Obstacle classification through acoustic echolocation</article-title>,&#x201D; in <conf-name>2015 Int. Conf. on Estimation, Detection and Information Fusion (ICEDIF)</conf-name>, <publisher-loc>Harbin, China</publisher-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>7</lpage>, <year>2015</year>. </mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Takizawa</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Yamaguchi</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Aoyagi</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Ezaki</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Mizuno</surname></string-name></person-group>, &#x201C;<article-title>Kinect cane: Object recognition aids for the visually impaired</article-title>,&#x201D; in <conf-name>2013 6th Int. Conf. on Human System Interactions (HSI)</conf-name>, <publisher-loc>Sopot, Poland</publisher-loc>, pp. <fpage>473</fpage>&#x2013;<lpage>478</lpage>, <year>2013</year>. </mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V. N.</given-names> <surname>Mandhala</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Bhattacharyya</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Vamsi</surname></string-name> and <string-name><given-names>N.</given-names> <surname>T. Rao</surname></string-name></person-group>, &#x201C;<article-title>Object detection using machine learning for visually impaired people</article-title>,&#x201D; <source>International Journal of Current Research and Review</source>, vol. <volume>12</volume>, no. <issue>20</issue>, pp. <fpage>157</fpage>&#x2013;<lpage>167</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Bhole</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Dhok</surname></string-name></person-group>, &#x201C;<article-title>Deep learning based object detection and recognition framework for the visually-impaired</article-title>,&#x201D; in <conf-name>2020 Fourth Int. Conf. on Computing Methodologies and Communication (ICCMC)</conf-name>, <publisher-loc>Erode, India</publisher-loc>, pp. <fpage>725</fpage>&#x2013;<lpage>728</lpage>, <year>2020</year>. </mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Vaidya</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Shah</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Shah</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Shankarmani</surname></string-name></person-group>, &#x201C;<article-title>Real-time object detection for visually challenged people</article-title>,&#x201D; in <conf-name>2020 4th Int. Conf. on Intelligent Computing and Control Systems (ICICCS)</conf-name>, <publisher-loc>Madurai, India</publisher-loc>, pp. <fpage>311</fpage>&#x2013;<lpage>316</lpage>, <year>2020</year>. </mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhong</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>SlimYOLOv3: Narrower, faster and better for real-time UAV applications</article-title>,&#x201D; in <conf-name>2019 IEEE/CVF Int. Conf. on Computer Vision Workshop (ICCVW)</conf-name>, <publisher-loc>Seoul, Korea (South)</publisher-loc>, pp. <fpage>37</fpage>&#x2013;<lpage>45</lpage>, <year>2019</year>. </mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Bi</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Wang</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Deep learning approach to peripheral leukocyte recognition</article-title>,&#x201D; <source>PLOS ONE</source>, vol. <volume>14</volume>, no. <issue>6</issue>, pp. <fpage>e0218808</fpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Redmon</surname></string-name></person-group>, &#x201C;<article-title>Darknet: Open source neural networks in C</article-title>,&#x201D; <year>2016</year>. [Online]. Available: <uri>http://pjreddie.com/darknet/</uri>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Zhao</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Object detection algorithm based on improved YOLOv3</article-title>,&#x201D; <source>Electronics</source>, vol. <volume>9</volume>, no. <issue>3</issue>, pp. <fpage>537</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="other">&#x201C;<article-title>Dataset</article-title>,&#x201D; <year>2021</year>. [Online]. Available: <uri>http://cvlab.hanyang.ac.kr/trackerbenchmark/datasets.html</uri>. (Accessed on Feb 10, 2021)</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Alhichri</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Bazi</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Alajlan</surname></string-name></person-group>, &#x201C;<article-title>Assisting the visually impaired in multi-object scene description using OWA-based fusion of CNN models</article-title>,&#x201D; <source>Arabian Journal for Science and Engineering</source>, vol. <volume>45</volume>, no. <issue>12</issue>, pp. <fpage>10511</fpage>&#x2013;<lpage>10527</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. K.</given-names> <surname>Jarraya</surname></string-name>, <string-name><given-names>W. S.</given-names> <surname>Al-Shehri</surname></string-name> and <string-name><given-names>M. S.</given-names> <surname>Ali</surname></string-name></person-group>, &#x201C;<article-title>Deep multi-layer perceptron-based obstacle classification method from partial visual information: Application to the assistance of visually impaired people</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, pp. <fpage>26612</fpage>&#x2013;<lpage>26622</lpage>, <year>2020</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>
