Human Verification over Activity Analysis via Deep Data Mining



Introduction
The modern age is characterized by innovation in a wide variety of fields, including information systems, machine learning, smart and intelligent systems, prediction and estimation-based frameworks, and automation. These fields provide opportunities for the sustainable development of advanced systems and intelligent tools for gathering data, from conventional camera systems to motion-based detectors. These intelligent technologies allow us to explore ideas in a variety of disciplines. One avenue of exploration is to discover and evaluate human verification technologies. In this work, we designed an effective approach for human identification and verification as well as human activity analysis in various indoor and outdoor settings. We used indoor and outdoor video-based data as input to the proposed approach. After pre-processing the human shapes, we performed human verification and human activity analysis. For activity analysis, we extracted context-intelligent features from the video-based dataset. To deal with the associated computational costs, we used a deep features mining approach via graph mining. Then, to analyze the human activity, we adopted the machine learning-based AdaBoost algorithm. We used two publicly accessible datasets, UT-Interaction and Sports Videos in the Wild (SVW). Fig. 1 shows the overall architecture of our study.
Using these two datasets, we achieved a significantly higher recognition rate than other state-of-the-art techniques. The key contributions and improvements of this study are as follows:
• We developed a comprehensive strategy for human verification and activity analysis using indoor and outdoor video-based data as well as various human involvement situations.
• We examined two human detection and verification algorithms to obtain more accurate, robust results; this is the primary consideration of several useful applications. The proposed technique helps us to get accurate information for human activity prediction.
• Context-intelligent features are adopted for human verification and activity analysis. Furthermore, we used the deep features mining approach via a graph mining algorithm, and activity prediction using an AdaBoost algorithm.
• The performance and effectiveness of the proposed system are illustrated through experimental observations over two publicly accessible datasets. These show that our approach significantly outperforms existing state-of-the-art methods.
The remainder of our work is organized as follows: Section 2 discusses related and previous work. Section 3 shows the flow design of the proposed approach, which consists of pre-processing, human detection, feature extraction, deep features mining via graph mining, and classification using the AdaBoost algorithm. Section 4 describes the comprehensive analysis and evaluation on two benchmark datasets, Sports Videos in the Wild and UT-Interaction. This is followed by a detailed comparison with existing systems. Finally, Section 6 presents the paper's conclusion, limitations, and future directions.

Objective 1: Background Research and Literature Review Study
Developments in cell phone cameras and live streams, as well as advancements in motion-sensing devices, allow for improved data collection, while numerous research institutions focus on feature extraction and human action recognition studies [11].
Wang et al. [12] recently developed a novel human activity analysis (HAA) approach for analyzing labelled statistics. The proposed technique modified the current convolutional neural network HAA by comparing the retrieved feature selection method. By valuing consistency, the attention-oriented network design optimizes vital information while setting aside highly redundant and contradictory data. This HAA strategy is characterized by deep convolutional long short-term memory (LSTM) and convolutional neural networks.
The results showed improved performance, though on inaccurately specified data. The methodology aided data stream categorization and used the simplest statistics. In [13], researchers created a compact strategic plan predicated on efficient allocation, illumination changes, and obtained image feature statistics. The researchers accomplished human activity recognition and analysis via a conventional optimization procedure, body part identification, and a compressed coefficient dictionary learning technique. Zhou et al. [14] recently introduced a novel HAA model based on a Bayesian convolution network (BCN), which allows every system to access data using either low-power back propagation connections or traditional radio frequency (RF) connectivity. Convolution layers are responsible for extracting the features. An autonomous decoder, a typical deep net classifier, was added to improve accuracy. In addition, the Bayesian network classified the security risks using the enhanced deep learning (EDL) framework and an efficient offloading strategy. The results indicated that the data was susceptible to multiple forms of ambiguity, such as cognitive ambiguity, which is referred to as durability and noise.
In [15], researchers expanded the computational infrastructure to support volumetric structures. Informed by learning psychology, the research identifies foreground patches as "key components" of the framework and asserts that they include abundant and distinct spatial features. Newell et al. [16] created an architecture resembling an hourglass for activity detection and appended a supervisory output to its base. The single-person posture problem concerns identifying one human stance in a basic environment with minimal distraction. Proposed techniques for estimating the posture of a single individual have achieved response and validity above 93%. However, the majority of images contain numerous humans, making the single-person pose estimation methodology ineffective. Chen et al. [17] created a cascaded pyramid system to estimate the human activity of numerous individuals using a regression model and modification. The leading multi-person activity estimation algorithm subdivides the HAA and verification problem into several HAA and verification sub-problems, which are then analyzed by object tracking and single-person identification and verification. This method is straightforward and reliable. However, its efficiency depends on the outcomes of object identification. Additionally, the presence of multiple standing people aggravates the diffraction issues.
Einfalt et al. [18] developed methods for predicting activities in the movement of sports players using multiple processes that extract 20 sequential posture frameworks from data containing videos and sequential images. Considering translation activity classifications, researchers developed a neural sequence architecture for exact action analysis and recognition. Rado et al. [19] built a focused attentiveness (LSTM) framework that extracts CNN-based attributes and chronological positioning from challenging video sequences. To identify humans in images and video-based datasets, the YOLO v3 technique was used, whereas an LSTM-based technique was applied to identify anomalies. This research adopts supervised learning, convolutional neural network (CNN) techniques, or an insufficient number of attributes in multimedia databases to execute these methodologies. Franklin et al. [20] designed a complete deep learning system for classifying anomalous and routine activities. They utilized reduction, clustering, and graph-based methodologies to achieve relevant results. Through the deep learning method, the authors identified both normal and abnormal activity duration parameters. Additionally, Mishra et al. [21] proposed a fractional derivative S-transform oriented feature-extraction procedure and a linear discriminant analysis (LDA) oriented feature reduction procedure. In addition, researchers have applied the AdaBoost method with random forests to the detection of human activity.
Most of the proposed frameworks utilize supervised learning; fragmentation also plays a crucial part in the classification results. However, these procedures require that the humans and cells remain effectively segregated from the input data.
Ghadi et al. [22] established a method for generating video characterization by combining a 3D CNN algorithm and an LSTM-based decoder. They integrate the focus on visuals by establishing the likelihood function over the images, which is used to generate the actual words. Generally, studying the structure of a deep neural network, sometimes called a black box, is challenging. Therefore, video feature descriptors acquire the model's focal point to improve the number of possible readings. In [23], the researcher described integrating movement energy projection and gait energy mapping to describe the action of the human body and, afterward, correlating the temporal pattern with the reference sequence. This technique handles the given input data accurately and effectively, with a negligible impact on the HAA. In another study [24], the author proposed an active learner incorporating the Local Directional Pattern (LDP) as the representation to give the programmer more control over the feature extraction model. Additionally, the LDP framework was implemented for both simulated and actual active learning, achieving comparable performance.
Many of these studies omit a pre-processing phase, which increases time complexity and resource requirements. In addition, numerous studies rely on traditional methods for human detection and recognition and adopt a single technique to achieve this goal, which is less optimized. Moreover, researchers often avoid extracting multiple robust features and apply a classification approach without data optimization. For these reasons, existing systems suffer from lower accuracy, higher error rates, data normalization and optimization problems, and inefficient use of time and resources. To address the aforementioned concerns, we created a robust framework to identify humans and analyze their activity in a human life log.

Objective 2: Proposed Methodology with Novelty Highlights
In this part, we present a complete explanation of the suggested system with detailed methods as well as results.

Methods
Initially, we transformed the video data into frames. Then, we resized the converted frames to a fixed size, reduced distortion, and boosted image clarity. The next step was to detect the human from various structures and extract the following features: shape distance features, moveable body parts features, angular cosine features, and movement flow features. After this, we needed to optimize the data for more accurate results. To achieve this, we applied graph mining. Finally, we used AdaBoost for classification and activity analysis.

Data Pre-Processing
Before human verification and identification, we utilized several pre-processing methods to reduce computational expense and time. This includes the preliminary conversion of video sequences to image data. These images have a constant size of 450 × 350 pixels. The images are then denoised via median filtering, which identifies corrupted pixels in an image and replaces them with the median value of their neighborhood. We used a 5 × 5 window to reduce noise. The mathematical representation of the median filter is formulated in Eqs. (1)-(3), where $I_1, I_2, I_3, \dots, I_n$ is the sequence of adjacent pixels. All pixels in the window must be sorted in order, so that the arrangement of the selected pixels is $I_{m1} < I_{m2} < I_{m3} < \dots < I_{mn}$, where $n$ is generally odd. Fig. 2 shows the results of noise reduction and data pre-processing.
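The median filtering step can be illustrated with a minimal pure-Python sketch. It is not the authors' MATLAB implementation: the function name `median_filter`, the list-of-lists image representation, and the edge-replication border handling are assumptions made for this example.

```python
from statistics import median


def median_filter(image, k=5):
    """Apply a k x k median filter to a 2-D list of grey levels (k must be odd)."""
    if k % 2 == 0:
        raise ValueError("window size must be odd")
    height, width = len(image), len(image[0])
    radius = k // 2
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Gather the k*k neighbourhood; coordinates are clamped at the
            # border (edge replication, an assumption of this sketch).
            window = [
                image[min(max(y + dy, 0), height - 1)][min(max(x + dx, 0), width - 1)]
                for dy in range(-radius, radius + 1)
                for dx in range(-radius, radius + 1)
            ]
            # With an odd number of samples, the median is the middle
            # element of the sorted sequence I_m1 <= ... <= I_mn.
            out[y][x] = median(window)
    return out


# A flat 7 x 7 patch with one corrupted pixel in the centre.
flat = [[10] * 7 for _ in range(7)]
flat[3][3] = 255
cleaned = median_filter(flat)
```

Because each 5 × 5 window contains at most one corrupted value among 25 samples, the outlier never reaches the middle of the sorted sequence and is replaced by the surrounding grey level.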

Background Subtraction
For background removal, we used an improved joining methodology wherein we applied a Markov random field on color parameters and region fusion techniques. Next, we performed change recognition in the frame sequence using a dynamic threshold-based method followed by spatial-temporal variance to achieve more precise results. Fig. 3 depicts the results of background subtraction over the UT-Interaction dataset.
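The text does not specify how the dynamic threshold is computed. As a rough sketch only, the following assumes a common choice: the threshold is the mean of the absolute frame difference plus `k` standard deviations. The function name `change_mask`, the parameter `k`, and the grey-level list-of-lists input are all hypothetical, and the Markov random field and region fusion stages are not modelled.

```python
from statistics import mean, pstdev


def change_mask(background, frame, k=2.0):
    """Return a binary mask marking pixels that changed relative to the background."""
    # Absolute per-pixel difference between the current frame and the background.
    diffs = [
        [abs(f - b) for f, b in zip(frame_row, bg_row)]
        for frame_row, bg_row in zip(frame, background)
    ]
    flat = [d for row in diffs for d in row]
    # Dynamic threshold (assumed form): adapts to the statistics of each frame.
    threshold = mean(flat) + k * pstdev(flat)
    return [[1 if d > threshold else 0 for d in row] for row in diffs]
```

A threshold derived from each frame's own difference statistics tolerates global illumination drift better than a fixed constant, which is one plausible motivation for a dynamic scheme.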

Human Detection and Verification
In this section, we discuss the optimization of the identification of human silhouettes by combining change recognition, Markov random field, and spatial-temporal variance approaches with a dynamic thresholding strategy. Eq. (4) presents the equation we used for human head tracing, where $T^{w}_{H}$ characterizes a human head position in the specified input frames, $w$, which is calculated via spatial-temporal variance. For human recognition and verification, Eq. (5) gives the mathematical formulation, where $T^{w}_{FH}$ characterizes a human position in the specified input frames, $w$, and $T^{w}_{End}$ indicates the bounding box dimension for human identification and verification. Algorithm 1 describes the human detection technique in detail. Fig. 4 shows the results of human detection and verification.

Features I: Shape distance features
In shape distance features, we extract the index values of six points and find the distance of all covered points over two human figures. These points are determined by the adopted approach according to the size of and distance between the two humans. Eq. (6) presents the formulation of shape distance features, where $Sdf$ is the shape distance feature vector, $6/2$ is a constant, $s$ is the side of a hexagon, and $a$ represents the apothem distance. Fig. 5 shows the resulting shape distance features.
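The symbols listed for Eq. (6) match the area formula of a regular hexagon, (6/2) · s · a. The sketch below assumes that reading; it is an interpretation of the symbol definitions, not a reproduction of the authors' equation, and the function name `hexagon_shape_feature` is hypothetical.

```python
import math


def hexagon_shape_feature(s, a=None):
    """Area-style feature of a hexagon with side s and apothem a: (6/2) * s * a."""
    if a is None:
        # For a regular hexagon the apothem follows from the side length.
        a = s * math.sqrt(3) / 2
    return (6 / 2) * s * a
```

For a regular hexagon this reduces to (3√3/2)·s², so the feature grows quadratically with the spacing of the six points.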

Features II: Moveable body parts features
This technique targets the specific movable body components of human shapes. Whenever human action is first identified, a mask is drawn around the movable section, and its pixel position is determined. Finally, we collect the top 35 values of each image and convert them into vectors (see Fig. 6).

Features III: Angular cosine features
In angular cosine features, we map six points over the human body and join them as a mesh. Using a trigonometric function, we find the angle at these extracted points and insert the result into the angular cosine feature vector. Eq. (7) shows the mathematical formulation of the angular cosine features, where $\cos(x + y)$ is the angle value and $x$ and $y$ correspond to point A and point B, respectively. Fig. 7 shows the detailed results of the angular cosine features.

CMC, 2023, vol. 75, no. 1
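Since Eq. (7) is described in terms of $\cos(x + y)$, one plausible reading is the standard angle-addition identity $\cos(x + y) = \cos x \cos y - \sin x \sin y$. The sketch below assumes that reading, and assumes that `x` and `y` are angles in radians measured at points A and B; the function names are hypothetical.

```python
import math


def angular_cosine(x, y):
    """cos(x + y) expanded with the angle-addition identity (angles in radians)."""
    return math.cos(x) * math.cos(y) - math.sin(x) * math.sin(y)


def angular_cosine_vector(angles):
    """Feature vector built from consecutive angle pairs of the mapped points."""
    return [angular_cosine(a, b) for a, b in zip(angles, angles[1:])]
```

With six mapped points, pairing consecutive angles yields a five-element vector; other pairings over the mesh are equally possible.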

Features IV: Movement flow features
In movement flow features, we applied a color- and segmentation-based model over the given data frames. After this, we recognize the movement flow and mark it with various colors. Finally, we obtain these index values and map them into vectors for later calculations and estimations.
The movement flow features are formulated in Eq. (8), where $M_f$ denotes the movement flow features vector, $iv$ is the index RGB values, and $pi$ is the given frame. Fig. 8 shows the results of movement flow features.

Deep Feature Mining: Graph Mining
Once features are retrieved from the entire dataset, the subsequent phase reduces the number of input indexes, which minimizes operational costs and enhances accuracy. For feature content that is also exposed to quantitative frameworks and indicators, the graph mining methodology can yield high retrieval prediction results [25]. Graph mining combines methods and tools for data processing, anticipating data models, and constructing an organized and realistic graph for pattern recognition. Algorithm 3 presents the full description of graph mining.

Classification: AdaBoost
AdaBoost is one of the most frequently used techniques for classification; it builds a robust classifier from a weighted combination of several component models. AdaBoost selects weak learners to construct an ensemble. During the training phase, the member classifiers are selected to achieve the lowest possible margin of error for each category.
In each subsequent iteration, AdaBoost is not only straightforward to use but also adequate for the generation of ensemble methods. This is done by iteratively adjusting the weights over the entire training collection, one of its hybrid properties [26]. Eq. (9) shows the formulation of the training phase of the AdaBoost algorithm,
where $a_n$ is a learner that takes a feature $x$ as input and computes a value used to recognize the entity class. At each iteration of the training procedure, a weight, $we_{i,n}$, is assigned to each sample in the input set that is equal to the current error, $Er(B_{n-1}(a_i))$, on that sample. Here, $B_{n-1}(a_i)$ is the boosted classifier built from the weak learners up to the previous iteration. Fig. 9 shows the AdaBoost model diagram.
The leave-one-subject-out (LOSO) cross-validation technique has been employed to test the performance of the HVAA system over two openly accessible datasets, the Sports Videos in the Wild (SVW) dataset and the UT-Interaction dataset. The LOSO technique is a modified cross-validation method in which each fold holds out the data of a single subject.
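The boosting loop and the LOSO protocol can be sketched in a few lines of Python. This is the textbook discrete AdaBoost with decision stumps for binary labels in {-1, +1}, which may differ in detail from Eq. (9) and from the multi-class setup used in the experiments. All function names and the subject identifiers are hypothetical.

```python
import math


def stump_predict(row, j, t, pol):
    """Decision stump: vote pol if feature j reaches threshold t, else -pol."""
    return pol if row[j] >= t else -pol


def train_stump(X, y, w):
    """Exhaustively pick the stump with the lowest weighted error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if stump_predict(row, j, t, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, t, pol)
    return best


def adaboost_fit(X, y, rounds=10):
    """Return a list of (alpha, feature, threshold, polarity) weak learners."""
    n = len(X)
    w = [1.0 / n] * n  # start from uniform sample weights
    model = []
    for _ in range(rounds):
        err, j, t, pol = train_stump(X, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, t, pol))
        # Raise the weight of misclassified samples, lower the rest, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(row, j, t, pol))
             for wi, row, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def adaboost_predict(model, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(row, j, t, pol) for alpha, j, t, pol in model)
    return 1 if score >= 0 else -1


def loso_splits(subjects):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        yield held_out, train, test
```

In a LOSO evaluation, `adaboost_fit` would be called on the `train` indices of each fold and scored on the held-out subject, so no subject contributes to both training and testing within a fold.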

Datasets Information
The benchmark datasets include diverse sports activities and sophisticated human-human interaction scenes. In the SVW dataset [27], most of the videos were captured with the Coach's Eye mobile app, a sports training application developed by TechSmith specifically for smartphones. There are 19 activity categories, including archery, baseball, basketball, BMX, bowling, boxing, cheerleading, football, golf, high jump, hockey, hurdling, javelin, long jump, pole vault, rowing, shotput, skating, tennis, volleyball, and weightlifting; all videos were acquired at a resolution of 720 × 480 and at 30 frames per second. Fig. 10 depicts sample images from the SVW collection. The second benchmark, the UT-Interaction dataset [28], contains recordings of six classes of continuously executed human-human interactions: handshake, embrace, kick, point, punch, and push. The dataset comprises twenty video sequences, each roughly one minute long. Each video includes at least one execution of every interaction, resulting in an average of eight human interactions per video. Several participants appear in the videos, wearing more than 15 different outfits. All videos were recorded at a resolution of 720 × 480 at 30 frames per second. Fig. 11 illustrates sample images from the UT-Interaction dataset.

Experimental Results and Analysis
For the experiments with the proposed HVAA system, we used MATLAB (R2021a) for all simulations and estimations. The test machine had an Intel(R) Core(TM) i7-8665U CPU @ 1.90 GHz, 16 GB of RAM, and 64-bit Windows 11. The results on the SVW and UT-Interaction datasets, along with the experimental outcomes, are analyzed in the results section.

Experimental Setup and Evaluation
We undertook two tests to evaluate the performance of the proposed HVAA system across two benchmark datasets. Tables 1 and 2 display the real subject count and the average human verification accuracy as the frame data varies. Table 1 has five columns: the first represents the series of specified frames, the second the real track, the third the successful tracking count, the fourth the number of failures, and the fifth the accuracy over the SVW and UT-Interaction datasets. The next stage of research was to determine the typical and abnormal events of the proposed HVAA system with the assistance of the AdaBoost algorithm. Fig. 12 presents the confusion matrix for the SVW dataset, with a recognition rate of 92.15%. Fig. 13 shows the confusion matrix for the UT-Interaction dataset, which has an average accuracy of 92.83%.
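The two summary figures used here can be sketched as follows, assuming that tracking accuracy is the share of successfully tracked subjects among the actual ones, and that the recognition rate is the mean of the per-class rates on the diagonal of the confusion matrix. Both definitions are assumptions about how the tables and figures were computed, and the function names are hypothetical.

```python
def tracking_accuracy(actual, successful):
    """Percentage of actual subjects that were tracked successfully."""
    if actual <= 0:
        raise ValueError("actual track count must be positive")
    return 100.0 * successful / actual


def mean_recognition_rate(confusion):
    """Mean per-class recognition rate (%) of a square confusion matrix.

    Rows are true classes, columns are predicted classes.
    """
    rates = [row[i] / sum(row) for i, row in enumerate(confusion) if sum(row) > 0]
    return 100.0 * sum(rates) / len(rates)
```

For example, under these assumptions a frame range with 20 actual subjects and 18 successful tracks (illustrative numbers, not taken from Table 1) gives an accuracy of 90%.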

Experiment II: Comparison with other Algorithms
After achieving significant mean recognition results for our proposed HVAA system, we compared it with other recent classification techniques. Table 3 reveals that our performance [29] on the benchmark datasets is significantly higher than the results from previous techniques. For example, Park et al. [30] adopted a Markov random field framework, which combines pixels into interconnected blobs and tracks inter-blob correlations. The conventional neural network introduced by Li et al. [31] estimated the human body pose. Additionally, Chen et al. [32] employed morphological segmentation of the top color along with methodical thresholding. Rodriguez et al. [33] developed a new approach for predicting future body movement; they incorporated logical explanations and focused failure mechanisms to support a regenerative system that forecasts definite future human motion. The evaluation of complex event detection and classification against state-of-the-art techniques is presented in Table 3. In Tables 4 and 5, we evaluated the performance of the proposed HVAA system by comparing it with two other state-of-the-art methods, namely the Maximum Entropy Markov Model (MEMM) and Genetic Algorithm (GA) classifiers. We compared the precision, recall, and F1 scores of all classes in the two benchmark datasets, SVW and UT-Interaction. These results show that AdaBoost achieves better classification scores (precision, recall, and F-measure) when employed for predicting and classifying extraneous human behavior.
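The per-class precision, recall, and F1 scores reported in Tables 4 and 5 follow the standard definitions and can be derived from a confusion matrix. A minimal sketch, with a hypothetical function name and rows taken as true classes:

```python
def per_class_scores(confusion):
    """Return a (precision, recall, f1) tuple for every class.

    confusion[r][c] counts samples of true class r predicted as class c.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                       # missed samples of class c
        fp = sum(confusion[r][c] for r in range(n)) - tp  # others predicted as c
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores.append((precision, recall, f1))
    return scores
```

Reporting all three values per class guards against a classifier that scores well on overall accuracy while neglecting a rare interaction class.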

Discussion
Our HVAA system is designed to predict the extraneous interactions of human activities in various indoor and outdoor environments using graph features-based mining and AdaBoost classification. This study focuses on denoising, human interaction and verification, multi-subject analyses, feature engineering, feature selection, and behavior analysis. Initially, we conducted the pre-processing phase to lower computational overhead, as some of the data involved both human and non-human random objects simultaneously. To mitigate this issue, human verification and robust multi-person analysis were conducted. For multiclass classification and estimation, feature engineering is a significant step. We introduced robust, contextually intelligent features as well as deep mining features. Additionally, a graph mining strategy was applied for feature optimization. Finally, AdaBoost was employed for predicting and classifying extraneous human behavior.
Due to the complex nature of these benchmark datasets, this study has one drawback, namely, occlusion.This problem impacts human tracking and verification as well as the feature engineering process.This is the main factor that caused the mean recognition to drop.

Analysis
In this section, we critically analyze our proposed methodology. Initially, we present a robust approach to overcome research gaps, such as those in human detection methods. Then, we apply various algorithms to detect humans and optimize them to attain accurate results. Next, we provide multiple feature extraction approaches for abstracting valuable data. After feature representation, we optimize the features through optimization algorithms. For classification, we utilized AdaBoost in order to get more accurate results than existing methods.

Conclusion and Future Insight
Our proposed work presents a step forward in predicting human behavior and determining both normal and abnormal events in indoor and outdoor environments. First, we used two benchmark datasets as the input stream via numerous pre-processing techniques. These datasets involved sports and event-sourced information. Second, we denoised and resized the image sequences and tracked the human and non-human objects. Following that, we performed feature engineering to extract diverse features. Next, we employed the graph mining strategy to reduce the computational overhead and improve the recognition rate. Finally, the AdaBoost model was incorporated to predict the activities and locomotion patterns of numerous subjects. This study also compares the performance of our proposed HVAA system with that of other state-of-the-art methods. The experimental results show significant performance improvement over the two benchmark datasets when compared with other state-of-the-art techniques.
In future work, we will integrate more complex tasks from various contexts, including medical centers, workplaces, IoT-based systems, security and surveillance systems, and smart homes. Additionally, we will fuse more feature engineering techniques from different domains in order to recognize complex motion patterns in multiple contexts, along with human 3D modeling and 3D image reconstruction.

Figure 1: Detailed overview of the proposed architecture via graph-based deep features mining and AdaBoost classification

Figure 2: Example images after data preprocessing

Figure 3: Background subtraction results: (a, c) background subtraction result and (b, d) binary conversion

Figure 4: Human detection results in example images: (a) original RGB image, (b) background subtraction result, and (c) human detection and verification

Figure 5: The results of shape distance features: (a) extracted background subtraction, (b) six points over humans, and (c) region of shape distance

Figure 6: The results of moveable body parts features: (a) general movement detection, (b) points to human and human-linked objects, and (c) moveable human objects

Figure 7: Results of angular cosine features: (a) background-subtracted image and (b) angular cosine features at every point

Figure 8: The results of movement flow features: (a) original RGB image, (b) movement flow marked over the human body, and (c) movement values in various colors

Figure 9: Detailed model of the AdaBoost classification algorithm

Figure 10: Example images from various classes of the SVW dataset

Figure 11: Example images from various classes of the UT-Interaction dataset

Table 1: Actual human detection and verification accuracy over the SVW dataset

Table 3: Event classification comparison of the recognition rate of the proposed HVAA method with other state-of-the-art methods over the UT-Interaction and SVW datasets

Table 4: Measurements of evaluation metrics for the proposed HVAA system over the SVW dataset

Table 5: Measurements of evaluation metrics for the proposed HVAA system over the UT-Interaction dataset