TMCA-Net: A Compact Convolution Network for Monitoring Upper Limb Rehabilitation



Introduction
Upper extremity hemiparesis is common among stroke survivors. Patients must undergo extensive, repetitive upper extremity rehabilitation under the guidance of a physician or professional rehabilitation therapist in order to continuously stimulate damaged brain nerves, remodel neural circuits, and recover motor function [1]. Traditional inpatient rehabilitation requires substantial human and medical resources, and the resulting high costs impose a significant burden on both society and individuals. Home-based remote rehabilitation systems, by contrast, are low-cost and free of time and space restrictions, so patients can train anytime and anywhere [2]. Remote stroke rehabilitation systems monitor and recognize patients' posture, range of motion, and movement quality during training, then record this information and feed it back to rehabilitation therapists, who can assess rehabilitation progress in a timely manner and formulate personalized treatment plans. Among these systems, camera-based ones are more mature but suffer from a limited field of view, vulnerability to occlusion and lighting, and privacy concerns about the personal data collected [3]. Inertial sensor-based systems are more stable, and as MEMS inertial sensors have become widely commercialized and cheaper, more research has focused on achieving stable and accurate rehabilitation monitoring with fewer inertial devices. Monitoring systems based on wearable inertial sensors and intelligent algorithms are more stable, economical, and secure. The problem to be solved is how to model the medical data so as to achieve high-accuracy upper limb gesture recognition. Previous studies have made substantive progress, but the computational cost and memory consumption of their models limit their use for rehabilitation monitoring in the home environment.
Furthermore, feature extraction is the key to research on rehabilitation gesture recognition algorithms [4]. Classifiers built on hand-crafted features in traditional machine learning depend on careful feature design and selection, which is inefficient and complex. Existing stacked deep convolutional neural networks can extract abstract features directly from raw sensor sequences, but they suffer from high computational complexity and slow convergence, which makes them unsuitable for home equipment with limited computing resources. In response to these problems, this paper presents TMCA-Net (Time Multiscale Channel Attention Convolutional Neural Network), an accurate, lightweight, and fast-converging customized convolutional neural network that recognizes patients' upper limb rehabilitation gestures from the data recorded by a single inertial sensor fixed to the forearm.

Related Work
Human gesture recognition has potential applications in human-computer interaction, medical rehabilitation, virtual reality, and other fields. Currently, there are three main types of gesture recognition methods: vision-based, electromyography (EMG)-based, and MEMS sensor-based. With the rapid development of computer vision, vision-based human gesture recognition has matured and is widely used in various fields. Agarwal et al. recovered 3D human gestures from image sequences [5]. Fahn et al. proposed a real-time upper limb gesture recognition method based on a particle filter and the AdaBoost algorithm [6]. Brattoli et al. proposed a self-supervised LSTM for detailed behavioral analysis, which learns accurate representations of gesture and behavior through self-supervision in order to analyze motor function [7]. Htike et al. analyzed human activity [8] by converting video data into a dataset of static color images and using the recognized gesture sequences. Chen et al. combined Kinect data with a deep convolutional neural network to estimate human gestures [9]. Neili et al. used a CNN to predict the 2D spatial positions of joint points as gesture features, in order to detect accidents among elderly people and help monitor their activities in the home environment [10].
Compared with vision-based monitoring systems, sensor-based methods are not limited by environmental conditions or measurement range. More importantly, the sensors do not record sensitive patient data. EMG is a technology that uses sensors to record the electrical activity of human skeletal muscle, and features extracted from EMG data are often used for gesture recognition [11]. It has continued to develop in recent years and offers low cost and high efficiency in clinical rehabilitation training [12]. For example, Lu et al. extracted features from the EMG data of the forearm muscles of stroke patients for human-computer interaction to assist training [13], and Bi et al. proposed a multi-signature reconstruction system based on the forearm EMG signal to strengthen the patient's autonomous control of FES and improve the therapeutic effect [14]. EMG carries powerful biological information for analyzing human motion, but the measured signal intensity is small. In particular, the signal amplitude around the wrist is low [15], and different electrode positions lead to inconsistent measured signals, which imposes strict requirements on the wearing position. Because EMG is a biological signal, it also varies widely between individuals [16].
By comparison, the physical signals recorded by IMU sensors vary less between individuals, impose looser wearing requirements, and are more stable in practice, although they are also subject to measurement errors and poor anti-interference ability [17]. The common approach is based on acceleration data and combines hand-crafted features [18] with a machine learning classifier, so the design of the feature set is critical. Feature sets may differ from task to task, which requires professional prior knowledge. For example, Liu et al. selected the mean and standard deviation of acceleration data to construct the feature set and combined it with a fully connected neural network for upper limb rehabilitation gesture recognition [19]. Ferreira et al. used an artificial neural network to analyze the dynamic data recorded by an IMU in order to identify the dynamic gestures of Alzheimer's disease (AD); they searched for the optimal number of hidden layers and neurons and developed a multilayer perceptron ANN for the diagnosis of Alzheimer's disease [20]. Sanna et al. proposed an Internet of Things (IoT)-based IMU sensor that monitors user activity and gesture transitions, transfers possible events to a home server or gateway for final processing, and stores the data on a telemedicine server [21].
With the rise of deep learning and the availability of large-scale sports data, work combining deep neural networks with gesture or activity recognition has begun to appear. Based on large-scale data from wearable inertial sensors, Um et al. used a convolutional neural network (CNN) to automatically extract features from and classify the generated images, recognizing 50 gym exercises with an accuracy of 92.1% [22]. Recently, e-Health systems based on the IoT framework have begun to emerge. Bisio et al. proposed SmartPants, a medical system with multiple sensors and a matching Android application for remote real-time monitoring of lower extremity training of stroke patients [23]. Jones et al. proposed a data collection method based on mobile phones, in which the raw training data are uploaded to a cloud server for analysis and evaluation and medical staff access them remotely through a web system [24]. Yuan et al. designed a data glove device that integrates two arm rings and a 3D flexible sensor, and introduced a combination of a residual convolutional neural network and an LSTM module to capture the fine-grained movements of the arm and all finger joints [25]. Bianco et al. implemented arm posture recognition, user identification, and identity verification based on a custom wireless wristband inertial sensor combined with a recurrent neural network. Their U-WeAr study found that numerical normalization during preprocessing makes recognition more robust to gesture data with different amplitudes and speeds [26]. Kang et al. achieved accurate gesture recognition during walking using a wristband-type inertial sensor, by combining empirical mode decomposition with a distribution-adaptive transfer learning method [27].
In summary, multiple studies have shown that combining deep artificial neural networks with wearable devices is an effective solution for rehabilitation gesture recognition. However, accurately extracting key features from sensor data remains the core issue for reliable gesture recognition. Convolutional neural networks have been shown to extract features efficiently, but previous research has mostly built feature extractors from convolution kernels of a single, fixed size. Such network structures may fail to capture multi-scale information in sensor data. Additionally, traditional stacked convolutional networks increase their non-linear expressive power by stacking convolutional layers to deepen the network. This improves performance on gesture recognition tasks but ignores the heavy computational load that the increased model complexity places on computing devices. This paper proposes a more practical rehabilitation gesture recognition model that addresses these issues and assists in monitoring the rehabilitation of stroke patients.

Methodology
Stroke patients need to perform a large number of functional arm movements over a long period of time to ensure the reconstruction and recovery of upper limb function. Effective and accurate rehabilitation monitoring can feed back a patient's current recovery progress so that a corresponding treatment program can be developed. Accurate recognition of upper limb gestures is the biggest challenge in current monitoring scenarios. To assist the efficient recovery of stroke patients, this paper describes the proposed arm data recording method and intelligent recognition method in the following two parts.

Data Collection Device
Rehabilitation monitoring based on a single inertial sensor is a convenient and safe solution. It avoids placing multiple sensors on the upper extremity during rehabilitation actions, which can be confusing for patients or older people with limited mobility and may produce redundant recorded data. A large number of studies have shown that upper extremity motion information can be captured with a single inertial sensor. On this basis, this paper designs a data acquisition platform composed of a single inertial sensor (BWT901, Wit-motion), a PC running the host software, and a desktop device that plays the motion guidance. The inertial sensor is worn on the volunteer's forearm and wrist with a hook-and-loop strap. The data acquisition device is shown in Fig. 1.

Figure 1: BWT901
The BWT901 connects to the PC over Bluetooth so that a wired connection does not restrict the wearer's movement. While the upper limb moves, the device captures three-axis acceleration, three-axis angular velocity, motion direction, and other information at a frequency of 100 Hz. The transferred data are saved as text files on the PC's disk for subsequent processing and analysis. The saved data frame is shown in Fig. 2.
The multi-branch structure of the TMCA module produces a variety of dense motion features. Among these rich and diverse sequence features, some key features help to distinguish postures, while others can mislead the distinction, such as outliers or noise in the data and interference from the gravitational acceleration component in accelerometer readings. To help the model fit the data more accurately, an adapted ECA [28] module is introduced so that the model focuses on the effective features while eliminating or suppressing the interfering ones.
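The channel attention idea mentioned above can be illustrated with a minimal sketch. It follows the general ECA formulation (global average pooling, a 1D convolution across neighbouring channels, a sigmoid gate) and is not the adapted variant used in TMCA-Net, whose details are not given here. The function name `eca_attention`, the `[channels][time]` layout, and the uniform kernel weights standing in for learned weights are all assumptions made for illustration.

```python
import math


def eca_attention(features, kernel_size=3):
    """Apply ECA-style channel attention to a [channels][time] feature map.

    Hypothetical helper: uniform kernel weights replace the weights that a
    trained network would learn.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    n_channels = len(features)

    # 1) Squeeze: global average pooling over time, one value per channel.
    pooled = [sum(channel) / len(channel) for channel in features]

    # 2) Local cross-channel interaction: 1D convolution along the channel
    #    axis with zero padding at both ends.
    weights = [1.0 / kernel_size] * kernel_size
    half = kernel_size // 2
    mixed = []
    for c in range(n_channels):
        acc = 0.0
        for k in range(kernel_size):
            idx = c + k - half
            if 0 <= idx < n_channels:
                acc += weights[k] * pooled[idx]
        mixed.append(acc)

    # 3) Sigmoid gate: one attention weight in (0, 1) per channel.
    gates = [1.0 / (1.0 + math.exp(-m)) for m in mixed]

    # 4) Excite: rescale every time step of a channel by that channel's gate.
    scaled = [[g * v for v in channel] for g, channel in zip(gates, features)]
    return scaled, gates


# Three toy channels with two time steps each.
toy = [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]
scaled, gates = eca_attention(toy)
print([round(g, 3) for g in gates])
```

Because the gate is computed from only a few neighbouring channels, the mechanism adds very few parameters, which fits the lightweight design goal stated in this paper.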

Experiment
The purpose of the experiment was to verify the feasibility and validity of the proposed scheme for upper extremity rehabilitation monitoring. The experiment is divided into two parts. The first part uses the data acquisition device described in Section 3.1 to collect data and build a dataset for upper limb rehabilitation training. The second part experimentally validates the recognition performance of the TMCA model and determines the optimal hyper-parameter settings.

Dataset Construction
Before data collection, the upper limb movements were selected and defined based on the FMA rehabilitation function evaluation scale and the Brunnstrom rehabilitation stage theory. This paper focuses on upper limb and arm rehabilitation monitoring. After consultation with a rehabilitation therapist, and in order to keep the monitoring accurate without making it overly complex, five upper limb rehabilitation exercises were designed to reflect patients' upper limb functional status and rehabilitation progress, and were confirmed by the rehabilitation therapist. Their specific definitions are shown in Table 1. In the table, the initial state without noticeable reflex corresponds to the arm function of patients in Brunnstrom stages I and II. The forearm flexion posture is used to determine whether the upper limb shows partial co-movement (a pathological abnormal movement pattern that relies on adjacent joint muscles to compensate in completing the movement). Forearm pronation accompanied by obvious joint movement corresponds to stage III, in which co-movement is fully developed. The hand-to-lumbar movement, in which partial separation movement appears, corresponds to stage IV. Shoulder abduction requires the patient to control joints and muscles to complete free separation movement and corresponds to stage V.
Before data acquisition, volunteers learned the standard execution of the movements by watching a demonstration video. They then wore the calibrated inertial sensor on the forearm of the dominant hand while seated, with the arms resting naturally in the initial posture. Volunteers performed the predefined postures in sequence as indicated by the video. Each action was repeated to mimic real rehabilitation training, its execution varied slightly according to individual habits, and the execution speed was slightly slower than normal. Each action was performed 20 times per group, with 1-2 min of rest between actions, and the initial posture was recorded by default for 60 ± 5 s. The data acquisition process is illustrated in Fig. 5.
The motion data are segmented with a fixed window size of 600 and an overlap rate of 50%. This unifies the sample length and yields samples of dimension 600 × 6. At a sampling frequency of 100 Hz, a single sample therefore covers 6 s. To suit modeling with a deep convolutional network, the dataset is divided by subject: 80% is used as source data for the training set and 20% as source data for model testing.
Upper extremity rehabilitation monitoring and recognition with a deep convolutional neural network is a multi-class classification problem. This section describes the evaluation indices used to assess the proposed method and what each of them represents. To verify the validity and robustness of the method, four indices (accuracy, precision, recall, and F1-score) are used to evaluate it comprehensively. The calculation formulas for each index are shown in Table 3.
Model performance was tested on the collected dataset and on public datasets. On the upper extremity rehabilitation posture recognition dataset, the proposed TMCA-Net, which is based on multi-scale features, shows a significant advantage in recognition accuracy. The comparative evaluation results of each model are shown in Table 4. The F1-score indicates that the model's recognition performance is more balanced across classes, which supports the validity of the proposed method. Among networks with similar performance, iSPLInception and the adapted ResNet perform well on the recognition task, but their large number of parameters is the main factor restricting the scenarios in which traditional stacked deep convolutional networks can be applied. The baseline CNN also performs well. Compared with LSTM, it benefits from the strong feature extraction capability of convolution. Rehab-Net, a deep learning framework that focuses on upper extremity rehabilitation monitoring, performed relatively well in this experiment, which suggests that recognition performance depends to some extent on the preprocessing method. The performance of TMCA-Net on public datasets is shown in Table 5.
The performance of the model is affected by many factors, including the size of the data segmentation window, the loss function, and the optimizer. In this paper, the hyper-parameters of the proposed network are tuned to achieve the best recognition performance on the upper limb rehabilitation posture recognition task. Different upper extremity postures take different amounts of time to execute, so the time span of a complete movement varies. To find the optimal time window for segmentation, the multivariate sensor data streams are segmented with window sizes of 400 and 600, both with 50% overlap, and the effect of the sample dimension on recognition performance is examined. At the same time, different combinations of hyper-parameters were tested. To balance computing resources, the model structure was fixed as a two-layer stack of TMCA modules. Training and testing were carried out on the rehabilitation posture dataset. The results of three experiments were averaged under each setting, and recognition accuracy was used as the evaluation index to find the best combination of hyper-parameters.
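The segmentation and subject-wise split described in this section can be sketched as follows. This is an illustrative sketch only: the function names `sliding_windows` and `split_by_subject`, the `[time][channels]` list layout, and the subject identifiers are assumptions, and "divided according to the object" is read here as splitting by subject so that windows from one person never appear in both the training and test sets.

```python
SAMPLING_RATE_HZ = 100  # sampling frequency stated in the text


def sliding_windows(recording, window=600, overlap=0.5):
    """Cut a [time][channels] recording into fixed-length overlapping windows."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    # With window=600 and overlap=0.5, consecutive windows start 300 samples apart.
    step = max(1, int(round(window * (1 - overlap))))
    last_start = len(recording) - window
    # Trailing samples that do not fill a whole window are discarded.
    return [recording[s:s + window] for s in range(0, last_start + 1, step)]


def split_by_subject(samples, test_subjects):
    """Split (subject_id, window, label) tuples so no subject is in both sets."""
    train = [s for s in samples if s[0] not in test_subjects]
    test = [s for s in samples if s[0] in test_subjects]
    return train, test


# Synthetic 15 s recording: 1500 time steps, 6 channels
# (3-axis acceleration + 3-axis angular velocity).
recording = [[float(t)] * 6 for t in range(1500)]
windows = sliding_windows(recording)
print(len(windows), len(windows[0]), len(windows[0][0]))  # 4 600 6
print(600 / SAMPLING_RATE_HZ)  # 6.0 seconds covered by one window
```

Splitting by subject rather than by window avoids leaking overlapping windows from the same recording into both sets, which would otherwise inflate test accuracy.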
The experimental results in Table 6 show that a window size of 400 performs slightly worse than a window size of 600. The performance under different window sizes is shown in Fig. 6. The main misclassification involves forearm pronation. The analysis shows that, because forearm pronation takes a long time to execute, a 400-sample window cannot contain the complete posture. The ablation results show that Adam is the best optimizer and that ReLU is the most suitable activation function for this task.
To address the problems currently faced by remote upper limb rehabilitation monitoring systems, this paper puts forward a reliable upper limb rehabilitation monitoring scheme that uses a single inertial sensor fixed on the forearm to collect motion information effectively. The adaptive recognition algorithm is also core to this work. Because current feature extraction methods are complex, inaccurate, and incomplete, the multi-branch convolutional TMCA module is used to extract multi-scale features automatically from inertial data. Combined with an attention mechanism, it learns discriminative features efficiently and achieves accurate recognition of upper extremity rehabilitation movements. Experiments verify the performance and task adaptability of the proposed model. Considering the limited memory of devices in the home environment, the number of network parameters is kept low by optimizing the structure design and replacing the convolution method. This is of great significance for the practical deployment of upper extremity rehabilitation monitoring. Future work will continue to focus on upper limb rehabilitation posture recognition and monitoring, and will attempt to introduce reinforcement learning algorithms that automatically adapt the model to different scenarios in order to improve the stability of the monitoring solution. Alternatively, edge computing may be used to deploy deep models on edge devices, achieving low-latency, remote, real-time upper limb rehabilitation posture recognition.

Figure 2: The saved data frame

Figure 5: Volunteer in data collection process

Table 3: Model evaluation indices
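The body of Table 3 is not reproduced here, so the following sketch uses the standard textbook definitions of the four indices for a multi-class problem: accuracy as the fraction of correct predictions, and per-class precision TP / (TP + FP), recall TP / (TP + FN), and F1 = 2PR / (P + R). The function name `evaluate_predictions` and the use of macro averaging across classes are assumptions; the averaging scheme used for the reported results is not stated.

```python
def evaluate_predictions(y_true, y_pred, n_classes):
    """Return accuracy, macro (precision, recall, F1), and per-class scores."""
    # confusion[t][p] counts samples of true class t predicted as class p.
    confusion = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        confusion[t][p] += 1

    correct = sum(confusion[c][c] for c in range(n_classes))
    accuracy = correct / len(y_true)

    per_class = []
    for c in range(n_classes):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n_classes)) - tp  # column minus diagonal
        fn = sum(confusion[c]) - tp                               # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        per_class.append((precision, recall, f1))

    # Macro average: unweighted mean of each index over the classes.
    macro = tuple(sum(scores[i] for scores in per_class) / n_classes for i in range(3))
    return accuracy, macro, per_class


# Toy example with three classes (labels are illustrative only).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
accuracy, macro, per_class = evaluate_predictions(y_true, y_pred, n_classes=3)
print(round(accuracy, 3), [round(m, 3) for m in macro])
```

Reporting per-class scores alongside the averages makes it easier to spot a single weak class, such as the forearm pronation errors discussed in this paper.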

Table 4: Model evaluation results

Table 5: Model performance in public dataset