The learning status of learners directly affects the quality of learning. Compared with offline teachers, it is difficult for online teachers to capture the learning status of students in the whole class, and it is even more difficult to continue to pay attention to students while teaching. Therefore, this paper proposes an online learning state analysis model based on a convolutional neural network and multi-dimensional information fusion. Specifically, a facial expression recognition model and an eye state recognition model are constructed to detect students’ emotions and fatigue, respectively. By integrating the detected data with the homework test score data after online learning, an analysis model of students’ online learning status is constructed. According to the PAD model, the learning state is expressed as three dimensions of students’ understanding, engagement and interest, and then analyzed from multiple perspectives. Finally, the proposed model is applied to actual teaching, and procedural analysis of 5 different types of online classroom learners is carried out, and the validity of the model is verified by comparing with the results of the manual analysis.

With the rapid development of Internet technology, the number of network users keeps increasing. According to Cisco’s “Visual Networking Index: Global Mobile Data Traffic Forecast”, the number of global mobile device users will increase to 5.4 billion in 2020, accounting for 70% of the world’s total population. According to the “China Internet Development Report (2021)”, by 2021, the number of online education users in China will reach 342 million, accounting for 34.6% of the total netizens. It can be seen that as Internet technology and education become integrated, informatization will drive the development of education and promote its innovation [

In order to make the detection more accurate and the learner’s learning status assessment more comprehensive, this research made the following improvements:

An analysis model of learners’ learning status in online classes is proposed, as shown in

The quantitative methods of emotion and fatigue information were studied. On the basis of the existing methods of emotion detection and fatigue detection, the emotion information was quantified according to Wundt’s emotion theory and PAD 3D emotion model, and the fatigue information of learners was quantified according to the expert survey method. The two methods were used to describe the information related to the learning state in a numerical way.

A hierarchical information fusion method adapted to this model is proposed. Existing multidimensional information fusion methods do not take into account the guidance of learning state analysis model, so it is difficult to accurately understand the learning state of learners. In this study, all secondary information under the primary information is fused first, and then the primary information is fused in a unified subjective and objective way to make the fused data more accurate and reasonable.

Experiments were carried out based on the above state analysis model and the multi-dimensional information identification, quantification, and fusion methods. The experimental results show that the model can identify differences in the learning state of different learners and reflects those states well. In addition, the model’s results were compared with the teachers’ evaluations, which verified the feasibility and effectiveness of the method.

The rest of this paper is organized as follows. The second section summarizes related research on the analysis of learners’ learning state. The third section introduces the process of emotion detection and quantification. The fourth section introduces the fatigue detection process and determination method. The fifth section introduces the multi-dimensional information fusion process and calculation method. The sixth section applies the model to actual teaching for analysis, and the seventh section summarizes this article.

In this section, we mainly summarize the research status of emotion recognition or fatigue detection in the field of online education, as well as the learning status analysis of online learners by integrating multidimensional information.

Psychological studies show that emotion is a very important non-intellectual factor in teaching activities. In teaching activities, emotion can affect not only memory, reasoning and problem solving, but also learners’ perception, thinking, executive control and decision-making. Many researchers use facial expression recognition, gesture recognition, speech and text recognition, or multi-modal recognition of learner emotions to evaluate learners’ learning status. D’mello et al. [

The fatigue state of online learners is also an important factor affecting their learning state. Most studies on students’ learning fatigue in class belong to education and psychology, while fatigue detection technology has been applied mainly to motor vehicle driving and only to a limited extent in education. Wood et al. [

However, analyzing the learning state with single-dimensional information is not accurate enough. Some researchers use sensors or neural networks to detect learners’ emotions, gestures, and fatigue status. Kamath et al. [

To sum up, many scholars have done a lot of research on learning emotion, learning state and learning behavior in the online learning environment. Image recognition technology and advanced detection and sensing technology are widely used in online learning, and have achieved good results. In this research, facial expression recognition and eye state recognition are performed by constructing a depthwise separable convolution model and a deep convolutional neural network, respectively. Finally, fusion calculation is carried out on the sequence changes of the learners’ facial expressions, the fatigue detection results and the after-class answering results, yielding the values of learner understanding, engagement and interest in each time period, and the change curves of the three are drawn together.

Learning emotion is the most intuitive manifestation of a student’s learning state. Learners will show different emotions according to the difficulty of the course, their own receptive ability and the teaching method of teachers. As early as the end of the last century, the American scholar Paul Ekman’s experiment defined universal human emotions as six types, namely, anger, disgust, fear, happiness, sadness, and surprise. According to the FACS (Facial Action Coding System) proposed by Friesen et al. [

The emotion detection process is as follows: (1) Obtain the learning videos of learners in online classes through the camera and convert the videos into image sequences; (2) Load the face detection model and the trained expression recognition model for batch recognition of image sequences; (3) Count the recognized images in segments, and count the frequency and percentage of various expressions in each period of time. In the training process of the model, by selecting an appropriate facial expression data set and preprocessing it, the processed images are subjected to feature extraction and expression classification, and the classification method adopts the basic emotion classification proposed by Ekman.
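Step (3) of the process above can be sketched as follows. This is only a minimal illustration, assuming the per-frame expression labels have already been produced by the recognition model; the function name `expression_stats`, the label spellings, and the fixed segment length are hypothetical and not taken from the original implementation.

```python
from collections import Counter

# Ekman's six basic emotions plus a neutral class (label spellings are assumptions).
EKMAN_LABELS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]


def expression_stats(frame_labels, frames_per_segment):
    """Split per-frame labels into fixed-size segments and return, for each
    segment, a dict mapping label -> (count, percentage of frames)."""
    if frames_per_segment <= 0:
        raise ValueError("frames_per_segment must be positive")
    stats = []
    for start in range(0, len(frame_labels), frames_per_segment):
        segment = frame_labels[start:start + frames_per_segment]
        counts = Counter(segment)
        total = len(segment)
        stats.append({
            label: (counts[label], 100.0 * counts[label] / total)
            for label in EKMAN_LABELS
        })
    return stats


# Hypothetical recognition output for six frames, counted in segments of four frames.
labels = ["happiness", "happiness", "happiness", "neutral", "sadness", "sadness"]
segments = expression_stats(labels, frames_per_segment=4)
```

The last segment may be shorter than the others; its percentages are computed over the frames it actually contains.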

This research builds a depthwise separable convolutional network model based on the Inception network proposed by Szegedy et al. [

The model omits 4 repeated modules in the hidden layer, and adjusts the number, size and stride of the convolution kernels. To prevent vanishing or exploding gradients, batch normalization (BN) is used to control the numerical range before activation, which also speeds up model convergence. During model training, the Adam algorithm is used for optimization, the loss function is the cross-entropy function, and data augmentation uses the built-in data generator of Keras. The fer2013 data set was split into training and test sets at a ratio of 4:1, and the model training accuracy reached 72.14%. To demonstrate the effectiveness of the model, this study compares the experimental results with other algorithms.

Method | Accuracy (%) |
---|---|
Deepemotion [ | 70.02 |
Inception [ | 71.6 |
Ad-Corre [ | 72.03 |
Our | 72.14 |

Mehrabian and Russell proposed a three-dimensional emotion model, PAD for short. The model divides emotions into three dimensions: pleasure, activation, and dominance [

S1∼S12 are the 12 items of the scale, after normalization, the value range of each dimension of PAD is

Learning emotions | P | A | D |
---|---|---|---|
Anger | −0.43 | 0.40 | 0.37 |
Disgust | −0.27 | −0.20 | 0.21 |
Happiness | 0.46 | 0.35 | −0.32 |
Sad | 0.06 | −0.28 | 0.05 |
Fear | −0.16 | 0.18 | 0.38 |
Surprise | 0.43 | 0.38 | −0.44 |
Neutral | 0.18 | 0.10 | −0.12 |
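As an illustration of how the PAD values in the table could be combined with the expression percentages counted for a time period, the sketch below computes a proportion-weighted mean PAD vector. The weighted mean is an assumption made for illustration only, not necessarily the quantification used in this paper; the name `mean_pad` and the dictionary keys are hypothetical, while the numbers are copied from the table.

```python
# PAD values (P, A, D) copied from the table above; key spellings are assumptions.
PAD = {
    "anger": (-0.43, 0.40, 0.37),
    "disgust": (-0.27, -0.20, 0.21),
    "happiness": (0.46, 0.35, -0.32),
    "sadness": (0.06, -0.28, 0.05),  # the "Sad" row
    "fear": (-0.16, 0.18, 0.38),
    "surprise": (0.43, 0.38, -0.44),
    "neutral": (0.18, 0.10, -0.12),
}


def mean_pad(proportions):
    """Return the (P, A, D) mean weighted by expression proportions.

    `proportions` maps an expression label to its share of frames in one
    time period; shares need not sum to 1 because they are renormalized.
    """
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("proportions must contain a positive total")
    return tuple(
        sum(share * PAD[label][k] for label, share in proportions.items()) / total
        for k in range(3)
    )


# Hypothetical period: half of the frames happy, half neutral.
p, a, d = mean_pad({"happiness": 0.5, "neutral": 0.5})
```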

At the same time, according to the emotional dimension space theory, the corresponding relationship between the emotional learning space and the learning state space can be established. Corresponding to the three dimensions of Wundt’s three-dimensional emotion model [

Therefore, according to the learning state and the P value, A value, and D value of the PAD three-dimensional emotional space, the corresponding model as shown in

According to the mapping relationship of the learning state in

In order to prevent the single-dimensional information incompleteness caused by emotion detection in the evaluation of students’ learning state, this paper uses fatigue detection as a dimension of learning state evaluation, and again uses convolutional neural network to build a human eye state recognition model to detect student fatigue [

In the process of eye state recognition, this paper selects the CEW closed-eye data set (

Layer_name | Output size | 18-layer |
---|---|---|
Conv1 | 112 × 112 | 7 × 7, 64, stride = 2 |
Conv2_X | 56 × 56 | 3 × 3 Maxpool, stride = 2; [[3 × 3, 64] [3 × 3, 64]] × 2 |
Conv3_X | 28 × 28 | [[3 × 3, 128] [3 × 3, 128]] × 2 |
Conv4_X | 14 × 14 | [[3 × 3, 256] [3 × 3, 256]] × 2 |
Conv5_X | 7 × 7 | [[3 × 3, 512] [3 × 3, 512]] × 2 |
 | 1 × 1 | Average pooling, 1000-d Fc, Softmax |
Flops | | 1.8 × 10^{9} |
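The output sizes in the table can be checked with the usual convolution size formula, floor((n + 2p − k) / s) + 1. The sketch below is a sanity check rather than the model code: the 224 × 224 input and the padding values (3 for the 7 × 7 convolution, 1 for the 3 × 3 layers) are assumptions taken from the common 18-layer residual network configuration and are not stated in the text.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_output_sizes(input_size=224):
    """Follow the stride-2 layers of the 18-layer network in the table."""
    sizes = {}
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # Conv1
    sizes["Conv1"] = size
    size = conv_out(size, kernel=3, stride=2, padding=1)        # 3 x 3 max pooling
    sizes["Conv2_X"] = size
    for name in ("Conv3_X", "Conv4_X", "Conv5_X"):
        # The first 3 x 3 convolution of each later stage halves the resolution.
        size = conv_out(size, kernel=3, stride=2, padding=1)
        sizes[name] = size
    return sizes


sizes = stage_output_sizes()
```

Under these assumptions the computed sizes are 112, 56, 28, 14 and 7, matching the "Output size" column.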

During the training process, the recognition accuracy of the model on the training set and the test set was recorded. The accuracy on the training set reached 97.40%, and the accuracy on the test set reached 92.43%. To illustrate the effectiveness of the model, we compare the experimental results with the classical network model LeNet-5 and existing machine learning algorithms, as shown in

Method | Accuracy (%) |
---|---|
LBP + SVM [ | 90.37 |
 | 96.83 |
 | 94.52 |
Our | 97.40 |

It can be seen from

According to the PERCLOS measurement method proposed by the Carnegie Mellon Institute in the 1980s [

Learning status | Fatigue status | Expert review average | Researcher review average | Comprehensive score | Weight |
---|---|---|---|---|---|
Comprehension | Fatigue | 2.56 | 0.97 | 1.924 | 0.214 |
Comprehension | Non-fatigue | 7.02 | 7.18 | 7.084 | 0.786 |
Engagement | Fatigue | 2.49 | 0.98 | 1.886 | 0.206 |
Engagement | Non-fatigue | 7.09 | 7.57 | 7.282 | 0.794 |
Interest | Fatigue | 1.97 | 1.12 | 1.63 | 0.183 |
Interest | Non-fatigue | 6.94 | 7.81 | 7.288 | 0.817 |
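The numbers in the table above are consistent with a comprehensive score of 0.6 × expert average + 0.4 × researcher average, and with weights obtained by normalizing the fatigue and non-fatigue scores within each learning status. The sketch below reproduces the Comprehension rows under that reading; the 0.6/0.4 split is inferred from the table values rather than stated in the text, and `perclos` uses the common definition of PERCLOS as the fraction of frames in a window with closed eyes, with no fatigue threshold assumed.

```python
def perclos(eye_closed_flags):
    """Fraction of frames in a time window in which the eye is classified as closed."""
    if not eye_closed_flags:
        raise ValueError("the window must contain at least one frame")
    return sum(1 for closed in eye_closed_flags if closed) / len(eye_closed_flags)


def comprehensive_score(expert_avg, researcher_avg, expert_share=0.6):
    """Blend the two review averages (the 0.6 share is inferred, not given in the text)."""
    return expert_share * expert_avg + (1.0 - expert_share) * researcher_avg


def fatigue_weights(fatigue_score, non_fatigue_score):
    """Normalize the two comprehensive scores so that the weights sum to 1."""
    total = fatigue_score + non_fatigue_score
    return fatigue_score / total, non_fatigue_score / total


# Comprehension rows of the table.
fatigue = comprehensive_score(2.56, 0.97)
non_fatigue = comprehensive_score(7.02, 7.18)
w_fatigue, w_non_fatigue = fatigue_weights(fatigue, non_fatigue)
```

The same two functions reproduce the Engagement and Interest rows when given their review averages.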

Since each dimension information has different effects on the learning state, it is necessary to assign weights to each dimension information. In this paper, the combination weighting method is used to determine the information weight of each dimension, and the combination method adopts the “multiplication” synthesis [

(1) According to the correlation and affiliation between three-dimensional information, a multi-level analysis structure model of understanding, engagement, and interest is formed, respectively.

(2) A positive reciprocal matrix is established according to the model. The values of the matrix are given by the decision maker using Saaty’s 1–9 scaling method, and the values in the upper and lower triangular regions of the matrix are reciprocals of each other: if the value in the i-th row and the j-th column is $a_{ij}$, then the value in the j-th row and the i-th column is $1/a_{ij}$.

(3) Solve the weight vector: calculate the average value of each analysis item, and then divide the averages pairwise to obtain the judgment matrix, so that each entry is the ratio of the row item’s average to the column item’s average. The larger the average value, the higher the importance and the higher the weight, as shown in

Average | Item | Emotional information | Fatigue information | Answering information |
---|---|---|---|---|
1.400 | Emotional information | 1 | 2.295 | 0.525 |
0.610 | Fatigue information | 0.436 | 1 | 0.229 |
2.667 | Answering information | 1.905 | 4.372 | 1 |

Average | Item | Emotional information | Fatigue information | Answering information |
---|---|---|---|---|
1.167 | Emotional information | 1 | 0.583 | 1.913 |
2.000 | Fatigue information | 1.714 | 1 | 3.279 |
0.610 | Answering information | 0.523 | 0.305 | 1 |

Average | Item | Emotional information | Fatigue information | Answering information |
---|---|---|---|---|
2.333 | Emotional information | 1 | 1.556 | 4.430 |
1.500 | Fatigue information | 0.643 | 1 | 2.848 |
0.527 | Answering information | 0.226 | 0.351 | 1 |

(4) Use the sum-product method to obtain the weight of each dimension information. First, add the normalized judgment matrix by row according to

Then normalize according to

At the same time, the maximum eigenroot of the judgment matrix is calculated according to

Finally, the consistency test is carried out according to

Item | Eigenvector | Weight value | Eigenvalue | Ci value |
---|---|---|---|---|
Emotional information | 0.898 | 29.936% | 3.000 | 0.000 |
Fatigue information | 0.391 | 13.043% | | |
Answering information | 1.711 | 57.021% | | |

Item | Eigenvector | Weight value | Eigenvalue | Ci value |
---|---|---|---|---|
Emotional information | 0.927 | 30.891% | 3.000 | 0.000 |
Fatigue information | 1.589 | 52.957% | | |
Answering information | 0.485 | 16.152% | | |

Item | Eigenvector | Weight value | Eigenvalue | Ci value |
---|---|---|---|---|
Emotional information | 1.606 | 53.517% | 3.000 | 0.000 |
Fatigue information | 1.032 | 34.404% | | |
Answering information | 0.362 | 12.080% | | |

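Consistency Indicator: $CI = (\lambda_{\max} - n)/(n - 1)$, where $\lambda_{\max}$ is the maximum eigenroot of the judgment matrix and $n$ is its order.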

(5) Consistency test is carried out according to

The maximum eigenroot | Ci | Ri | Cr | Consistency check results |
---|---|---|---|---|
3.000 | 0.000 | 0.520 | 0.000 | PASS |
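Steps (3) to (5) can be sketched in a few lines. The code below is a minimal illustration, assuming the judgment matrix is built from ratios of the item averages and that the standard sum-product formulas are used; the function names are hypothetical, and the random index of 0.52 for a 3 × 3 matrix is the Ri value from the table above.

```python
def judgment_matrix(averages):
    """Entry (i, j) is the ratio of item i's average to item j's average."""
    return [[a / b for b in averages] for a in averages]


def sum_product_weights(matrix):
    """Normalize each column, add the normalized matrix by row, then normalize the row sums."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    row_sums = [sum(matrix[i][j] / col_sums[j] for j in range(n)) for i in range(n)]
    total = sum(row_sums)
    return [r / total for r in row_sums]


def consistency(matrix, weights, random_index=0.52):
    """Return (maximum eigenroot, CI, CR) for a judgment matrix and its weights."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return lambda_max, ci, ci / random_index


# Averages from the first judgment matrix (emotional, fatigue, answering information).
matrix = judgment_matrix([1.400, 0.610, 2.667])
weights = sum_product_weights(matrix)
lambda_max, ci, cr = consistency(matrix, weights)
```

Because every entry is a ratio of two averages, the matrix is perfectly consistent, which is why the maximum eigenroot equals 3 and both CI and CR are 0. The row sums of the normalized matrix (about 0.898, 0.391 and 1.711 here) correspond to the "Eigenvector" column of the first weight table.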

The calculation steps of the entropy method are as follows: (1) Draw up a survey scale on the impact of three-dimensional information on learning status, and invite 2 field experts and 4 researchers to score each item in the scale. (2) According to the scoring data of each expert, the decision matrix of the three-dimensional information weight problem is constructed as:

At the same time, calculate the proportion of the score of the j-th expert under the i-th dimension information according to

Then calculate the entropy value of each dimension information according to

And calculate the weight of each dimension information according to

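Finally, the AHP weight and the entropy method weight are multiplied and synthesized according to $w_j = \alpha_j \beta_j / \sum_{k=1}^{n} \alpha_k \beta_k$, where $\alpha_j$ is the AHP weight, $\beta_j$ is the entropy method weight, and $w_j$ is the combination weight of the j-th dimension information.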

The weight distribution is shown in

Item | AHP weight | Entropy value weight | Combination weight |
---|---|---|---|
Emotional information | 29.9% | 32.2% | 27.67% |
Fatigue information | 13.1% | 30.7% | 11.56% |
Answering information | 57% | 37.1% | 60.77% |

Item | AHP weight | Entropy value weight | Combination weight |
---|---|---|---|
Emotional information | 30.89% | 32.4% | 29.13% |
Fatigue information | 52.96% | 36.5% | 56.25% |
Answering information | 16.15% | 31.1% | 14.62% |

Item | AHP weight | Entropy value weight | Combination weight |
---|---|---|---|
Emotional information | 53.5% | 38.5% | 58.71% |
Fatigue information | 34.4% | 31.6% | 30.98% |
Answering information | 12.1% | 29.9% | 10.31% |
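The entropy weighting and the “multiplication” synthesis can be sketched as below. The `entropy_weights` function follows the standard entropy weight method with rows as dimensions and columns as experts, which is an assumption about the omitted formulas; the score matrix in the usage example is made up for illustration. The `combine` function normalizes the product of the two weight vectors, and it reproduces the first combination weight table from its AHP and entropy columns.

```python
import math


def entropy_weights(scores):
    """scores[i][j] is the positive score given by expert j to dimension i.

    Raises ZeroDivisionError if every dimension receives identical scores
    from all experts, because no dimension then carries any information.
    """
    experts = len(scores[0])
    divergence = []
    for row in scores:
        total = sum(row)
        shares = [x / total for x in row]
        entropy = -sum(p * math.log(p) for p in shares if p > 0) / math.log(experts)
        divergence.append(1.0 - entropy)
    norm = sum(divergence)
    return [d / norm for d in divergence]


def combine(ahp, entropy):
    """Multiplicative synthesis: normalize the element-wise product of the two weight vectors."""
    products = [a * b for a, b in zip(ahp, entropy)]
    total = sum(products)
    return [p / total for p in products]


# AHP and entropy weights from the first table (emotional, fatigue, answering).
combined = combine([0.299, 0.131, 0.570], [0.322, 0.307, 0.371])

# Hypothetical scores from 6 raters for 3 dimensions (not data from this study).
example = entropy_weights([[5, 5, 5, 5, 5, 6], [1, 9, 2, 8, 3, 7], [4, 5, 6, 5, 4, 6]])
```

Rounded to four decimals, `combined` gives 0.2767, 0.1156 and 0.6077, in line with the 27.67%, 11.56% and 60.77% reported above.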

How to integrate three-dimensional information is a key issue in evaluating students’ learning status. This paper adopts the method of decision-level fusion to carry out level-by-level fusion. The mathematical explanation is as follows: if there are N students participating in the test, the sample set is represented as

There are a variety of second-level information under the first-level information, denoted as

This paper selects 17 seventh-grade students from a middle school in Chongqing as subjects, and carries out process analysis and overall analysis of their learning state using learning videos recorded during class and after-class answering data. The learning videos were collected in an environment with bright light and clear face contours. Each video is about 45 min long, and all students attended the same lesson at the same time.

First, 5 different types of students were selected from these 17 students, namely positive, sad, irritable, natural, and tired or sleepy. They have different personality characteristics and listening states, and typify the daily learning status of students. Through the learning state process detection and fusion calculation of these five students, the following detection results are obtained, as shown in

From

From

From

The test results show that the model can identify differences in the learning status of different learners, as well as each learner’s understanding, engagement and interest in the course. To check the validity of the model detection, teachers were invited to rate the understanding, engagement and interest of these 5 learners according to the videos and answering scores. A score of 80–100 is considered excellent, 60–80 passed, and below 60 failed. The results of manual evaluation and model detection are shown in

Learning status | Evaluation method | Positive | Sad | Grumpy | Natural | Fatigue |
---|---|---|---|---|---|---|
Comprehension | Expert | Excellent | Failed | Failed | Passed | Failed |
Comprehension | Model | Excellent | Passed | Failed | Passed | Failed |
Engagement | Expert | Passed | Passed | Passed | Passed | Failed |
Engagement | Model | Passed | Passed | Passed | Passed | Failed |
Interest | Expert | Passed | Passed | Failed | Passed | Failed |
Interest | Model | Passed | Passed | Failed | Passed | Failed |
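The grading rule and the comparison between manual and model results can be expressed as a short sketch. The boundary handling is an assumption, since the text does not say which grade a score of exactly 80 or 60 receives; here 80 counts as excellent and 60 as passed. The function names are hypothetical, and the grade lists are copied from the Comprehension rows of the table.

```python
def grade(score):
    """Map a 0-100 teacher score to a grade (boundary handling is an assumption)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Passed"
    return "Failed"


def agreement(expert_grades, model_grades):
    """Share of learners for whom the expert and model grades coincide."""
    if len(expert_grades) != len(model_grades):
        raise ValueError("both lists must cover the same learners")
    matches = sum(e == m for e, m in zip(expert_grades, model_grades))
    return matches / len(expert_grades)


# Comprehension row: positive, sad, grumpy, natural, fatigue.
expert = ["Excellent", "Failed", "Failed", "Passed", "Failed"]
model = ["Excellent", "Passed", "Failed", "Passed", "Failed"]
comprehension_agreement = agreement(expert, model)
```

For the Comprehension row the two evaluations agree for 4 of the 5 learners; the Engagement and Interest rows agree for all 5.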

From the information in the table, it can be seen that the model in this paper basically reflects the real learning state of the learners in class. The comprehension of the student in the sad state is on the edge of passing, so the model result differs from the manual scoring, but the overall impact is small. The comparison therefore supports the feasibility and effectiveness of the model detection.

In order to evaluate the learning quality of the entire class, this experiment randomly selected 12 students to test through the model, and calculated the overall understanding, engagement, and interest of the entire class. The test results are shown in

Information category | Dimension | Stu1 | Stu2 | Stu3 | Stu4 | Stu5 | Stu6 |
---|---|---|---|---|---|---|---|
Emotional information | Comprehension | 0.75 | 0.55 | 0.65 | 0.65 | 0.58 | 0.72 |
Emotional information | Engagement | 0.66 | 0.54 | 0.53 | 0.59 | 0.47 | 0.66 |
Emotional information | Interest | 0.75 | 0.55 | 0.65 | 0.64 | 0.58 | 0.72 |
Fatigue information | Comprehension | 0.79 | 0.66 | 0.72 | 0.79 | 0.60 | 0.79 |
Fatigue information | Engagement | 0.79 | 0.66 | 0.73 | 0.79 | 0.60 | 0.79 |
Fatigue information | Interest | 0.82 | 0.68 | 0.75 | 0.82 | 0.61 | 0.82 |
Answering information | | 0.85 | 0.62 | 0.81 | 0.69 | 0.50 | 0.87 |
Answering information | | 0.15 | 0.38 | 0.19 | 0.31 | 0.50 | 0.13 |
Result after fusion | Comprehension | 0.82 | 0.61 | 0.76 | 0.69 | 0.53 | 0.82 |
Result after fusion | Engagement | 0.76 | 0.62 | 0.68 | 0.72 | 0.55 | 0.76 |
Result after fusion | Interest | 0.78 | 0.60 | 0.70 | 0.70 | 0.58 | 0.77 |

Information category | Dimension | Stu7 | Stu8 | Stu9 | Stu10 | Stu11 | Stu12 |
---|---|---|---|---|---|---|---|
Emotional information | Comprehension | 0.68 | 0.56 | 0.45 | 0.49 | 0.63 | 0.62 |
Emotional information | Engagement | 0.58 | 0.44 | 0.61 | 0.40 | 0.58 | 0.58 |
Emotional information | Interest | 0.68 | 0.55 | 0.35 | 0.48 | 0.63 | 0.61 |
Fatigue information | Comprehension | 0.79 | 0.75 | 0.79 | 0.44 | 0.79 | 0.60 |
Fatigue information | Engagement | 0.79 | 0.76 | 0.79 | 0.43 | 0.79 | 0.60 |
Fatigue information | Interest | 0.82 | 0.78 | 0.82 | 0.43 | 0.82 | 0.61 |
Answering information | | 0.84 | 0.44 | 0.47 | 0.52 | 0.78 | 0.72 |
Answering information | | 0.16 | 0.56 | 0.53 | 0.48 | 0.22 | 0.18 |
Result after fusion | Comprehension | 0.79 | 0.62 | 0.50 | 0.45 | 0.74 | 0.68 |
Result after fusion | Engagement | 0.74 | 0.65 | 0.69 | 0.42 | 0.73 | 0.61 |
Result after fusion | Interest | 0.74 | 0.63 | 0.51 | 0.46 | 0.70 | 0.62 |
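The fused values for Student 1 in the table can be reproduced with a weighted sum of the three information values, using the combination weights reported earlier for comprehension, engagement and interest. The sketch below rests on that assumption: the fusion formula itself is not shown in the text, the first answering value (0.85 for Student 1) is used for all three dimensions, and the name `fuse` is hypothetical.

```python
# Combination weights (emotional, fatigue, answering) from the three weight tables.
COMBINATION_WEIGHTS = {
    "comprehension": (0.2767, 0.1156, 0.6077),
    "engagement": (0.2913, 0.5625, 0.1462),
    "interest": (0.5871, 0.3098, 0.1031),
}


def fuse(dimension, emotion, fatigue, answering):
    """Weighted sum of the three information values for one learning-state dimension."""
    w_emotion, w_fatigue, w_answering = COMBINATION_WEIGHTS[dimension]
    return w_emotion * emotion + w_fatigue * fatigue + w_answering * answering


# Student 1: emotional and fatigue values per dimension, answering value 0.85.
stu1 = {
    "comprehension": fuse("comprehension", 0.75, 0.79, 0.85),
    "engagement": fuse("engagement", 0.66, 0.79, 0.85),
    "interest": fuse("interest", 0.75, 0.82, 0.85),
}
```

Rounded to two decimals these give 0.82, 0.76 and 0.78, the three fused values listed for Student 1.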

It can be seen from the table that the learning status of students 1 and 6 is very good. According to the fused results, these two students have a high degree of understanding and a good grasp of knowledge, and their engagement and interest are both above 0.75. The learning status of students 3, 4, 7, and 11 is relatively good. Their fused understanding, engagement and interest are all around 0.7. Among them, students 3 and 7 have a high degree of understanding and a good grasp of knowledge, and are generally more engaged and interested in the classroom. Students 2, 8, and 12 have a poor learning status. According to the fused results, their understanding, engagement, and interest are all around 0.6, and their concentration and interest in the classroom are not strong. The learning status of students 5, 9, and 10 is very poor. Except for student 9, whose engagement is higher than 0.6, their fused values are between 0.4 and 0.6. They have a poor understanding of knowledge, show little interest in the classroom and very low concentration, and their final learning effect is not good.

According to the statistical data of the experimental results, the learning status data of 12 students after multi-dimensional information fusion are displayed graphically as shown in

This research proposes a learning state analysis model for online learners based on multi-dimensional information fusion, and analyzes the learning state from three aspects, namely understanding, engagement and interest, according to the correlation between the detection data and the learning state. Using this model, process analysis and overall analysis of 17 online learners are carried out, which shows that the model can effectively identify both the students’ learning state during the process and their overall learning state. In addition, the teachers’ scores are compared with the model detection results to verify the effectiveness of the model.