Two major issues currently restrict the use of learning analytics: the lack of interpretability and the lack of adaptability of the machine learning models used in this domain. Interpretability makes it easy for stakeholders to understand how these models work, and adaptability makes it easy to use the same model for multiple cohorts and courses in educational institutions. Recently, some models in learning analytics have been constructed with interpretability in mind, but their interpretability is not quantified. Adaptability, moreover, has not been specifically considered in this domain. This paper presents a new framework based on hybrid statistical fuzzy theory to overcome these limitations. It also provides explainability in the form of rules describing the reasoning behind a particular output. The paper also discusses the system evaluation on a benchmark dataset, which shows promising results. The measure of explainability, the fuzzy index, shows that the model is highly interpretable. The system achieves more than 82% recall in both the classification and the context adaptation stages.

Learning analytics (LA) has been defined as “the measurement, collection, analysis and reporting of data about learners and their contexts, for purposes of understanding and optimizing learning and the environments in which it occurs” [

Two basic limitations are preventing across-the-board adoption of these methods with full confidence. First, these methods lack adaptability, and second, the results of these models are poorly understood. Each course offering and cohort has its own nuances, such as class size, students’ demographic background, and the year in which the student registers for the course. Traditional machine learning (ML) models are strongly dependent on the dataset used for training. For this reason, most LA systems are beneficial only for a specific type of context and cannot be adapted to different contexts involving other courses or other institutions. Poor understandability is a very common issue in ML approaches. Especially in the domain of learning analytics, this lack of understanding is an important factor that limits the adoption of these systems by different stakeholders. In recent years, interpretable and explainable ML techniques have gained a lot of popularity for designing intelligent learning systems that provide explanations of predictions that are easily comprehended by common users [

In this paper, the hybrid framework proposed for facilitating the portability of models for learning analytics [

● A statistical fuzzy framework enabling early intervention to help weak students.

● Adaptability of the model to new courses without retraining.

● Production of a set of linguistic fuzzy rules that are highly interpretable.

● Quantification of the interpretability of the fuzzy rules.

● High recall of both the classification and adaptation modules.

● A responsive and easy-to-use system.

The remainder of this paper is organized as follows. Section 2 discusses various research reports on LA, particularly addressing the interpretability and adaptability aspects. Section 3 describes the materials and methods used in the present study. Section 4 discusses the results. Lastly, Section 5 concludes the paper with recommendations for future research.

The amount of research in the domain of learning analytics involving ML approaches is tremendous. Surprisingly, very few research efforts consider the explainability and interpretability of the ML models in this context. A few interpretable techniques have been proposed that predict students’ performance from data collected from learning management systems, assignment marks and other enrolment systems, and that provide interpretable results. However, none of these approaches measures the interpretability of student prediction systems, and hence any comparison and evaluation is difficult to conduct. When it comes to measuring the interpretability of ML models, a common approach is to use qualitative methods with human subjects to analyse the perceived interpretability of these models [

Moreover, the literature on quantification or measurement of explainability of AI models used for learning analytics is even more limited. In this section, a comprehensive overview of literature in the domain of learning analytics is provided with a focus on interpretability.

Socio-demographic data in addition to academic and LMS activity data is used to predict student performance in order to provide early support for at-risk students [

Most early warning systems predict students’ performance using large student datasets that ignore the idiosyncrasies of underrepresented students and consider general student population data only [

The interpretation of rules is provided by CN2 rule inducer and multivariate projection for the student performance prediction system that uses video learning analytics and data mining techniques. Student academic data, student activity data and student video interaction data are used to predict student performance by multiple algorithms. In addition, the effect of feature selection and transformation is also compared. The best prediction accuracy was achieved by the Random Forest algorithm with equal width transformation method and information gain ratio selection technique. The CN2 rule inducer algorithm also performed well but it provided easy rule induction with probability for non-expert viewers like educators that require this interpretation for providing support to students [

High drop-out rate and poor academic performance are two main issues affecting the reputation of educational institutes [

The student performance prediction systems that use black box techniques can be converted to interpretable systems by employing some design practices [

A knowledge gap has been identified between the model creation for student performance prediction and the interpretation of that prediction for actionable decision-making process. For this pedagogical change, a model based on recursive partitioning and automatic selection of features for robust classification was developed with high interpretability. The strength of the model was the transparent characterization of student subgroups based on relevant features for easy translation into actionable processes [

For student performance prediction systems, an important aspect that must be considered is prediction uncertainty or confidence in addition to prediction accuracy for developing a reliable early warning system for at-risk students. Two Bayesian deep learning models and Long Short-Term Memory (LSTM) models were used for predicting students’ performance in future courses based on their performance in courses already completed by the students. Prediction uncertainty associated with at-risk student prediction was considered before reaching out to provide additional support for effective and targeted utilization of resources. Also, the explainable results of the models provided information about the previous courses whose results influenced the prediction, which can be used for guiding students [

A combination of black box and white box prediction approaches was proposed for high prediction accuracy and interpretability. High prediction accuracy was achieved by using the SVM model and for interpretability the Decision Tree (DT) and Random Forest (RF) models were employed for extracting symbolic rules. In addition, an attribute dictionary was built from students’ comments which was converted to attribute vectors for predicting students’ grades. The combination of these techniques showed accurate prediction of students’ performance based on students’ comments after each lesson and the interpretable results showed the characteristics of attribute patterns for each grade [

The above mentioned literature provides interpretable student performance prediction models but the quantification and measurement of the interpretability is not considered in any of these [

The main components of the proposed framework are discussed in this section and

Decision trees are an established and efficient tool in machine learning and are increasingly being adopted to support the explainability of algorithmic decisions in classification and regression. Various algorithms have been developed for constructing optimized decision trees under various conditions. Examples include CART, ID3, C4.5, and REPTree.

Tree algorithms apply a top-down, divide-and-conquer approach to the data to construct a tree or a set of rules. In a decision tree, the inner nodes represent value sub-ranges of the input variables and the leaf nodes represent the output values. The tree is constructed by recursively splitting the dataset: statistical measures are applied to the variables and a split variable is selected based on the results. Examples of such statistical measures are entropy, information gain, and the Gini index. Once the tree has been constructed, various input-output mapping rules can be traced out by traversing the tree from the root to a leaf. Moreover, there are algorithms for optimizing the tree in terms of its complexity. Pruning algorithms are the most popular in this category, and many tree construction algorithms incorporate pruning in their operation based on various criteria; for example, reduced error pruning (REP) is used by REPTree. In the present work, the REPTree algorithm is used, which is an efficient decision tree algorithm capable of learning both classification and regression problems.
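The split measures named above can be illustrated with a minimal sketch. It is not the implementation used by CART or REPTree; the helper names and the toy scores and results are invented for illustration only.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting a numeric feature at `threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_threshold(values, labels):
    """Midpoint between adjacent distinct values with the highest information gain.

    Assumes the feature takes at least two distinct values.
    """
    points = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return max(candidates, key=lambda t: information_gain(values, labels, t))


# Toy data (invented): average assessment scores and final results.
scores = [35, 42, 48, 55, 63, 71, 80, 88]
results = ["Fail", "Fail", "Fail", "Fail", "Pass", "Pass", "Pass", "Pass"]

split = best_threshold(scores, results)
gain = information_gain(scores, results, split)
```

On this toy data the chosen threshold separates the two classes completely, so the information gain equals the entropy of the unsplit labels. A tree builder repeats this selection recursively on each resulting subset.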

Fuzzy inference systems (FIS) have been used effectively in various domains including pattern recognition, healthcare, robotics, and control engineering. Based on fuzzy sets (FS), FIS are ideally suited to domains with a large number of complex factors and non-linear relationships which cannot be expressed by clear mathematical equations. FIS also have a unique advantage in terms of their adaptability. Unlike other modeling approaches such as neural networks and regression, which need to be redeveloped and retrained for every new context, FIS can be easily adapted to a new context without having to be retrained.

A fuzzy set (FS) is an extension of the classical set in which members have different degrees of membership in the set. FS offer an ideal tool for representing imprecise and approximate concepts. FS are used to define linguistic variables that partition a domain of discourse (DD). For instance, class size may be partitioned into three linguistic variables {low, medium, high}, with each linguistic variable expressed as an FS
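As a minimal sketch of such a partition, the class-size example can be written with triangular and shoulder-shaped membership functions. The shapes and the breakpoints (20, 50, 80) are assumptions chosen for illustration and are not taken from the paper.

```python
def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def left_shoulder(x, a, b):
    """Membership 1 up to a, falling linearly to 0 at b."""
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)


def right_shoulder(x, a, b):
    """Membership 0 up to a, rising linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Hypothetical partition of class size; the breakpoints are invented.
CLASS_SIZE = {
    "low": lambda x: left_shoulder(x, 20, 50),
    "medium": lambda x: triangular(x, 20, 50, 80),
    "high": lambda x: right_shoulder(x, 50, 80),
}


def fuzzify(x, partition):
    """Degree of membership of a crisp value in every label of a partition."""
    return {label: mf(x) for label, mf in partition.items()}


memberships = fuzzify(35, CLASS_SIZE)
```

Under these assumed breakpoints a class of 35 students is partly low and partly medium, which is the kind of graded membership a classical set cannot express.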

An FIS is a rule-based decision system making use of fuzzy sets. The main components of an FIS are the rule base (RB) and the knowledge base (KB). Various fuzzy operators are applied to aggregate the rule activations in the RB. The rule base comprises various rules of the form:

Once the fuzzy rules and fuzzy sets are created the interpretability of the FIS is measured. A number of approaches have been proposed in literature for measuring interpretability of rule-based systems and tree based systems [

Fuzzy index is proposed as an interpretability measure for fuzzy systems and is inspired by the Nauck index [

● Rule Base Dimension of the system: considering the number of rules and premises.

● Rule Base Complexity of the system: considering the number of rules with one, two and three or more variables.

● Rule Base Interpretability of the system: considering Rule Base Complexity and Rule Base Dimension of the system.

● Fuzzy index of the system (final output): considering Rule Base Interpretability and average number of labels defined by input variables.

The labels of the six inputs are identified and the four rule bases in the hierarchical fuzzy system work together to find the fuzzy index of the fuzzy system. The inputs and output of the system that calculates the interpretability of the fuzzy system are shown in

Type | Name of the variable | Description | Labels |
---|---|---|---|
Input | Total number of rules | Number of rules in the fuzzy system | 3 (Low, medium, high) |
Input | Total number of premises | Number of premises in the fuzzy system | 2 (Low, high) |
Input | Number of rules that use one input variable | Number of rules in the fuzzy system | 2 (Low, high) |
Input | Number of rules that use two input variables | Number of rules in the fuzzy system | 2 (Low, high) |
Input | Number of rules that use three or more input variables | Number of rules in the fuzzy system | 2 (Low, high) |
Input | Average number of labels defined by input variables | Number of labels defined by the input | 2 (Low, high) |
Output | Interpretability index | Interpretability index of the fuzzy system | 5 (Very low, low, medium, high, very high) |
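The six crisp inputs listed in the table can be counted directly from a rule base. The sketch below covers only that counting step; it does not reproduce the four hierarchical rule bases that turn these inputs into the fuzzy index. The rule representation, the function name and the toy rule base are assumptions made for illustration.

```python
def fuzzy_index_inputs(rules, labels_per_variable):
    """Count the six inputs of the interpretability system.

    rules: list of antecedents, each a dict mapping input variable -> label.
    labels_per_variable: dict mapping input variable -> list of its labels.
    """
    premises_per_rule = [len(antecedent) for antecedent in rules]
    label_counts = [len(labels) for labels in labels_per_variable.values()]
    return {
        "total_rules": len(rules),
        "total_premises": sum(premises_per_rule),
        "rules_with_one_variable": sum(1 for p in premises_per_rule if p == 1),
        "rules_with_two_variables": sum(1 for p in premises_per_rule if p == 2),
        "rules_with_three_or_more_variables": sum(1 for p in premises_per_rule if p >= 3),
        "average_labels_per_input": sum(label_counts) / len(label_counts),
    }


# Hypothetical rule base (invented; not the rules extracted in this study).
toy_rules = [
    {"avg_score": "low"},
    {"avg_score": "high"},
    {"avg_score": "medium", "education": "low"},
    {"avg_score": "medium", "education": "high"},
]
toy_labels = {
    "avg_score": ["low", "medium", "high"],
    "education": ["low", "high"],
}

inputs = fuzzy_index_inputs(toy_rules, toy_labels)
```

Each count would then be fuzzified with the labels given in the table before being passed to the hierarchical system.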

As mentioned earlier, an FIS can be adapted to new contexts without needing to be retrained. Context is extremely important in correctly applying an LA model to predict the future performance of students. For instance, a class of size 30 might be considered large in the case of an elective course but medium for a core course. However, research on context adaptation for LA applications has been scarce so far.
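The class-size example can be made concrete with a small sketch. It assumes that adaptation is a linear rescaling of the membership-function parameters from the base domain to the new domain; this is one simple choice and is not claimed to be the transformation used in this work. All ranges and breakpoints are invented.

```python
def rescale(value, base_range, new_range):
    """Map a parameter linearly from the base domain to the new domain."""
    (b_lo, b_hi), (n_lo, n_hi) = base_range, new_range
    return n_lo + (value - b_lo) * (n_hi - n_lo) / (b_hi - b_lo)


def adapt_fuzzy_set(params, base_range, new_range):
    """Rescale every parameter of a fuzzy set; the rule base is left untouched."""
    return tuple(rescale(p, base_range, new_range) for p in params)


def triangular(x, a, b, c):
    """Triangular membership with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def right_shoulder(x, a, b):
    """Membership 0 up to a, rising linearly to 1 at b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


# Hypothetical contexts: core courses span 0-60 students, electives 0-30.
CORE, ELECTIVE = (0, 60), (0, 30)
medium_core = (15, 30, 45)   # triangular "medium" in the base context
large_core = (30, 60)        # right-shoulder "large" in the base context

medium_elective = adapt_fuzzy_set(medium_core, CORE, ELECTIVE)
large_elective = adapt_fuzzy_set(large_core, CORE, ELECTIVE)

# Membership of a class of 30 students in each context.
core_view = (triangular(30, *medium_core), right_shoulder(30, *large_core))
elective_view = (triangular(30, *medium_elective), right_shoulder(30, *large_elective))
```

With these assumed ranges a class of 30 is fully medium in the core-course context and fully large in the elective context, while the linguistic labels and any rules that refer to them stay the same.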

It is generally agreed that, among the two main components of an FIS, the RB is universal and context-independent whereas the KB is context-aware. Accordingly, an FIS can be adapted by transforming the KB according to the context without affecting the logical structure of the RB. The method given in [

Initially, the KB is populated with the definition of A according to the base context B defined over the DD

The context adapted version of A is calculated by replacing the base parameters with the adapted values as follows:

In this section the implementation and evaluation of the proposed model is discussed.

The validation of the proposed framework has been carried out using the Open University Learning Analytics Dataset (OULAD) [

To extract the tree, student data for two offerings of the same course in 2013 and 2014 were used, represented in OULAD as course BBB with offerings 2013B and 2014B respectively. There were 8590 relevant records of 2531 distinct students. Only assessments marked by the tutor (and not by the computer) were chosen, as they carried the majority of the marks, and only those due by the 120th day of the course at the latest, since an early intervention based on them is desired. There were 4 such assessments for each of the chosen course modules. All the relevant data was combined into one file and then further processed by taking the average of the scores obtained in the aforementioned assessments for each student. As a binary classification of the final result of a student as Pass or Fail was required, final results of Distinction were relabelled as Pass and Withdrawn as Fail, without a loss of accuracy. Context adaptation was later applied to another cohort comprising two offerings of the course represented as AAA in OULAD, 2013J and 2014J. The data was preprocessed as for the pre-classification step and resulted in 678 relevant records that were then input to the context adaptation module (see Section 4.7).
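The preprocessing described above can be sketched with pandas. The column names follow the published OULAD tables (`assessments`, `studentAssessment`, `studentInfo`); the function name and the tiny demo frames are invented, and the sketch is an approximation of the described steps rather than the authors' script.

```python
import pandas as pd


def build_early_dataset(assessments, student_assessment, student_info,
                        module, presentations, max_day=120):
    """Average early tutor-marked scores per student and binarise the result."""
    # Keep tutor-marked assessments (TMA) of the chosen offerings due by max_day.
    early_tma = assessments[
        (assessments["code_module"] == module)
        & (assessments["code_presentation"].isin(presentations))
        & (assessments["assessment_type"] == "TMA")
        & (assessments["date"] <= max_day)
    ]
    scores = student_assessment.merge(
        early_tma[["id_assessment", "code_module", "code_presentation"]],
        on="id_assessment",
    )
    keys = ["code_module", "code_presentation", "id_student"]
    avg = (scores.groupby(keys, as_index=False)["score"].mean()
           .rename(columns={"score": "avg_score"}))
    data = avg.merge(student_info, on=keys)
    # Collapse the four OULAD outcomes into a binary Pass/Fail label.
    data["final_result"] = data["final_result"].replace(
        {"Distinction": "Pass", "Withdrawn": "Fail"})
    return data


# Tiny invented frames that mimic the OULAD layout.
assessments = pd.DataFrame({
    "id_assessment": [1, 2, 3],
    "code_module": ["BBB"] * 3,
    "code_presentation": ["2013B"] * 3,
    "assessment_type": ["TMA", "TMA", "CMA"],
    "date": [19, 150, 54],
})
student_assessment = pd.DataFrame({
    "id_assessment": [1, 1, 2, 3],
    "id_student": [10, 11, 10, 10],
    "score": [40, 80, 90, 100],
})
student_info = pd.DataFrame({
    "code_module": ["BBB", "BBB"],
    "code_presentation": ["2013B", "2013B"],
    "id_student": [10, 11],
    "final_result": ["Withdrawn", "Distinction"],
})

demo = build_early_dataset(assessments, student_assessment, student_info,
                           module="BBB", presentations=["2013B", "2014B"])
```

In the demo only the first assessment qualifies: the second is due after day 120 and the third is computer-marked.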

Decision tree learning was preferred as the predictive technique because it uses a white-box model and is thus easy to explain and interpret. Moreover, it is computationally less intensive and requires less data preprocessing. The CART (Classification and Regression Trees) implementation in the Scikit-learn library and the REPTree (reduced-error pruning tree) algorithm from the WEKA (Waikato Environment for Knowledge Analysis) library were used.

Both algorithms gave similar results. CART's binary tree was built by splitting nodes on the basis of the Gini impurity index. Pandas and NumPy were used for data manipulation. WEKA's REPTree algorithm builds a decision or regression tree using information gain or variance reduction and prunes it using reduced-error pruning. Optimized for speed, it sorts values for numeric attributes only once. It deals with missing values by splitting instances into pieces, as C4.5 does. Its configurable parameters include the minimum number of instances per leaf, the maximum tree depth (useful when boosting trees), the minimum proportion of training set variance for a split (numeric classes only), and the number of folds for pruning.

The features found to be most important for classifying a student as likely to Pass or Fail are: i) the student's average score in the assessments, ii) highest education level attained by the student previously, and iii) the index of multiple deprivation, which is essentially a poverty measure. A training-test split of 80-20 was used to build the REPTree model.
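A minimal sketch of the CART variant of this step with Scikit-learn is given below. It uses synthetic data, invented ordinal encodings for education level and deprivation band, and arbitrary depth and leaf-size settings, so it only illustrates the workflow (80-20 split, Gini-based tree, recall on the Fail class, rule text) and does not reproduce the reported model.

```python
import numpy as np
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
n = 400

# Synthetic stand-ins for the three features (encodings are assumptions).
avg_score = rng.uniform(0, 100, n)
education = rng.integers(0, 5, n)   # hypothetical ordinal education level
imd_band = rng.integers(0, 10, n)   # hypothetical ordinal deprivation band

# Synthetic target: 1 = Fail, more likely when the average score is low.
fail = (avg_score + rng.normal(0, 10, n) < 50).astype(int)

X = np.column_stack([avg_score, education, imd_band])
X_train, X_test, y_train, y_test = train_test_split(
    X, fail, test_size=0.2, random_state=0, stratify=fail)

# A shallow tree keeps the extracted rules short and readable.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3,
                              min_samples_leaf=10, random_state=0)
tree.fit(X_train, y_train)

fail_recall = recall_score(y_test, tree.predict(X_test), pos_label=1)
rules = export_text(
    tree, feature_names=["avg_score", "highest_education", "imd_band"])
```

Printing `rules` shows each root-to-leaf path with its thresholds, which is the raw material for defining feature sub-intervals.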

After obtaining the tree, each path from the root to a leaf is traversed and the feature sub-intervals expressed in each of the nodes are extracted. After examining the collected feature sub-intervals, fuzzy linguistic variables were defined to express these sub-intervals. The following input features were fuzzified with the help of linguistic variables:

The parameters used for calculation of the fuzzy index of the FIS are shown in

Input variable | Value |
---|---|
Total number of rules | 4 |
Total number of premises | 6 |
Number of rules that use one input variable | 2 |
Number of rules that use two input variables | 2 |
Number of rules that use three or more input variables | 0 |
Average number of labels defined by input variables | 2.5 |

Using the values given in

● Rule Base Dimension of the system: Low.

● Rule Base Complexity of the system: Low.

● Rule Base Interpretability of the system: Very high.

● Fuzzy index of the system (final output): Very high.

The fuzzy index of the designed FIS is evaluated as very high, which makes the system highly interpretable and hence easy to adopt for professional and academic staff in educational institutions.

Context adaptation was also experimented with by using the existing data of previous cohorts to predict the performance of a new cohort (see Section 4.3) during the early phase of a semester. The parameters

TP Rate | FP Rate | Precision | Recall | F-Measure | MCC | ROC Area | PRC Area | Class |
---|---|---|---|---|---|---|---|---|
0.828 | 0.515 | 0.531 | 0.828 | 0.647 | 0.322 | 0.729 | 0.622 | Fail |
0.485 | 0.172 | 0.800 | 0.485 | 0.604 | 0.322 | 0.729 | 0.765 | Pass |
0.626 | 0.314 | 0.689 | 0.626 | 0.622 | 0.322 | 0.729 | 0.705 | Weighted average |

Precision | Recall | TP Rate | FP Rate | F Score | Class |
---|---|---|---|---|---|
0.293478 | 0.823171 | 0.823171 | 0.056641 | 0.432692 | Fail |
0.865741 | 0.365234 | 0.365234 | 0.176829 | 0.513736 | Pass |
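The relation between the reported columns can be checked with a short sketch. The F-measure is the harmonic mean of precision and recall, and a weighted average combines the per-class values by class share. The Fail share used here is not stated in the tables; it is inferred from the reported weighted recall of the first table and should be read as an assumption.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def weighted_average(fail_value, pass_value, fail_share):
    """Average of two per-class values weighted by class share."""
    return fail_share * fail_value + (1 - fail_share) * pass_value


# First table: per-class precision and recall as reported.
f_fail = f_measure(0.531, 0.828)
f_pass = f_measure(0.800, 0.485)

# Fail share implied by the reported weighted recall of 0.626 (an inference).
fail_share = (0.626 - 0.485) / (0.828 - 0.485)
weighted_precision = weighted_average(0.531, 0.800, fail_share)
weighted_f = weighted_average(0.647, 0.604, fail_share)

# Second table: F score of the Fail class from its precision and recall.
f_fail_second = f_measure(0.293478, 0.823171)
```

Up to rounding, these values agree with the F-measure, weighted precision and weighted F-measure entries of the first table and with the Fail-class F score of the second table.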

Considering the class imbalance present in the dataset, weighted averages of the performance measures are also reported, using the class size as a weight. It is clear from

This paper presented a comprehensive approach to provide an early answer to the million-dollar question in learning analytics: which students are likely to fail this course? Early intervention can put such students back on the path to success. The hybrid statistical fuzzy system identifies such students from their performance in the initial assessments of the course and a few other features by learning a decision tree, and then generates a set of fuzzy rules. These rules are easy to understand, and this is quantified by measuring their interpretability using the fuzzy index. Finally, the FIS is context-adapted and used to predict the likelihood of student success and failure in other courses, without any retraining being required. The performance of the system is high in terms of recall, the main parameter of success in this scenario.

Some of the limitations of the proposed system are due to the fact that the system is designed to identify students that are at risk of failing the course considering only the subset of assessments undertaken in the early weeks of the semester. Hence, the accuracy of the system is not very high when compared to other models based on data from all assessments of the course. The other limitation of the system arises from the requirement of high interpretability which necessitates keeping the decision tree simple.

The proposed system is an end-to-end solution to a key learning analytics problem. This work will be extended by applying the framework to several datasets to discover more features that contribute to student success and failure. Other learning algorithms, both supervised and unsupervised, will also be used in the classification stage of the framework, and their performance and interpretability will be measured. Another research thread to be pursued in future is the human-in-the-loop paradigm to make this framework even more accurate, interpretable, and sentient.