Presents corrections to author names in the paper, “Jutge.org: Characteristics and experiences,” IEEE Trans. Learn. Technol., vol. 11, no. 3, pp. 321–333, Jul.–Sep. 2018.
Most educational institutions have adopted hybrid teaching through learning management systems. Logging and clickstream data can describe learners' online behavior, and many researchers have used such data to predict student performance, with a diverse set of findings; how to use insights from the captured data to enhance learning engagement, however, remains an open question. Furthermore, identifying students at risk of failure is only the first step in truly addressing this issue: it is important to create actionable predictive models in real-world contexts in order to design interventions. In this paper, we first extracted features from students' learning activities and study habits to predict student performance in Kung Fu style competency education. We then proposed a TrAdaBoost-based transfer learning model, pretrained on data from the previous course iteration and applied to the current one. Our results showed that the prediction model generalizes well across teaching iterations and can achieve relatively high precision even when the new data are insufficient to train a model alone, supporting timely intervention for at-risk students. In addition, two split-test intervention experiments were conducted, in Fall 2017 and Summer 2018, respectively. Statistical tests showed that both the behavior-based reminding intervention and the error-related recommending intervention based on early prediction played a positive role in improving blended learning engagement.
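The abstract does not include the model itself; purely as an illustration of TrAdaBoost's core mechanism (boosting over pooled source and target data, where misclassified source-domain instances are down-weighted and misclassified target-domain instances up-weighted), here is a minimal pure-Python sketch with decision stumps as weak learners. The data, feature layout, and parameters are hypothetical, not taken from the paper.

```python
import math

def stump_predict(model, x):
    f, t, sign = model
    return 1 if sign * (x[f] - t) >= 0 else 0

def stump_train(X, y, w):
    """Weak learner: the single-feature threshold minimizing weighted error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for sign in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict((f, t, sign), xi) != yi)
                if best is None or err < best[0]:
                    best = (err, (f, t, sign))
    return best[1]

def tradaboost(Xs, ys, Xt, yt, rounds=6):
    """Xs/ys: previous course iteration (source); Xt/yt: current one (target)."""
    n_s, n_t = len(Xs), len(Xt)
    X, y = Xs + Xt, ys + yt
    w = [1.0 / (n_s + n_t)] * (n_s + n_t)
    beta_s = 1.0 / (1.0 + math.sqrt(2.0 * math.log(n_s) / rounds))
    models, betas = [], []
    for _ in range(rounds):
        total = sum(w)
        wn = [wi / total for wi in w]
        model = stump_train(X, y, wn)
        preds = [stump_predict(model, xi) for xi in X]
        # error rate measured on the target domain only
        eps = sum(wn[n_s + i] for i in range(n_t) if preds[n_s + i] != yt[i])
        eps = min(max(eps / sum(wn[n_s:]), 1e-10), 0.499)
        beta_t = eps / (1.0 - eps)
        for i in range(n_s):              # source mistakes: down-weight
            if preds[i] != ys[i]:
                w[i] *= beta_s
        for i in range(n_t):              # target mistakes: up-weight
            if preds[n_s + i] != yt[i]:
                w[n_s + i] /= beta_t
        models.append(model)
        betas.append(beta_t)
    def predict(x):
        half = len(models) // 2           # vote with the later learners only
        score = sum(-math.log(betas[i]) * (1 if stump_predict(models[i], x) else -1)
                    for i in range(half, len(models)))
        return 1 if score >= 0 else 0
    return predict

# hypothetical example: previous offering (source) plus a few current students
predict = tradaboost([[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]], [0, 0, 0, 1, 1, 1],
                     [[0.2], [0.4], [0.6], [0.9]], [0, 0, 1, 1])
print(predict([0.15]), predict([0.85]))
```

The single activity feature and threshold data here stand in for the paper's engineered activity and study-habit features.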
From Lab to Production: Lessons Learnt and Real-Life Challenges of an Early Student-Dropout Prevention System
This paper presents the work done to support student dropout risk prevention in a real online e-learning environment: a Spanish distance university with thousands of undergraduate students. The main goal is to prevent students from abandoning the university by means of retention actions focused on the most at-risk students, trying to maximize the effectiveness of institutional efforts in this direction. With this purpose, we generated predictive models based on the C5.0 algorithm using data from more than 11,000 students collected over five years. Then, we developed SPA (Sistema de Predicción de Abandono, Spanish for dropout prediction system), an early warning system that uses these models to generate both static early dropout-risk predictions and dynamic, periodically updated ones. It also supports the recording of the resulting retention-oriented interventions for further analysis. SPA has been in production since 2017 and is currently in its fourth semester of continuous use. It has calculated more than 117,000 risk scores to predict the dropout risk of more than 5,700 students, and about 13,000 retention actions have been recorded. The white-box predictive models used in production provided reasonably good results, very close to those obtained in the laboratory. On the way from research to production, we faced several challenges that had to be addressed effectively in order to succeed. In this paper, we share the challenges faced and the lessons learnt during this process. We hope this helps those who wish to cross the road from predictive modeling with potential value to the exploitation of complete dropout prevention systems that provide sustained value in real production scenarios.
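C5.0 itself is proprietary, but its split criterion descends from C4.5's information gain. A minimal sketch of that criterion, with hypothetical dropout-risk attributes (`logins`, `credits`) invented purely for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    """Gain of splitting on a categorical attribute, as in C4.5/C5.0 trees."""
    base = entropy(labels)
    groups = {}
    for row, lab in zip(rows, labels):
        groups.setdefault(row[attr], []).append(lab)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return base - remainder

# hypothetical student records with a dropout label (1 = dropped out)
rows = [
    {"logins": "low",  "credits": "few"},
    {"logins": "low",  "credits": "many"},
    {"logins": "high", "credits": "few"},
    {"logins": "high", "credits": "many"},
]
labels = [1, 1, 0, 0]
best = max(rows[0], key=lambda a: information_gain(rows, labels, a))
print(best)  # "logins" separates the classes perfectly here
```

The tree builder would split on the highest-gain attribute and recurse; the white-box character the paper values comes from these rules being directly readable.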
An Early Feedback Prediction System for Learners At-Risk Within a First-Year Higher Education Course
Identifying at-risk students as soon as possible is a challenge in educational institutions. Decreasing the time lag between identification and the actual at-risk state may significantly reduce the risk of failure or disengagement. In small courses such identification is relatively easy, but in larger ones it is impractical. Current learning management systems store a large amount of data that could help generate predictive models for the early identification of students in online and blended learning. The contribution of this paper is twofold. First, a new adaptive predictive model is presented, based only on students' grades and specifically trained for each course; a deep analysis is performed across the whole institution to evaluate its predictive accuracy. Second, an early warning system is developed, focusing on dashboard visualizations for stakeholders (i.e., students and teachers) and an early feedback prediction system to intervene when at-risk students are identified. The early warning system has been evaluated in a case study on a first-year undergraduate course in computer science. We show the accuracy of the identification of at-risk students, the students' appraisal, and the most common factors that lead to the at-risk level.
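The adaptive model itself is not specified in the abstract; as a hypothetical sketch of the general idea (projecting a final mark from the continuous-assessment grades recorded so far, and flagging students who project below a pass threshold), one might write:

```python
def predict_final_grade(partial_grades, weights):
    """Project a final mark from the grades seen so far.
    Ungraded activities (None) are assumed to score at the student's
    current mean; assumes at least one activity has been graded."""
    graded = [g for g in partial_grades if g is not None]
    mean_so_far = sum(graded) / len(graded)
    filled = [g if g is not None else mean_so_far for g in partial_grades]
    return sum(g * w for g, w in zip(filled, weights)) / sum(weights)

def at_risk(partial_grades, weights, threshold=5.0):
    """Flag a student whose projected final mark falls below the pass mark."""
    return predict_final_grade(partial_grades, weights) < threshold

# hypothetical course: three activities weighted 30/30/40%, third not yet graded
print(predict_final_grade([8, 6, None], [0.3, 0.3, 0.4]))
print(at_risk([3, 2, None], [0.3, 0.3, 0.4]))
```

The course-specific training the paper describes would correspond to fitting the weights and threshold per course rather than fixing them by hand.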
Developing tools to support students and learning in a traditional or online setting is a significant task in today's educational environment. The initial steps toward enabling such technologies using machine learning techniques focused on predicting the student's performance in terms of the achieved grades. However, these approaches do not perform as well at predicting poorly performing students. The objective of our work is twofold. First, in order to overcome this limitation, we explore whether poorly performing students can be more accurately predicted by formulating the problem as binary classification, based on data provided before the start of the semester. Second, in order to gain insight into the factors that can lead to poor performance, we engineered a number of human-interpretable features that quantify these factors. These features were derived from students' grades at the University of Minnesota, a public undergraduate institution. Based on these features, we perform a study to identify different student groups of interest and, at the same time, determine the importance of the features. As the resulting models provide different subsets of correct predictions, their combination can boost overall performance.
In European academic systems, the public funding of individual universities depends on many factors, which are periodically evaluated. One such factor is the rate of success, that is, the rate of students who complete their course of study. At many levels, therefore, there is increasing interest in being able to predict the risk that a student will abandon his or her studies, so that specific, personal corrective actions may be designed. In this paper, we propose an innovative temporal optimization model that identifies the earliest moment in a student's career at which a reliable prediction can be made about his or her risk of dropping out of the course of studies. Unlike most available models, our solution can be based on academic behavior alone, and our evidence suggests that by ignoring classically used attributes, such as gender or the results of pre-academic studies, one obtains more accurate and less biased models. We tested our system on real data from the three-year degree in computer science offered by the University of Ferrara (Italy).
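The optimization model itself is not reproduced in the abstract; a toy sketch of the underlying question (the earliest point at which held-out prediction quality meets a reliability bar and stays there) could look like the following, with all numbers invented:

```python
def earliest_reliable_week(accuracy_by_week, min_accuracy=0.8):
    """First week from which held-out accuracy meets the bar and never
    drops below it again -- a toy stand-in for a temporal optimization."""
    weeks = sorted(accuracy_by_week)
    for i, w in enumerate(weeks):
        if all(accuracy_by_week[v] >= min_accuracy for v in weeks[i:]):
            return w
    return None

# hypothetical weekly accuracies of a dropout classifier on a validation set
history = {1: 0.61, 2: 0.82, 3: 0.78, 4: 0.85, 5: 0.90}
print(earliest_reliable_week(history))  # week 3 still dips below 0.8
```

The point of making reliability a first-class objective, as the paper does, is that an earlier but unstable prediction would trigger interventions on the wrong students.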
Educational data mining has gained a lot of attention among scientists in recent years and constitutes an efficient tool for unraveling the knowledge concealed in educational data. Recently, semisupervised learning methods have gradually been introduced into the educational process, demonstrating their usability and effectiveness. Cotraining is a representative semisupervised method that aims to exploit both labeled and unlabeled examples, provided that each example is described by two feature views. Nevertheless, it has yet to be applied in many scientific fields, the educational field among them, since the assumption of two feature views cannot easily be put into practice. Within this context, the main purpose of this study is to evaluate the efficiency of a proposed cotraining method for the early prognosis of undergraduate students' performance in the final examinations of a distance course, based on a plethora of attributes that divide naturally into two distinct views because they originate from different sources. More specifically, the first view consists of attributes regarding students' characteristics and academic achievements, which are manually entered by their tutors, whereas the second consists of attributes tracking students' online activity in the course learning management system, which are automatically recorded by the system. The experimental results demonstrate the superiority of the proposed cotraining method over state-of-the-art semisupervised and supervised methods.
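As a rough, self-contained sketch of the cotraining loop described above (two feature views, a classifier per view, and the most confident pseudo-label feeding the shared labeled pool), with nearest-centroid classifiers standing in for the paper's base learners and all data invented:

```python
def centroid_fit(X, y):
    """Nearest-centroid classifier: a stand-in for the base learners."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        s = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            s[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def centroid_predict(model, x):
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, model[c]))
    label = min(model, key=dist)
    d = sorted(dist(c) for c in model)
    return label, d[1] - d[0]          # margin as a confidence proxy

def cotrain(V1, V2, y, U1, U2, rounds=4):
    """Each round, the classifier on one view pseudo-labels its most
    confident unlabeled student; the example (both views) joins the
    labeled pool, so the other view's classifier benefits too."""
    L1, L2, yl = list(V1), list(V2), list(y)
    U1, U2 = list(U1), list(U2)
    for r in range(rounds):
        if not U1:
            break
        labeled, pool = (L1, U1) if r % 2 == 0 else (L2, U2)
        model = centroid_fit(labeled, yl)
        k = max(range(len(pool)), key=lambda i: centroid_predict(model, pool[i])[1])
        lab = centroid_predict(model, pool[k])[0]
        L1.append(U1.pop(k)); L2.append(U2.pop(k)); yl.append(lab)
    return centroid_fit(L1, yl), centroid_fit(L2, yl)

# view 1: tutor-entered marks; view 2: LMS activity counts (both hypothetical)
m1, m2 = cotrain([[1], [9]], [[2], [8]], [0, 1], [[2], [8]], [[1], [9]])
print(centroid_predict(m1, [3])[0], centroid_predict(m1, [7])[0])
```

The two-view assumption the abstract emphasizes is what makes this loop sound: each view must be informative enough to label examples on its own.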
Early warning systems have been progressively implemented in higher education institutions to predict student performance. However, they usually fail to effectively integrate the many information sources available at universities to make more accurate and timely predictions; they often lack the decision-making reasoning needed to motivate the predictions; and they are generally biased toward the general student body, ignoring the idiosyncrasies of underrepresented student populations (determined by socio-demographic factors such as race, gender, residency, or status as a freshman, transfer, adult, or first-generation student) that traditionally face greater difficulties and performance gaps. This paper presents a multiview early warning system built with comprehensible Genetic Programming classification rules adapted to specifically target underrepresented and underperforming student populations. The system integrates many student information repositories using multiview learning to improve the accuracy and timeliness of the predictions. Three interfaces have been developed to provide personalized and aggregated comprehensible feedback to students, instructors, and staff to facilitate early intervention and student support. Experimental results, validated with statistical analysis, indicate that this multiview learning approach outperforms traditional classifiers. The learning outcomes will help instructors and policy-makers deploy strategies to increase retention and improve academics.
Blended courses that mix in-person instruction with online platforms are increasingly common in secondary education. These platforms record a rich amount of data on students' study habits and social interactions. Prior research has shown that these metrics are correlated with students' performance in face-to-face classes. However, predictive models for blended courses are still limited and have not yet succeeded at early prediction or cross-class prediction, even for repeated offerings of the same course. In this paper, we use data from two offerings of two different undergraduate courses to train and evaluate predictive models of student performance based on persistent student characteristics, including study habits and social interactions. We analyze the performance of these models on the same offering, on different offerings of the same course, and across courses to see how well they generalize. We also evaluate the models on different segments of the courses to determine how early reliable predictions can be made. This paper tells us in part how much data is required to make robust predictions and how cross-class data may, or may not, be used to boost model performance. The results of this study will help us better understand how similar the study habits, social activities, and teamwork styles are across semesters for students in each performance category. These trained models also provide an avenue to improve our existing support platforms so as to better support struggling students early in the semester, with the goal of providing timely intervention.
A Quest for a One-Size-Fits-All Neural Network: Early Prediction of Students at Risk in Online Courses
A significant amount of research effort has been put into finding variables that can identify students at risk based on activity records available in learning management systems (LMSs). These variables often depend on the context, for example, the course structure, how the activities are assessed, or whether the course is entirely online or blended. To the best of our knowledge, a predictive model that generalizes well to many different types of courses using data available in the LMS does not currently exist in the learning analytics literature. In this study, early prediction of students at risk is tackled by training a number of neural networks to predict which students are likely to submit their assignments on time, based on their activity up to two days before the assignments' due dates. Five datasets covering a total of 78,722 student enrollments in 5,487 courses have been used in this study. To improve how well the neural networks generalize, our networks can perform different forms of feature engineering using course peer data. The different architectures of these networks have been compared to find the one with the most predictive power. To validate the trained models, both new datasets and unseen examples extracted from the same datasets have been used. Our research shows that adding contextual information results in better prediction accuracies and F1 scores. Our networks give predictions with accuracies in the 67.46-81.63% range and F1 scores in the 71.30-83.09% range.
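One simple form of the peer-based feature engineering mentioned above is re-expressing each student's activity relative to course peers, so that courses with very different activity scales map to comparable model inputs. A hypothetical sketch (the paper's actual feature transformations are not given in the abstract):

```python
import math

def peer_normalized(counts):
    """Z-score each student's activity count against course peers, so one
    model can serve courses with very different absolute activity levels."""
    n = len(counts)
    mean = sum(counts) / n
    sd = math.sqrt(sum((c - mean) ** 2 for c in counts) / n) or 1.0
    return [(c - mean) / sd for c in counts]

# two hypothetical courses with very different absolute activity levels
course_a = [2, 4, 6]         # low-traffic course
course_b = [200, 400, 600]   # high-traffic course
print(peer_normalized(course_a))
print(peer_normalized(course_b))  # identical shape after normalization
```

After this transformation, "active relative to peers" means the same thing in both courses, which is one way contextual information can help a single network generalize.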
The increased use of computer-based learning platforms and online tools in classrooms presents new opportunities not only to study the underlying constructs involved in the learning process, but also to use this information to identify and aid struggling students. Many learning platforms, particularly those driving or supplementing instruction, are only able to provide aid to students who interact with the system. With this in mind, student persistence emerges as a prominent learning construct contributing to students' success when learning new material. Conversely, high persistence is not always productive: additional practice does not always help the student move toward mastery of the material. In this paper, we apply a transfer learning methodology using deep learning and traditional modeling techniques to study both low and high forms of unproductive persistence. We focus on two prominent problems in the fields of educational data mining and learning analytics: low persistence, characterized as student “stopout,” and unproductive high persistence, operationalized through student “wheel spinning.” Our aim is to better understand the relationship between these measures of unproductive persistence and to develop early detectors of these behaviors. We find that models developed to detect each behavior, both within and across assignments, learn sets of features that generalize to predict the other. We further observe how these models perform at each learning opportunity within student assignments, to identify when interventions may be deployed to best aid students who are likely to exhibit unproductive persistence.
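The paper's detectors are learned models, but the two behaviors themselves are commonly operationalized with simple attempt-log rules (mastery after a run of correct answers, wheel spinning after too many attempts without one, stopout if the student quits early). A hypothetical sketch of that operationalization, with thresholds invented for illustration:

```python
def classify_persistence(attempt_log, mastery_run=3, spin_limit=10):
    """attempt_log: 1/0 correctness on one skill, in order of practice.
    Returns "mastered", "wheel_spinning", or "stopout" (quit early)."""
    streak = 0
    for i, correct in enumerate(attempt_log, start=1):
        streak = streak + 1 if correct else 0
        if streak >= mastery_run:
            return "mastered"       # productive persistence: goal reached
        if i >= spin_limit:
            return "wheel_spinning" # unproductive high persistence
    return "stopout"                # low persistence: abandoned the skill

print(classify_persistence([1, 1, 1]))
print(classify_persistence([1, 0]))
```

Early detectors like those in the paper try to predict these end states from the first few attempts, before the thresholds are reached.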
Performance prediction is a leading topic in learning analytics research due to its potential to impact all tiers of education. This study proposes a novel predictive modeling method to address gaps in existing performance prediction research: the focus on identifying key performance factors rather than on prediction itself; the lack of common predictors identified for both K-12 and higher education environments; and the misplaced focus on absolute rather than relative engagement levels. Two datasets, one from higher education and the other from a K-12 online school with 13,368 students in more than 300 courses, were analyzed with the proposed predictive modeling technique. The results showed that the newly suggested approach had higher overall accuracy and sensitivity rates than the traditional approach. In addition, two generalizable predictors were identified from instruction-intensive and discussion-intensive courses.
The papers in this special section focus on the early prediction and support of learning performance. Predicting a student's learning performance in traditional face-to-face learning, online learning (LMSs, MOOCs, etc.), and blended learning is a challenging but essential task in education. On the one hand, it is difficult because of the high number of factors that can influence a student's final status. On the other hand, it is a critical issue because it concerns many students at all levels (primary education, secondary education, and tertiary or higher education) and institutions over the entire world. Moreover, an increase in the number of low-performing students can cause a lower graduation rate, an inferior institutional reputation in the eyes of all involved, and usually an overall financial loss. The task of predicting students' performance is one of the oldest and most studied tasks in Educational Data Mining (EDM) and Learning Analytics (LA), and a wide range of classification and regression approaches have been successfully applied.
Presents a listing of the editorial board, board of governors, current staff, committee members, and/or society editors for this issue of the publication.
Presents the table of contents for this issue of the publication.