1. Introduction
The core goal of smart education is to promote learners’ deep learning and holistic development, and learners’ psychological states are a key factor in the effectiveness of deep learning. In digital learning contexts, an accelerated learning pace, changes in interaction patterns, and increased demands for self-directed learning leave learners highly susceptible to negative psychological states such as learning anxiety, mental fatigue, and lack of concentration. If these states are not identified and addressed in a timely manner, they may seriously impair learners’ engagement and learning outcomes. However, current research on monitoring learners’ psychological states in smart education still has notable shortcomings. First, monitoring methods mostly rely on single-modal behavioral data and learners’ subjective self-reports, which makes it difficult to represent learners’ complex psychological states comprehensively and objectively. Second, monitoring systems have not been tailored to the diverse learning scenarios of smart education, so data collection and analysis lack contextual adaptability and the accuracy of monitoring results is greatly reduced. Third, most monitoring efforts stop at shallow data processing and state recognition; they cannot achieve real-time perception or proactive early warning of negative psychological states and are therefore difficult to connect with subsequent educational intervention.

Fig. 1
The development of multimodal data fusion technologies and artificial intelligence algorithms has provided technical possibilities for addressing these problems. Compared with unimodal data, multimodal data can comprehensively capture learners’ external characteristics from multiple dimensions, including physiological, behavioral, and environmental aspects, thereby enabling more accurate inference of their implicit psychological states.
Based on Triadic Reciprocal Determinism and Embodied Cognition Theory, this study integrates three categories of multimodal data – individual physical data, interaction behavior data, and intelligent environment data – and, in combination with typical learning scenarios in smart education, constructs a model for monitoring and intelligent early warning of learners’ psychological states. The study aims to achieve precise, real-time monitoring and graded early warning of learners’ negative psychological states, provide new ideas and methods for safeguarding learners’ mental health in smart education, and promote the development of smart education toward greater personalization and refinement.
2. Theoretical Foundations and Classification of Multimodal Data
2.1. Core Theoretical Foundations
The theoretical foundations of this study are mainly Triadic Reciprocal Determinism and Embodied Cognition Theory, both of which provide important theoretical support for the classification of multimodal data and the identification of dimensions for psychological state monitoring.
Triadic Reciprocal Determinism was proposed by Bandura. This theory holds that human learning and development in social contexts are influenced by the interaction among individual factors, behavior, and environment, which together form a dynamic interactive system. In smart education contexts, learners’ psychological states, as core individual characteristics, do not emerge or change in isolation; rather, they are closely related to learners’ learning behaviors and the intelligent learning environments in which they are situated.
2.2. Classification of Multimodal Data in Smart Education
Based on Triadic Reciprocal Determinism and Embodied Cognition Theory, and in combination with the contextual characteristics of smart education, the multimodal data used to monitor learners’ psychological states can be classified into three categories: individual physical data, interaction behavior data, and intelligent environment data. These three types of data complement one another and jointly form a comprehensive representation system of learners’ psychological states.
Individual physical multimodal data are the most direct source of information for reflecting learners’ psychological states, mainly including physiological indicator data and individual movement data. Physiological indicator data include measures such as electrodermal activity, electroencephalographic changes, and heart rate variation, which can accurately capture physiological responses caused by changes in psychological states, such as accelerated heart rate and enhanced electrodermal responses under anxiety.
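The three-category classification can be illustrated with a minimal data-structure sketch. This is an illustrative assumption rather than the study's implementation: the class names, field names, and units below are hypothetical, and only the physiological indicators named in the text are given explicit fields, while interaction behavior and intelligent environment data are left as open key-value maps.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PhysicalData:
    """Individual physical data (hypothetical field names and units)."""
    electrodermal_activity: float  # assumed unit: microsiemens
    heart_rate: float              # assumed unit: beats per minute
    eeg_features: Dict[str, float] = field(default_factory=dict)


@dataclass
class LearnerSample:
    """One time-stamped observation combining the three data categories."""
    learner_id: str
    timestamp: float
    physical: PhysicalData
    # Interaction behavior data, e.g. video views; keys are not fixed here.
    interaction: Dict[str, float] = field(default_factory=dict)
    # Intelligent environment data; keys are not fixed here.
    environment: Dict[str, float] = field(default_factory=dict)

    def modalities_present(self) -> List[str]:
        """Return the names of the categories that carry data in this sample."""
        present = ["physical"]
        if self.interaction:
            present.append("interaction")
        if self.environment:
            present.append("environment")
        return present


sample = LearnerSample(
    learner_id="s1",
    timestamp=0.0,
    physical=PhysicalData(electrodermal_activity=2.5, heart_rate=88.0),
    interaction={"video_views": 3.0},
)
print(sample.modalities_present())
```

Keeping the three categories in one record per time point makes it straightforward to check which modalities are available before any fusion step is attempted.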

Fig. 2
3. Existing Problems in Monitoring Learners’ Psychological States in Smart Education
Although research on monitoring learners’ psychological states in smart education has made some progress, it still faces urgent challenges arising from limitations in technical methods, scenario adaptability, and data-processing capacity. These problems are mainly reflected in the following three aspects.
First, monitoring data come from too few sources, and multidimensional integration is insufficient. Existing monitoring approaches mainly rely on a single type of behavioral data, such as online learning duration or the number of video views, or on learners’ subjective self-report scales. Such methods cannot comprehensively represent learners’ psychological states. On the one hand, single-source behavioral data are superficial and do not reveal the psychological motivations underlying the observed behaviors. For example, the same amount of video-watching time may correspond to different levels of attention. On the other hand, subjective self-report scales suffer from time lag and subjectivity. Learners may not report their true psychological states accurately because of cognitive bias or psychological defense mechanisms, which reduces the objectivity and accuracy of the monitoring results.
Second, there is a lack of scenario adaptability, resulting in a homogenized monitoring system. Smart education includes a variety of typical learning scenarios, such as teacher instruction, group discussion, online self-directed learning, and practical or experimental activities. Learners’ behaviors, interaction patterns, and manifestations of psychological states differ significantly across these scenarios. However, existing monitoring studies have not developed targeted monitoring systems based on scenario-specific characteristics. Instead, they tend to adopt homogeneous monitoring indicators and data-collection methods.
4. Practical Value and Future Prospects of the Model
The multimodal data-driven model for monitoring learners’ psychological states and providing intelligent early warning, as constructed in this study, offers a novel technological pathway and practical paradigm for safeguarding learners’ mental health in smart education. It demonstrates significant theoretical and practical value.
At the theoretical level, the model integrates Triadic Reciprocal Determinism, Embodied Cognition Theory, educational psychology, and artificial intelligence technologies, thereby enriching the theoretical framework of smart education and learner-state monitoring. It also provides a theoretical reference for the application of multimodal data in educational psychology.
At the practical level, the model overcomes the limitations of traditional psychological state monitoring approaches, which rely on single-modality data and subjective self-report measures [1, p. 1035]. It enables scenario-based, precise, and real-time monitoring of learners’ psychological states. Furthermore, the model establishes a graded early-warning mechanism and a closed-loop feedback system, effectively linking monitoring outcomes with educational interventions. This allows for timely identification and mitigation of learners’ negative psychological states, thereby promoting mental well-being and enhancing learning engagement as well as deep learning outcomes.
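The idea of a graded early-warning mechanism can be sketched as a mapping from a fused risk score to a warning level. The sketch below is purely illustrative: the number of levels, the level names, the cut-off values, and the assumption that risk is expressed as a score between 0 and 1 are all hypothetical and are not taken from the model described here.

```python
from enum import Enum


class WarningLevel(Enum):
    """Hypothetical warning grades; the source does not fix their number."""
    NONE = 0
    MILD = 1
    MODERATE = 2
    SEVERE = 3


# Hypothetical cut-offs, checked from the most to the least severe grade.
THRESHOLDS = (
    (0.8, WarningLevel.SEVERE),
    (0.6, WarningLevel.MODERATE),
    (0.4, WarningLevel.MILD),
)


def grade_warning(risk_score: float) -> WarningLevel:
    """Map a risk score in [0, 1] to a warning grade."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must lie in [0, 1]")
    for cutoff, level in THRESHOLDS:
        if risk_score >= cutoff:
            return level
    return WarningLevel.NONE


for score in (0.1, 0.45, 0.6, 0.85):
    print(score, grade_warning(score).name)
```

In practice the cut-offs would presumably need to be calibrated per learning scenario, since the text argues that manifestations of psychological states differ across scenarios.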
In addition, the model provides teachers with a visualized interface for monitoring learners’ psychological states, enabling them to gain a comprehensive understanding of learners’ overall learning conditions. This supports the implementation of personalized instructional adjustments and facilitates the transformation of smart education from “data-driven” approaches to “precision-oriented education.”
