Beyond Cheating: Reshaping Undergraduate Assessment for Deep Learning in the Age of AI
Abstract
This study presents findings from semi-structured interviews with 13 undergraduate students across various years and majors, exploring their perceptions of assessment in higher education and their experiences with Generative Artificial Intelligence (GenAI). The study finds that students value authentic assessments, that is, tasks embedded in real-world contexts that foster higher-order skills such as problem-solving and critical thinking. The findings indicate that GenAI use is not merely a matter of academic misconduct but is directly related to assessment design. Low-level, routine tasks tend to encourage unproductive use of GenAI as a substitute for studying, whereas more complex, higher-order assignments encourage its productive use as a thinking tool. By connecting the interview data to foundational educational theories, this study proposes a set of principles for redesigning assessment. The intent is to strike a new balance between technological advancement and academic integrity, turning assessment into a powerful tool for ongoing learning.
1. Introduction
GenAI has been integrated into higher education at an unprecedented rate since late 2022, profoundly impacting teaching, learning, and assessment. GenAI's capacity to generate high-quality content, code, and creative output significantly undermines traditional evaluation models that rely on the memorization and reproduction of knowledge (United Nations Educational, Scientific and Cultural Organization [UNESCO], 2021). For example, if an assignment merely asks students to reproduce textbook theory or superficially synthesize facts, students will most likely use GenAI tools to complete it without undergoing the intended learning process. This undermines academic integrity and demands a rethinking of the overarching purpose of assessment.
Despite much academic discourse and attempts to find technical or policy-based solutions (Humble, 2025), the views, preferences, and tendencies of students, the immediate participants in the assessment process, regarding GenAI have not been seriously explored. This study addresses this gap by using qualitative interviewing to generate rich insight into undergraduate students' views on and needs for assessment in the era of GenAI. It aims to reveal the underlying relationship between assessment design and GenAI usage behaviors.
2. Research Methods
This study used semi-structured interviews to engage 13 students from various years and majors, primarily from the University of Hong Kong, with some from mainland Chinese universities. The interviews focused on core questions about their undergraduate assessment experiences and GenAI usage, including:
• What types of assessments do students like or dislike, and why?
• To what extent, how, and why do students use GenAI for their assignments?
• Do specific assessment types lead to more productive or unproductive uses of GenAI?
By exploring these questions, this report offers empirical contributions from the students' point of view, making practical recommendations to higher education institutions on how to evolve assessment in the GenAI era to promote deeper and more active learning.
3. Research Findings
3.1 Students' Preferences for Authentic Assessment
Student responses suggest that their evaluations of assessments are not based on a naive good-or-bad dichotomy but on a considered judgment of each task's meaning and authenticity. They prefer to avoid superficial, isolated exercises perceived as detached from everyday life, such as formulaic essays or drill-dominated exams.
Interviewee 2, for instance, complained about an introductory economics class with a strictly timed exam, which he saw as testing an "abnormal" set of skills and exerting enormous pressure to drill rather than to genuinely understand economic matters. Similarly, Interviewee 6 found a closed-book, paper-and-pencil test of the C language to be "inhumane," preferring tests that allow GenAI assistance, which he sees as an inevitable trend that rewards understanding of programming principles over memorized coding.
Students also expressed a dislike for assessments that include elements divorced from realistic human interactions. For example, many interviewees voiced concerns about random grouping in group projects, which they felt hindered effective collaboration. Interviewee 5 countered the common professorial justification that random groups simulate the workplace, arguing that real-world professionals have the autonomy to choose their colleagues. He suggested that academic group assignments should similarly prioritize collaboration based on shared goals or interests.
These findings align with the concept of authentic assessment, which advocates for tasks that require students to apply knowledge and judgment to real-world problems and situations (Wiggins, 2011). Interviewees shared several positive experiences with such assessments. Interviewee 2 found a business negotiation course particularly engaging when it involved a simulated negotiation with a German class. He felt this format was both interesting and challenging because it was aligned with the course content and involved unfamiliar counterparts, encouraging greater effort and leading to significant learning. Another student, Interviewee 8, valued a major course that required his team to independently develop a new medical tool. Despite a heavy workload, he found the project-based, problem-solving format highly meaningful, as it provided ample opportunity for exploration, reflection, and comprehensive skill development.
Students also showed an interest in using GenAI itself as a subject for authentic assessment. For instance, Interviewee 12 mentioned an AI ethics issue raised in a course, namely the varying accuracy of recognition systems across racial groups, a topic that required critical consideration. Interviewee 3, driven by personal interest, independently chose to research whether GenAI-generated content falls under Chinese copyright law, leading him to explore the different logics behind image generation and the roles of trainers, producers, and users of AI. This kind of exploration, driven by genuine interest in contemporary issues, demonstrates a positive student attitude toward academic challenges that are both relevant and engaging.
3.2 Deep Learning Requires Effective Feedback
Educational theory often distinguishes between summative and formative assessments (Taras, 2005). Summative assessments, such as standardized tests and final papers, primarily summarize learning outcomes and assign grades for accountability. However, this approach often has a negative backwash effect, where assessment content dictates teaching and learning, causing students and teachers to focus on test-taking rather than on skill development and knowledge construction (Carless, 2011).
Interviewee 1, for example, described the pressure of multiple assessments clustered within a short two-week period, which forces students to "fight deadlines" and prevents them from starting work early. Furthermore, this focus on grades over learning outcomes can stifle reflection and improvement. Interviewee 8 criticized the practice of not releasing final exam answers, which he felt discouraged students from seeking understanding. Students in this situation often resort to using GenAI to "guess" answers to past papers, a multi-platform process that, while seemingly promoting critical thinking, still lacks reliable validation.
In contrast, formative assessments are designed to promote learning. Many students noted that timely and accurate feedback from instructors helped them better understand course content and boosted their confidence. Interviewee 3 explained how receiving feedback helped him identify areas for improvement and understand the specific preferences of different instructors, which he found crucial both for deeper understanding and for achieving higher grades.
Feedback from peers and external critics was also highly valued. Interviewee 13, an architecture student, praised his program's small exhibitions where students, teachers from other years, and professional architects provide feedback. He found peer critiques insightful and valued the professional, grade-independent
advice from invited architects. However, he noted that this process was only effective when the feedback was convincing and well-considered. In a different course, he found peer and instructor evaluations unconvincing due to a lack of shared context and mid-course evaluation, leading to a superficial grading process.
GenAI has also emerged as a tool for students to get immediate, private quasi-feedback, such as grammar checks, structural suggestions, and code corrections. However, this feedback can be inaccurate or superficial, requiring students to develop the ability to critically filter and verify the information. This shift necessitates a new role for educators, moving beyond being the sole feedback provider to fostering students' feedback literacy (Carless & Boud, 2018). Teachers need to guide students on how to use GenAI effectively, recognize its limitations, and integrate its "feedback" with human feedback to deepen learning.
3.3 Assessment Type and GenAI's Impact on Learning Autonomy
An in-depth analysis reveals a clear relationship between assessment design and GenAI usage patterns. When assessments are low-level and lack inherent challenge, student motivation decreases. If tasks are perceived as "meaningless," students are inclined to use GenAI as a "ghostwriting" tool, directly generating outlines, code, or answers. This leads to unproductive use and short-circuits the learning process. For example, Interviewee 6 described how a university's poorly implemented mandatory advanced AI course led to non-computer-science students buying assignments from others. With the help of GenAI, a single student could mass-produce over 20 passable assignments in three days. Similarly, Interviewee 10, facing immense academic pressure from taking 15 courses in one semester, used GenAI to generate essays for less important classes.
Students are often aware of the impropriety of using AI as a shortcut, but they rationalize its extensive use when faced with poorly designed, unproductive assessments.
Conversely, when assessments are higher-order and authentic, GenAI cannot simply provide a perfect answer. In these cases, students are more willing to engage in independent exploration, and GenAI becomes a cognitive partner that enhances thinking and efficiency, enabling productive use. For instance, Interviewee 9 used a Transformer-based AI model to predict patient readmission rates in a machine learning course, converting diverse data types into model-readable formats. Similarly, in a storytelling course, Interviewee 2's teacher encouraged the use of GenAI for visual production and storyboards, which the student felt helped them showcase their sci-fi ideas more vividly.
These results align with the principles of constructivist learning theory, which holds that knowledge is actively built by learners as they engage with their surroundings (Vollmers, 1997). In this context, technology like GenAI is not just an information transmission tool but a cognitive tool that assists in knowledge construction (Díaz & Delgado, 2024). Therefore, rather than focusing on ineffective measures to detect AI use, educators should fundamentally reshape assessment design. The focus should be on creating authentic, challenging, and meaningful tasks that guide students toward a synergistic cognitive relationship with AI, rather than passive dependence.
4. Specific Recommendations for Assessment Redesign
Bloom's Taxonomy provides a framework for cognitive processes, from low-level tasks such as Remembering to high-level ones such as Evaluating and Creating (Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956). GenAI can easily handle low-level tasks, such as generating summaries or providing facts. This makes it essential for assessment design to shift focus to tasks requiring higher-order thinking, where GenAI can only serve as a supporting tool, not a replacement for cognitive processes (Díaz & Delgado, 2024).
Based on the interview findings and theoretical reflections, this report proposes the following specific recommendations for assessment redesign:
4.1 Shift Toward "Authenticity-Oriented" Assessment Design
• Design tasks that require students to solve real-world problems, such as
case studies, project reports, or policy briefs. This allows students to apply knowledge in complex, uncertain contexts.
• Create assessments that stimulate student interest and a sense of purpose, helping them gain positive feedback from problem-solving.
• Acknowledge the widespread use of AI tools and focus on assessing students' ability to utilize these tools to produce innovative and productive outcomes.
4.2 Reshape the Teaching Relationship with "Feedback Literacy"
• Collaborate with students to clarify the acceptable boundaries for GenAI use in assignments and establish related policies.
• Integrate formative assessment, peer evaluation, and self-assessment to cultivate students' metacognitive abilities and judgment.
4.3 Enhance Students' Technological Competency
• Employ GenAI as a thinking device by designing specific tasks to teach students prompt engineering and information validation. This enhances their higher-order thinking abilities.
• Ask students to include an "AI use statement" with their assignments, detailing how GenAI was used in the work.
5. Conclusion
This study, grounded in undergraduate students' experiences, offers important new findings on assessment in GenAI-era higher education. The study demonstrates that students engage critically with the meaning and authenticity of assessments rather than receiving them passively. A clear association exists between assessment design and how GenAI is used: low-level, contrived tasks result in unproductive, substitutive AI use, while challenging, real-world tasks enable productive collaboration with AI.
This paper contends that GenAI is a "stress test" revealing the vulnerabilities of traditional assessment models, especially those that heavily reward memorization and summative assessment. Rather than seeing AI as a threat to academic integrity, teachers should embrace it as a force for change. The real challenge is not to constrain the technology, but to design assessments so engaging that they create a synergy between learners and AI, promoting deeper learning and critical thinking, and preparing students not for tests, but for a future defined by human-technology collaboration.
References
Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: The classification of educational goals. Handbook I: Cognitive domain. David McKay Company.
Carless, D. (2011). From testing to productive student learning: Implementing formative assessment in Confucian-heritage settings. Routledge.
Carless, D., & Boud, D. (2018). The development of student feedback literacy: Enabling uptake of feedback. Assessment & Evaluation in Higher Education, 43(8), 1315–1325.
Díaz, B., & Delgado, C. (2024). Artificial intelligence: Tool or teammate? Journal of Research in Science Teaching, 61(10), 2575–2584. https://doi.org/10.1002/tea.21993
Humble, N. (2025). Higher education AI policies: A document analysis of university guidelines. European Journal of Education, 60(3). https://doi.org/10.1111/ejed.70214
Taras, M. (2005). Assessment - summative and formative - some theoretical reflections. British Journal of Educational Studies, 53(4), 466–478. https://doi.org/10.1111/j.1467-8527.2005.00307.x
United Nations Educational, Scientific and Cultural Organization. (2021). AI and education: Guidance for policy-makers. UNESCO. https://unesdoc.unesco.org/ark:/48223/pf0000376709
Vollmers, B. (1997). Learning by doing - Piaget’s constructivist learning theories and their consequences for pedagogical practice. International Review of Education, 43(1), 73–85.
Wiggins, G. (2011). Kappan Classic: A True Test: Toward more authentic and equitable assessment. Phi Delta Kappan, 92(7), 81.