Why universities need to radically rethink exams in the age of AI

Since the launch of the chatbot ChatGPT in late 2022, educators have been grappling with how to harness artificial intelligence to enhance learning while minimizing risks to educational outcomes and the fairness of assessments.

AI use among students is now the norm. In February, a survey1 of more than 1,000 full-time UK undergraduates found that 92% use AI in some form, up from 66% in 2024. And 88% of students reported relying on generative AI (a form of AI that can create text, images and code from vast data sets) to support their academic coursework, compared with 53% in 2024.

As AI continues to outperform humans in basic tasks such as reading comprehension and computer programming2, concerns have been mounting about its impact on learning and academic integrity. For example, the value of conventional essays and other written assessments is increasingly in doubt, given that AI can now produce writing that often surpasses the quality of most student work.

Other concerns include an over-reliance on chatbots leading to superficial learning3, reduced opportunities for self-reflection4 and a loss of student agency5, with students becoming passive users of technology rather than active learners.

Universities have responded by using tools to try to detect student use of generative AI. But these have proven to be unreliable6. This has led to short-term fixes such as ‘stress-testing’ written assessments1 and replacing them with oral examinations, handwritten tests or reflective formats (portfolios and journals; see go.nature.com/43btcxf), as well as clearer guidelines on when AI can and cannot be used. Although these measures help, their effectiveness is limited.

Instead, a fundamental rethink of learning and assessment is needed. Here, we highlight three promising approaches to examination that adapt existing methods — such as conversation-based assessments — to the AI era. These strategies aim to foster genuine intellectual development while ensuring that evaluations accurately reflect students’ understanding and skills.

Use other types of assessment

One of the cornerstones of modern education is that ‘writing is thinking’7. Writing is a non-linear process8 that requires authentic engagement, critical thinking and problem-solving. All of these activities stimulate human intellectual development.

When AI assists with or generates student texts, however, it becomes nearly impossible to know how much of the final work reflects the student’s own understanding and critical thinking (see go.nature.com/47tjv93). This uncertainty undermines the use of writing as evidence of learning.

Having a student and teacher follow a structured conversation is one way to enable critical thinking. For example, the Socratic questioning method is a form of disciplined inquiry that helps students to work through complex ideas, question their assumptions and judge the validity of information. In ancient Greece, the value placed on intellectual dialogue was so great that some philosophers at the time expressed concern that an over-reliance on writing could weaken human memory (see go.nature.com/43grxsp).


Conventional exams can still have a place alongside AI-based assessment. Credit: Jorge Gil/Europa Press via Getty

A contemporary version of the discourse-based approach used in ancient Greece, known as conversation-based assessment, has been used for several decades in primary, secondary and university education settings. For example, AutoTutor, developed at the University of Memphis in Tennessee, has been used to teach subjects such as Newtonian physics while improving skills in computer literacy and critical thinking9. It engages students in natural-language conversations and uses computational techniques to gauge their understanding — analysing factors such as accuracy, choice of words and the time taken for a response. However, such systems usually have limited conversational capabilities and still rely mostly on simple text analyses and the detection of specific words and expressions.
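To make concrete what "simple text analyses" means here, the following is a minimal illustrative sketch of keyword-and-timing scoring of the kind such pre-AI systems rely on. The expected terms, the scoring formula and the 30-second cutoff are invented for illustration; this is not AutoTutor's actual algorithm.

```python
# Sketch of simple text analysis for conversation-based assessment:
# score an answer by coverage of expected keywords, with a light
# penalty for very slow responses. All values are illustrative.

EXPECTED_TERMS = {"force", "mass", "acceleration"}  # e.g. Newton's second law

def score_response(answer: str, response_seconds: float) -> float:
    """Crude understanding score in [0, 1] from keyword coverage,
    lightly penalizing responses slower than 30 seconds."""
    words = set(answer.lower().split())
    coverage = len(EXPECTED_TERMS & words) / len(EXPECTED_TERMS)
    time_factor = 1.0 if response_seconds <= 30 else 0.8
    return round(coverage * time_factor, 2)

print(score_response("Force equals mass times acceleration", 12.0))  # → 1.0
print(score_response("It speeds up", 45.0))  # → 0.0
```

The brittleness is easy to see: a correct answer phrased in synonyms scores zero, which is exactly the limitation that modern large language models remove.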

This is where the integration of AI can be a game changer. AI can sustain open-ended, context-sensitive dialogue in a much more realistic manner than current conversation-based assessment methods can. AI tools can ask students follow-up questions, provide tailored hints and adapt to a student’s level of knowledge in real time, providing flexible and personalized learning support. And their questioning can range more widely than that of conventional conversational assessment systems, which are usually specialized to a particular domain.
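The adaptation loop described above can be sketched in its simplest form: step the difficulty up after a correct answer and down after a mistake. The question bank and the one-step policy are assumptions for illustration, far cruder than what a real AI tutor would do.

```python
# Minimal sketch of adapting question difficulty in real time:
# harder after a correct answer, easier after a mistake.
# The question bank and levels are invented for illustration.
QUESTIONS = {
    1: "State Newton's second law.",
    2: "A 2 kg mass accelerates at 3 m/s^2. What net force acts on it?",
    3: "Why does a rocket accelerate faster as it burns fuel, at constant thrust?",
}

def next_level(level: int, answered_correctly: bool) -> int:
    """Step difficulty up or down, clamped to the bank's bounds."""
    step = 1 if answered_correctly else -1
    return min(max(level + step, 1), max(QUESTIONS))

level = 2
level = next_level(level, answered_correctly=True)   # harder: level 3
level = next_level(level, answered_correctly=False)  # easier: level 2
print(QUESTIONS[level])
```

An AI-based tutor replaces both pieces: the fixed bank becomes generated questions, and the binary correct/incorrect signal becomes a graded judgement of the student's free-form explanation.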

The crucial opportunity of AI is not merely to automate question-answering, but to enable students to learn through conversation with AI systems and to use that dialogue as a form of assessment, making it a dynamic and personalized process.

Challenges remain. First, AI systems will need to guide conversations in a balanced way — encouraging students to ask questions, explore topics that interest them and take an active role in their learning. At the same time, the dialogue must be structured enough for the AI system to gather meaningful evidence of the student’s understanding, such as how they reason through a problem, explain a concept or apply knowledge in context. Achieving this balance between open-ended exploration and measurable assessment remains a major research challenge.

Miscommunication is another concern — AI systems might misunderstand a student’s intent or provide inaccurate or misleading information. When this happens, students can struggle to identify the sources of their errors. The highly personalized and open-ended nature of AI-based learning and assessment would also make standardization difficult. So, there would still be a place for conventional assessments, especially in the university admissions process, in which consistency and fairness across large student populations are priorities.

Assess continuously

One crucial issue with many proposed responses to widespread student adoption of AI is that, although they attempt to safeguard academic integrity, they continue to operate within a high-stakes exam model. Even if the exam is reframed as a conversation, students remain aware that the outcome carries a lot of weight. Students find high-stakes exams stressful and might underperform or be tempted to cheat. The key challenge, then, is to reduce the need for high-stakes exams in an AI-fuelled era in which cheating might become easier.

Continuous assessment can be an effective alternative10. Replacing end-of-term exams with a series of interconnected assessments that build a comprehensive picture of student learning is urgently required in many academic fields. Continuous assessment is well established in medical education. For example, during clinical rotations, medical students are continually assessed by supervisors who observe their clinical reasoning, teamwork skills and communication with patients. These observations, combined with written reflections and peer evaluations, create a holistic picture of the student’s competence over time. However, such models remain rare in other disciplines, mainly because of the increased workload they place on educators.

The growing availability of AI-based systems makes continuous assessment more feasible. Conversations between students and an AI tool can be seen not as one-off exchanges, but as part of an ongoing learning process in which multiple low-stakes interactions gradually build a rich picture of student progress10.

The main challenge lies in ensuring that AI systems can track and analyse this learning progression effectively. Current general-purpose tools such as ChatGPT, Gemini and Copilot are not designed for this purpose — they do not analyse students’ responses over time to identify growth or persistent misconceptions. To truly support continuous assessment, there is a pressing need for learning-oriented AI platforms that can capture longitudinal data on student performance, provide meaningful insights into learning trajectories and integrate seamlessly into the design of courses and programmes.
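To make the idea of longitudinal data concrete, here is a minimal sketch of the kind of record such a learning-oriented platform might keep: per-concept scores logged across many low-stakes interactions, queried for persistent misconceptions. The class, concept names, scores and thresholds are all assumptions for illustration, not a description of any existing system.

```python
# Illustrative sketch of a longitudinal learning record: per-concept
# scores accumulated over many low-stakes interactions, with a query
# that flags concepts a student repeatedly struggles with.
# All names and thresholds are invented for illustration.
from collections import defaultdict

class LearningRecord:
    def __init__(self):
        # concept -> list of scores in [0, 1], appended over time
        self.history = defaultdict(list)

    def log(self, concept: str, score: float) -> None:
        self.history[concept].append(score)

    def persistent_misconceptions(self, threshold=0.5, min_attempts=3):
        """Concepts scored below `threshold` on each of the last
        `min_attempts` interactions — signs of a stuck misconception."""
        return [concept for concept, scores in self.history.items()
                if len(scores) >= min_attempts
                and all(s < threshold for s in scores[-min_attempts:])]

record = LearningRecord()
for s in (0.3, 0.4, 0.2):
    record.log("conservation of momentum", s)
for s in (0.4, 0.7, 0.9):
    record.log("Newton's third law", s)
print(record.persistent_misconceptions())  # → ['conservation of momentum']
```

The design choice matters: a student improving over time ("Newton's third law" above) is not flagged, whereas a one-off exam snapshot could not distinguish that trajectory from persistent confusion.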


