Outcomes – Numeracy

Research synthesis is AI-generated, human reviewed. Updated 05/2026.

Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study

Benjamin J. Walker, Nikoleta Kalaydzhieva, Beatriz Navarro Lameda, Ruth A. Reynolds. (09/2025). arXiv. http://arxiv.org/pdf/2509.13359v3
Comparing RAG and GraphRAG for Page-Level Retrieval Question Answering on Math Textbook

Eason Chen, Chuangji Li, Shizhuo Li, Zimo Xiao, Jionghao Lin, Kenneth R. Koedinger. (09/2025). arXiv. http://arxiv.org/pdf/2509.16780v2
Generative AI alone may not be enough: Evaluating AI Support for Learning Mathematical Proof

Eason Chen, Sophia Judicke, Kayla Beigh, Xinyi Tang, Zimo Xiao, Chuangji Li, Shizhuo Li, Reed Luttmer, Shreya Singh, Maria Yampolsky, Naman Parikh, Yi Zhao, Meiyi Chen, Scarlett Huang, Anishka Mohanty, Gregory Johnson, John Mackey, Jionghao Lin, Ken Koedinger. (09/2025). arXiv. http://arxiv.org/pdf/2509.16778v1
Gen AI In Proof-Based Math Courses: A Pilot Study

Hannah Klawa, Shraddha Rajpal, Cigole Thomas. (09/2025). arXiv. http://arxiv.org/pdf/2509.13570v1
MathBuddy: A Multimodal System for Affective Math Tutoring

Debanjana Kar, Leopold Böss, Dacia Braca, Sebastian Maximilian Dennerlein, Nina Christine Hubig, Philipp Wintersberger, Yufang Hou. (08/2025). arXiv. http://arxiv.org/pdf/2508.19993v1
MAB Optimizer for Estimating Math Question Difficulty via Inverse CV without NLP

Surajit Das, Gourav Roy, Aleksei Eliseev, Ram Kumar Rajendran. (08/2025). arXiv. http://arxiv.org/pdf/2508.19014v1
Who Is Lagging Behind: Profiling Student Behaviors with Graph-Level Encoding in Curriculum-Based Online Learning Systems

Qian Xiao, Conn Breathnach, Ioana Ghergulescu, Conor O'Sullivan, Keith Johnston, Vincent Wade. (08/2025). arXiv. http://arxiv.org/pdf/2508.18925v1
Explainable AI for Predicting and Understanding Mathematics Achievement: A Cross-National Analysis of PISA 2018

Liu Liu, Dai Rui. (08/2025). arXiv. http://arxiv.org/pdf/2508.16747v1
Alvorada-Bench: Can Language Models Solve Brazilian University Entrance Exams?

Henrique Godoy. (08/2025). arXiv. http://arxiv.org/pdf/2508.15835v1
Mathematical Computation and Reasoning Errors by Large Language Models

Liang Zhang, Edith Aurora Graf. (08/2025). arXiv. http://arxiv.org/pdf/2508.09932v2
Aryabhata: An exam-focused language model for JEE Math

Ritvik Rastogi, Sachin Dharashivkar, Sandeep Varma. (08/2025). arXiv. http://arxiv.org/pdf/2508.08665v2
CODAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation

Shuzhou Yuan, William LaCroix, Hardik Ghoshal, Ercong Nie, Michael Färber. (08/2025). arXiv. http://arxiv.org/pdf/2508.08386v1
Automated Generation Of Curriculum-Aligned Multiple-Choice Questions For Malaysian Secondary Mathematics Using Generative AI

Rohaizah Abdul Wahid, Muhamad Said Nizamuddin Nadim, Suliana Sulaiman, Syahmi Akmal Shaharudin, Muhammad Danial Jupikil, Iqqwan Jasman Su Azlan Su. (08/2025). arXiv. http://arxiv.org/pdf/2508.04442v1
From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation

Chengliang Zhou, Mei Wang, Ting Zhang, Qiannan Zhu, Jian Li, Hua Huang. (08/2025). arXiv. http://arxiv.org/pdf/2508.10005v1
A Mixed User-Centered Approach to Enable Augmented Intelligence in Intelligent Tutoring Systems: The Case of MathAIde app

Guilherme Guerino, Luiz Rodrigues, Luana Bianchini, Mariana Alves, Marcelo Marinho, Thomaz Veloso, Valmir Macario, Diego Dermeval, Thales Vieira, Ig Bittencourt, Seiji Isotani. (08/2025). arXiv. http://arxiv.org/pdf/2508.00103v2
What counts as evidence in AI & ED: Towards Science-for-Policy 3.0

Ilkka Tuomi. (08/2025). European Journal of Education Policy and Practice. https://www.aup-online.com/content/journals/10.5117/EJEP2025.1.001.TUOM
Beyond Agreement: Rethinking Ground Truth in Educational AI Annotation

Danielle R. Thomas, Conrad Borchers, Kenneth R. Koedinger. (07/2025). arXiv. http://arxiv.org/pdf/2508.00143v1
Personalized Education with Ranking Alignment Recommendation

Haipeng Liu, Yuxuan Liu, Ting Long. (07/2025). arXiv. http://arxiv.org/pdf/2507.23664v1
Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale

Alex Liu, Lief Esbenshade, Shawon Sarkar, Victor Tian, Zachary Zhang, Kevin He, Min Sun. (07/2025). arXiv. http://arxiv.org/pdf/2507.17985v2
A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges

Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui. (07/2025). arXiv. http://arxiv.org/pdf/2507.18882v1
Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning

Tosin Adewumi, Foteini Simistira Liwicki, Marcus Liwicki, Viktor Gardelli, Lama Alkhaled, Hamam Mokayed. (07/2025). arXiv. http://arxiv.org/pdf/2507.12079v1
AI-Powered Math Tutoring: Platform for Personalized and Adaptive Education

Jarosław A. Chudziak, Adam Kostka. (07/2025). arXiv. http://arxiv.org/pdf/2507.12484v1
Findings of the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors

Ekaterina Kochmar, Kaushal Kumar Maurya, Kseniia Petukhova, KV Aditya Srivatsa, Anaïs Tack, Justin Vasselli. (07/2025). arXiv. http://arxiv.org/pdf/2507.10579v1
Exploring LLMs for Predicting Tutor Strategy and Student Outcomes in Dialogues

Fareya Ikram, Alexander Scarlatos, Andrew Lan. (07/2025). arXiv. http://arxiv.org/pdf/2507.06910v1
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning

Lixin Wu, Na Cai, Qiao Cheng, Jiachen Wang, Yitao Duan. (06/2025). arXiv. http://arxiv.org/pdf/2506.18330v2
"Check My Work?" Measuring Sycophancy in a Simulated Educational Context

Chuck Arvin. (06/2025). arXiv. http://arxiv.org/pdf/2506.10297v1
RETUYT-INCO at BEA 2025 Shared Task: How Far Can Lightweight Models Go in AI-powered Tutor Evaluation?

Santiago Góngora, Ignacio Sastre, Santiago Robaina, Ignacio Remersaro, Luis Chiruzzo, Aiala Rosá. (06/2025). arXiv. http://arxiv.org/pdf/2506.11243v1
SimClass: A Classroom Speech Dataset Generated via Game Engine Simulation For Automatic Speech Recognition Research

Ahmed Adel Attia, Jing Liu, Carol Espy-Wilson. (06/2025). arXiv. http://arxiv.org/pdf/2506.09206v1
Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting

Sankalan Pal Chowdhury, Terry Jingchen Zhang, Donya Rooein, Dirk Hovy, Tanja Kasser, Mrinmaya Sachan. (06/2025). arXiv. http://arxiv.org/pdf/2506.08702v1
Intent Matters: Enhancing AI Tutoring with Fine-Grained Pedagogical Intent Annotation

Kseniia Petukhova, Ekaterina Kochmar. (06/2025). arXiv. http://arxiv.org/pdf/2506.07626v1
