Outcomes – Numeracy
Research synthesis is AI-generated and human-reviewed. Updated 05/2026.
Evaluating undergraduate mathematics examinations in the era of generative AI: a curriculum-level case study
BENJAMIN J. WALKER, NIKOLETA KALAYDZHIEVA, BEATRIZ NAVARRO LAMEDA, RUTH A. REYNOLDS. (09/2025). arXiv. http://arxiv.org/pdf/2509.13359v3
Comparing RAG and GraphRAG for Page-Level Retrieval Question Answering on Math Textbook
Eason Chen, Chuangji Li, Shizhuo Li, Zimo Xiao, Jionghao Lin, Kenneth R. Koedinger. (09/2025). arXiv. http://arxiv.org/pdf/2509.16780v2
Generative AI alone may not be enough: Evaluating AI Support for Learning Mathematical Proof
Eason Chen, Sophia Judicke, Kayla Beigh, Xinyi Tang, Zimo Xiao, Chuangji Li, Shizhuo Li, Reed Luttmer, Shreya Singh, Maria Yampolsky, Naman Parikh, Yi Zhao, Meiyi Chen, Scarlett Huang, Anishka Mohanty, Gregory Johnson, John Mackey, Jionghao Lin, Ken Koedinger. (09/2025). arXiv. http://arxiv.org/pdf/2509.16778v1
Gen AI In Proof-Based Math Courses: A Pilot Study
Hannah Klawa, Shraddha Rajpal, Cigole Thomas. (09/2025). arXiv. http://arxiv.org/pdf/2509.13570v1
MathBuddy: A Multimodal System for Affective Math Tutoring
Debanjana Kar, Leopold Böss, Dacia Braca, Sebastian Maximilian Dennerlein, Nina Christine Hubig, Philipp Wintersberger, Yufang Hou. (08/2025). arXiv. http://arxiv.org/pdf/2508.19993v1
MAB Optimizer for Estimating Math Question Difficulty via Inverse CV without NLP
Surajit Das, Gourav Roy, Aleksei Eliseev, Ram Kumar Rajendran. (08/2025). arXiv. http://arxiv.org/pdf/2508.19014v1
Who Is Lagging Behind: Profiling Student Behaviors with Graph-Level Encoding in Curriculum-Based Online Learning Systems
Qian Xiao, Conn Breathnach, Ioana Ghergulescu, Conor O'Sullivan, Keith Johnston, Vincent Wade. (08/2025). arXiv. http://arxiv.org/pdf/2508.18925v1
Explainable AI for Predicting and Understanding Mathematics Achievement: A Cross-National Analysis of PISA 2018
Liu Liu, Dai Rui. (08/2025). arXiv. http://arxiv.org/pdf/2508.16747v1
Alvorada-Bench: Can Language Models Solve Brazilian University Entrance Exams?
Henrique Godoy. (08/2025). arXiv. http://arxiv.org/pdf/2508.15835v1
Mathematical Computation and Reasoning Errors by Large Language Models
Liang Zhang, Edith Aurora Graf. (08/2025). arXiv. http://arxiv.org/pdf/2508.09932v2
Aryabhata: An exam-focused language model for JEE Math
Ritvik Rastogi, Sachin Dharashivkar, Sandeep Varma. (08/2025). arXiv. http://arxiv.org/pdf/2508.08665v2
CODAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation
Shuzhou Yuan, William LaCroix, Hardik Ghoshal, Ercong Nie, Michael Färber. (08/2025). arXiv. http://arxiv.org/pdf/2508.08386v1
Automated Generation Of Curriculum-Aligned Multiple-Choice Questions For Malaysian Secondary Mathematics Using Generative AI
Rohaizah Abdul Wahid, Muhamad Said Nizamuddin Nadim, Suliana Sulaiman, Syahmi Akmal Shaharudin, Muhammad Danial Jupikil, Iqqwan Jasman Su Azlan Su. (08/2025). arXiv. http://arxiv.org/pdf/2508.04442v1
From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation
Chengliang Zhou, Mei Wang, Ting Zhang, Qiannan Zhu, Jian Li, Hua Huang. (08/2025). arXiv. http://arxiv.org/pdf/2508.10005v1
A Mixed User-Centered Approach to Enable Augmented Intelligence in Intelligent Tutoring Systems: The Case of MathAIde app
Guilherme Guerino, Luiz Rodrigues, Luana Bianchini, Mariana Alves, Marcelo Marinho, Thomaz Veloso, Valmir Macario, Diego Dermeval, Thales Vieira, Ig Bittencourt, Seiji Isotani. (08/2025). arXiv. http://arxiv.org/pdf/2508.00103v2
What counts as evidence in AI & ED: Towards Science-for-Policy 3.0
Ilkka Tuomi. (08/2025). European Journal of Education Policy and Practice. https://www.aup-online.com/content/journals/10.5117/EJEP2025.1.001.TUOM
Beyond Agreement: Rethinking Ground Truth in Educational AI Annotation
Danielle R. Thomas, Conrad Borchers, Kenneth R. Koedinger. (07/2025). arXiv. http://arxiv.org/pdf/2508.00143v1
Personalized Education with Ranking Alignment Recommendation
Haipeng Liu, Yuxuan Liu, Ting Long. (07/2025). arXiv. http://arxiv.org/pdf/2507.23664v1
Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale
Alex Liu, Lief Esbenshade, Shawon Sarkar, Victor Tian, Zachary Zhang, Kevin He, Min Sun. (07/2025). arXiv. http://arxiv.org/pdf/2507.17985v2
A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges
Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui. (07/2025). arXiv. http://arxiv.org/pdf/2507.18882v1
Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning
Tosin Adewumi, Foteini Simistira Liwicki, Marcus Liwicki, Viktor Gardelli, Lama Alkhaled, Hamam Mokayed. (07/2025). arXiv. http://arxiv.org/pdf/2507.12079v1
AI-Powered Math Tutoring: Platform for Personalized and Adaptive Education
Jarosław A. Chudziak, Adam Kostka. (07/2025). arXiv. http://arxiv.org/pdf/2507.12484v1
Findings of the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors
Ekaterina Kochmar, Kaushal Kumar Maurya, Kseniia Petukhova, KV Aditya Srivatsa, Anaïs Tack, Justin Vasselli. (07/2025). arXiv. http://arxiv.org/pdf/2507.10579v1
Exploring LLMs for Predicting Tutor Strategy and Student Outcomes in Dialogues
Fareya Ikram, Alexander Scarlatos, Andrew Lan. (07/2025). arXiv. http://arxiv.org/pdf/2507.06910v1
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning
Lixin Wu, Na Cai, Qiao Cheng, Jiachen Wang, Yitao Duan. (06/2025). arXiv. http://arxiv.org/pdf/2506.18330v2
"Check My Work?" Measuring Sycophancy in a Simulated Educational Context
Chuck Arvin. (06/2025). arXiv. http://arxiv.org/pdf/2506.10297v1
RETUYT-INCO at BEA 2025 Shared Task: How Far Can Lightweight Models Go in AI-powered Tutor Evaluation?
Santiago Góngora, Ignacio Sastre, Santiago Robaina, Ignacio Remersaro, Luis Chiruzzo, Aiala Rosá. (06/2025). arXiv. http://arxiv.org/pdf/2506.11243v1
SimClass: A Classroom Speech Dataset Generated via Game Engine Simulation For Automatic Speech Recognition Research
Ahmed Adel Attia, Jing Liu, Carol Espy-Wilson. (06/2025). arXiv. http://arxiv.org/pdf/2506.09206v1
Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting
Sankalan Pal Chowdhury, Terry Jingchen Zhang, Donya Rooein, Dirk Hovy, Tanja Käser, Mrinmaya Sachan. (06/2025). arXiv. http://arxiv.org/pdf/2506.08702v1
Intent Matters: Enhancing AI Tutoring with Fine-Grained Pedagogical Intent Annotation
Kseniia Petukhova, Ekaterina Kochmar. (06/2025). arXiv. http://arxiv.org/pdf/2506.07626v1

