Outcomes – Numeracy

Research synthesis is AI-generated, human reviewed. Updated 09/2025.

Aryabhata: An exam-focused language model for JEE Math

Ritvik Rastogi, Sachin Dharashivkar, Sandeep Varma. (08/2025). arXiv. http://arxiv.org/pdf/2508.08665v2
CODAE: Adapting Large Language Models for Education via Chain-of-Thought Data Augmentation

Shuzhou Yuan, William LaCroix, Hardik Ghoshal, Ercong Nie, Michael Färber. (08/2025). arXiv. http://arxiv.org/pdf/2508.08386v1
Automated Generation Of Curriculum-Aligned Multiple-Choice Questions For Malaysian Secondary Mathematics Using Generative AI

Rohaizah Abdul Wahid, Muhamad Said Nizamuddin Nadim, Suliana Sulaiman, Syahmi Akmal Shaharudin, Muhammad Danial Jupikil, Iqqwan Jasman Su Azlan Su. (08/2025). arXiv. http://arxiv.org/pdf/2508.04442v1
From Answers to Questions: EQGBench for Evaluating LLMs' Educational Question Generation

Chengliang Zhou, Mei Wang, Ting Zhang, Qiannan Zhu, Jian Li, Hua Huang. (08/2025). arXiv. http://arxiv.org/pdf/2508.10005v1
A Mixed User-Centered Approach to Enable Augmented Intelligence in Intelligent Tutoring Systems: The Case of MathAIde app

Guilherme Guerino, Luiz Rodrigues, Luana Bianchini, Mariana Alves, Marcelo Marinho, Thomaz Veloso, Valmir Macario, Diego Dermeval, Thales Vieira, Ig Bittencourt, Seiji Isotani. (08/2025). arXiv. http://arxiv.org/pdf/2508.00103v2
What counts as evidence in AI & ED: Towards Science-for-Policy 3.0

Ilkka Tuomi. (08/2025). European Journal of Education Policy and Practice. https://www.aup-online.com/content/journals/10.5117/EJEP2025.1.001.TUOM
Beyond Agreement: Rethinking Ground Truth in Educational AI Annotation

Danielle R. Thomas, Conrad Borchers, Kenneth R. Koedinger. (07/2025). arXiv. http://arxiv.org/pdf/2508.00143v1
Personalized Education with Ranking Alignment Recommendation

Haipeng Liu, Yuxuan Liu, Ting Long. (07/2025). arXiv. http://arxiv.org/pdf/2507.23664v1
Decoding Instructional Dialogue: Human-AI Collaborative Analysis of Teacher Use of AI Tool at Scale

Alex Liu, Lief Esbenshade, Shawon Sarkar, Victor Tian, Zachary Zhang, Kevin He, Min Sun. (07/2025). arXiv. http://arxiv.org/pdf/2507.17985v2
A Comprehensive Review of AI-based Intelligent Tutoring Systems: Applications and Challenges

Meriem Zerkouk, Miloud Mihoubi, Belkacem Chikhaoui. (07/2025). arXiv. http://arxiv.org/pdf/2507.18882v1
Findings of MEGA: Maths Explanation with LLMs using the Socratic Method for Active Learning

Tosin Adewumi, Foteini Simistira Liwicki, Marcus Liwicki, Viktor Gardelli, Lama Alkhaled, Hamam Mokayed. (07/2025). arXiv. http://arxiv.org/pdf/2507.12079v1
AI-Powered Math Tutoring: Platform for Personalized and Adaptive Education

Jarosław A. Chudziak, Adam Kostka. (07/2025). arXiv. http://arxiv.org/pdf/2507.12484v1
Findings of the BEA 2025 Shared Task on Pedagogical Ability Assessment of AI-powered Tutors

Ekaterina Kochmar, Kaushal Kumar Maurya, Kseniia Petukhova, KV Aditya Srivatsa, Anaïs Tack, Justin Vasselli. (07/2025). arXiv. http://arxiv.org/pdf/2507.10579v1
Exploring LLMs for Predicting Tutor Strategy and Student Outcomes in Dialogues

Fareya Ikram, Alexander Scarlatos, Andrew Lan. (07/2025). arXiv. http://arxiv.org/pdf/2507.06910v1
Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning

Lixin Wu, Na Cai, Qiao Cheng, Jiachen Wang, Yitao Duan. (06/2025). arXiv. http://arxiv.org/pdf/2506.18330v2
RETUYT-INCO at BEA 2025 Shared Task: How Far Can Lightweight Models Go in AI-powered Tutor Evaluation?

Santiago Góngora, Ignacio Sastre, Santiago Robaina, Ignacio Remersaro, Luis Chiruzzo, Aiala Rosá. (06/2025). arXiv. http://arxiv.org/pdf/2506.11243v1
"Check My Work?" Measuring Sycophancy in a Simulated Educational Context

Chuck Arvin. (06/2025). arXiv. http://arxiv.org/pdf/2506.10297v1
SimClass: A Classroom Speech Dataset Generated via Game Engine Simulation For Automatic Speech Recognition Research

Ahmed Adel Attia, Jing Liu, Carol Espy-Wilson. (06/2025). arXiv. http://arxiv.org/pdf/2506.09206v1
Educators' Perceptions of Large Language Models as Tutors: Comparing Human and AI Tutors in a Blind Text-only Setting

Sankalan Pal Chowdhury, Terry Jingchen Zhang, Donya Rooein, Dirk Hovy, Tanja Käser, Mrinmaya Sachan. (06/2025). arXiv. http://arxiv.org/pdf/2506.08702v1
Intent Matters: Enhancing AI Tutoring with Fine-Grained Pedagogical Intent Annotation

Kseniia Petukhova, Ekaterina Kochmar. (06/2025). arXiv. http://arxiv.org/pdf/2506.07626v1
Simulating LLM-to-LLM Tutoring for Multilingual Math Feedback

Junior Cedric Tonga, KV Aditya Srivatsa, Kaushal Kumar Maurya, Fajri Koto, Ekaterina Kochmar. (06/2025). arXiv. http://arxiv.org/pdf/2506.04920v1
Evaluating Vision-Language and Large Language Models for Automated Student Assessment in Indonesian Classrooms

Nurul Aisyah, Muhammad Dehan Al Kautsar, Arif Hidayat, Raqib Chowdhury, Fajri Koto. (06/2025). arXiv. http://arxiv.org/pdf/2506.04822v1
Generating Pedagogically Meaningful Visuals for Math Word Problems: A New Benchmark and Analysis of Text-to-Image Models

Junling Wang, Anna Rutkiewicz, April Yi Wang, Mrinmaya Sachan. (06/2025). arXiv. http://arxiv.org/pdf/2506.03735v1
TestAgent: An Adaptive and Intelligent Expert for Human Assessment

Junhao Yu, Yan Zhuang, YuXuan Sun, Weibo Gao, Qi Liu, Mingyue Cheng, Zhenya Huang, Enhong Chen. (06/2025). arXiv. http://arxiv.org/pdf/2506.03032v1
Towards Generating Controllable and Solvable Geometry Problem by Leveraging Symbolic Deduction Engine

Zhuoxuan Jiang, Tianyang Zhang, Peiyan Peng, Jing Chen, Yinong Xun, Haotian Zhang, Lichi Li, Yong Li, Shaohua Zhang. (06/2025). arXiv. http://arxiv.org/pdf/2506.02565v1
BD at BEA 2025 Shared Task: MPNet Ensembles for Pedagogical Mistake Identification and Localization in AI Tutor Responses

Shadman Rohan, Ishita Sur Apan, Muhtasim Ibteda Shochcho, Md Fahim, Mohammad Ashfaq Ur Rahman, AKM Mahbubur Rahman, Amin Ahsan Ali. (06/2025). arXiv. http://arxiv.org/pdf/2506.01817v1
Evaluating Gemini in an Arena for Learning

LearnLM Team, Google. (05/2025). arXiv. http://arxiv.org/pdf/2505.24477v1
A Structured Unplugged Approach for Foundational AI Literacy in Primary Education

Maria Cristina Carrisi, Mirko Marras, Sara Vergallo. (05/2025). arXiv. http://arxiv.org/pdf/2505.21398v1
LMCD: Language Models are Zeroshot Cognitive Diagnosis Learners

Yu He, Zihan Yao, Chentao Song, Tianyu Qi, Jun Liu, Ming Li, Qing Huang. (05/2025). arXiv. http://arxiv.org/pdf/2505.21239v1
From EduVisBench to EduVisAgent: A Benchmark and Multi-Agent Framework for Reasoning-Driven Pedagogical Visualization

Haonian Ji, Shi Qiu, Siyang Xin, Siwei Han, Zhaorun Chen, Dake Zhang, Hongyi Wang, Huaxiu Yao. (05/2025). arXiv. http://arxiv.org/pdf/2505.16832v2
