
The idea that artificial intelligence (AI) chatbots could serve as tutors, helping children with reading comprehension and homework, has drawn wide attention following predictions by Bill Gates. In China, Baidu and Alibaba have entered this arena with their large language models, Ernie Bot (Wenxin Yiyan) and Tongyi Qianwen, marking a new phase in AI's educational applications.
The Transformative Potential of AI in Education
Mature AI models could fundamentally reshape core elements of education systems, including learning objectives, knowledge acquisition methods, teaching approaches, and evaluation frameworks. The traditional teacher-centered instruction model may gradually evolve into student-centric personalized learning. AI could provide tailored educational content and tutoring based on individual learning progress and characteristics, potentially enhancing both efficiency and outcomes.
Additionally, AI assistants could support teachers in lesson planning, material preparation, and grading—freeing educators to focus on personalized guidance and emotional engagement with students.
Evaluation Methodology
To assess the educational capabilities of Ernie Bot and Tongyi Qianwen, we designed a series of tests across five domains: high school English translation, middle school mathematics, high school Chinese composition, code generation, and adolescent psychological counseling. In a cross-evaluation format, each model first generated problems and then solved both its own problems and the other model's, allowing comparison of question-design quality and problem-solving ability.
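The cross-evaluation format can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Model` type, the stub models, the prompt wording, and the choice of one question per subject are all hypothetical and are not taken from the actual test setup.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in: a "model" is any callable from prompt text to reply text.
Model = Callable[[str], str]

SUBJECTS = [
    "high school English translation",
    "middle school mathematics",
    "high school Chinese composition",
    "code generation",
    "adolescent psychological counseling",
]

Record = Tuple[str, str, str, str, str]  # (subject, author, solver, question, answer)


def cross_evaluate(models: Dict[str, Model], subjects: List[str]) -> List[Record]:
    """Have every model write a question, then have every model answer it."""
    records: List[Record] = []
    for subject in subjects:
        for author, author_model in models.items():
            # The prompt wording is an assumption made for this sketch.
            question = author_model(f"Write one {subject} question.")
            for solver, solver_model in models.items():
                answer = solver_model(question)
                records.append((subject, author, solver, question, answer))
    return records


def make_stub(name: str) -> Model:
    """Build a placeholder model that just echoes the prompt it received."""
    def stub(prompt: str) -> str:
        return f"[{name}] reply to: {prompt}"
    return stub


stub_models = {"model_a": make_stub("model_a"), "model_b": make_stub("model_b")}
records = cross_evaluate(stub_models, SUBJECTS)
print(len(records))  # 5 subjects x 2 authors x 2 solvers = 20
```

Recording the author and the solver separately makes it possible to compare how a model handles its own questions against how it handles the other model's.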
1. Question Design: Assessing Knowledge Application
In the question-generation phase, both models created representative problems for each subject. Ernie Bot typically offered two versions (A/B sets) per subject, which showed an attempt at variety but lacked specificity. Tongyi Qianwen directly produced five questions per request, better matching practical needs. For Chinese composition, Tongyi Qianwen's prompts more closely resembled actual college entrance exam topics.
Both models generated relatively simple questions overall. The English translation tasks fell below high school difficulty, which suggests current limits on the depth and breadth of the models' knowledge.
2. Translation Capabilities: Precision in Language
For English-to-Chinese translation, both models produced similar results, though Tongyi Qianwen offered more context-aware phrasing. While Ernie Bot automatically provided Chinese translations, Tongyi Qianwen proactively asked about target languages—a user-experience advantage.
However, in Chinese-to-English tasks, Ernie Bot delivered more accurate translations. Surprisingly, Tongyi Qianwen's translation quality deteriorated significantly during extended conversations, indicating room for improvement in multi-turn dialogue consistency.
3. Mathematical Problem-Solving: Logical Accuracy
In middle school math, Ernie Bot solved its own generated problem incorrectly: it treated a fraction multiplication as a division and made calculation errors along the way. Tongyi Qianwen solved the problem correctly but reported its result with an approximation sign ("≈"), which suggests limited precision.
Neither model showed its work step by step, a gap in demonstrating logical reasoning. When working on Tongyi Qianwen's geometry problem, Ernie Bot failed to recognize the properties of an equilateral triangle and could not generate a correct diagram despite multiple attempts.
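The two arithmetic issues noted in this section, confusing fraction multiplication with division and reporting an approximate rather than exact result, are easy to illustrate. The sketch below uses Python's standard `fractions` module. The operands are made up for illustration, since the actual test problem is not reproduced here.

```python
from fractions import Fraction

# Illustrative operands only; these are not the values from the test problem.
a = Fraction(3, 4)
b = Fraction(2, 5)

# Multiplication: multiply numerators together and denominators together.
times = a * b
# Division: multiply by the reciprocal of the second fraction.
divided = a / b

print(times)    # 3/10
print(divided)  # 15/8

# Exact versus approximate: a Fraction keeps 2/3 exact, while rounding a
# float gives a value that would have to be written with an "≈" sign.
exact = Fraction(2, 3)
approx = round(2 / 3, 2)
print(exact, approx)
```

Swapping one operation for the other changes the result from 3/10 to 15/8, which is why the mix-up matters even in a simple problem.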
4. Composition Writing: Creativity and Structure
For Chinese writing, both models' opening sentences showed striking similarity, suggesting overlapping language patterns. Their essays shared comparable structure—discussing "happiness" from multiple perspectives—with Ernie Bot focusing on personal experiences and Tongyi Qianwen incorporating societal dimensions. Both outputs contained detectable "machine-like" qualities, lacking human flexibility.
On Tongyi Qianwen's writing prompt, Ernie Bot produced an overly academic response, while Tongyi Qianwen crafted a more persuasive argument despite its shorter length.
5. Code Generation: Programming Logic
For programming tasks, both models generated similar core code segments, though Ernie Bot omitted variable declarations. When interpreting Tongyi Qianwen's prompt, they produced different solutions: Ernie Bot generated random strings, whereas Tongyi Qianwen generated random English words, which fit the context better.
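The difference between the two interpretations can be shown with a short Python sketch. Neither model's actual output nor the exact prompt is reproduced here, so the function names and the small `WORDS` list below are hypothetical.

```python
import random
import string
from typing import List

# Hypothetical word list for illustration; a real program would likely load
# words from a dictionary file instead.
WORDS = ["apple", "river", "window", "garden", "planet", "music"]


def random_string(length: int, rng: random.Random) -> str:
    """Interpretation 1: a string of random lowercase letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def random_words(count: int, rng: random.Random) -> List[str]:
    """Interpretation 2: distinct random English words from a word list."""
    return rng.sample(WORDS, count)


rng = random.Random(0)  # fixed seed so repeated runs give the same output
print(random_string(8, rng))
print(random_words(3, rng))
```

Both functions satisfy a request for "random" text, but only the second yields output a language learner could read, which is presumably why it was judged the better fit.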
6. Psychological Counseling: Emotional Intelligence
Both models provided comparable, bullet-pointed advice for adolescent emotional management, suggesting techniques like meditation. Tongyi Qianwen offered more diverse solutions, while Ernie Bot also explained the causes of anxiety.
Conclusions and Future Prospects
This evaluation reveals that China's leading AI models remain in early developmental stages for educational applications. Tongyi Qianwen demonstrated stronger question-design specificity, while Ernie Bot emphasized variety. Their strengths varied by subject:
- Translation: Ernie Bot performed slightly better
- Mathematics: Tongyi Qianwen showed superior capability
- Composition: Comparable results with different strengths
- Programming: Both demonstrated basic competency
- Counseling: Both provided detailed, practical guidance
Ernie Bot exhibited more conservative responses with marginally better multi-turn conversation ability, while Tongyi Qianwen responded faster with greater flexibility. Both models require significant improvement in visual content generation.
Current limitations notwithstanding, AI's educational potential appears promising. Bill Gates predicted AI tutors within 18 months, likely with Western models in mind; China's systems may need longer. As the technology continues to advance, AI is likely to take on increasingly meaningful roles in education.