METEOR: Evolutionary Journey of Large Language Models from Guidance to Self-Growth presents a weak-to-strong evolution framework that enables LLMs to progressively evolve from supervised guidance to autonomous enhancement. While LLMs have demonstrated remarkable general capabilities across a wide range of applications, developing highly versatile LLMs requires substantial computational resources and financial investment, which is impractical for many domain-specific scenarios that demand specialized expertise. Current approaches to domain specialization either rely on costly external enhancements available only to large models, struggle to scale manual data annotation, or remain bounded by the performance ceiling of their supervisors. To address these limitations, METEOR introduces a comprehensive three-stage evolution framework that guides models from basic domain knowledge acquisition through supervised learning to autonomous capability enhancement via progressive computational scaling.
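To make the three stages concrete, here is a minimal sketch of such a weak-to-strong pipeline. The function names (`distill_domain_data`, `supervised_finetune`, `self_improve`) and the doubling compute budget are illustrative assumptions, not METEOR's actual interfaces.

```python
# Hypothetical sketch of a three-stage weak-to-strong evolution pipeline.
# All function and method names are assumptions for illustration only.

def distill_domain_data(teacher_model, domain_corpus):
    """Stage 1: acquire basic domain knowledge from a (weak) supervisor."""
    return [teacher_model.annotate(doc) for doc in domain_corpus]

def supervised_finetune(student_model, labeled_data):
    """Stage 2: supervised learning on the distilled domain data."""
    student_model.train(labeled_data)
    return student_model

def self_improve(student_model, prompts, rounds=3, samples_per_prompt=4):
    """Stage 3: autonomous enhancement via progressively scaled self-generation.

    Each round the model samples more candidates per prompt, keeps the one
    its own scorer prefers, and retrains on the kept outputs.
    """
    for r in range(rounds):
        budget = samples_per_prompt * (2 ** r)  # progressive computational scaling
        candidates = [student_model.generate(p, n=budget) for p in prompts]
        kept = [max(c, key=student_model.score) for c in candidates]
        student_model.train(kept)
    return student_model
```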
PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment proposes a novel process supervision paradigm, which systematically outlines the workflow from reward model training to policy optimization and highlights the importance of nonlinear rewards in process supervision. Based on PSPO*, we develop PSPO-WRS, which takes the number of reasoning steps into account when determining reward scores and uses an adjusted Weibull distribution for nonlinear reward shaping. Experimental results on six mathematical reasoning datasets demonstrate that PSPO-WRS consistently outperforms current mainstream models.
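As a rough illustration of Weibull-based nonlinear reward shaping, the sketch below pushes the fraction of correctly judged reasoning steps through a Weibull-style CDF whose scale is adjusted by the step count. This is an assumed interpretation of the shaping idea, not PSPO-WRS's exact formula.

```python
import math

def weibull_shaped_reward(step_scores, k=2.0, base_scale=1.0):
    """Nonlinear process reward via a Weibull-style CDF (illustrative only).

    step_scores : per-step correctness scores in [0, 1] from the reward model.
    k           : Weibull shape parameter; k > 1 rewards nearly complete chains
                  disproportionately more than half-correct ones.
    base_scale  : Weibull scale parameter before the step-count adjustment.
    """
    n_steps = len(step_scores)
    if n_steps == 0:
        return 0.0
    frac_correct = sum(step_scores) / n_steps
    # Shrink the scale for longer chains so that long, fully correct
    # reasoning is rewarded more sharply (assumed adjustment).
    scale = base_scale / math.log(n_steps + math.e)
    return 1.0 - math.exp(-((frac_correct / scale) ** k))

# Example: a 6-step solution with one shaky step.
print(weibull_shaped_reward([1, 1, 1, 0.4, 1, 1]))
```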
Accurate language generation requires an understanding of language style, yet existing style transfer methods remain limited. To address this, we introduce Public-Speaking Style Transfer (PSST), decompose public-speaking style into fine-grained sub-styles, and propose a detailed evaluation framework. Our experiments show that current LLMs struggle to produce human-preferred public-speaking texts, often due to over-stylization and semantic issues.
SRA-MCTS uses a Monte Carlo Tree Search strategy to guide the model to self-generate data and improve its code capabilities. SRA-MCTS produces data composed of natural-language reasoning plans paired with concrete implementation code, meeting your requirements for both data quality and data diversity.
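As a rough picture of how an MCTS loop can drive self-generation of plan-and-code data, here is a minimal sketch. The node structure, UCT constant, and the `propose_step`, `write_code`, and `run_tests` helpers are hypothetical stand-ins for the model calls and evaluator SRA-MCTS actually uses.

```python
import math

class Node:
    def __init__(self, plan, parent=None):
        self.plan = plan            # natural-language reasoning steps so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def uct(node, c=1.4):
    """Upper Confidence Bound for Trees: trade off exploitation and exploration."""
    return node.value / (node.visits + 1e-9) + c * math.sqrt(
        math.log(node.parent.visits + 1) / (node.visits + 1e-9))

def search(problem, propose_step, write_code, run_tests, iterations=100):
    """Collect (plan, code) training pairs by MCTS over reasoning plans."""
    root, dataset = Node(plan=[]), []
    for _ in range(iterations):
        # 1. Selection: descend to a leaf by UCT.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # 2. Expansion: let the model propose the next natural-language step.
        child = Node(plan=node.plan + [propose_step(problem, node.plan)], parent=node)
        node.children.append(child)
        # 3. Simulation: turn the plan into code and score it on tests.
        code = write_code(problem, child.plan)
        reward = run_tests(problem, code)   # e.g. fraction of unit tests passed
        if reward == 1.0:
            dataset.append({"plan": child.plan, "code": code})
        # 4. Backpropagation: update value estimates along the path.
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return dataset
```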
DirectionAI uses cutting-edge artificial intelligence technology to create a vibrant, personalized learning environment. Whether you are a student striving to go further academically or an educator dedicated to improving teaching effectiveness, the DirectionAI education platform provides intelligent, precise assistance that makes learning and teaching more efficient and engaging.
MindLLM is a Transformer-based language model developed by the Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications and the Beijing Institute of Technology Southeast Academy of Information Technology.