METEOR: Evolutionary Journey of Large Language Models from Guidance to Self-Growth presents a weak-to-strong evolution framework that enables LLMs to progressively evolve from supervised guidance to autonomous enhancement. While LLMs have demonstrated remarkable general capabilities across a wide range of applications, developing highly versatile LLMs requires substantial computational resources and financial investment, which is impractical for many domain-specific scenarios that demand specialized expertise. Current approaches to domain specialization either rely on costly external enhancements available only to large models, struggle to scale manual data annotation, or remain bounded by the performance ceiling of their supervisors. To address these limitations, METEOR introduces a comprehensive three-stage evolution framework that guides models from basic domain knowledge acquisition through supervised learning to autonomous capability enhancement via progressive computational scaling.
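To make the three stages concrete, here is a minimal sketch of such a weak-to-strong pipeline. The function names (`distill_domain_data`, `supervised_finetune`, `self_improve`) and the doubling compute budget are illustrative assumptions, not METEOR's actual interfaces.

```python
# Hypothetical sketch of a three-stage weak-to-strong evolution pipeline.
# All function and method names are assumptions for illustration only.

def distill_domain_data(teacher_model, domain_corpus):
    """Stage 1: acquire basic domain knowledge from a (weak) supervisor."""
    return [teacher_model.annotate(doc) for doc in domain_corpus]

def supervised_finetune(student_model, labeled_data):
    """Stage 2: supervised learning on the distilled domain data."""
    student_model.train(labeled_data)
    return student_model

def self_improve(student_model, prompts, rounds=3, samples_per_prompt=4):
    """Stage 3: autonomous enhancement via progressively scaled self-generation.

    Each round the model samples more candidates per prompt, keeps the one
    its own scorer prefers, and retrains on the kept outputs.
    """
    for r in range(rounds):
        budget = samples_per_prompt * (2 ** r)  # progressive computational scaling
        candidates = [student_model.generate(p, n=budget) for p in prompts]
        kept = [max(c, key=student_model.score) for c in candidates]
        student_model.train(kept)
    return student_model
```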
PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment proposes a novel process supervision paradigm, which systematically outlines the workflow from reward model training to policy optimization and highlights the importance of nonlinear rewards in process supervision. Based on PSPO*, we develop PSPO-WRS, which takes the number of reasoning steps into account when determining reward scores and uses an adjusted Weibull distribution for nonlinear reward shaping. Experimental results on six mathematical reasoning datasets demonstrate that PSPO-WRS consistently outperforms current mainstream models.
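As a rough illustration of Weibull-based nonlinear reward shaping, the sketch below pushes the fraction of correctly judged reasoning steps through a Weibull-style CDF whose scale is adjusted by the step count. This is an assumed interpretation of the shaping idea, not PSPO-WRS's exact formula.

```python
import math

def weibull_shaped_reward(step_scores, k=2.0, base_scale=1.0):
    """Nonlinear process reward via a Weibull-style CDF (illustrative only).

    step_scores : per-step correctness scores in [0, 1] from the reward model.
    k           : Weibull shape parameter; k > 1 rewards nearly complete chains
                  disproportionately more than half-correct ones.
    base_scale  : Weibull scale parameter before the step-count adjustment.
    """
    n_steps = len(step_scores)
    if n_steps == 0:
        return 0.0
    frac_correct = sum(step_scores) / n_steps
    # Shrink the scale for longer chains so that long, fully correct
    # reasoning is rewarded more sharply (assumed adjustment).
    scale = base_scale / math.log(n_steps + math.e)
    return 1.0 - math.exp(-((frac_correct / scale) ** k))

# Example: a 6-step solution with one shaky step.
print(weibull_shaped_reward([1, 1, 1, 0.4, 1, 1]))
```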
Accurate language generation requires an understanding of language style, yet existing style transfer methods remain limited. To address this, we introduce Public-Speaking Style Transfer (PSST), decompose public-speaking style into fine-grained sub-styles, and propose a detailed evaluation framework. Our experiments show that current LLMs struggle to produce human-preferred public-speaking texts, often due to over-stylization and semantic issues.
SRA-MCTS uses a Monte Carlo Tree Search strategy to guide the model to self-generate data and improve its code capabilities. SRA-MCTS produces data composed of natural-language reasoning plans paired with concrete implementation code, meeting your requirements for both data quality and data diversity.
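As a rough picture of how an MCTS loop can drive self-generation of plan-and-code data, here is a minimal sketch. The node structure, UCT constant, and the `propose_step`, `write_code`, and `run_tests` helpers are hypothetical stand-ins for the model calls and evaluator SRA-MCTS actually uses.

```python
import math

class Node:
    def __init__(self, plan, parent=None):
        self.plan = plan            # natural-language reasoning steps so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def uct(node, c=1.4):
    """Upper Confidence Bound for Trees: trade off exploitation and exploration."""
    return node.value / (node.visits + 1e-9) + c * math.sqrt(
        math.log(node.parent.visits + 1) / (node.visits + 1e-9))

def search(problem, propose_step, write_code, run_tests, iterations=100):
    """Collect (plan, code) training pairs by MCTS over reasoning plans."""
    root, dataset = Node(plan=[]), []
    for _ in range(iterations):
        # 1. Selection: descend to a leaf by UCT.
        node = root
        while node.children:
            node = max(node.children, key=uct)
        # 2. Expansion: let the model propose the next natural-language step.
        child = Node(plan=node.plan + [propose_step(problem, node.plan)], parent=node)
        node.children.append(child)
        # 3. Simulation: turn the plan into code and score it on tests.
        code = write_code(problem, child.plan)
        reward = run_tests(problem, code)   # e.g. fraction of unit tests passed
        if reward == 1.0:
            dataset.append({"plan": child.plan, "code": code})
        # 4. Backpropagation: update value estimates along the path.
        while child is not None:
            child.visits += 1
            child.value += reward
            child = child.parent
    return dataset
```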
DirectionAI uses cutting-edge artificial intelligence technology to create a vibrant, personalized learning environment. Whether you are a student striving to go further academically or an educator dedicated to improving teaching effectiveness, the DirectionAI education platform provides intelligent, precise assistance that makes learning and teaching more efficient and engaging.
MindLLM is a Transformer-based language model developed by the Beijing Engineering Research Center of High Volume Language Information Processing and Cloud Computing Applications and the Beijing Institute of Technology Southeast Academy of Information Technology.