Light-R1: 360 Zhinao's open-source long chain-of-thought reasoning model
AI Product Observation

  • Light-R1
  • Open Source AI
  • Mathematics Reasoning
  • Cost-effective Training
  • Problem Solving
  • Long-term Thinking
  • Machine Learning
  • Educational Tools
  • Logical Reasoning
  • Model Development

By Tina

March 27, 2025

What is Light-R1?

Light-R1 is 360 Zhinao's open-source AI model for long chain-of-thought reasoning in mathematics; the flagship release is Light-R1-32B. The model is built on Qwen2.5-32B-Instruct and trained on about 70,000 math problems with two-stage curriculum learning (SFT followed by DPO), starting from a base model with no long chain-of-thought capability and ending up surpassing DeepSeek-R1-Distill-Qwen-32B. On the AIME24 benchmark, Light-R1 scores 76.6, clearly ahead of DeepSeek-R1-Distill-Qwen-32B's 72.6. Training cost is low: roughly 6 hours on 12 H800 machines, or about $1,000. The release is fully open source, covering the model, datasets, training framework, and evaluation code, which supports the open-source community and serves as a reference for low-cost training of domain-specialized models.
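
Because Light-R1-32B is a fine-tune of Qwen2.5-32B-Instruct, it can presumably be run with the standard Hugging Face transformers chat workflow. The minimal sketch below assumes a checkpoint id of qihoo360/Light-R1-32B, which should be verified against the collection linked later in this article; the prompt and generation settings are purely illustrative.

```python
# Minimal inference sketch. The checkpoint id is an assumption; check the
# Hugging Face collection linked below for the exact name.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "qihoo360/Light-R1-32B"  # assumed id, verify on Hugging Face
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [
    {"role": "user", "content": "How many positive integers n < 100 make n^2 + n divisible by 6?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Long chain-of-thought models need a generous token budget for the reasoning trace.
outputs = model.generate(inputs, max_new_tokens=4096, temperature=0.6, do_sample=True)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```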

Main functions of Light-R1

Efficient math problem solving: quickly and accurately solves complex math problems in areas including algebra, geometry, and probability.

Strong reasoning: solid logical reasoning ability, with support for long chain-of-thought problems.

Generalization: also performs well in other areas, such as logical reasoning and language comprehension.

Low-cost training and deployment: delivers high performance at very low cost, making it suitable for rapid deployment by users or enterprises with limited resources.

Technical principles of Light-R1

Base model and starting point: Light-R1 is built on Qwen2.5-32B-Instruct, which has no long chain-of-thought capability out of the box, and is trained from that starting point until it surpasses DeepSeek-R1-Distill-Qwen-32B.

Curriculum learning:

SFT (Supervised Fine-Tuning): data is filtered by difficulty and supervised fine-tuning is performed in two stages. Stage 1 uses about 70,000 problems; stage 2 keeps only the roughly 3,000 hardest problems for further fine-tuning.

DPO (Direct Preference Optimization): building on the SFT model, preference pairs are constructed from multiple sampled responses to further improve output quality (a minimal sketch of both stages follows below).
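
The article does not include the filtering or sampling code, so the following Python sketch only illustrates the idea: difficulty is approximated by a pass rate over pre-sampled model answers, stage 2 keeps roughly the 3,000 hardest problems, and DPO pairs put a verified-correct long solution against an incorrect one for the same problem. All function names, data fields, and thresholds here are assumptions, not Light-R1's actual pipeline.

```python
# Illustrative sketch of difficulty-based curriculum filtering and DPO pair
# construction; field names and thresholds are assumptions, not Light-R1's code.

def pass_rate(example):
    """Difficulty proxy: fraction of pre-sampled model answers that are correct."""
    samples = example["samples"]
    return sum(s["is_correct"] for s in samples) / max(len(samples), 1)

def build_curriculum(dataset, stage2_size=3000, easy_cutoff=0.9):
    """Stage 1: the broad ~70k set minus trivially easy items; stage 2: the hardest subset."""
    stage1 = [ex for ex in dataset if pass_rate(ex) < easy_cutoff]
    stage2 = sorted(dataset, key=pass_rate)[:stage2_size]
    return stage1, stage2

def build_dpo_pairs(dataset):
    """Pair a verified-correct long-CoT answer (chosen) with an incorrect one (rejected)."""
    pairs = []
    for ex in dataset:
        chosen = next((s for s in ex["samples"] if s["is_correct"]), None)
        rejected = next((s for s in ex["samples"] if not s["is_correct"]), None)
        if chosen and rejected:
            pairs.append({"prompt": ex["problem"],
                          "chosen": chosen["text"],
                          "rejected": rejected["text"]})
    return pairs
```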

Data processing and deduplication: the training data is drawn from several open-source math datasets (such as OpenR1-Math-220k and OpenThoughts-114k) and is strictly deduplicated against evaluation data, so that test-set leakage does not inflate the model's measured performance.
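
No decontamination code is given in the article either; the sketch below shows one common approach, exact-match plus long n-gram overlap against benchmark problems. The normalization and threshold choices are assumptions rather than the project's own settings.

```python
# Illustrative decontamination sketch: drop training problems that match or
# heavily overlap benchmark (e.g. AIME) problems. Thresholds are assumptions.
import re

def normalize(text):
    """Lowercase and strip punctuation so trivial edits don't hide duplicates."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def ngrams(text, n=13):
    toks = normalize(text).split()
    return {" ".join(toks[i:i + n]) for i in range(len(toks) - n + 1)}

def decontaminate(train_problems, benchmark_problems, n=13):
    bench_exact = {normalize(p) for p in benchmark_problems}
    bench_ngrams = set()
    for p in benchmark_problems:
        bench_ngrams |= ngrams(p, n)

    kept = []
    for p in train_problems:
        if normalize(p) in bench_exact:
            continue                      # exact duplicate of a test item
        if ngrams(p, n) & bench_ngrams:
            continue                      # shares a long n-gram with a test item
        kept.append(p)
    return kept
```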

Model fusion: the final Light-R1-32B is obtained by merging the SFT stage-2 model, the DPO model, and a second DPO variant, further improving performance and stability.
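
The merging method itself is not specified in the article; one simple possibility is a uniform parameter average of the three checkpoints ("model soup" style), sketched below. The checkpoint paths are hypothetical, and a real 32B-scale merge would normally stream weights shard by shard rather than loading all models at once.

```python
# Uniform weight-average merge of several fine-tuned checkpoints ("model soup").
# This is one plausible interpretation, not the confirmed Light-R1 recipe.
import torch
from transformers import AutoModelForCausalLM

def merge_checkpoints(checkpoint_paths, output_dir):
    models = [AutoModelForCausalLM.from_pretrained(p, torch_dtype=torch.bfloat16)
              for p in checkpoint_paths]
    merged = models[0]
    with torch.no_grad():
        state = merged.state_dict()
        for key in state:
            avg = sum(m.state_dict()[key].float() for m in models) / len(models)
            state[key] = avg.to(torch.bfloat16)
        merged.load_state_dict(state)
    merged.save_pretrained(output_dir)

# Hypothetical local checkpoint directories:
# merge_checkpoints(["sft-stage2", "dpo-a", "dpo-b"], "light-r1-merged")
```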

Training framework and optimization: training uses the 360-LLaMA-Factory framework, which supports sequence parallelism and efficient distributed training. With this optimized pipeline, Light-R1 completes training in about 6 hours on 12 H800 machines.
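
As a quick sanity check on the quoted cost, the implied price per machine-hour can be back-calculated from the figures in this article (12 machines, 6 hours, about $1,000); the rate below is derived from those numbers, not an official figure.

```python
# Back-of-the-envelope check of the quoted training cost figures.
machines = 12          # H800 machines, as stated above
hours = 6              # wall-clock training time
total_cost_usd = 1000  # approximate total cost quoted

machine_hours = machines * hours        # 72 machine-hours
rate = total_cost_usd / machine_hours   # about $13.9 per machine-hour
print(f"{machine_hours} machine-hours, implied rate of ${rate:.2f}/machine-hour")
```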

Light-R1 project address

GitHub repository: https://github.com/Qihoo360/Light-R1

Hugging Face model collection: https://huggingface.co/collections/qihoo360/light-r1

Application scenarios of Light-R1

Education: as a mathematics learning tool, it helps students work through complex problems and provides solution steps and ideas, suitable for math competitions and everyday study.

Scientific research and academia: assists mathematical research and interdisciplinary problem solving, such as physics modeling and engineering optimization.

Enterprise applications: used for complex problems such as data analysis, risk assessment, and supply chain optimization.

Software integration: can be integrated into smart assistants and mathematical software to strengthen reasoning and problem-solving features.

Open source and developers: developers can customize and extend the model, promoting the growth of the open-source community.


