Research Engineer, LLM Pre-training & Post-training
Singapore
Beijing
Palo Alto
Senior / Technical Staff
Remote
Role Overview
We are looking for a Research Engineer to lead the post-training and alignment pipelines for our advanced reasoning models.
You will work across pre-training, post-training, and human-in-the-loop systems to enable efficient reasoning, alignment, and generalization. In this role, you will not only manage data pipelines but also actively research and design data strategies that amplify model intelligence.
You will work on models where data, learning signals, and architecture are tightly coupled.
You will help enable small models to outperform much larger ones on reasoning tasks.
You will shape not only what the model sees, but how it learns.
Key Responsibilities
Synthetic Data & Pre-training Strategy: Design synthetic data generation, filtering, and curriculum strategies that improve pre-training efficiency and reasoning performance.
Post-Training & Alignment (SFT / RL): Build and optimize post-training pipelines, including SFT and RL, to improve reasoning quality, alignment, and controllability.
Human Data Operations: Develop scalable human data workflows, annotation protocols, and quality-control systems for reasoning model training.
Evaluation & Analysis: Lead evaluation, ablation, and failure analysis to measure data impact and continuously improve model reasoning behavior.
Required Qualifications
3+ years of experience training models in NLP, deep learning, or ML engineering.
Comfort working with large-scale data processing systems (Apache Spark, Ray Data, Databricks, or similar).
Ability to read, critique, and implement research related to synthetic data, data selection, and weak-to-strong generalization.
Preferred Qualifications
Experience training LLMs (7B+) or SLMs (<7B) end-to-end, with ownership over major parts of the data pipeline.
Published research, technical blog posts, or open-source contributions related to:
- Synthetic data generation
- Dataset pruning or filtering
- Reasoning or alignment
Familiarity with automated evaluation techniques such as LLM-as-a-judge or verifier-based evaluation.
