joblaze
Direct Preference Optimization (DPO)
1 job tagged Direct Preference Optimization (DPO)
Applied Research Engineer
Remote
Labelbox
Multimodal Models
Direct Preference Optimization (DPO)
JAX
Large Language Models
Reinforcement Learning from Human Feedback (RLHF)
San Francisco, USA
$250k–$300k/yr
Indexed 1 day ago