joblaze

Direct from source · No middlemen

Low Latency Inference Jobs

73 open positions · Updated 2 months ago

Average salary: $181.5k–$236.0k/yr

Showing 20 of 73 positions

ML Research Engineer (Inference)

Cerebras Systems · Bengaluru, Karnataka, India · Published 2 months ago

SGLang · Machine Learning · vLLM · PyTorch · Transformers

ML Performance Benchmarking Engineer · Remote

Cerebras Systems · Toronto, Ontario, Canada · Published 3 months ago

C++ · AI/ML · Python

LLM Inference Frameworks and Optimization Engineer

Together AI · San Francisco, Singapore, Amsterdam · $160k–$230k/yr · Published 1 year ago

C++ · TensorRT-LLM · SGLang · CUDA · vLLM

Senior Backend Engineer, Inference Platform

Together AI · San Francisco · $160k–$250k/yr · Published 10 months ago

TypeScript · NVLink · Triton · CUDA · Go

Research Intern, Inference (Fall 2026)

Join Together AI as a Research Intern to work on cutting-edge distributed inference and optimization for large foundation models.

Together AI · San Francisco · $58–$63/hr · Published 2 weeks ago

JAX · CUDA · Machine Learning · PyTorch · Python

Flexible on stack

Staff Inference ML Runtime Engineer

Cerebras Systems · Sunnyvale CA or Toronto Canada · Published 7 months ago

C++ · TensorRT-LLM · SGLang · vLLM · PyTorch

Staff Software Engineer - GenAI inference

Databricks · San Francisco, California · $190.9k–$232.8k/yr · Published 8 months ago

memory partitioning · GPU programming · CUDA · AI/ML · distributed systems

Principal Engineer, AI Inference Reliability

Cerebras Systems · Remote, California, United States; Sunnyvale CA or Toronto Canada · Published 8 months ago

C++ · Go · Rust · Python

Senior Software Engineer, Inference Platform · Remote

MongoDB · Palo Alto · $126k–$248k/yr · Published 5 months ago

C++ · HNSW · vLLM · Go · Faiss

Engineering Manager, Model Routing & Inference

Cursor · San Francisco · Published 2 months ago

TensorRT-LLM · vLLM · AI/ML · GPU · TGI

Member of Technical Staff (Software Engineer) · Remote

Cerebras Systems · Sunnyvale, CA · $169.6k–$175k/yr · Published 1 month ago

TypeScript · C++ · OracleDB · JavaScript · Git

Engineering Manager, Inference ML Runtime

Cerebras Systems · Sunnyvale CA or Toronto Canada · Published 3 months ago

C++ · distributed systems · LLM serving frameworks · ML systems · PyTorch

Software Engineer, Model Routing & Inference

Cursor · New York · Published 2 months ago

AI/ML · distributed systems · real-time data pipelines · traffic routing · inference serving

Member of Technical Staff (AI Inference Engineer)

Perplexity · London · Published 2 months ago

NVLink · CuTe DSL · Triton · JAX · CUTLASS

Application Software Engineer, Inference · Visa

Join SpaceX as an Application Software Engineer to develop high-performance AI inference systems in a mission-driven environment.

SpaceX · Palo Alto, CA · $135k–$185k/yr · Published 1 week ago

C++ · TensorRT-LLM · gRPC · Triton · SGLang

Flexible on stack

Member of Technical Staff (AI Inference Engineer)

Perplexity · San Francisco · Published 2 months ago

NVLink · CuTe DSL · Nsight Systems · Triton · JAX

Sr. Member of Technical Staff · Remote

Cerebras Systems · Sunnyvale, CA · $230k–$250k/yr · Published 1 month ago

Terraform · AWS CDK · AWS CloudFormation · AWS Fargate · Flask

Staff Machine Learning Engineer, Voice AI

Join Together AI as a Staff ML Engineer to optimize voice model serving for real-time applications on a high-impact team.

Together AI · San Francisco · $220k–$280k/yr · Published 1 month ago

TTS · H100 · TensorRT-LLM · ASR · SGLang

Flexible on stack · 60% coding

Staff+ Software Engineer, Inference Runtime · Remote · Visa

Join Anthropic as a Staff Engineer to lead the technical direction of the Inference Runtime for AI systems serving millions of users.

Anthropic · Remote-Friendly (Travel-Required) | San Francisco, CA | Seattle, WA | New York City, NY · $405k–$485k/yr · Published 2 weeks ago

CUDA · Trainium · Rust · AWS Neuron · TPU

Flexible on stack

Senior Machine Learning Engineer, Voice AI

Together AI · San Francisco · $200k–$260k/yr · Published 3 months ago

TTS · STT · H100 · TensorRT-LLM · ASR
