joblaze

Low-Latency Inference Jobs in San Francisco

32 open positions · Updated 2 months ago

Average salary: $190.5k–$304.2k/yr

Showing 20 of 32 positions

LLM Inference Frameworks and Optimization Engineer

Together AI · San Francisco, Singapore, Amsterdam · $160k–$230k/yr · Published 1 year ago

C++ · TensorRT-LLM · SGLang · CUDA · vLLM

Senior Backend Engineer, Inference Platform

Together AI · San Francisco · $160k–$250k/yr · Published 10 months ago

TypeScript · NVLink · Triton · CUDA · Go

Research Intern, Inference (Fall 2026)

Join Together AI as a Research Intern to work on cutting-edge distributed inference and optimization for large foundation models.

Together AI · San Francisco · $58–$63/hr · Published 2 weeks ago

JAX · CUDA · Machine Learning · PyTorch · Python

Flexible on stack

Staff Software Engineer - GenAI inference

Databricks · San Francisco, California · $190.9k–$232.8k/yr · Published 8 months ago

memory partitioning · GPU programming · CUDA · AI/ML · distributed systems

Engineering Manager, Model Routing & Inference

Cursor · San Francisco · Published 2 months ago

TensorRT-LLM · vLLM · AI/ML · GPU · TGI

Member of Technical Staff (AI Inference Engineer)

Perplexity · San Francisco · Published 2 months ago

NVLink · CuTe DSL · Nsight Systems · Triton · JAX

Staff Machine Learning Engineer, Voice AI

Join Together AI as a Staff ML Engineer to optimize voice model serving for real-time applications on a high-impact team.

Together AI · San Francisco · $220k–$280k/yr · Published 1 month ago

TTS · H100 · TensorRT-LLM · ASR · SGLang

Flexible on stack · 60% coding

Staff+ Software Engineer, Inference Runtime · Remote · Visa

Join Anthropic as a Staff Engineer to lead the technical direction of the Inference Runtime for AI systems serving millions of users.

Anthropic · Remote-Friendly (Travel-Required) | San Francisco, CA | Seattle, WA | New York City, NY · $405k–$485k/yr · Published 2 weeks ago

CUDA · Trainium · Rust · AWS Neuron · TPU

Flexible on stack

Senior Machine Learning Engineer, Voice AI

Together AI · San Francisco · $200k–$260k/yr · Published 3 months ago

TTS · STT · H100 · TensorRT-LLM · ASR

Research Engineer, Core ML

Together AI · San Francisco · $200k–$280k/yr · Published 4 months ago

LLMs · RLAIF · SGLang · vLLM · quantization

Staff Backend Software Engineer (AI Platform)

Databricks · San Francisco, California · $192k–$260k/yr · Published 3 months ago

SGLang · vLLM · AI/ML · APIs · GPU

Research Intern RL & Post-Training Systems, Turbo (Fall 2026)

Join Together AI as a Research Intern to explore efficient reinforcement learning and post-training systems for large language models.

Together AI · San Francisco · $58–$63/hr · Published 1 week ago

C++ · Large Language Models · CUDA · Machine Learning · reinforcement learning

Flexible on stack

Performance Engineer · Remote · Visa

Anthropic · San Francisco, CA | New York City, NY | Seattle, WA · $280k–$850k/yr · Published 2 years ago

ML framework internals · Machine Learning · AI/ML · distributed systems · Language modeling

Staff Software Engineer, Foundational Model Serving

Databricks · San Francisco, California · $192k–$260k/yr · Published 8 months ago

SGLang · vLLM · AI/ML · GPU

Staff Engineer, Distributed Storage and HPC & AI Infrastructure

Design and deliver multi-petabyte storage systems for AI workloads at Together AI, optimizing performance and cost.

Together AI · San Francisco · $250k–$300k/yr · Published 3 weeks ago

Terraform · iSCSI · Grafana · RAID · Prometheus

Flexible on stack

Senior Machine Learning Operations Engineer

Join Mercury as a Senior Machine Learning Operations Engineer to build and operate real-time inference services for risk decisioning.

Mercury · San Francisco, CA; New York, NY; Portland, OR; or Remote within Canada or United States · $166.6k–$208.3k/yr · Published 1 week ago

Flask · Redpanda · FastAPI · SQL · Kinesis

Flexible on stack

AI Researcher, Core ML (Turbo)

Together AI · San Francisco · $200k–$280k/yr · Published 2 years ago

RLAIF · speculative decoding · SGLang · vLLM · reward modeling

TPU Kernel Engineer · Remote · Visa

Anthropic · San Francisco, CA | New York City, NY | Seattle, WA · $280k–$850k/yr · Published 1 year ago

low-precision inference · Large Language Models · collective communication algorithms · ML systems · TPU

Staff Backend Software Engineer (AI Platform)

Databricks · San Francisco, California · $166k–$225k/yr · Published 3 months ago

AI/ML · model serving · CPU · inference systems · large-scale distributed systems

Senior Software Engineer, Model Serving

Databricks · San Francisco, California · $166k–$225k/yr · Published 8 months ago

scheduling · Routing · AI/ML · model serving · CPU
