This position may no longer be available

This job was last seen 1 week ago. The listing may have been removed by the employer.

Senior Site Reliability Engineer

Location: United States
Compensation: Not disclosed
Level: Senior
Type: Full time · Remote OK

Joblaze summary

The Senior Site Reliability Engineer at Stability AI improves the company's cloud infrastructure, working with engineering, IT, security, and product teams to keep systems reliable and performant. Key skills include AWS, Terraform for infrastructure as code, and monitoring and logging tools such as Grafana and the ELK stack. The role suits experienced engineers with a software development background and a solid grasp of SRE principles, and offers the chance to help shape a growing team in a fast-changing environment.

Joblaze insights

Quick facts

Is the Senior Site Reliability Engineer role remote?
Yes. Stability AI lists this as a remote position within the United States.
What's the tech stack?
Joblaze extracted these technologies from the posting: Terraform, AWS, Grafana, ELK stack, Kubernetes.
What seniority level is this role?
Stability AI targets senior candidates for this position.
Is this full-time or contract?
This Senior Site Reliability Engineer role at Stability AI is full-time.

From the original posting

Remote, United States

Job Description:
Stability AI’s Engineering Operations team is looking for a Senior Site Reliability Engineer (SRE) to join our growing team and play a pivotal role in improving and shaping our cloud infrastructure. The successful candidate will work closely with engineering, IT, security, and product teams to drive innovation and reliability in an evolving environment. Candidates should have the initiative to build out and improve a maturing cloud landscape.

Responsibilities:

  • Developing and enforcing SRE best practices and standards across the organization.
  • Architecting and managing scalable systems in AWS and other cloud environments, focusing on high availability and resilience.
  • Implementing and maintaining infrastructure as code using Terraform.
  • Setting up and refining monitoring, logging, and alerting systems.
  • Driving incident management and root cause analysis to improve system reliability.
  • Championing SRE principles and mentoring junior team members.

Qualifications:

  • Experience collaborating with development teams to enhance CI/CD pipelines.
  • Experience scaling resource-intensive systems, whether storage, networking, or compute.
  • Knowledge of and experience with Kubernetes or other container scaling solutions.
  • Background in software development or automation scripting.
  • Knowledge of and experience with Grafana, the ELK stack, or similar tools.
  • Cloud security experience.

Equal Employment Opportunity:

We are an equal opportunity employer and do not discriminate on the basis of race, religion, national origin, gender, sexual orientation, age, veteran status, disability, or any other legally protected status.

Similar positions

Site Reliability Engineer (Senior or Staff), Atlas
MongoDB · Austin; Boston; Chicago; Miami; New York City; Philadelphia; Pittsburgh; Raleigh; United States; Washington DC

Site Reliability Engineer (Senior or Staff), Infrastructure Security
MongoDB · Austin; New York City; San Francisco; Seattle; United States

Staff Site Reliability Engineer - Observability
Okta · Bellevue, Washington; Chicago, Illinois; New York, New York; San Francisco, California; Washington, DC

Staff Site Reliability Engineer - Observability
Okta · Bellevue, Washington; New York, New York; San Francisco, California; Washington, DC