Sr. Site Reliability Engineer

Job description

US/Canada Remote

Digital Health start-up

MPA has been retained by a leading Digital Health organisation to recruit a Senior Site Reliability Engineer to join their team.

You will be joining an organisation that has grown significantly in recent years to become a leader within the Digital Health sector by bringing much-needed innovation to the stagnant health care industry. Their mission is to help people live healthier, happier lives every day.

Our client is building out a brand-new SRE team. As one of their first Site Reliability Engineers, you will help define and implement reliability best practices that contribute to the platform’s stability and performance through massive growth. You will work within the infrastructure group and frequently collaborate across teams to grow the adoption of reliability tooling and techniques across the whole Engineering department and technical stack.

Essential Duties & Responsibilities

· Support the adoption of Service Level Objectives (SLOs) and error budgets

· Enhance the way availability, latency and overall system health are measured and monitored

· Improve tooling for observability, alerting and incident response across backend, web and mobile platforms

· Improve tooling and guidelines for load and performance testing

· Contribute to prioritization of reliability features

· Participate in technical design, architecture decisions and planning discussions

· Participate in an on-call rotation, including incident management

· Support and guide team members in adopting reliability best practices

Desired Outcomes

· Improved systems instrumentation, stability and performance

· Adoption of reliability best practices across the whole technical stack

Required Education / Certificates / Experience:

· 2 years' experience in an SRE or related role

· Hands-on experience with cloud infrastructure and monitoring

· Experience with databases (ideally NoSQL), configuration management and container orchestration (ideally Kubernetes)

· Proficiency in a backend programming language

· Strong technical background with experience in performance optimizations across different platforms and codebases

Tech stack: Google Cloud, Kubernetes, MongoDB, Go (backend), React, Kotlin (Android), Swift (iOS), Prometheus, Grafana, Sentry
