Pablo Opazo

Site Reliability Engineer

Bachelor of Engineering (B.E.) Computer Science

Summary


Results-driven software engineer with a proven track record of supporting development and infrastructure teams in delivering robust software solutions. Skilled in designing and implementing efficient SDLC processes that ensure smooth delivery of software services. Experienced in operating services in Linux environments, with a focus on continuous integration, continuous deployment, and infrastructure optimization. Adept at executing complex tasks through diligence and strong problem-solving skills.

Areas of Expertise

  • Software Development
  • Database Administration
  • Configuration Management
  • Root Cause Analysis
  • Performance Analysis
  • Systems Architecture
  • Project Management
  • Agile Methodologies
  • DevOps Practices

Professional Experience

Split.io (Acquired by Harness.io) • United States (Remote)

Split is a feature delivery platform that powers feature flag management, software experimentation, and continuous delivery.

Staff SRE

2021 to 2024

Responsible for reliability and observability, including tracing, network metrics, and performance analysis.

Key Accomplishments:
Conducted root cause analysis investigations to identify and resolve diverse issues, resulting in a 30% year-over-year reduction in incidents.
As a member of the technical leadership team, developed and reviewed multiple specification documents.
Spearheaded technical evaluations of third-party solutions, identifying cost-effective tooling that preserved critical functionality and cut observability expenses by 50%.
Deployed and maintained multiple Temporal clusters for durable workflow execution; migrating workflows from Databricks to Temporal yielded considerable cost savings and improved developer experience and productivity.
Scaled network traffic handling from thousands of requests per second to 300,000+ requests per second using NGINX, Linkerd, and Apache APISIX.
Developed metrics collectors in Go, Rust, and Python, reducing Mean Time to Detection (MTTD) for MongoDB cluster issues from 2 hours to 1 minute and significantly improving observability and issue resolution.
Participated in the incident management process and in post-mortem document reviews.
Maintained and enhanced in-house software deployment tools written in Python.
Managed and administered multiple Kubernetes clusters, including EKS and AKS.
Implemented a distributed tracing architecture using OpenTelemetry Collectors for application performance monitoring.

Science (Sequoia Capital) • United States (Remote)

A healthcare startup based in Miami that builds AI tools to help doctors and clinics reduce operational complexity.

Lead DevOps Engineer

2020 to 2021

Built the company's entire infrastructure from scratch. Integrated multiple systems, including Kafka, CockroachDB, and Elasticsearch, on top of Kubernetes (OpenShift) using the operator pattern.

Key Accomplishments:
As the sole infrastructure engineer, designed and deployed scalable production, development, and staging environments in just three months.
Managed multiple OpenShift clusters using RHACM (Red Hat Advanced Cluster Management), optimizing the provisioning process to cut deployment time from 4 hours to 30 minutes, an 87.5% reduction.
Architected the tracing infrastructure using Jaeger and OpenTelemetry.
Implemented service mesh networking with Istio.
Developed the software delivery pipeline (build, test, and deploy) using Tekton.
Implemented a self-service ML platform on Kubeflow and OpenShift, streamlining model development and deployment.

uBiome (Y Combinator S14) • United States (Remote)

A biotechnology company based in San Francisco that developed technology to sequence the human microbiome.

Production Engineer - Technical Lead

2016 to 2019

Designed and implemented an SDLC solution using Kubernetes, Drone CI, and Spinnaker to ensure smooth delivery of software services. Developed cost-saving strategies and automation for cloud (AWS and GCP) and bare-metal infrastructure, significantly reducing engineering costs. Streamlined datacenter operations, managed deployments of new software releases, and conducted post-mortem analyses to identify and resolve the root causes of failures.

Key Accomplishments:
Supported the production team and built internal tools that improved onboarding for new developers, reducing Time to Productivity (TTP) by 50%.
Implemented a Lakehouse platform using PrestoDB, connecting multiple databases to resolve data inconsistencies and reduce data retrieval time for analytics by 75%.
Designed, built, and maintained bare-metal GPU clusters in an early datacenter environment to support the organization's protein research initiatives.
Implemented networking across international offices, improving connectivity and security by introducing internet access and network security measures.
Ensured the confidentiality, integrity, and security of data by adhering to HIPAA standards.
Maintained 99.9% uptime on PostgreSQL databases by implementing a high-availability solution and custom monitoring.

Education

Computer Science - Undergraduate Studies, 2019

Pontificia Universidad Católica de Chile (PUC)

Bachelor of Engineering (B.E.) Computer Science, 2012

Universidad Tecnológica de Chile INACAP

Technical Skills

  • RHEL
  • CoreOS
  • Ubuntu
  • OpenShift
  • Kubernetes
  • Nomad
  • CockroachDB
  • MongoDB
  • PostgreSQL
  • Python
  • Go
  • Rust
  • Ansible
  • Salt
  • Chef
  • Tekton
  • ArgoCD
  • Concourse