Pablo Opazo

Site Reliability Engineer

Bachelor of Engineering (B.E.) Computer Science

Summary


Reliability-focused SRE who believes in moving fast on stable infrastructure. Built production infrastructure from scratch as the sole engineer. Scaled systems to 300K+ RPS while maintaining 99.99% uptime. Developed metrics collectors and alerting that reduced MongoDB MTTD from 2 hours to 1 minute. Delivered over half a million dollars in cost savings through automation and cloud optimization. Experienced with both cloud and bare-metal infrastructure, including GPU clusters. When something breaks, I fix it, whether that means debugging application code, redesigning architecture, or implementing security controls. Comfortable with early-stage ambiguity and wearing multiple hats to deliver what the business needs.

Areas of Expertise

  • Infrastructure Architecture
  • Reliability Engineering
  • Security & Compliance
  • Platform Engineering
  • Infrastructure Automation
  • Database Administration
  • Observability
  • FinOps & Cost Management
  • Technical Leadership

Professional Experience

Harness.io • United States (Remote)

Modern software delivery platform that enables CI, CD, feature flags, and cloud cost management at enterprise scale.

Principal Site Reliability Engineer

2024 - Present

Led infrastructure integration following Harness's acquisition of Split, consolidating observability and cutting its costs by 40%.
Key Accomplishments:

♦ Migrated observability signals (logs, metrics, traces) to unified Grafana Cloud, reducing costs by 40%
♦ Built a Kubernetes controller that auto-injects a StatsD proxy, enabling a seamless Datadog migration
• Established AWS-GCP interconnectivity for cross-cloud workloads
• Developed a custom OpenTelemetry (OTel) collector using native histograms to improve precision and reduce cardinality
• Migrated autoscaling to a KEDA-based solution
• Implemented eBPF-based automatic monitoring

Split.io (Acquired by Harness.io) • United States (Remote)

Split is a feature delivery platform that powers feature flag management, software experimentation, and continuous delivery.

Staff Site Reliability Engineer

2021 - 2024

Led reliability and observability initiatives across tracing, metrics, and performance analysis.
Key Accomplishments:

♦ Reduced incidents 30% YoY by identifying recurring patterns in post-mortems and implementing permanent fixes
♦ Built metrics collectors (Go/Rust/Python), reducing MongoDB MTTD from 2 hours to 1 minute
• Scaled infrastructure from thousands of RPS to 300K+ RPS using NGINX, Linkerd, and APISIX
• Deployed Temporal clusters for workflow orchestration, replacing Databricks and significantly reducing costs
• Implemented distributed tracing with OpenTelemetry across multiple K8s clusters (EKS/AKS)

Science (Sequoia Capital) • United States (Remote)

A healthcare startup based in Miami that builds AI tools to help doctors and clinics reduce operational complexity.

Lead DevOps Engineer

2020 - 2021

Built the entire infrastructure from scratch. Integrated Kafka, CockroachDB, and Elasticsearch on Kubernetes/OpenShift using the operator pattern.
Key Accomplishments:

♦ As the sole infrastructure engineer, delivered production, staging, and dev environments in 3 months
• Built a CI/CD pipeline with Tekton and a self-service ML platform on Kubeflow
• Reduced cluster provisioning time from 4 hours to 30 minutes using Red Hat Advanced Cluster Management (RHACM)
• Architected an observability stack with Jaeger/OpenTelemetry and the Istio service mesh

uBiome (Y Combinator S14) • United States (Remote)

A biotechnology company based in San Francisco that developed technology to sequence the human microbiome.

Production Engineer - Technical Lead

2016 - 2019

Built an SDLC platform using Kubernetes, Drone CI, and Spinnaker. Implemented cost automation across AWS, GCP, and bare metal, significantly reducing infrastructure spend. Managed deployments and conducted post-mortems to improve reliability.
Key Accomplishments:

♦ Built bare-metal GPU clusters with Nomad scheduling for protein research teams
♦ Implemented an analytics platform with PrestoDB and Metabase, reducing query time by 75%
• Cut developer onboarding time by 50% through internal tooling
• Achieved 99.9% PostgreSQL uptime with a high-availability (HA) setup and custom monitoring
• Maintained HIPAA compliance for healthcare data

Education

Computer Science - Undergraduate Studies, 2019

Pontificia Universidad Católica de Chile (PUC)

Bachelor of Engineering (B.E.) Computer Science, 2012

Universidad Tecnológica de Chile INACAP

Technical Stack

OS
  • RHEL
  • CoreOS
  • Ubuntu
Orchestration
  • Kubernetes
  • OpenShift
  • Nomad
Databases
  • MongoDB
  • PostgreSQL
  • CockroachDB
Languages
  • Python
  • Go
  • Rust
IaC
  • Ansible
  • Terraform
  • Pulumi
CI/CD
  • ArgoCD
  • GitHub Actions
  • Tekton
Security
  • Vault
  • AWS Secrets Manager
  • Azure Key Vault
Event Streaming
  • Kafka
  • Redis Streams
  • RabbitMQ