Site Reliability Engineer / DevOps Engineer
Hybrid – Dallas, TX
Full-time
About the role
We’re looking for a Site Reliability Engineer / DevOps Engineer to help operate and evolve a large, cloud-heavy production environment. You’ll work across Linux infrastructure, cloud platforms, CI/CD pipelines, container orchestration, and infrastructure-as-code. A meaningful part of your day-to-day will involve using AI tools to help author infrastructure code and configuration.
This is a hands-on role on a small onshore team that collaborates closely with a larger offshore team. You’ll need to be self-directed, comfortable owning problems end-to-end, and able to work productively across time zones.
What you’ll do
- Build and maintain infrastructure across Azure (primary) and AWS
- Author and maintain infrastructure-as-code using Terraform, following team conventions for module structure, naming, and lifecycle
- Write and operate configuration management automation using Ansible, Chef, or similar tools
- Design, build, and maintain CI/CD pipelines for application and infrastructure delivery
- Operate container workloads on Kubernetes, including troubleshooting, scaling, and lifecycle management
- Use AI coding tools to produce IaC and configuration that is consistent, standards-compliant, and reviewable
- Contribute to observability and instrumentation across the environment
- Participate in incident response and on-call coverage
- Collaborate asynchronously with a globally distributed team
Required
- Experience administering Linux servers in production
- Hands-on experience with a major cloud platform (Azure or AWS)
- Experience with infrastructure-as-code and/or CI/CD platforms (Terraform, ARM, CloudFormation, Jenkins, GitLab CI, or similar)
- Hands-on experience with Docker and Kubernetes
- Demonstrated experience using AI or LLM coding tools (such as Claude Code, Copilot, or Cursor) to produce production-quality infrastructure code or configuration. You should be able to walk us through a real example: what you built, the standards you followed, and how you reviewed the output.
- Strong written communication and the ability to operate independently
What you’ll work with
Here’s the environment you’ll be working in. The more of these you’ve worked with, the faster you’ll find your footing.
- Cloud footprint across Azure and AWS, with active migration work between them
- Infrastructure-as-code in Terraform, alongside some ARM and CloudFormation
- Configuration management with Ansible and Chef
- Python and Bash for automation and tooling
- MySQL, PostgreSQL, and NoSQL stores in production
- Observability and instrumentation work, including OpenTelemetry
- Daily collaboration with teammates across time zones
How we work
You’ll be part of a small onshore team working closely with a larger offshore team. Some off-hours collaboration is expected. We need someone who can structure their own work, push problems forward without hand-holding, and communicate clearly in writing across time zones.
Incident response is part of the role. On-call isn’t a strict rotation, but when critical issues require hands-on attention, we’ll need you to be available.