MLOps Engineer - AWS, ML Infrastructure - Advertising Services market

Experience: 5+ years
Engagement: Long-term (40h)
Industry: Advertising Services
Work setup: Fully remote
Skills: AWS, ML Infrastructure, Python, Terraform, CI/CD, Docker

Requirements

Must-haves

  • 5+ years of DevOps, MLOps, or Cloud Infrastructure Engineering experience
  • Experience with AWS services (CDK, Lambda, EC2, S3, SageMaker, CloudWatch)
  • Proficiency with Infrastructure as Code (IaC) tools such as Terraform and CloudFormation
  • Strong experience with Python for scripting and automation
  • Proficiency with containerization using Docker
  • Experience building and maintaining CI/CD pipelines for ML workflows
  • Deep knowledge of ML model lifecycle management, including deployment, monitoring, and retraining
  • Based in Brazil, Argentina, Paraguay, Colombia, or Mexico
  • Strong communication skills in both spoken and written English

Nice-to-haves

  • Startup experience
  • AWS Certifications (e.g. DevOps Engineer, Solutions Architect, Machine Learning Specialty)
  • Background in software engineering or ML/AI infrastructure
  • Bachelor’s Degree in Computer Engineering, Computer Science, or equivalent

What you will work on

ML Infrastructure Architecture & Automation

  • Design, provision, and manage AWS infrastructure for ML workloads using AWS CDK and CloudFormation
  • Architect secure, scalable, and cost-efficient ML environments for experimentation, training, and inference
  • Implement solutions using cloud-native services (e.g. EC2, ECS, Lambda, S3, RDS, SageMaker, Bedrock, Step Functions)
  • Apply best practices for security, compliance, and disaster recovery in ML infrastructure

Model Deployment & CI/CD

  • Design and maintain CI/CD pipelines for training, deployment, and retraining of models using CodePipeline, CodeBuild, GitHub Actions, or similar
  • Automate testing, versioning, and rollback strategies for applications and ML models
  • Build and manage Docker containers for microservices and ML applications

MLOps Enablement

  • Collaborate with ML engineers to deploy, monitor, and maintain models in SageMaker
  • Develop end-to-end pipelines for data preprocessing, feature engineering, training, inference, and retraining
  • Integrate model monitoring, drift detection, and automated retraining triggers

Monitoring, Observability & Performance

  • Implement observability frameworks for ML workloads using CloudWatch, Datadog, and other tools
  • Track inference latency, accuracy, and resource usage to optimize performance
  • Troubleshoot production ML systems and lead incident resolution

Collaboration & Documentation

  • Partner with software, ML, and data teams to promote MLOps best practices
  • Maintain clear documentation for infrastructure, deployments, and operational processes
  • Contribute to code reviews and architectural discussions