MLOps Engineer
- Applications closed
- US Company | Medium (51-250 employees)
- LATAM (100% remote)
- 5+ years
- Long-term (40h/week)
- Advertising Services
- Full Remote
Required skills
- AWS
- ML Infrastructure
- Python
- Terraform
- CI/CD
- Docker
Requirements
Must-haves
- 5+ years of DevOps, MLOps, or Cloud Infrastructure Engineering experience
- Experience with AWS services (CDK, Lambda, EC2, S3, SageMaker, CloudWatch)
- Proficiency with Infrastructure as Code (IaC) tools (Terraform, CloudFormation)
- Strong experience with Python for scripting and automation
- Proficiency with containerization using Docker
- Experience building and maintaining CI/CD pipelines for ML workflows
- Deep knowledge of ML model lifecycle management, including deployment, monitoring, and retraining
- Based in Brazil, Argentina, Paraguay, Colombia, or Mexico
- Strong communication skills in both spoken and written English
Nice-to-haves
- Startup experience
- AWS Certifications (e.g. DevOps Engineer, Solutions Architect, Machine Learning Specialty)
- Background in software engineering or ML/AI infrastructure
- Bachelor’s Degree in Computer Engineering, Computer Science, or equivalent
What you will work on
- ML Infrastructure Architecture & Automation
- Design, provision, and manage AWS infrastructure for ML workloads using AWS CDK and CloudFormation
- Architect secure, scalable, and cost-efficient ML environments for experimentation, training, and inference
- Implement cloud-native services (e.g. EC2, ECS, Lambda, S3, RDS, SageMaker, Bedrock, Step Functions)
- Apply best practices for security, compliance, and disaster recovery in ML infrastructure
- Model Deployment & CI/CD
- Design and maintain CI/CD pipelines for training, deployment, and retraining of models using CodePipeline, CodeBuild, GitHub Actions, or similar
- Automate testing, versioning, and rollback strategies for applications and ML models
- Build and manage Docker containers for microservices and ML applications
- MLOps Enablement
- Collaborate with ML engineers to deploy, monitor, and maintain models in SageMaker
- Develop end-to-end pipelines for data preprocessing, feature engineering, training, inference, and retraining
- Integrate model monitoring, drift detection, and automated retraining triggers
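The monitoring/drift/retraining loop above can be sketched in plain Python. The z-score test, the `DRIFT_THRESHOLD` value, and the `start_job` hook are all assumptions for illustration; a real pipeline would more likely use SageMaker Model Monitor or a statistical test suited to the feature distributions.

```python
import statistics

# Assumed cutoff in standard-error units; purely illustrative.
DRIFT_THRESHOLD = 3.0


def detect_drift(baseline: list[float], recent: list[float],
                 threshold: float = DRIFT_THRESHOLD) -> bool:
    """Return True when the recent window's mean sits more than
    `threshold` standard errors away from the training baseline."""
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    se = sigma / (len(recent) ** 0.5)  # standard error of the window mean
    z = abs(statistics.mean(recent) - mu) / se
    return z > threshold


def maybe_trigger_retraining(baseline: list[float], recent: list[float],
                             start_job) -> bool:
    """Kick off `start_job` (e.g. a callable that starts a training
    job) only when drift is detected. Returns whether it fired."""
    if detect_drift(baseline, recent):
        start_job()
        return True
    return False
```

The point of the sketch is the shape of the loop: monitor a recent window, compare against a baseline, and gate retraining behind the drift check rather than retraining on a fixed schedule.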
- Monitoring, Observability & Performance
- Implement observability frameworks for ML workloads using CloudWatch, DataDog, and other tools
- Track inference latency, accuracy, and resource usage to optimize performance
- Troubleshoot production ML systems and lead incident resolution
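Latency tracking of the kind described above might look like this minimal sketch; the `P95_ALERT_MS` threshold and percentile choices are illustrative assumptions, standing in for whatever CloudWatch or DataDog alarms the team actually configures.

```python
import statistics


# Assumed alert threshold for the sketch, not a product requirement.
P95_ALERT_MS = 250.0


def latency_percentiles(samples_ms: list[float]) -> dict[str, float]:
    """Compute p50/p95/p99 from a window of inference latencies (ms)."""
    q = statistics.quantiles(samples_ms, n=100)  # 99 cut points
    return {"p50": q[49], "p95": q[94], "p99": q[98]}


def should_alert(samples_ms: list[float],
                 threshold_ms: float = P95_ALERT_MS) -> bool:
    """Fire when tail latency (p95) exceeds the threshold."""
    return latency_percentiles(samples_ms)["p95"] > threshold_ms
```

Alerting on a tail percentile rather than the mean is the usual choice here, since a healthy average can hide a slow tail that users actually feel.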
- Collaboration & Documentation
- Partner with software, ML, and data teams to promote MLOps best practices
- Maintain clear documentation for infrastructure, deployments, and operational processes
- Contribute to code reviews and architectural discussions