5+ years
Long-term (40h)
Advertising Services
Full Remote
AWS
Python
Terraform
Docker
CI/CD
Requirements
Must-haves
- 5+ years of DevOps or cloud infrastructure experience
- Experience with AWS services (e.g. CDK, Lambda, EC2, S3, SageMaker, CloudWatch)
- Proficiency with Python for scripting and infrastructure automation
- Experience with Infrastructure as Code (e.g. Terraform, CloudFormation)
- Hands-on experience with Docker
- Experience with CI/CD pipeline creation and maintenance
- Strong communication skills in both spoken and written English
Nice-to-haves
- Startup experience
- AWS certifications (e.g. DevOps Engineer, Solutions Architect, Machine Learning Specialty)
- Background in software engineering or ML/AI infrastructure
- Bachelor's Degree in Computer Engineering, Computer Science, or equivalent
What you will work on
- Develop and manage scalable infrastructure and deployment workflows in AWS for data and machine learning applications
- Build cloud-native systems with a focus on infrastructure as code, containerization, and CI/CD automation
- Author infrastructure as code with AWS CDK, drawing on strong proficiency in AWS services and Python
- Support ML workflows by integrating services like SageMaker and contributing to model operations infrastructure
Infrastructure Development & Automation:
- Design, provision, and manage infrastructure in AWS using CDK and CloudFormation (see the sketch after this list)
- Build secure, scalable, and cost-effective environments for machine learning and analytics workloads
- Operate cloud-native services (e.g. EC2, ECS, Lambda, S3, RDS, SageMaker, Bedrock)
- Apply best practices for security, compliance, and disaster recovery
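As a rough illustration of this kind of CDK work (not a description of this team's actual stacks), the Python sketch below provisions an encrypted S3 bucket and a small preprocessing Lambda. All resource names, the runtime version, and the asset path are assumptions.

```python
# Illustrative sketch only: names, runtime, and asset paths are assumptions.
from aws_cdk import App, Stack, Duration, RemovalPolicy
from aws_cdk import aws_lambda as _lambda
from aws_cdk import aws_s3 as s3
from constructs import Construct


class DataPlatformStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Encrypted, versioned bucket for ML artifacts and analytics data
        artifacts = s3.Bucket(
            self, "ArtifactsBucket",
            encryption=s3.BucketEncryption.S3_MANAGED,
            versioned=True,
            removal_policy=RemovalPolicy.RETAIN,
        )

        # Small preprocessing Lambda with least-privilege read access to the bucket
        preprocess = _lambda.Function(
            self, "PreprocessFn",
            runtime=_lambda.Runtime.PYTHON_3_12,
            handler="app.handler",
            code=_lambda.Code.from_asset("lambda/preprocess"),
            timeout=Duration.minutes(5),
        )
        artifacts.grant_read(preprocess)


app = App()
DataPlatformStack(app, "DataPlatformStack")
app.synth()
```

Keeping resources in a stack class like this makes environments reproducible and lets changes be reviewed with `cdk diff` before deployment.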
CI/CD & Deployment Automation:
- Design and maintain deployment pipelines using CodePipeline, CodeBuild, GitHub Actions, or similar (example sketched below)
- Automate testing, deployment, and rollback processes
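A minimal sketch of such a pipeline using CDK Pipelines, which provisions CodePipeline and CodeBuild under the hood; the repository, branch, connection ARN, and build commands are placeholders.

```python
# Sketch only: repository, branch, connection ARN, and commands are placeholders.
from aws_cdk import App, Stack
from aws_cdk.pipelines import CodePipeline, CodePipelineSource, ShellStep
from constructs import Construct


class DeployPipelineStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Self-mutating pipeline backed by CodePipeline + CodeBuild
        CodePipeline(
            self, "Pipeline",
            synth=ShellStep(
                "Synth",
                input=CodePipelineSource.connection(
                    "example-org/example-repo",  # placeholder repository
                    "main",
                    connection_arn="arn:aws:codestar-connections:eu-west-1:111111111111:connection/example",  # placeholder
                ),
                commands=[
                    "pip install -r requirements.txt",
                    "pytest",        # run the test suite before synthesizing
                    "npx cdk synth",
                ],
            ),
        )


app = App()
DeployPipelineStack(app, "DeployPipelineStack")
app.synth()
```

Running the tests in the synth step covers the automated-testing part, while failed CloudFormation deployments roll back automatically.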
Containerization & Orchestration:
- Build and manage containerized applications using Docker (see the sketch below)
- Deploy services on ECS or Lambda with container-based runtimes
- Set up image build, versioning, and artifact management workflows
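One way this could look with the CDK ECS patterns module: `ContainerImage.from_asset` builds the local Dockerfile, pushes the image to ECR, and wires it into a Fargate service behind a load balancer. The `./service` directory, port, and sizing values are assumptions.

```python
# Sketch only: the ./service directory (containing a Dockerfile), port, and sizing are assumptions.
from aws_cdk import App, Stack
from aws_cdk import aws_ecs as ecs
from aws_cdk import aws_ecs_patterns as ecs_patterns
from constructs import Construct


class ContainerServiceStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # Builds the local Docker image, pushes it to ECR, and runs it on Fargate behind an ALB
        ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "Service",
            cpu=512,
            memory_limit_mib=1024,
            desired_count=2,
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_asset("./service"),
                container_port=8080,
            ),
        )


app = App()
ContainerServiceStack(app, "ContainerServiceStack")
app.synth()
```

CDK versions the image asset by content hash, which is one way to handle the image build and versioning workflow mentioned above.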
Machine Learning & Model Operations Support:
- Collaborate with ML engineers to deploy and maintain models in SageMaker (illustrated below)
- Integrate pipelines for pre-processing, inference, and model retraining
- Monitor model performance, logs, and metrics
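A hedged sketch of deploying a model with the SageMaker Python SDK; the image URI, model artifact location, IAM role, and endpoint name are all placeholders.

```python
# Sketch only: the image URI, model artifact, IAM role, and endpoint name are placeholders.
import sagemaker
from sagemaker.model import Model

session = sagemaker.Session()

model = Model(
    image_uri="111111111111.dkr.ecr.eu-west-1.amazonaws.com/example-inference:latest",
    model_data="s3://example-bucket/models/model.tar.gz",
    role="arn:aws:iam::111111111111:role/ExampleSageMakerRole",
    sagemaker_session=session,
)

# Creates the model, an endpoint configuration, and a real-time inference endpoint
model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    endpoint_name="example-model-endpoint",
)
```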
Monitoring, Observability & Logging:
- Set up alerting and observability tools (e.g. CloudWatch, Datadog); see the sketch after this list
- Investigate and resolve infrastructure, deployment, and performance issues
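A sketch of CloudWatch alerting defined in CDK (Python); the metric, threshold, and notification address are assumptions, chosen only to show the pattern.

```python
# Sketch only: the metric, threshold, and notification address are assumptions.
from aws_cdk import App, Duration, Stack
from aws_cdk import aws_cloudwatch as cw
from aws_cdk import aws_cloudwatch_actions as cw_actions
from aws_cdk import aws_sns as sns
from aws_cdk import aws_sns_subscriptions as subs
from constructs import Construct


class ObservabilityStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        alerts = sns.Topic(self, "AlertTopic")
        alerts.add_subscription(subs.EmailSubscription("oncall@example.com"))  # placeholder address

        # Alarm on sustained 5XX responses from a (hypothetical) service behind an ALB
        errors = cw.Metric(
            namespace="AWS/ApplicationELB",
            metric_name="HTTPCode_Target_5XX_Count",
            statistic="Sum",
            period=Duration.minutes(5),
        )
        alarm = cw.Alarm(
            self, "Target5xxAlarm",
            metric=errors,
            threshold=10,
            evaluation_periods=3,
            treat_missing_data=cw.TreatMissingData.NOT_BREACHING,
        )
        alarm.add_alarm_action(cw_actions.SnsAction(alerts))


app = App()
ObservabilityStack(app, "ObservabilityStack")
app.synth()
```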
Collaboration & Documentation:
- Partner with ML, software, and data teams to support DevOps practices
- Maintain documentation for infrastructure and operational workflows
- Participate in architecture discussions and code reviews