Apex Systems is seeking a Machine Learning Operations Engineer for our Des Moines, IA-based client. This role will deploy internal and external/open-source ML frameworks/models and pipelines in production Kubefarm, ensuring the pipelines function seamlessly. Interested candidates should send their resume to Hunter at [email protected]
Responsibilities
- Experience in creating and managing production-scale Kubernetes clusters, with a deep understanding of Kubernetes networking
- An understanding of writing Infrastructure-as-Code (IaC), using tools like CloudFormation or Terraform
- Support our Kubernetes based projects to resolve critical and complex technical issues
- Performing application deployments on Kubernetes clusters
- Securely managing Kubernetes clusters on at least one cloud provider (AWS, Azure, or GCP)
- Good knowledge of Kubernetes core concepts (Deployment, ReplicaSet, DaemonSet, StatefulSet, Jobs, Secrets, Ingress, Storage services, Networking services)
- Experience in setting up monitoring and alerting for Kubernetes clusters using open-source monitoring tools like Grafana and Prometheus
- Managing resource quotas, loads and cost estimates for different applications at cluster level
- Deeply engage with our stakeholders to understand architecture and operations and work to continuously improve overall Kubernetes support experience.
Qualifications
- 3+ years of industry experience working in Software Engineering, DevOps or Data Engineering with Data Science and MLOps experience.
- 2+ years of experience working with container technologies like Kubernetes and Docker, and creating and maintaining Docker-based microservice APIs
- Certifications in CKA (Certified Kubernetes Administrator) and CKAD (Certified Kubernetes Application Developer) will be an added advantage
- Experience in building and deploying distributed systems in AWS
- Experience with connecting models to data from different sources (databases, APIs, Apache Kafka) via Kubernetes
- Proficiency in Python and scripting languages
- Familiarity with high-performance computing and Linux environments
- Experience with version control and workflow management (Airflow, Nextflow, Argo)
- Experience working with AWS services (S3, RDS, EC2, EBS, SQS, Lambda)
- Experience in building orchestration pipelines to convert plain Python models into a deployable API/RESTful endpoint
- Experience in designing pipelines for ML model serving (Seldon-Core, TorchServe, TensorFlow Serving, NVIDIA Triton Inference Server, etc.)
- Experience with continuous integration/continuous delivery (CI/CD) pipelines for ML applications using GitLab/Jenkins
EEO Employer
Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at [email protected] or .
Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated’s Best of Staffing® in Talent Satisfaction in the United States and Great Place to Work® in the United Kingdom and Mexico.