Role: Machine Learning Engineer/SRE
Location: Chicago, IL or Remote
Duration: 12 Months
Rate: DOE
US citizens, Green Card holders, and GC-EAD holders only. No third-party C2C available for this job.

Job Description:
We are seeking a highly skilled and motivated Machine Learning Engineer with expertise in developing, deploying, and managing machine learning models. In this role, you will be an integral part of our AI Engineering and Site Reliability Engineering (SRE) teams, responsible for managing Azure infrastructure for AI model development and deployment, monitoring and reporting model performance, and responding to outages and incidents related to model operations.

Key Responsibilities:
Manage Azure Infrastructure: Configure, maintain, and optimize Azure infrastructure for AI model development and deployment, ensuring scalability and performance.
Model Performance Monitoring: Implement and maintain monitoring systems to track model performance, proactively identifying and addressing issues as they arise.
Incident Response: Collaborate with the SRE team to respond promptly to outages and incidents related to model operations, ensuring minimal downtime and rapid issue resolution.

Skills and Qualifications:
Azure Infrastructure Experience: Proficiency in managing Azure infrastructure components, including virtual machines, storage, and networking, to support AI model development and deployment.
CI/CD Pipeline Experience: Experience with Continuous Integration/Continuous Deployment (CI/CD) pipelines, including the automation of model deployment processes.
Containerization in the Cloud: Strong knowledge of containerization technologies in the cloud, such as Docker and Kubernetes, for efficient deployment and scaling of machine learning models.
Machine Learning Expertise: Proficiency in building and optimizing machine learning models, with a deep understanding of various ML algorithms and frameworks.
Programming Skills: Proficiency in programming languages commonly used in machine learning, such as Python, and in libraries such as TensorFlow and PyTorch.
Data Management: Experience in data preprocessing, feature engineering, and data pipeline development for machine learning.
Collaborative Team Player: Excellent communication skills and the ability to work collaboratively with cross-functional teams, including AI engineers and SREs.
Documentation: Effective documentation skills to maintain clear and organized records of models, infrastructure configurations, and incident responses.

Preferred Qualifications:
Experience with cloud-based machine learning platforms (e.g., Azure Machine Learning).
Experience with CI/CD tools for deploying ML services and applications on the Azure cloud platform.
Familiarity with DevOps practices and tools for automating infrastructure and deployments.
Knowledge of model versioning and model management tools.
Understanding of security best practices in AI model deployment.
Certifications in relevant areas, such as Azure certifications or machine learning certifications.

Job titles of people with these skills may vary, e.g., MLOps Lead, MLOps Solution/Delivery Architect, or Senior ML Engineer.