Job Description:
- Build, maintain, and operate IaaS and PaaS infrastructure in Azure commercial and government clouds.
- Work closely with dev teams to identify and measure SLOs, SLAs, and SLIs.
- Act as a strong contributor to the development of platform services, including architecture, provisioning, configuration, deployment, and support.
- Perform integrations with central logging, metrics dashboards, instrumentation, and incident monitoring and management.
- Build, integrate, and administer systems and tools (dashboards, APMs) that enable engineering teams to observe their applications in production with autonomy.
- Support software and/or cloud infrastructure on an on-call rotation basis.
- Assist with identifying and remediating technical problems at the root cause by continuously implementing automation, self-healing, and real-time monitoring for production systems.
- Maintain and improve operational tooling and frameworks.
- Build frameworks that test the performance and resiliency of our platform services and tools.
- Automate alerts for metrics on performance, cost, vulnerabilities, risk, and compliance violations.
- Improve processes and champion the automation of any manual support tasks.

Top Skills Required:
- Azure, Terraform, and Kubernetes
- Software development background
Additional Skills & Qualifications:
- 4+ years of experience in a cloud engineer/SRE role.
- Expert knowledge of a cloud service provider.
- Expert knowledge of, and hands-on production experience with, Kubernetes (bare metal or managed) cluster setup and management (required).
- Experience with infrastructure as code (IaC) tools such as Terraform and Pulumi.
- Experience with Kubernetes deployment tools such as Helm, ArgoCD, and Flux.
- Strong awareness of networking and internet protocols.
- Understanding of identity and access management (IAM).
- Experience supporting infrastructure in production cloud environments.
- Knowledge of encryption and Public Key Infrastructure (PKI), and an understanding of OWASP.
- Experience working with RESTful services.
- Some experience with monitoring tools (Azure Monitor, Splunk, Dynatrace, Grafana, Prometheus).
- Familiarity with IDEs and source control tools such as Visual Studio Code and Git.

Employee Value Proposition (EVP): This is an opportunity to build out a new cloud environment for the customer and to design the architecture that will host billions of dollars' worth of government contracts going forward. This person will be a key member of the team and will be relied upon to provide direction to a growing team of Cloud Engineers.
Git REST Splunk kubernetes-helm Cloud Engineer azure-monitor Infrastructure as Code (IaC) pulumi Terraform encryption Azure fluxcd OWASP argocd pki DevOps Prometheus Kubernetes Visual Studio Code amazon-iam Site Reliability Engineering (SRE) dynatrace