Senior DevOps/Site Reliability Engineer @ Focal Systems #vacancy #remote

Solid experience in an infrastructure or Site Reliability Engineer (SRE) role
Hands-on experience with containerization (Docker) and orchestration platforms (Kubernetes) required
Experience in cloud cost management
Strong understanding of SQL, networking, distributed systems, operating systems (Debian) and software engineering practices
Experience with messaging systems
Experience with Terraform or another Infrastructure as Code (IaC) automation tool
Experience operating relational SQL databases and Redis at terabyte scale
Proven experience with setting up monitoring/alerting and reliability engineering
Scripting skills in Python
Flexibility in adjusting working hours to meet the needs of a global team spread across time zones (urgent tasks or meetings may occur outside regular CEST working hours)
Startup mentality, team player and strong sense of responsibility
Fluent in Polish with advanced English skills (written & spoken)
Excellent communication, presentation, and demonstration skills

Nice-to-have experience:

GitOps
Setting up automation for complex load testing scenarios
Tuning Deep Learning pipelines with Python, PyTorch and multiprocessing
Backend programming with Python

Note: salary range includes the value of stock options

Focal Systems is the industry leader in retail AI solutions. We are a Silicon Valley-based startup that has more than doubled in size every year since inception. We are a Deep Learning-first company. Our mission is to automate and optimize brick-and-mortar retail using deep learning computer vision. Focal Systems has been deployed at scale with the top retailers in the world. We are looking for smart, creative and passionate people who want to help build a great and enduring company and deploy Deep Learning to the world!

You’re a self-starter and enjoy taking ownership. You can effectively manage tasks and projects and solve problems. You’re a dynamic engineer who communicates effectively and enjoys a collaborative environment. You want to have a big impact and like to push trends in infrastructure.

Mission of the role:
To enable us to scale from 200k to 1 million cameras

Job Summary

As a Senior DevOps/Site Reliability Engineer (SRE) at our company, you will play a pivotal role in ensuring the smooth operation and continuous improvement of our infrastructure, deployment processes, and overall system reliability.

Why Focal Systems

Strong Values and Mission – We are a tightly knit team with an ambitious mission and a strong set of core values, which define our approach to business and have successfully guided us since inception.

Exceptional Team – We are a team of hard-working, fun-loving professionals from some of the most eminent universities, research labs, and tech companies of our time. We pride ourselves on recruiting exceptional individuals to help us redefine the state-of-the-art.

Outstanding Partners – We work with 10+ of the largest retailers in the world and have a world-class roster of investors, advisors and partners to support & advise us in our endeavors.

Benefits

We care deeply about the health, happiness, and wellbeing of all of our employees. We offer:

Competitive Salary & Attractive Stock
Paid Time Off
Quarterly Team Retreats
Education grants

Responsibilities

Set up and manage blue/green and canary deployments to ensure smooth launches without downtime
Operate multiple large GCP Kubernetes clusters and fine-tune them for reliability vs. cost
Manage the company’s various distributed services, always ensuring graceful updates, comprehensive test coverage, log tracking, and 99.9% uptime
Work with Backend, Frontend and Deep Learning teams and write infrastructure automation code for their needs
Identify scalability bottlenecks through load testing and plan infrastructure architecture
Create tools to provide transparency/ease of access into the company’s rich datasets stored across varying geographic locations and data formats
Design, build, and manage a robust Continuous Integration and Continuous Deployment (CI/CD) pipeline

Requirements: SQL, Python, Docker, Kubernetes, Networking, GCP, Kafka, Redis, SRE, PUB, Terraform, Infrastructure as Code, GitOps, Deep learning, PyTorch

Tools: Jira, Confluence, Git, GitHub

Additionally: Small teams, International projects, Training budget, Flat structure, Startup atmosphere, No dress code

