Site Reliability Engineer at Stacklet #vacancy #remote

Stacklet helps organizations with large cloud estates optimize cost, improve security, and ensure compliance by simplifying and automating all aspects of governance via code. Our company was founded by the creators and maintainers of CNCF’s Cloud Custodian, an open-source project used today by thousands of well-known global brands.

Our Stacklet Platform is an award-winning governance-as-code solution that enables teams to identify and remediate cloud governance issues while establishing preventative guardrails against their recurrence. Trusted by some of the world’s largest cloud consumers, the Stacklet Platform helps mitigate cloud waste and risk at scale.

ROLE DESCRIPTION / OVERVIEW

Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other Stacklet production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our environments and the Stacklet codebases.

AS AN SRE YOU WILL

Be on a PagerDuty rotation to respond to availability incidents and support service engineers with customer incidents.
Use your on-call shift to prevent incidents from ever happening.
Make monitoring and alerting fire on early symptoms rather than on outages.
Document every action so your findings turn into repeatable procedures, and then into automation.
Improve the deployment process to make it as boring as possible.
Design, build, and maintain the core infrastructure that allows the Stacklet platform to scale successfully.
Debug production issues across services and levels of the stack.
Partner with development teams to improve services through rigorous testing and release procedures.
Participate in system design consulting, platform management, and capacity planning.
Create sustainable systems and services through automation and continuous improvement.
Balance feature development speed and reliability with well-defined service level objectives.

WHAT WE’RE LOOKING FOR

A mind for systems: edge cases, failure modes, behaviors, and specific implementations.
You know your way around Linux and the Unix shell.
Strong programming skills in Python and/or Go.
You collaborate and communicate well asynchronously.
You document all the things so you don’t need to learn the same thing twice.
You have an enthusiastic, go-for-it attitude. When you see something broken, you can’t help but fix it.
You love delivering quickly and iterating fast.

YOU’LL ENJOY WORKING WITH US BECAUSE

100% remote, with Slack, Google Meet, Zoom, and more for communication
Company laptop for development
Home office budget
GitHub and Jira, lightweight agile process
Work with new technologies
AWS training and certification opportunities
Personal growth from helping to build a truly successful company
Participate in, and help to shape, a great culture
Work with a stellar team where you can have a huge impact
Work hard, play hard: regular company-wide virtual fun events, from games to happy hours
Travel 2 to 4 times a year for internal and external events
Career growth with opportunities to earn advancement
Equity compensation and benefits

Stacklet believes a diverse workforce enhances our ability to deliver world-class products and services. We are committed to ensuring equal employment opportunity for all qualified individuals. Qualified applicants will receive consideration for employment without regard to their race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.

