Stacklet helps organizations with large cloud estates optimize cost, improve security, and ensure compliance by simplifying and automating all aspects of governance via code. Our company was founded by the creators and maintainers of CNCF’s Cloud Custodian, an open-source project used today by thousands of well-known global brands.
Our Stacklet Platform is an award-winning governance-as-code solution that enables teams to identify and remediate cloud governance issues while establishing preventative guardrails against their recurrence. Renowned for supporting some of the world’s largest cloud service consumers, Stacklet Platform helps mitigate cloud waste and risk at scale.
ROLE DESCRIPTION / OVERVIEW
Site Reliability Engineers (SREs) are responsible for keeping all user-facing services and other Stacklet production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our environments and the Stacklet codebases.
AS AN SRE YOU WILL
- Be on a PagerDuty rotation to respond to availability incidents and provide support for service engineers with customer incidents.
- Use your on-call shift to prevent incidents from ever happening.
- Make monitoring and alerting fire on symptoms, not on outages.
- Document every action so your findings turn into repeatable actions, and then into automation.
- Improve the deployment process to make it as boring as possible.
- Design, build, and maintain core infrastructure pieces that allow successful scaling of the Stacklet platform.
- Debug production issues across services and levels of the stack.
- Partner with development teams to improve services through rigorous testing and release procedures.
- Participate in system design consulting, platform management, and capacity planning.
- Create sustainable systems and services through automation and uplifts.
- Balance feature development speed and reliability with well-defined service level objectives.
WHAT WE’RE LOOKING FOR
- A mind for systems – edge cases, failure modes, behaviors, specific implementations.
- You know your way around Linux and the Unix shell.
- You have strong programming skills in Python and/or Go.
- You collaborate and communicate asynchronously.
- You document all the things so you don’t need to learn the same thing twice.
- You have an enthusiastic, go-for-it attitude. When you see something broken, you can’t help but fix it.
- You love delivering quickly and iterating fast.
YOU’LL ENJOY WORKING WITH US BECAUSE
- 100% remote. Slack, Google Meet, Zoom, and more for communication
- Company laptop for development
- Home office budget
- GitHub and Jira, lightweight agile process
- Work with new technologies
- AWS training and certification opportunities
- Personal growth experience by helping to build a truly successful company
- Participate in, and help to shape, a great culture
- Work with a stellar team where you can have a huge impact
- Work hard, play hard – regular company-wide fun virtual events, from games to happy hours
- Travel 2 to 4 times a year for internal and external events
- Career growth with opportunities to earn advancement
- Equity compensation and benefits
Stacklet believes a diverse workforce enhances our ability to deliver world-class products and services. We are committed to ensuring equal employment opportunities for all qualified individuals. Qualified applicants will receive consideration for employment without regard to their race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.
Go, Python, Amazon Web Services (AWS), Unix shell, Site Reliability Engineering (SRE), Linux, PagerDuty