Senior Backend Software Engineer, Data Pipelines (Remote – Palo Alto, CA)

About Skyflow:
Skyflow is a data privacy vault company built to radically simplify how companies isolate, protect, and govern their customers' most sensitive data. With its global network of data privacy vaults, Skyflow is also a comprehensive solution for companies around the world looking to meet complex data localization requirements. Skyflow currently supports a diverse customer base that spans verticals like fintech, retail, travel, and healthcare. Skyflow is headquartered in Palo Alto, California and was founded in 2019. For more information, visit the Skyflow website or follow Skyflow on X (formerly Twitter) and LinkedIn.

About the role:
As a Senior Backend Software Engineer, Data Pipelines, you will be responsible for designing and developing complex data processing workflows and for batch processing large volumes of structured and semi-structured data in cloud-native environments. You will do this by leveraging Kafka for asynchronous queues, Docker for containerization, and Kubernetes for orchestration to achieve high levels of efficiency, scalability, and reliability.

Desired Qualifications:
- Solid experience with Golang (must-have), developing highly available, scalable, production-level code for microservices, batch architectures, and lambda architectures
- Excellent understanding of data structures; you are comfortable keeping a continuous flow of data structures in memory (see the first sketch at the end of this posting)
- Experience working with multiple file formats (CSV, JSON, Parquet, Avro, Delta Lake, etc.)
- Knowledge of data warehouse technical architectures, Docker and Kubernetes infrastructure components, and how to develop secure ETL pipelines
- Experience with pub/sub systems such as Kafka
- Experience working in a big data environment (Hadoop, Spark, Hive, Redshift, etc.)
- Experience with relational and non-relational databases
- Experience building real-time streaming data pipelines is a plus

Responsibilities:
- Containerize each component of the data pipeline (ETL processes, databases, data processing applications) by writing Dockerfiles and building Docker images
- Set up Kubernetes clusters to manage and orchestrate Docker containers; deploy Pods and create Services and load-balancing policies
- Use Kubernetes volumes to manage data and stateful applications, ensuring that data persists beyond the lifespan of individual Pods
- Configure Kafka for scalability, ensuring it can handle high volumes of data streams efficiently; configure Kafka brokers, topics, producers, and consumers, and use Kafka Connect to integrate with external databases, systems, or other data sources/sinks (see the second sketch at the end of this posting)
- Implement logging and monitoring solutions to track the health and performance of your data pipelines
- Troubleshoot connectivity issues with common datastores such as Amazon S3 and Azure Data Lake
- Implement network policies in Kubernetes for secure communication between different services
- Follow security best practices, such as securing Kafka clusters and implementing proper access controls

Benefits:
- Work-from-home expense (U.S., Canada, and Australia)
- Excellent health, dental, and vision insurance options (varies by country)
- Vanguard 401(k)
- Very generous PTO
- Flexible hours
- Generous equity

Pay:
A base salary range of $150,000 – $200,000 can be expected for this role in the San Francisco/Bay Area. You may also be eligible for an additional incentive bonus or variable pay, equity, and benefits.
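To illustrate the in-memory data-flow qualification above, here is a minimal Go sketch using only the standard library's encoding/csv. The record type, two-column layout, and streamCSV helper are hypothetical, chosen only to show channel-based row streaming, where rows are decoded one at a time rather than materializing the whole file in memory.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"log"
	"strings"
)

// record is a hypothetical row shape used only for this sketch.
type record struct {
	ID   string
	Name string
}

// streamCSV decodes rows one at a time and sends them down a channel,
// so the full file never has to be held in memory at once.
func streamCSV(r io.Reader) <-chan record {
	out := make(chan record)
	go func() {
		defer close(out)
		cr := csv.NewReader(r)
		for {
			row, err := cr.Read()
			if err == io.EOF {
				return
			}
			if err != nil {
				log.Printf("skipping bad row: %v", err)
				continue
			}
			out <- record{ID: row[0], Name: row[1]}
		}
	}()
	return out
}

func main() {
	// In a real pipeline this reader would wrap a file or object-store stream.
	input := "1,alice\n2,bob\n"
	for rec := range streamCSV(strings.NewReader(input)) {
		fmt.Printf("%+v\n", rec)
	}
}
```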
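To make the Kafka producer/consumer responsibility concrete, here is a minimal Go sketch using the third-party segmentio/kafka-go client; the library choice, broker address, topic name, and consumer group ID are all illustrative assumptions, not details from the posting.

```go
package main

import (
	"context"
	"log"

	"github.com/segmentio/kafka-go"
)

func main() {
	ctx := context.Background()

	// Producer: write a record to the (hypothetical) "events" topic.
	writer := &kafka.Writer{
		Addr:     kafka.TCP("localhost:9092"),
		Topic:    "events",
		Balancer: &kafka.LeastBytes{},
	}
	defer writer.Close()

	if err := writer.WriteMessages(ctx,
		kafka.Message{Key: []byte("record-1"), Value: []byte(`{"status":"ok"}`)},
	); err != nil {
		log.Fatalf("produce: %v", err)
	}

	// Consumer: read from the same topic as part of a consumer group,
	// so offsets are tracked and partitions are balanced across instances.
	reader := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"localhost:9092"},
		GroupID: "pipeline-workers",
		Topic:   "events",
	})
	defer reader.Close()

	msg, err := reader.ReadMessage(ctx)
	if err != nil {
		log.Fatalf("consume: %v", err)
	}
	log.Printf("consumed %s = %s", msg.Key, msg.Value)
}
```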
Skyflow operates from a place of high trust and transparency; we are happy to disclose the pay range for any of our open roles that aligns with your needs. Exact compensation may vary based on skills, experience, education, and location.

At Skyflow, we believe that diverse teams are the strongest teams. We invite applicants of all genders, races, ethnicities, nationalities, ages, religions, sexual orientations, disability statuses, educational experiences, family situations, and socio-economic backgrounds.