Analytics Engineer at Datumo #vacancy #remote

Datumo specializes in providing Data Engineering and Cloud Computing consulting services to clients from all over the world, primarily in Western Europe, Poland and the USA. Core industries we support include e-commerce, telecommunications and life science. Our team consists of exceptional people whose commitment allows us to carry out highly demanding projects.

Our team members tend to stick around for more than 3 years, and when a project wraps up, we don’t let them go – we embark on a journey to discover exciting new challenges for them. It’s not just a workplace; it’s a community that grows together! 

What we expect: 

Must-have: 

● at least 3 years of commercial experience in programming

● proven track record with at least one cloud provider (GCP preferred; Azure or AWS also considered)

● good knowledge of a JVM language: Scala, Java or Kotlin

● good knowledge of Python

● good knowledge of SQL

● experience with at least one data warehousing solution: BigQuery, Snowflake, Databricks or similar

● in-depth understanding of big data concerns such as data storage, modeling, processing and scheduling

● data modeling and data storage experience 

● experience ensuring solution quality through automated tests, CI/CD and code review

● proven experience collaborating with business stakeholders

● English at B2 level and communicative Polish

Nice to have: 

● knowledge of dbt, Docker, Kubernetes and Apache Kafka

● familiarity with Apache Airflow or a similar pipeline orchestrator

● knowledge of a second JVM language (Java, Scala or Kotlin)

● experience in Machine Learning projects 

● understanding of Apache Spark or a similar distributed data processing framework

● familiarity with a BI tool such as Power BI, Looker or Tableau

● willingness to share knowledge (conferences, articles, open-source projects) 

What’s on offer: 

● 100% remote work, with workation opportunities

● 20 days off per year

● onboarding with a dedicated mentor

● project switching possible after a certain period 

● individual budget for training and conferences 

● benefits: Medicover private medical care, co-financing of the Medicover Sport card

● opportunity to learn English with a native speaker 

● regular company trips and informal get-togethers 

Development opportunities in Datumo: 

● participation in industry conferences 

● helping build Datumo’s online brand presence

● support in obtaining certifications (e.g. GCP, Azure, Snowflake) 

● involvement in internal initiatives, such as building technology roadmaps

● training budget 

● access to internal technological training repositories 

Discover our example projects:

Cost optimization on Snowflake data platform

Datumo optimized a Snowflake-based platform for a pharmaceutical company, aiming to reduce costs and improve its ELT processes. Before we stepped in, the client was managing 1 petabyte of data across 200 tables. Airflow orchestrated the platform, with Python scripts handling data extraction; the pipelines were built around data snapshots containing hundreds of millions of records. Strategic use of deltas, external tables and shorter Time Travel retention periods cut storage volume by almost 50%.

Analytics engineering on Google Cloud Platform

The project involves building and improving data pipelines on Google Cloud Platform (GCP) to support analytics and data science teams. The goal is to optimize data workflows using Cloud Composer (Apache Airflow) for scheduling, BigQuery for warehousing and Dataproc (Apache Spark) for processing. Key responsibilities include optimizing SQL queries for performance, developing internal libraries that streamline common tasks and promoting data processing best practices. The project also offers a path into data science or MLOps.

Recruitment process: 

● Quiz – 15 minutes 

● Soft skills interview – 30 minutes

● Technical interview – 60 minutes 

Find out more by visiting our website –  

If you like what we do and you dream about creating this world with us – don’t wait, apply now!
