- Minimum 8 years of relevant professional experience including Java/Scala and Spark
- Proficiency in all aspects of SDLC, from concept to running production systems
- Proficiency using Spark (PySpark) or TensorFlow
- Experience participating in ETL and ML pipeline projects based on Airflow, Kubeflow, MLeap, SageMaker, or similar
- AWS experience including Kafka, Lambda, Glue, Athena, IAM
- Large-scale database experience with both SQL and NoSQL databases such as PostgreSQL, Cassandra, Neo4j, Neptune, or similar
- Experience with large-scale data management formats and frameworks such as Parquet, ORC, Databricks/Delta Lake, Iceberg, or Hudi
- Bachelor’s degree in Computer Science or related discipline
We are looking for Data Engineers to work remotely for an adtech company that leverages machine learning and data science to build an identity graph that can scale to reach millions of users via brands with programmatically selected households. The work includes scaling our Big Data asset, which combines billions of transaction data points (including intent, conversions, and first-party data) into an identity graph that must scale to a future cookieless world.
We value technical excellence and you will have both resources and time to deliver world-class code.
This is a 100% remote position; you will be working with team members in NYC.
If you like solving hard, technically challenging problems, join us and use those skills to create real-time, concurrent, globally distributed systems, applications, and services.
Recruitment process:
- Short call with Varwise
- Initial call with hiring manager or team member
- Take home challenge
- Follow-up and final technical interview session
Responsibilities:
- Create and maintain reliable and scalable distributed data processing systems
- Become a core maintainer of the data lake, building searchable data sets for broader business use
- Scale, troubleshoot, and fix existing applications and services
- Own a complex set of services and applications
- Focus on ensuring that our data pipelines run 24/7
- Lead technical discussions that drive improvements in tools, processes, and projects
- Scale our identity graph to deliver impactful advertising campaigns
- Work on data sets exceeding billions of records, on AWS-based infrastructure
- Scale our MLOps platform using both traditional ML and LLM/generative AI based applications

Requirements: Spark, AWS, Linux, NoSQL, SQL, Kafka, Java, Scala, AWS Lambda, Glue, Athena, Cassandra, Neo4j, Databricks, PySpark, Kinesis, Airflow, Jenkins, Datadog, Python, TensorFlow

Tools: Jira, GitHub, Git, Jenkins, Agile, Kanban

Additionally: small teams, international projects, team events, 100% remote always, international team, flat structure, free coffee, bike parking, playroom, free snacks, free beverages, modern office, no dress code, in-house trainings