Senior Data Engineer – Remote at Convey Health Solutions Holdings Inc. #vacancy #remote

We are seeking a highly skilled Senior Data Engineer with expertise in SQL, Python, and PySpark, and extensive hands-on experience with AWS services. The ideal candidate will have a strong background in data engineering, with a focus on building scalable and efficient data pipelines. The person in this role will ingest, process, and serve a wide array of healthcare data, including but not limited to eligibility, claims, payments, and risk adjustment.

KEY DUTIES AND RESPONSIBILITIES:

* Design, develop, and maintain robust data pipelines using Python and PySpark to process large volumes of healthcare data efficiently in a multi-tenant analytics platform.

* Collaborate with cross-functional teams to understand data requirements, implement data models, and ensure data integrity throughout the pipeline.

* Optimize data workflows for performance and scalability, considering factors such as data volume, velocity, and variety.

* Implement best practices for data ingestion, transformation, and storage in AWS services such as S3, Glue, EMR, Athena, and Redshift.

* Model data in relational databases (e.g., PostgreSQL, MySQL) and file-based data stores to support data processing requirements.

* Design and implement ETL processes using Python and PySpark to extract, transform, and load data from various sources into target databases (a minimal illustrative sketch follows this list).

* Troubleshoot and enhance existing ETL jobs and processing scripts to improve the efficiency and reliability of data pipelines.

* Develop monitoring and alerting mechanisms to proactively identify and address data quality issues and performance bottlenecks.
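To illustrate the kind of pipeline work this role involves, here is a minimal PySpark ETL sketch: read raw claims extracts from S3, standardize a few fields, and write partitioned Parquet for downstream query engines such as Athena. The bucket paths, column names, and schema are hypothetical placeholders for illustration, not Convey's actual pipeline.

```python
# Minimal PySpark ETL sketch: read raw claims from S3, standardize,
# and write partitioned Parquet for downstream consumption.
# All paths, column names, and the schema are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("claims-etl-example")
    .getOrCreate()
)

# Extract: raw claim extracts land in S3 as CSV (illustrative path).
raw = (
    spark.read
    .option("header", True)
    .csv("s3://example-bucket/raw/claims/")
)

# Transform: normalize types, drop rows missing a claim ID,
# and derive a month column to partition the output by.
claims = (
    raw
    .withColumn("paid_amount", F.col("paid_amount").cast("decimal(12,2)"))
    .withColumn("service_date", F.to_date("service_date", "yyyy-MM-dd"))
    .filter(F.col("claim_id").isNotNull())
    .withColumn("service_month", F.date_format("service_date", "yyyy-MM"))
)

# Load: write Parquet partitioned by month so query engines
# can prune partitions instead of scanning the full dataset.
(
    claims.write
    .mode("overwrite")
    .partitionBy("service_month")
    .parquet("s3://example-bucket/curated/claims/")
)

spark.stop()
```

Partitioning the curated output by month is a common design choice for this kind of workload: engines like Athena and Redshift Spectrum can prune partitions at query time and scan far less data per query.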

EDUCATION AND EXPERIENCE:

* Minimum of 5 years of experience in data engineering, with a focus on building and optimizing data pipelines.

* Expertise in Python programming and hands-on experience with SQL and PySpark for data processing and analysis.

* Proficiency in Python frameworks and libraries for scientific computing (e.g., NumPy, pandas, SciPy, PyTorch, PyArrow).

* Strong understanding of AWS services and experience in deploying data solutions on cloud platforms.

* Experience working with healthcare data, including but not limited to eligibility, claims, payments, and risk adjustment datasets.

* Expertise in modeling data in relational databases (e.g., PostgreSQL, MySQL) and file-based data stores, as well as in ETL processes and data warehousing concepts.

* Proven track record of designing, implementing, and troubleshooting ETL processes and processing scripts using Python and PySpark.

* Excellent problem-solving skills and the ability to work independently as well as part of a team.

* Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.

* Relevant certifications in AWS or data engineering would be a plus.

Tags: PostgreSQL, PySpark, Amazon S3, pandas, data warehouse, Python, Amazon Web Services (AWS), Amazon Redshift, Data Engineering, Amazon Athena, Amazon EMR, NumPy, SQL, ETL, AWS Glue, MySQL, PyArrow, SciPy, PyTorch