Senior Data Scientist (NLP and LLMs) at Brainly Sp. z o.o. #vacancy #remote

WHAT IS REQUIRED

4+ years of experience with Deep Learning models for NLP and transformer architectures
4+ years of working experience in Python and the PyData stack or other numerical programming languages
Experience with analyzing and producing insights from digital product datasets using both qualitative and quantitative techniques
Experience with modern Cloud Computing (preferably AWS)
Strong theoretical background in at least a few of the following: natural language processing (especially modern language models), high-dimensional classifiers, regression models, clustering algorithms, recommender systems, time-series analysis, Bayesian inference, text analytics, knowledge graphs, representation learning (embeddings), computer vision, or social network analysis
Experience with at least some data analysis and visualization tools such as pandas, dask, vaex, matplotlib, seaborn, plotly, dash, bokeh, shap, streamlit
Fluent verbal and written English skills
Ability to think strategically, connecting the dots in the big picture, framing the right problem, balancing trade-offs, and producing actionable insights for our decision-makers
Ability to synthesize key messages and action items into executive summaries (in different forms) and to present complex ideas and technical findings to non-technical audiences, including in language suited to C-level executives
Ability to convey complex analyses with the most efficient and intuitive visual interactions and data storytelling

WHAT IS PREFERRED

Experience with prompt engineering, fine-tuning, or evaluating LLMs in production
Experience with Hugging Face Transformers or other deep learning models
Experience with text mining and text analytics
Experience with data engineering, ETL jobs, or feature engineering
Knowledge of at least some data engineering technologies such as Spark, Databricks, Glue, EMR, Docker, Kubernetes, SQL, key-value stores, Redshift, Snowflake
Familiarity with ML technologies such as AWS SageMaker, TensorFlow Extended, PyTorch, Spark ML, scikit-learn, XGBoost, Kubeflow, Neptune, Flyte, MLflow, or related frameworks
Experience with cooperating and communicating with the top management to inform decisions via data science
Bachelor’s degree or above in STEM fields (science, technology, engineering, or mathematics) or a similarly quantitative field
Ability to break research down into clearly defined tasks and quick iterations
Ability to work extremely fast, with short feedback loops and in close collaboration with other team members and multiple external stakeholders
Strong analytical thinking – ability to explore data and draw conclusions
Capable of solving problems in an unconventional manner and not getting stuck at obstacles
A scientific mindset with the ability to ask the right questions, as well as answer them
Ability to stay up to date with the latest academic research and implement state-of-the-art methods
Familiarity with agile development and lean principles
Team player attitude

The AI Research Team is dedicated to bridging the gap between machine learning and the business domain. It acts as an incubator for new AI initiatives: running feasibility studies on state-of-the-art solutions applied to our domain, discovering new AI opportunities, supporting existing projects, and providing research capabilities to our leadership. We collaborate with leadership, using our data science and analytics expertise to add value and ensure Brainly’s success.

Specifically, our AI strategy and roadmap invest increasingly in our capacity to make the best use of modern LLMs for question answering, quality assurance, and other educational tasks, and to build domain-specific layers around them (e.g. learners’ personalization), which is one of our main areas of research focus.

You will have the chance to work with top-class scientists, engineers, and domain experts, and to drive the data science and research processes of our LLM-based product features end-to-end.

The ideal candidate is an enthusiast of the educational domain with a blend of coding, machine learning, and statistics skills.

Conduct dedicated research and experiments with the latest state-of-the-art Machine Learning (including NLP, LLMs, Computer Vision, traditional statistical learning…) applied to Brainly data and the education domain.
Provide the CTO and other stakeholders with analytics reports that facilitate decision-making related to AI strategy and roadmap.
Provide suggestions for new uses of AI (e.g. LLM use cases) as part of Brainly product features or optimization of internal processes.
Develop proofs of concept or prototypes that can be further engineered and productized.
Train and share knowledge with the rest of the ML practitioners on those advances.
Develop reusable tools, and scientific and programmatic methodologies, for rapid experimentation and evaluation of a variety of AI applications, e.g. question answering, content quality assurance, language chains, embeddings, personalization, classification, tagging, entity extraction, summarization, paraphrasing, text cleaning, ranking/comparison, information retrieval, object detection…
Partner with the AI Operations team and other human subject matter experts, rapidly providing data science support and programmatic tools to produce the ground truth datasets required to validate our hypotheses or train/calibrate our algorithms.
Assess the behavior of current Brainly technology used in production or in development environments, and provide advanced insights about strengths, weaknesses, biases, content characteristics, users’ intentions, and areas for improvement, e.g. x-raying our internal GPT-based AnswerBot solution over different cohorts.
Collaborate with other teams across the company to provide consulting, initial research, prototypes, and recommendations on which AI/ML techniques and industry practices to implement as part of the R&D of new or existing projects.
Enable Brainly employees who are dealing with AI technologies to learn how to use them and what to expect from them.

Requirements: Python, NLP, Deep learning, transformers, AWS, LLMs

Additionally: Sport subscription, Training budget, Private healthcare, Dental Care Package, Stock options, AskHenry, Mental Health Helpline.

