Global EdTech powerhouse! Catch It Play is looking for a Data Engineer to lead the data and ML platform behind new product and engine development, built on 970 million learning-behavior data points, and to help change the future of AI education.
We are seeking a data engineer who can reliably handle large-scale learning logs and be responsible for the ML model infrastructure. We want someone who can start from the data platform and gradually expand their role into MLOps.
• Someone who can consider both the stability and cost of data pipelines.
• Someone who can discuss the necessity and context of features with ML engineers.
• Someone who prefers an approach that improves gradually from an MVP rather than seeking perfection from the start.
• Someone who takes responsibility for tracing the root cause of failures and preventing their recurrence.
• Someone interested in expanding their role from the data platform to the ML platform and MLOps.
A. Data Platform (~60%)
• Design, implement, and operate large-scale event, log, and learning data collection and processing pipelines from live services.
• Ensure stable operation of **ETL/ELT pipelines** using workflow tools like Airflow, including failure handling, performance optimization, and cost efficiency.
• Design and build data warehouses (DW) and data marts (DM), and turn them into reusable data assets in a form that is usable by analysts, ML engineers, and planning teams.
• Establish data reliability through data quality management and governance (metadata, catalogs, access control).
• Lead architecture improvements for cloud-based data infrastructure (AWS, GCP, etc.) with scalability and cost efficiency in mind.
B. ML Platform and MLOps Adjacent Areas (~40%)
• Collaborate with the ML team to operate data pipelines for training and serving recommendation, matching, and churn-prediction models, and manage the **Feature Store**.
• Jointly operate model serving and monitoring infrastructure with the ML team to ensure low-latency inference and operational stability.
• Implement data and performance drift detection and alerting systems.
• More than 3 years of hands-on experience in data engineering
• Proficient in Python and SQL
• Experience in processing large-scale user logs (high-volume event processing)
• Hands-on experience with distributed processing frameworks such as Spark and Flink
• Experience in building ETL pipelines (Airflow, Prefect, etc)
• Understanding of the differences between batch and streaming data processing, and the ability to design for both
• Experience in building data infrastructure in cloud environments (AWS, GCP, etc)
• Experience in operating container environments based on Docker / Kubernetes
• Experience managing infrastructure as code (IaC) with tools such as Terraform
• Experience in building real-time streaming platforms (Kafka, Kinesis, etc)
• Experience operating large-scale data warehouses for analytics (BigQuery, Redshift, Snowflake, etc)
• Experience building or operating Feature Stores (Feast, etc)
• Experience building ML model training and serving pipelines (MLflow, Kubeflow, etc)
• Experience with LLM/large model inference infrastructure (vLLM, TGI, etc)
• Experience in detecting data/performance drift in ML models (Evidently, WhyLabs, etc)
• Experience managing inference latency and availability SLAs (Prometheus + Grafana, etc.)
• Experience processing large-scale user behavior logs in EdTech, gaming, or recommendation services
• Experience contributing to open source, presenting at tech conferences, or publishing papers
Benefits and Work Environment
• 🏠 Fully remote work - a productive, fully remote work environment available from anywhere in the country.
• 📊 Stock option program - stock options for key R&D personnel (grants considered after 1 year of stable employment).
• 📈 Global growth experience - hands-on experience with a trend-setting product that is growing globally toward a goal of 10 million downloads (Google features, etc.).
• 💼 Core system development experience - direct involvement in infrastructure and system development in a unique core business area where gaming and AI converge.
• 🌴 Jeju office & refresh - support for taking a refreshing break, such as working from the Jeju headquarters office.
• 📚 Self-development support - support for self-development such as books and online courses.
• 💪 Health management support - health check-up support / in-house health management programs.
• ❤️ Fun sports culture - monthly sports challenges that foster friendly competition and cooperation.
• Application documents: resume, cover letter, portfolio, or samples of your own work (clearly indicate the parts you worked on).
• Recruitment process: Document and portfolio review → First interview (online) → Second interview (online) → Final interview (in person) → Announcement of results.
• The interview process may include a task (under 1 day) or a test.
[For details, please refer to the Notion page below]
https://catchitplay.notion.site/AI-Mid-Senior-36098f74ee5a8003a68ac81fc502eca9