Catch It Play, a global powerhouse in EdTech, is changing the future of AI education. We are looking for a Data Engineer to lead the data and ML platform behind new product and engine development, built on 970 million learning-behavior data points.
We seek a Data Engineer who can reliably handle large-scale learning logs and take responsibility for the ML model infrastructure as well. We want someone who can start from the data platform and expand their role into the MLOps area.
• Someone who can consider both the stability and cost of data pipelines.
• A person who can discuss the necessity and context of features with ML engineers.
• Someone who prefers to ship an MVP first and improve it gradually rather than aiming for perfection from the start.
• A responsible individual who traces failures to their root cause and works to prevent recurrence.
• Someone interested in expanding their role from the data platform into the ML platform and MLOps area.
A. Data Platform (~60%)
• Design, implement, and operate data collection and processing pipelines for large-scale events, logs, and training data generated from live services.
• Ensure stable operation and optimize performance and cost efficiency of **ETL/ELT pipelines** using workflow tools like Airflow.
• Design, build, and manage as assets the data warehouses (DW) and data marts (DM), in a format that is useful for analysts, ML engineers, and planning teams.
• Establish data quality management and governance (metadata, catalogs, access control) to ensure data reliability.
• Lead architecture improvements considering scalability and cost efficiency for cloud-based data infrastructure (AWS, GCP, etc.).
B. ML Platform & MLOps Adjacent Area (~40%)
• Collaborate with the ML team to operate the **Feature Store** and the data pipelines for training and serving recommendation, matching, and churn prediction models.
• Operate model serving and monitoring infrastructure, ensuring low-latency inference environments and operational stability.
• Implement data and performance drift detection and alerting systems.
• Over 3 years of practical experience in data engineering.
• Proficient in Python and SQL.
• Experience processing large-scale user logs (high-volume event processing).
• Practical experience with distributed processing frameworks like Spark and Flink.
• Experience building ETL pipelines (e.g., Airflow, Prefect).
• Someone who understands the difference between batch and streaming data processing and can design for both.
• Experience building data infrastructure in cloud environments (AWS, GCP, etc.).
• Experience operating container environments based on Docker / Kubernetes.
• Experience managing infrastructure as code (IaC) with tools such as Terraform.
• Experience building real-time streaming platforms (e.g., Kafka, Kinesis).
• Experience operating large-scale data warehouses for analytics (e.g., BigQuery, Redshift, Snowflake).
• Experience in building or operating a Feature Store (e.g., Feast).
• Experience in constructing ML model training and serving pipelines (e.g., MLflow, Kubeflow).
• Experience in LLM/large model inference infrastructure (e.g., vLLM, TGI).
• Experience detecting data/performance drift in ML models (e.g., Evidently, WhyLabs).
• Experience managing inference latency and availability SLA (e.g., Prometheus + Grafana).
• Experience handling large-scale user behavior logs in EdTech, gaming, or recommendation services.
• Experience contributing to open source, presenting at technical conferences, or publishing papers.
Benefits and Work Environment
• 🏠 Fully remote work - a productive, fully remote environment that lets you work from anywhere in the country.
• 📊 Stock options program - stock options for key R&D personnel (grant assessed after one year of stable employment).
• 📈 Global growth experience - hands-on experience growing a trending product that is aiming for 10 million downloads (e.g., Google Feature).
• 💼 Core system development experience - direct involvement in infrastructure and system development in a unique core business area that merges gaming and AI.
• 🌴 Jeju Office & Refresh - support for refreshing experiences, including the possibility of working at the Jeju headquarters office.
• 📚 Self-development support - support for self-development such as books and online classes.
• 💪 Health management support - support for health check-up expenses / in-house health management programs.
• ❤️ Fun sports culture - monthly sports challenges build a culture of competition and collaboration.
• Documents to submit: resume, cover letter, and a portfolio or samples of your own work (clearly state the parts you worked on).
• Hiring process: document and portfolio review → 1st practical interview (online) → 2nd interview (online) → final interview (offline) → announcement of results.
• The interview process may include an assignment (less than one day of work) or a test.
[For details, refer to the Notion page below]
https://catchitplay.notion.site/AI-Mid-Senior-36098f74ee5a8003a68ac81fc502eca9