1. Configured and implemented a full data engineering and analysis pipeline:
- Implemented asynchronous tasks with Celery to upload incoming sensor data to a data lake, ensuring timely and accurate data storage.
- Performed data quality assessments to validate dataset accuracy and reliability, ensuring integrity in subsequent processing and analysis.
- Integrated validated datasets into a centralized data warehouse, simplifying access for analytics and reporting.
- Automated the end-to-end pipeline with Apache Airflow, resulting in a 20% increase in performance.
- Configured Airflow alerts and monitoring tools to provide real-time notifications on workflow status.
2. Statistical analysis, data analysis, and machine learning model implementation:
- Analyzed 1M+ rows of data from 300+ users daily.
- Developed an algorithm capable of accurately detecting complex driving maneuvers, including U-turns, rapid turns, overtaking, and lane changes.
- Integrated the algorithm into the existing data processing pipeline, ensuring seamless operation and real-time behavior analysis.
- Together with the team, used the algorithm’s outputs to implement a driving behavior scoring algorithm for usage-based insurance, significantly enhancing road safety.
- Developed a supervised SVM model with Fourier and wavelet transform preprocessing, achieving 92% accuracy in distinguishing between road and sidewalk classes.
- Communicated real-time findings using Streamlit (Python) and Shiny (R).
- Built scorecard dashboards in Tableau.