• Analyzed US cab industry business data using exploratory data analysis and ML algorithms to recommend which firm to invest in.
• Deployed the models on Heroku using the Flask framework, with the models serialized using the pickle library.
• Used YAML and JSON to build a data ingestion pipeline for large datasets (3 GB+), and compared the performance of Dask, PySpark, csv.DictReader, datatable fread, and pandas.
Skills used: Python libraries (tensorflow/causallib/scikit-learn/logging/pandas/matplotlib/plotly/seaborn/numpy); Flask, Docker, Streamlit, Agile (Jira/Scrum/Kanban/Bitbucket), Postman (API), EDA, data engineering, data munging, data visualization, model deployment, A/B testing, problem solving, feature engineering, stakeholder communication, big data, cloud