Machine Learning & Data Analysis in Python

A Machine Learning Project on Classifying Happy or Unhappy People

Purpose: To find the factors associated with happiness.
Workings: Removing unnecessary text from all 31 columns, encoding categorical variables using one-hot, dummy, and ordinal encoding, selecting features using the Boruta feature selection method, selecting training data, building models, creating classification reports and confusion matrices, plotting ROC curves, selecting the best model based on F1 score and AUC.
Accomplishment: Achieved 93% accuracy with the Gradient Boosting model when classifying people as happy or unhappy.
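
The workflow above (encoding, training, reporting, scoring) could look roughly like the sketch below. It is an illustration only, not the project's code: the data is synthetic, the column names (employment, health, sleep_hours, happy) are hypothetical, and the Boruta step is left out because it needs a separate third-party package.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (classification_report, confusion_matrix,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Synthetic stand-in data; every column name and value here is hypothetical.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "employment": rng.choice(["employed", "student", "retired"], size=n),
    "health": rng.choice(["poor", "fair", "good"], size=n),
    "sleep_hours": rng.normal(7, 1.5, size=n),
})
health_rank = df["health"].map({"poor": 0, "fair": 1, "good": 2})
# Tie the label loosely to the features so the model has a signal to learn.
score = health_rank + 0.5 * (df["sleep_hours"] - 7) + rng.normal(0, 0.7, size=n)
df["happy"] = (score > 1).astype(int)

X = df.drop(columns="happy")
y = df["happy"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One-hot encode the nominal column, ordinal-encode the ordered one,
# and pass the numeric column through unchanged.
preprocess = ColumnTransformer(
    transformers=[
        ("onehot", OneHotEncoder(handle_unknown="ignore"), ["employment"]),
        ("ordinal", OrdinalEncoder(categories=[["poor", "fair", "good"]]), ["health"]),
    ],
    remainder="passthrough",
)
model = Pipeline([
    ("prep", preprocess),
    ("gb", GradientBoostingClassifier(random_state=0)),
])
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]  # probability of the positive class

report = classification_report(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)

print(report)
print(cm)
print(f"F1={f1:.3f}  AUC={auc:.3f}")
```

Keeping the encoders inside the pipeline means they are fitted on the training split only, which avoids leaking information from the test split into the scores.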

A Machine Learning Project on Predicting Coronary Heart Disease

Purpose: Achieving the maximum accuracy in the ML model. This project was undertaken as a course assignment.
Workings: Data wrangling, finding missing data, hypothesis testing, finding the confidence interval, visualizing the correlations by using seaborn, plotting the decision tree, using 5 different supervised learning algorithms, and finally evaluating the best model by using the confusion matrix and voting classifier.
Accomplishment: Achieved 85% accuracy using the logistic regression model.
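
A minimal sketch of comparing several supervised models and combining them with a voting classifier is shown below. The five algorithms chosen here are an assumption, since the project summary does not name them, and the data is generated with make_classification rather than taken from the heart disease dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data standing in for the real dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Five candidate models (an assumed selection); scale-sensitive ones
# are wrapped in a pipeline with a StandardScaler.
candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "svm": make_pipeline(StandardScaler(), SVC(random_state=0)),
}

accuracies = {}
for name, clf in candidates.items():
    clf.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, clf.predict(X_test))

# Hard voting: each model casts one vote and the majority label wins.
voter = VotingClassifier(estimators=list(candidates.items()), voting="hard")
voter.fit(X_train, y_train)
vote_pred = voter.predict(X_test)
accuracies["voting"] = accuracy_score(y_test, vote_pred)
vote_cm = confusion_matrix(y_test, vote_pred)

for name, acc in sorted(accuracies.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:8s} accuracy={acc:.3f}")
print(vote_cm)
```

Hard voting is used here because SVC does not expose class probabilities by default; soft voting would need probability=True on that estimator.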

Data Wrangling in Python

Purpose: Computing the mean of three different axes by merging six large databases with a common database. This project was undertaken as a training assignment.

Workings: Reading CSV files, extracting the day from the datetime, merging two CSV files with an inner join, dropping unnecessary columns, rearranging columns, calculating the means of the three axes per aid and per day, converting the DataFrame to a dictionary, and writing the dictionary to a JSON file.

Accomplishment: Gained experience working with big data and learned how to handle memory errors.
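
The wrangling steps above can be sketched with pandas as follows. This is an illustration under stated assumptions: the CSV content is a tiny in-memory stand-in, the column names (timestamp, x, y, z, note, group) are hypothetical, and reading in chunks is one common way to avoid memory errors, not necessarily the approach used in the project.

```python
import io
import json

import pandas as pd

# Tiny in-memory stand-ins for the real CSV files (hypothetical columns).
readings_csv = io.StringIO(
    "aid,timestamp,x,y,z,note\n"
    "1,2021-01-01 08:00:00,1.0,2.0,3.0,a\n"
    "1,2021-01-01 20:00:00,3.0,4.0,5.0,b\n"
    "1,2021-01-02 09:00:00,5.0,6.0,7.0,c\n"
    "3,2021-01-01 11:00:00,9.0,9.0,9.0,d\n"
    "2,2021-01-01 10:00:00,2.0,2.0,2.0,e\n"
)
common_csv = io.StringIO("aid,group\n1,control\n2,treated\n")

common = pd.read_csv(common_csv)

# Read the large file in chunks so only a slice is in memory at a time.
parts = []
for chunk in pd.read_csv(readings_csv, parse_dates=["timestamp"], chunksize=2):
    # Inner join keeps only the rows whose aid exists in the common table.
    merged = chunk.merge(common, on="aid", how="inner")
    # Extract the calendar day from the datetime column.
    merged["day"] = merged["timestamp"].dt.strftime("%Y-%m-%d")
    # Drop columns that are no longer needed, then reorder the rest.
    merged = merged.drop(columns=["timestamp", "note"])
    parts.append(merged[["aid", "day", "group", "x", "y", "z"]])

data = pd.concat(parts, ignore_index=True)

# Mean of the three axes per aid and per day.
means = data.groupby(["aid", "day"])[["x", "y", "z"]].mean().reset_index()

# DataFrame -> list of dictionaries -> JSON text.
records = means.to_dict(orient="records")
json_text = json.dumps(records, indent=2)
print(json_text)
```

To write a file instead of printing, the same records could be passed to json.dump with an open file handle.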