Predicting Employee Attrition

In this project, data cleaning, exploration, and predictive analysis were carried out on a set of employee data.
Several machine learning models were trained, and the best-performing one was used to identify the employees most likely to leave.

DATA EXPLORATION: PySpark

PySpark is a powerful Python API that lets you work with Apache Spark. Spark offers a number of components, such as Spark SQL and its machine learning library, MLlib. In this project, I used PySpark to explore, clean, and evaluate a bank dataset, building a binary classification application with the MLlib Pipelines API.

Requesting Data Using an API Call

In this project, I requested the most-starred projects on GitHub at the time of writing.
This kind of request is called an API call. The data is returned in JSON format, and a visualization was created to show the number of stars on each GitHub project.
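A sketch of such an API call is shown below. The live request is wrapped in a function so the JSON parsing can be demonstrated offline; the sample payload is illustrative, not real data, and the star-count query parameters are one reasonable choice rather than the project's exact ones.

```python
import requests

SEARCH_URL = "https://api.github.com/search/repositories?q=stars:>10000&sort=stars"

def fetch_most_starred(url=SEARCH_URL):
    """Request the most-starred repositories and return (name, stars) pairs."""
    response = requests.get(url, headers={"Accept": "application/vnd.github.v3+json"})
    response.raise_for_status()  # fail loudly on a bad status code
    return extract_stars(response.json())

def extract_stars(payload):
    """Pull repository names and star counts out of the search JSON."""
    return [(repo["name"], repo["stargazers_count"]) for repo in payload["items"]]

# Illustrative payload in the shape the search endpoint returns.
sample = {"items": [{"name": "freeCodeCamp", "stargazers_count": 380000},
                    {"name": "linux", "stargazers_count": 170000}]}
print(extract_stars(sample))
```

The (name, stars) pairs returned by extract_stars are exactly what a bar chart of stars per project needs.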

NLP WITH PYSPARK

In this project, I analysed a dataset containing all of Udemy's courses, classified into different subjects.
I used Spark's MLlib to train a model, then tested it on unseen data to classify the courses.

Database creation with Python and SQLAlchemy

In this project, I used SQLAlchemy, a library that makes it easy to work with SQL databases in Python through an ORM (Object Relational Mapping), together with SQLite.
Note that SQLAlchemy can also be used with other database engines, such as MySQL and PostgreSQL.
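A minimal sketch of the SQLAlchemy-plus-SQLite setup follows. The Employee model and its columns are illustrative assumptions; only the connection URL would change to target MySQL or PostgreSQL instead.

```python
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, Session

Base = declarative_base()

class Employee(Base):
    """Illustrative ORM model mapped to an "employees" table."""
    __tablename__ = "employees"
    id = Column(Integer, primary_key=True)
    name = Column(String, nullable=False)
    department = Column(String)

# "sqlite:///:memory:" keeps the database in RAM; swap in a MySQL or
# PostgreSQL URL to use another engine.
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)  # create the table from the model

with Session(engine) as session:
    session.add_all([Employee(name="Ada", department="Engineering"),
                     Employee(name="Grace", department="Research")])
    session.commit()
    names = [e.name for e in session.query(Employee).order_by(Employee.name)]
    print(names)  # ['Ada', 'Grace']
```

Because the ORM hides the SQL dialect behind the engine URL, the same model and queries run unchanged against any supported backend.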