Data Science Capstone Project

Capstone Project

Winning the Space Race with Data Science

This was my capstone project for the IBM Data Science Professional Certificate course. It involved applying my understanding of data science methodology, Python libraries (NumPy, Pandas, Matplotlib, Seaborn, Folium, and Scikit-Learn), SQL, math, machine learning, report writing, and presentation creation.

The major steps of the project were data collection, data cleaning (selecting which data to keep, transforming it, loading it into a database, and querying it), data analysis, data visualization, interactive dashboard creation, machine learning model training, and final report writing.
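The "load into a database, then query" step can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the project's actual schema: the table name `launches`, its columns, and the rows (with placeholder sites `SITE_A` and `SITE_B`) are hypothetical and made up purely to show the pattern, where `landing_class` is assumed to be 1 for a successful first-stage landing and 0 otherwise.

```python
import sqlite3

# Hypothetical, made-up rows for illustration only (not real launch data):
# (launch_site, payload_mass_kg, landing_class), landing_class 1 = landed, 0 = did not land.
rows = [
    ("SITE_A", 500.0, 0),
    ("SITE_A", 2500.0, 1),
    ("SITE_B", 4000.0, 1),
    ("SITE_B", 5200.0, 1),
    ("SITE_B", 6100.0, 0),
]


def landing_success_by_site(records):
    """Load records into an in-memory SQLite table; return (site, launches, success_rate) per site."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE launches (launch_site TEXT, payload_mass_kg REAL, landing_class INTEGER)"
        )
        conn.executemany("INSERT INTO launches VALUES (?, ?, ?)", records)
        # The mean of a 0/1 column is the fraction of successful landings.
        query = """
            SELECT launch_site,
                   COUNT(*) AS launches,
                   AVG(landing_class) AS success_rate
            FROM launches
            GROUP BY launch_site
            ORDER BY launch_site
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


for site, launches, rate in landing_success_by_site(rows):
    print(f"{site}: {launches} launches, success rate {rate:.2f}")
```

Averaging a 0/1 outcome column is a compact way to get a success rate per group; the same `GROUP BY` pattern applies to orbit type or payload ranges.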

The Jupyter Notebooks and final presentation can be accessed via the links below. These files and the data are stored in the GitHub data-science-capstone repository.

Summary: Data on SpaceX Falcon 9 launches was collected and analyzed to understand the first stage landing success rate and to predict whether future Falcon 9 first stage landings will succeed. The project addressed three questions: What is the nature and extent of the data that we have on SpaceX Falcon 9 first stage landings? Which machine learning model works best (has the highest accuracy) for predicting the outcome of a Falcon 9 first stage landing from a future launch? Will a future Falcon 9 first stage landing be successful?

Methodology: Data was collected from the SpaceX public API and from publicly available data on Wikipedia. Data wrangling included extracting the launch outcome information to serve as the dependent variable (the label) for the machine learning models. SQL queries and data visualizations (static plots, interactive maps, and an interactive dashboard) were used to generate insights about the data set and answer the questions above. Predictive analysis was performed using Logistic Regression, SVM (Support Vector Machine), Decision Tree, and KNN (k-Nearest Neighbors) machine learning models.
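Turning the raw launch outcome into a dependent variable amounts to mapping each outcome string to a binary class. The sketch below assumes a hypothetical outcome format of `"<result> <landing type>"` (for example `"True ASDS"` or `"None None"`), where only a leading `True` counts as a successful landing; the real field names and values in the collected data may differ.

```python
from typing import Iterable, List


def landing_class(outcome: str) -> int:
    """Map one landing-outcome string to the binary target: 1 = landed, 0 = did not land.

    Assumes the hypothetical format "<result> <landing type>", where <result> is
    "True", "False", or "None" (e.g. "True ASDS", "False Ocean", "None None").
    """
    result = outcome.strip().split(" ", 1)[0]
    return 1 if result == "True" else 0


def landing_classes(outcomes: Iterable[str]) -> List[int]:
    """Build the label column for a sequence of outcome strings."""
    return [landing_class(o) for o in outcomes]


# Illustrative strings only, not taken from the project's data set.
outcomes = ["True ASDS", "None None", "True RTLS", "False ASDS", "True Ocean", "False Ocean"]
labels = landing_classes(outcomes)
print(labels)  # [1, 0, 1, 0, 1, 0]
print(sum(labels) / len(labels))  # fraction of successful landings: 0.5
```

Keying on the result token rather than on a fixed list of landing types means a new landing type still gets a sensible label, as long as the assumed format holds.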

Results: The launch data set, which includes flight number, launch date, payload mass, orbit type, launch site, mission outcome, and other variables, was explored and visualized. Of the machine learning models tested, Logistic Regression, SVM, and KNN performed equally well on this dataset.
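Because the "best" model is defined as the one with the highest accuracy, several models can legitimately tie. The sketch below shows one way to compare test-set predictions and keep every model that reaches the top score. The model names `model_a`, `model_b`, `model_c` and all label values are hypothetical placeholders, not the project's results.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def best_models(
    y_true: Sequence[int], predictions: Dict[str, Sequence[int]]
) -> Tuple[float, List[str]]:
    """Return the highest accuracy and every model that reaches it (ties are kept)."""
    scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
    top = max(scores.values())
    # Same test set means same denominator, so equal hit counts give identical floats.
    return top, sorted(name for name, score in scores.items() if score == top)


# Hypothetical test labels and predictions, for illustration only.
y_test = [1, 0, 1, 1, 0, 1]
predictions = {
    "model_a": [1, 0, 1, 1, 1, 1],
    "model_b": [1, 0, 1, 1, 1, 1],
    "model_c": [1, 1, 1, 0, 1, 1],
}
top, winners = best_models(y_test, predictions)
print(f"{top:.3f}", winners)  # 0.833 ['model_a', 'model_b']
```

In a scikit-learn workflow, `sklearn.metrics.accuracy_score` (or a fitted estimator's `.score` method) computes the same quantity; the point here is only that reporting all tied models is more faithful than picking one arbitrarily.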