What is Data Science?
Data Science is the field of study that combines computer science, mathematics (especially statistics and linear algebra), and domain-specific expertise to extract useful insights from raw data. It is a broad field that touches on business intelligence, machine learning, and artificial intelligence.
The value data science brings to organizations comes in the form of better decisions enabled by better information. Working with better information allows organizations to save money through targeted cost reductions, increase revenues through marketing and product/service quality enhancements, and improve operational performance so as to achieve a better competitive stance in the marketplace and bigger profit margins.
By discovering and clearly communicating insights, data scientists help leaders improve their business operations. The effectiveness of data science is due, in part, to the combination of advanced software, robust methods, and the massive compute power available today. Data science applications can be found in every industry and in everything from small data sets to large, complex, high-velocity data streams.
There are many great software tools available to today's data scientist. Jupyter Notebooks provide an environment to generate and document the workflow of a data science project. Python is a popular programming language for data science tasks due to the ease of writing Python code and the numerous libraries that have been developed to make data science tasks easier. These libraries include NumPy, Pandas, Polars, Matplotlib, SciPy, Scikit-Learn, Seaborn, and Folium among many others. R is a programming language geared towards statistical analysis. SQL (Structured Query Language) is a useful language for interacting with databases.
A Data Science Project Workflow
Identify
Identify Specific Problem to Solve
Involve Stakeholders
Business Understanding
Select Analytic Approach
Descriptive ✱ What happened?
Diagnostic ✱ Why did it happen?
Predictive ✱ What will happen?
Prescriptive ✱ What action to take?
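The four analytic approaches above can be illustrated on a toy data set. The sketch below is only an illustration under stated assumptions: the ad-spend and sales figures, the function names, and the `unit_margin` parameter are all hypothetical, and the "diagnostic" step shows only an association (a correlation), not a proven cause.

```python
import math
from statistics import mean

# Hypothetical toy data (made up for illustration): monthly ad spend and units sold.
ad_spend = [10.0, 20.0, 30.0, 40.0, 50.0]
units_sold = [25.0, 45.0, 65.0, 85.0, 105.0]


def descriptive(values):
    """Descriptive: what happened? Summarize the observed values."""
    return {"mean": mean(values), "min": min(values), "max": max(values)}


def diagnostic(x, y):
    """Diagnostic: why did it happen? Pearson correlation as a first clue (association only)."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def predictive(slope, intercept, new_x):
    """Predictive: what will happen? Evaluate the fitted line at a new input."""
    return slope * new_x + intercept


def prescriptive(slope, intercept, options, unit_margin=1.0):
    """Prescriptive: what action to take? Pick the option with the highest predicted profit."""
    def profit(spend):
        return unit_margin * predictive(slope, intercept, spend) - spend
    return max(options, key=profit)


summary = descriptive(units_sold)
r = diagnostic(ad_spend, units_sold)
slope, intercept = fit_line(ad_spend, units_sold)
forecast = predictive(slope, intercept, 60.0)
best_spend = prescriptive(slope, intercept, [20.0, 40.0, 60.0])
```

Each function maps to one question in the list, and each step builds on the previous one: the prescriptive choice is only as good as the predictive model behind it.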
Collect
Collect Data
Software Engineering
Data Requirements, Collection, Mining, Exploration, Understanding, Cleaning, Preparation
Instrumentation, Logging, Sensors, External Data, User Generated Content
Process
Process Data
Data Engineering
Reliable Data Flow, Infrastructure, Pipelines, ETL (Extract, Transform, Load), Structured and Unstructured Data Storage
Cleaning, Wrangling, Anomaly Detection, Preparation
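A small ETL pipeline (commonly expanded as extract, transform, load) with a cleaning step might look like the following sketch. It uses only the Python standard library; the CSV content, the column names, and the `orders` table are hypothetical and made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw export (made up): note the missing amount and the duplicate row.
RAW_CSV = """order_id,amount,status
1,19.99,paid
2,,refunded
3,5.00,PAID
3,5.00,PAID
4,7.50,refunded
"""


def extract(text):
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount, drop duplicate ids, normalize types."""
    seen, clean = set(), []
    for row in rows:
        if not row["amount"] or row["order_id"] in seen:
            continue
        seen.add(row["order_id"])
        is_paid = row["status"].strip().lower() == "paid"
        clean.append((int(row["order_id"]), float(row["amount"]), is_paid))
    return clean


def load(records, conn):
    """Load: write the cleaned records into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, amount REAL, paid INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute(
    "SELECT order_id, amount, paid FROM orders ORDER BY order_id"
).fetchall()
```

Keeping the three stages as separate functions makes each one easy to test and replace, for example swapping the in-memory database for a real one.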
Label
Aggregate/Label Data
Data Science Analytics
Analytics, Metrics, Segments, Aggregates, Features, Training Data Preparation
A/B Testing, Experimentation
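One common way to analyze an A/B test on conversion rates is a two-proportion z-test. The sketch below is a minimal version using only the standard library; the function name and the visitor counts are hypothetical, and it assumes samples large enough for the normal approximation to hold.

```python
import math


def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Return (z, two-sided p-value) for the difference in rates between B and A."""
    p_a = success_a / n_a
    p_b = success_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally often.
    pooled = (success_a + success_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 100 of 1000 visitors converted on A, 150 of 1000 on B.
z, p_value = two_proportion_z_test(100, 1000, 150, 1000)
significant = p_value < 0.05  # assumed significance threshold
```

A small p-value suggests the difference between the variants is unlikely to be due to chance alone; the threshold should be chosen before the experiment runs.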
Model
Build Data Model
Machine Learning
Feature Engineering, Model Training, Evaluation, Deployment, Monitoring, Assessment, Optimization
AI, Deep Learning, Research Science
Report
Report to Stakeholders
Data Visualization, Executive Summary, Detailed Analysis/Conclusions, Storytelling with Data
Choose Format Option: Formal report, Live/interactive dashboard, Quick one-page summary (minimum viable product)
Formal Report Structure
Cover Page
Title
Author Names
Date Published
Table of Contents / Outline
Executive Summary / Abstract
Introduction
Methodology
Results
Discussion
Conclusion/Recommendations
Acknowledgments
References
Appendices
Data Science Certificates
IBM Data Science Professional Certificate
What is Data Science?
Data Science Tools
Methodology
Python
Python Project
Databases and SQL
Data Analysis
Data Visualization
Machine Learning
Capstone
Data Science Capstone Project
Winning Space Race with Data Science
In competition with SpaceX, a rival rocket launch company wants to make predictions about the success/failure of SpaceX Falcon 9 rocket first stage landings.
Data was collected and analyzed to understand the nature of SpaceX's rocket launch success rate and to build data visualizations (static plots, interactive maps, and an interactive dashboard).
Machine Learning models (Logistic Regression, Support Vector Machine, Decision Tree, and k-Nearest Neighbors) were trained to predict the success of future SpaceX launches.
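To give a feel for how one of these classifiers works, here is a from-scratch sketch of k-Nearest Neighbors in plain Python. It is not the project's code: the two features and all data points are hypothetical, and a real project would more likely use a library implementation.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_k_labels = [label for _, label in distances[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]


# Hypothetical, made-up training data: two numeric features per launch,
# with label 1 for a successful landing and 0 for a failed one.
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (6.0, 7.0), (7.0, 6.5), (6.5, 8.0)]
train_y = [0, 0, 0, 1, 1, 1]

prediction_near_successes = knn_predict(train_X, train_y, (6.0, 6.0))
prediction_near_failures = knn_predict(train_X, train_y, (1.2, 1.2))
```

The choice of `k` and the scaling of the features both affect the result, which is why such models are usually evaluated on held-out data before being trusted.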
Clark Data Science
Data Analytics Data Infrastructure System Organization