What is Data Science?
Data Science is the field of study that combines the technical capacities of computer science and mathematics with domain-specific expertise to extract useful insights from raw data. It is a broad field that touches on business intelligence, data engineering, data analytics, machine learning, and artificial intelligence. Data science draws heavily from both statistics and linear algebra.
![Image](../img/datascience/web-development-300x300.png)
Computer Science
![Image](../img/datascience/data-analysis-240x240.png)
Mathematics (Statistics/Linear Algebra)
Domain Knowledge
Data Science
Data science delivers value to organizations by sharpening the information available to decision makers. Professional data scientists discover information and communicate it clearly. Organizations can then use that information to improve operational performance through more precisely targeted resource use and better alignment of products and services with the customer base. This can result in reduced expenses, increased revenue, higher profit margins, stronger teams, and more effective marketing campaigns. A significant goal of data science is to discover information a company can use to achieve a stronger competitive position in the marketplace.
The effectiveness of data science is due, in part, to the combination of advanced software, rigorous methods, and the massive compute power available today. Applications of data science can be found across industries, on everything from small data sets to large, complex, high-velocity data streams.
There are many great software tools available to today's data scientist. Jupyter Notebooks provide an environment to generate and document the workflow of a data science project. Python is a popular programming language for data science because it is easy to write and has numerous libraries that speed up common tasks. These libraries include NumPy, pandas, Polars, Matplotlib, SciPy, scikit-learn, seaborn, and Folium, among many others. R is a programming language geared towards statistical analysis. SQL (Structured Query Language) is a useful language for interacting with databases.
A Data Science Project Workflow
Identify
Identify Specific Problem to Solve
Involve Stakeholders
Business Understanding
Select Analytic Approach
Descriptive ✱ What happened?
Diagnostic ✱ Why did it happen?
Predictive ✱ What will happen?
Prescriptive ✱ What action to take?
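The four analytic approaches above can be illustrated on a toy data set. The following is a minimal sketch, not a prescribed method: the advertising and sales figures are made up, and the names `fit_line`, `predict_units`, and `candidate_spends` are hypothetical. A simple least-squares line stands in for whatever model a real project would select.

```python
from statistics import mean

# Hypothetical monthly data: advertising spend (in $1000s) and units sold.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.0, 14.0, 16.0, 18.0, 20.0]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Descriptive: what happened? Summarize the observed sales.
average_units = mean(units_sold)

# Diagnostic: why did it happen? Measure how sales move with ad spend.
slope, intercept = fit_line(ad_spend, units_sold)


# Predictive: what will happen at a spend level not yet observed?
def predict_units(spend):
    return slope * spend + intercept


# Prescriptive: which candidate action has the best predicted outcome?
candidate_spends = [4.0, 6.0, 8.0]
best_spend = max(candidate_spends, key=predict_units)
```

In a real project the prescriptive step would usually weigh costs and constraints as well; here it simply picks the candidate with the highest predicted sales.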
Collect
Collect Data
Software Engineering
Data Requirements, Collection, Mining, Exploration, Understanding, Cleaning, Preparation
Instrumentation, Logging, Sensors, External Data, User Generated Content
Process
Process Data
Data Engineering
Reliable Data Flow, Infrastructure, Pipelines, ETL (Extract, Transform, Load), Structured and Unstructured Data Storage
Cleaning, Wrangling, Anomaly Detection, Preparation
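ETL is commonly expanded as extract, transform, load. The following is a minimal sketch of that pattern using only the Python standard library. The sensor data, the `readings` table, and the function names are all hypothetical; a production pipeline would likely read from files or APIs and write to a persistent database.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would come from a file or an API.
RAW_CSV = """sensor_id,reading
a1,20.5
a2,
a1,21.0
a3,not_a_number
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows whose reading is missing or non-numeric."""
    clean = []
    for row in rows:
        try:
            clean.append((row["sensor_id"], float(row["reading"])))
        except (TypeError, ValueError):
            continue  # skip records that cannot be parsed as a number
    return clean


def load(records, conn):
    """Load: write cleaned records into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, reading REAL)"
    )
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

Keeping the three stages as separate functions makes each one easy to test and replace independently.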
Label
Aggregate/Label Data
Data Science Analytics
Analytics, Metrics, Segments, Aggregates, Features, Training Data Preparation
A/B Testing, Experimentation
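One common way to evaluate an A/B test on conversion rates is a two-proportion z-test. The sketch below is one possible approach, not necessarily the one a given project would use. The function name `two_proportion_z_test` and the visitor and conversion counts are made up for illustration.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF evaluated at |z|, via the error function.
    cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - cdf)


# Hypothetical experiment: 1000 visitors per variant.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
```

A small p-value suggests the difference between variants is unlikely to be due to chance alone. The normal approximation used here assumes reasonably large samples in both groups.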
Model
Build Data Model
Machine Learning
Feature Engineering, Model Training, Evaluation, Deployment, Monitoring, Assessment, Optimization
AI, Deep Learning, Research Science
Report
Report to Stakeholders
Data Visualization, Executive Summary, Detailed Analysis/Conclusions, Storytelling with Data
Choose Format Option: Formal report, Live/interactive dashboard, Quick one-sheet summary (minimum viable product)
![IBM Data Science Professional Certificate](../img/certificates/ibm-data-science-professional-certificate-medium.png)
Data Science Certificates
IBM Data Science Professional Certificate
![What is Data Science?](../img/certificates/coursera-data-science-certificate-1-small.png)
What is Data Science?
![Tools for Data Science](../img/certificates/coursera-data-science-certificate-2-small.png)
Data Science Tools
![Data Science Methodology](../img/certificates/coursera-data-science-certificate-3-small.png)
Methodology
![Python for Data Science, AI & Development](../img/certificates/coursera-data-science-certificate-4-small.png)
Python
![Python Project for Data Science](../img/certificates/coursera-data-science-certificate-5-small.png)
Python Project
![Databases and SQL for Data Science with Python (Honors Content)](../img/certificates/coursera-data-science-certificate-6-small.png)
Databases and SQL
![Data Analysis with Python](../img/certificates/coursera-data-science-certificate-7-small.png)
Data Analysis
![Data Visualization with Python](../img/certificates/coursera-data-science-certificate-8-small.png)
Data Visualization
![Machine Learning with Python (Honors Content)](../img/certificates/coursera-data-science-certificate-9-small.png)
Machine Learning
![Applied Data Science Capstone](../img/certificates/coursera-data-science-certificate-10-small.png)
Capstone
![Capstone Project: IBM Data Science Professional Certificate](../img/datascience/capstone.png)
Data Science Capstone Project
Winning Space Race with Data Science
In competition with SpaceX, a rival rocket launch company wants to make predictions about the success/failure of SpaceX Falcon 9 rocket first stage landings.
Data was collected and analyzed to understand the nature of SpaceX's rocket launch success rate and to build data visualizations (static plots, interactive maps, and an interactive dashboard).
Machine learning models (Logistic Regression, Support Vector Machine, Decision Tree, and k-Nearest Neighbors) were trained to predict whether future SpaceX first stage landings will succeed.
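To give a feel for one of the model families named above, here is a minimal k-Nearest Neighbors classifier in pure Python. It is an illustrative sketch only, not the capstone's code: the feature names, the scaled values, and the labels are made up, and `knn_predict` is a hypothetical helper. A real project would more likely use a library such as scikit-learn.

```python
from collections import Counter
from math import dist


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    nearest = sorted(
        zip(train_points, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up, pre-scaled features (not real launch data):
# (payload mass, number of prior flights), both scaled to the range 0..1.
train_points = [
    (0.9, 0.1), (0.8, 0.2), (0.7, 0.1),
    (0.2, 0.8), (0.3, 0.9), (0.1, 0.7),
]
train_labels = ["failure", "failure", "failure", "success", "success", "success"]

prediction = knn_predict(train_points, train_labels, query=(0.25, 0.85))
```

The query point sits among the "success" examples, so the majority vote of its three nearest neighbors labels it accordingly.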
![](../img/datascience/clarkdatascience-1124x1124.png)
Clark Data Science
Data Analytics, Data Infrastructure, System Organization