What is Data Science?

Data Science is the field of study that combines the technical capacities of computer science, mathematics (especially statistics and linear algebra), and domain-specific expertise to extract useful insights from raw data. It is a broad field that touches on business intelligence, machine learning, and artificial intelligence.

The value data science brings to organizations comes in the form of better decisions made with better information. Better intelligence allows organizations to save money through targeted cost reductions, increase revenue through marketing and product/service quality improvements, and improve operational performance, which strengthens their competitive position in the marketplace and widens profit margins.

By discovering insights and communicating them clearly, data scientists help leaders improve their business operations. The effectiveness of data science is due, in part, to the combination of advanced software, robust methods, and the massive compute power available today. Data science applications can be found in every industry and at every scale, from small data sets to large, complex, high-velocity data streams.

There are many great software tools available to today's data scientist. Jupyter Notebooks provide an environment to generate and document the workflow of a data science project. Python is a popular programming language for data science tasks due to the ease of writing Python code and the numerous libraries that have been developed to make data science tasks easier. These libraries include NumPy, Pandas, Polars, Matplotlib, SciPy, Scikit-Learn, Seaborn, and Folium among many others. R is a programming language geared towards statistical analysis. SQL (Structured Query Language) is a useful language for interacting with databases.

A Data Science Project Workflow

Identify

Identify Specific Problem to Solve

Involve Stakeholders

Business Understanding

Select Analytic Approach

Descriptive: What happened?

Diagnostic: Why did it happen?

Predictive: What will happen?

Prescriptive: What action to take?

Collect

Collect Data

Software Engineering

Data Requirements, Collection, Mining, Exploration, Understanding, Cleaning, Preparation

Instrumentation, Logging, Sensors, External Data, User Generated Content

Process

Process Data

Data Engineering

Reliable Data Flow, Infrastructure, Pipelines, ETL (Extract, Transform, Load), Structured and Unstructured Data Storage

Cleaning, Wrangling, Anomaly Detection, Preparation
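As a rough illustration of an ETL step with basic cleaning, the sketch below uses only the Python standard library. It extracts rows from a CSV string that stands in for a hypothetical source file, transforms them by dropping rows with a missing value and casting types, and loads the result into an in-memory SQLite table. The column names `sensor_id` and `reading` and all values are invented for illustration.

```python
import csv
import io
import sqlite3

# Extract: read raw records from a CSV source (an in-memory string stands in for a file)
raw_csv = """sensor_id,reading
A,21.5
B,
C,19.0
A,22.0
"""
rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Transform: drop rows with a missing reading and cast readings to float
clean = [(r["sensor_id"], float(r["reading"])) for r in rows if r["reading"].strip()]

# Load: write the cleaned records into a SQLite table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor_id TEXT, reading REAL)")
conn.executemany("INSERT INTO readings VALUES (?, ?)", clean)
conn.commit()

# Query the loaded data with SQL: average reading per sensor
for sensor_id, avg in conn.execute(
    "SELECT sensor_id, AVG(reading) FROM readings GROUP BY sensor_id ORDER BY sensor_id"
):
    print(sensor_id, avg)
```

Dropping incomplete rows is only one cleaning choice. Depending on the data, imputing missing values or flagging them for review may be more appropriate.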

Label

Aggregate/Label Data

Data Science Analytics

Analytics, Metrics, Segments, Aggregates, Features, Training Data Preparation

A/B Testing, Experimentation
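One common way to evaluate a simple A/B test on conversion rates is a two-proportion z-test. The sketch below assumes samples large enough for the normal approximation to hold, uses made-up counts, and treats `two_proportion_z_test` as a hypothetical helper name. It shows one standard test, not the only valid analysis.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: variants A and B share one conversion rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate under the null hypothesis
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: 120/1000 conversions for A vs. 150/1000 for B
z, p = two_proportion_z_test(120, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these invented counts the p-value lands right around the conventional 0.05 threshold. This illustrates why the sample size and significance level should be fixed before the experiment starts.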

Model

Build Data Model

Machine Learning

Feature Engineering, Model Training, Evaluation, Deployment, Monitoring, Assessment, Optimization

AI, Deep Learning, Research Science
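The core of the Model stage (feature engineering, training, and evaluation) can be sketched end to end on toy data. The sketch below holds out a test set, standardizes the single feature using statistics from the training set only, trains a logistic regression by batch gradient descent, and reports accuracy on the held-out examples. The data, learning rate, and epoch count are arbitrary assumptions chosen for illustration, not recommended settings. `fit_logistic` and `standardize` are hypothetical helper names.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit weight w and bias b for one feature by batch gradient descent on mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b


# Hypothetical labeled data: the label is 1 when the raw feature exceeds 5
random.seed(42)
data = [(x, int(x > 5)) for x in (random.uniform(0, 10) for _ in range(200))]

# Train/test split: hold out 25% so evaluation uses examples the model never saw
random.shuffle(data)
split = int(0.75 * len(data))
train, test = data[:split], data[split:]

# Feature engineering: standardize with the training set's mean and standard deviation
train_x = [x for x, _ in train]
mu = sum(train_x) / len(train_x)
sd = math.sqrt(sum((x - mu) ** 2 for x in train_x) / len(train_x))


def standardize(x):
    return (x - mu) / sd


# Model training on the standardized training data
w, b = fit_logistic([standardize(x) for x, _ in train], [y for _, y in train])

# Evaluation: accuracy on the held-out test set
preds = [int(sigmoid(w * standardize(x) + b) >= 0.5) for x, _ in test]
test_accuracy = sum(p == y for p, (_, y) in zip(preds, test)) / len(test)
print(f"Test accuracy: {test_accuracy:.2f}")
```

Fitting the scaling statistics on the training split alone keeps information from the test set out of training. The held-out accuracy is therefore a fairer estimate of performance on new data.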

Report

Report to Stakeholders

Data Visualization, Executive Summary, Detailed Analysis/Conclusions, Storytelling with Data

Choose a Format: Formal report, Live/interactive dashboard, or a quick, minimum-viable one-page summary

Formal Report Structure

Cover Page

Title

Author Names

Date Published

Table of Contents / Outline

Executive Summary / Abstract

Introduction

Methodology

Results

Discussion

Conclusion/Recommendations

Acknowledgments

References

Appendices


Data Science Certificates

IBM Data Science Professional Certificate

IBM Data Science


What is Data Science?

Data Science Tools

Methodology

Python

Python Project

Databases and SQL

Data Analysis

Data Visualization

Machine Learning

Capstone

Data Science Capstone Project

Winning Space Race with Data Science

In competition with SpaceX, a rival rocket launch company wants to make predictions about the success/failure of SpaceX Falcon 9 rocket first stage landings.

Data was collected and analyzed to examine SpaceX's first-stage landing success rate and to build data visualizations (static plots, interactive maps, and an interactive dashboard).

Machine learning models (Logistic Regression, Support Vector Machine, Decision Tree, and k-Nearest Neighbors) were trained to predict whether the first stage of a future SpaceX launch will land successfully.
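As a toy illustration of one of the classifiers listed above, here is k-nearest neighbors written from scratch. The two features (for example, a normalized flight number and payload mass) and every data point are invented for illustration. They are not the capstone's dataset or results, and the features are assumed to be already scaled to [0, 1] so that neither one dominates the distance. `knn_predict` is a hypothetical helper name.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest (Euclidean) to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical launches: (features scaled to [0, 1], outcome), where 1 = first stage landed
train = [
    ((0.1, 0.2), 0),
    ((0.2, 0.1), 0),
    ((0.15, 0.3), 0),
    ((0.8, 0.7), 1),
    ((0.9, 0.8), 1),
    ((0.7, 0.9), 1),
]

print(knn_predict(train, (0.85, 0.75)))  # → 1
print(knn_predict(train, (0.2, 0.25)))   # → 0
```

With two classes, an odd `k` avoids tied votes. In practice `k` would be chosen by validation rather than fixed in advance.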

Capstone Project

GitHub Repository

Capstone Report PDF

Clark Data Science

Data Analytics • Data Infrastructure • System Organization
