# Data Science

E-books Only

I keep this section separate from the statistics one because, when I was learning how to run statistics in R, I got frustrated whenever the documents I found did not directly cover the specific tests I needed to learn about. This split may not generalize, but it makes sense to me for now. Sorry!

## A ModernDive into R & the Tidyverse

By Chester Ismay and Albert Y. Kim. Foreword by Kelly S. McConville

**What is this?**

**Excerpt from ebook:** Over the course of this book, you will develop your “data science toolbox,” equipping yourself with tools such as data visualization, data formatting, data wrangling, and data modeling using regression.

In particular, this book will lean heavily on data visualization. In today’s world, we are bombarded with graphics that attempt to convey ideas. We will explore what makes a good graphic and what the standard ways are to convey relationships within data. In general, we’ll use visualization as a way of building almost all of the ideas in this book.

## Advanced Data Science 2020

Added Sep 12th, 2020

**What is this?**

This course is designed for PhD students at Johns Hopkins Bloomberg School of Public Health. We are usually pretty flexible about permitting outside students but we want everyone to be aware of the goals and assumptions so no one feels like they are surprised by how the class works.

The primary goal of the course is to teach you how to deconstruct, perform, and communicate professional data analyses across diverse media.

The goal is to help you to organize your thinking around how to combine the things you have learned about statistics, data manipulation, and visualization into complete data analyses that answer important questions about the world around you.

- Link to ebook here: http://jtleek.com/ads2020/

## Building reproducible analytical pipelines with R

Added Mon Apr 24th, 2023

**What is this?**

**Excerpt from e-book:** The aim of this book is to teach you how to use some of the best practices from software engineering and DevOps to make your projects robust, reliable and reproducible. It doesn’t matter if you work alone, in a small or in a big team. It doesn’t matter if your work gets (peer-)reviewed or audited: the techniques presented in this book will make your projects more reliable and save you a lot of frustration!

- Link to ebook here: https://raps-with-r.dev/

## Data Analysis for the Life Sciences with R

By Rafael A. Irizarry and Michael I. Love

**What is this?**

**Excerpt from ebook:** This book will cover several of the statistical concepts and data analytic skills needed to succeed in data-driven life science research. We go from relatively basic concepts related to computing p-values to advanced topics related to analyzing high-throughput data. While statistics textbooks focus on mathematics, this book focuses on using a computer to perform data analysis. Instead of explaining the mathematics and theory, and then showing examples, we start by stating a practical data-related challenge. This book also includes the computer code that provides a solution to the problem and helps illustrate the concepts behind the solution.

- Link to ebook: https://leanpub.com/dataanalysisforthelifesciences

## Exploratory Data Analysis with R 💯

**What is this?**

**Excerpt from e-book:** This book covers the essential exploratory techniques for summarizing data with R. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data you have. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing informative data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.

- Link to ebook here: https://bookdown.org/rdpeng/exdata/

## Introduction to Data Science

**What is this?**

**Excerpt from e-book:** This book introduces concepts from probability, statistical inference, linear regression, and machine learning, along with R programming skills.

- Link to e-book here: https://leanpub.com/datasciencebook

## Mastering Spark with R

By Javier Luraschi, Kevin Kuo, & Edgar Ruiz

**What is this?**

**Excerpt from ebook:** This chapter presented Spark as a modern and powerful computing platform, R as an easy-to-use computing language with solid foundations in statistical methods, and sparklyr as a project bridging both technologies and communities. In a world in which the total amount of information is growing exponentially, learning how to analyze data at scale will help you to tackle the problems and opportunities humanity is facing today. However, before we start analyzing data, Chapter 2 will equip you with the tools you will need throughout the rest of this book. Be sure to follow each step carefully and take the time to install the recommended tools, which we hope will become familiar resources that you use and love.

- Link to free ebook: https://therinspark.com/

## Text Mining with R: A Tidy Approach

By Julia Silge and David Robinson

**What is this?**

**Excerpt from ebook:** This book serves as an introduction of text mining using the tidytext package and other tidy tools in R. The functions provided by the tidytext package are relatively simple; what is important are the possible applications. Thus, this book provides compelling examples of real text mining problems.

- Link to free ebook here: https://www.tidytextmining.com/
- Buy the book here: Amazon
- Link to repo here: https://github.com/dgrtwo/tidy-text-mining

## Social Data Science with R

By Daniel Anderson, Brendan Cullen, & Ouafaa Hmaddi

Added Thu Dec 31st, 2020

**What is this?**

**Excerpt from e-book:** Here’s an intro about why R is great and the cool things you can do with it and new problems you can address.