4 Packages
4.1 Functionality
4.1.1 kableExtra
The goal of kableExtra is to help you build common complex tables and manipulate table styles. In short, it produces nicely formatted tables. It imports the pipe operator `%>%` from `magrittr` and names its functions as verbs, so you can add “layers” to a kable output in a way that is similar to `ggplot2` and `plotly`.
4.1.2 gridExtra
Used to arrange plots next to each other.
4.1.4 unpivotr
Tools for converting data from complex or irregular layouts to a columnar structure. For example, tables with multilevel column or row headers, or spreadsheets.
4.1.5 tibble
tibble provides a ‘tbl_df’ class (the ‘tibble’) that offers stricter checking and better formatting than the traditional data frame.
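As a minimal sketch of what “stricter checking” means in practice, the following compares a base data frame with a tibble. The column name `population` and the values are made up for illustration only.

```r
library(tibble)

# Toy data, for illustration only
df <- data.frame(population = c(10, 20, 30))
tb <- tibble(population = c(10, 20, 30))

# A data frame partially matches column names with `$`,
# so `df$pop` silently returns the `population` column.
df$pop

# A tibble does not partially match: `tb$pop` is NULL (with a warning).
partial <- suppressWarnings(tb$pop)
is.null(partial)

# Selecting one column with `[` drops a data frame to a vector,
# whereas a tibble always stays a tibble.
class(df[, "population"])
class(tb[, "population"])
```

The second and third comparisons are the ones that most often catch bugs: a misspelled column name or an unexpected vector instead of a table.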
4.1.6 dslabs
26 Datasets and some functions for data analysis. Used to practice data visualization, statistical inference, modeling, linear regression, data wrangling and machine learning.
4.1.7 knitr
Engine for dynamic report generation with R. Enables integration of R code into LaTeX, LyX, HTML, Markdown, AsciiDoc, and reStructuredText documents. The purpose of knitr is to allow reproducible research in R through the means of Literate Programming.
4.1.8 readr
The goal of ‘readr’ is to provide a fast and friendly way to read rectangular data (such as ‘csv’, ‘tsv’, and ‘fwf’ files).
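A short sketch of reading rectangular data with `read_csv()`. The CSV text is toy data invented for this example, and wrapping it in `I()` to mark it as literal data assumes readr 2.0 or later; with a real file you would pass the file path instead.

```r
library(readr)

# Toy CSV content (illustrative only); I() tells read_csv() that this
# is literal data rather than a path (assumes readr >= 2.0).
csv_text <- I("id,value\n1,2.5\n2,3.5\n")

# Declaring column types up front avoids surprises from type guessing.
toy <- read_csv(csv_text, col_types = cols(
  id    = col_integer(),
  value = col_double()
))

toy
```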
4.1.9 readxl
The readxl package makes it easy to get data out of Excel and into R.
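As a small sketch, the example workbook that ships with readxl can be used to try this out without any file of your own. The object name `iris_xl` is chosen here only for illustration.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package
path <- readxl_example("datasets.xlsx")

# List the sheets in the workbook, then read one of them by name
excel_sheets(path)
iris_xl <- read_excel(path, sheet = "iris")

head(iris_xl)
```

`read_excel()` returns a tibble, so the result can be passed straight into the usual tidyverse pipeline.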
4.1.10 tidyxl
Imports non-tabular data from Excel files into R. It exposes cell content, position, formatting and comments in a tidy structure for further manipulation, especially by the unpivotr package.
4.1.11 corrgram
Create correlograms from data frames directly.
4.1.12 corrplot
Create correlograms from preprocessed data frames. Needs a matrix with correlations between each variable.
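The preprocessing step mentioned above can be sketched as follows, using the built-in `mtcars` data as an assumed example: the correlation matrix is computed first with `cor()` and then handed to `corrplot()`.

```r
library(corrplot)

# corrplot() expects a correlation matrix, not the raw data frame,
# so compute the pairwise correlations first (all mtcars columns are numeric).
M <- cor(mtcars)

# Draw the correlogram from the matrix
corrplot(M, method = "circle")
```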
4.1.13 rtweet
Collect and organize Twitter data.
4.1.14 caTools
Contains several basic utility functions including: moving (rolling, running) window statistic functions, read/write for GIF and ENVI binary files, fast calculation of AUC, LogitBoost classifier, base64 encoder/decoder, round-off-error-free sum and cumsum, etc.
4.2 Data
4.2.1 ggplot2movies
IMDB movies data set useful to experiment with ggplot2 visualizations.
4.2.2 WDI
Search and download data from over 40 databases hosted by the World Bank, including the World Development Indicators (‘WDI’), International Debt Statistics, Doing Business, Human Capital Index, and Sub-national Poverty indicators, GDP, Population.
4.2.3 essurvey
Package used to easily download specific European Social Survey data.
4.2.4 wbstats
This package allows you to download data from the World Bank database.
wb_cachelist$indicators
## # A tibble: 16,649 × 8
## indicator_id indicator unit indicator_desc source_org topics source_id
## <chr> <chr> <lgl> <chr> <chr> <list> <dbl>
## 1 1.0.HCount.1… Poverty H… NA The poverty hea… LAC Equity … <df [… 37
## 2 1.0.HCount.2… Poverty H… NA The poverty hea… LAC Equity … <df [… 37
## 3 1.0.HCount.M… Middle Cl… NA The poverty hea… LAC Equity … <df [… 37
## 4 1.0.HCount.O… Official … NA The poverty hea… LAC Equity … <df [… 37
## 5 1.0.HCount.P… Poverty H… NA The poverty hea… LAC Equity … <df [… 37
## 6 1.0.HCount.V… Vulnerabl… NA The poverty hea… LAC Equity … <df [… 37
## 7 1.0.PGap.1.9… Poverty G… NA The poverty gap… LAC Equity … <df [… 37
## 8 1.0.PGap.2.5… Poverty G… NA The poverty gap… LAC Equity … <df [… 37
## 9 1.0.PGap.Poo… Poverty G… NA The poverty gap… LAC Equity … <df [… 37
## 10 1.0.PSev.1.9… Poverty S… NA The poverty sev… LAC Equity … <df [… 37
## # … with 16,639 more rows, and 1 more variable: source <chr>
wb_cachelist$topics
## # A tibble: 21 × 3
## topic_id topic topic_desc
## <dbl> <chr> <chr>
## 1 1 Agriculture & Rural Development "For the 70 percent of the world's …
## 2 2 Aid Effectiveness "Aid effectiveness is the impact th…
## 3 3 Economy & Growth "Economic growth is central to econ…
## 4 4 Education "Education is one of the most power…
## 5 5 Energy & Mining "The world economy needs ever-incre…
## 6 6 Environment "Natural and man-made environmental…
## 7 7 Financial Sector "An economy's financial markets are…
## 8 8 Health "Improving health is central to the…
## 9 9 Infrastructure "Infrastructure helps determine the…
## 10 10 Social Protection & Labor "The supply of labor available in a…
## # … with 11 more rows
# result = wb_search("")
# result$indicator_desc
# Takes a long time to download
# data = wb_data("SP.POP.TOTL", start_date = 1960, end_date = 2020)
# Example visualization
# library(tidyverse)
# data$country
# data %>%
# filter(country == "Germany") %>%
# ggplot(aes(date, SP.POP.TOTL/1000000)) +
# geom_line()
4.3 Visualization
4.3.1 igraph
Creating and manipulating graphs and analyzing networks. It is written in C and also exists as Python and R packages.
4.3.2 ggthemes
4.3.3 ggrepel
This geometry adds “smart” labels to each data point, meaning labels that “repel” each other automatically so that they do not overlap. Sometimes the data points are too close to each other. In these cases, one solution might be to use a log scale to stretch the clustered observations away from each other.
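library(tidyverse)
library(dslabs)
data(murders)

# Plain text labels: they overlap wherever points cluster
murders %>%
  ggplot(aes(population, total)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  geom_text(aes(label = abb))

# Repelled labels: ggrepel moves them apart so they stay readable
murders %>%
  ggplot(aes(population, total)) +
  geom_point() +
  scale_x_log10() +
  scale_y_log10() +
  ggrepel::geom_text_repel(aes(label = abb))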
4.3.4 ggridges
Density Ridges
When we are concerned that a boxplot summary is too simplistic, we can show stacked smooth densities or histograms instead. We refer to these as ridge plots. Because we are used to visualizing densities with values on the x-axis, we stack them vertically. Also, because this approach needs more space, it is convenient to let the densities overlap. The package ggridges provides a convenient function for doing this. Here is life expectancy by continent in 2015, from the `gapminder` data in dslabs, shown as a ridge plot.
gapminder %>%
  filter(year == 2015) %>%
  ggplot(aes(life_expectancy, continent, fill = continent)) +
  ggridges::geom_density_ridges(show.legend = FALSE)
## Picking joint bandwidth of 2.23

4.3.5 kableExtra
The simplest possible table is produced by calling `kbl()` on a data frame.
mtcars[1:10, ] %>%
  kbl()

| | mpg | cyl | disp | hp | drat | wt | qsec | vs | am | gear | carb |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Mazda RX4 | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.620 | 16.46 | 0 | 1 | 4 | 4 |
| Mazda RX4 Wag | 21.0 | 6 | 160.0 | 110 | 3.90 | 2.875 | 17.02 | 0 | 1 | 4 | 4 |
| Datsun 710 | 22.8 | 4 | 108.0 | 93 | 3.85 | 2.320 | 18.61 | 1 | 1 | 4 | 1 |
| Hornet 4 Drive | 21.4 | 6 | 258.0 | 110 | 3.08 | 3.215 | 19.44 | 1 | 0 | 3 | 1 |
| Hornet Sportabout | 18.7 | 8 | 360.0 | 175 | 3.15 | 3.440 | 17.02 | 0 | 0 | 3 | 2 |
| Valiant | 18.1 | 6 | 225.0 | 105 | 2.76 | 3.460 | 20.22 | 1 | 0 | 3 | 1 |
| Duster 360 | 14.3 | 8 | 360.0 | 245 | 3.21 | 3.570 | 15.84 | 0 | 0 | 3 | 4 |
| Merc 240D | 24.4 | 4 | 146.7 | 62 | 3.69 | 3.190 | 20.00 | 1 | 0 | 4 | 2 |
| Merc 230 | 22.8 | 4 | 140.8 | 95 | 3.92 | 3.150 | 22.90 | 1 | 0 | 4 | 2 |
| Merc 280 | 19.2 | 6 | 167.6 | 123 | 3.92 | 3.440 | 18.30 | 1 | 0 | 4 | 4 |
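Building on the plain table above, a layered version might look like the following sketch. The caption text, the choice of rows and columns, and the object name `styled` are illustrative assumptions, not part of the original notes.

```r
library(kableExtra)

styled <- mtcars[1:5, 1:4] %>%
  kbl(caption = "First rows of mtcars") %>%                 # base table
  kable_styling(bootstrap_options = c("striped", "hover"),
                full_width = FALSE) %>%                     # styling layer
  row_spec(0, bold = TRUE)                                  # bold header row

styled
```

Each piped function adds one layer to the table, which is the same idea as adding geoms and scales to a `ggplot2` plot.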
4.3.6 ?gridExtra
https://cran.r-project.org/web/packages/gridExtra/vignettes/arrangeGrob.html
There are often reasons to graph plots next to each other. The `gridExtra` package permits us to do that with `grid.arrange()`:
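library(gridExtra)
library(ggplot2)
# grid.arrange() needs plot objects (grobs or ggplots); base plot() returns NULL,
# so the two panels are built with ggplot2 instead.
p1 <- ggplot(mtcars, aes(seq_along(mpg), mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(seq_along(cyl), cyl)) + geom_point()
grid.arrange(p1, p2, ncol = 2)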