
The beginning of a beautiful friendship

 ·   ·  ☕ 5 min read  ·  ✍️ Andrei Stoian

What will you learn

Introduction

You are here because you want to start your blog. Our journey starts here, with this thought in mind. Just as a house needs a solid foundation, so does our data project.
We learned in the course on Reproducible Research:

  • Organize data analysis to help make it more reproducible
  • Write up a reproducible data analysis using knitr
  • Determine the reproducibility of the analysis project
  • Publish reproducible web documents using Markdown

Let us begin from scratch with this reproducible report. I dislike it when people expect you to know exactly where they got every program and assume that these are trivial things. I will, hopefully, not make the same mistake. Below I explain every step of the journey. Once we have gone through the steps, we can forget about them.

R and RStudio

Any story begins with the hero having a pure heart and nothing else. Our thought process is very much the same. We start by first setting the scene: cue music. For this evening it is Relaxing Jazz Piano Radio.

First, check which operating system we have and, most importantly, whether it is 32-bit or 64-bit.

We will need this in just a moment to know which version of Rtools to download.

The first thing we need to do is get the two tools that we will use every day from now on:

  • R (we will install more than one version)
  • RStudio (we install the latest version with all the bells and whistles)

Once we install R and RStudio, we see that it asks us for yet another tool, Rtools… As they say, don’t they realize that we agree with them? They must ask us again whether we agree with the terms and conditions. Oh well, sometimes you have got to do what you have got to do, said Dave.

Thus we download the tool and install the version for our 64-bit system.
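
Once R is installed, the 32-bit or 64-bit question can also be answered from the R console. Here is a minimal sketch using only base R; the variable names (os_name, r_arch, pointer_bits) are mine and are here only for illustration. Note that it reports the architecture of the running R build, which may differ from the operating system if a 32-bit R is installed on a 64-bit machine.

```r
# Report the operating system and whether this R build is 32-bit or 64-bit.
os_name <- Sys.info()[["sysname"]]            # e.g. "Windows", "Linux", "Darwin"
r_arch <- R.version$arch                      # e.g. "x86_64" on a 64-bit build
pointer_bits <- .Machine$sizeof.pointer * 8   # 64 on a 64-bit build, 32 on a 32-bit build

cat("Operating system:", os_name, "\n")
cat("R architecture:  ", r_arch, "\n")
cat("Pointer size:    ", pointer_bits, "bit\n")
```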

Now we are ready to start RStudio, which uses the R program.

Can we just start R?

Yes, but the advantages of using RStudio are obvious when you see the 4 windows. Those 4 windows were designed especially for you. They are bundled into what is called an integrated development environment (IDE) for R. It includes a console, a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management. Finally we are done.

Well yes, now we can open R and have the base library set. What we soon realize is that R is just like a deck of cards. You want to have the right cards to play the winning hand. So here comes an amazing community that invented this wonderful library full of winning cards. To make use of the packages in this library, we have to install them first. This is a good time to pause the story and point out a crucial fact. R has versions, the library has versions, the functions themselves have versions and, overarching them all, RStudio has versions. So, to manage the library version for each specific version of R, we introduce our first package, called checkpoint.
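
To make the "everything has a version" point concrete, here is a small sketch of how those versions can be inspected from the console. It uses only base R and the stats package, which ships with R; the variable names are mine, and the 4.0.0 threshold is just an illustrative choice.

```r
# Versions at each layer: R itself, then an individual package.
r_version <- getRversion()                 # version of the running R, as a comparable object
stats_version <- packageVersion("stats")   # "stats" ships with R, so it is always installed

cat("R version:    ", as.character(r_version), "\n")
cat("stats version:", as.character(stats_version), "\n")

# Library folders that R searches for packages, in order.
print(.libPaths())

# Versions can be compared directly, e.g. to guard code that needs a recent R.
is_r4_or_later <- r_version >= "4.0.0"
cat("R >= 4.0.0:", is_r4_or_later, "\n")
```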

Creating our first checkpoint

## Creating our first checkpoint
# Create the folder for our project. We create the vector
# dir.checkpoint, which stands for "directory checkpoint".

dir.checkpoint <- "checkpoint"

dir.create(file.path(dir.checkpoint, ".checkpoint"), recursive = TRUE, showWarnings = FALSE)
# Prefer binary packages; do not compile from source
options(install.packages.compile.from.source = "no")
# I use vt for vector; that way I know that
# it is a vector and not a data frame.
# More on this in a later blog post about style.
vt.oldLibPaths <- .libPaths()

## Create a checkpoint by specifying a snapshot date
library(checkpoint)
checkpoint("2020-05-24", project = dir.checkpoint, checkpointLocation = dir.checkpoint)

# We are now set to install all the libraries that we will use for this project
# Scanning for packages used in this project
# - Discovered 0 packages
# No packages found to install
# checkpoint process complete
# ---

# install.packages("tidyverse")
# Because tidyverse is such a giant leap in the way we think about code, it also depends on a lot of packages:

# Installing package into ‘C:/your-project-path-is-entered-here R//checkpoint/.checkpoint/2020-05-24/lib/x86_64-w64-mingw32/4.0.0’
# (as ‘lib’ is unspecified)
# also installing the dependencies ‘desc’,
# ‘pkgbuild’, ‘rprojroot’, ‘pkgload’, ‘praise’,
# ‘colorspace’, ‘sys’, ‘ps’, ‘highr’, ‘markdown’,
# ‘plyr’, ‘testthat’, ‘farver’, ‘labeling’,
# ‘munsell’, ‘RColorBrewer’, ‘viridisLite’,
# ‘askpass’, ‘rematch’, ‘prettyunits’,
# ‘processx’, ‘knitr’, ‘yaml’, ‘htmltools’,
# ‘evaluate’, ‘base64enc’, ‘tinytex’, ‘xfun’,
# ‘backports’, ‘generics’, ‘reshape2’, ‘assertthat’,
# ‘glue’, ‘fansi’, ‘DBI’, ‘lifecycle’, ‘R6’,
# ‘tidyselect’, ‘ellipsis’, ‘pkgconfig’,
# ‘Rcpp’, ‘BH’, ‘plogr’, ‘digest’,
# ‘gtable’, ‘isoband’, ‘scales’, ‘withr’,
# ‘vctrs’, ‘curl’, ‘mime’, ‘openssl’,
# ‘utf8’, ‘clipr’, ‘cellranger’, ‘progress’,
# ‘callr’, ‘fs’, ‘rmarkdown’, ‘whisker’, ‘selectr’,
# ‘stringi’, ‘broom’, ‘cli’, ‘crayon’, ‘dbplyr’,
# ‘dplyr’, ‘forcats’, ‘ggplot2’, ‘haven’, ‘hms’, ‘httr’,
# ‘jsonlite’, ‘lubridate’, ‘magrittr’, ‘modelr’,
# ‘pillar’, ‘purrr’, ‘readr’, ‘readxl’, ‘reprex’,
# ‘rlang’, ‘rstudioapi’, ‘rvest’, ‘stringr’, ‘tibble’, ‘tidyr’, ‘xml2’

# Eureka, we have lift-off. Now just you wait and see how many
# wonderful stories we will tell using this framework.
# I keep telling you it is a framework despite it being
# just a mega cool library. Well, in some blog posts down
# the line we will see that it is a frame of mind.
# So we have vanilla coding and we have tidyverse coding.
# But what do I know... I am just making things up as we go. Maybe we have more.
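
The code above stores the original library paths in vt.oldLibPaths but never uses them again. Here is a minimal sketch, assuming the point of saving them is to be able to restore the original search path later. The temporary folder dir.tmpLib is a hypothetical stand-in for the checkpoint library, so the sketch runs without checkpoint installed.

```r
# Remember the library paths before anything changes them
# (the checkpoint code does this with vt.oldLibPaths).
vt.oldLibPaths <- .libPaths()

# Hypothetical stand-in for a project library: a temporary folder.
dir.tmpLib <- file.path(tempdir(), "demo-lib")
dir.create(dir.tmpLib, recursive = TRUE, showWarnings = FALSE)

# Put the project library first in the search path.
.libPaths(c(dir.tmpLib, vt.oldLibPaths))
print(.libPaths())

# Restore the original search path when we are done.
.libPaths(vt.oldLibPaths)
vt.restored <- .libPaths()
print(vt.restored)
```

Keeping the old paths in a variable means the project library can be switched off again without restarting R.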
library(tidyverse)

# -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
# v ggplot2 3.3.0     v purrr   0.3.4
# v tibble  3.0.1     v dplyr   0.8.5
# v tidyr   1.1.0     v stringr 1.4.0
# v readr   1.3.1     v forcats 0.5.0
# -- Conflicts ------------------------------------------ tidyverse_conflicts() --
# x dplyr::filter() masks stats::filter()
# x dplyr::lag()    masks stats::lag()

## Are we done?

library(rmarkdown)
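
The Conflicts message printed by library(tidyverse) says that dplyr::filter() masks stats::filter(). The masked function is not gone: it can still be reached with the package::function form. Here is a minimal sketch using only the stats package, so it runs even without the tidyverse installed; the vector x and the 3-point window are my own illustrative choices.

```r
x <- 1:5

# Call the masked base function explicitly: a centred 3-point moving average.
# The first and last values are NA because the window does not fit there.
moving_avg <- stats::filter(x, rep(1 / 3, 3))
print(moving_avg)

# With dplyr attached, a bare filter() refers to dplyr::filter() instead;
# writing dplyr::filter(...) makes that choice explicit as well.
```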
WRITTEN BY
Andrei Stoian
Data Analytics with telecommunication SME