R Basics


Updated: 03 September 2023

Based on this EdX Course


Before starting the course, make sure you install the dslabs library, which includes the relevant datasets:

install.packages("dslabs")

Once the dslabs library is loaded you can use its data sets as needed:

library(dslabs)


In order to store a value in a variable we use the assignment operator <-

a <- 25

To display this object we can type its name at the console or call print(a)
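As a minimal sketch of assignment and display (the variable name a and the value 25 are taken from the line above):

```r
a <- 25    # store the value 25 in the variable a

a          # typing the name prints the value: [1] 25
print(a)   # print() does the same thing explicitly
```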



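A data analysis process is typically a series of functions applied to data

We can define a function in R using:

myfunction <- function(arg1, arg2, ...) {
  # function body; the value of the last expression is returned
}

To evaluate a function we write its name followed by parentheses containing the arguments, for example log(8)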


We can nest function calls inside of function arguments, for example log(exp(1)), where exp(1) is evaluated first and its result is passed to log
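A short sketch of evaluating and nesting calls, using the built-in log and exp functions (the variable names are only illustrative):

```r
plain  <- log(8)             # natural logarithm of 8
base2  <- log(8, base = 2)   # logarithm of 8 in base 2, which is 3

# Nested call: exp(1) is evaluated first and its result is passed to log().
nested <- log(exp(1))        # 1

base2
nested
```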


To get help for a function we can use the help function or the ? shorthand, for example help("log") or ?log


We can enter function arguments in an order different to the default by using named arguments:

log(x=5, base=3)

Otherwise the arguments are matched by position, in the order given in the function definition
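The difference between positional and named matching can be sketched with log, whose first two arguments are x and base (the variable names below are only illustrative):

```r
by_position <- log(8, 2)              # x = 8, base = 2, matched by position
by_name     <- log(base = 2, x = 8)   # same call, named arguments in reverse order
swapped     <- log(2, 8)              # x = 2, base = 8: a different result (1/3)

by_position == by_name                # TRUE
```

Naming the arguments makes the call independent of their order, which is why the first two results agree while the third does not.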


Commenting in R is done with the # symbol:

# This is a comment

Data Types

R makes use of different data types. We typically use data frames to store data; a data frame can hold a mixture of data types, one per column, in a single table

Data frames are built into R, but the example data sets used here come from the dslabs library, so we load it first:

library(dslabs)


To check the type of an object we can use the class function

In order to view the structure of an object we can use the str function

If we want to view the first rows of the data in a data frame, we can use the head function
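The three inspection steps above can be sketched on a small data frame. The scores data frame is hypothetical and built here only so the example runs without any extra packages:

```r
# Hypothetical data frame, used only for illustration.
scores <- data.frame(name   = c("John", "James", "Jenny"),
                     exam_1 = c(90, 29, 45))

class(scores)   # "data.frame"
str(scores)     # compact summary: number of rows, columns and column types
head(scores)    # first rows (up to six by default)
```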


To access a variable (column) in a data frame we use the $ accessor; the values come back in the same order as the rows of the data frame

data$names will list the names column of the data frame

In R, each column of a data frame (or matrix) is a vector, with one entry per row

We can use == as the equality operator; it compares element by element and returns TRUE or FALSE for each

Factors allow us to store categorical data; we can view the different categories with the levels function:

> class(dataFrame$gender)
[1] "factor"
> levels(dataFrame$gender)
[1] "Male" "Female"


The most basic data unit in R is a vector

To create a vector we can use the concatenate function c:

codes <- c(380, 124, 818)

If we want to name the values we can do so as follows:

codes <- c(italy=380, canada=124, egypt=818)
codes <- c("italy"=380, "canada"=124, "egypt"=818)

To get a sequence of numbers we can use the seq function:

> seq(1, 5)
[1] 1 2 3 4 5

> seq(1, 10, 2)
[1] 1 3 5 7 9

We can access elements of a vector by position or by name, using either a single index or a vector of indices:

> codes[3]
egypt
  818

> codes["canada"]
canada
   124

> codes[c("canada", "egypt")]
canada  egypt
   124    818

> codes[1:2]
 italy canada
   380    124

Vector Coercion

Coercion is R's attempt to be flexible with data types: when the values in a vector have different types, R converts them all to a common type instead of raising an error

> x <- c(1, "hello", 3)
> x
[1] "1"     "hello" "3"

If we want to force a coercion we can use the as.character function or as.numeric function as follows:

> x <- 1:5
> y <- as.character(1:5)
> y
[1] "1" "2" "3" "4" "5"
> as.numeric(y)
[1] 1 2 3 4 5

If R is unable to coerce a value, the result is NA (and R issues a warning). NA is very common in data sets, where it marks missing data
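A small sketch of a failed coercion, assuming a character vector in which one entry is not a number (the values are made up for illustration):

```r
x <- c("1", "b", "3")

# as.numeric() cannot convert "b", so that entry becomes NA. R also emits a
# warning, which is suppressed here to keep the output clean.
y <- suppressWarnings(as.numeric(x))

y          # 1 NA 3
is.na(y)   # FALSE TRUE FALSE
```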


The sort function sorts a vector in increasing order, but it tells us nothing about where each value sat in the original vector. The order function returns the indices that put the vector in sorted order

> x
[1] 31  4 15 92 65

> sort(x)
[1]  4 15 31 65 92

> order(x)
[1] 2 3 1 5 4

Because the entries of each column correspond to the rows of the data frame, we can use the order of one column to sort another

index <- order(data$total)
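As a sketch of ordering one column by another, assuming a small hypothetical data frame with name and total columns:

```r
# Hypothetical data frame, for illustration only.
data <- data.frame(name  = c("A", "B", "C"),
                   total = c(31, 4, 15))

index <- order(data$total)   # 2 3 1: row positions from smallest to largest total
data$name[index]             # names arranged by increasing total: B, C, A
```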

To get the max or min value we can use:

max(data$total) # maximum value
which.max(data$total) # index of maximum value

min(data$total) # minimum value
which.min(data$total) # index of minimum value

The rank function returns, for each element of a vector, its rank: 1 for the smallest value, 2 for the next smallest, and so on
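The three functions are easy to confuse, so here is a side-by-side sketch on the same five values used in the sort and order example:

```r
x <- c(31, 4, 15, 92, 65)

sort(x)    # 4 15 31 65 92   the values in increasing order
order(x)   # 2 3 1 5 4       positions in x of the sorted values
rank(x)    # 3 1 2 5 4       rank of each original value (1 = smallest)
```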

Vector Arithmetic

Arithmetic operations occur element-wise

If we operate on a vector with a single value, the operation is applied to every element. If we operate on two vectors of the same length, the operation is applied element by element: v3 <- v1 + v2 means v3[1] <- v1[1] + v2[1] and so on
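A minimal sketch of both cases, using two made-up vectors of the same length:

```r
v1 <- c(1, 2, 3)
v2 <- c(10, 20, 30)

doubled <- v1 * 2    # 2 4 6: the single value is applied to every element
v3 <- v1 + v2        # 11 22 33: element-wise sum

doubled
v3
```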


R provides ways to index vectors based on properties of another vector, which allows us to make use of logical comparisons

large_tots <- data$total > 200

small_size <- data$size < 20

index <- large_tots & small_size # & works element-wise; && only accepts single values
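To see the logical index in action, here is a sketch on a hypothetical data frame with name, total and size columns (the values are invented for the example):

```r
# Hypothetical data, for illustration only.
data <- data.frame(name  = c("A", "B", "C", "D"),
                   total = c(250, 120, 300, 90),
                   size  = c(15, 10, 40, 5))

large_tots <- data$total > 200     # TRUE FALSE TRUE FALSE
small_size <- data$size < 20       # TRUE TRUE FALSE TRUE

index <- large_tots & small_size   # TRUE FALSE FALSE FALSE
data$name[index]                   # only row A satisfies both conditions
sum(index)                         # 1: TRUE counts as 1, so sum() counts matches
```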

Indexing Functions

  • which returns the indices of the entries that are TRUE; which(data$total > 200) returns the positions of the rows whose total is above 200
  • match returns, for each value in its first argument, the position of its first occurrence in the second; match(c(20, 14, 5), data$size) returns the positions in data$size of 20, 14 and 5 (NA for a value that is not found)
  • %in% checks whether each element of one vector is present in another vector, for example:
> x <- c("a", "b", "c", "d", "e")
> y <- c("a", "d", "f")
> y %in% x
[1]  TRUE  TRUE FALSE

These functions are very useful for subsetting datasets
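The three indexing functions can be compared on one small vector. The size vector below is hypothetical and stands in for a column such as data$size:

```r
size <- c(5, 20, 8, 14)     # hypothetical values, for illustration only

big     <- which(size > 10)           # 2 4: positions where the condition is TRUE
found   <- match(c(20, 14, 5), size)  # 2 4 1: position of each value in size
present <- c(20, 7) %in% size         # TRUE FALSE: is each value present in size?

size[big]                   # 20 14: the positions can be used to subset
```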

Data Wrangling

The dplyr package is useful for manipulating tables of data

  • Add or change a column with mutate
  • Filter data by rows with filter
  • Choose columns with select

data <- mutate(data, rate=total/size) # Add rate column based on two other columns

select(data, name, rate) # Will create a new table with only the name and rate columns

filter(data, rate <= 0.7) # Will keep only the rows where rate <= 0.7 is true

We can combine functions using the pipe operator %>%:

data %>% select(name, rate) %>% filter(rate <= 0.7)
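Putting the three verbs together, a sketch that assumes the dplyr package is installed and uses a hypothetical table with name, total and size columns:

```r
suppressPackageStartupMessages(library(dplyr))  # assumes dplyr is installed

# Hypothetical data, for illustration only.
data <- data.frame(name  = c("A", "B", "C"),
                   total = c(5, 12, 3),
                   size  = c(10, 10, 10))

result <- data %>%
  mutate(rate = total / size) %>%   # add the rate column: 0.5, 1.2, 0.3
  select(name, rate) %>%            # keep only the name and rate columns
  filter(rate <= 0.7)               # keep rows where rate <= 0.7

result                              # rows A and C remain
```

Each step receives the table produced by the previous one, so the pipeline reads top to bottom in the order the operations happen.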

Creating Data Frames

We can create a data frame with the data.frame function as follows:

data <- data.frame(names = c("John","James", "Jenny"),
                   exam_1 = c(90, 29, 45),
                   exam_2 = c(30, 10, 95))

However, in versions of R before 4.0.0 data.frame converts strings to factors by default. To prevent this we set the stringsAsFactors argument to FALSE, which has been the default since R 4.0.0:

data <- data.frame(names = c("John","James", "Jenny"),
                   exam_1 = c(90, 29, 45),
                   exam_2 = c(30, 10, 95),
                   stringsAsFactors = FALSE)

Basic Plots

We can make simple plots very easily with the following functions:

  • plot(data$size, data$rate)
  • lines(data$size, data$rate)
  • hist(data$size)
  • boxplot(rate~category, data=data)
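The four plotting calls can be tried on a small made-up data frame. The column names size, rate and category mirror the ones used in the list above, and the values are invented:

```r
# Hypothetical data, for illustration only.
data <- data.frame(size     = c(10, 20, 30, 40, 50, 60),
                   rate     = c(0.5, 0.7, 0.4, 0.9, 0.6, 0.8),
                   category = c("a", "b", "a", "b", "a", "b"))

plot(data$size, data$rate)              # scatter plot of rate against size
lines(data$size, data$rate)             # add connecting lines to the existing plot
hist(data$size)                         # histogram of a single variable
boxplot(rate ~ category, data = data)   # one box per category
```

Note that lines draws on top of the current plot, so it has to come after a call such as plot.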

Programming Basics


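# Evaluates a single condition and runs one of two blocks
if (test_expression) {
  # code to run when the condition is TRUE
} else {
  # code to run when the condition is FALSE
}

# Evaluates all elements of a vector and returns a vector of results
ifelse(comparison, trueReturn, falseReturn)

# Will return true if any value in vector meets condition
any(condition)

# Will return true if all values meet condition
all(condition)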
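A runnable sketch of these four constructs on a made-up numeric vector (the variable names are only illustrative):

```r
x <- c(-2, 0, 3)

# if/else tests a single condition
if (length(x) > 2) {
  size_label <- "long"
} else {
  size_label <- "short"
}

# ifelse works element by element and returns a vector
signs <- ifelse(x > 0, "positive", "not positive")
signs                       # "not positive" "not positive" "positive"

any_positive <- any(x > 0)  # TRUE: at least one element is positive
all_positive <- all(x > 0)  # FALSE: not every element is positive
```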


Functions in R are objects. If we need to write a function in R we can do this with the following:

myfunction <- function(arg1, arg2, optional=TRUE) {
  # function body; optional takes the value TRUE unless the caller overrides it
}

This will make use of the usual lexical scoping
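As an illustration of an optional argument and of scoping, here is a sketch with a hypothetical function avg; neither the name nor the behaviour comes from the notes above:

```r
# avg is a hypothetical example function: the arithmetic argument is optional
# and defaults to TRUE.
avg <- function(x, arithmetic = TRUE) {
  n <- length(x)   # this n is local to the function
  if (arithmetic) sum(x) / n else prod(x)^(1 / n)
}

n <- 100                                   # a global n, untouched by the function
mean_a <- avg(c(2, 8))                     # 5: arithmetic mean
mean_g <- avg(c(2, 8), arithmetic = FALSE) # 4: geometric mean
n                                          # still 100
```

The n assigned inside avg lives only while the function runs, so the global n keeps its value.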

For Loops

for (i in sequence) {
  # code to run for each value of i
}

At the end of our loop the index variable will hold its last value
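A minimal sketch of a loop that sums the numbers 1 to 5 and then inspects the index afterwards (the variable names are only illustrative):

```r
total <- 0
for (i in 1:5) {
  total <- total + i   # add each value of the sequence
}

total   # 15
i       # 5: the index keeps its last value after the loop
```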

Other Functions

In R we rarely use for-loops. Instead we can use other functions like the following:

  • apply
  • sapply
  • tapply
  • mapply
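To show how one of these replaces a loop, here is a sketch comparing a for-loop with sapply. The helper sum_to is hypothetical and defined only for this example:

```r
# Hypothetical helper: the sum 1 + 2 + ... + n.
sum_to <- function(n) sum(1:n)

# Loop version: fill a result vector one element at a time
loop_result <- numeric(5)
for (n in 1:5) loop_result[n] <- sum_to(n)

# sapply applies the function to every element and returns a vector
sapply_result <- sapply(1:5, sum_to)

sapply_result                       # 1 3 6 10 15
all(loop_result == sapply_result)   # TRUE
```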

Other functions that are widely used are:

  • split
  • cut
  • quantile
  • Reduce (reduce in the purrr package)
  • identical
  • unique
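A brief sketch of these helpers on a made-up vector, using only base R (so Reduce rather than purrr's reduce):

```r
x <- c(3, 1, 3, 7, 1, 9)

uniq   <- unique(x)                     # 3 1 7 9: duplicates removed
same   <- identical(c(1, 2), c(1, 2))   # TRUE: the two objects are exactly equal
med    <- quantile(x, 0.5)              # 50% quantile (the median), here 3
total  <- Reduce(`+`, 1:4)              # 10: combines the elements pairwise with +

groups <- cut(x, breaks = c(0, 2, 10))  # bin the values into (0,2] and (2,10]
parts  <- split(x, groups)              # list with the values in each bin
parts
```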