R Basics
Updated: 03 September 2023
Configuration
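Before starting the course, install the dslabs library, which includes the relevant datasets, using install.packages("dslabs").
After loading the library with library(dslabs), you can load its data sets as needed with the data function, for example data(murders).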
Objects
In order to store a value as a variable we use the assignment operator, <-
To display this object we can type its name or use the print function
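As a minimal sketch of assignment and display (the variable name a is only illustrative):

```r
# Store the value 2 in a variable called a
a <- 2

# Typing the name on its own displays the value at the console
a

# print() does the same thing explicitly, which is needed inside functions and loops
print(a)
```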
Functions
A data analysis process is typically a series of functions applied to data
We can define a function in R using:
To evaluate a function we use parentheses and arguments:
We can nest function calls inside of function arguments, for example:
To get help for a function we can use the following:
We can enter function arguments in an order different to the default by using named arguments:
Otherwise the arguments are matched by position, in the order they are given
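The points above can be sketched together as follows. The function name avg and its argument x are made up for this example; log, sqrt, abs, help and args are base R functions.

```r
# Define a function (avg and x are illustrative names)
avg <- function(x) {
  sum(x) / length(x)
}

# Evaluate it with parentheses and an argument
avg(c(1, 2, 3))         # 2

# Nest a call inside another call's arguments
sqrt(abs(-16))          # 4

# Open the help page for a function; ?log is shorthand for help("log")
if (interactive()) help("log")

# List a function's arguments without opening the help page
args(log)

# Arguments are matched by position unless they are named
log(8, 2)               # x = 8, base = 2, giving 3
log(base = 2, x = 8)    # same call with named arguments in a different order
```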
Comments
Commenting in R is done with the # symbol:
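A short illustration, with a made-up variable x:

```r
# This whole line is a comment and is ignored by R
x <- 5  # a comment can also follow code on the same line
```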
Data Types
R makes use of different data types. We typically use data frames to store data; these can hold a mixture of data types in one collection, with one type per column
To use the example data sets in these notes we need to load the dslabs library:
To check the type of an object we can use the class function
In order to view the structure of an object we can use the str function
If we want to view the data in a DataFrame, we can use:
To access a variable in an object we use the $ accessor, which preserves the order of the rows in the data frame. For example, data$names will list the names column of the data frame
In R, the data in each column of a data frame is stored as a vector
We can use == as the logical operator to test for equality
Factors allow us to store categorical data; we can view the different categories with the levels function:
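The inspection functions above can be tried on a small data frame. This is only a sketch: the data frame and its column names are invented here so the example runs without dslabs.

```r
# A small illustrative data frame; the columns are made up for this example
data <- data.frame(
  names  = c("a", "b", "c"),
  total  = c(10, 250, 300),
  region = factor(c("North", "South", "North")),
  stringsAsFactors = FALSE   # keep names as character in every R version
)

class(data)          # "data.frame"
str(data)            # compact summary of each column and its type
head(data)           # first six rows (all three here)

data$names           # the names column as a vector
class(data$total)    # "numeric"

data$total == 250    # element-wise comparison: FALSE TRUE FALSE

class(data$region)   # "factor"
levels(data$region)  # "North" "South"
```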
Vectors
The most basic data unit in R is a Vector
To create a vector we can use the concatenate function, c:
If we want to name the values we can do so as follows:
To get a sequence of numbers we can use the seq function:
We can access elements of a vector with either a single index or a vector of indices as follows:
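A minimal sketch of creating, naming, sequencing and indexing vectors. The names codes and codes2 and the values are illustrative.

```r
# Create a numeric vector with c()
codes <- c(380, 124, 818)

# Name the values, either afterwards...
names(codes) <- c("italy", "canada", "egypt")
# ...or at creation time
codes2 <- c(italy = 380, canada = 124, egypt = 818)

# Sequences
seq(1, 10)       # 1 2 3 ... 10
1:10             # shorthand for the same sequence
seq(1, 10, 2)    # 1 3 5 7 9

# Access elements by position (indexing starts at 1) or by name
codes[2]            # a single element
codes[c(1, 3)]      # several elements at once
codes[1:2]          # a range of positions
codes["canada"]     # by name
```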
Vector Coercion
Coercion is R's attempt to convert values to a common type when a vector is given elements of different types
If we want to force a coercion we can use the as.character function or the as.numeric function as follows:
If R is unable to coerce a value the result is NA, which is also very common in data sets because it denotes missing data
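The following sketch shows implicit coercion, explicit coercion and the NA that results from a failed conversion. The variable names are illustrative.

```r
# Mixing numbers and a string: R coerces everything to character
x <- c(1, "canada", 3)
class(x)                 # "character"

# Explicit coercion
y <- as.character(1:3)   # "1" "2" "3"
as.numeric(y)            # 1 2 3

# Values that cannot be converted become NA (R also issues a warning,
# which is suppressed here to keep the output clean)
z <- suppressWarnings(as.numeric(c("1", "b", "3")))
z                        # 1 NA 3
is.na(z)                 # FALSE TRUE FALSE
```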
Sorting
The sort function will sort a vector in increasing order; however, this tells us nothing about the original positions of the values. We can use the order function to return the indices that put the values in sorted order
Because the entries of each column correspond to the rows of the data frame, we can use the ordering of one column to reorder another column
To get the max or min value we can use:
The rank function returns, for each element, its rank within the vector (1 for the smallest value)
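The difference between sort, order and rank is easiest to see on a small vector. The values and the ids vector below are made up for illustration.

```r
x <- c(31, 4, 15, 92, 65)

sort(x)          # 4 15 31 65 92
order(x)         # 2 3 1 5 4: positions of the values in increasing order
x[order(x)]      # same result as sort(x)
rank(x)          # 3 1 2 5 4: rank of each element, in the original order

max(x)           # 92
which.max(x)     # 4: position of the largest value
min(x)           # 4
which.min(x)     # 2: position of the smallest value

# Order one vector by another
ids <- c("a", "b", "c", "d", "e")
ids[order(x)]    # "b" "c" "a" "e" "d"
```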
Vector Arithmetic
Arithmetic operations occur element-wise
If we operate on a vector with a single value, the operation is applied to each element. If we operate on two vectors of the same length, the operation is applied to corresponding elements: v3 <- v1 + v2 means v3[1] <- v1[1] + v2[1], and so on
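A short sketch of both cases, using the vector names from the text with illustrative values:

```r
v1 <- c(1, 2, 3)
v2 <- c(10, 20, 30)

v1 * 2           # the single value is applied to each element: 2 4 6
v3 <- v1 + v2    # element-wise sum: 11 22 33
v2 / v1          # element-wise division: 10 10 10
```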
Indexing
R provides ways to index vectors based on properties of another vector; this allows us to make use of logical comparisons, etc.
Indexing Functions
- which returns the indices of the elements that are TRUE. For example, which(data$total > 200) returns the positions of the entries whose total exceeds 200, not the values themselves
- match returns, for each element of its first argument, the position of its first occurrence in the second. For example, match(c(20, 14, 5), data$size) returns the positions in data$size where 20, 14 and 5 first occur, or NA where a value is absent
- %in% checks whether each element of one vector is present in another and returns a logical vector
These functions are very useful for subsetting datasets
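As a sketch of these indexing functions, the data frame below is invented so the example is self-contained. Its column names follow the ones used in the text.

```r
# Illustrative data frame
data <- data.frame(
  size  = c(5, 14, 20, 30),
  total = c(150, 250, 90, 400)
)

data$total > 200                 # FALSE TRUE FALSE TRUE
which(data$total > 200)          # 2 4
data$size[data$total > 200]      # 14 30: indexing with a logical vector

match(c(20, 14, 5), data$size)   # 3 2 1
c(20, 7) %in% data$size          # TRUE FALSE

# Subset the rows of the data frame with a logical index
data[data$size %in% c(5, 20), ]
```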
Data Wrangling
The dplyr package is useful for manipulating tables of data
- Add or change a column with mutate
- Filter data by rows with filter
- Filter data by columns with select
We can combine functions using the pipe operator:
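A minimal sketch of these three verbs and the pipe, assuming the dplyr package is installed (the block is skipped otherwise). The data frame and its column names are made up for this example, and %>% is the pipe that dplyr re-exports from magrittr.

```r
# Runs only if dplyr is available; install it with install.packages("dplyr")
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # Illustrative data
  df <- data.frame(
    name  = c("a", "b", "c"),
    size  = c(10, 20, 40),
    total = c(1, 4, 2)
  )

  # Add a column with mutate
  df <- mutate(df, rate = total / size)

  # Chain steps with the pipe: keep two columns, then keep the rows with a low rate
  result <- df %>%
    select(name, rate) %>%
    filter(rate < 0.15)

  print(result)   # rows a and c
}
```

The pipe passes the result of the left-hand side as the first argument of the function on the right, so each step reads in the order it is applied.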
Creating Data Frames
We can create a data frame with the data.frame function as follows:
However, in versions of R before 4.0.0, data.frame converts strings to factors by default; to prevent this we set the stringsAsFactors argument to FALSE:
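A sketch of creating a data frame while keeping strings as character vectors. The name grades and its contents are illustrative.

```r
grades <- data.frame(
  names  = c("John", "Juan", "Jean"),
  exam_1 = c(95, 80, 90),
  exam_2 = c(90, 85, 85),
  stringsAsFactors = FALSE   # already the default from R 4.0.0 onwards
)

class(grades$names)   # "character"
nrow(grades)          # 3
```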
Basic Plots
We can make simple plots very easily with the following functions:
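plot(dataFrame$size, dataFrame$rate)        # scatter plot of rate against size
lines(dataFrame$size, dataFrame$rate)       # add connecting lines to the current plot
hist(dataFrame$size)                        # histogram of size
boxplot(rate ~ category, data = dataFrame)  # one box plot of rate per category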
Programming Basics
Conditionals
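As a brief sketch of conditionals in base R, the following shows if/else, the vectorised ifelse function, and any and all. The variable names and values are illustrative.

```r
a <- 0

# if / else chooses between two blocks of code
if (a != 0) {
  result <- 1 / a
} else {
  result <- NA
}
result   # NA

# ifelse() is the vectorised form: it works element by element
x <- c(-2, 0, 3)
signs <- ifelse(x > 0, "positive", "not positive")
signs    # "not positive" "not positive" "positive"

# any() and all() reduce a logical vector to a single TRUE or FALSE
any(x > 0)   # TRUE
all(x > 0)   # FALSE
```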
Functions
Functions in R are objects. If we need to write a function in R we can do this with the following:
Functions follow the usual lexical scoping rules: variables created inside a function are local to it
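A minimal sketch of a function definition and of scoping. The names my_function and total are hypothetical.

```r
# y has a default value, so it can be omitted when calling
my_function <- function(x, y = 1) {
  total <- x + y   # this total exists only inside the function
  total            # the value of the last expression is returned
}

total <- 100       # a separate variable in the global environment

my_function(2)     # 3
my_function(2, 5)  # 7
total              # still 100: the function did not change it
```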
For Loops
At the end of our loop the index variable will hold its last value
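A short sketch of a for loop, with illustrative names, that also shows the index keeping its last value:

```r
n <- 5
squares <- numeric(n)   # pre-allocate the result vector

for (i in 1:n) {
  squares[i] <- i^2
}

squares   # 1 4 9 16 25
i         # 5: the index keeps its last value after the loop
```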
Other Functions
In R we rarely use for-loops. Instead, we can use functions like the following:
- apply
- sapply
- tapply
- mapply
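Each of the functions listed above applies a function over a structure without an explicit loop. The inputs in this sketch are made up for illustration.

```r
# sapply: apply a function to each element and simplify the result to a vector
sq <- sapply(1:5, function(i) i^2)            # 1 4 9 16 25

# apply: apply a function over the rows (1) or columns (2) of a matrix
m <- matrix(1:6, nrow = 2)
row_sums <- apply(m, 1, sum)                  # 9 12

# tapply: apply a function to groups of values defined by a second vector
group_means <- tapply(c(10, 20, 30, 40), c("a", "b", "a", "b"), mean)  # a = 20, b = 30

# mapply: apply a function to several vectors in parallel
sums <- mapply(function(x, y) x + y, 1:3, 4:6)  # 5 7 9
```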
Other functions that are widely used are:
- split
- cut
- quantile
- reduce
- identical
- unique
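A brief sketch of these functions on a made-up vector. Note that base R spells the reduce function Reduce; a lowercase reduce is provided by the purrr package, and the base version is assumed here.

```r
x <- c(1, 2, 2, 3, 10)

unique(x)                      # 1 2 3 10
identical(c(1, 2), c(1, 2))    # TRUE

quantile(x, 0.5)               # the median, 2

# cut: bin numeric values into intervals, returning a factor
groups <- cut(x, breaks = c(0, 2, 5, 10))
levels(groups)                 # "(0,2]" "(2,5]" "(5,10]"

# split: divide a vector into a list of groups
parts <- split(x, groups)

# Reduce: combine the elements pairwise with a function
Reduce(`+`, x)                 # 18
```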