19 R Glossary
19.1 Arguments
Inputs supplied to a function. Arguments can be matched either by their position in the function declaration (“positional argument”; e.g., in function (x, y, z = 0), the first value supplied to the function will be given the name x, and the second one the name y) or by a tag / name (e.g., z in function (x, y, z = 0): in the function call, the value is explicitly linked to z by writing z = value, and if no value is supplied for z, z will default to 0).
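The following minimal sketch illustrates positional and named matching. The function name add_three is made up for this illustration only:

```r
# Hypothetical function: x and y have no default, z defaults to 0.
add_three <- function(x, y, z = 0) {
  x + y + z
}

r1 <- add_three(1, 2)          # x = 1 and y = 2 by position; z keeps its default
r2 <- add_three(1, 2, z = 10)  # z is supplied by name
r3 <- add_three(y = 2, x = 1)  # named arguments may appear in any order
print(c(r1, r2, r3))
```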
19.2 Block
A sequence of statements, grouped between curly braces.
19.3 Console pane
19.4 CRAN
Comprehensive R Archive Network. The official repository from which R itself and all published packages can be downloaded. Use install.packages("package_name") to download and install a new package from CRAN.
19.5 Environment Pane
19.6 Evaluate
19.7 Expression
R code consists of expressions. There are multiple types of expressions:
- constant (that is, a character string or a number). E.g.,
> 123
[1] 123
> "abc"
[1] "abc"
- operator expression (that is, every expression that contains one of R’s operators):
> 2 + 3
[1] 5
> a <- "abc"
> 2 > 3
[1] FALSE
- index constructions (that is, extracting elements from a vector or list using numerical or name indices):
> c(1, 2, 3)[-1]
[1] 2 3
> fruit <- list(apple = 5, pear = 2)
> fruit[["apple"]]
[1] 5
> fruit$pear
[1] 2
- flow control element: loops, conditional expressions,…
> if (x %% 2 == 1) print("odd") else print("even")
> for (i in 1:10) print(i)
- compound expression (“block”): a series of expressions grouped between curly braces and separated by semicolons or new lines:
> {x <- 1; x <- x + 5}
> {
x <- 1
x <- x + 5
}
- function definition:
> function(x, y) x + y
> function(x,y) {
x + y
}
- function call:
> print(5)
[1] 5
19.8 File pane
19.9 Function:
A function is a (sequence of) statements that is not evaluated immediately when it is “declared” (that is, created/written), but only when it is “called” / “invoked” (that is, when you tell R to evaluate it). Functions have their own environment of variables; if you assign a value to a symbol that is already used in another environment, the variable in that other environment will be left untouched.
When you call a function, you can pass values to that function through its list of arguments, which is a comma-separated list between parentheses following the function’s name.
In order to “declare” (create) a function, the keyword function is needed, followed by the list of arguments (between parentheses) and the body of the function (usually between curly braces). Functions are usually assigned to a variable name (functions that are not linked to a name are called anonymous functions).
Functions in R are objects, which means they can be passed to other functions as arguments, placed in lists, etc.
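A short sketch of declaring, calling and passing functions. The names square and set_x are illustrative only; sapply() is a base R function that applies a function to every element of a vector:

```r
# Declare a function and bind it to the (illustrative) name `square`.
square <- function(x) {
  x^2
}

# Calling the function is what triggers evaluation of its body.
squared <- square(4)

# Functions are objects: pass one to another function...
squares <- sapply(1:3, square)

# ...or pass an anonymous function that is never bound to a name.
cubes <- sapply(1:3, function(x) x^3)

# Assigning to x inside a function creates a local x;
# the x in the global environment is left untouched.
x <- 10
set_x <- function() {
  x <- 99
  x
}
local_value <- set_x()
print(c(local_value, x))
```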
19.10 Global variable:
A variable defined outside the scope of a function.
19.11 Import:
see Load.
19.12 Load:
Bring data or packages from the computer’s (passive) storage into its (active) memory so they can be used in the current R session. Loading a package with library() makes the functions and other objects exported by that package available by name in the current session. Use library("package_name") to import/load a package into R for the current session (in a script, group all these imports at the top of the script).
19.13 Local variable
A variable defined inside a function. Outside that function, the variable’s name will not be bound to the value it was bound to within the function’s scope.
19.14 Package:
A collection of functions and datasets developed by a user to extend R.
Packages can be published on CRAN, or distributed on GitHub or other repositories.
If you want to use a package, it must first be installed and then be loaded into your current R session. Installation must be done only once, loading every time you start a new R session.
- Use install.packages("package_name") to download and install a new package from CRAN.
- Use library("package_name") to import/load an installed package into R for the current session (in a script, group all imports at the top of the script).
- Use installed.packages() to get a list of all packages installed for your R version.
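The install-once, load-every-session pattern can be sketched as follows. The package name "stats" is chosen here only because it ships with R, so the sketch runs without downloading anything; substitute the package you actually need:

```r
# Assumption for illustration: "stats" stands in for any package name.
pkg <- "stats"

# Install only if the package is not available yet (done once per machine).
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load the package (needed in every new R session).
# character.only = TRUE tells library() that pkg holds the name as a string.
library(pkg, character.only = TRUE)

# Check that the package is now attached to the current session.
is_attached <- paste0("package:", pkg) %in% search()
print(is_attached)
```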
19.15 R:
A computer language developed for statistical analysis and graphics.
19.16 RStudio:
An Integrated Development Environment (IDE) for writing, executing and debugging R code. R comes with its own IDE but most users prefer RStudio for its user-friendliness.
19.17 Pane:
The RStudio window is divided into four quadrants called panes: the Source pane, the Console pane, the Environment pane and the File pane. The first two are for writing code, the latter two contain a number of tabs with useful resources.
You can minimise and maximise the size of each pane by using the icons in the top right of every pane.
To switch the order of the panes, use RStudio’s “Pane layout” dialog, which you can find in the Options dialog (Tools > Options; RStudio > Preferences on Mac).
19.18 Prompt:
19.19 Run:
Execute R code.
19.20 Scope:
Symbols in R are bound to a specific value only within a specific environment or “scope”. E.g., the symbol of a variable defined within a function will be bound to that variable’s value only within that function (such a variable is called a local variable); outside of the function, it will be considered unbound (that is, not connected to a value):
> f <- function (x) {
doubled = x * 2
}
> doubled
Error: object 'doubled' not found
Variables declared outside a function are considered global variables.
If during evaluation of a function a symbol is encountered that is not in the local environment (that is, a symbol that was not in that function’s list of arguments and that was not defined inside the function), R will search for this symbol in the environment in which the function was defined (its enclosing environment), then in that environment’s own enclosing environment, and so on until the global environment (and, beyond it, the attached packages) is reached.
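A small sketch of how R looks up a symbol that is not local to a function. R uses lexical scoping, so the lookup follows the environment where the function was defined. The names rate, scale_value and caller are invented for this illustration:

```r
rate <- 2                 # global variable

scale_value <- function(x) {
  x * rate                # `rate` is not local, so R looks outside the function
}

caller <- function() {
  rate <- 100             # local to caller(); scale_value() does not see it
  scale_value(5)
}

result <- caller()
print(result)             # uses the global rate, because scale_value()
                          # was defined in the global environment
```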
19.21 Session:
19.22 Source pane:
19.23 Symbol:
The name component in a variable: a variable is a value assigned to a symbol. E.g., in x <- 3, x is the symbol/name of the variable and 3 its value.
19.24 Variable:
A symbol (name) linked with a value that this symbol represents.
Linking a symbol with a value is called assigning; this is done using an assignment operator (<-, ->, =).
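The three assignment operators can be compared in a brief sketch (the variable names are arbitrary):

```r
a <- 3         # leftward assignment, the most common form in R code
4 -> b         # rightward assignment: the value comes first
c_value = 5    # `=` also assigns when used at the top level

total <- a + b + c_value
print(total)
```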
19.25 Vector:
A vector is the main data structure in R: vectors are collections of data. Usually, the term vector is used as shorthand for a specific type of vector in R, so-called atomic vectors; they are called atomic because every element in an atomic vector is of the same data type.
There are 5 “modes” of atomic vectors, based on the data type of their elements; only the first three are directly relevant for us:
character vector: all elements are text strings (data type: character).
> v <- c("a", "100", "ألف", "vector elements can be very long strings")
> typeof(v)
[1] "character"
> mode(v)
[1] "character"
numeric vector: all elements are of the integer type (whole numbers, both positive and negative: 1, 2, -137, …), or of the double type (“double precision floating point numbers”: 1.2345, -125.8, pi, …). E.g.,
> v <- c(1, -300, 18.5, pi)
> v
[1] 1.000000 -300.000000 18.500000 3.141593
> typeof(v)
[1] "double"
> mode(v)
[1] "numeric"
logical vector: all elements are one of the boolean values TRUE or FALSE:
> v <- c(TRUE, TRUE, FALSE, TRUE)
> typeof(v)
[1] "logical"
> mode(v)
[1] "logical"
complex vector: all elements are complex numbers (numbers that have a real and an imaginary part)
raw vector: all elements are raw byte objects
The c() function is often used to create vectors with multiple elements. But even if you assign a single string or number to a variable, the variable will be a vector:
> a <- "This is a string"
> class(a)
[1] "character" # a is a character vector!
> a[1]
[1] "This is a string" # our string is the first (and only) element of that vector!
Vectors have no dimensions (vs. for example tables, which have 2 dimensions: rows and columns).
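As a supplementary sketch (not part of the original examples): whole numbers typed without a suffix are stored as doubles, while the L suffix creates integers. Both have mode "numeric", and a plain vector reports no dimensions:

```r
whole   <- c(1L, 2L, -137L)  # the L suffix creates integer values
decimal <- c(1, 2, -137)     # without the suffix, R stores doubles

print(typeof(whole))         # integer type
print(typeof(decimal))       # double type
print(mode(whole))           # both types share the mode "numeric"

single <- 42                 # a single number is a vector of length 1
print(length(single))
print(dim(single))           # NULL: a plain vector has no dimensions
```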
19.26 relevant R functions
19.26.1 class()
The class function is used to display the class of an R object.
The function has one argument: the object you want to know the class of.
> a = 15
> class(a)
[1] "numeric"
> b = "15"
> class(b)
[1] "character"
> class(class)
[1] "function"
19.26.2 ls()
The ls() function (for “list”) lists all the objects we have created in the current R session. You will find the same information in the Environment tab in RStudio.
The ls() function does not require any arguments:
> ls()
[1] "a" "b"
19.26.3 c()
The c() function (for “combine”) combines multiple values into a single vector object.
> character_vector <- c('a', 'b', 'c')
> character_vector
[1] "a" "b" "c"
> numeric_vector <- c(1,2,3)
> numeric_vector
[1] 1 2 3
> logical_vector <- numeric_vector >= 2
> logical_vector
[1] FALSE TRUE TRUE
Note that all elements inside a vector must be of the same type (character/numeric/logical). If they are of different types, R will “coerce” them into the same type.
> mixed_vector <- c(1, "2", "three", TRUE)
> mixed_vector
[1] "1" "2" "three" "TRUE" # R has converted all elements into strings!
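Coercion always moves towards the more general type: logical values become numbers when mixed with numbers, and everything becomes a string when mixed with strings. A minimal sketch (variable names are arbitrary):

```r
# logical + numeric: TRUE becomes 1 and FALSE becomes 0
logical_and_numeric <- c(TRUE, FALSE, 2)
print(logical_and_numeric)
print(typeof(logical_and_numeric))

# numeric + character: the number becomes a string
numeric_and_character <- c(1, "a")
print(numeric_and_character)

# logical + character: the logical value becomes the string "TRUE"
logical_and_character <- c(TRUE, "a")
print(logical_and_character)
```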
19.26.4 length()
The length() function returns the number of elements in a vector.
> my_vector <- c("a", "bb", "ccc")
> length(my_vector)
[1] 3
> length("A longer character string")
[1] 1
19.26.5 paste()
The paste() function concatenates two or more character vectors.
By default, it will add a space between two strings:
> paste('a', "b")
[1] "a b"
> paste('a', "b", "c")
[1] "a b c"
If you want another character to be used to separate the two strings, the function provides an additional argument called “sep”:
> paste('a', "b", sep=",")
[1] "a,b"
> paste(c('a', 'b', 'c'), "d", sep='/')
[1] "a/d" "b/d" "c/d"
19.26.6 nchar()
The nchar() function (for “number of characters”) returns the number of characters in each string in a character vector.
> nchar("banana")
[1] 6
> test_vector <- c("apple", "pear", "banana")
> nchar(test_vector)
[1] 5 4 6
19.26.7 substr()
The substr() function returns substrings of character vectors using character offsets of each string in the vector. The function takes three arguments:
* the character vector from which you want to extract a substring
* start: the index of the first character of the substring inside each string in the vector
* stop: the index of the last character of the substring inside each string in the vector
> substr("Banana", start=2, stop=5)
[1] "anan"
> test_vector <- c("apple", "pear", "banana")
> substr(test_vector, start=1, stop=3)
[1] "app" "pea" "ban"
NB: in R (in contrast to many other programming languages) the first index of an object is 1, not 0; and the stop index is inclusive (e.g., if stop is set to 5, the substring will end after the fifth character, not before it).
If a string inside a character vector is shorter than the stop value, the substr() function will return the string from the start value until its last character:
> test_vector <- c("apple", "pear", "banana")
> substr(test_vector, start=1, stop=5)
[1] "apple" "pear" "banan"
You can use the nchar() function to return all characters after an index position until the end of the string for each string in a vector, or only the last n characters in each string:
> test_vector <- c("apple", "pear", "banana")
> substr(test_vector, start=2, stop=nchar(test_vector))
[1] "pple" "ear" "anana"
> substr(test_vector, start=nchar(test_vector)-2, stop=nchar(test_vector))
[1] "ple" "ear" "ana"
19.26.8 grep()
The grep() function (the name comes from “globally search for a regular expression and print”) returns the indices of all strings inside a character vector that match a given pattern.
grep() requires two arguments (in addition, there are a number of optional arguments):
pattern: the regex pattern a string must match
x: the character vector containing the string(s)
> test_vector <- c("apple", "pear", "banana")
> grep('a', test_vector)
[1] 1 2 3
> grep('p', test_vector)
[1] 1 2
> grep('(\\w)\\1', test_vector) # strings in which a character is immediately repeated
[1] 1
NB: to escape a character in regular expressions in R, you need to use double backslashes instead of single ones, because the backslash is itself an escape character inside R strings: e.g., use \\n for a new line character.
To match a literal backslash, you will need four backslashes! \\\\
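A brief sketch of two of grep()'s optional arguments and companions, plus the escaping rules. value = TRUE makes grep() return the matching strings instead of their indices, and grepl() returns one TRUE/FALSE per string. The example vectors are made up for this illustration:

```r
test_vector <- c("apple", "pear", "banana")

# value = TRUE returns the matching strings rather than their indices
matching_strings <- grep("an", test_vector, value = TRUE)
print(matching_strings)

# grepl() returns a logical vector: does each string start with "p"?
starts_with_p <- grepl("^p", test_vector)
print(starts_with_p)

# Escaping: "\\." matches a literal dot
with_dot <- grep("\\.", c("a.b", "ab"))
print(with_dot)

# Four backslashes in the pattern match one literal backslash in the data.
# "C:\\temp" is the R notation for the string C:\temp
paths <- c("C:\\temp", "no_backslash")
with_backslash <- grep("\\\\", paths)
print(with_backslash)
```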
19.26.9 Casting functions: as.numeric(), as.character()
Casting functions explicitly convert an R object of one type into an object of another type. You will probably most frequently use as.numeric() to turn character vectors into numeric vectors, and as.character() to do the opposite.
> a <- "123"
> a
[1] "123"
> a + 4
Error in a + 4 : non-numeric argument to binary operator
> as.numeric(a) + 4
[1] 127
> b <- 123
> b
[1] 123
> as.character(b)
[1] "123"
There are many more casting functions (try writing “as.” in the RStudio console and a popup will appear with dozens of other casting functions).
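One behaviour worth knowing, sketched here as a supplement: when a string cannot be interpreted as a number, as.numeric() does not stop with an error but returns NA (a missing value) and issues a warning. The warning is suppressed below only to keep the output short:

```r
mixed <- c("12", "3.5", "abc")

# "abc" cannot be converted, so the third element becomes NA
numbers <- suppressWarnings(as.numeric(mixed))
print(numbers)

# Two of the other casting functions
as_int <- as.integer("42")
as_lgl <- as.logical(c("TRUE", "false", "yes"))  # "yes" is not recognised
print(as_int)
print(as_lgl)
```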
19.26.10 matrix()
The matrix() function creates a matrix object. A matrix is a two-dimensional data structure, similar to a table with rows and columns. All elements in a matrix must be of the same type (string/numeric/…).
The matrix() function is usually called with three arguments:
data: a vector containing the data that should be put into the matrix
nrow: the number of rows the matrix should have
ncol: the number of columns the matrix should have
# create a matrix with 3 rows and 4 columns containing the numbers from 1 to 12:
> m <- matrix(data=1:12, nrow=3, ncol=4)
> m
[,1] [,2] [,3] [,4]
[1,] 1 4 7 10
[2,] 2 5 8 11
[3,] 3 6 9 12
The optional argument byrow defines whether the data should be fed into the matrix by row (byrow=TRUE) or by column (byrow=FALSE); default is FALSE.
> m <- matrix(data=1:12, nrow=3, ncol=4, byrow=TRUE)
> m
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
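Once a matrix exists, its elements are addressed with a row index and a column index between square brackets. A short sketch using the column-wise matrix from the first example:

```r
m <- matrix(data = 1:12, nrow = 3, ncol = 4)

dims         <- dim(m)    # number of rows and columns
one_cell     <- m[2, 3]   # row 2, column 3
first_row    <- m[1, ]    # leave the column index empty to get a whole row
last_column  <- m[, 4]    # leave the row index empty to get a whole column

print(dims)
print(one_cell)
print(first_row)
print(last_column)
```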
19.26.11 data.frame()
The data.frame() function creates a dataframe object, which is similar to a spreadsheet or a table in a database.
Unlike a matrix, a dataframe can contain columns of different types (character, numeric, …).
> df <- data.frame(fruit=c("apple", "banana", "pear"), stock=c(15, 20, 3))
> df
fruit stock
1 apple 15
2 banana 20
3 pear 3
Every dataframe has 3 attributes:
dim: its dimension (number of rows and columns)
colnames: the names of its columns
rownames: the names of its rows
> dim(df)
[1] 3 2
> colnames(df)
[1] "fruit" "stock"
> rownames(df)
[1] "1" "2" "3"
The name of a column can be used to get the data from that column:
> df$fruit
[1] "apple" "banana" "pear"
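Beyond the $ operator, rows and columns of a dataframe can be selected with square brackets, just like in a matrix. A minimal sketch; the price column and its values are invented for illustration:

```r
df <- data.frame(fruit = c("apple", "banana", "pear"), stock = c(15, 20, 3))

# Select a single cell by row index and column name
banana_stock <- df[2, "stock"]

# Select all rows that satisfy a condition (note the trailing comma)
well_stocked <- df[df$stock > 10, ]

# Add a new column by assigning a vector of matching length
df$price <- c(0.5, 0.25, 0.75)   # hypothetical values

print(banana_stock)
print(well_stocked)
print(colnames(df))
```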
19.26.12 setwd() and getwd()
R needs a specific location on your computer where it can write data, and from where it can read data. This location is called the “working directory”.
The setwd() function (short for “set working directory”) sets the working directory to the path that is passed as an argument to the function.
The getwd() function displays the current working directory.
> setwd("~")
> getwd()
[1] "C:/Users/peter/Documents"
19.26.13 dir()
The dir() function (short for “directory”) lists the contents (files and folders) of a directory (by default: the working directory).
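A self-contained sketch of dir(). So that it does not depend on the contents of your own working directory, it first creates a temporary folder with two empty files; the folder and file names are invented for the example. The optional pattern argument restricts the listing to names matching a regular expression:

```r
# Create a scratch folder with two empty files (illustrative names)
demo_dir <- file.path(tempdir(), "dir_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.csv", "b.txt")))

# List everything in the folder
all_files <- dir(demo_dir)
print(all_files)

# List only the files whose name ends in .csv
csv_files <- dir(demo_dir, pattern = "\\.csv$")
print(csv_files)
```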
19.26.14 read.csv()
CSV (“Comma-Separated Values”) files are plain text files used to hold structured data. Data is written in tabular form, with commas delimiting columns and new lines delimiting rows.
An example of the contents of a csv file:
word,frequency
apple,15
banana,20
pear,5
The read.csv() function loads a csv file into an R dataframe object:
> freq <- read.csv(file="word_frequency.csv", as.is=TRUE)
> freq
word frequency
1 apple 15
2 banana 20
3 pear 5
> class(freq)
[1] "data.frame"
Sometimes, csv data uses other delimiters than comma to separate columns; the tab character (symbolized in R by \t) is often used (such files are often called TSV files, for “tab-separated values”).
word frequency
apple 15
banana 20
pear 5
The read.csv() function’s optional argument sep (for “separator”; default: “,”) can be set to \t (that is, tab) to parse a tsv file:
> freq2 <- read.csv(file="word_frequency.csv", as.is=TRUE, sep="\t")
> freq2
word frequency
1 apple 15
2 banana 20
3 pear 5
> class(freq2)
[1] "data.frame"
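To try this out without preparing files by hand, the following sketch writes the example data to temporary files first and then reads them back. The temporary file paths are generated by tempfile() and are not part of the original example:

```r
# Write the comma-separated example data to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c("word,frequency", "apple,15", "banana,20", "pear,5"), csv_path)
freq <- read.csv(file = csv_path, as.is = TRUE)

# Write the same data tab-separated and read it with sep = "\t"
tsv_path <- tempfile(fileext = ".tsv")
writeLines(c("word\tfrequency", "apple\t15", "banana\t20", "pear\t5"), tsv_path)
freq2 <- read.csv(file = tsv_path, as.is = TRUE, sep = "\t")

print(freq)
print(identical(freq, freq2))   # both files yield the same dataframe
```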
19.26.15 order()
The order() function can be used to sort a vector or another R data structure like a dataframe. Its output is a vector that contains indices for the sort order of the object you passed to the function.
> test_vector <- c("apple","pear", "banana")
> order(test_vector)
[1] 1 3 2
You can use the output of the order() function to sort the original object:
> index <- order(test_vector)
> test_vector[index]
[1] "apple" "banana" "pear"
> df <- data.frame(word=c("apple", "banana", "pear"), frequency=c(15, 20, 3))
> index <- order(df$frequency)
> index
[1] 3 1 2
> df[index,]
word frequency
3 pear 3
1 apple 15
2 banana 20
In order to sort from high to low values, set the optional argument decreasing to TRUE:
> test_vector <- c("apple","pear", "banana")
> index <- order(test_vector, decreasing=TRUE)
> test_vector[index]
[1] "pear" "banana" "apple"
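order() also accepts several vectors: later vectors are used to break ties in earlier ones. The sketch below sorts a dataframe from high to low frequency and alphabetically within equal frequencies; negating a numeric column is a common way to reverse its sort direction. The data are made up for the example:

```r
df <- data.frame(word = c("pear", "apple", "fig", "banana"),
                 frequency = c(5, 20, 15, 20))

# First key: frequency from high to low (hence the minus sign)
# Second key: word, alphabetically, to break the tie between 20 and 20
index <- order(-df$frequency, df$word)
high_to_low <- df[index, ]

print(index)
print(high_to_low)
```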