Before we start working with data in R, it’s helpful to know some of the basic operations R can perform and to get a feel for the syntax of R.
R is an object-based language. You will spend a lot of time defining and then manipulating objects in your environment. Objects can be simple, such as a single number, or more complicated, such as an entire data frame. To be defined and saved in R’s memory, an object must have a name.
To create an object we use the ‘gets’ operator: <-
On the left side of the gets operator you write the name of the object you are creating, and on the right side you write whatever information you want stored in that object. For example, the following line of code creates an object named “George” which is assigned the numeric value 42.
George <- 42
Some things to note: George is capitalized. R is case-sensitive, so if you want to reference this object you need to use the same capitalization. The assigned value is a number, so you can use it in mathematical operations.
You can check the class of an object with the function: class()
class(George)
## [1] "numeric"
The class of an object matters because certain functions only work on certain classes of object.
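As a small illustration of why class matters, the sketch below checks the class of a few objects and shows a numeric function refusing text. The names `lot_name` and `is_vacant` are made up for this example, and tryCatch() is used only so that the error does not stop the script.

```r
# George is the object defined earlier; the other two names are hypothetical.
George <- 42
lot_name <- "corner lot"
is_vacant <- TRUE

class(George)     # "numeric"
class(lot_name)   # "character"
class(is_vacant)  # "logical"

# Numbers can be used in arithmetic.
George + 1        # 43

# sum() does not work on text, so this call produces an error.
# tryCatch() catches the error and returns a message instead.
result <- tryCatch(sum(lot_name), error = function(e) "sum() failed on text")
result            # "sum() failed on text"
```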
If you started with the “What is R?” tutorial you’ve seen a bunch of functions already, but I never defined the term. If you’re thinking about R as a language it might be helpful to think of functions like verbs. Functions do something. What they do depends on the function, and what arguments you give it.
Mathematically, y = mx + b is a function. It defines a line. m, x, and b are all arguments of the function: m is the slope, x is a variable, and b is the y-intercept. m and b are necessary arguments, because the function needs them to define a line. x is not a necessary argument; it is only needed if you want a specific value of y.
Functions in R don’t have the same syntax as functions in math, but they still take arguments to operate.
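To connect the two ideas, here is a minimal sketch of the line equation written as an R function. The name `line_y` and the example values are assumptions chosen for illustration; nothing else in this tutorial depends on them.

```r
# A hypothetical function for y = mx + b.
# x, m, and b are its arguments, separated by commas.
line_y <- function(x, m, b) {
  m * x + b
}

# With slope 2 and y-intercept 1, x = 3 gives y = 7.
line_y(x = 3, m = 2, b = 1)
```

Unlike the mathematical description, this R version needs all three arguments, because it returns one specific value of y rather than the whole line.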
class() takes the object you’re interested in as an argument to return what type of object it is. Let’s define some more objects and learn some more functions.
One useful function is c(), the “concatenate” function. To concatenate is to put things together. c() puts things together into a list. (Strictly speaking, R calls the result a vector; this tutorial uses “list” informally.) Run the following line of code:
my_list <- c(1, 2, 3, 4, 5)
Now you have a new object named “my_list” which is a list of numbers 1 through 5.
Create another list called “myList” which includes the numbers 6 through 10. Answers to exercises are at the bottom of the document.
Now you have two lists. Notice the names. They are very similar and would be read aloud the same way, but they use two different techniques to get around the fact that R doesn’t allow spaces in object names. Another technique would be to use a period between the words.
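The three naming techniques can be compared side by side. The sketch below uses made-up names (`lot_count`, `lotCount`, `lot.count`) so it does not give away the exercise, and it also shows that a change in capitalization makes a different name.

```r
# Three common ways to write a two-word name without a space.
# All three names are hypothetical examples.
lot_count <- 4   # underscore between the words
lotCount  <- 4   # capital letter at the start of the second word
lot.count <- 4   # period between the words

# R is case-sensitive: lotCount exists, but lotcount does not.
exists("lotCount")   # TRUE
exists("lotcount")   # FALSE
```

Whichever style you pick, using it consistently makes object names easier to remember.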
Both of these lists contain numbers so we can use functions that work with numbers on them. mean() takes the average, median() gives the median, sum() gives the sum.
mean(my_list)
## [1] 3
Use the example above as a guide to find the mean, median, and sum of both your lists.
sum() added all the entries of each list together. What if we wanted to add my_list to myList?
There are two ways to do this, depending on what we want. Do you want to add the first entry in my_list to the first entry of myList, the second entry of my_list to the second entry of myList, and so forth, resulting in a new list? Or do you want to add all the entries of both lists together, resulting in just one number?
Which does sum() do if you use the names of the lists?
sum(my_list, myList)
## [1] 55
It adds all the entries together to give you one number.
Notice that in this case we gave sum() two arguments instead of just one. When a function is given multiple arguments, they are separated by commas.
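A couple of short calls show the comma rule in action. round() is not used elsewhere in this tutorial; it appears here only as an example of a function whose second argument has a name.

```r
# sum() accepts any number of arguments, each separated by a comma.
sum(1, 2, 3)                 # 6

# Some arguments have names. round() takes the number to round and
# `digits`, the number of decimal places to keep.
round(3.14159, digits = 2)   # 3.14
```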
So how do we do the other type of addition?
In this case we don’t have to use a function, we can just use the addition operator: +
my_list + myList
## [1] 7 9 11 13 15
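The difference between the two kinds of addition is easier to see with smaller numbers. This sketch uses two made-up objects, `a` and `b`, built with c() in the same way as the lists above.

```r
# Two short hypothetical lists of the same length.
a <- c(1, 2, 3)
b <- c(10, 20, 30)

# `+` works entry by entry and returns a result as long as the inputs.
a + b        # 11 22 33

# sum() collapses everything into a single number.
sum(a, b)    # 66

# Adding entry by entry and then summing gives the same total.
sum(a + b)   # 66
```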
R recognizes mathematical operators and can be used to do calculations.
Define two objects: x which is the number 5, and y which is the number 4.
You can now do basic math with x and y, using their names.
x + y
## [1] 9
x - y
## [1] 1
x/y
## [1] 1.25
x*y
## [1] 20
x^y
## [1] 625
You’re probably not learning R to use it just as a calculator, so let’s think about how these operations could be useful for working with data.
Let’s say I’m working with abandoned lots in a city. I’ve numbered the lots and recorded each one’s length (from the front of the lot to the back alley) and width (the distance along the sidewalk in front). Run the code below to save that information as lists.
lots <- c("lot1", "lot2", "lot3", "lot4")
lotLengths <- c(35, 40, 38, 20)
lotWidths <- c(14, 10, 16, 25)
How would I find the area of the lots?
How could I save that information to a new list called lotAreas?
These lists can be used on their own, but they all hold information about the same lots, so it makes sense to put that information together. In this case each list is a different measure for each lot, so they should be put together as columns. We can do that with the cbind() function. If each list were a set of observations about one lot, we could bind them together as rows using rbind().
lotData <- cbind(lots, lotLengths, lotWidths, lotAreas)
lotData
##      lots   lotLengths lotWidths lotAreas
## [1,] "lot1" "35"       "14"      "490"
## [2,] "lot2" "40"       "10"      "400"
## [3,] "lot3" "38"       "16"      "608"
## [4,] "lot4" "20"       "25"      "500"
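The quotation marks in the output above appear because cbind() builds a matrix, and a matrix holds only one type of value, so the numbers were converted to text to match the lot names. One possible alternative, sketched here on the assumption that you want the numbers to stay numeric, is data.frame(). The names `lotMatrix` and `lotFrame` are made up for this example, and the areas are typed in from the printed output rather than calculated.

```r
lots <- c("lot1", "lot2", "lot3", "lot4")
lotLengths <- c(35, 40, 38, 20)
lotWidths <- c(14, 10, 16, 25)
lotAreas <- c(490, 400, 608, 500)  # copied from the printed output

# cbind() returns a matrix; mixing text and numbers turns everything into text.
lotMatrix <- cbind(lots, lotLengths, lotWidths, lotAreas)
is.character(lotMatrix)       # TRUE

# data.frame() keeps each column's own class.
lotFrame <- data.frame(lots, lotLengths, lotWidths, lotAreas)

# `$` pulls one column out of a data frame by name.
class(lotFrame$lotAreas)      # "numeric"

# Numeric columns can still be used in calculations.
mean(lotFrame$lotAreas)       # 499.5
```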
Answers to exercises:
myList <- c(6, 7, 8, 9, 10)
mean(my_list)
## [1] 3
median(my_list)
## [1] 3
sum(my_list)
## [1] 15
mean(myList)
## [1] 8
median(myList)
## [1] 8
sum(myList)
## [1] 40
x <- 5
y <- 4
lotLengths * lotWidths
## [1] 490 400 608 500
lotAreas <- lotLengths * lotWidths