In this lesson, you’ll learn how to read in and explore a dataset on average rainfall in Australia and create beautiful plots with ggplot.
For help getting this lesson set up either in RStudio Cloud or RStudio Desktop, check out the Creating a New R Project video in the Getting Started tab.
Before we start, add your name to the “author” argument in the YAML on line 3 where it says “write your name here”. Also, make sure to run the first R chunk above (by pushing the green arrow) to load in our libraries.
We'll load two libraries, tidyverse and ggeasy, that will help us create beautiful plots.
In this lesson, we’re going to create a line graph to show the average rainfall (in mm) for different cities in Australia over the past 20 years. The data is freely available on GitHub. However, for this lesson, I’ve cleaned the data to make it easier for us to work with.
The data is saved in a file called rainfall_clean.csv, and this is the dataset we’ll be working with. Outside of R, you can open this file directly in Excel. Of course, we will be working mostly in RStudio, but this will give you a quick glimpse of the data before we get started.
CSV stands for “comma-separated values”. A CSV file is a simple text file in which each line is one row of data and the values in a row are separated by commas. Data used with programming languages is often stored in CSV files, although you can also read in other types of files, like Excel (XLSX) or text (TXT) files.
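To make the format concrete, here is a minimal sketch of what CSV text looks like and how R turns it into a table. The three rows are the first Adelaide values shown in the output further down. The names csv_text and example are made up for this illustration, and the sketch uses base R's read.csv() with its text argument (rather than the read_csv() function used in this lesson) so that it runs without any file or package.

```r
# A header line followed by one line per row, values separated by commas
csv_text <- "city_name,year,avg_rainfall
Adelaide,2000,1.53
Adelaide,2001,1.71
Adelaide,2002,0.996"

# read.csv() is the base R reader; its `text` argument parses a string
# instead of a file, which is handy for small demonstrations
example <- read.csv(text = csv_text)
example
```

Each comma marks a column boundary, and the first line supplies the column names.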
First, take a look in the Files tab in RStudio and locate the rainfall_clean.csv file.
To read in our data, we will use the read_csv() function. In the console, type ?read_csv() to see the help documentation about this function. No need to read through this entire Help file. I just wanted to point out how you can pull up the help documentation on essentially any function within RStudio.
Copy the following line of R code into the read_data chunk.
rainfall <- read_csv("rainfall_clean.csv")
Make sure to run the code in the read_data chunk by pressing the green arrow pointing to the right.
This line of code will read in our rainfall dataset and save it as a variable called rainfall. In the Environment tab on the right, you should now be able to see the rainfall variable.
rainfall <- read_csv("rainfall_clean.csv")
To view our data in a new tab, go to the Environment tab and click on rainfall. If you have a quick look at your console, you can see that it automatically ran View(rainfall).
To print our data, simply type rainfall in the view_data chunk and run it. Depending on your RMarkdown settings, running this line of code may print your data directly below your R chunk or in the console (either works!).
rainfall
# A tibble: 112 × 3
city_name year avg_rainfall
<chr> <dbl> <dbl>
1 Adelaide 2000 1.53
2 Adelaide 2001 1.71
3 Adelaide 2002 0.996
4 Adelaide 2003 1.49
5 Adelaide 2004 1.41
6 Adelaide 2005 1.61
7 Adelaide 2006 0.757
8 Adelaide 2007 1.27
9 Adelaide 2008 1.03
10 Adelaide 2009 1.34
# … with 102 more rows
Looking at the output, we see the first line printed says A tibble: 112 × 3. A tibble is a type of dataframe that stores our data in a convenient way. For example, it tells us the dimensions of our dataset: we have 112 rows and 3 columns. The columns represent our different variables and the rows represent individual observations.
What are the three variables?
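Underneath the column names, our tibble also tells us the types of variables we are dealing with. For example, under city_name it says <chr>, which stands for character (text). Under year and avg_rainfall it says <dbl>, which stands for double, a numeric type.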
If we look at the first row of our data, we learn that in the city of Adelaide in 2000 there was on average 1.53 mm of rainfall.
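If you'd like to check the dimensions and column types with code rather than reading them off the printout, something like the following sketch should work. Here rainfall_head is a hypothetical stand-in built from the first three rows shown above so the example runs on its own. In the lesson you would pass rainfall instead.

```r
# Stand-in for the first three rows of the rainfall data (illustration only)
rainfall_head <- data.frame(
  city_name = c("Adelaide", "Adelaide", "Adelaide"),
  year = c(2000, 2001, 2002),
  avg_rainfall = c(1.53, 1.71, 0.996),
  stringsAsFactors = FALSE
)

dim(rainfall_head)            # number of rows, then number of columns
names(rainfall_head)          # the column (variable) names
sapply(rainfall_head, class)  # the type stored in each column
```

On the full dataset, dim(rainfall) should report 112 rows and 3 columns, matching the first line of the tibble printout.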
Type the following 2 lines of code in the explore_data chunk.
unique(rainfall$city_name)
range(rainfall$year)
unique(rainfall$city_name)
[1] "Adelaide" "Brisbane" "Canberra" "Melbourne" "Perth"
[6] "Sydney"
range(rainfall$year)
[1] 2000 2019
The first line of code returns the unique city names within our dataset. How many cities do we have in our data?
The second line of code gives us the range of years. In other words, this tells us the minimum and maximum year. What is the range of years in our dataset?
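If you want R to do the counting for you, one option is to wrap unique() in length(), and to use min() and max() to pull out each end of the range. The sketch below uses two small made-up vectors (cities and years are illustration names, not part of the lesson files) so that it runs on its own.

```r
# Toy vectors standing in for rainfall$city_name and rainfall$year
cities <- c("Adelaide", "Adelaide", "Brisbane", "Brisbane")
years <- c(2000, 2019, 2000, 2019)

n_cities <- length(unique(cities))  # how many distinct city names
first_year <- min(years)            # same value as range(years)[1]
last_year <- max(years)             # same value as range(years)[2]
```

With the lesson data, the equivalent call would be length(unique(rainfall$city_name)).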
Now, we’re ready to create a line graph! The goal of this graph is to plot the average mm of rainfall for each city in our dataset from the years 2000-2019.
In the plot1 chunk copy the following 3 lines of code:
plot1 <- ggplot(rainfall, aes(x = year, y = avg_rainfall)) +
  geom_line()
plot1
We’re creating a plot using the ggplot() function and our rainfall dataset. We’re plotting year on the x axis and average rainfall on the y axis. geom_line() is a function that tells R we want to plot a line graph.
However, this graph isn’t quite right yet. We only have 1 line for the entire dataset, but we really want 1 line for each of the 6 cities.
In the plot2 chunk copy the following 3 lines of code:
plot2 <- ggplot(rainfall, aes(x = year, y = avg_rainfall, color = city_name)) +
  geom_line()
plot2
This looks much better! Here, we just added one argument called “color” and told R we want to color the lines by the different city names.
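It may help to see why color sits inside aes() here. As a rough sketch: anything inside aes() is mapped to a column of the data, while the same argument placed outside aes() sets one fixed value for the whole layer. The adelaide data frame below is a small stand-in built from the first three rows printed earlier, and the names mapped and fixed are made up for this illustration.

```r
library(ggplot2)

# First three Adelaide rows from the printed output (stand-in for the full data)
adelaide <- data.frame(
  city_name = "Adelaide",
  year = c(2000, 2001, 2002),
  avg_rainfall = c(1.53, 1.71, 0.996)
)

# Mapped: color depends on a column, so each city gets its own color
# and an entry in the legend
mapped <- ggplot(adelaide, aes(x = year, y = avg_rainfall, color = city_name)) +
  geom_line()

# Fixed: color is set outside aes(), so every line is drawn in the same color
# and no legend is created
fixed <- ggplot(adelaide, aes(x = year, y = avg_rainfall)) +
  geom_line(color = "steelblue")
```

Typing mapped or fixed on its own line draws the corresponding plot.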
However, we can add a few more lines of code to make our plot even prettier.
In the plot3 chunk copy the following 7 lines of code:
plot3 <- ggplot(rainfall, aes(x = year, y = avg_rainfall, color = city_name)) +
geom_line(size = 1) +
labs(x = "Year", y = "Average Rainfall (mm)") +
theme_bw() +
easy_text_size(15) +
easy_remove_legend_title()
plot3
In geom_line(), we added the size argument to make the lines on our graph a little bit thicker and easier to see. (In ggplot2 3.4.0 and later, linewidth is the preferred name for this argument, and size gives a deprecation warning.)
The labs() function allows us to easily change the labels on the x and y axes.
theme_bw() is one of ggplot’s built-in themes, which is my personal favorite. It makes the plot a little bit cleaner.
easy_text_size() and easy_remove_legend_title() are two functions from the ggeasy package that allow us to easily adjust the text size and remove the legend title. In order to use these functions, you need to remember to call the ggeasy library first (like we did at the top of our script).
Now, in the plot4 chunk, try changing the code from plot3 so the text size is set to 25 and the theme is set to theme_dark() instead of theme_bw(). Can you see the differences?
Finally, knit your document to see all of your beautiful plots.
License and Citation: You can use, modify, and adapt any of the lessons, but please include the following attribution: RGirls Community. (2022, April 10). RGirls Lessons. Zenodo. https://doi.org/10.5281/zenodo.6436861