Home: Plotting rainfall data from Australian cities

Download zip file here

For help getting this lesson set up either in RStudio Cloud or RStudio Desktop, check out the Creating a New R Project video in the Getting Started tab.

Before we start add you name to the “author” argument in the YAML on line 3 where it says “write your name here”. Also, make sure to run the first R chunk above (by pushing the green arrow) to load in our libraries.

Lesson Objectives

To learn how to read in data
To practice viewing and exploring your data
To identify R commands needed to plot a line graph
To learn about a package called ggeasy that will help us create beautiful plots

Packages used in this lesson:

tidyverse
ggeasy

Success Criteria:

I can identify where my data is located and what type of file it is
I can view my data in a new RStudio window
I can see all of the different variables within my dataset
I have a better understanding of what “packages” are
I can produce a line plot using ggplot

Keywords:

csv files
packages
functions
arguments

Our data

In this lesson, we’re going to create a line graph to show the average rainfall (in mm) for different cities in Australia over the past 20 years. The data is freely available on github. However, for this lesson, I’ve cleaned the data to make it easier for us to work with.

The data is saved in a file called rainfall_clean.csv and this is the dataset we’ll be working with. Outside of R, you can try opening this file and you’ll be able to open it directly in excel. Of course, we will be working mostly in RStudio, but this will just give you a quick glimpse of the data before we get started.

csv stands for “comma separated value” which is a simple text file. Often, when working with programming languages data is stored in csv files. Although you can also read in other types of files, like excel (XLSX) or text (TXT) files.

Reading in our data

First, take a look in the Files tab in RStudio and locate the rainfall_clean.csv file.

To read in our data, we will use a read_csv(). In the console, type ?read_csv() to see the help documentation about this function. No need to read through this entire Help file - I just wanted to point out how you can use the help documentation on essentially any function within RStudio.

Copy the following line of R code into the read_data chunk.

rainfall <- read_csv(“rainfall_clean.csv”)

Make sure to run the code in read_data chunk by pressing the green arrow pointing to the right.

This line of code will read in our rainfall dataset and save it as a variable called rainfall. In the environment tab on the right, you should now be able to see the rainfall variable.

rainfall <- read_csv("rainfall_clean.csv")

View our data

To view our data in a new tab, go to the Environment tab and click on rainfall - if you have a quick look at your console, you can see it automatically printed View(rainfall)

To view our data in the console, simply type rainfall and run the view_data chunk. Depending on your RMarkdown settings, running this line of code may print your data directly below your R chunk or in the console (either works!)

rainfall

# A tibble: 112 × 3
   city_name  year avg_rainfall
   <chr>     <dbl>        <dbl>
 1 Adelaide   2000        1.53 
 2 Adelaide   2001        1.71 
 3 Adelaide   2002        0.996
 4 Adelaide   2003        1.49 
 5 Adelaide   2004        1.41 
 6 Adelaide   2005        1.61 
 7 Adelaide   2006        0.757
 8 Adelaide   2007        1.27 
 9 Adelaide   2008        1.03 
10 Adelaide   2009        1.34 
# … with 102 more rows

Understanding our data

Looking at the output, we see the first line printed says A tibble: 112 x3. A tibble is a type of dataframe that stores our data in a convenient way. For example, it tells us the dimensions of our dataset: we have 112 rows and 3 columns. The columns represent our different variables and the rows represent individual observations.

What are the three variables?

Underneath the column names, our tibble also tells us the types of variables we are dealing with. For example, under city_name it says which stands for a “character string”, or a string of letters. Under year it says which stands for double, which contains numeric values that may have decimal places.

If we look at the first row of our data, we learn that in the city of Adelaide in 2000 there was on average 1.53mm of rainfall.

Type the following 2 lines of code in the explore_data chunk.

unique(rainfall\(city_name) range(rainfall\)year)

unique(rainfall$city_name)

[1] "Adelaide"  "Brisbane"  "Canberra"  "Melbourne" "Perth"    
[6] "Sydney"

range(rainfall$year)

[1] 2000 2019

The first line of code returns a list of the unique cities within our dataset. How many cities do we have in our data?

The second line of code gives us the range of years. In other words, this tells us the minimum and maximum year. What is the range of years in our dataset?

Create a line graph

Plot 1

Now, we’re ready to create a line graph! The goal of this graph is to plot the average mm of rainfall for each city in our dataset from the years 2000-2019.

In the plot1 chunk copy the following 3 lines of code:

plot1 <- ggplot(rainfall, aes(x = year, y = avg_rainfall)) + geom_line() plot1

plot1 <- ggplot(rainfall, aes(x = year, y = avg_rainfall)) +
  geom_line()
plot1

Understanding Plot 1

We’re creating a plot using the ggplot function and our rainfall dataset. We’re plotting year on the x axis and average rainfall on the y axis. geom_line() is a function to tell R that we want to plot a line graph.

However, this graph isn’t quite right yet. We only have 1 line for the entire dataset, but we really want 1 line for each of the 5 cities.

Plot 2

In the plot2 chunk copy the following 3 lines of code:

plot2 <- ggplot(rainfall, aes(x = year, y = avg_rainfall, color = city_name)) + geom_line() plot2

plot2 <- ggplot(rainfall, aes(x = year, y = avg_rainfall, color = city_name)) +
  geom_line()
plot2

Understanding Plot 2

This looks much better!

Here, we just added one argument called “color” and told R we want to color the lines by the different city names.

However, we can add a few more lines of code to make our plot even prettier.

Plot 3

In the plot3 chunk copy the following 7 lines of code:

plot3 <- ggplot(rainfall, aes(x = year, y = avg_rainfall, color = city_name)) + geom_line(size = 1) + labs(x = “Year”, y = “Average Rainfall (mm)”) + theme_bw() + easy_text_size(15) + easy_remove_legend_title() plot3

plot3 <- ggplot(rainfall, aes(x = year, y = avg_rainfall, color = city_name)) +
  geom_line(size = 1) +
  labs(x = "Year", y = "Average Rainfall (mm)") +
  theme_bw() +
  easy_text_size(15) + 
  easy_remove_legend_title()
plot3

Understanding Plot 3

The first line of our code is the same as plot 2
Within geom_line() we added the size argument to make the lines on our graph a little bit thicker and easier to see
The labs() function allows us to easily change the labels on the x and y axes
theme_bw() is one of ggplot’s default themes, which is my personal favorite. It makes the plot a little bit cleaner
easy_text_size() and easy_remove_legend_title() are two functions from the ggeasy package that allow us to easily adjust the text size and remove the legend title. In order to use these functions, you need to remember to call the ggeasy library first (like we did at the top of our script)

Finally, in the plot4 chunk try changing the code from plot3 so the text size is set to 25 and the theme is set to theme_dark() instead of theme_bw() - can you see the differences?

Finally, knit your document to see all of your beautiful plots.

THE END

License and Citation: You can use, modify, and adapt any of the lessons, but please include the following attribution: RGirls Community. (2022, April 10). RGirls Lessons. Zenodo. https://doi.org/10.5281/zenodo.6436861