Data Input and Output

Reading in your data
Saving new data files
Saving plots
- ggsave
- pdf()

Reading in your data

Many people code and keep their data in Excel or an Excel-type format. If your data is saved in an .xls file, you’ll first need to convert it to a .csv or a .txt file. In Excel, choose File > Save As..., and then choose the format you want. I prefer .csv because I can easily see if there’s a mistake in the delimiter, but other programmers prefer tab-delimited files because they are easier to read in a text editor.

You can either read in your file by specifying the full path to it:

Nigel <- read.csv("~/Documents/Intro_R/IR-2-NigelHunt.csv", header = TRUE)

Or by setting your working directory to the directory where your files are located. If you’re not sure what directory you’re in right now, you can find out with:

getwd() ## print working directory

## [1] "/Users/betsysneller/Documents/public_html/r-mini-course"

You can also set your directory to whatever you like:

setwd("~/Documents/Intro_R") ## set working directory
Nigel <- read.csv("IR-2-NigelHunt.csv", header = TRUE)
head(Nigel)

##   SpeakerID Age Gender Ethnicity College    High      Elementary vowel
## 1  IHP2-155  19      M     white  Temple Central Cook-Wissahikon    AH
## 2  IHP2-156  19      M     white  Temple Central Cook-Wissahikon    OW
## 3  IHP2-157  19      M     white  Temple Central Cook-Wissahikon    AY
## 4  IHP2-160  19      M     white  Temple Central Cook-Wissahikon    AE
## 5  IHP2-161  19      M     white  Temple Central Cook-Wissahikon    AY
## 6  IHP2-162  19      M     white  Temple Central Cook-Wissahikon    ER
##   stress  word norm_F1 norm_F2      t    beg    end   dur cd fm fp fv ps
## 1      0 HELLO     624    1217  3.420  3.403  3.453 0.050  6  5  4  2  0
## 2      1 HELLO     630    1456  3.540  3.533  3.753 0.220 63  0  0  0  7
## 3      1   I'M     884    1423  4.048  4.012  4.132 0.120 41  4  1  2  0
## 4      1  YEAH     707    1840 11.913 11.823 12.093 0.270  3  0  0  0  9
## 5      1     I     882    1549 12.287 12.093 12.482 0.389 41  0  0  0  0
## 6      1  WORK     547    1347 12.649 12.622 12.702 0.080 94  1  6  1  9
##   fs style glide norm_F1.20. norm_F2.20. norm_F1.35. norm_F2.35.
## 1  1    NA               656        1260         624        1217
## 2  0    NA               636        1437         614        1390
## 3  0    NA     s         935        1633         864        1423
## 4  0    NA               662        1983         713        1821
## 5  0    NA     s         837        1424         867        1433
## 6  0    NA               549        1260         547        1357
##   norm_F1.50. norm_F2.50. norm_F1.65. norm_F2.65. norm_F1.80. norm_F2.80.
## 1         576        1162         537        1133         511        1124
## 2         592        1270         576        1225         590        1135
## 3         837        1507         826        1593         805        1733
## 4         740        1681         761        1654         761        1568
## 5         883        1553         798        1687         669        1872
## 6         540        1504         571        1626         618        1647
##   nFormants
## 1         6
## 2         5
## 3         4
## 4         4
## 5         4
## 6         5

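I use read.csv() to read in my .csv and my .txt files, mostly because I’m used to typing it. For a tab-delimited .txt file, you’d just add the optional argument sep="\t". This is the same on every operating system: \t is R’s escape sequence for a tab character, not a file-path separator, so it does not turn into /t on a PC. There are also the functions read.table() and read.delim(), which are essentially the same and differ mainly in their defaults: read.delim() assumes tab-delimited data with a header row, while read.table() assumes whitespace-delimited data with no header.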
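
As a small sketch of the three functions side by side, the following writes a throwaway tab-delimited file to a temporary location and reads it back. The file and its contents are made up for illustration; they are not part of the course data.

```r
# Hypothetical example data, written to a temporary tab-delimited file
tmp <- tempfile(fileext = ".txt")
writeLines(c("word\tvowel\tdur",
             "HELLO\tAH\t0.05",
             "WORK\tER\t0.08"), tmp)

# "\t" means "tab" on every platform
d1 <- read.csv(tmp, sep = "\t")                   # header = TRUE by default
d2 <- read.delim(tmp)                             # sep = "\t" and header = TRUE by default
d3 <- read.table(tmp, sep = "\t", header = TRUE)  # header = FALSE by default, so set it

head(d1)
```

All three calls should give you the same three-column data frame, so which one you use is mostly a matter of habit.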

If you don’t know the path to your file, you can also use the function file.choose() inside read.csv() (or read.table()) to select your file. This will open up a file browser from which you can choose your file.

kiwi_vowels <- read.csv(file.choose())

I recommend against using the file.choose() function. One of the benefits of using R is maximally reproducible research. Writing out the path and file name ensures that Future You knows exactly what data you’re working with. Using the file.choose() function does not keep a record of the file that you ultimately chose, making it harder for you to ensure that you reproduce your research accurately.

It’s important to know that when you’re reading in a data file, you’re just loading the data (not the file) into your workspace. Any changes that you make to the data frame (changing names, adding columns) happen only on your copy in the workspace, not on the original source file. This is pretty great, because it means that playing around with your data in the workspace is risk-free: if you mess something up, you can always start over. It’s also good for reproducibility: Future You will be able to run your scripts exactly as they are.
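
To see this for yourself, here is a minimal sketch using a temporary file with made-up contents (not one of the course files). It changes the data frame in the workspace and then reads the file again to check that the file itself is untouched.

```r
# Hypothetical example file in a temporary location
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(word = c("HELLO", "WORK"), dur = c(0.05, 0.08)),
          file = tmp, row.names = FALSE)

dat <- read.csv(tmp)

# Change the copy in the workspace: rename a column and add a new one
names(dat)[1] <- "Word"
dat$dur_ms <- dat$dur * 1000

# Read the file again: it still has the original names and columns
fresh <- read.csv(tmp)
names(dat)
names(fresh)
```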

A quick note about reading data files in as data frames: they have to be in data frame format, meaning rectangular, with the same number of columns in every row. If your file has a few lines of weird preamble that do not line up with the number of columns in the rest of the data, then you need to manually delete them (or, if the preamble has crucial information in it, use Python to wrangle that data into the format of a data frame, or just do it manually).
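
If the preamble is a known number of lines, one alternative to deleting it by hand is the skip argument of read.csv() and read.table(), which ignores that many lines before it starts reading. The sketch below assumes a made-up file with exactly two preamble lines.

```r
# Hypothetical file: two lines of preamble, then a normal .csv
tmp <- tempfile(fileext = ".csv")
writeLines(c("Exported by some hypothetical tool",
             "Speaker: unknown",
             "word,vowel,dur",
             "HELLO,AH,0.05",
             "WORK,ER,0.08"), tmp)

# Skip the first two lines; the third line is then read as the header
dat <- read.csv(tmp, skip = 2)

# If the preamble matters, keep it as plain text alongside the data
preamble <- readLines(tmp, n = 2)

dat
```

This only works when you know how many lines to skip, so it is worth opening the file in a text editor first.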

Saving new data files

That being said, there are times when you want to save the data manipulation that you’ve done into its own .csv or .txt file. This is easy to do, with write.csv() or write.table(). I did this earlier today to produce the example small data file IR-1-EveSmith.csv:

write.csv(eve_small, file = "~/Documents/Intro_R/IR-1-EveSmith.csv", row.names=FALSE)

So it takes the data frame that you want to save as the first argument, and the file path (including the name you want to give it) as the second argument. After file =, there are a couple optional arguments:

row.names, which by default is set to TRUE. I never have row names for my data frames, so if I leave this as the default, the row numbers get assigned as names. This is annoying, because I end up with a meaningless first column in my saved file that is just a sequence of numbers.
quote, which by default is also set to TRUE. This automatically puts character strings inside of “”, as they would be in your R code. The biggest reason to set it to FALSE is if you want to look at your data in a text editor, and the multitudes of “” make it hard to read your data. I always keep this at TRUE, since I open up my .csv files in Excel, which doesn’t show the “”, so it’s not a problem for me.
sep, which assigns a separator. If you want your output to be tab-delimited, set it to "\t" (this is the same on a Mac and a PC). Note that write.csv() always writes commas and ignores sep with a warning, so use write.table() when you want tab-delimited output.
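
Putting those three arguments together, a tab-delimited file could be written roughly like this. The data frame eve_demo and the temporary output path are hypothetical stand-ins, not the course files.

```r
# Hypothetical small data frame standing in for your own data
eve_demo <- data.frame(word = c("HELLO", "WORK"), dur = c(0.05, 0.08))

out <- tempfile(fileext = ".txt")

# write.table() respects sep; no row names, no quotes around strings
write.table(eve_demo, file = out, sep = "\t", quote = FALSE, row.names = FALSE)

# Look at the raw lines that were written
readLines(out)
```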

Saving plots

And of course, you’ll want to be able to save the beautiful and convincing plots that you’re making. There are a few ways to do this.

`ggsave`

If you’re using ggplot to make your graphs, you can save them with ggsave(). Let’s say I’ve made the grammatical categories barplot from yesterday, and saved it as gram_ing_plot. I can pull it up in the plots window:

gram_ing_plot

ggsave works by default on the most recent graph that you’ve plotted. The only argument it needs is a file name to save it as. By default, it will save to the directory that you’re working out of. If you like, you can explicitly specify a different path to save your plot in a different directory.

As with all files that you save, your file names are like breadcrumbs - so Future You will know how to find it again if you want to re-run or change it. I like to keep my plots in their own folder, because it makes searching for the right one later much easier.

ggsave("plots/Gend_GrCat_barplot_IN.pdf")

So this is an easy way to save your ggplot creations. A quick note: ggsave supports most common file types, so if you put “plot.png” as the file name, it’ll save a .png file. Aside from the default settings, ggsave also allows a surprising amount of flexibility, with optional arguments.

ggsave("file_name.pdf", plot_name) allows you to save a plot that was not the most recent creation.
ggsave("file_name.pdf", scale = .8) allows you to scale the plot. Note that geometric elements like points and text maintain their underlying sizes, not the scaled sizes. So if you scale your plot down, it will have the effect of making points and text look larger, relative to the rest of the plot.
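
Beyond the two options above, ggsave() also takes width, height, units and dpi arguments. Here is a minimal sketch, assuming ggplot2 is installed; demo_plot and the built-in mtcars data are stand-ins for your own plot, and the output goes to a temporary folder rather than a real plots directory.

```r
library(ggplot2)

# Hypothetical stand-in plot built from the built-in mtcars data
demo_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()

out <- file.path(tempdir(), "demo_plot.png")

# Name the plot explicitly and fix the size, so the result does not
# depend on whatever happens to be in the plots window
ggsave(out, plot = demo_plot, width = 6, height = 4, units = "in", dpi = 300)

file.exists(out)
```

Setting the size explicitly means the saved figure comes out the same no matter how big your plots window is at the time.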

`pdf()`

I don’t use ggsave, primarily because I like to see the details more clearly when I save my plots. Instead, I use pdf() when I’m writing my files to .pdf.

pdf("plots/Gend_GrCat_barplot_IN.pdf")     ## opens a .pdf device with specified path and file name
print(gram_ing_plot)   ## prints gram_ing_plot into the open device
dev.off()       ## IMPORTANT! closes the device

pdf() obviously writes files to a .pdf (if you try pdf("file.png"), you’ll get PDF data in a file with the wrong extension, which an image viewer won’t display) - but it’s part of a family of plot-saving devices that includes:

png("file.png")
jpeg("file.jpg")
bmp("file.bmp")
tiff("file.tiff")

With these functions, it’s important to actually print your plot with print() (rather than just calling the plot name).
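
The print() call matters most inside a loop or a function, where R does not automatically print the value of an expression. As a sketch (assuming ggplot2 is installed, and using the built-in mtcars data and a temporary file as stand-ins):

```r
library(ggplot2)

out <- file.path(tempdir(), "loop_plots.pdf")

pdf(out)                       # open the device
for (v in c("wt", "hp")) {
  p <- ggplot(mtcars, aes(x = .data[[v]], y = mpg)) + geom_point()
  print(p)                     # a bare `p` here would draw nothing
}
dev.off()                      # close the device so the file is finished
```

Each print() call starts a new page, so this file ends up with one page per variable.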

Additional arguments that give you lots of flexibility:

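Argument	What it does
`height = 8`	sets the height (in inches for pdf(); png() and the other image devices default to pixels)
`width = 12`	sets the width (same units as height)
`onefile = TRUE`	allows multiple figures in a single .pdf file (default = TRUE)
`family = "Helvetica"`	defines the font family to be used. Default is “Helvetica”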
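
Here is a small sketch of those arguments in use. It draws two base R plots from the built-in mtcars data into one .pdf; the data and the temporary file name are placeholders for your own.

```r
out <- file.path(tempdir(), "two_plots.pdf")

# 12 x 8 inch pages, multiple figures allowed in the one file
pdf(out, width = 12, height = 8, onefile = TRUE, family = "Helvetica")
plot(mtcars$wt, mtcars$mpg, main = "First page")
hist(mtcars$mpg, main = "Second page")
dev.off()   # closes the device; the file is not complete until this runs

file.exists(out)
```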

These are the arguments I regularly make use of - but you can find lots more in the documentation.