Many people code and keep their data in Excel or an Excel-type format. If your data is saved in an .xls file, we’ll need to first convert it to a .csv or a .txt file. In Excel, choose File > Save As..., and then choose the format you want. I prefer .csv because I can easily see if there’s a mistake in the delimiter, but other programmers prefer tab-delimited files because they are easier to read in a text editor.
You can either read in your file by specifying the full path to it:
Nigel <- read.csv("~/Documents/Intro_R/IR-2-NigelHunt.csv", header = TRUE)
Or by setting your working directory to the directory where your files are located. If you’re not sure what directory you’re in right now, you can find out with:
getwd() ## print working directory
## [1] "/Users/betsysneller/Documents/public_html/r-mini-course"
You can also set your directory to whatever you like:
setwd("~/Documents/Intro_R") ## set working directory
Nigel <- read.csv("IR-2-NigelHunt.csv", header = TRUE)
head(Nigel)
## SpeakerID Age Gender Ethnicity College High Elementary vowel
## 1 IHP2-155 19 M white Temple Central Cook-Wissahikon AH
## 2 IHP2-156 19 M white Temple Central Cook-Wissahikon OW
## 3 IHP2-157 19 M white Temple Central Cook-Wissahikon AY
## 4 IHP2-160 19 M white Temple Central Cook-Wissahikon AE
## 5 IHP2-161 19 M white Temple Central Cook-Wissahikon AY
## 6 IHP2-162 19 M white Temple Central Cook-Wissahikon ER
## stress word norm_F1 norm_F2 t beg end dur cd fm fp fv ps
## 1 0 HELLO 624 1217 3.420 3.403 3.453 0.050 6 5 4 2 0
## 2 1 HELLO 630 1456 3.540 3.533 3.753 0.220 63 0 0 0 7
## 3 1 I'M 884 1423 4.048 4.012 4.132 0.120 41 4 1 2 0
## 4 1 YEAH 707 1840 11.913 11.823 12.093 0.270 3 0 0 0 9
## 5 1 I 882 1549 12.287 12.093 12.482 0.389 41 0 0 0 0
## 6 1 WORK 547 1347 12.649 12.622 12.702 0.080 94 1 6 1 9
## fs style glide norm_F1.20. norm_F2.20. norm_F1.35. norm_F2.35.
## 1 1 NA 656 1260 624 1217
## 2 0 NA 636 1437 614 1390
## 3 0 NA s 935 1633 864 1423
## 4 0 NA 662 1983 713 1821
## 5 0 NA s 837 1424 867 1433
## 6 0 NA 549 1260 547 1357
## norm_F1.50. norm_F2.50. norm_F1.65. norm_F2.65. norm_F1.80. norm_F2.80.
## 1 576 1162 537 1133 511 1124
## 2 592 1270 576 1225 590 1135
## 3 837 1507 826 1593 805 1733
## 4 740 1681 761 1654 761 1568
## 5 883 1553 798 1687 669 1872
## 6 540 1504 571 1626 618 1647
## nFormants
## 1 6
## 2 5
## 3 4
## 4 4
## 5 4
## 6 5
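I use read.csv() to read in my .csv and my .txt files, mostly because I’m used to typing it. For a tab-delimited .txt file, you’d just add the optional argument sep="\t". The tab escape is written with a backslash on every operating system, including Windows; sep="/t" is not a tab and will not work as a separator. There are also the functions read.table() and read.delim(), which work the same way but have different defaults: read.delim() already assumes sep="\t" and a header row, while read.table() assumes whitespace-separated columns and no header row.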
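As a minimal sketch of the three readers side by side, the block below writes a tiny tab-delimited file to a temporary location and reads it back three ways. The file contents and the names tab_file, via_csv, via_delim and via_table are made up for illustration; they are not part of the course data.

```r
# Hypothetical example data: a small tab-delimited file in a temporary location
tab_file <- tempfile(fileext = ".txt")
writeLines(c("word\tvowel\tdur",
             "HELLO\tAH\t0.05",
             "WORK\tER\t0.08"),
           tab_file)

# read.csv() assumes commas, so the tab separator has to be given explicitly
via_csv <- read.csv(tab_file, header = TRUE, sep = "\t")

# read.delim() already defaults to sep = "\t" and header = TRUE
via_delim <- read.delim(tab_file)

# read.table() defaults to header = FALSE, so both arguments are spelled out
via_table <- read.table(tab_file, header = TRUE, sep = "\t")

# Compare the results of the three calls
identical(via_csv, via_delim)
identical(via_csv, via_table)
```

For simple files like this one, the choice between the three is mostly about which defaults you would rather not type.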
If you don’t know the path to your file, you can also use the function file.choose() inside your read.csv() call to select your file. This will open up a file browser from which you can choose your file.
kiwi_vowels <- read.csv(file.choose())
I recommend against using the file.choose() function. One of the benefits of using R is maximally reproducible research. Writing out the path and file name ensures that Future You knows exactly what data you’re working with. Using file.choose() leaves no record in your script of the file that you ultimately chose, making it harder to reproduce your research accurately.
It’s important to know that when you’re reading in a data file, you’re just loading the data (not the file) into your workspace. Any changes that you make to the data frame (changing names, adding columns) happen only on your copy in the workspace, not on the original source file. This is pretty great, because it means that playing around with your data in the workspace is risk-free: if you mess something up, you can always start over. It’s also good for reproducibility: Future You will be able to run your scripts exactly as they are.
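To see this in action, here is a small sketch that modifies a data frame and then re-reads the source file. The data and the names src_file, vowels and fresh are invented for the example, and the file lives in a temporary location.

```r
# Hypothetical example data written to a temporary .csv file
src_file <- tempfile(fileext = ".csv")
write.csv(data.frame(word = c("HELLO", "YEAH"), dur = c(0.05, 0.27)),
          src_file, row.names = FALSE)

# Load the data, then change the copy in the workspace
vowels <- read.csv(src_file, header = TRUE)
vowels$dur_ms <- vowels$dur * 1000   # add a column
names(vowels)[1] <- "token"          # rename a column

# Reading the file again gives the original, untouched data
fresh <- read.csv(src_file, header = TRUE)
names(vowels)   # column names of the modified copy
names(fresh)    # column names as stored in the file
```

The file on disk only changes if you explicitly write to it.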
A quick note about reading data files in as data frames: they have to be in data frame format (every row has the same number of columns). If your file has a few lines of weird preamble that does not line up with the number of columns in the rest of the data, then you need to remove it, either by deleting it manually or by passing the skip argument to read.csv() so that those lines are ignored. If the preamble has crucial information in it, then use Python to wrangle that data into the format of a data frame, or just do it manually.
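A minimal sketch of skipping a preamble with the skip argument of read.csv(), assuming you know how many lines the preamble takes up. The file contents and the names messy_file and clean are hypothetical.

```r
# Hypothetical file with two lines of preamble above the real header row
messy_file <- tempfile(fileext = ".csv")
writeLines(c("Exported by some tool",
             "Speaker: example",
             "word,vowel,dur",
             "HELLO,AH,0.05",
             "WORK,ER,0.08"),
           messy_file)

# skip = 2 tells read.csv() to ignore the first two lines before parsing
clean <- read.csv(messy_file, header = TRUE, skip = 2)
clean
```

This keeps the original file untouched, and the script itself records how the preamble was handled.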
That being said, there are times when you want to save the data manipulation that you’ve done into its own .csv or .txt file. This is easy to do, with write.csv() or write.table(). I did this earlier today to produce the example small data file IR-1-EveSmith.csv:
write.csv(eve_small, file = "~/Documents/Intro_R/IR-1-EveSmith.csv", row.names=FALSE)
So it takes the data frame that you want to save as the first argument, and the file path (including the name you want to give it) as the second argument. After file =, there are a couple of optional arguments:
row.names, which by default is set to TRUE. I never have row names for my data frames, so if I leave this as the default, the row numbers get written out as an extra first column. This is annoying, because when I read the file back in I have a meaningless column in my data frame that is just a sequence of numbers.
quote, which by default is also set to TRUE. This automatically puts character strings inside of “”, as they would be in your R code. The biggest reason to set it to FALSE is if you want to look at your data in a text editor, and the multitudes of “” make it hard to read your data. I always keep this at TRUE, since I open up my .csv files in Excel, which doesn’t show the “”, so it’s not a problem for me.
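sep, which assigns a separator. write.csv() always writes commas and ignores sep with a warning, so if you want your output to be tab-delimited, use write.table() with sep="\t". The backslash form is the same on a Mac and on a PC.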
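Here is a short sketch of these arguments in use, writing the same data frame once as .csv and once as tab-delimited text. The data frame small and the output names csv_out and txt_out are placeholders, and both files go to a temporary location.

```r
# Hypothetical data frame standing in for your own results
small <- data.frame(word = c("HELLO", "I'M"), dur = c(0.05, 0.12))

csv_out <- tempfile(fileext = ".csv")
txt_out <- tempfile(fileext = ".txt")

# Comma-separated, no row-number column, strings quoted (the default)
write.csv(small, file = csv_out, row.names = FALSE)

# Tab-delimited, no row-number column, no quotes around strings
write.table(small, file = txt_out, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Look at the raw text that ended up in each file
readLines(csv_out)
readLines(txt_out)
```

Checking the raw lines with readLines() is a quick way to confirm the separator and quoting came out the way you intended.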
And of course, you’ll want to be able to save the beautiful and convincing plots that you’re making. There are a few ways to do this.
ggsave
If you’re using ggplot to make your graphs, you can save them with ggsave(). Let’s say I’ve made the grammatical categories barplot from yesterday, and saved it as gram_ing_plot. I can pull it up in the plots window:
gram_ing_plot
ggsave() works by default on the most recent graph that you’ve plotted. The only argument it needs is a file name to save it as. By default, it will save to the directory that you’re working out of. If you like, you can explicitly specify a different path to save your plot in a different directory.
As with all files that you save, your file names are like breadcrumbs - so Future You will know how to find it again if you want to re-run or change it. I like to keep my plots in their own folder, because it makes searching for the right one later much easier.
ggsave("plots/Gend_GrCat_barplot_IN.pdf")
So this is an easy way to save your ggplot creations. A quick note: ggsave() supports most common file types and picks the format from the file extension, so if you put “plot.png” as the file name, it’ll save a .png file. Aside from the default settings, ggsave() also allows a surprising amount of flexibility, with optional arguments.
ggsave("file_name.pdf", plot_name) allows you to save a plot that was not the most recent creation.
ggsave("file_name.pdf", scale = .8) allows you to scale the plot. Note that geometric elements like points and text maintain their underlying sizes, not the scaled sizes. So if you scale your plot down, it will have the effect of making points and text look larger, relative to the rest of the plot.
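As a hedged sketch of a more explicit ggsave() call, the block below names the plot and fixes the output size so the file can be regenerated the same way later. It assumes the ggplot2 package is installed; demo_plot and out_file are illustrative names, and the plot uses R’s built-in mtcars data, not the course data.

```r
library(ggplot2)

# Hypothetical plot built from R's built-in mtcars data
demo_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Save into a temporary directory so the example leaves no files behind
out_file <- file.path(tempdir(), "demo_plot.pdf")

# Name the plot explicitly and fix the size in inches
ggsave(out_file, plot = demo_plot, width = 6, height = 4, units = "in")

file.exists(out_file)
```

Passing the plot by name means the result does not depend on which graph happened to be drawn last.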
pdf()
I don’t use ggsave(), primarily because I like to see the details more clearly when I save my plots. Instead, I use pdf() when I’m writing my files to .pdf.
pdf("plots/Gend_GrCat_barplot_IN.pdf") ## opens a .pdf device with specified path and file name
print(gram_ing_plot) ## prints gram_ing_plot into the open device
dev.off() ## IMPORTANT! closes the device
pdf() obviously writes files to a .pdf (if you try pdf("file.png"), you’ll still get PDF data, just under a misleading file name that image viewers won’t open), but it’s part of a family of plot saving devices that includes:
png("file.png")
jpeg("file.jpg")
bmp("file.bmp")
tiff("file.tiff")
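With these functions, it’s important to actually print your plot with print() (rather than just calling the plot name). Inside a script run with source(), a function, or a loop, a ggplot that is only named is never drawn, so the file comes out empty.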
Additional arguments that give you lots of flexibility:
Argument | Does |
---|---|
height = 8 | assign height (in inches for pdf()) |
width = 12 | assign width (in inches for pdf()) |
onefile = T | allows multiple figures in a single .pdf file (default = T) |
family = "Helvetica" | defines the font family to be used. Default is “Helvetica” |
These are the arguments I regularly make use of - but you can find lots more in the documentation.
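To show these arguments together, here is a minimal sketch that writes two figures into one .pdf file. It uses base R graphics on the built-in mtcars data so that it runs without ggplot2; with a ggplot object you would wrap each plot in print() instead. The name out_file is a placeholder.

```r
# Save into a temporary directory so the example leaves no files behind
out_file <- file.path(tempdir(), "two_pages.pdf")

# Open a .pdf device; for pdf(), width and height are in inches
pdf(out_file, width = 12, height = 8, onefile = TRUE, family = "Helvetica")

# Each new plot starts a new page in the same file because onefile = TRUE
plot(mtcars$wt, mtcars$mpg, main = "Page 1")
hist(mtcars$mpg, main = "Page 2")

# Close the device so the file is actually written to disk
dev.off()

file.exists(out_file)
```

Forgetting dev.off() is the most common reason a saved file will not open, since the device is still holding it.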