Tips for visualising grid-data in ggplot2
As someone who works primarily with models in which individuals have spatial coordinates, I have struggled a lot with visualising these systems. If you are using similar models, or biological data sets that have a similar structure of data points with x- and y-coordinates, here are a few things I've picked up to make them prettier.
But first, let's start at step 0, and read some data...
0) Reading data for individuals on a grid
The data I use as an example is from a 2D individual-based model with 400x400 grid points, which has generated a whitespace-separated file of x-y coordinates. Each x-y coordinate is either empty (0) or contains one of six types of individuals (1-6).
I read this data into R and convert it to a long-format table with four columns: an x- and y-coordinate, a value (0-6), and a label for the variable (var, here "type"). The label is useful if you have multiple variables for each position on the grid, such as cell size, metabolite concentrations, etc. Here's how to do this:
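```r
# Reading raw data into a data frame
data <- read.table("individuals.dat")
head(data)
#   V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 ...
# 1  1  1  1  0  1  1  1  1  1   1   1   1   1 ...
# 2  1  1  1  1  1  0  1  1  1   1   1   0   0 ...

# Convert to long format. The columns are named x and y because the plots
# below filter and map on x and y. Since val is read from t(data), x runs
# along the columns of the original table and y along its rows; this pairing
# relies on the grid being square.
griddat <- data.frame(x = as.vector(row(data)),
                      y = as.vector(col(data)),
                      val = as.vector(t(data)),
                      var = "type")
head(griddat, 20)
#     x y val  var
# 1   1 1   1 type
# 2   2 1   1 type
# 3   3 1   1 type
# 4   4 1   0 type
# 5   5 1   1 type
# 6   6 1   1 type
# 7   7 1   1 type
# 8   8 1   1 type
# 9   9 1   1 type
# 10 10 1   1 type
# 11 11 1   1 type
# 12 12 1   1 type
# 13 13 1   1 type
# 14 14 1   1 type
# 15 15 1   0 type
# 16 16 1   0 type
# 17 17 1   0 type
# 18 18 1   2 type
# 19 19 1   2 type
# 20 20 1   2 type
```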
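I use a simple trick to convert it to a long-format table: I turn the row indices, column indices, and values into vectors. For the values (val), the data frame is first transposed with t(), which also turns it into a matrix. as.vector() reads a matrix column by column, so after transposing, the values come out row by row from the original table. Calling as.vector() directly on a data frame does not return a flat vector of values.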
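To make the index bookkeeping easier to check, here is a minimal sketch on a made-up 3x3 grid (the `toy` data and its values are hypothetical, not from the model). It uses a slightly more explicit variant of the same idea that should also work for non-square grids: the table is converted with as.matrix() instead of t(), and col() and row() supply x and y in the same column-by-column order as the values.

```r
# Toy 3x3 grid (made-up values), stored as a data frame like read.table() returns
toy <- as.data.frame(matrix(c(1, 0, 2,
                              0, 3, 0,
                              4, 0, 5), nrow = 3, byrow = TRUE))

m <- as.matrix(toy)

# as.vector() walks a matrix column by column, and row()/col() are laid out
# the same way, so the three vectors below stay aligned with each other.
toy_long <- data.frame(x   = as.vector(col(m)),  # column index = x
                       y   = as.vector(row(m)),  # row index    = y
                       val = as.vector(m),
                       var = "type")

# The cell in row 3, column 1 of the original table
subset(toy_long, x == 1 & y == 3)
```

A quick spot check like the last line is a cheap way to confirm that x and y have not been swapped before plotting the full grid.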
By the way, I am loading the following packages:
```r
# Packages:
library(ggplot2)
library(dplyr)
library(gridExtra)
```
Now that we have the data and packages loaded, let's see how NOT to plot it.
1) Use geom_raster, not geom_tile
For some reason, ggplot2 comes with two functions to draw x-y coordinate data on a 2D canvas: geom_raster and geom_tile. I discovered it is almost always better to use geom_raster when drawing these surfaces, as geom_tile() introduces strange artefacts (e.g. the line in the left panel below), and is a lot slower to boot!
```r
############################
# Geom tile vs Geom raster #
############################

# val > 0 to plot only living individuals
# x, y <= 200 to plot only part of the grid
plot1 <- griddat %>%
  filter(val > 0, x <= 200, y <= 200) %>%
  ggplot(aes(x = x, y = y, fill = as.factor(val))) +
  geom_tile() +
  ggtitle("With geom_tile")

plot2 <- griddat %>%
  filter(val > 0, x <= 200, y <= 200) %>%
  ggplot(aes(x = x, y = y, fill = as.factor(val))) +
  geom_raster() +
  ggtitle("With geom_raster")

# Left panel: geom_tile, right panel: geom_raster
grid.arrange(plot1, plot2, nrow = 1)
```
2) Use coord_fixed and negative y-coordinates
In most cases, I want my grid-data to be displayed as square cells. By default, ggplot just plots the entire grid with whatever width/height fit the device window. However, using coord_fixed() helps to keep things nice and square-looking. Also note that I am using negative y coordinates here to maintain the orientation of the original data table.
```r
# Using coord_fixed
griddat %>%
  filter(val > 0, x <= 200, y <= 200) %>%
  ggplot(aes(x = x, y = -y, fill = as.factor(val))) +
  geom_raster() +
  coord_fixed()
```
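If you would rather keep the original y values on the axis, ggplot2's scale_y_reverse() flips the axis instead of negating the coordinate. The sketch below uses a small made-up grid (the `toy` data frame is hypothetical). It is an alternative to the y = -y approach, not what was used for the figures in this post.

```r
library(ggplot2)

# Hypothetical 4x4 grid in long format; y = 1 should end up at the top
toy <- expand.grid(x = 1:4, y = 1:4)
toy$val <- ifelse(toy$y == 1, 1, 2)

p <- ggplot(toy, aes(x = x, y = y, fill = as.factor(val))) +
  geom_raster() +
  scale_y_reverse() +   # flip the axis instead of plotting -y
  coord_fixed()
p
```

Once the axes are hidden with a blank theme the two approaches should look the same, so this mostly matters when you keep the tick labels.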
3) Modify the ggplot2 default theme
I don't know about you, but I think empty space is black. I also am not very fond of the extra lines and tick labels here, so let's get rid of them:
```r
# Changing the default ggplot theme
griddat %>%
  filter(val > 0, x <= 200, y <= 200) %>%
  ggplot(aes(x = x, y = -y, fill = as.factor(val))) +
  geom_raster() +
  coord_fixed() +
  theme_void() +
  theme(panel.background = element_rect(fill = 'black', colour = 'black'))
```
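Since the same two theme calls come back in every plot from here on, one option is to wrap them in a small helper. This is only a sketch: `theme_grid_black` is a name I made up, and the `toy` data is invented for illustration.

```r
library(ggplot2)

# Hypothetical helper: bundles the blank theme and the black background
theme_grid_black <- function() {
  theme_void() +
    theme(panel.background = element_rect(fill = "black", colour = "black"))
}

# Made-up 5x5 grid; val == 0 plays the role of empty space
toy <- expand.grid(x = 1:5, y = 1:5)
toy$val <- rep(0:4, times = 5)

p <- ggplot(subset(toy, val > 0), aes(x = x, y = -y, fill = as.factor(val))) +
  geom_raster() +
  coord_fixed() +
  theme_grid_black()
p
```

The helper returns an ordinary theme object, so it can be added to a plot with + just like theme_void().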
4) Modify the weird pastel colours
The default pastel colours from ggplot2 are really pleasing for bar/pie charts, but not very pleasing for this mess. How about defining your own colours?
We could use R's built-in rainbow() colours:
```r
griddat %>%
  filter(val > 0, x <= 200, y <= 200) %>%
  ggplot(aes(x = x, y = -y, fill = as.factor(val))) +
  geom_raster() +
  coord_fixed() +
  theme_void() +
  theme(panel.background = element_rect(fill = 'black', colour = 'black')) +
  scale_fill_manual(values = rainbow(6), name = "Individual type")
```
Or better yet, define your own set of colours:
```r
# Color set by Kevin Wright @ stackoverflow:
# https://stackoverflow.com/questions/9563711/r-color-palettes-for-many-data-classes
# Note: the sixth colour is "black", which will not show up on a black background.
c25 <- c("dodgerblue2", "#E31A1C", # red
         "green4",
         "#6A3D9A", # purple
         "#FF7F00", # orange
         "black", "gold1",
         "skyblue2", "#FB9A99", # lt pink
         "palegreen2",
         "#CAB2D6", # lt purple
         "#FDBF6F", # lt orange
         "gray70", "khaki2",
         "maroon", "orchid1", "deeppink1", "blue1", "steelblue4",
         "darkturquoise", "green1", "yellow4", "yellow3",
         "darkorange4", "brown")

griddat %>%
  filter(val > 0, x <= 200, y <= 200) %>%
  ggplot(aes(x = x, y = -y, fill = as.factor(val))) +
  geom_raster() +
  coord_fixed() +
  theme_void() +
  theme(panel.background = element_rect(fill = 'black', colour = 'black')) +
  scale_fill_manual(values = c25, name = "Individual type")
```
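One caveat with an unnamed colour vector: scale_fill_manual() hands out colours in the order of the factor levels that are present, so a type can change colour between plots if another type happens to be missing from the subset. A named vector pins each type to one colour. The sketch below uses made-up data, and `type_cols` is my own hypothetical choice of six colours.

```r
library(ggplot2)

# Hypothetical fixed mapping from individual type to colour
type_cols <- c("1" = "dodgerblue2", "2" = "#E31A1C", "3" = "green4",
               "4" = "#6A3D9A", "5" = "#FF7F00", "6" = "gold1")

# Made-up subset in which types 2, 4 and 5 happen to be absent
toy <- data.frame(x = c(1, 2, 3), y = c(1, 1, 1), val = c(1, 3, 6))

p <- ggplot(toy, aes(x = x, y = -y, fill = as.factor(val))) +
  geom_raster() +
  coord_fixed() +
  scale_fill_manual(values = type_cols, name = "Individual type")
p
```

With the named vector, type 3 keeps its green and type 6 its gold even though type 2 is missing. This is especially handy when plotting many time points that should share one legend.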
5) Make individuals points, not squares
If you want to get a bit more fancy, you can swap geom_raster for geom_point. When you make the point size slightly bigger so that the circles overlap, this even helps to make the difference between individuals and empty space much clearer.
```r
# Plot individuals as dots
# shape = 19 (solid circle) is set in geom_point(); passed to ggplot() it is ignored
plot1 <- griddat %>%
  filter(val > 0, x <= 70, y <= 70) %>%
  ggplot(aes(x = x, y = -y, col = as.factor(val))) +
  geom_point(shape = 19, size = 0.5) +
  coord_fixed() +
  theme_void() +
  theme(panel.background = element_rect(fill = 'black', colour = 'black')) +
  scale_color_manual(values = c25, name = "Individual type") +
  guides(colour = guide_legend(override.aes = list(size = 4)))

plot2 <- griddat %>%
  filter(val > 0, x <= 70, y <= 70) %>%
  ggplot(aes(x = x, y = -y, col = as.factor(val))) +
  geom_point(shape = 19, size = 1.3) +
  coord_fixed() +
  theme_void() +
  theme(panel.background = element_rect(fill = 'black', colour = 'black')) +
  scale_color_manual(values = c25, name = "Individual type") +
  guides(colour = guide_legend(override.aes = list(size = 4)))

grid.arrange(plot1, plot2, nrow = 1)
```
Especially when visualising the entire 400x400 grid, using geom_point can help to make a crisper image:
Fair warning: be careful with these fancy tricks. Make sure you aren't misrepresenting the data in any important way. Aesthetics are not as important as clarity and accurate representation of your data.
6) Combine multiple variable/time points and make nice videos
If you have multiple data sets, for example for different time points and different variables for each individual, you can go all the way and turn them into a movie. You can simply use ggsave to store PNGs, and use your favourite video editor to convert these into a movie. Below is an example where I did just that:
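As a rough sketch of the PNG step, the loop below writes one frame per time point with ggsave(). Everything about the data is made up: the `time` column, the grid size, and the output folder are assumptions for illustration, so adapt them to however your own time points are stored.

```r
library(ggplot2)

# Made-up data: a 10x10 grid at three time points (the `time` column is an assumption)
set.seed(1)
frames <- expand.grid(x = 1:10, y = 1:10, time = 1:3)
frames$val <- sample(0:6, nrow(frames), replace = TRUE)

out_dir <- file.path(tempdir(), "frames")   # hypothetical output folder
dir.create(out_dir, showWarnings = FALSE)

for (tp in unique(frames$time)) {
  p <- ggplot(subset(frames, time == tp & val > 0),
              aes(x = x, y = -y, fill = as.factor(val))) +
    geom_raster() +
    coord_fixed() +
    theme_void() +
    theme(panel.background = element_rect(fill = "black", colour = "black"))
  # Zero-padded file names keep the frames in order for the video tool
  ggsave(file.path(out_dir, sprintf("frame_%04d.png", tp)),
         plot = p, width = 4, height = 4, dpi = 150)
}
```

Fixing width, height and dpi in ggsave() keeps every frame the same size, which most video tools expect.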
Wrapping up
I hope these tips are useful for anyone struggling with the visualisation of these types of data. If you want to play with this, the Rscript and the data table are attached at the top of the page.