A challenging problem

David O’Sullivan

February 13, 2026

library(dplyr)
library(tidyr)
library(stringr)
library(sf)
library(ggplot2)
library(patchwork)
Recently, while preparing a manuscript about weavingspace for publication, concerns were raised about the colour schemes used in some of the figures, because they were not colour blind friendly. Seeking to address that concern, one thing led to another, and here I am.
I am in no way an expert on this topic. Colour blindness is relatively well understood and characterised as resulting from anomalous (or entirely absent) cone cells among the long (L), medium (M), or short (S) wavelength sensitive cones. The L- and M-cones are most sensitive to colours either side of cyan in the spectrum, and issues with these result in two forms of red-green colour blindness, known respectively as protanopia and deuteranopia. Blue-yellow colour blindness, tritanopia, results from anomalous or absent S-cones, which are sensitive to blue. The more severe achromatopsia leaves colours distinguishable only by their lightness. As we will see, the effects of these different colour vision deficiencies (CVDs) can be simulated.
In spite of this good understanding, colour palettes readable by those with CVDs have only come to widespread attention recently. That said, there are now many options in R for exploring the readability of graphics under different CVDs. In the way of R, perhaps a few too many, making it a little difficult to pick a path through things.
My explorations suggest that the most straightforward path to tackling design challenges in this area runs through the cols4all package, although in this post I use a couple of other packages besides. First the usual suspects:
And also three specifically colour-related packages:
cvd_grid() from colorblindr, which can simulate how an existing plot appears under different colour vision deficiencies.
palette_dist() from colorblindcheck, to estimate differences among colours in a palette.
The colorspace package provides the underlying ‘machinery’ to drive all of this, supporting ways to manipulate colours, such as darkening or lightening them, increasing or decreasing saturation, and, particularly relevant here, simulating different CVDs. colorblindr and colorblindcheck wrap the colorspace functionality up in various ways, making some low-level operations more convenient. Finally, cols4all collates numerous palettes from many sources, and provides tools you can use to explore their usefulness under different CVDs.
There has been a proliferation of packages for work on this topic. Not included here but also encountered along the way were scico, and khroma. These have their uses, but it’s easy to wind up with an R environment awash with conflicting function names, or at the very least with too many different options to keep track of. For that reason if no other, once you’ve explored the options, I highly recommend using cols4all: it does everything you are likely to need in routine map production. In particular, its interactive GUI, invoked by c4a_gui() allows you to filter, select, and examine the characteristics of the many colour schemes it supports, including from the perspective of various colour vision deficiencies. A good guide to how to proceed—with much more background information than I can offer—is available here.
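As a minimal illustration (assuming cols4all is installed, and using "brewer.set2" purely as an arbitrary example palette name), launching the GUI and pulling a named palette look like this:

```r
library(cols4all)

c4a_gui()                  # opens the interactive palette explorer in a browser
c4a("brewer.set2", n = 5)  # returns 5 hex colours from the named palette
```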
To make the code chunks for the maps less cluttered we make a basemap with a very pale blue for ocean and white for land. When coloured areas are overlaid on this, NA values will show through in white.
land <- st_read("land.gpkg")
bb <- st_bbox(land)
xr <- bb[c(1, 3)]
yr <- bb[c(2, 4)]
basemap <- ggplot() +
  geom_sf(data = land, fill = "white", colour = "black", lwd = 0.5) +
  theme_void() +
  theme(
    panel.border = element_rect(fill = NA, colour = "black"),
    # lighten() is from the colorspace package
    panel.background = element_rect(fill = lighten("lightblue", 0.8), colour = NA))
frame <-
  coord_sf(xlim = xr, ylim = yr, expand = FALSE)

Here is the resulting base map.
The data are three different simple demographic classifications of census tracts in the San Francisco Bay Area into 5, 10, and 15 demographic clusters, loosely based on data assembled by Luc Guillemot. See this page for more background. The clusters have been stored as integer values, so we convert them to factors that will be correctly associated with discrete (i.e., qualitative) colour schemes by ggplot2.
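The loading step might look like the following sketch; the file name is hypothetical, and I’m assuming the clusters arrive as integer columns k5, k10, and k15:

```r
library(sf)
library(dplyr)

# Hypothetical file name; the cluster columns are integers in the source data
gdf <- st_read("bay-area-clusters.gpkg") |>
  mutate(across(c(k5, k10, k15), factor))  # factors get discrete colour scales
glimpse(gdf)
```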
Rows: 1,546
Columns: 4
$ k5 <fct> 5, 4, 4, 1, 1, 2, 3, 3, 4, 3, 4, 1, 1, 4, 4, 4, 4, 4, 3, 4, 2, 3,…
$ k10 <fct> 5, 5, 10, 9, 9, 3, 2, 2, 10, 2, 5, 9, 9, 5, 10, 10, 1, 1, 1, 1, 8…
$ k15 <fct> 7, 7, 8, 6, 15, 10, 12, 12, 12, 14, 7, 15, 15, 7, 12, 12, 12, 3, …
$ geom <MULTIPOLYGON [m]> MULTIPOLYGON (((1844733 650..., MULTIPOLYGON (((1842…
So, let’s make a map of the five-cluster solution, using the default colour fill provided by ggplot2.
This is not even that great a selection of colours for normal colour vision(!), with the second and third, and third and fourth, clusters easily confused. If it’s bad for normal vision, it’s terrible for CVD viewers, as the colorblindr::cvd_grid() function reveals (click on the image for a closer look).
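A sketch of that map and its simulation grid follows; map1 is a name I’ve made up, while basemap, frame, and the gdf data are as set up above:

```r
library(colorblindr)  # provides cvd_grid()

map1 <- basemap +
  geom_sf(data = gdf, aes(fill = k5), colour = "grey") +
  frame
map1            # the map with ggplot2's default discrete fill
cvd_grid(map1)  # deutan, protan, tritan, and desaturated simulations
```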
Under the red-green conditions in the top two simulated maps, the colours break down into two hard-to-differentiate groups, while the blue-yellow colour blindness simulation in the third panel is a little better, but not by much. The achromatopsia simulation hints at the root of the problem: the default colour scheme deliberately picks a set of hues of similar lightness, evenly spaced around the colour wheel. Using cols4all::c4a_plot_cvd() we can see the problem with this palette in close-up.
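A rough stand-in for that close-up can be built directly with colorspace rather than cols4all; swatchplot() and the simulation functions are all colorspace exports, and scales::pal_hue() supplies ggplot2’s default categorical colours:

```r
library(scales)      # pal_hue(): ggplot2's default categorical palette
library(colorspace)  # CVD simulation and swatch plotting

cols <- pal_hue()(5)
swatchplot(
  "Normal"        = cols,
  "Deuteranopia"  = deutan(cols),
  "Protanopia"    = protan(cols),
  "Tritanopia"    = tritan(cols),
  "Achromatopsia" = desaturate(cols))
```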
We have to use the scales::pal_hue() function to access the five-colour version of ggplot2’s default categorical scheme. On paper, the default scheme in ggplot2 seems like a good idea. The problem becomes clearer if we inspect what happens with a larger number of classes.
Extended to 25 colours the approach becomes essentially ‘spectral’ and it is impossible to avoid sections of the spectrum where different CVDs struggle. Similar colours are inevitable (even with normal vision), and whole regions of the spectrum where some colour sensitivity is lost under different CVDs become indistinguishable.
Cindy Brewer’s colour palettes have been widely adopted, and were a huge leap forward when they were introduced. As a member of the cartographic tribe, I am a loyal user,1 but unfortunately one area where the categorical palettes in this series fall down is colour blindness friendliness. To show this, I’ve made my own function to show the CVD simulations of multiple palettes in a single plot (click into the code below for a closer look.2)
modes <- ordered(
  c("Normal", "Deuteranopia", "Protanopia", "Tritanopia", "Achromatopsia"),
  levels = c("Normal", "Deuteranopia", "Protanopia", "Tritanopia", "Achromatopsia"))

get_cvd_colours <- function(cols, mode) {
  chooser <- str_to_lower(str_sub(mode, 1, 4))
  if (chooser == "deut") {
    return(deutan(cols))
  } else if (chooser == "prot") {
    return(protan(cols))
  } else if (chooser == "trit") {
    return(tritan(cols))
  } else if (chooser == "achr") {
    return(desaturate(cols))
  } else {
    return(cols)
  }
}
my_c4a_plot_cvd <- function(pals, cb_modes = modes, n = NULL) {
  dfs <- list()
  for (pal in pals) {
    cols <- c4a(pal)
    if (!is.null(n)) { cols <- c4a(pal)[1:n] }
    ncols <- length(cols)
    dfs[[length(dfs) + 1]] <-
      expand.grid(x = 1:ncols, y = cb_modes) |>
      mutate(
        hex = cb_modes |>
          lapply(get_cvd_colours, cols = cols) |> unlist(),
        pal = pal)
  }
  ggplot(bind_rows(dfs)) +
    geom_tile(aes(x = x, y = y, fill = hex), height = 0.92) +
    scale_y_discrete(limits = rev) +
    scale_fill_identity() +
    facet_wrap( ~ pal, scales = "free_x") +
    theme_void() +
    theme(axis.text.y = element_text(hjust = 1, size = 8),
          strip.text = element_text(vjust = 1))
}

deutan, protan, tritan, and desaturate are from the colorspace package.
n = NULL (the default) will display all colours in the palette; if a value is supplied, only that number of colours will be shown.
Here’s what we get for the Brewer categorical palettes.
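The call producing that display is along these lines; the palette names are my guesses at cols4all’s identifiers for the categorical Brewer schemes, and may not match exactly:

```r
# Names follow cols4all's series.palette convention (assumed here)
brewer_cat <- c("brewer.accent", "brewer.dark2", "brewer.paired",
                "brewer.pastel1", "brewer.pastel2", "brewer.set1",
                "brewer.set2", "brewer.set3")
my_c4a_plot_cvd(brewer_cat)
```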
It’s apparent from this display that the Brewer palettes won’t save us. We can hand-pick a five-colour subset from a palette for present purposes, guided by the simulations. For example, colours 1, 2, 4, 5, and 7 of the brewer.accent palette might work (disregarding achromatopsia):
Again, click on the image for a closer look.
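A sketch of that hand-picked subset applied to the five-cluster map, reusing the basemap machinery from earlier (map2 is a hypothetical name):

```r
pal5 <- c4a("brewer.accent")[c(1, 2, 4, 5, 7)]  # hand-picked subset

map2 <- basemap +
  geom_sf(data = gdf, aes(fill = k5), colour = "grey") +
  scale_fill_manual(values = pal5) +
  frame
map2
```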
Picking our way through palettes ‘by hand’ like this, while do-able, is far from easy, and rapidly becomes more difficult if we want to map more categories. This is where the cols4all::c4a_gui() tool for interactively selecting palettes and exploring their properties comes into its own.
In the static form of this post, the best approximation to this I can provide is a tabulation of palettes that might suit our purposes. We can create this from the scores that cols4all associates with its palettes, using the c4a_scores() function, and some filtering and sorting.
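The filtering might run roughly as follows. This is a sketch only: I’m assuming c4a_scores() returns a data frame with one row per palette and columns including fullname, n, and cbfriendly, and the type argument is a guess:

```r
library(cols4all)
library(dplyr)

# Assumed interface: a data frame of palette scores (argument names may differ)
c4a_scores(type = "cat") |>
  select(fullname, n, cbfriendly) |>
  filter(n >= 5) |>
  arrange(desc(cbfriendly), n) |>  # most colour blind friendly first, then smallest
  head(20)
```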
fullname n cbfriendly
1 cols4all.friendly5 5 2.01766
2 parks.kings_canyon 6 2.01676
3 cols4all.friendly7 7 2.01502
4 cols4all.friendly9 9 1.01252
5 tol.medium 6 1.01208
6 parks.cuyahoga 6 1.01179
7 tol.muted 10 1.01165
8 parks.halekala 5 1.01157
9 cols4all.area7d 7 1.01117
10 cols4all.area7 7 1.01115
11 cols4all.friendly11 11 1.01112
12 parks.death_valley 7 1.01095
13 met.new_kingdom 5 1.01046
14 cols4all.friendly13 13 1.01006
15 cols4all.line7 7 1.01005
16 cols4all.area8d 8 0.00997
17 met.lakota 6 0.00991
18 met.moreau 7 0.00960
19 cols4all.line8 8 0.00954
20 tableau.classic_color_blind 10 0.00944
There are many more metrics in the data than the number of colours (n) and the measure of colour blind friendliness (cbfriendly) shown here, but to keep things manageable I’ve selected only those. The interactive tool provides complete information. In the GUI tool you would see a display more like this:
Many of the palettes in our first cut of the data allow for more categories than the requested five. I think that palettes that accommodate only five colours, or one or two more than that, are likely to make better use of colour space than more expansive ones, whose more numerous colours are necessarily more similar to one another. That’s why the table has been sorted on both cbfriendly (decreasing) and n (increasing).
At this point, having found some possible schemes suited to our purposes it makes sense to examine them more closely. Using the cols4all GUI this would be an interactive process. Here, a more limited list of favoured schemes throws up eight plausible options, which we can plot with my CVD simulation wrapper function.
contrastWT indicates schemes that contrast well with a white background.
Of these five candidates, I like cols4all.friendly5, so here goes.
It’s perhaps notable that the colours in this scheme are not dissimilar to those hand-picked from the brewer.accent scheme in Figure 7. The parks and met series which feature in these lists are new to me, and some of them seem to have potential for striking colourblind friendly maps.
For the ten-class case we need to impose a more restrictive requirement on the palettes we consider. For a start, they need to have at least ten different colours! It turns out that only three categorical schemes in the cols4all collection meet this requirement.
This time, in addition to a couple more of cols4all’s home-brew colour blind friendly schemes3 there is a scheme by Paul Tol, named here as tol.muted. Full details of this family of palettes can be found at Paul Tol’s website.
Here are the ten classes in our data mapped using the tol.muted scheme.
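That map can be sketched as follows; I’ve called it map4 because that is the name the original ten-class map goes by in the comparison at the end of the post:

```r
map4 <- basemap +
  geom_sf(data = gdf, aes(fill = k10), colour = "grey") +
  scale_fill_manual(values = c4a("tol.muted", 10)) +
  frame
map4
```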
Ten classes seems to be at about the limit of what can be reliably mapped using categorical colour schemes. The CVD simulations in Figure 11 clearly show that some of the colours in those palettes are very close to one another. It may make sense in a case like this to consider which colours from the palette might be best mapped to which clusters. This requires close analysis of which clusters most often neighbour one another, something I explore in outline a bit further on.
Categorical colour schemes that are colour blind friendly run out of steam around 10 classes. The options here demand a bit more thought. One possibility is to think carefully about the classification scheme itself: perhaps it can be simplified? Alternatively, it may be possible to organise it into a meaningful hierarchy. Having done so, colours can be assigned to upper levels in the hierarchy and lightness variations on these to classes within each. Geological maps are an example of this kind of design, but great care and long experience are required to make this work. Even then, with a complex classification colour blindness friendly outcomes will be difficult to achieve.
A more rough and ready option is to use a sequential colour ramp, known to be colour blind friendly, and apply it as a discrete palette with the required number of classes.
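For example (a sketch: the choice of the Viridis ramp and the name map5 are mine), colorspace::sequential_hcl() can generate the required number of discrete colours from a colour blind friendly sequential ramp:

```r
library(colorspace)

map5 <- basemap +
  geom_sf(data = gdf, aes(fill = k15), colour = "grey") +
  scale_fill_manual(values = sequential_hcl(15, palette = "Viridis")) +
  frame
map5
```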
A map like this seems unlikely to be useable for reliable identification of clusters, even for those with normal vision. Crucially though, using a colour blind friendly sequential palette like this should mean it is no less useable for those with colour vision deficiencies. With this many classes, it’s probable that a different mapping approach entirely4 should be considered.
This section is highly speculative.
I mentioned above, in relation to Figure 12 that analysis of which clusters neighbour one another in a map might be a useful adjunct to the process of applying a suitable colour scheme. By assigning colours from a scheme that are the most different from one another to the most common neighbourhood relations in the data, it may be possible to make maps more readable than the essentially random association of clusters to colours shown previously. This section explores one way we might proceed.
To count neighbouring relations among clusters, we can use a joins count function, spdep::joincount.multi. spdep is a very useful but rather obtuse package. We construct a list of weights between neighbouring pairs of areas in our data using the poly2nb and nb2listw functions, then supply it together with the values of interest to the joincount.multi function. Up to now the spatial data has extended well beyond the map area; for this analysis we restrict the areas to those in the mapped area using st_intersection.
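Sketched in code, the steps above might look like this. The clipping rectangle reuses xr and yr from the basemap setup; binary weights (style = "B") are an assumption on my part, and jc is the name used for the result in the code that follows:

```r
library(spdep)

# Clip tracts to the mapped extent
bb_poly <- st_bbox(c(xmin = xr[[1]], ymin = yr[[1]],
                     xmax = xr[[2]], ymax = yr[[2]]),
                   crs = st_crs(gdf)) |>
  st_as_sfc()
gdf_map <- st_intersection(gdf, bb_poly)

nb <- poly2nb(gdf_map)                               # contiguity neighbours
lw <- nb2listw(nb, style = "B", zero.policy = TRUE)  # binary spatial weights
jc <- joincount.multi(gdf_map$k10, lw)               # joins counts by cluster pair
```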
Here are a few rows of the result.5
Joincount Expected Variance z-value
6:1 0 46.04542 40.42028 -7.2424714
6:2 0 49.04839 42.92622 -7.4862338
6:3 19 44.04345 38.74080 -4.0235537
6:4 14 57.05629 49.53079 -6.1178512
6:5 51 59.05826 51.16423 -1.1265697
7:1 36 21.78493 20.00984 3.1778045
7:2 12 23.20569 21.25719 -2.4304439
7:3 18 20.83776 19.17435 -0.6480603
Only the Joincount column is of interest here.6 It tells us how many ‘joins’, i.e., neighbouring pairs of areas, have the corresponding classes. So, for example, while there are no neighbouring pairs between cluster 6 and clusters 1 or 2, there are 51 neighbouring pairs between clusters 6 and 5.
Prioritising common pairings for symbolisation by the most different pairs of colours is the goal. To get there one approach is to convert this list of join counts into a matrix:
row_col_jc <- jc |>
  data.frame() |>
  select(1) |>
  slice(-n()) |>
  tibble::rownames_to_column(var = "pair") |>
  separate(pair, c("row", "col")) |>
  mutate(row = as.integer(row),
         col = as.integer(col))
m_jc <- xtabs(Joincount ~ col + row, row_col_jc) |>
  unname()
m_jc <- m_jc + t(m_jc) - diag(diag(m_jc + t(m_jc)))
m_jc
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 63 1 8 9 0 36 40 5 25
[2,] 63 0 0 6 7 0 12 19 12 16
[3,] 1 0 0 73 11 19 18 5 4 3
[4,] 8 6 73 0 33 14 49 6 0 11
[5,] 9 7 11 33 0 51 16 12 11 72
[6,] 0 0 19 14 51 0 4 0 66 18
[7,] 36 12 18 49 16 4 0 24 1 33
[8,] 40 19 5 6 12 0 24 0 2 7
[9,] 5 12 4 0 11 66 1 2 0 24
[10,] 25 16 3 11 72 18 33 7 24 0
Next, we need a similar matrix reflecting differences among the colours in the palette we plan to use. The colorblindcheck::palette_dist() function provides exactly this, although the raw output needs some light post-processing to convert the NAs in its upper triangular distance matrix into a symmetric distance matrix. This is what the get_colour_distances() wrapper I’ve written does. We also measure distances for the palette viewed normally and under three CVD simulations, and take the minimum distance across all four results.7
get_colour_distances <- function(pal) {
  m_cols <- palette_dist(pal) |>
    round(0) |>
    c() |>
    replace_na(0) |>
    matrix(length(pal), length(pal))
  m_cols + t(m_cols)
}

pal <- c4a("tol.muted")
m_cols <- get_colour_distances(pal)
for (mode in modes[2:4]) {
  m_cols <- pmin(
    m_cols,
    get_colour_distances(get_cvd_colours(pal, mode)))
}
m_cols
      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 34 23 18 29 22 14 15 12 24
[2,] 34 0 65 24 54 12 41 43 13 62
[3,] 23 65 0 30 38 48 22 15 34 16
[4,] 18 24 30 0 31 19 20 16 35 40
[5,] 29 54 38 31 0 43 12 35 25 15
[6,] 22 12 48 19 43 0 27 35 13 46
[7,] 14 41 22 20 12 27 0 22 18 16
[8,] 15 43 15 16 35 35 22 0 26 22
[9,] 12 13 34 35 25 13 18 26 0 35
[10,] 24 62 16 40 15 46 16 22 35 0
Now, we want to know what permutation, or swapping of rows and columns, of this matrix will make it as much like the joins count matrix as possible. There is likely no efficient algorithm for this in general, as the number of possible permutations increases rapidly with the size of the matrices. With ten colours there are 10! = 3,628,800 possibilities, so we content ourselves with trying 10,000 permutations and retaining the best one. For smaller problems we could carry out an exhaustive search, while for larger ones we would need a better heuristic to guide the search. This seems likely to be a known problem, but in the time I allowed for preparing this post, I couldn’t find an exact solution.8
The GraphAlignment and pracma packages provide the needed functions here. I also made a rough and ready function for calculating the difference between two matrices, based on the Frobenius norm of their difference. The Frobenius norm (the square root of the sum of a matrix’s squared entries) is analogous to the magnitude of a vector, but for matrices.
library(GraphAlignment)
library(pracma)

d_m1_m2 <- function(m1, m2) { norm((m1 - m2), "F") }

size <- ncol(m_cols)
d12 <- d_m1_m2(m_jc, m_cols)
min_d12 <- d12
min_perm <- 1:size
for (i in 1:10000) {
  perm <- randperm(size, size)  # random permutation of 1:size
  d12 <- d_m1_m2(m_jc, Permute(m_cols, perm))
  if (d12 < min_d12) {
    min_d12 <- d12
    min_perm <- perm
  }
}

There is nothing very clever here: we simply iterate over 10,000 random permutations, and retain the permutation associated with the least difference between the two matrices that we encounter. We can then use this to permute the colours in our palette, applied using scale_fill_manual.
map6 <- basemap +
  geom_sf(data = gdf, aes(fill = k10), colour = "grey") +
  scale_fill_manual(values = pal[min_perm]) +
  guides(fill = "none") +
  frame

(map4 + ggtitle("Original map")) /
  (map6 + ggtitle("Permuted colour scheme")) /
  plot_spacer() /
  cvd_grid(map6) + plot_layout(heights = c(2, 2, 0.1, 2.5))

I don’t know if the permuted map is better than the original. I do know that while developing this post I have seen many different permuted maps, because this crude implementation is certainly not finding a unique ‘best’ permutation. Even so, this seems like an interesting avenue for further investigation.
One area where the example shown seems like an improvement is downtown San Francisco, where two very similar colours were juxtaposed in the original map and are not in the permuted colour scheme map, although it’s highly likely that there are other map areas where the comparison is less favourable.
While design for colour vision deficiency—like all design!—remains challenging, it’s a lot easier than I realised to pay attention to this important consideration in making choices for colour use in maps. cols4all does a great job of supporting this in the R ecosystem. It’s worth knowing too that you can export colour schemes from cols4all to Python and web friendly formats, so even if you are not a regular user of R you might find this tool useful.
With periods of my career at Penn State and also Berkeley, I have an affinity for both the Brewer palettes, and the Viridis palettes.↩︎
I wrote this code only because I couldn’t get any of the various available packages to label their simulated plots with the palette name as a title… a weird and surprisingly annoying omission!↩︎
Developed using the cols4all tools!↩︎
Or, as noted, a different classification entirely.↩︎
I am showing only a few rows of the joins count result, because there are 55 pairwise combinations.↩︎
The others are relevant to assessment of spatial autocorrelation.↩︎
My understanding is that this is an important element in cols4all’s assessment of colour blindness friendliness.↩︎
This comes close, but permutes only columns, not rows and columns together.↩︎