-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathparallel_coord_plot.Rmd
More file actions
89 lines (62 loc) · 2.55 KB
/
parallel_coord_plot.Rmd
File metadata and controls
89 lines (62 loc) · 2.55 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
---
title: "Parallel Coordinate Plot"
output:
html_document:
css: style.css
theme: lumen
highlight: pygments
---
```{r include=F}
opts_chunk$set(cache = T)
```
The required library for this plot are **ggplot2**, **dplyr**, **knitr**, **stringr**, **rmarkdown** and **data.table**.
```{r 'load library', echo=FALSE}
suppressMessages(require(ggplot2))
suppressMessages(require(dplyr));suppressMessages(require(stringr))
suppressMessages(require(knitr))
suppressMessages(require(data.table))
suppressMessages(require(rmarkdown))
```
Populate some dummy datasets.
```{r}
data = data.frame(
'l1' = sample(LETTERS[1:4], 20, T),
'l2' = sample(LETTERS[5:10], 20, T),
'l3' = sample(LETTERS[15:18], 20, T),
'cluster' = sample(1:4, 20, T))
# a helper function to convert an object name into literal string
var_to_name <- function(x){
deparse(substitute(x))
}
# print out the data frame
kable(data, caption = var_to_name(data))
```
Next we do some massaging on the datasets
```{r 'data massage'}
# generate id for each row so that we can group the row in line plot
data$id <- seq(nrow(data))
# melt the data frame by keeping id and cluster only
data.melt <- melt(data, id.vars = c('id', 'cluster'))
# the value column is actually the label value in each column, so we rename it as 'label'
names(data.melt)[grepl('value', names(data.melt), fixed = T) %>% which] = 'label'
# change the label to numeric type so that we can plot it in y-axis, it would be our data label
data.melt$value <- as.factor(data.melt$label) %>% as.numeric
dt <- data.table(data.melt)
dt[, variable.x:=as.factor(variable)]
kable(head(dt), caption = var_to_name(dt))
# scale the value group by its column name and use a new column 'scaled' to store it
dt[, scaled.y:=scale(value), variable]
# create the dataset for text labelling
data.label <- dt[, .(variable.x, label, scaled.y)] %>% distinct
```
Start to plot the graph.
```{r 'plotting'}
g = ggplot(dt, aes(x = variable.x, y = scaled.y)) + geom_point(
aes(color = as.factor(cluster))) + geom_line(aes(group = id, color = as.factor(cluster)))
# vectorize str_wrap function so that we can wrap the text in label column if its too long
wraptxt <- Vectorize(str_wrap, vectorize.args = 'string')
# plot the label
g1 = g + geom_text(data = data.label, aes(x = variable.x, y = scaled.y, label = wraptxt(label, width = 10)),
family = 'Panton', hjust = 'left', nudge_x = .02, size = 3)
g1 + scale_color_discrete(name = 'cluster') + theme_bw() + theme(legend.key = element_rect(color = NA))
```