cleanUrl: "clonevol-usage"
description: "Learn how to use ClonEvol, a tool for visualizing the course of clonal evolution."

ClonEvol

ClonEvol is a package for clonal ordering and clonal evolution visualization. It uses the clustering of heterozygous variants identified using other tools as input to infer consensus clonal evolution trees and estimate the cancer cell fraction.

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/f5639fa1-51b0-4875-8786-2ca35d49afc9/_2021-06-01__3.14.54.png

Installation

install.packages('devtools')
library(devtools)
install_github('hdng/clonevol')
install.packages('gridBase')
install.packages('gridExtra')
install.packages('ggplot2')
install.packages('igraph')
install.packages('packcircles')
install_github('hdng/trees')
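
After running the commands above, it can help to confirm that everything actually installed. The following is a minimal sketch, not part of the ClonEvol documentation; it assumes the package built from hdng/trees is named `trees`.

```r
# Packages the installation steps above are expected to provide.
# 'trees' is the assumed package name for the hdng/trees repository.
pkgs <- c('clonevol', 'gridBase', 'gridExtra', 'ggplot2',
          'igraph', 'packcircles', 'trees')

# requireNamespace() returns FALSE (rather than raising an error)
# when a package cannot be loaded.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing.pkgs <- pkgs[!installed]

if (length(missing.pkgs) > 0) {
  message('Still missing: ', paste(missing.pkgs, collapse = ', '))
} else {
  message('All ClonEvol dependencies are installed.')
}
```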

Manual

https://raw.githubusercontent.com/hdng/clonevol/master/vignettes/clonevol.pdf

Input data

ClonEvol requires an input data frame with at least a cluster column and one or more variant cellular prevalence columns, each corresponding to a sample. Clusters should be named with contiguous integers starting from 1. For better visualization, keep the names of the cellular prevalence columns short.
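
To make the expected layout concrete, here is a toy data frame in that shape. It is only a sketch: all values are made up, the `gene` and `is.driver` columns are included because the plotting call later on this page refers to them, and the VAFs are assumed to be percentages (consistent with `vaf.limits = 70` used below).

```r
# Toy input: one row per variant, with VAFs (in percent) for a
# primary (P) and a relapse (R) sample. All values are illustrative.
x <- data.frame(
  cluster   = c(1, 1, 1, 2, 2, 3, 3),
  gene      = c('G1', 'G2', 'G3', 'G4', 'G5', 'G6', 'G7'),
  is.driver = c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, TRUE),
  P.vaf     = c(48, 51, 46, 30, 27, 0, 1),
  R.vaf     = c(47, 50, 49, 2, 0, 35, 38),
  stringsAsFactors = FALSE
)

# Sanity checks mirroring the requirements described above.
clusters <- sort(unique(x$cluster))
# Cluster IDs must be contiguous integers starting from 1.
stopifnot(all(clusters == seq_along(clusters)))
# Every prevalence column must be numeric.
stopifnot(all(vapply(x[, c('P.vaf', 'R.vaf')], is.numeric, logical(1))))
```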

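# Shorten the VAF column names; they are used as the sample labels in the plots.
# fixed = TRUE matches the literal string '.vaf' instead of treating '.' as a regex wildcard.
vaf.col.names = grep('.vaf', colnames(x), value = TRUE, fixed = TRUE)
sample.names = gsub('.vaf', '', vaf.col.names, fixed = TRUE)
x[, sample.names] = x[, vaf.col.names]
vaf.col.names = sample.names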

# Prepare sample grouping: one group label per VAF column, in column order (two samples here).
sample.groups = c('P', 'R')
names(sample.groups) = vaf.col.names

# Set up the order in which clusters are displayed in the plots that follow.
x = x[order(x$cluster), ]

Required columns (with PyClone, build_table has to be run per locus.)
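
As a rough illustration of the PyClone note above, the sketch below reshapes a locus-level table into the wide layout ClonEvol expects. The column names (`mutation_id`, `sample_id`, `cluster_id`, `variant_allele_frequency`), the 0-based cluster IDs, and the VAFs stored as fractions are all assumptions about the PyClone output, so check them against your own file; the values are made up.

```r
# Toy stand-in for a PyClone loci table (long format: one row per
# mutation per sample). Column names and values are assumptions.
loci <- data.frame(
  mutation_id = rep(c('m1', 'm2', 'm3'), each = 2),
  sample_id   = rep(c('P', 'R'), times = 3),
  cluster_id  = rep(c(0, 0, 1), each = 2),
  variant_allele_frequency = c(0.48, 0.47, 0.51, 0.50, 0.30, 0.02),
  stringsAsFactors = FALSE
)

# Reshape to one row per mutation with one VAF column per sample.
wide <- reshape(loci,
                idvar     = c('mutation_id', 'cluster_id'),
                timevar   = 'sample_id',
                direction = 'wide')

# Build the ClonEvol-style input.
x <- data.frame(
  mutation_id = wide$mutation_id,
  cluster     = wide$cluster_id + 1,                     # shift so cluster IDs start at 1
  P.vaf       = wide$variant_allele_frequency.P * 100,   # fraction -> percent
  R.vaf       = wide$variant_allele_frequency.R * 100,
  stringsAsFactors = FALSE
)
```

The resulting `x` has `.vaf`-suffixed columns, so the column-shortening code above applies to it unchanged.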

<aside> 💡 Very important when building the input data! Cluster IDs must be assigned as 1, 2, 3, ... in order of decreasing subclone size; otherwise the tree is not drawn correctly.

</aside>
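
One way to apply the renumbering described in the note is sketched below. The note does not say how subclone size is measured, so using the mean VAF of each cluster across all samples is an assumption; swap in whatever measure fits your data. The values are made up.

```r
# Toy input whose cluster IDs are NOT ordered by size.
x <- data.frame(
  cluster = c(3, 3, 1, 1, 2, 2),
  P.vaf   = c(48, 50, 10, 12, 30, 28),
  R.vaf   = c(47, 49, 0, 1, 25, 27)
)
vaf.cols <- c('P.vaf', 'R.vaf')

# Size proxy (an assumption): mean VAF of each cluster over all samples.
cluster.size <- tapply(rowMeans(x[, vaf.cols]), x$cluster, mean)

# Order the old IDs from largest to smallest cluster and build an
# old ID -> new ID lookup table.
old.ids <- names(cluster.size)[order(cluster.size, decreasing = TRUE)]
new.id  <- setNames(seq_along(old.ids), old.ids)

# Relabel, then sort rows by the new cluster ID.
x$cluster <- unname(new.id[as.character(x$cluster)])
x <- x[order(x$cluster), ]
```

In this toy example the largest cluster (originally 3) becomes cluster 1 and the smallest (originally 1) becomes cluster 3.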

Visualizing the variant clusters

The following code will plot the clustering results for you to investigate. It will plot the cellular prevalence of the variants across clusters and samples, using jitter, box and violin plots to allow close investigation of the clustering. This plot is very powerful as it can visualize lots of samples and clusters at once.

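# clone.colors is not defined earlier on this page; NULL lets ClonEvol choose the colors.
# To set them yourself, supply a vector with one color per cluster.
clone.colors <- NULL
pdf('box.pdf', width = 3, height = 3, useDingbats = FALSE, title='')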
pp <- plot.variant.clusters(x,
       cluster.col.name = 'cluster', # Name of the column holding the cluster index.
       show.cluster.size = FALSE,
       cluster.size.text.color = 'blue',
       vaf.col.names = vaf.col.names, # Names of the columns holding the VAFs; here P and R.
       vaf.limits = 70,
       sample.title.size = 20,
       violin = FALSE,
       box = FALSE,
       jitter = TRUE,
       jitter.shape = 1,
       jitter.color = clone.colors,
       jitter.size = 3,
       jitter.alpha = 1,
       jitter.center.method = 'median',
       jitter.center.size = 1,
       jitter.center.color = 'darkgray',
       jitter.center.display.value = 'none',
       highlight = 'is.driver', # Boolean column that selects the rows to highlight.
       highlight.shape = 21,
       highlight.color = 'blue',
       highlight.fill.color = 'green',
       highlight.note.col.name = 'gene', # Column used to label the highlighted rows.
       highlight.note.size = 2,
       order.by.total.vaf = FALSE)
dev.off()

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/6bbe8cce-9088-48d8-bbba-941f514d2123/_2021-06-01__3.23.06.png
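
To cross-check the plot numerically, the per-cluster centers can be computed directly. This is a base-R sketch on made-up data, not a ClonEvol function; it computes the median VAF per cluster and sample, which is the summary that `jitter.center.method = 'median'` appears to request.

```r
# Toy data (made up): VAFs in percent for samples P and R.
x <- data.frame(
  cluster = c(1, 1, 1, 2, 2, 3, 3),
  P = c(48, 51, 46, 30, 27, 0, 1),
  R = c(47, 50, 49, 2, 0, 35, 38)
)

# Median VAF of each cluster in each sample.
centers <- aggregate(cbind(P, R) ~ cluster, data = x, FUN = median)
print(centers)
```

A cluster whose median is near zero in one sample and clearly above zero in another (clusters 2 and 3 here) is the kind of pattern worth inspecting in the jitter plot.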