ContrastRank - A new method for tumor sample classification.

Input

ContrastRank is a method that scores whole-exome sequencing data and predicts cancer risk. The server requires a VCF file as input, either in plain or zipped format. It also requires the number of the column in the VCF file where the phenotype is reported. If the same VCF file contains multiple samples, the column number is expected to be higher than 9.
For standard TCGA files, in which normal and primary tumor samples are paired together, the phenotypes are usually reported in columns 10 and 11, respectively. The server expects a VCF file whose first 9 columns contain the following information: chromosome, position, rsID (when available), reference allele, alternative allele, quality, filter, info and format.
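The column layout described above can be illustrated with a minimal Python sketch. It is not the server's code: the function names (open_vcf, parse_vcf_line, read_records) and the "SAMPLE" key are hypothetical, and the file is assumed to be tab-separated with 1-based column numbers, so that the first sample is column 10.

```python
import gzip

# Names of the 9 fixed columns that precede the sample columns in a VCF file.
VCF_FIXED_COLUMNS = ["CHROM", "POS", "ID", "REF", "ALT",
                     "QUAL", "FILTER", "INFO", "FORMAT"]


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def parse_vcf_line(line, sample_column):
    """Split one VCF data line into the 9 fixed fields plus one sample field.

    sample_column is 1-based, so it must be higher than 9.
    """
    if sample_column <= len(VCF_FIXED_COLUMNS):
        raise ValueError("sample column must be higher than 9")
    fields = line.rstrip("\n").split("\t")
    if len(fields) < sample_column:
        raise ValueError("line has fewer columns than the requested sample column")
    record = dict(zip(VCF_FIXED_COLUMNS, fields[:9]))
    record["SAMPLE"] = fields[sample_column - 1]
    return record


def read_records(handle, sample_column):
    """Yield one parsed record per data line, skipping header and blank lines."""
    for line in handle:
        if line.startswith("#") or not line.strip():
            continue
        yield parse_vcf_line(line, sample_column)
```

For a paired TCGA file, read_records(open_vcf(path), 10) would iterate over the normal sample and read_records(open_vcf(path), 11) over the primary tumor sample.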
The other required inputs are the type of cancer (at the moment only colon, lung and prostate adenocarcinoma are available) and the type of filtering applied to the selection of single nucleotide variants. The "PASS" option selects only the putative defective variants (PDVs: non-synonymous single nucleotide variants with allele frequency lower than 0.5%) that are classified as "PASS" in the filter column of the VCF file. If the "None" option is selected, all PDVs are selected without considering any quality filter. The "Recover" option has been designed to recover germline variants from the VCF files provided by the Broad Institute, in which they were filtered out. With this option the server selects non-synonymous PDVs for which the average base quality (BQ) of the reads supporting the alternative allele is higher than 30 and the fraction of reads (FA) is higher than 0.05. Finally, if an email address is provided, the server can return the output by email.
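The three filtering options can be summarized in a short Python sketch. This is an illustration under stated assumptions rather than the server's implementation: the function names are hypothetical, and BQ and FA are assumed to be keys listed in the FORMAT column whose values appear in the selected sample column.

```python
def sample_values(format_field, sample_field):
    """Pair FORMAT keys (e.g. 'GT:BQ:FA') with the values of one sample column."""
    return dict(zip(format_field.split(":"), sample_field.split(":")))


def to_float(value):
    """Convert a VCF value to float, returning None for missing entries."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def passes_filter(option, filter_field, format_field, sample_field):
    """Apply the 'PASS', 'None' or 'Recover' option to one variant."""
    if option == "None":
        # No quality filter: every variant is kept.
        return True
    if option == "PASS":
        # Keep only variants labelled PASS in the filter column.
        return filter_field == "PASS"
    if option == "Recover":
        # Assumption: BQ and FA are read from the sample column.
        values = sample_values(format_field, sample_field)
        bq = to_float(values.get("BQ"))
        fa = to_float(values.get("FA"))
        return bq is not None and fa is not None and bq > 30 and fa > 0.05
    raise ValueError("unknown filter option: " + repr(option))
```

In this sketch a variant with a missing BQ or FA value is rejected by "Recover"; how the server treats such cases is not stated here.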

Output

When a VCF file is uploaded, it is annotated using ANNOVAR (Wang et al., Nucleic Acids Research, 2010). After the annotation, the file is filtered by selecting only non-synonymous single nucleotide variants (SNVs) with allele frequency lower than 0.5% in samples from the 1000 Genomes Project. This procedure selects the putative defective variants (PDVs), which are used to define the putative impaired genes (PIGs) with ContrastRank score ≥3 (see methods).
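The selection of PDVs and PIGs described above can be sketched in Python as follows. The data layout is hypothetical: each annotated variant is assumed to be a dictionary with "gene", "effect" and "allele_frequency" keys, the effect label "nonsynonymous SNV" is assumed to match the annotation output, and the per-gene ContrastRank scores are assumed to come from a precomputed table (their calculation is described in the methods and is not reproduced here).

```python
MAX_ALLELE_FREQUENCY = 0.005  # 0.5% in the 1000 Genomes Project samples
MIN_GENE_SCORE = 3.0          # ContrastRank score threshold for a PIG


def select_pdvs(variants):
    """Keep rare non-synonymous SNVs (putative defective variants)."""
    return [
        v for v in variants
        if v["effect"] == "nonsynonymous SNV"
        and v["allele_frequency"] < MAX_ALLELE_FREQUENCY
    ]


def select_pigs(pdvs, gene_scores):
    """Return the genes carrying a PDV whose score is at least the threshold.

    gene_scores is a hypothetical mapping from gene name to ContrastRank score.
    """
    genes = {v["gene"] for v in pdvs}
    return sorted(g for g in genes if gene_scores.get(g, 0.0) >= MIN_GENE_SCORE)
```

Genes without an entry in the score table are treated as scoring 0 in this sketch, which is an assumption made only to keep the example self-contained.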
The list of PIGs is finally used to calculate the global score associated with the whole sample. A threshold, obtained by optimizing the correlation coefficient in the classification of normal and tumor samples, is used to discriminate between High and Low Risk mutation patterns associated with the whole exome. The prediction is associated with a false discovery rate (FDR) value, which is extrapolated by fitting an extreme value distribution over the average FDR levels obtained using a bootstrap procedure.
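As an illustration of the thresholding step, the Python sketch below picks the threshold that maximizes a correlation coefficient on labelled samples. Several points are assumptions and not confirmed by this page: the correlation coefficient is taken to be the Matthews correlation coefficient, higher global scores are taken to indicate higher risk, and the function names are hypothetical. The FDR extrapolation is not covered.

```python
import math


def matthews_cc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def best_threshold(scores, is_tumor):
    """Scan the observed scores and return (threshold, mcc) with the highest MCC.

    A sample is predicted as tumor when its score is >= threshold.
    """
    best_t, best_mcc = None, -2.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, is_tumor) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, is_tumor) if s >= t and not y)
        fn = sum(1 for s, y in zip(scores, is_tumor) if s < t and y)
        tn = sum(1 for s, y in zip(scores, is_tumor) if s < t and not y)
        mcc = matthews_cc(tp, tn, fp, fn)
        if mcc > best_mcc:
            best_t, best_mcc = t, mcc
    return best_t, best_mcc


def classify(score, threshold):
    """Label a whole-exome global score as High Risk or Low Risk."""
    return "High Risk" if score >= threshold else "Low Risk"
```

Once a threshold has been fixed on the training samples, classify is the only step applied to a newly submitted sample.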
The web interface of the server reports information about the pathways associated with the selected cancer type. The network shown is obtained by merging the information extracted from the KEGG database with the list of human protein-protein interactions extracted from the Reactome pathway database. Different edge colors are used to distinguish association (in orange) from physical association (in gray).
The bottom part of the page includes a table reporting all the putative defective variants (PDVs) together with the ContrastRank scores of their genes.

Examples of the analysis of a randomly generated VCF file for normal and tumor samples are available at the following links:

Normal sample:

VCF File:	colon_random.vcf.gz
Genotype:	10
Tumor:	CRC (Colon Adenocarcinoma)
Filter:	PASS
Output:	output.html

Tumor sample:

VCF File:	colon_random.vcf.gz
Genotype:	11
Tumor:	CRC (Colon Adenocarcinoma)
Filter:	PASS
Output:	output.html