Dr. Cancer
Machine Learning-based Predictor of Cancer-causing SNPs

Last Update 18/06/12




We developed a disease-specific machine learning approach to predict whether a non-synonymous SNP is related to cancer. The implemented Support Vector Machine (SVM) method was trained on a set of 3,163 cancer-causing mutations from 74 proteins. As negative sets we used different types of missense Single Nucleotide Variants (mSNVs), either taken from SwissVar or generated in silico and previously used to train a method for the discrimination between driver and passenger mutations. In particular, the CNO dataset was used for both training and testing; it is composed of 1,583 cancer-causing mutations and the same number of randomly selected polymorphisms from SwissVar with allele frequency higher than 0.01 and sample count higher than 49. The CND dataset was used for testing only; it is similar to the CNO dataset, but 1,582 polymorphisms were replaced with disease-related mutations not associated with the MeSH term "neoplasm". Finally, in the Synthetic dataset the negative subset was generated in silico and has been used to test the CHASM algorithm (Carter et al., Cancer Research 2009).
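The selection of the negative polymorphisms for a CNO-style dataset can be sketched as follows. This is only a minimal illustration of the filtering criteria stated above (allele frequency higher than 0.01, sample count higher than 49, as many negatives as positives); the Variant record, its field names, and the select_negatives function are hypothetical and are not taken from the Dr. Cancer code or from the SwissVar data format.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one missense variant (field names are assumptions)."""
    protein: str        # protein identifier
    substitution: str   # e.g. "R175H"
    allele_freq: float  # reported allele frequency
    sample_count: int   # number of samples behind the frequency estimate


def select_negatives(polymorphisms, n_positives, seed=0):
    """Pick as many polymorphisms as there are positives, keeping only those
    with allele frequency > 0.01 and sample count > 49."""
    eligible = [
        v for v in polymorphisms
        if v.allele_freq > 0.01 and v.sample_count > 49
    ]
    if len(eligible) < n_positives:
        raise ValueError("not enough eligible polymorphisms for a balanced set")
    # A seeded generator keeps the random selection reproducible.
    rng = random.Random(seed)
    return rng.sample(eligible, n_positives)
```

For a dataset of the size described above, n_positives would be 1,583. The fixed seed is a design choice of this sketch, so that the same negative subset is drawn on every run.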

The SVM input features of the method are: i) the amino acid substitution, ii) the sequence environment, iii) the sequence profile information, and iv) a Gene Ontology (GO) based score. More details and the performance of the Dr. Cancer algorithm are described in a peer-reviewed paper (Capriotti and Altman, Genomics 2011).
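As an illustration of how the four feature groups could be combined into a single input vector, here is a minimal sketch. The concrete encoding is an assumption made for this example and is not confirmed by the text above: the substitution is coded as -1 for the wild-type and +1 for the mutant residue, the sequence environment as residue counts in a window around the mutated position, and the profile features and GO score are taken as precomputed numbers. All function names and the window size are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_substitution(wild, mutant):
    """20-element vector: -1 at the wild-type residue, +1 at the mutant residue."""
    if wild == mutant:
        raise ValueError("wild-type and mutant residues must differ")
    vec = [0.0] * len(AMINO_ACIDS)
    vec[INDEX[wild]] = -1.0
    vec[INDEX[mutant]] = 1.0
    return vec


def encode_environment(sequence, position, half_window=9):
    """20-element vector counting the residues around a 0-based position,
    excluding the mutated position itself (window size is an assumption)."""
    vec = [0.0] * len(AMINO_ACIDS)
    start = max(0, position - half_window)
    end = min(len(sequence), position + half_window + 1)
    for i in range(start, end):
        if i != position and sequence[i] in INDEX:
            vec[INDEX[sequence[i]]] += 1.0
    return vec


def build_features(sequence, position, mutant, profile_features, go_score,
                   half_window=9):
    """Concatenate the four feature groups into one flat list of floats."""
    wild = sequence[position]
    return (
        encode_substitution(wild, mutant)
        + encode_environment(sequence, position, half_window)
        + [float(x) for x in profile_features]
        + [float(go_score)]
    )
```

A vector built this way could be passed to any SVM implementation; the kernel and parameters used by Dr. Cancer are described in the cited paper and are not reproduced here.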