Meta-predictor of disease-causing variants

Meta-SNP was trained and tested with a 20-fold cross-validation procedure on a set of 35,766 variations from 8,667 proteins (SV-2009) extracted from the Swiss-Var database (Oct. 2009). The SV-2009 dataset is composed of 17,883 disease-related mutations and the same number of randomly selected polymorphisms. In the cross-validation procedure, proteins are clustered using the blastclust algorithm in the BLAST package, and all variations belonging to the same cluster of similar sequences are kept in the same set. The SV-2009 dataset can be downloaded from this link.
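The cluster-aware splitting described above can be illustrated with a minimal Python sketch. It is not the Meta-SNP implementation: the input layout (a mapping from variant ID to cluster ID), the function name cluster_folds, the variant labels, and the greedy size-balancing rule are all assumptions made for illustration. The only property taken from the text is that every variation from the same cluster of similar sequences ends up in the same set.

```python
from collections import defaultdict


def cluster_folds(variant_clusters, n_folds=20):
    """Assign variants to folds so that a cluster never spans two folds.

    variant_clusters: dict mapping a variant ID to a cluster ID
    (hypothetical layout, e.g. derived from a clustering of the proteins).
    Returns a dict mapping each variant ID to a fold index in range(n_folds).
    """
    # Collect the variants that belong to each cluster.
    members = defaultdict(list)
    for variant, cluster in variant_clusters.items():
        members[cluster].append(variant)

    # Greedy balancing (an assumption, not the published procedure):
    # place the largest clusters first, each into the currently smallest fold.
    fold_sizes = [0] * n_folds
    assignment = {}
    ordered = sorted(members, key=lambda c: (-len(members[c]), str(c)))
    for cluster in ordered:
        fold = fold_sizes.index(min(fold_sizes))
        for variant in members[cluster]:
            assignment[variant] = fold
        fold_sizes[fold] += len(members[cluster])
    return assignment


# Toy data with made-up variant and cluster labels.
toy = {
    "P1:A10V": "c1", "P1:G22D": "c1", "P2:R5C": "c1",
    "P3:L7P": "c2", "P3:K9E": "c2",
    "P4:S3F": "c3",
}
folds = cluster_folds(toy, n_folds=2)
```

With the toy data, the three variants of cluster c1 share one fold, while clusters c2 and c3 together fill the other. Each fold can then serve once as the test set while the remaining folds are used for training.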

An additional dataset (NSV-2012), composed of 972 newly annotated variants from 577 proteins in a more recent version of Swiss-Var (Feb. 2012), has been used to test Meta-SNP. The list of NSV-2012 variations is available here.