
SNPs&GO
Predicting disease associated variations using GO terms


Benchmark
SNPs&GO has been trained and tested using a 20-fold cross-validation procedure on a set of 38,460 variations from 9,067 proteins (SAP-SEQ) extracted from the Swiss-Var database (Oct. 2009).
The SAP-SEQ dataset is composed of 19,230 disease-related mutations and the same number of randomly selected neutral polymorphisms. In the cross-validation procedure, proteins are clustered using the blastclust algorithm in the BLAST package, and all the variations belonging to the same cluster of similar sequences are kept in the same set.
The SAP-SEQ dataset can be downloaded from this link. An additional dataset composed of 1,494 variants from 274 proteins (SAP-NEW) not included in SAP-SEQ has been used to test our tool. The list of SAP-NEW variations is available here.
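
The cluster-aware splitting described above can be illustrated with a minimal Python sketch. It is not the code used for SNPs&GO: the function name assign_folds, the input format (a dictionary from variant ID to cluster ID) and the greedy balancing rule are assumptions made for illustration. The only property taken from the text is that all variations from the same cluster end up in the same cross-validation set.

```python
from collections import defaultdict


def assign_folds(variant_clusters, n_folds=20):
    """Assign each variant to a fold, keeping every cluster inside one fold.

    variant_clusters: dict mapping a variant ID to its cluster ID
    (hypothetical input format, e.g. derived from a clustering output).
    Returns a dict mapping each variant ID to a fold index in [0, n_folds).
    """
    # Group variants by the cluster of their protein.
    clusters = defaultdict(list)
    for variant, cluster in variant_clusters.items():
        clusters[cluster].append(variant)

    # Greedy balancing (an assumption, not the published procedure):
    # place the largest clusters first, each into the currently smallest fold.
    fold_sizes = [0] * n_folds
    fold_of = {}
    for cluster in sorted(clusters, key=lambda c: (-len(clusters[c]), str(c))):
        target = min(range(n_folds), key=lambda i: fold_sizes[i])
        for variant in clusters[cluster]:
            fold_of[variant] = target
        fold_sizes[target] += len(clusters[cluster])
    return fold_of
```

Because whole clusters are placed at once, fold sizes are only approximately equal; similar sequences never appear on both the training and the test side of a split.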

The Gene Ontology (GO) terms are extracted from the gene_association.goa_human file, and their parent terms are retrieved using the GO-TermFinder package.
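
As a rough illustration of what retrieving the parents of a GO term involves, the sketch below walks up a directed acyclic graph and collects every ancestor of a term. It does not use or reproduce the GO-TermFinder API: the function name ancestors, the parents dictionary and the term labels T1 to T4 are hypothetical placeholders, not real GO identifiers.

```python
def ancestors(term, parents):
    """Return the set of all ancestors of a term in a directed acyclic graph.

    parents: dict mapping a term to a list of its direct parents
    (hypothetical structure; terms without parents may be absent).
    """
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            # Continue upward through the parents of the current term.
            stack.extend(parents.get(current, ()))
    return seen


# Toy graph with placeholder labels: T3 -> T2 -> T1, and T4 -> T2, T4 -> T1.
toy_parents = {"T3": ["T2"], "T2": ["T1"], "T4": ["T2", "T1"]}
t3_ancestors = ancestors("T3", toy_parents)
```

Tracking visited terms in a set keeps the walk correct when a term is reachable through more than one path, which is common in GO because a term can have several parents.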


 
 