
SNPs&GO
Predicting disease associated variations using GO terms


Benchmark
SNPs&GO has been trained and tested using a 20-fold cross-validation procedure on a set of 38,460 variations from 9,067 proteins (SAP-SEQ) extracted from the Swiss-Var database (Oct. 2009).
The SAP-SEQ dataset is composed of 19,230 disease-related mutations and the same number of randomly selected neutral polymorphisms. In the cross-validation procedure, proteins are clustered using the blastclust algorithm in the BLAST package, and all the variations belonging to the same cluster of similar sequences are kept in the same set. The SAP-SEQ dataset can be downloaded from this link.
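
The cluster-aware split described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for SNPs&GO: the function name `assign_folds`, the variation identifiers and the cluster labels are hypothetical, and the greedy size balancing is an assumption, since the page does not say how clusters were distributed over the 20 sets. The only property taken from the text is that variations from the same cluster never end up in different sets.

```python
from collections import defaultdict


def assign_folds(variant_to_cluster, n_folds=20):
    """Map each variation to a fold index without splitting any cluster.

    variant_to_cluster: dict of variation id -> cluster id (hypothetical
    input; in practice the cluster ids would come from a sequence
    clustering step such as blastclust).
    """
    # Group the variations by the cluster of their protein.
    clusters = defaultdict(list)
    for variant, cluster in variant_to_cluster.items():
        clusters[cluster].append(variant)

    # Assumed balancing strategy: place the largest clusters first,
    # each into the fold that currently holds the fewest variations.
    fold_sizes = [0] * n_folds
    fold_of = {}
    ordered = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for cluster, members in ordered:
        target = min(range(n_folds), key=lambda i: fold_sizes[i])
        for variant in members:
            fold_of[variant] = target
        fold_sizes[target] += len(members)
    return fold_of


# Toy input: three variations in cluster "c1", one each in "c2" and "c3".
toy = {
    "P1:A10V": "c1", "P1:G25R": "c1", "P2:L5P": "c1",
    "P3:K7E": "c2", "P4:D9N": "c3",
}
folds = assign_folds(toy, n_folds=2)
```

Keeping whole clusters together prevents variations from closely related sequences from appearing in both the training and the test set of the same cross-validation round.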

The structure-based SNPs&GO3d algorithm has been trained and tested using a 20-fold cross-validation procedure on a set of 6,630 mutations from 784 protein chains (SAP-3D) from the PDB (Oct. 2009).
The SAP-3D dataset is composed of 3,342 disease-associated variations and 1,644 neutral variations. To balance the composition of the dataset, the reverse mutations of the neutral polymorphisms are also considered, giving 3,342 + 2 × 1,644 = 6,630 mutations in total. In the cross-validation procedure, proteins are clustered using the blastclust algorithm in the BLAST package, and all the mutations belonging to the same cluster of sequences are kept in the same set.
The SAP-3D dataset can be downloaded from this link.
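
The balancing step for SAP-3D can be illustrated with a short sketch. The `Variation` record, its field names and the example entries are hypothetical, and the sketch assumes that the reverse of a neutral substitution (mutant residue back to wild-type residue at the same position) is itself labelled neutral; the page does not describe how the reverse mutations are stored.

```python
from typing import NamedTuple


class Variation(NamedTuple):
    chain: str       # protein chain identifier (hypothetical format)
    wild_type: str   # wild-type residue, one-letter code
    position: int    # residue position
    mutant: str      # substituted residue, one-letter code
    label: str       # "disease" or "neutral"


def reverse(v):
    """Swap wild-type and mutant residues, keeping position and label."""
    return v._replace(wild_type=v.mutant, mutant=v.wild_type)


def balance_with_reverses(variations):
    """Return the input plus the reverse of every neutral variation."""
    balanced = list(variations)
    balanced.extend(reverse(v) for v in variations if v.label == "neutral")
    return balanced


# Made-up example entries, for illustration only.
example = [
    Variation("1abcA", "A", 10, "V", "disease"),
    Variation("1abcA", "G", 25, "R", "disease"),
    Variation("2xyzB", "L", 5, "P", "neutral"),
]
balanced = balance_with_reverses(example)

# Size check with the counts quoted for SAP-3D.
sap3d_total = 3342 + 2 * 1644
```

With the counts given for SAP-3D, doubling the 1,644 neutral variations yields 3,288 neutral entries against 3,342 disease-associated ones, which matches the stated total of 6,630.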

An additional dataset composed of 1,489 variants from 271 proteins (SAP-NEW) with known structures has been used to test both SNPs&GO and SNPs&GO3d. The list of SAP-NEW variations is available here.

The Gene Ontology (GO) terms are extracted from the gene_association.goa_human file, and their parents are retrieved using the GO-TermFinder package.
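
Retrieving the parents of annotated GO terms amounts to walking up the ontology graph from each term. The sketch below shows the idea on a toy graph. It does not use GO-TermFinder or its API; the function names, the term identifiers and the parent relations are placeholders invented for the example.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links.

    parents: dict of term id -> list of direct parent ids (toy data here;
    a real run would load these relations from the ontology itself).
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def expand_annotations(terms, parents):
    """Return the annotated terms together with all of their ancestors."""
    expanded = set(terms)
    for term in terms:
        expanded |= ancestors(term, parents)
    return expanded


# Placeholder ids, not real GO accessions. A term may have several parents.
toy_parents = {
    "GO:leaf": ["GO:mid1", "GO:mid2"],
    "GO:mid1": ["GO:root"],
    "GO:mid2": ["GO:root"],
}
protein_terms = expand_annotations(["GO:leaf"], toy_parents)
```

Because a GO term can have more than one parent, the traversal tracks the terms already visited so that shared ancestors are collected only once.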


 
 