SNPs&GO
Predicting disease associated variations using GO terms
Benchmark
SNPs&GO was trained and tested with a 20-fold cross-validation procedure on SAP-SEQ, a set of 38,460
variations from 9,067 proteins extracted from the
Swiss-Var database
(Oct. 2009).
The SAP-SEQ dataset consists of 19,230 disease-related mutations and the same number
of randomly selected neutral polymorphisms.
In the cross-validation procedure, proteins are clustered with the blastclust algorithm from the
BLAST package, and all variations belonging to the same cluster of similar sequences are kept in the same cross-validation set, so that no cluster is split between training and testing.
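The cluster-aware split described above can be illustrated with a minimal Python sketch. It assumes each variation has already been labeled with the ID of the blastclust cluster its protein belongs to. The variant IDs, the cluster IDs, the function name assign_folds, and the greedy size-balancing strategy are all illustrative assumptions, not the procedure actually used to build the SNPs&GO folds.

```python
from collections import defaultdict


def assign_folds(variant_clusters, n_folds=20):
    """Assign variants to folds without splitting any cluster across folds.

    variant_clusters: iterable of (variant_id, cluster_id) pairs
                      (hypothetical input format).
    Returns a dict mapping variant_id -> fold index in [0, n_folds).
    """
    # Group variants by the cluster of their protein.
    clusters = defaultdict(list)
    for variant_id, cluster_id in variant_clusters:
        clusters[cluster_id].append(variant_id)

    # Greedy balancing (an assumption of this sketch): place the largest
    # clusters first, each into the fold that is currently smallest.
    fold_sizes = [0] * n_folds
    fold_of = {}
    ordered = sorted(clusters.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for cluster_id, members in ordered:
        target = min(range(n_folds), key=lambda f: fold_sizes[f])
        for variant_id in members:
            fold_of[variant_id] = target
        fold_sizes[target] += len(members)
    return fold_of


# Toy input: three variants from cluster c1, one each from c2 and c3.
example = [
    ("P1:A10V", "c1"),
    ("P1:G22S", "c1"),
    ("P2:R5Q", "c1"),
    ("P3:L7P", "c2"),
    ("P4:T9M", "c3"),
]
folds = assign_folds(example, n_folds=2)
print(folds)
```

In this toy run all three variations from cluster c1 land in the same fold, which is the property the procedure relies on: similar sequences are never seen in both training and testing.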
The SAP-SEQ dataset can be downloaded from this
link.
An additional dataset (SAP-NEW), composed
of 1,494 variants from 274 proteins not included in SAP-SEQ, has been used to test our tool.
The list of SAP-NEW variations is available here.
The Gene Ontology (GO) terms are extracted from
the gene_association.goa_human file, and their parents are retrieved using
the GO-TermFinder package.
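Retrieving the parents of annotated terms amounts to walking upward through the ontology graph. The following Python sketch shows the idea on a toy child-to-parents mapping. It does not use the GO-TermFinder API, and the term identifiers, the toy_parents mapping, and the function names are hypothetical placeholders rather than real GO data.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` given a child -> parents mapping."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited via another path in the graph
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def expand_annotations(terms, parents):
    """Return the annotated terms together with all of their ancestors."""
    expanded = set(terms)
    for term in terms:
        expanded |= ancestors(term, parents)
    return expanded


# Toy ontology fragment (placeholder IDs, not real GO terms):
# the leaf has two parents, which share one root.
toy_parents = {
    "TERM:leaf": ["TERM:mid1", "TERM:mid2"],
    "TERM:mid1": ["TERM:root"],
    "TERM:mid2": ["TERM:root"],
}

expanded = expand_annotations(["TERM:leaf"], toy_parents)
print(sorted(expanded))
```

Because a term can have more than one parent, the traversal tracks visited terms so that shared ancestors are collected only once.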