Finding deleterious SNPs in the dog genome

A binary classifier for predicting pathogenic variants in coding and non-coding regions.

Server Input

The Fido-SNP server takes as input a list of variants in one of two formats, CSV or VCF. Both formats require the genomic location of each variant and the nucleotide change.

  • CSV: The simplest input format uses comma-separated values that indicate the chromosome (chr), the position (pos), and the reference (ref) and alternative (alt) alleles, as follows: chr,pos,ref,alt (see example 1).

  • VCF: The variants can also be provided in a VCF-like format that requires at least 5 columns (chr,pos,id,ref,alt) separated by spaces. When the id is not available, it can be replaced by a dot character (see example 2).
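The two input formats described above can be read with a few lines of Python. The following is a minimal sketch, not the server's own parser: the names `Variant` and `parse_variant` are hypothetical, and the sketch assumes a single alternative allele per row and that a row with five or more whitespace-separated fields is VCF-like.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def parse_variant(line: str) -> Variant:
    """Parse one input row in CSV (chr,pos,ref,alt) or VCF-like (chr pos id ref alt) form."""
    fields = line.split()
    if len(fields) >= 5:
        # VCF-like row: the third column is the id (or a dot) and is ignored here.
        chrom, pos, _id, ref, alt = fields[:5]
    else:
        # CSV row: exactly four comma-separated values are expected.
        fields = [f.strip() for f in line.strip().split(",")]
        if len(fields) != 4:
            raise ValueError(f"cannot parse variant row: {line!r}")
        chrom, pos, ref, alt = fields
    return Variant(chrom, int(pos), ref.upper(), alt.upper())


csv_variant = parse_variant("1,15189413,C,G")
vcf_variant = parse_variant("5 34700967 . T A")
```

Checking each row this way before submission helps catch rows with missing columns or a non-numeric position.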

For all input formats, each variant is entered on a separate row of the textarea box. Alternatively, all the input variants can be uploaded as a single file in text or zipped format. To avoid formatting problems, copying and pasting VCF input data is not recommended; such data can easily be uploaded as a zip file instead. Before submitting the job, please select the assembly of the dog genome on which the genomic locations are expressed.

To prevent the submission of large jobs, a maximum of 1,000 variants is allowed for each job. To predict the impact of a larger number of variants, please install Fido-SNP on your local machine. Instructions for the local installation of Fido-SNP are reported in the method page.
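As an illustration of preparing a zipped upload that respects the 1,000-variant limit, the sketch below checks the number of rows and writes them to a zip archive with Python's standard `zipfile` module. The function name `write_zipped_input` and the archive member name `variants.txt` are assumptions made for this example; the server's requirements for the file name inside the archive are not stated here.

```python
import zipfile

MAX_VARIANTS_PER_JOB = 1000  # per-job limit stated in the text


def write_zipped_input(rows, zip_path, member_name="variants.txt"):
    """Write non-empty variant rows to a zip archive; return the number of rows written."""
    rows = [r.strip() for r in rows if r.strip()]
    if len(rows) > MAX_VARIANTS_PER_JOB:
        raise ValueError(
            f"{len(rows)} variants exceed the limit of {MAX_VARIANTS_PER_JOB} per job"
        )
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # One variant per line, as required for all input formats.
        zf.writestr(member_name, "\n".join(rows) + "\n")
    return len(rows)
```

For example, `write_zipped_input(open("variants.vcf"), "variants.zip")` would raise an error for a file with more than 1,000 rows, which is the case where a local installation is recommended.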

Server Output

After the job is submitted, the Fido-SNP server returns a link to a web page that is automatically refreshed every 20 seconds. A CGI script checks the status of the job in the queue system and, when the job has terminated, returns a static HTML page with the predictions. If the page is closed, the output of your job can be retrieved using the JobID and the form in the Job web page.
Regardless of the input format, Fido-SNP displays the same information in the HTML output. For each variant, the prediction row includes the following data: the chromosome, the position, the reference/alternative alleles, the prediction, the score, the false discovery rate, the PhyloP11 score at the mutated site and the average value of the PhyloP11 score over a 5-nucleotide window centred on the mutated site. An example of the output table is reported below.
In each row, a green button opens a window that reports more information about the variant. In particular, if the variant is in a coding region, the code of the largest NCBI transcript, the UniProt ID of the gene, the strand and the effect of the nucleotide change are included. The annotation is performed using the annovar package. A text version of the Fido-SNP output can be downloaded through a web link. The output file is a tab-separated VCF-like file that includes the Fido-SNP predictions and the annotation from the annovar output.
The standard output of the standalone Fido-SNP program, which does not include the annovar annotation, is described in the next section.
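To make the windowed conservation value concrete, the sketch below averages per-position scores over a 5-nucleotide window centred on the mutated site. It is an illustration only: the function name `window_average` is hypothetical, the scores are made-up numbers rather than real PhyloP11 values, and truncating the window at the ends of the sequence is an assumption, since the handling of edges is not described here.

```python
def window_average(scores, center, width=5):
    """Average the scores in a window of `width` positions centred on index `center`."""
    half = width // 2
    start = max(0, center - half)  # assumption: truncate the window at the sequence start
    window = scores[start:center + half + 1]  # slicing also truncates at the end
    return sum(window) / len(window)


# Illustrative scores only, with the mutated site at index 2.
example_scores = [0.0, 1.0, 2.0, 3.0, 4.0]
average = window_average(example_scores, center=2)
```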

Standalone Package Output

The output of the standalone package does not include the annovar annotation. An example of the Fido-SNP output is provided below.

        Fido-SNP returns the following output:

        PREDICTION: Pathogenic or Benign
        SCORE: A probabilistic score between 0 and 1. If the score is >0.5, the variant is predicted to be Pathogenic.
        FDR: The false discovery rate associated with the SCORE.
        PhyloP11: PhyloP11 score at the mutated position.
        AvgPhyloP11: Average value of PhyloP11 in a 5-nucleotide window around the mutated position.

        The scores are added as extra columns to the input file. An example of the output is reported below.

	1	15189413	C	G	Yes	Pathogenic	0.515	0.075	-0.027	0.317
	5	34700967	T	A	Yes	Benign	0.145	0.242	0.766	0.678
	9	54071528	T	C	Yes	Benign	0.327	0.246	-0.402	0.26
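The example rows above can be loaded into named records for further filtering. The sketch below is one possible reading, not part of the Fido-SNP package: the names `Prediction` and `parse_output_row` are hypothetical, the last four columns are assumed to follow the order SCORE, FDR, PhyloP11, AvgPhyloP11 given in the field list, and the fifth column ("Yes" in the example) is not described in this page, so it is kept as an uninterpreted string.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    flag: str  # fifth column, undocumented here; kept verbatim
    prediction: str
    score: float
    fdr: float
    phylop11: float
    avg_phylop11: float


def parse_output_row(line: str) -> Prediction:
    """Parse one whitespace-separated row of the standalone output."""
    f = line.split()
    if len(f) != 10:
        raise ValueError(f"expected 10 columns, found {len(f)}: {line!r}")
    return Prediction(f[0], int(f[1]), f[2], f[3], f[4], f[5],
                      float(f[6]), float(f[7]), float(f[8]), float(f[9]))


EXAMPLE_OUTPUT = (
    "1\t15189413\tC\tG\tYes\tPathogenic\t0.515\t0.075\t-0.027\t0.317\n"
    "5\t34700967\tT\tA\tYes\tBenign\t0.145\t0.242\t0.766\t0.678\n"
    "9\t54071528\tT\tC\tYes\tBenign\t0.327\t0.246\t-0.402\t0.26\n"
)

rows = [parse_output_row(line) for line in EXAMPLE_OUTPUT.splitlines()]
# Variants with SCORE > 0.5 are the ones predicted to be Pathogenic.
pathogenic = [r for r in rows if r.score > 0.5]
```

In the example rows, the only variant with a score above 0.5 is the one at position 15189413 on chromosome 1, which matches its Pathogenic label.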