The PhD-SNPg server takes as input a list of variants in one of three formats.
Two formats (CSV and VCF) require the genomic location of each variant and the nucleotide change.
The MUT format, which requires a list of amino acid changes, maps each residue substitution to the
corresponding nucleotide variants. The three input formats are summarized as follows:
CSV: The simplest input format uses comma-separated values indicating the chromosome (chr), the position (pos), and the reference (ref) and alternative (alt) alleles, as follows: chr,pos,ref,alt (see example 1).
VCF: The variants can also be provided in a VCF-like format that requires at least 5 columns (chr,pos,id,ref,alt) separated by spaces. When the id is not available, it can be replaced by a dot character (see example 2).
MUT: The effect of single amino acid variants can be predicted by providing 2 columns, the gene symbol and the mutation (gene,mutation), separated by a comma (see example 3).
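The column layouts above can be illustrated with a small parser. This is only a sketch under stated assumptions: the function name parse_variant and the returned dictionary keys are hypothetical, it is not part of PhD-SNPg, and the server may apply stricter or different checks. The mutation string of the MUT format is passed through unchanged because its notation is not specified in this section.

```python
def parse_variant(line, fmt):
    """Split one input row according to the CSV, VCF or MUT layout.

    Hypothetical helper for illustration; not part of PhD-SNPg.
    """
    line = line.strip()
    if fmt == "MUT":
        # gene,mutation separated by a comma
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 2 or not all(fields):
            raise ValueError("MUT rows need 2 columns: gene,mutation")
        return {"gene": fields[0], "mutation": fields[1]}

    if fmt == "CSV":
        # chr,pos,ref,alt separated by commas
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 4:
            raise ValueError("CSV rows need 4 columns: chr,pos,ref,alt")
        chrom, pos, ref, alt = fields
    elif fmt == "VCF":
        # at least chr pos id ref alt, separated by whitespace;
        # the id column (possibly a dot) is read but not used
        fields = line.split()
        if len(fields) < 5:
            raise ValueError("VCF rows need at least 5 columns")
        chrom, pos, _variant_id, ref, alt = fields[:5]
    else:
        raise ValueError("unknown format: %r" % fmt)

    if not pos.isdigit():
        raise ValueError("position must be a positive integer")
    if not (chrom and ref and alt):
        raise ValueError("chr, ref and alt must not be empty")
    return {"chr": chrom, "pos": int(pos), "ref": ref, "alt": alt}


# The same variant written in the CSV and in the VCF-like layout.
csv_variant = parse_variant("1,10042376,C,G", "CSV")
vcf_variant = parse_variant("1 10042376 . C G", "VCF")
```

Both calls describe the same genomic change, so the two dictionaries should be equal.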
For all the input formats, each variant is provided on a separate row of the textarea box.
In addition, all the input variants can be uploaded as a single file in text or zipped format. For formatting
reasons, it is recommended to avoid copying and pasting VCF input data, which can easily be uploaded as a
zip file. Before submitting the job, please select the assembly of the human genome on
which the genomic locations are expressed.
To prevent the submission of large jobs, a maximum of 1,000 variants is allowed for each job. To
predict the impact of a larger number of variants, please install PhD-SNPg on your local machine.
The information for the local installation of PhD-SNPg is reported on the method page.
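Before submitting, the size of an input list can be checked against the 1,000-variant limit. The sketch below is a hypothetical helper, not part of PhD-SNPg. It assumes that blank lines and lines starting with "#" (for example a VCF header) do not count as variants, which may differ from how the server counts rows.

```python
MAX_VARIANTS = 1000  # per-job limit stated for the web server


def count_variants(text):
    """Count rows that look like variants (assumption: skip blanks and '#' lines)."""
    return sum(
        1
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    )


def fits_server_limit(text, limit=MAX_VARIANTS):
    """Return True when the input can be submitted as a single web job."""
    return count_variants(text) <= limit


small_input = "#chr,pos,ref,alt\n1,10042376,C,G\n\n2,31751295,G,A\n"
n_variants = count_variants(small_input)
```

When fits_server_limit returns False, the local installation is the route recommended above.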
After the submission of the job, the PhD-SNPg server returns a link to a web page
that is automatically refreshed every 20 seconds. A CGI script checks the status of the job in the
queue system and, when the job has terminated, returns a static HTML page with the predictions. If the page
is closed, the output of your job can be retrieved using the JobID and the form on the Job web page.
Regardless of the input format, PhD-SNPg displays the same information in the HTML output.
For each variant, the prediction row includes the following data:
the chromosome, the position, the reference/alternative alleles, the prediction, the score, the false discovery
rate, the PhyloP100 score at the mutated site and the average PhyloP100 score over a 5-nucleotide
window centred on the mutated site. An example of the output table is reported below.
In each row, a green button opens a window that reports more information about the variant. In particular, if
the variant is in a coding region, the code of the largest NCBI transcript, the UniProt ID of the gene, the
strand and the effect of the nucleotide change are included. The annotation process is performed using
the TransVar package.
A text version of the PhD-SNPg output can be downloaded
through a web link. The output file is a tab-separated
VCF-like file that includes the PhD-SNPg predictions and the variant annotation.
The standard output of the standalone PhD-SNPg program, which does not include this annotation,
is described in the next section.
Standalone Package Output
The output of the standalone package does not include the
annotation. An example of the PhD-SNPg output is provided below.
PhD-SNPg returns the following fields:
PREDICTION: Pathogenic or Benign.
SCORE: A probabilistic score between 0 and 1. If the score is >0.5, the variant is predicted to be Pathogenic.
FDR: The false discovery rate associated with higher/lower SCORE values.
PhyloP100: The PhyloP100 score at the mutated position.
AvgPhyloP100: The average PhyloP100 score over a 5-nucleotide window around the mutated position.
The scores are added as extra columns to the input file. An example of the output is reported below.
#CHROM  POS        REF  ALT  CODING  PREDICTION  SCORE  FDR    PhyloP100  AvgPhyloP100
1       10042376   C    G    Yes     Pathogenic  0.814  0.079  -0.159     3.412
1       197094291  C    T    Yes     Pathogenic  0.988  0.023  7.304      4.071
2       31751295   G    A    Yes     Pathogenic  0.913  0.053  1.810      2.674
2       71797809   C    T    Yes     Pathogenic  0.998  0.023  1.181      3.699
2       179577870  T    C    Yes     Benign      0.004  0.007  -6.363     2.997
5       74046464   C    T    Yes     Benign      0.009  0.021  -0.070     5.860
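An output like the one above can be loaded for downstream analysis with a few lines of Python. The sketch below is not part of the PhD-SNPg package: the names read_predictions and label_from_score are hypothetical, and the parser assumes a header line starting with "#" followed by whitespace-separated columns named as in the example. It also re-applies the documented rule that a SCORE above 0.5 corresponds to a Pathogenic prediction.

```python
NUMERIC_COLUMNS = ("SCORE", "FDR", "PhyloP100", "AvgPhyloP100")

# Two rows copied from the example output, joined with tabs.
SAMPLE = "\n".join([
    "\t".join(["#CHROM", "POS", "REF", "ALT", "CODING", "PREDICTION",
               "SCORE", "FDR", "PhyloP100", "AvgPhyloP100"]),
    "\t".join(["1", "10042376", "C", "G", "Yes", "Pathogenic",
               "0.814", "0.079", "-0.159", "3.412"]),
    "\t".join(["2", "179577870", "T", "C", "Yes", "Benign",
               "0.004", "0.007", "-6.363", "2.997"]),
])


def read_predictions(text):
    """Parse the output table into a list of dicts, one per variant."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()  # accepts tabs or runs of spaces
        if not fields:
            continue
        if fields[0].startswith("#"):
            # Header line: drop the leading '#' from the first column name.
            header = [name.lstrip("#") for name in fields]
            continue
        if header is None:
            raise ValueError("header line starting with '#' not found")
        row = dict(zip(header, fields))
        row["POS"] = int(row["POS"])
        for name in NUMERIC_COLUMNS:
            row[name] = float(row[name])
        rows.append(row)
    return rows


def label_from_score(score, threshold=0.5):
    """Apply the documented rule: SCORE > 0.5 means Pathogenic."""
    return "Pathogenic" if score > threshold else "Benign"


rows = read_predictions(SAMPLE)
pathogenic = [r for r in rows if r["PREDICTION"] == "Pathogenic"]
```

Splitting on any whitespace keeps the sketch tolerant of files where tabs were converted to spaces, at the cost of not supporting empty fields.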