GNIFdb

Help

Neoantigens are antigens that are expressed only in tumor tissues and not in normal tissues. They include antigens produced by oncogenic viruses integrated into the genome and antigens produced by mutant proteins. Because they are not subject to negative selection in the thymus, neoantigens have high specificity and strong immunogenicity. Since virus-mediated tumors account for only a small fraction of all tumor types, mutation-derived neoantigens are the most attractive targets for immunotherapy. We developed GNIFdb, a database that collects neoantigen information for different cancers and provides several bioinformatics tools, to meet the urgent demand for neoantigen data in cancer immunotherapy. Users can upload their own neoantigen data, which are shown in GNIFdb after being reviewed by back-end staff.

GNIFdb contains neoantigen information for four different cancers from multiple datasets, most of it for glioma. GNIFdb provides gene expression data from the TCGA dataset and survival times for glioma neoantigens. For each neoantigen, we calculated the wild-type peptide score and the mutant peptide score with netMHCpan 4.0. The glioma neoantigens from TCGA are divided into 19 subtypes according to molecular profile and histological criteria. The sunburst figure shows the distribution of subtypes in the TCGA dataset; users can click on different segments to explore how they relate to one another.

We calculated amino acid descriptors (protFP, blosumIndice, crucianiProperties, FASGAI, MSWHIM, kideraFactors, stScales, T-scale, zScales, VHSE) and physical-chemical properties (aliphatic index, auto-correlation, auto-covariance, Boman index, theoretical net charge, cross-covariance, hydrophobic moment, hydrophobicity, instability, molecular weight, Tiny, Small, Aliphatic, Aromatic, Non-polar, Polar, Charged, Basic, Acidic) at different positions of each neoantigen peptide. These features may help reveal relationships between the peptide scores and pathological properties. The tree map shows all peptide features provided in GNIFdb; users can click the nodes to view their classification.

Search
Search Result

The search results include basic information: gene name, chromosome, position, HLA allele, mutation, WT peptide and MT peptide. The glioma data also include survival time. Users can use the condition selector to sort or refine the search results, and click a gene name to view detailed information about the neoantigen.

Visualization

JBrowse 2 is used to visualize neoantigen information in GNIFdb. A reference sequence and five annotations from public datasets are provided. Users can select the diseases or subtypes of interest, and their positions and information are shown in the window. For each gene in the gene expression data, the bars from top to bottom indicate survival time from short to long. Users can click a gene expression bar or a neoantigen object to see its details. The view can be zoomed by sliding the zoom bar at the top. For more usage instructions, see https://jbrowse.org/jb2/.

Browse

On this page, users can browse different diseases and datasets. In particular, for glioma data from TCGA, we provide a detailed classification and summary. The subtype classification is shown as a rectangular tree diagram; clicking a rectangle opens the corresponding subtype page. At the bottom, we provide the distribution of survival time and the HLA enrichment in the various subtypes, as well as boxplots of the mutant-type peptide score and the Dai score (wild-type peptide score - mutant-type peptide score). In the HLA enrichment heatmap, neoantigen frequency decreases from top to bottom, and HLA allele frequency decreases from left to right. The top-left corner therefore shows the HLAs with the highest binding capacity to glioma neoantigens. Both the wild-type and the mutant-type peptide scores are calculated with netMHCpan 4.0.
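The Dai score described above is a plain difference of two scores. The following is a minimal sketch, assuming both netMHCpan scores are already available as numbers; the function name and the example values are hypothetical and not taken from GNIFdb.

```python
def dai_score(wt_score: float, mt_score: float) -> float:
    """Dai score as defined on this page: wild-type score minus mutant score."""
    return wt_score - mt_score


# Hypothetical scores, for illustration only.
example = dai_score(wt_score=12.0, mt_score=0.5)
```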

On each subtype page, in addition to all the neoantigen information, we provide a heatmap relating the genes in which neoantigens are located to the HLAs that bind them. Gene frequency decreases from top to bottom, and HLA allele frequency decreases from left to right. The top-left corner shows the HLAs with the highest binding capacity to glioma neoantigens. Circos plots of gene mutations and amino acid mutations are also provided. The direction of each arrow shows the mutation information.


Neoantigen

On each neoantigen's detail page, users can view the neoantigen's basic information, gene expression data and peptide features.

For AA descriptors (protFP, blosumIndice, crucianiProperties, FASGAI, MSWHIM, kideraFactors, stScales, T-scale, zScales, VHSE), the feature name has two parts: the first part is the feature label within the descriptor, and the second part is the peptide that was scored, followed by the positions used in the calculation in brackets. For example, protFP1 MT.pep (i,i+1,i+2) means protFP descriptor 1, calculated from the amino acids at the mutant position and the two positions after it. In general, 'i' denotes the mutant position.
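The naming scheme can be split into its parts programmatically. The sketch below assumes that every feature name follows exactly the form of the example above (label with a trailing component number, a peptide tag, and a bracketed list of positions written relative to 'i'); the class and function names are hypothetical, and other name variants in GNIFdb may need a different pattern.

```python
import re
from typing import NamedTuple, Tuple


class FeatureName(NamedTuple):
    descriptor: str            # descriptor label, e.g. "protFP"
    component: int             # component number within the descriptor, e.g. 1
    peptide: str               # scored peptide, e.g. "MT.pep"
    offsets: Tuple[int, ...]   # positions relative to the mutant position i


# Assumed layout: <label><number> <peptide> (<pos>,<pos>,...)
_PATTERN = re.compile(r"^\s*([A-Za-z\-]+?)(\d+)\s+(\S+)\s*\(([^)]*)\)\s*$")


def _parse_offset(token: str) -> int:
    """Turn 'i', 'i+1' or 'i-2' into 0, 1 or -2."""
    token = token.strip()
    if not token.startswith("i"):
        raise ValueError(f"unexpected position token: {token!r}")
    return 0 if token == "i" else int(token[1:])


def parse_feature_name(name: str) -> FeatureName:
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognised feature name: {name!r}")
    descriptor, component, peptide, positions = match.groups()
    offsets = tuple(_parse_offset(t) for t in positions.split(","))
    return FeatureName(descriptor, int(component), peptide, offsets)


parsed = parse_feature_name("protFP1 MT.pep (i,i+1,i+2)")
```

Here `parsed.offsets` holds the positions as integer offsets from the mutant position, so the offsets can be added to the index of the mutated residue to select the amino acids that entered the calculation.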


AA features
Physical-Chemical properties
Type
The number of amino acids in each class is calculated. Residues are classified as Tiny, Small, Aliphatic, Aromatic, Non-polar, Polar, Charged, Basic and Acidic based on their size and R-groups. The output is the number of amino acids in each class.
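The class counts can be sketched as follows. The residue sets are an assumption for illustration (a pepstats-style grouping); this page does not list the members of each class, so the grouping used by GNIFdb may differ.

```python
# Assumed class membership (pepstats-style grouping); not confirmed by this page.
AA_CLASSES = {
    "Tiny": set("ACGST"),
    "Small": set("ABCDGNPSTV"),
    "Aliphatic": set("AILV"),
    "Aromatic": set("FHWY"),
    "Non-polar": set("ACFGILMPVWY"),
    "Polar": set("DEHKNQRSTZ"),
    "Charged": set("BDEHKRZ"),
    "Basic": set("HKR"),
    "Acidic": set("BDEZ"),
}


def class_counts(peptide: str) -> dict:
    """Count how many residues of the peptide fall into each class."""
    seq = peptide.upper()
    return {
        name: sum(aa in members for aa in seq)
        for name, members in AA_CLASSES.items()
    }


counts = class_counts("AKDE")
```

A residue can belong to several classes at once (alanine is Tiny, Small, Aliphatic and Non-polar under this grouping), so the counts do not add up to the peptide length.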
AIndex
The Ikai (1980) aliphatic index of a protein. The AIndex is defined as the relative volume occupied by aliphatic side chains (alanine, valine, isoleucine and leucine). It may be regarded as a positive factor for increased thermostability of globular proteins.
Aliphatic amino acids (A, I, L and V) contribute to the thermal stability of proteins. The index evaluates the thermostability of a protein from the percentage of each aliphatic amino acid it contains.
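As a worked illustration, the aliphatic index of Ikai (1980) is commonly written as X(Ala) + 2.9 * X(Val) + 3.9 * (X(Ile) + X(Leu)), where X is the mole percentage of each residue. The sketch below implements that formula; the function name is hypothetical and the code is not the routine used by GNIFdb.

```python
def aliphatic_index(peptide: str) -> float:
    """Aliphatic index from mole percentages of A, V, I and L."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty sequence")

    def mole_percent(aa: str) -> float:
        return 100.0 * seq.count(aa) / len(seq)

    # 2.9 and 3.9 are the relative side-chain volumes of Val and Ile/Leu
    # with respect to Ala.
    return (
        mole_percent("A")
        + 2.9 * mole_percent("V")
        + 3.9 * (mole_percent("I") + mole_percent("L"))
    )
```

For short peptides such as neoantigens, a single substitution changes the mole percentages substantially, so the index can differ markedly between the WT and MT peptides.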
Auto-correlation
The Cruciani et al. (2004) auto-correlation index. The autoCorrelation index is calculated for a lag 'd' using a descriptor 'f' (centred) over a sequence of length 'L'.
Auto-covariance
The Cruciani et al. (2004) auto-covariance index. The autoCovariance index is calculated for a lag 'd' using a descriptor 'f' (centred) over a sequence of length 'L'.
Boman index
The potential protein interaction index proposed by Boman (2003) is computed from the amino acid sequence of a protein. The index is the sum of the solubility values of all residues in the sequence, divided by the number of residues to normalize it. It gives an overall estimate of the potential of a peptide to bind to membranes or to other proteins such as receptors. A protein has high binding potential if the index value is higher than 2.48.
Cross-covariance
The Cruciani et al. (2004) cross-covariance index. The lagged crossCovariance index is calculated for a lag 'd' using two descriptors 'f1' and 'f2' (centred) over a sequence of length 'L'.
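A lagged cross-covariance can be sketched as below. Two points are assumptions, since this page does not state them: the descriptor values are centred on their mean over the sequence, and the sum of products is divided by L - d. Published variants differ in both choices, so the numbers may not match those shown in GNIFdb. The inputs are per-residue descriptor values that have already been looked up; all names are hypothetical.

```python
from typing import List, Sequence


def _centre(values: Sequence[float]) -> List[float]:
    """Subtract the mean so the values are centred on zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cross_covariance(f1: Sequence[float], f2: Sequence[float],
                     lag: int, centre: bool = True) -> float:
    """Mean of f1[i] * f2[i + lag] over the L - lag valid positions."""
    if len(f1) != len(f2):
        raise ValueError("descriptor vectors must have the same length")
    length = len(f1)
    if not 0 < lag < length:
        raise ValueError("lag must satisfy 0 < lag < L")
    a = _centre(f1) if centre else list(f1)
    b = _centre(f2) if centre else list(f2)
    total = sum(a[i] * b[i + lag] for i in range(length - lag))
    return total / (length - lag)


def auto_covariance(f: Sequence[float], lag: int, centre: bool = True) -> float:
    """Auto-covariance is the special case where both descriptors are the same."""
    return cross_covariance(f, f, lag, centre)
```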
Hydrophobic moment
The hydrophobic moment as defined by Eisenberg, Weiss and Terwilliger (1984). The hydrophobic moment is a quantitative measure of the amphiphilicity perpendicular to the axis of any periodic peptide structure, such as the α-helix or β-sheet. It can be calculated for an amino acid sequence of N residues and their associated hydrophobicities Hn.
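The basic form of the hydrophobic moment treats each residue as a vector of length Hn rotated by n times a fixed angle, and takes the magnitude of the vector sum. The sketch below assumes an angle of 100 degrees per residue for an α-helix (values around 160 to 180 degrees are typically used for β-strands) and takes the hydrophobicities as an already looked-up list. Some implementations additionally normalise by the number of residues or scan a sliding window; that is not done here, and the function name is hypothetical.

```python
import math
from typing import Sequence


def hydrophobic_moment(hydrophobicities: Sequence[float],
                       angle_deg: float = 100.0) -> float:
    """Magnitude of the vector sum of Hn rotated by n * angle."""
    delta = math.radians(angle_deg)
    sin_sum = sum(h * math.sin(n * delta)
                  for n, h in enumerate(hydrophobicities, start=1))
    cos_sum = sum(h * math.cos(n * delta)
                  for n, h in enumerate(hydrophobicities, start=1))
    return math.hypot(sin_sum, cos_sum)
```

A peptide whose hydrophobic residues all face the same side of the helix gives a large moment, while uniformly distributed hydrophobicity cancels out and gives a moment close to zero.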
Hydrophobicity
Hydrophobicity is an important stabilizing force in protein folding; its strength depends on the solvent in which the protein is found. The hydrophobicity index is calculated by summing the hydrophobicity of the individual amino acids and dividing this value by the length of the sequence.
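The averaging step can be sketched as follows. This page does not say which hydrophobicity scale GNIFdb uses, so the Kyte-Doolittle scale is assumed here purely for illustration; the function name is hypothetical.

```python
# Kyte-Doolittle hydropathy values (assumed scale, for illustration only).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity(peptide: str, scale: dict = KYTE_DOOLITTLE) -> float:
    """Sum of per-residue hydrophobicity divided by the sequence length."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(scale[aa] for aa in seq) / len(seq)
```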
Theoretical net charge
The net charge of a protein sequence, based on the Henderson-Hasselbalch equation as described by Moore (1985). The net charge can be calculated at a defined pH using one of the 9 available pKa scales: Bjellqvist, Dawson, EMBOSS, Lehninger, Murray, Rodwell, Sillero, Solomon or Stryer.
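The Henderson-Hasselbalch calculation can be sketched as below. Each basic group contributes 1 / (1 + 10^(pH - pKa)) and each acidic group contributes -1 / (1 + 10^(pKa - pH)). The pKa values are approximate textbook values chosen as an assumption; they are not guaranteed to match any of the nine scales listed above, and the function name is hypothetical.

```python
# Approximate textbook pKa values (assumption, for illustration only).
POSITIVE_PKA = {"K": 10.5, "R": 12.4, "H": 6.0}
NEGATIVE_PKA = {"D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.0}
N_TERM_PKA = 9.69
C_TERM_PKA = 2.34


def net_charge(peptide: str, ph: float = 7.0) -> float:
    """Theoretical net charge of a peptide at the given pH."""
    seq = peptide.upper()
    # Free amino terminus (positive) and free carboxyl terminus (negative).
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for aa in seq:
        if aa in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[aa]))
        elif aa in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[aa] - ph))
    return charge
```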
Instability
The instability index proposed by Guruprasad (1990). This index predicts the stability of a protein from its amino acid composition. A protein whose instability index is below 40 is predicted to be stable; a value above 40 indicates that the protein may be unstable.
Molecular weight
The molecular weight is the sum of the masses of each atom constituting a molecule. The molecular weight is directly related to the length of the amino acid sequence and is expressed in units called daltons (Da).
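For a peptide, the sum over atoms is usually computed as the sum of the residue masses plus one water molecule for the free termini. The sketch below uses standard average residue masses in daltons; whether GNIFdb uses average or monoisotopic masses is not stated on this page, so that choice is an assumption, and the function name is hypothetical.

```python
# Average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # one H2O for the free N- and C-termini


def molecular_weight(peptide: str) -> float:
    """Average molecular weight of a linear, unmodified peptide in Da."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
```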
Entropy
Shannon entropy of the selected sequence.
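Shannon entropy measures how evenly the residue types are distributed in the sequence. The sketch below assumes the entropy is taken over single-residue frequencies and reported in bits (logarithm base 2); this page does not state the base, and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(peptide: str) -> float:
    """Shannon entropy (in bits) of the residue frequencies in the peptide."""
    seq = peptide.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    total = len(seq)
    entropy = 0.0
    for count in counts.values():
        p = count / total
        entropy -= p * math.log2(p)
    return entropy
```

A peptide made of a single repeated residue has entropy 0, and a peptide in which every residue is different has the maximum entropy for its length.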
AA properties
Intrinsic features derived from the number of times each amino acid appears in the mutant peptide, calculated with respect to the mutation position and the amino acid change at that position.
AA Descriptors
crucianiProperties
The three Cruciani et al. (2004) properties are the scaled principal component scores that summarize a broad set of descriptors calculated from the interaction of each amino acid residue with several chemical groups (or "probes"), such as charged ions, methyl groups, hydroxyl groups, and so forth.

PP1: Polarity
PP2: Hydrophobicity
PP3: H-bonding
zScales
The five Sandberg et al. (1998) Z-scales describe each amino acid with numerical descriptors that represent the physicochemical properties of the amino acids, including NMR data and thin-layer chromatography (TLC) data.

Z1: Lipophilicity
Z2: Steric properties (Steric bulk/Polarizability)
Z3: Electronic properties (Polarity / Charge)
Z4: Related to electronegativity, heat of formation, electrophilicity and hardness
Z5: Related to electronegativity, heat of formation, electrophilicity and hardness
FASGAI
Factor Analysis Scale of Generalized Amino Acid Information (FASGAI), proposed by Liang and Li (2007), is a set of amino acid descriptors that reflects hydrophobicity, alpha and turn propensities, bulky properties, compositional characteristics, local flexibility and electronic properties. It was derived from multi-dimensional properties of the 20 naturally occurring amino acids.

F1: Hydrophobicity index
F2: Alpha and turn propensities
F3: Bulky properties
F4: Compositional characteristic index
F5: Local flexibility
F6: Electronic properties
VHSE
The principal component score Vectors of Hydrophobic, Steric, and Electronic properties (VHSE) are derived from principal component analysis (PCA) on independent families of 18 hydrophobic properties, 17 steric properties and 15 electronic properties, respectively, which together make up 50 physicochemical variables of the 20 coded amino acids.

VHSE1 and VHSE2: Hydrophobic properties
VHSE3 and VHSE4: Steric properties
VHSE5 to VHSE8: Electronic properties
kideraFactors
A list with the average of the ten Kidera factors. The first four factors are essentially pure physical properties; the remaining six factors are superpositions of several physical properties, and are labelled for convenience by the name of the most heavily weighted component.

KF1: Helix/bend preference
KF2: Side-chain size
KF3: Extended structure preference
KF4: Hydrophobicity
KF5: Double-bend preference
KF6: Partial specific volume
KF7: Flat extended preference
KF8: Occurrence in alpha region
KF9: pK-C
KF10: Surrounding hydrophobicity
T-scales
T-scales are based on 67 common topological descriptors of 135 amino acids. These topological descriptors are based on the connectivity table of the amino acids alone, and do not explicitly consider the 3D properties of each structure.
ProtFP
The ProtFP descriptor set was constructed from a large initial selection of indices obtained from the AAindex database for all 20 naturally occurring amino acids.
ST-scales
ST-scales were proposed by Yang et al., taking 827 properties into account, which are mainly constitutional, topological, geometrical, hydrophobic, electronic, and steric properties of a total set of 167 AAs.
MS-WHIM scores
MS-WHIM scores were derived from 36 electrostatic potential properties calculated from the three-dimensional structures of the 20 natural amino acids.
BLOSUM
BLOSUM indices were derived from physicochemical properties that were subjected to a VARIMAX analysis and from an alignment matrix of the 20 natural AAs, the BLOSUM62 matrix.
Citation: Osorio D, Rondon-Villarreal P, Torres R (2015). “Peptides: A Package for Data Mining of Antimicrobial Peptides.” The R Journal, 7(1), 4-14. ISSN 2073-4859.