Download CanProVar Data
CanProVar provides a human protein database (Ensembl v69) for download in FASTA format; variation information is recorded in the header line of each sequence.
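
Because the variation information is carried in the header lines, a first step for most users is to separate each header from its sequence. The following is a minimal sketch of a FASTA reader, not an official CanProVar tool. The function name `read_fasta` and the two example records are hypothetical, and the example headers do not reproduce the actual CanProVar header layout, which is described in the README file.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines may be wrapped; collect and join them.
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


# Made-up records for illustration only (not real CanProVar entries).
example = """>PROT_A hypothetical_annotation
MKTAYIAK
QRQISFVK
>PROT_B
MADEEK
""".splitlines()

records = list(read_fasta(example))
```

To process a downloaded file, an open file handle can be passed in place of `example`, since the function only needs an iterable of lines.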

The README file explains the contents of the following files:

Total
                        Description                                                Protein(FASTA)                 Statistics
Validated dbSNP_nsSNPs  variation information from validated coding SNPs           dbSNP_validated_nsSNP_protein  822,259
Cancer related_nsSNPs   mutations that have been reported in cancer samples        cancer_nsSNP_protein           65,199
Both nsSNPs             from both the dbSNP_validated* file and the cancer_* file  all_nsSNP_protein              885,747


The csSNPs of each cancer type
Cancer Name Statistics Protein(FASTA)
Adrenal Gland Neoplasms 16
Biliary Tract Cancer 55
Brain Cancer 257
Breast Cancer 13036
breast ductal carcinoma 23
Central Nervous System Neoplasms 2432
Colorectal Cancer 1142
Esophageal Cancer 37
Gastric Cancer 665
Head and Neck Cancer 7381
Hepatocellular Carcinoma 2050
Intestines Cancer 1518
Leukemia 49
acute lymphocytic leukemia 6
acute myeloid leukemia 8
chronic lymphocytic leukemia 201
chronic myeloid leukemia 21
Liver Cancer 26
Lung Cancer 2937
Lymphoma 1178
Melanoma 4314
Myeloproliferative Disorders 59
Neoplasms by Histologic Type 1666
Non-small cell lung carcinoma 61
Oral Cancer 3
Ovarian Cancer 18703
Pancreatic Cancer 4554
pancreatic ductal adenocarcinoma 11
Pituitary Carcinoma 13
Prostate Cancer 145
Renal Cancer 820
Sarcoma 31
Skin Cancer 5437
Small cell lung carcinoma 35
Testicular Cancer 20
Thyroid Carcinoma 182
follicular thyroid carcinoma 4
Urinary Bladder Cancer 32
Uterine Cancer 271


Download MS-CanProVar Data
MS-CanProVar (version 2.0) is a protein sequence database that includes variation information to facilitate peptide variant detection in shotgun proteomics. In the .fasta file, each variant peptide is included as an independent entry, and its variations are annotated in the header line: "rs" labels SNPs and "cs" labels cancer-related mutations. For details about the MS-CanProVar database, please refer to "A bioinformatics workflow for variant peptide detection in shotgun proteomics", Li et al., MCP, 2011. The current version of MS-CanProVar is based on Ensembl v68.
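
The "rs" and "cs" labels make it possible to tally SNP-derived and cancer-derived variations directly from the header lines. The sketch below shows one way to do this. It assumes that a label appears in the header as "rs" or "cs" followed by digits at a word boundary; that pattern, the function name `count_labels`, and the example headers are all illustrative assumptions, so check them against the real file before use.

```python
import re
from collections import Counter

# Assumption: a variation label is "rs" or "cs" followed by digits,
# starting at a word boundary. Adjust to the real header layout.
LABEL = re.compile(r"\b(rs|cs)\d+")


def count_labels(headers):
    """Count "rs" and "cs" labels across an iterable of header strings."""
    counts = Counter()
    for header in headers:
        for match in LABEL.finditer(header):
            counts[match.group(1)] += 1
    return counts


# Made-up headers for illustration only (not real MS-CanProVar entries).
headers = [
    "PEP_1 rs12345",
    "PEP_2 cs678",
    "PEP_3 rs111 cs222",
]

label_counts = count_labels(headers)
```

A `Counter` is used so that headers carrying several variations contribute one count per label rather than one count per entry.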




Dr. Jing Li's Group

©2012 Menghuan Zhang, Jing Li