18-JUN-2012

dbSNP currently supports VCF v4.0 (http://www.1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcf4.0).

The VCF files are available at ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/VCF

The following are omitted from all VCF files:
    Variations listed as microsatellites or named variations
    Variations that are not mapped on the reference genome (GRCh37.x)
    Variations that are mapped to more than one location on the reference genome (Weight > 1)

Following are the files in the VCF main directory:

./00-All.vcf.gz
    VCF of all variations that meet the criteria to be in a VCF file.
    This file is created once per dbSNP build.

common_all.vcf.gz
    VCF of all variations that are polymorphic in at least one population of the
    1000 Genomes project or of any of the following handles:
        1000GENOMES
        CSHL-HAPMAP
        EGP_SNPS
        NHLBI-ESP
        PGA-UW-FHCRC
    A variation is polymorphic if the minor allele frequency is at least 0.01
    and the minor allele is present in at least two samples.

clinvar_YYYYMMDD.vcf.gz
    VCF of variations from ClinVar, where 'YYYYMMDD' represents the date the file
    was created.
    This file is created weekly.

common_and_clinical_YYYYMMDD.vcf.gz
    Variations from common_all.vcf.gz that are clinical. A clinical variation is
    one that appears in clinvar_YYYYMMDD.vcf.gz with at least one of the following
    clinical significance codes:
          4 - probable-pathogenic
          5 - pathogenic
          6 - drug-response
          7 - histocompatibility
        255 - other
    This file is created weekly.

common_no_known_medical_impact_YYYYMMDD.vcf.gz
    Variations from common_all.vcf.gz that do not meet the clinical criteria
    described above.
    This file is created weekly.

clinvar_00-newest.vcf.gz
common_and_clinical_00-newest.vcf.gz
common_no_known_medical_impact_00-newest.vcf.gz
    Symbolic links to the latest versions of the weekly files described above.
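The rules that decide which of the common files a variation lands in can be sketched as follows. This is a minimal illustration, not dbSNP's own code: it assumes the polymorphism test is recomputed from per-population VCF genotype calls (GT strings) and that the caller already has the variation's ClinVar clinical significance codes as integers. The function names (minor_allele_stats, is_polymorphic, weekly_file_for) are hypothetical, and dbSNP may derive allele frequencies differently (for example from submitted frequency data).

```python
from collections import Counter

# Clinical significance codes that this README lists as marking a ClinVar
# record as clinical.
CLINICAL_CODES = {4, 5, 6, 7, 255}


def minor_allele_stats(genotypes):
    """Return (minor_allele, maf, carriers) for one population.

    `genotypes` is a list of VCF GT strings such as "0|1" or "1/1".
    The minor allele is taken to be the second most frequent allele
    (an assumption for multi-allelic sites); missing calls (".") are ignored.
    """
    allele_counts = Counter()
    carriers = Counter()  # number of samples carrying each allele
    for gt in genotypes:
        alleles = [a for a in gt.replace("|", "/").split("/") if a != "."]
        allele_counts.update(alleles)
        carriers.update(set(alleles))
    if len(allele_counts) < 2:
        return None, 0.0, 0  # monomorphic, or no calls at all
    minor, count = allele_counts.most_common()[1]
    return minor, count / sum(allele_counts.values()), carriers[minor]


def is_polymorphic(genotypes, min_maf=0.01, min_carriers=2):
    """MAF of at least 0.01 and the minor allele seen in at least two samples."""
    _, maf, n_carriers = minor_allele_stats(genotypes)
    return maf >= min_maf and n_carriers >= min_carriers


def weekly_file_for(population_genotypes, clnsig_codes):
    """Name the weekly file a variation would fall into, or None if not common.

    `population_genotypes` maps a population (assumed to come from 1000 Genomes
    or one of the listed handles) to its GT strings; `clnsig_codes` holds the
    variation's ClinVar clinical significance codes, if any.
    """
    if not any(is_polymorphic(gts) for gts in population_genotypes.values()):
        return None  # not in common_all.vcf.gz
    if any(code in CLINICAL_CODES for code in clnsig_codes):
        return "common_and_clinical"
    return "common_no_known_medical_impact"


if __name__ == "__main__":
    pop = ["0|0"] * 98 + ["0|1", "0|1"]  # MAF exactly 0.01, two carriers
    print(weekly_file_for({"MKK": pop}, [5]))  # common_and_clinical
```

Note that a single homozygous carrier ("1|1" among 100 samples) also gives a MAF of 0.01 but fails the two-sample requirement, which is why the carrier count is tracked separately from the allele count.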
Following are the subdirectories of the VCF directory:

PreviousWeekly
    Older versions of the files that are created weekly.

ByChromosome
    VCF with genotypes and genotype frequencies, listed by chromosome and population ID.
    example: 14-12162-MKK.vcf.gz

ByChromosomeNoGeno
    VCF with genotype frequencies, but with the genotypes omitted.
    These are listed by chromosome and population ID.
    example: 14-12162-MKK-nogeno.vcf.gz

ByPopulation
    VCF with genotypes and genotype frequencies, listed by population and chromosome.
    These files are symbolic links to the files in ByChromosome.
    example: MKK-12162-14.vcf.gz

ByPopulationNoGeno
    VCF with genotype frequencies, but with the genotypes omitted.
    These are listed by population and chromosome and are symbolic links to the
    files in ByChromosomeNoGeno.
    example: MKK-12162-14-nogeno.vcf.gz

File naming convention
    example: 14-12162-MKK.vcf.gz
    14    - Chromosome
    12162 - dbSNP population ID. See
            http://www.ncbi.nlm.nih.gov/projects/SNP/snp_viewTable.cgi?pop=12162
            or
            http://www.ncbi.nlm.nih.gov/projects/SNP/snp_tableList.cgi?fld=Population+handle&cond=contains&str=CSHL-HAPMAP&type=pop
    MKK   - Three-letter population ID. For more information see
            http://ccr.coriell.org/sections/collections/NHGRI/?SsId=11

A note about the position
    The RSPOS tag is the position of the SNP in dbSNP, and the position reported in
    column 2 (POS) may differ from the RSPOS tag. All alleles for an INDEL or
    multi-byte SNP must begin with the same nucleotide. To accomplish this, the
    preceding base pair is prefixed to each allele and the position of that base
    pair is reported. Also, if all of the alleles consist of the same repeated
    sequence or a deletion, the beginning of the repeat is calculated and the
    preceding base pair is reported. For example, if the variations are AT/ATAT/-,
    the position in column 2 is the location of the first repeat (AT) minus one.
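The position rule for indels and repeats can be illustrated with a small left-alignment sketch. This is an assumption-laden illustration, not dbSNP's implementation: it uses the common normalization approach of repeatedly dropping a base shared at the right end of every allele and, whenever an allele becomes empty, prefixing the preceding reference base. It assumes the reference sequence is a plain string, the first allele is the reference allele starting at the given 1-based position, and "-" marks a deletion (an empty reference allele is treated as an insertion just before that position). The function name anchor_alleles is hypothetical, and the sketch does not reproduce dbSNP's extra prefixing of multi-byte SNPs.

```python
def anchor_alleles(ref_seq, pos, alleles):
    """Shift an indel to the start of its repeat and prefix the preceding base.

    ref_seq : reference sequence as a string (index 0 is position 1).
    pos     : 1-based position of the first base of the reference allele.
    alleles : dbSNP-style alleles, reference first; "-" marks a deletion.
    Returns (vcf_pos, vcf_alleles).
    """
    seqs = ["" if a == "-" else a for a in alleles]
    if ref_seq[pos - 1:pos - 1 + len(seqs[0])] != seqs[0]:
        raise ValueError("first allele does not match the reference")
    changed = True
    while changed:
        changed = False
        # A base shared at the right end of every allele can be dropped:
        # the same variant can be described one base further left.
        if all(seqs) and len({s[-1] for s in seqs}) == 1:
            seqs = [s[:-1] for s in seqs]
            changed = True
        # An empty allele needs the preceding reference base as an anchor.
        if not all(seqs):
            if pos == 1:
                raise ValueError("no preceding base to anchor on")
            base = ref_seq[pos - 2]
            seqs = [base + s for s in seqs]
            pos -= 1
            changed = True
    return pos, seqs


if __name__ == "__main__":
    # Reference with the repeat ATAT at positions 4-7; the variation AT/ATAT/-
    # is reported at the second copy (position 6).
    ref = "TTGATATCC"
    print(anchor_alleles(ref, 6, ["AT", "ATAT", "-"]))  # (3, ['GAT', 'GATAT', 'G'])
```

In this example the first AT repeat starts at position 4, so the reported position is 3 and every allele begins with the anchoring base G, matching the "first repeat minus one" rule described above.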
Following is a sample VCF header from ./ByChromosome/14-12162-MKK.vcf.gz:

    ##fileformat=VCFv4.0
    ##fileDate=20120604
    ##source=dbSNP
    ##dbSNP_BUILD_ID=137
    ##reference=GRCh37.p5
    ##phasing=partial
    ##variationPropertyDocumentationUrl=ftp://ftp.ncbi.nlm.nih.gov/snp/specs/dbSNP_BitField_latest.pdf
    ##dbSNP_POP_ID=12162
    ##dbSNP_LOC_POP_ID=HAPMAP-MKK
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=SubSNP->Batch.link_out">
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=5% minor allele frequency in each and all populations">
    ##INFO=5% minor allele frequency in 1+ populations">
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##INFO=
    ##FORMAT=
    ##FORMAT=
    ##FORMAT=
    ##FORMAT=
    ##FORMAT=
    ##FORMAT=
    ##FILTER=
    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA21295 NA21297 NA21300 NA21301 NA21303 NA21307 NA21308 NA21311 NA21312 NA21314 NA21316 NA21318 NA21320 NA21333 NA21336 NA21339 NA21344 NA21352 NA21353 NA21355 NA21356 NA21357 NA21359 NA21360 NA21362 NA21363 NA21364 NA21365 NA21367 NA21368 NA21370 NA21371 NA21378 NA21379 NA21381 NA21382 NA21384 NA21385 NA21387 NA21388 NA21390 NA21391 NA21399 NA21400 NA21402 NA21403 NA21405 NA21408 NA21414 NA21415 NA21417 NA21418 NA21420 NA21421 NA21423 NA21424 NA21434 NA21435 NA21436 NA21438 NA21440 NA21441 NA21447 NA21448 NA21451 NA21453 NA21454 NA21457 NA21473 NA21475 NA21476 NA21478 NA21479 NA21485 NA21486 NA21488 NA21489 NA21491 NA21493 NA21509 NA21510 NA21512 NA21513 NA21517 NA21519 NA21520 NA21521 NA21522 NA21523 NA21524 NA21526 NA21528 NA21529 NA21573 NA21574 NA21575 NA21576 NA21577 NA21578 NA21580 NA21582 NA21583 NA21587 NA21596 NA21597 NA21599 NA21600 NA21611 NA21613 NA21614 NA21615 NA21616 NA21617 NA21619 NA21620 NA21631 NA21632 NA21634 NA21635 NA21647 NA21650 NA21678 NA21682 NA21683 NA21685 NA21686 NA21689 NA21693 NA21716 NA21717 NA21719 NA21722 NA21723 NA21733 NA21738 NA21739 NA21740 NA21741 NA21768 NA21776 NA21784 NA21825 NA21826
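A header like the sample above can be read programmatically, for instance to list the samples in a ByChromosome file or to confirm that ##dbSNP_POP_ID agrees with the population ID in the file name. The sketch below is a hypothetical helper, not an NCBI tool: the regular expression encodes the naming convention described earlier as an assumption (a three-letter upper-case population ID, chromosome and population order depending on the subdirectory, optional "-nogeno" suffix), and it relies on the VCF rule that header columns are tab-separated with samples following the nine fixed columns ending in FORMAT.

```python
import gzip
import os
import re

# Hypothetical pattern for the naming convention described above:
#   <chrom>-<popID>-<POP>[-nogeno].vcf.gz   (ByChromosome, ByChromosomeNoGeno)
#   <POP>-<popID>-<chrom>[-nogeno].vcf.gz   (ByPopulation, ByPopulationNoGeno)
NAME_RE = re.compile(
    r"(?P<a>[^-.]+)-(?P<pop_id>\d+)-(?P<b>[^-.]+)(?P<nogeno>-nogeno)?\.vcf\.gz")
POP_RE = re.compile(r"[A-Z]{3}")  # three-letter population ID, e.g. MKK


def parse_vcf_filename(path):
    name = os.path.basename(path)
    m = NAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognised file name: {name}")
    a, b = m.group("a"), m.group("b")
    pop, chrom = (a, b) if POP_RE.fullmatch(a) else (b, a)
    return {
        "chrom": chrom,
        "pop_id": int(m.group("pop_id")),
        "pop": pop,
        "has_genotypes": m.group("nogeno") is None,
    }


def read_header(lines):
    """Collect simple ##key=value lines and the sample names from #CHROM."""
    meta, samples = {}, []
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##"):
            key, sep, value = line[2:].partition("=")
            # Structured lines such as ##INFO=<...> repeat their key; skip them.
            if sep and not value.startswith("<"):
                meta[key] = value
        elif line.startswith("#CHROM"):
            # The first nine columns are fixed (CHROM ... FORMAT); the rest are samples.
            samples = line.split("\t")[9:]
            break
        else:
            break  # reached a data line, so the header is over
    return meta, samples


def read_header_from_file(path):
    # gzip.open also reads multi-member (bgzip) compressed files.
    with gzip.open(path, "rt") as handle:
        return read_header(handle)


def pop_id_matches(path, meta):
    """Check ##dbSNP_POP_ID against the population ID in the file name."""
    return meta.get("dbSNP_POP_ID") == str(parse_vcf_filename(path)["pop_id"])
```

For a downloaded copy of ./ByChromosome/14-12162-MKK.vcf.gz, read_header_from_file would return the simple meta lines (fileformat, dbSNP_POP_ID and so on) plus the NA212xx sample names, and pop_id_matches would compare 12162 from the name with the header value.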