Map data is a critical component in many aspects of genetics and biological research. Well defined toolkits for manipulating map data do not exist at this point, we propose to build a system for manipulating most types of map data (Genetic, RH, RFLP, Sequence, and LD). Map Proposal This document proposes an object heirarchy for maps, markers, and their manipulation. Key Points * A Map is an object which contains mapable elements. * A Map can be defined for a given organism or population of individuals. * A Mappable element is an element with a position within a map. Background information Maps are made up of elements which are mappable. This includes genetic and physical markers. A genetic map consists of markers which have a given recombination distance between them. This distance is usually given as centi-morgans or 1% recombination between them. Other distances include ... Examples of these are the publicly available Marshfield and Genethon maps. Radiation hybrid maps consist of markers which have been mapped to radiation hybrid panels. Typically these markers are STSes which have been processed on RH panels. The distance between markers is calculated in centi-Rads which represent . Examples of these include Whitehead STS, GeneMap '99. Restriction Enzyme (RE) maps are used to describe RE cut points in a given sequence and can be used to "fingerprint" sections of DNA (typically BAC clones). Clones which share a statitistically (based on known frequency of RE cutting) signifigant collection fingerprints are likely to overlap. Additionally Physical maps or BAC/PAC/YAC maps represent clone fragment overlap. These maps are used to to represent how clones overlap and form a consensus sequence of a genomic or cDNA region. Sequence maps represent the known consensus sequence for a given region of typically genomic DNA. LD and Haplotype maps ... Comparisions between maps from different organisms can yield useful observations about trends in evolution. Additionally comparisons of maps for the same species can provide insight into information such as recombination hot spots and DNA stability. Object proposal Maps are objects which are made up of mappable elements. A mappable element has a position on a map and can be tested for equality and relative position to other mappable element positions. These are some baseline interface and object definitions. Other work has been done by Philip Lijnzaad, Emmanuel Barillot and OMG folks to create definitions for maps. Interfaces Bio::IdentifiableI string getID // unique identifier -- this goes with Juha's // identifiable property? Bio::NameableI string getName Bio::AliasableI isa Bio::NameableI string getAliases Bio::Map::MapI isa Bio::NameableI isa Bio::Identifiable MapIterator getAllElements // for in-order iterator access) ?Bio::ChromosomeI? chromosome // Should maps be build one per // chromosome aggregated for // a whole report set. Bio::SpeciesI species // use existing BP species object // which may need to be more robust numeric length // not sure what to return for // relative or RFLP maps string units // Map units string name // Map Name Bio::Map::MappableI // Where to handle the fact that RFLP // Markers have multiple Map positions PositionI position(MapI) boolean equals(MappableI) boolean less_than(MappableI) boolean greater_than(MappableI) Bio::Map::PositionI // may be undef to handle relative maps [RE]. // This is where a known position for a marker can be retrieved // Multiple positions are possible for RE on a sequence map Array positionValues Bio::MarkerI isa Bio::MappableI isa Bio::AliasableI // heikki to help fill in Variant and Allele information Bio::LiveSeq::AlleleI Bio::LiveSeq::VariantI isa Bio::MarkerI Bio::PrimarySeqI getFwdPrimer() Bio::PrimarySeqI getRevPrimer() // I assume there should always be a primary set of // of markers which defined start/end points // should this be hidden inside more methods to // handle RFLP, etc? Bio::LiveSeq::AlleleI getAlleles() Implementations Bio::Marker::RestrictionEnzyme isa Bio::MarkerI Bio::Marker::STS isa Bio::MarkerI Bio::Marker::Microsat isa Bio::LiveSeq::VariantI Bio::Marker::CytogeneticBand isa Bio::MarkerI Bio::Marker::VLTR isa Bio::MarkerI Bio::Marker::SNP Bio::Bin Bio::Map::Cytogenetic isa Bio::Map::MapI Bio::Map::RadiationHybrid Bio::Map::Genetic Bio::Map::GeneticMap string getSex // code as a string? - only Bio::Map::RFLP Bio::Map::Sequence // Should probably be Bio::Assembly or these two // need to work together Sequence Map could be // be built with Bio::Assemblies Bio::Map::Haplotype // what would this entail -- SNP components? Caveats, questions, etc ----------------------- Namespace is very flexible here. An important useful result of this toolkit will be the ability to programatically go from one map to another. So Querying Maps for a marker - perhaps based on that marker's unique id will allow on to compare distances on different maps or go from genetic to sequence maps very easily. Not sure if we should be doing a Bio::ChromosomeI or can just code with a string/numeric? Does Polyploidy cause any problems in maps or just in population/allele issues?