Package Bio :: Module SeqFeature :: Class SeqFeature
[hide private]
[frames] | no frames]

Class SeqFeature

source code

object --+
         |
        SeqFeature

Represent a Sequence Feature on an object.

Attributes: o location - the location of the feature on the sequence (FeatureLocation) o type - the specified type of the feature (ie. CDS, exon, repeat...) o location_operator - a string specifying how this SeqFeature may be related to others. For example, in the example GenBank feature shown below, the location_operator would be "join" o strand - A value specifying on which strand (of a DNA sequence, for instance) the feature deals with. 1 indicates the plus strand, -1 indicates the minus strand, 0 indicates both strands, and None indicates that strand doesn't apply (ie. for proteins) or is not known. o id - A string identifier for the feature. o ref - A reference to another sequence. This could be an accession number for some different sequence. o ref_db - A different database for the reference accession number. o qualifiers - A dictionary of qualifiers on the feature. These are analagous to the qualifiers from a GenBank feature table. The keys of the dictionary are qualifier names, the values are the qualifier values. o sub_features - Additional SeqFeatures which fall under this 'parent' feature. For instance, if we having something like:

CDS join(1..10,30..40,50..60)

The the top level feature would be a CDS from 1 to 60, and the sub features would be of 'CDS_join' type and would be from 1 to 10, 30 to 40 and 50 to 60, respectively.

To get the nucleotide sequence for this CDS, you would need to take the parent sequence and do seq[0:10]+seq[29:40]+seq[49:60] (Python counting). Things are more complicated with strands and fuzzy positions. To save you dealing with all these special cases, the SeqFeature provides an extract method to do this for you.

Instance Methods [hide private]
 
__init__(self, location=None, type='', location_operator='', strand=None, id='<unknown id>', qualifiers=None, sub_features=None, ref=None, ref_db=None)
Initialize a SeqFeature on a Sequence.
source code
 
__repr__(self)
A string representation of the record for debugging.
source code
 
__str__(self)
A readable summary of the feature intended to be printed to screen.
source code
 
_shift(self, offset)
Returns a copy of the feature with its location shifted (PRIVATE).
source code
 
extract(self, parent_sequence)
Extract feature sequence from the supplied parent sequence.
source code

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Properties [hide private]

Inherited from object: __class__

Method Details [hide private]

__init__(self, location=None, type='', location_operator='', strand=None, id='<unknown id>', qualifiers=None, sub_features=None, ref=None, ref_db=None)
(Constructor)

source code 

Initialize a SeqFeature on a Sequence.

location can either be a FeatureLocation (with strand argument also given if required), or a Python slice (with strand given as the step).

e.g. With no strand, on the forward strand, and on the reverse strand:

>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> f1 = SeqFeature(FeatureLocation(5,10), type="domain")
>>> f2 = SeqFeature(FeatureLocation(7,110), strand=1, type="CDS")
>>> f3 = SeqFeature(FeatureLocation(9,108), strand=-1, type="CDS")

An invalid strand will trigger an exception:

>>> f4 = SeqFeature(FeatureLocation(50,60), strand=2)
Traceback (most recent call last):
   ...
ValueError: Strand should be +1, -1, 0 or None, not 2

For exact start/end positions, an integer can be used (as shown above) as shorthand for the ExactPosition object. For non-exact locations, the FeatureLocation must be specified via the appropriate position objects.

Overrides: object.__init__

__repr__(self)
(Representation operator)

source code 

A string representation of the record for debugging.

Overrides: object.__repr__

__str__(self)
(Informal representation operator)

source code 

A readable summary of the feature intended to be printed to screen.

Overrides: object.__str__

_shift(self, offset)

source code 

Returns a copy of the feature with its location shifted (PRIVATE).

The annotation qaulifiers are copied.

extract(self, parent_sequence)

source code 

Extract feature sequence from the supplied parent sequence.

The parent_sequence can be a Seq like object or a string, and will generally return an object of the same type. The exception to this is a MutableSeq as the parent sequence will return a Seq object.

This should cope with complex locations including complements, joins and fuzzy positions. Even mixed strand features should work! This also covers features on protein sequences (e.g. domains), although here reverse strand features are not permitted.

>>> from Bio.Seq import Seq
>>> from Bio.Alphabet import generic_protein
>>> from Bio.SeqFeature import SeqFeature, FeatureLocation
>>> seq = Seq("MKQHKAMIVALIVICITAVVAAL", generic_protein)
>>> f = SeqFeature(FeatureLocation(8,15), type="domain")
>>> f.extract(seq)
Seq('VALIVIC', ProteinAlphabet())

Note - currently only sub-features of type "join" are supported.