Runtime Structure


Criteria/Features

Criteria (features) are aspects of a hairpin that can be examined to gain insight into whether or not it is a miRNA precursor, such as "number of unpaired nucleotides" or "miRNA 5p nucleotide identity". After a criteria file has been evaluated, they are represented as entries in a dictionary, fdict, in the global environment. Each feature has a unique name (its key in the dictionary) and is represented as an instance of either string_feature or number_feature, two classes implemented in mirscanModule.py.

Each instance of either feature class requires that the user add two attributes:

There are also four class-defined attributes that satisfy the requirements of a feature interface. They are:

The user may give a Feature instance additional attributes to attach any other values that are necessary for the feature's evaluation.

The global environment created by a criteria file also contains the variable mirscan, which is bound to a function that evaluates miRNA hairpin candidates in terms of the features implemented in fdict. If the user wants any pre-processed information about the hairpin candidate, such as a secondary structure or an alignment, to be passed to the feature.fx functions, the user must include those data as values in the dictionary args. This dictionary is passed to all of the feature.fx functions in lieu of specialized sets of arguments.
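As a rough illustration of the layout described above, the following Python sketch builds a one-entry fdict and calls its fx function with an args dictionary. It is not the code from mirscanModule.py: StandInFeature is a hypothetical placeholder for string_feature/number_feature, the (hairpin, args) signature of fx is an assumption, and the "structure" key, the example sequence, and the cap of 10 are invented for the example.

```python
# Hypothetical stand-in for the feature classes in mirscanModule.py.
# The real string_feature / number_feature classes are not reproduced here.
class StandInFeature:
    pass


MAX_UNPAIRED = 10  # invented cap so that the set of possible values is finite


def unpaired_count(hairpin, args):
    """Count unpaired positions in a dot-bracket structure supplied via args.

    The (hairpin, args) signature is an assumption made for this sketch.
    """
    return min(args["structure"].count("."), MAX_UNPAIRED)


# fdict maps unique feature names to feature instances.
fdict = {}
fdict["unpaired"] = StandInFeature()
fdict["unpaired"].fx = unpaired_count                   # evaluation function
fdict["unpaired"].kl = list(range(MAX_UNPAIRED + 1))    # values fx may return

# Pre-processed data travel in one shared dictionary rather than as
# feature-specific arguments. The "structure" key is made up for the example.
args = {"structure": "((..((....))..))"}
hairpin = "GGAACCUUUUGGAAUC"  # made-up sequence, same length as the structure

value = fdict["unpaired"].fx(hairpin, args)
print(value)
```

Because every fx receives the same args dictionary, a new feature that needs, say, an alignment only requires a new key in args rather than a change to how the features are called.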




Training

Abstractly, training comprises the evaluation of a series of foreground miRNA hairpins and a series of background miRNA hairpins. To this end, mirscanTrainer.py creates three data structures: 1) a set of features to be evaluated, 2) a set of foreground hairpins (the training set; samples of real miRNA hairpins), and 3) a set of background candidate hairpins. The background hairpins, which will be scored later, are represented as a list of instances of the Candidate class implemented in mirscanModule.py. The arguments taken by the constructor are:

The foreground miRNA hairpins also have the start positions of their miRNAs defined in a parallel list. Each item in this list is a dictionary whose keys are the same as those of orgToSeq and whose values are integers giving the miRNA 5p nucleotide position in the corresponding hairpin (indexed from 0). The index of each Candidate in the candidates list equals the index of the corresponding start position dictionary in the starts list.
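The parallel-list arrangement can be sketched in Python as follows. This is only an illustration: StandInCandidate is a hypothetical placeholder that keeps nothing but an orgToSeq dictionary (the real Candidate constructor is not reproduced here), orgToSeq is assumed to map an organism label to a hairpin sequence, and the labels, sequences, and the helper mirna_prefixes are made up.

```python
from collections import namedtuple

# Hypothetical placeholder for the Candidate class in mirscanModule.py.
StandInCandidate = namedtuple("StandInCandidate", ["orgToSeq"])

# candidates[i] and starts[i] describe the same foreground hairpin.
candidates = [
    StandInCandidate(orgToSeq={"orgA": "GGCUUGAGGUAGCC", "orgB": "GCUUGAGGUAGCAA"}),
    StandInCandidate(orgToSeq={"orgA": "CCAUACGGAUUGG"}),
]
starts = [
    {"orgA": 4, "orgB": 3},  # 0-based position of the miRNA 5p nucleotide
    {"orgA": 2},
]


def mirna_prefixes(candidates, starts, length=5):
    """Return, per candidate, the first `length` nt of the miRNA in each organism."""
    prefixes = []
    for candidate, start in zip(candidates, starts):
        # The start dictionary must use the same keys as orgToSeq.
        if candidate.orgToSeq.keys() != start.keys():
            raise ValueError("start positions do not match orgToSeq keys")
        prefixes.append(
            {org: seq[start[org]:start[org] + length]
             for org, seq in candidate.orgToSeq.items()}
        )
    return prefixes


prefixes = mirna_prefixes(candidates, starts)
print(prefixes)
```

Keeping the two lists index-aligned means the start positions never have to be stored on the candidate objects themselves, at the cost of having to keep both lists in the same order.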

For each dataset (the foreground dataset and the background dataset), a count is kept of the number of candidates for which each feature returns each of its possible values. These counts are stored in a two-tiered dictionary structure whose first (outer) keys are the names of features, whose second (inner) keys are the items of that feature's Feature.kl list, and whose values are the corresponding hairpin counts.
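A minimal Python sketch of such a two-tiered count dictionary is given below. It is an illustration under simplifying assumptions, not the logic of mirscanTrainer.py: features are modelled with types.SimpleNamespace objects carrying only fx and kl, fx takes a plain sequence string, and the feature name, the sequences, and the helper tally are invented.

```python
from types import SimpleNamespace

# Simplified stand-in feature: fx takes a sequence string (an assumption).
fdict = {
    "first_nt": SimpleNamespace(fx=lambda seq: seq[0], kl=["A", "C", "G", "U"]),
}


def tally(fdict, hairpins):
    """Count, per feature, how many hairpins yield each value listed in kl."""
    # Outer keys: feature names. Inner keys: items of kl, all starting at 0.
    counts = {name: {value: 0 for value in feat.kl} for name, feat in fdict.items()}
    for hairpin in hairpins:
        for name, feat in fdict.items():
            counts[name][feat.fx(hairpin)] += 1
    return counts


# Made-up foreground and background sets.
foreground = ["UGAG", "UACC", "AGGU"]
background = ["GCCA", "CAUU", "UUGA", "GGGC"]

fg_counts = tally(fdict, foreground)
bg_counts = tally(fdict, background)
print(fg_counts)
print(bg_counts)
```

Initializing every inner key from kl means that values never observed in a dataset still appear with a count of 0, so the foreground and background dictionaries always have the same shape.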




Scoring

The runtime structure during scoring is very similar to that of training. The mirscan function implemented in the criteria file scores a set of candidate miRNA hairpins using a scoring matrix. The matrix is represented as a two-tiered dictionary structure whose first (outer) keys are the names of features, whose second (inner) keys are the items of that feature's Feature.kl list, and whose values are the floating-point scores that are added when the feature's function returns that value.
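The lookup-and-add step can be sketched in Python as follows. This is a simplified illustration rather than the actual mirscan function: the features are types.SimpleNamespace stand-ins whose fx takes a plain sequence string, and the feature names, the score values in ms, and the helper score are all invented for the example.

```python
from types import SimpleNamespace

# Simplified stand-in features (fx taking a sequence string is an assumption).
fdict = {
    "first_nt": SimpleNamespace(fx=lambda seq: seq[0], kl=["A", "C", "G", "U"]),
    "length_bin": SimpleNamespace(
        fx=lambda seq: "long" if len(seq) > 6 else "short",
        kl=["short", "long"],
    ),
}

# Scoring matrix: feature name -> returned value -> score (made-up numbers).
ms = {
    "first_nt": {"A": 0.5, "C": -1.0, "G": -0.5, "U": 1.5},
    "length_bin": {"short": -0.25, "long": 0.75},
}


def score(candidate, fdict, ms):
    """Sum the matrix entries selected by each feature's returned value."""
    return sum(ms[name][feat.fx(candidate)] for name, feat in fdict.items())


totals = {seq: score(seq, fdict, ms) for seq in ["UGAGGUAG", "CAU"]}
print(totals)
```

Since the matrix has the same outer and inner keys as the count dictionaries built during training, any value a feature can return is guaranteed to have a score entry as long as the matrix covers every item of kl.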