Motif description

The sequence upstream of the genomic loci from which 21U-RNAs derive contains two characteristic motifs. The first motif is 34nt long and contains many positions whose nucleotide frequencies deviate strongly from those of the surrounding genomic sequence. Within this "large" motif is an 8nt core with the strongest nucleotide preferences: CTGTTTCA. The second motif is small (6nt) and encompasses the genomic position corresponding to the 5p end of the derived 21U-RNA. The core of this motif is the 4nt sequence YRNT, where Y is any pyrimidine, R is any purine, N is any nucleotide, and T encodes the U at the 5p end of the 21U-RNA. The two motifs are separated in the genome by ~20nt.
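
As a rough illustration of the YRNT core only, the sketch below lists every position in a DNA string where a T closes a YRNT match. The function name and the use of a regular expression are assumptions made for this example and are not part of the module described here. A core match is just a coarse filter; the actual scoring relies on the full matrices.

```python
import re
from typing import List

# IUPAC codes: Y = C or T, R = A or G, N = any base.
# The lookahead lets overlapping YRNT matches be reported as well.
SMALL_CORE = re.compile(r"(?=([CT][AG][ACGT]T))")


def candidate_five_p_ends(seq: str) -> List[int]:
    """Return 0-based indices of each T that closes a YRNT match in seq.

    Hypothetical helper: plus strand only, DNA alphabet assumed.
    """
    return [m.start() + 3 for m in SMALL_CORE.finditer(seq.upper())]


if __name__ == "__main__":
    print(candidate_five_p_ends("AACATTGG"))
```

Each returned index marks a T that could encode the 5p U of a 21U-RNA, assuming the sequence is given in the orientation of the small RNA.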

Scoring a 21U-RNA upstream motif combines three scores: one for the match to the large motif, one for the match to the small motif, and one for the distance between the two. For a given genomic locus, several total scores are possible because the distance between the small and large motifs can vary. The module provided here returns the maximum of these. When scoring a small RNA that has been mapped to the genome, the small motif is placed over the 5p end of the small RNA, and the result is the small motif score plus the maximum, over all allowed distances at which the large motif can be placed upstream of the small motif, of the sum of the distance score and the large motif score.

For predicting 21U-RNAs, the alternative operation can be performed: given a high-scoring placement of the large motif, all possible placements of the small motif are evaluated. This indicates which 21nt sequence is most likely to be expressed as a 21U-RNA as a consequence of the presence of the large motif. Note, however, that a single instance of the large motif is sometimes observed to drive expression of multiple 21U-RNAs.
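
The scoring of a mapped small RNA can be sketched as follows. This is a minimal illustration and not the module itself: the function names, the matrix representation (one dictionary per motif position), and the plus-strand-only handling are assumptions. It also assumes that "distance" counts the nucleotides between the last large motif position and the first small motif position, which is consistent with the table indices (large motif ending at -25, small motif starting at -4) when the distance is 20.

```python
from typing import Dict, List

Pwm = List[Dict[str, float]]  # one {base: score} column per motif position


def pwm_score(seq: str, pwm: Pwm) -> float:
    """Sum per-position scores; raises KeyError for bases absent from a column."""
    if len(seq) != len(pwm):
        raise ValueError("sequence and matrix lengths differ")
    return sum(column[base] for base, column in zip(seq, pwm))


def score_locus(genome: str, five_p: int, small: Pwm, large: Pwm,
                dist_scores: Dict[int, float], small_offset: int = 4) -> float:
    """Score the locus whose small-RNA 5p end sits at genome[five_p].

    small_offset is the number of small motif positions upstream of the
    5p end (4 for a matrix indexed -4..1). Plus strand only (assumption).
    """
    small_start = five_p - small_offset
    small_end = small_start + len(small)
    if small_start < 0 or small_end > len(genome):
        raise ValueError("small motif does not fit on the sequence")
    small_score = pwm_score(genome[small_start:small_end], small)

    # Try every allowed distance and keep the best distance + large motif sum.
    candidates = []
    for distance, d_score in dist_scores.items():
        large_start = small_start - distance - len(large)
        if large_start < 0:
            continue  # large motif would run off the start of the sequence
        window = genome[large_start:large_start + len(large)]
        candidates.append(d_score + pwm_score(window, large))
    if not candidates:
        raise ValueError("no allowed distance fits on the sequence")
    return small_score + max(candidates)
```

In real use, `small`, `large`, and `dist_scores` would be filled from the scoring matrices listed on this page; the prediction mode would instead fix the large motif placement and loop over candidate small motif placements.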

The motif scores were calculated from nucleotide frequencies at each position upstream of each small RNA that mapped with a perfect match to a 21U-RNA-rich portion of C. elegans chromosome IV. The apparent preference for a 5p U and a 21nt length among this class of RNAs was noted and used to define the relevant portions of chromosome IV from which to gather counts, but neither feature was explicitly required. Pseudocounts were added to the nucleotide counts at each position in proportion to the approximate background nucleotide frequencies (34% A, 34% T, 16% G, 16% C); the total number of pseudocounts was set to the square root of the number of real counts. The score at each position is the base-2 log of the ratio of the pseudocount-adjusted foreground frequency to the background frequency (adding pseudocounts in background proportions leaves the background frequency unchanged). For a nucleotide with count c and background frequency b at a position with N total counts, score = log2( ((c + b*sqrt(N)) / (N + sqrt(N))) / b ).
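
The calculation described above can be sketched in a few lines. The function and constant names are assumptions made for this example; the counts used in the demonstration are the index -58 row of the large motif counts table further down this page.

```python
import math
from typing import Dict

# Approximate background nucleotide frequencies quoted in the text.
BACKGROUND = {"A": 0.34, "T": 0.34, "C": 0.16, "G": 0.16}


def log_odds(counts: Dict[str, int],
             background: Dict[str, float] = BACKGROUND) -> Dict[str, float]:
    """Base-2 log-odds score per nucleotide for one motif position."""
    total = sum(counts.values())
    pseudo = math.sqrt(total)  # total pseudocounts = sqrt(real counts)
    scores = {}
    for base, bg in background.items():
        # Pseudocounts are split among bases in background proportions.
        foreground = (counts[base] + pseudo * bg) / (total + pseudo)
        scores[base] = math.log2(foreground / bg)
    return scores


if __name__ == "__main__":
    row = log_odds({"A": 2792, "T": 2464, "C": 710, "G": 628})
    print({base: round(score, 4) for base, score in row.items()})
```

Run on that row, the result should agree with the index -58 entries of the large motif scoring matrix (0.3131, 0.1347, -0.5629, -0.7366) to within rounding.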




Scoring matrices

Large motif
index       A        T        C        G
-58       0.3131   0.1347  -0.5629  -0.7366
-57       0.4728  -0.1317  -0.4571  -0.6599
-56       0.7073  -0.4912  -0.2761  -1.1751
-55       0.873   -0.4691  -1.1599  -1.4133
-54       0.9184  -0.4481  -2.1781  -1.1359
-53       0.7061  -0.1089  -1.6642  -1.0082
-52       0.4139   0.2455  -0.7254  -1.7616
-51       0.3161   0.3869  -1.2342  -1.3193
-50      -0.3188   0.8022  -1.2406  -1.2599
-49      -0.5645   0.4823   0.356   -0.8247
-48      -0.8621   0.6782  -0.0328  -0.5081
-47       0.8555  -1.0092  -0.5023  -0.6366
-46      -1.2509  -2.5166   2.2827  -3.0608
-45      -1.3367   1.1333  -1.2534  -1.6096
-44      -2.8371  -2.5685  -3.4609   2.4604
-43      -4.7455   1.4669  -1.9733  -4.5691
-42      -4.5691   1.5215  -4.7756  -5.4173
-41      -4.0972   1.4883  -3.3194  -3.9734
-40      -4.9661  -2.0251   2.4806  -3.6686
-39       1.2954  -2.3261  -3.0274  -1.0356
-38       0.2009  -0.7823   0.6757  -0.037
-37       0.1633  -0.4343   0.089    0.3034
-36       0.3299   0.1125  -1.1842  -0.2486
-35       0.0246   0.5508  -0.8933  -1.1935
-34      -0.7823  -0.3903  -0.0136   1.2646
-33      -1.1758   0.9036  -0.5569  -0.6116
-32      -1.774    0.955    0.1216  -1.247
-31      -0.8132   0.1107   0.5214   0.3905
-30       0.5013   0.2815  -2.4461  -1.0721
-29      -2.7576   1.016    0.2013  -1.0329
-28       1.0106  -0.477   -3.8514  -1.4241
-27       0.4982   0.5254  -3.3884  -3.4026
-26       0.9992  -1.2692  -1.2959  -0.4871
-25       0.5258   0.1859  -1.866   -1.0028


Small motif
index       A        T        C        G
-4       -0.3847   0.4554  -0.1808  -0.2749
-3       -2.172    0.7762   0.922   -2.0323
-2        0.969   -3.459   -3.3102   0.845
-1        0.0342   0.2174  -0.4393  -0.2077
 0       -4.7791   1.475   -3.1639  -2.7011
 1       -0.0594  -1.0039   0.3189   0.9288


Distance
distance   score
16      -5.9606915688
17      -3.6775367302
18      -2.64130835064
19       1.15932881765
20       1.99214182316
21       1.30328543478
22      -0.337753314521
23      -2.4875316342
24      -3.8335895998
25      -5.26133885557



Contributing counts

Large motif
index     A       T       C       G
-58     2792    2464     710     628    
-57     3122    2044     765     663    
-56     3678    1587     869     460    
-55     4129    1612     465     388    
-54     4262    1636     223     473    
-53     3675    2077     324     518    
-52     2996    2663     633     302    
-51     2798    2940     441     415    
-50     1792    3930     439     433    
-49     1507    3143    1354     590
-48     1221    3604    1031     738
-47     4079    1100     741     674
-46      926     369    5184     115   
-45      871    4951     435     337   
-44      290     355      84    5865  
-43       57    6246     259      32    
-42       68    6488      26      12    
-41      105    6340      94      55    
-40       45     530    5948      71    
-39     5543     425     118     508
-38     2581    1292    1693    1028
-37     2514    1652    1123    1305
-36     2825    2426     457     886
-35     2281    3297     562     454
-34     1292    1704    1045    2553  
-33      977    4218     713     686   
-32      636    4372    1149     437   
-31     1264    2423    1520    1387  
-30     3185    2731     183     495   
-29      308    4562    1215     509   
-28     4545    1603      61     385   
-27     3178    3239      89      88    
-26     4509     914     422     749   
-25     3240    2554     280     520   


Small motif
index        A      T      C      G
-4        1784   3216    969    907 
-3         497   4025   2097    259 
-2        4604    187     99   1987
-1        2395   2723    808    951 
 0          58   6551    111    158 
 1        2243   1152   1376   2107


Distance
distance  counts
25             6
24            28
23            82
22           388
21          1225
20          1979
19          1108
18            73
17            32
16             1
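
The text does not state which background was used for the distance scores. As a hedged check, the sketch below applies the same square-root pseudocount recipe to the distance counts while assuming a uniform background over the ten allowed distances; that assumption is ours, and the function name is hypothetical.

```python
import math
from typing import Dict

# Distance counts copied from the table above.
DISTANCE_COUNTS = {16: 1, 17: 32, 18: 73, 19: 1108, 20: 1979,
                   21: 1225, 22: 388, 23: 82, 24: 28, 25: 6}


def distance_scores(counts: Dict[int, int]) -> Dict[int, float]:
    """Base-2 log-odds per distance under an ASSUMED uniform background."""
    total = sum(counts.values())
    pseudo = math.sqrt(total)      # total pseudocounts = sqrt(real counts)
    bg = 1.0 / len(counts)         # assumption: every distance equally likely
    return {
        d: math.log2(((c + pseudo * bg) / (total + pseudo)) / bg)
        for d, c in counts.items()
    }


if __name__ == "__main__":
    for d, s in sorted(distance_scores(DISTANCE_COUNTS).items()):
        print(d, round(s, 4))
```

Under this assumption the computed values should land close to the listed distance scores (for example, about 1.992 at distance 20), which suggests, but does not confirm, that a uniform background was used.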