import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
print("Hello World!")Hello World!
Jonas Jäger
April 15, 2026
This algorithm was published in 2007 by BAHARAK RASTEGARI and ANNE CONDON in the JOURNAL OF COMPUTATIONAL BIOLOGY (DOI: 10.1089/cmb.2006.0108).
Pseudoknots are a class of RNA secondary structures that are found in all three domains of life as well as viruses Wikipedia.
Prediction of pseudoknotted structures from a given sequence is a very, very difficult problem and found to be NP-hard. Prediction of non-pseudoknotted structures can be done via a dynamic programing approach in \(O(N^3)\) time via e.g Nussinov or Zuker’s algorithm.
In this blog post, we simply want to parse a given RNA molecule with its already predcited structure (pseudoknotted or non-pseudoknotted) and build a Tree-of-closed-regions of the structure.
First we need to make sure python is running by importing some packages.
A RNA secondary structure can be shown in different ways. One common representation is the dot-bracket notation, where each base pair is represented with a opening ( and closing bracket ). Nucleotides that are unbound are displayed as simple dots ..
This representation needs to be extended with more opening and closing characters when we want to deal with pseudoknots. There, we introduce the extended dot-bracket notation that uses these supported bracket types () [] {} <> Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj.
Lets see a test example:
A PairTable is essentially a base pair mapping function bp(), that maps a nucleotide at a specific index to its base pairing partner.
For example: in the structure ..[[..]] the base at index 2 (zero-based indexing) pairs with base at index 7. Therefore, bp(2) = 7 and vice verca bp(7) = 2.
In oder to get from the extended dot-bracket notation to a PairTable representation define a function dot_bracket_to_pairtable().
First, we define our valid character sets (opening, closing and unpaired) and make a mapping dictionary that maps every opening/closing token to its corresponding closing/opening kind (e.g round, square, curly etc).
Example: When we encounter a opening angle <, our mapping dictionary OPEN_TO_KIND[<] maps to the angle kind. Equivalently, CLOSE_TO_KIND[>] maps also to the angle kind. This means every opening/closing token pair maps the same kind.
Additionally, we need to keep different bracket types apart, so we can handle also pseudoknotted structures. Therefore, the basic idea is to maintain a seperate stack for each bracket kind.
A stack is an abstract data type that operates on the LIFO (Last In, First Out) principle. It relies on two primary operations: push, which adds an element to the top of the stack, and pop, which removes the most recently added element. Because of the LIFO principle, the last item placed onto the stack is always the first one to be taken off. For more information, see Wikipedia.
We then loop over every character of the input structure and store the index i, and the character ch. We then check for three things:
ch in the unpaired token setch in the opening token setch in the closing token setch is in the unpaired token set, we just continue with our iteration.ch is in our opening token set, we push the corresponding index i onto the stack with the corresponding key kind.ch is in the closing set, we look if the key kind is in the stack. If it is not in the stack, we know that we encountered a closing bracket without a opening bracket, which is not a valid secondary structure. If kind is indeed in the stack, we pop the last added index and have found our base pairing indices and write the indices into our pair table.At the end the stack must be empty, because every opening bracket must pair with a closing bracket. So we check this with short iteration over the stack items. If the stack is not empty, we know there is an unmatched opening bracket.
If no errors occur, the pair table is returned.
.get() method is similar to the pop method, but does not actually remove the object from the stack but just inspects it..append() method in our code corresponds to a push method, as we use a list as a data structure for our abstract data type stackdef dot_bracket_to_pairtable(structure: str) -> list[int | None]:
"""
Convert an extended dot-bracket string to a pair table representation.
In the pair table, each index contains either:
- None if the position is unpaired
- int the 0-based index of the paired partner
Supported bracket types:
() [] {} <> Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj
Raises:
ValueError on unrecognised characters, unmatched closing brackets,
or unclosed opening brackets.
"""
# Map every closing token to its matching opening kind
OPEN_TO_KIND = {
"(": "round",
"[": "square",
"{": "curly",
"<": "angle",
**{ch: ch.lower() for ch in "ABCDEFGHIJ"}, # uppercase = open
}
CLOSE_TO_KIND = {
")": "round",
"]": "square",
"}": "curly",
">": "angle",
**{ch: ch.lower() for ch in "abcdefghij"}, # lowercase = close
}
UNPAIRED = set(".")
# initialze stacks and pairtable
stacks: dict[str, list[int]] = {}
pairtable: list[int | None] = [None] * len(structure)
# loop over every character in the structure
for i, ch in enumerate(structure):
if ch in UNPAIRED:
continue
elif ch in OPEN_TO_KIND:
kind = OPEN_TO_KIND[ch]
stacks.setdefault(kind, []).append(i)
# .setdefault() adds the key "kind" with a default list empty "[]"
# if the key "kind" is missing, and then appends the list with index i
# .append() is equivalent to the push operation of a stack
# (because we take a list as a data structure for our stack)
elif ch in CLOSE_TO_KIND:
kind = CLOSE_TO_KIND[ch]
stack = stacks.get(kind)
# .get() returns 'None' if kind is not in the dict (but does not pop it yet!)
if not stack:
raise ValueError(
f"Unmatched closing bracket {ch!r} at position {i}"
)
j = stack.pop()
pairtable[i] = j
pairtable[j] = i
else:
raise ValueError(
f"Invalid token {ch!r} at position {i}"
)
for kind, stack in stacks.items():
if stack:
raise ValueError(
f"Unmatched opening bracket of kind '{kind}' at position {stack[0]}"
)
return pairtableLets look at a small example that converts the extended dot-bracket string ..[[..]] into the corresponding pair table. We walk trough each step of the conversion:
| Position | Character | Stack (square) | Action |
|---|---|---|---|
| 0 | . |
[] |
skip — unpaired |
| 1 | . |
[] |
skip — unpaired |
| 2 | [ |
[2] |
push 2 |
| 3 | [ |
[2, 3] |
push 3 |
| 4 | . |
[2, 3] |
skip — unpaired |
| 5 | . |
[2, 3] |
skip — unpaired |
| 6 | ] |
[2] |
pop 3 → pair (3, 6) |
| 7 | ] |
[] |
pop 2 → pair (2, 7) |
So, our resulting pair table should look like this: [None, None, 7, 6, None, None, 3, 2].
And indeed, if we run our function this is the correct pair table.
In order to visualize the connecting base pairs, one popular method is the so-called arc diagram. A simple implementation can be seen below. Every nucleotide type gets a different color, and the sequence is drawn as a straight line, where base pairings are shown as arc above the sequence. This immediatly shows crossing base pairs which indicate our pseudoknots.
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.patches import Arc
_NT_COLORS = {'A': '#E63946', 'U': '#2A9D8F', 'G': '#E9C46A', 'C': '#457B9D'}
def plot_rna_arc(
sequence: str,
pair_table: list[int],
*,
title: str = "RNA Secondary Structure",
figsize: tuple[float, float] = (14, 5),
save_path: str | None = None,
):
n = len(pair_table)
fig, ax = plt.subplots(figsize=figsize)
# ── backbone line ────────────────────────────────────────────────────────
ax.plot([0, n - 1], [0, 0], color="#444444", lw=1.5, zorder=1)
# ── arcs for base pairs ──────────────────────────────────────────────────
for i in range(n):
j = pair_table[i]
if j is not None and j > i: # j == -1 means unpaired
center_x = (i + j) / 2
diameter = j - i
arc = Arc(
xy=(center_x, 0),
width=diameter,
height=diameter,
angle=0,
theta1=0,
theta2=180,
color="#264653",
lw=1.2,
zorder=2,
)
ax.add_patch(arc)
# ── nucleotide nodes ─────────────────────────────────────────────────────
for i, nt in enumerate(sequence):
color = _NT_COLORS.get(nt.upper(), "#AAAAAA")
ax.scatter(i, 0, color=color, s=120, zorder=3)
ax.text(i, -0.6, nt, ha='center', va='top', fontsize=7, color=color)
# ── cosmetics ────────────────────────────────────────────────────────────
legend_handles = [
mpatches.Patch(color=col, label=nt)
for nt, col in _NT_COLORS.items()
if nt in sequence.upper()
]
ax.legend(handles=legend_handles, loc='upper right', framealpha=0.9)
ax.set_title(title, fontweight='bold', pad=10)
ax.set_xlim(-1, n)
ax.set_ylim(-1.5, n / 2)
ax.axis('off')
fig.tight_layout()
if save_path:
fig.savefig(save_path, dpi=150, bbox_inches='tight')
plt.show()Our example shows the arc diagram of an RNA secondary structure from the Turnip yellow mosaic virus and clearly shows crossing base pairs which indicate a pseudoknot.
We want to implement a parsing algorithm for closed regions of a pseudoknotted RNA secondary structure. Closed regions are defined as a region [i, j] of a seconday structure where no nucleotide inside this invertal base pairs with a nucleotide outside of this interval.
We define our Node class that consists of the two indices i and j for the opening and closing bases, as well as a list of children Nodes. One Node represents a closed region in the secondary structure.
Next, we also define the ClosedRegionTree class that consists of n, the sequence length and the root node that spans the entire sequence so that every real closed region is strictly inside it and becomes a child, never a replacement.
The defined methods, are for adding Nodes in post-fix order to the root of the tree.
The python dataclasss decorator defines some useful default methods when called upon a class. For example, __init__(), or __repr__(), if these are not specifically defined.
Here in our case: __init__() is created as default, but __repr__() is defined by us!
from __future__ import annotations
from dataclasses import dataclass, field
# see @dataclass tip above
@dataclass
class Node:
i: int
j: int
children: list[Node] = field(default_factory=list)
def add_child(self, child: Node) -> None:
self.children.append(child)
def __iter__(self):
"""Postfix (post-order) traversal: children before parent."""
for child in self.children:
yield from child
yield self
def __repr__(self):
return f"Node({self.i}, {self.j})"
# @dataclass generates __init__ for all fields in order.
# field(init=False) excludes root from that generated __init__, so the caller only passes n.
# __post_init__ then runs automatically once n is set, using it to build root.
@dataclass
class ClosedRegionTree:
n: int # n represents the sequence length of the input structure
root: Node = field(init=False)
def __post_init__(self):
# Sentinel root [-1, n] spans the entire sequence so that every real
# closed region is strictly inside it and becomes a child, never a replacement.
self.root = Node(-1, self.n)
def _last_child(self) -> Node | None:
"""Child of root with largest left border, or None if no children."""
return self.root.children[-1] if self.root.children else None
def add(self, i: int, j: int) -> None:
"""Add closed region [i, j] in postfix order (procedure Add-To-Tree)."""
node = Node(i, j) # step 1
current = self._last_child() # step 2
sentinel = Node(0, 0)
ab = current if current is not None else sentinel
while ab.i > i: # step 3
node.add_child(ab) # step 4
self.root.children.pop()
current = self._last_child() # step 5
ab = current if current is not None else sentinel
self.root.add_child(node) # step 7
def postfix(self) -> list[Node]:
return list(self.root)
def display(self) -> None:
def _display(node: Node, prefix: str = "", is_last: bool = True) -> None:
connector = "└── " if is_last else "├── "
print(prefix + connector + f"[{node.i}, {node.j}]")
child_prefix = prefix + (" " if is_last else "│ ")
for i, child in enumerate(node.children):
_display(child, child_prefix, is_last=(i == len(node.children) - 1))
print(f"[{self.root.i}, {self.root.j}]")
for i, child in enumerate(self.root.children):
_display(child, "", is_last=(i == len(self.root.children) - 1))In order to build the closed region tree, we now need a function that serves as parser for our structure.
A parser is a program that reads a sequence of symbols and determines its structure according to a formal grammar. It transforms raw input — such as a string of characters — into a structured representation (like a tree) that captures the relationships between the symbols. For more information, see Wikipedia.
In our case, this means our BuildClosedRegionsTree function serves as a parser, that takes as input the extended dot-bracket string (=sequence of symbols), and returns the closed region tree (=structured representation) as an output according to the extended dot-bracket notation rules (=formal grammar).
This whole process of determining the structure of the tree is called parsing.
BuildClosedRegionsTree walks through the exteneded dot-bracket string one character at a time and incrementally builds the closed region tree. For every paired position, it either pushes an opening bracket onto a stack (recording the interval [i, pt[i]] for later), or processes a closing bracket by checking whether the current pair crosses any existing intervals on the stack. That crossing check is the pseudoknot detection:
E seen so far.Once an interval on the stack is fully closed — meaning the current position i matches the right boundary of the stack top — it gets popped and handed to tree.add(), which slots it into the correct position in the tree using the postfix property: any previously attached node with a larger left border must be a child of the new node, not a sibling. The result is a tree where each node represents one closed region, nested exactly according to the containment structure of the input.
def BuildClosedRegionsTree(structure: str):
n = len(structure)
pt = dot_bracket_to_pairtable(structure)
tree = ClosedRegionTree(n)
stack = []
for i, char in enumerate(structure):
# pt[i] is the index of the partner nucleotide, or None if unpaired
if pt[i] is not None:
# Case 1: Opening bracket (i < pt[i])
if i < pt[i]:
stack.append([i, pt[i]])
# Case 2: Closing bracket (pt[i] < i)
# This logic handles interval merging for pseudoknots
elif pt[i] < i:
E = i
# explicitly check for pseudoknot crossing
# even though on properly defined pair tables, the for loop would already poped stack entries where top.E <= i
while stack and stack[-1][1] > i and stack[-1][0] > pt[i]:
_, topE = stack.pop()
E = max(E, topE)
# If there is still a base pair on the stack,
# update its end-point to the furthest point reached (E)
if stack:
stack[-1][1] = max(E, stack[-1][1])
# Case 3: Check if the current nucleotide closes the region on top of stack
if stack and i == stack[-1][1]:
idx, jdx = stack.pop()
tree.add(idx, jdx)
return treeHere, we see the closed region tree of our Turnip yellow mosaic virus, shown with our.display() method. We defined our extended dot-bracket string earlier already, but is shown as a comment again.
For a deeper understanding of the Tree-building-algorithm, we can have a look at the exact state at each step of the stack and the ClosedRegionTree below.
Load a new dot-bracket string (non-pseudoknotted or pseudoknotted) and see how the algorithm parses the structure.
#| '!! shinylive warning !!': |
#| shinylive does not work in self-contained HTML documents.
#| Please set `embed-resources: false` in your metadata.
#| standalone: true
#| viewerHeight: 700
from __future__ import annotations
import copy, io, os
from dataclasses import dataclass, field
from typing import Optional
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from matplotlib.patches import FancyBboxPatch, Arc
from shiny import App, ui, render, reactive
# ── Data structures ────────────────────────────────────────────────────────────
@dataclass
class Node:
i: int
j: int
children: list = field(default_factory=list)
@dataclass
class Step:
desc: str
phase: str
position: int
highlight: Optional[tuple]
stack: list
tree: list
# ── Pair-table ─────────────────────────────────────────────────────────────────
def dot_bracket_to_pairtable(structure: str) -> list:
OPEN = {"(": "r", "[": "s", "{": "c", "<": "a",
**{ch: ch.lower() for ch in "ABCDEFGHIJ"}}
CLOSE = {")": "r", "]": "s", "}": "c", ">": "a",
**{ch: ch.lower() for ch in "abcdefghij"}}
UNPAIRED = set(".")
stacks: dict = {}
table = [None] * len(structure)
for i, ch in enumerate(structure):
if ch in UNPAIRED:
continue
elif ch in OPEN:
stacks.setdefault(OPEN[ch], []).append(i)
elif ch in CLOSE:
kind = CLOSE[ch]
s = stacks.setdefault(kind, [])
if not s:
raise ValueError(f"Unmatched closing bracket {ch!r} at pos {i}")
j = s.pop()
table[i] = j
table[j] = i
else:
raise ValueError(f"Invalid token {ch!r} at pos {i}")
for kind, s in stacks.items():
if s:
raise ValueError(f"Unmatched opening bracket kind={kind!r} at pos {s[0]}")
return table
# ── Step recorder ──────────────────────────────────────────────────────────────
def _clone(nodes: list) -> list:
return [Node(n.i, n.j, _clone(n.children)) for n in nodes]
def build_steps(structure: str) -> list[Step]:
pt = dot_bracket_to_pairtable(structure)
n = len(structure)
roots: list[Node] = []
stack: list[list] = []
steps: list[Step] = []
def snap(desc, phase, pos, hl=None):
steps.append(Step(
desc=desc, phase=phase, position=pos, highlight=hl,
stack=copy.deepcopy(stack),
tree=_clone(roots),
))
snap(f"Initialise. Sentinel root spans [-1, {n + 1}].", "init", 0)
for i, char in enumerate(structure):
if pt[i] is None:
continue
if i < pt[i]:
stack.append([i, pt[i]])
snap(f"Position {i} '{char}': push [{i}, {pt[i]}] → stack depth {len(stack)}.",
"push", i, (i, pt[i]))
elif pt[i] < i:
E = i
merged = []
while stack and stack[-1][1] > i and stack[-1][0] > pt[i]:
popped = stack.pop()
E = max(E, popped[1])
merged.append(popped)
if merged:
snap(f"Position {i} '{char}': pseudoknot — merged {merged}, E={E}.",
"merge", i, (pt[i], i))
if stack:
old_j = stack[-1][1]
stack[-1][1] = max(E, old_j)
if stack[-1][1] != old_j:
snap(f"Extended stack top [{stack[-1][0]}, {old_j}] → [{stack[-1][0]}, {stack[-1][1]}].",
"merge", i, (stack[-1][0], stack[-1][1]))
if stack and i == stack[-1][1]:
ri, rj = stack.pop()
node = Node(ri, rj)
stolen = []
while roots and roots[-1].i > ri:
stolen.insert(0, roots.pop())
node.children.insert(0, stolen[0])
roots.append(node)
if stolen:
snap(f"Re-parented {[(c.i, c.j) for c in stolen]} under [{ri}, {rj}].",
"attach", i, (ri, rj))
snap(f"Region [{ri}, {rj}] closed and appended to root's children.",
"attach", i, (ri, rj))
snap("Tree construction complete.", "done", n - 1)
return steps
# ── Colors ─────────────────────────────────────────────────────────────────────
_PHASE_COLOR = {
"init": "#888888", "push": "#3B8ADD",
"merge": "#D85A30", "attach":"#1D9E75", "done": "#7F77DD",
}
_BRACKET_COLOR = {
"(": "#3B8ADD", ")": "#3B8ADD",
"[": "#1D9E75", "]": "#1D9E75",
"{": "#EF9F27", "}": "#EF9F27",
"<": "#D85A30", ">": "#D85A30",
}
# ── Sequence panel ─────────────────────────────────────────────────────────────
def _draw_sequence(ax, structure, pt, step):
n = len(structure)
CHAR_Y = 0.55
ax.set_xlim(0, n); ax.set_ylim(-0.25, 2.0); ax.axis("off")
for i in range(n):
if pt[i] is not None and i < pt[i]:
j = pt[i]; span = j - i; cx = (i + j + 1) / 2
arc_h = min(1.2, span * 0.055 + 0.12)
is_hl = step.highlight and (i, j) == tuple(step.highlight)
ax.add_patch(Arc((cx, CHAR_Y), width=span, height=arc_h * 2,
angle=0, theta1=0, theta2=180,
color="#EF9F27" if is_hl else "#DDDDDD",
lw=1.6 if is_hl else 0.7,
linestyle="--" if is_hl else "-", zorder=1))
if step.highlight:
hi, hj = step.highlight
ax.add_patch(mpatches.FancyBboxPatch(
(hi + 0.05, CHAR_Y - 0.22), hj - hi + 0.9, 0.44,
boxstyle="round,pad=0.04",
facecolor="#FFF3CD", edgecolor="#EF9F27", lw=1.2, zorder=0))
ax.axvline(x=step.position + 0.5, color="#CCCCCC", lw=0.8, linestyle=":")
fs = max(6, min(10, int(450 / n)))
for i, c in enumerate(structure):
paired = pt[i] is not None
color = _BRACKET_COLOR.get(c, "#CCCCCC") if paired else "#CCCCCC"
ax.text(i + 0.5, CHAR_Y, c, ha="center", va="center",
fontsize=fs, family="monospace",
color=color, fontweight="bold" if paired else "normal", zorder=2)
if n <= 80 and i % 5 == 0:
ax.text(i + 0.5, CHAR_Y - 0.2, str(i),
ha="center", va="top", fontsize=5.5, color="#BBBBBB")
ax.axvline(x=0, color=_PHASE_COLOR.get(step.phase, "#888"), lw=5)
ax.set_title(f"[{step.phase}] {step.desc}", fontsize=8.5, loc="left", color="#333", pad=3)
# ── Stack panel ────────────────────────────────────────────────────────────────
def _draw_stack(ax, step):
ax.axis("off"); ax.set_xlim(0, 1); ax.set_ylim(0, 1)
ax.set_title("Algorithm stack", fontsize=9, color="#555", pad=4)
stack = step.stack
if not stack:
ax.text(0.5, 0.5, "empty", ha="center", va="center",
fontsize=10, color="#AAAAAA", style="italic", family="monospace")
return
MAX = 12; visible = stack[-MAX:]; nv = len(visible)
item_h = min(0.088, 0.82 / max(nv, 1)); gap = 0.012; top_y = 0.95
for k, (si, sj) in enumerate(reversed(visible)):
is_top = (k == 0)
y = top_y - k * (item_h + gap)
face = "#5DCAA5" if is_top else ("#9FE1CB" if k % 2 == 0 else "#BCEAD5")
edge = "#0F6E56" if is_top else "#1D9E75"
ax.add_patch(FancyBboxPatch(
(0.08, y - item_h * 0.9), 0.84, item_h * 0.86,
boxstyle="round,pad=0.005",
facecolor=face, edgecolor=edge,
lw=1.5 if is_top else 0.7, zorder=2))
fs = max(6, min(9, int(90 / max(nv, 1)) + 4))
ax.text(0.5, y - item_h * 0.45, f"[{si}, {sj}]",
ha="center", va="center", fontsize=fs,
family="monospace", color="#04342C", fontweight="bold", zorder=3)
if len(stack) > MAX:
ax.text(0.5, top_y - MAX * (item_h + gap), f"… +{len(stack) - MAX} more",
ha="center", va="top", fontsize=7, color="#999")
ax.annotate("← top", xy=(0.92, top_y - item_h * 0.45),
fontsize=6.5, color="#0F6E56", va="center", ha="left")
# ── Tree panel ─────────────────────────────────────────────────────────────────
def _layout(nodes, depth=0, counter=None):
if counter is None: counter = [0]
for node in nodes:
if not node.children:
node._cx = counter[0] + 0.5; counter[0] += 1
else:
_layout(node.children, depth + 1, counter)
node._cx = (node.children[0]._cx + node.children[-1]._cx) / 2
node._d = depth
def _count_leaves(nodes):
c = 0
for n in nodes: c += _count_leaves(n.children) if n.children else 1
return c
def _max_depth(nodes):
if not nodes: return 0
return max(max(n._d, _max_depth(n.children)) for n in nodes)
def _draw_tree(ax, step, n_seq):
ax.axis("off")
ax.set_title("Closed region tree", fontsize=9, color="#555", pad=4)
tree = step.tree
root_node = Node(-1, n_seq, list(tree))
_layout([root_node])
n_leaves = max(_count_leaves([root_node]), 1)
max_d = max(1, _max_depth([root_node]))
LH = 1.3; BOX_W = 0.84; BOX_H = 0.62; FS = 8
ax.set_xlim(-0.5, n_leaves + 0.2)
ax.set_ylim(-(max_d + 0.6) * LH, 0.85)
def nxy(node): return node._cx, -node._d * LH
def draw_edges(node):
px, py = nxy(node)
for c in node.children:
cx, cy = nxy(c)
ax.plot([px, cx], [py, cy], color="#CCCCCC", lw=0.9, zorder=1)
draw_edges(c)
def draw_nodes(node):
x, y = nxy(node)
is_root = (node.i == 0 and node.j == n_seq + 1)
is_hl = (step.highlight is not None
and node.i == step.highlight[0]
and node.j == step.highlight[1])
if is_root: face, edge, tc, lw = "#F0F0F0","#AAAAAA","#555555", 0.8
elif is_hl: face, edge, tc, lw = "#5DCAA5","#0F6E56","#04342C", 1.5
else: face, edge, tc, lw = "#9FE1CB","#1D9E75","#04342C", 0.8
ax.add_patch(FancyBboxPatch(
(x - BOX_W/2, y - BOX_H/2), BOX_W, BOX_H,
boxstyle="round,pad=0.04",
facecolor=face, edgecolor=edge, lw=lw, zorder=2))
label = (f"[{node.i},{node.j}]" if not is_root else f"root\n[0,{n_seq+1}]")
ax.text(x, y, label, ha="center", va="center", fontsize=FS,
family="monospace", color=tc, fontweight="bold",
multialignment="center", zorder=3)
for c in node.children: draw_nodes(c)
if not tree:
ax.set_xlim(-0.5, 1.2); ax.set_ylim(-0.78, 0.85)
draw_edges(root_node); draw_nodes(root_node)
# ── draw_step ──────────────────────────────────────────────────────────────────
def draw_step(structure, steps, idx, figsize=None):
pt = dot_bracket_to_pairtable(structure)
step = steps[min(idx, len(steps) - 1)]
n = len(structure)
_root_tmp = Node(0, n + 1, _clone(step.tree))
_layout([_root_tmp])
n_leaves = max(_count_leaves([_root_tmp]), 1)
tree_depth = max(1, _max_depth([_root_tmp]))
if figsize is None:
fw = max(13.0, n_leaves * 1.2 + 4.5)
fh = max(7.5, tree_depth * 1.4 + 4.0)
figsize = (fw, fh)
fig = plt.figure(figsize=figsize, facecolor="white")
gs = fig.add_gridspec(
2, 2,
height_ratios=[1.0, max(2.8, tree_depth * 0.9)],
width_ratios=[1, max(2.6, n_leaves * 0.45)],
hspace=0.42, wspace=0.22,
left=0.04, right=0.97, top=0.93, bottom=0.04,
)
ax_seq = fig.add_subplot(gs[0, :])
ax_stack = fig.add_subplot(gs[1, 0])
ax_tree = fig.add_subplot(gs[1, 1])
fig.text(0.98, 0.97, f"step {idx + 1} / {len(steps)}",
ha="right", va="top", fontsize=8, color="#AAAAAA")
_draw_sequence(ax_seq, structure, pt, step)
_draw_stack(ax_stack, step)
_draw_tree(ax_tree, step, n)
return fig
# ── Shiny app ──────────────────────────────────────────────────────────────────
DEFAULT = "(((((.......)))))(((....[[[[[)))...]]]]]...."
_steps = build_steps(DEFAULT)
def _step_for_base(steps, base):
"""Latest step whose position <= base."""
idx = 0
for k, s in enumerate(steps):
if s.position <= base:
idx = k
return idx
app_ui = ui.page_fluid(
ui.h4("Closed region tree — step-by-step", style="margin-bottom:4px"),
ui.layout_columns(
ui.input_text("structure", None, value=DEFAULT, width="100%"),
ui.input_action_button("load", "Load", class_="btn-sm"),
col_widths=[10, 2],
),
ui.input_slider("base", "Base position", min=0, max=len(DEFAULT) - 1,
value=0, step=1, width="100%",
animate=ui.AnimationOptions(interval=350)),
ui.download_button("save_svg", "Save current step as SVG", class_="btn-sm"),
ui.output_plot("tree_plot", height="560px"),
)
def server(input, output, session):
steps = reactive.value(_steps)
@reactive.effect
@reactive.event(input.load)
def _reload():
s = input.structure().strip()
try:
new_steps = build_steps(s)
steps.set(new_steps)
ui.update_slider("base", min=0, max=len(s) - 1, value=0)
except ValueError as e:
ui.notification_show(str(e), type="error")
@output
@render.plot
def tree_plot():
s = input.structure().strip()
st = steps.get()
idx = _step_for_base(st, input.base())
return draw_step(s, st, idx)
@output
@render.download(filename=lambda: f"tree_base_{input.base():03d}.svg")
def save_svg():
s = input.structure().strip()
st = steps.get()
idx = _step_for_base(st, input.base())
fig = draw_step(s, st, idx)
buf = io.StringIO()
fig.savefig(buf, format="svg", bbox_inches="tight", facecolor="white")
plt.close(fig)
buf.seek(0)
yield buf.read()
app = App(app_ui, server)
Now that we have successfully built the closed region tree, the question of why naturally arises. The answer lies in the free energy evaluation of RNA secondary structures. The total free energy of a structure can be decomposed into independent contributions from different secondary structure elements. In standard nearest-neighbor thermodynamic models (like the Turner energy model), the overall free energy of an RNA structure is calculated by summing the energy contributions of its constituent loops and stems Wikipedia. Closed regions are necessary to determine which loop type is present in a given structure.
This matters enormously in practice. Free energy evaluation is the most important aspect for RNA structure prediction, and structure prediction underpins applications ranging from mRNA vaccine design — where the folding of the transcript directly affects stability and translation efficiency — to ribozyme engineering and other synthetic biology tasks where precise control over RNA shape is essential.