Stage 2 - Representation of subunits and translation of the data into spatial restraints {#rnapolii_2}
========================================================================================

In this stage, we will initially define a representation of the system. Afterwards, we will convert the data into spatial restraints. This is performed using the script `rnapolii/modeling/modeling.py`, which uses the topology file, `topology.txt`, to define the system components and their representation.

### Setting up Model Representation in IMP

Very generally, the *representation* of a system is defined by all the variables that need to be determined based on input information, including the assignment of the system components to geometric objects (e.g. points, spheres, ellipsoids, and 3D Gaussian density functions).

Our RNA Pol II representation employs *spherical beads* of varying sizes and *3D Gaussians*, which coarsen domains of the complex using several resolution scales simultaneously. The *spatial restraints* will be applied to individual resolution scales as appropriate.

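As a rough illustration of what "several resolution scales" means, the sketch below groups consecutive residue coordinates into beads of `residues_per_bead` residues and keeps one bead set per resolution. It is a minimal sketch in plain Python, not IMP's implementation. The names `coarse_grain` and `multi_resolution` are hypothetical, and using the farthest member residue as the bead radius is a simplification chosen for brevity.

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]
Bead = Tuple[Vec, float]  # (center, radius)

def coarse_grain(coords: List[Vec], residues_per_bead: int) -> List[Bead]:
    """Group consecutive residue coordinates into beads of `residues_per_bead`
    residues (the last bead may hold fewer residues)."""
    beads: List[Bead] = []
    for start in range(0, len(coords), residues_per_bead):
        chunk = coords[start:start + residues_per_bead]
        center = tuple(sum(c[i] for c in chunk) / len(chunk) for i in range(3))
        # Simplification: radius = distance from the center to the farthest member
        radius = max(math.dist(center, c) for c in chunk)
        beads.append((center, radius))
    return beads

def multi_resolution(coords: List[Vec], resolutions=(1, 10)) -> Dict[int, List[Bead]]:
    """One bead set per resolution, all describing the same residues."""
    return {r: coarse_grain(coords, r) for r in resolutions}

# Example: a straight 25-residue segment with 3.8 A spacing along x
segment = [(3.8 * i, 0.0, 0.0) for i in range(25)]
scales = multi_resolution(segment)
print({r: len(beads) for r, beads in scales.items()})  # {1: 25, 10: 3}
```

Keeping both bead sets side by side mirrors the idea that each restraint can be evaluated on whichever scale suits it.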
Beads and Gaussians of a given domain are arranged into either a rigid body or a flexible string, based on the crystallographic structures. In a *rigid body*, all the beads and the Gaussians of a given domain have their relative distances constrained during configurational sampling, while in a *flexible string* the beads and the Gaussians are restrained by the sequence connectivity.

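The difference between a rigid body and a flexible string can be made concrete with a small geometric check. Applying one rotation and translation to every member of a rigid body leaves all internal distances unchanged, whereas a bead in a flexible string can move on its own. The helper names below (`move_rigid_body`, `move_flexible_bead`) are hypothetical, and rotation is restricted to the z axis to keep the sketch short.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

def rotate_z(p: Vec, angle: float) -> Vec:
    """Rotate a point about the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1], p[2])

def move_rigid_body(members: List[Vec], angle: float, shift: Vec) -> List[Vec]:
    """Apply the same rotation (about the body's centroid) and translation to
    every member, which leaves all internal distances unchanged."""
    n = len(members)
    centroid = tuple(sum(p[i] for p in members) / n for i in range(3))
    moved = []
    for p in members:
        r = rotate_z(tuple(p[i] - centroid[i] for i in range(3)), angle)
        moved.append(tuple(r[i] + centroid[i] + shift[i] for i in range(3)))
    return moved

def move_flexible_bead(beads: List[Vec], index: int, shift: Vec) -> List[Vec]:
    """Move one bead of a flexible string independently of its neighbours."""
    out = list(beads)
    out[index] = tuple(beads[index][i] + shift[i] for i in range(3))
    return out

def pairwise_distances(points: List[Vec]) -> List[float]:
    """Distances for pairs (0,1), (0,2), ..., (1,2), ... in that order."""
    return [math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]

domain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
print(pairwise_distances(move_rigid_body(domain, 0.9, (10.0, -4.0, 2.5))))
```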
<img src="rnapolii_Multi-scale_representation.png" width="600px" />
_Multi-scale representation of Rpb1 subunit of RNA Pol II_

The Gaussian mixture model (GMM) of a subunit is the set of all 3D Gaussians used to represent it; it will be used to calculate the EM score. The GMM of a subunit can be calculated automatically from the topology file. For the purposes of this tutorial, we have already created these GMMs for Rpb4 and Rpb7 and placed them in the `rnapolii/data` directory in their respective `.mrc` and `.txt` files.

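To make the GMM idea concrete, here is a minimal sketch that evaluates a mixture of weighted 3D Gaussians at a point. For brevity it assumes isotropic Gaussians (a single `sigma` each), whereas the Gaussians in a real GMM generally have full covariance matrices. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float, float]

@dataclass
class Gaussian3D:
    weight: float   # mixture weight (e.g. proportional to the mass it represents)
    center: Vec
    sigma: float    # isotropic standard deviation, in Angstroms

    def density(self, x: Vec) -> float:
        """Weighted value of a normalized isotropic 3D Gaussian at x."""
        d2 = sum((x[i] - self.center[i]) ** 2 for i in range(3))
        norm = (2 * math.pi * self.sigma ** 2) ** -1.5
        return self.weight * norm * math.exp(-d2 / (2 * self.sigma ** 2))

def gmm_density(gmm: List[Gaussian3D], x: Vec) -> float:
    """Density of the whole mixture: the sum of its components."""
    return sum(g.density(x) for g in gmm)

subunit_gmm = [Gaussian3D(1.0, (0.0, 0.0, 0.0), 1.0),
               Gaussian3D(0.5, (4.0, 0.0, 0.0), 2.0)]
print(gmm_density(subunit_gmm, (1.0, 0.0, 0.0)))
```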
**Dissecting the script**
The script `rnapolii/modeling/modeling.py` sets up the representation of the system and the restraints. (Subsequently it also performs [sampling](@ref rnapolii_3), but more on that later.)

The first part of the script defines the files used in model building and restraint generation.

```python
#---------------------------
# Define input files
#---------------------------
datadirectory = "../data/"
topology_file = datadirectory + "topology.txt"
target_gmm_file = datadirectory + 'emd_1883.map.mrc.gmm.50.txt'
```

The first section defines where input files are located. The topology file (`topology_file`) defines how the system components are structurally represented. `target_gmm_file` stores the EM map for the entire complex, which has already been converted into a Gaussian mixture model.

**Build the Model Representation Using a Topology File**
Using the topology file we define the overall topology: we introduce the molecules with their sequence and their known structure, and define the movers. Each line in the file is a user-defined molecular **Domain**, and each column contains the specifics needed to build the system. See the `IMP.pmi.topology.TopologyReader` documentation for a full description of the topology file format.

```python
import IMP
import IMP.pmi.topology
import IMP.pmi.macros

# Create the IMP model that holds all particles
m = IMP.Model()

# Read in the topology file.
# Specify the directory where the PDB files, FASTA files and GMM files are located.
topology = IMP.pmi.topology.TopologyReader(topology_file,
                                           pdb_dir=datadirectory,
                                           fasta_dir=datadirectory,
                                           gmm_dir=datadirectory)

# Use the BuildSystem macro to build states from the topology file
bs = IMP.pmi.macros.BuildSystem(m)

# Each state can be specified by a topology file.
bs.add_state(topology)
```

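To illustrate the "one line per domain, one column per field" layout, the following sketch parses pipe-delimited lines into dictionaries. The column list is an assumption modeled on the PMI topology format, not a specification, and the sample line and file names in it are invented.

```python
from typing import Dict, List

# Assumed column order (hypothetical, modeled on PMI topology files)
COLUMNS = ["molecule_name", "color", "fasta_fn", "fasta_id", "pdb_fn", "chain",
           "residue_range", "pdb_offset", "bead_size", "em_residues_per_gaussian",
           "rigid_body", "super_rigid_body", "chain_of_super_rigid_bodies"]

def parse_topology(text: str) -> List[Dict[str, str]]:
    """Return one dict per domain line; skips blanks, comments and the header row."""
    domains = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line.startswith("|") or line.startswith("|molecule_name"):
            continue
        # "|a|b||" -> ["", "a", "b", "", ""] -> keep inner fields, empty ones included
        fields = [f.strip() for f in line.split("|")[1:-1]]
        domains.append(dict(zip(COLUMNS, fields)))
    return domains

example = ("|" + "|".join(COLUMNS) + "|\n"
           "# invented example line\n"
           "|Rpb4|red|rpb4.fasta|Rpb4|rpb4.pdb|D|1,END|0|10|40|1|1||\n")
print(parse_topology(example)[0]["bead_size"])  # 10
```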
**Building the System Representation and Degrees of Freedom**

Here we can set the **Degrees of Freedom** parameters, which should be optimized according to MC acceptance ratios. There are three kinds of movers: Rigid Body, Bead, and Super Rigid Body (super rigid bodies are sets of rigid bodies and beads that will move together in an additional Monte Carlo move).

`max_rb_trans` and `max_rb_rot` are the maximum translation and rotation of the Rigid Body mover, `max_srb_trans` and `max_srb_rot` are the maximum translation and rotation of the Super Rigid Body mover, and `max_bead_trans` is the maximum translation of the Bead mover.

The execution of the macro will return the root hierarchy (`root_hier`) and the degrees of freedom (`dof`) objects, both of which are used later on.

```python
root_hier, dof = bs.execute_macro(max_rb_trans=4.0)  # other max_* mover parameters not shown
```

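The reason to tune mover step sizes against MC acceptance ratios shows up clearly on a toy problem. The sketch runs Metropolis Monte Carlo for one particle in a 3D harmonic well and reports the fraction of accepted moves for a given maximum translation. It is not IMP's mover code; the energy function, `kT`, and step count are arbitrary choices.

```python
import math
import random

def mc_acceptance_ratio(max_trans: float, n_steps: int = 5000,
                        kT: float = 1.0, seed: int = 0) -> float:
    """Fraction of accepted Metropolis moves for a particle in E(x) = |x|^2."""
    rng = random.Random(seed)
    x = [0.0, 0.0, 0.0]
    energy = 0.0
    accepted = 0
    for _ in range(n_steps):
        # Propose a move: each coordinate shifted by up to +/- max_trans
        trial = [xi + rng.uniform(-max_trans, max_trans) for xi in x]
        trial_energy = sum(t * t for t in trial)
        # Metropolis criterion: always accept downhill, sometimes accept uphill
        delta = trial_energy - energy
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            x, energy = trial, trial_energy
            accepted += 1
    return accepted / n_steps

for step in (0.1, 1.0, 5.0):
    print(step, round(mc_acceptance_ratio(step), 2))
```

Large steps are mostly rejected, while tiny steps are almost always accepted but explore slowly; the `max_*` values are adjusted to sit between those extremes.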
Since we're interested in modeling the stalk, we will fix all subunits except Rpb4 and Rpb7. Note that we are using `IMP.atom.Selection` to get the particles that correspond to the fixed molecules.

```python
import IMP.atom
import IMP.core
import IMP.pmi

# Fix all rigid bodies but not Rpb4 and Rpb7 (the stalk)
# First select and gather all particles to fix.
fixed_particles = []
for prot in ["Rpb1", "Rpb2", "Rpb3", "Rpb5", "Rpb6", "Rpb8", "Rpb9", "Rpb10", "Rpb11", "Rpb12"]:
    fixed_particles += IMP.atom.Selection(root_hier, molecule=prot).get_selected_particles()

# Fix the corresponding Rigid Body movers and Super Rigid Body movers using dof.
# The flexible beads will still be flexible (fixed_beads is an empty list)!
fixed_beads, fixed_rbs = dof.disable_movers(fixed_particles,
                                            [IMP.core.RigidBodyMover,
                                             IMP.pmi.TransformMover])
```

Finally, we randomize the initial configuration to remove any bias from the initial starting configuration read from input files. Since each subunit is composed of rigid bodies (i.e., beads constrained in a structure) and flexible beads, the configuration of the system is initialized by randomly displacing each mobile rigid body and each bead by up to 50 Angstroms and randomly rotating them, while keeping them far enough from each other to prevent any steric clashes.

The `excluded_rigid_bodies=fixed_rbs` argument excludes everything that was fixed above from the randomization.

```python
import IMP.pmi.tools
# Randomize the initial configuration before sampling, of only the molecules
# we are interested in (Rpb4 and Rpb7). Further keyword arguments are not shown.
IMP.pmi.tools.shuffle_configuration(root_hier,
                                    excluded_rigid_bodies=fixed_rbs,
                                    max_translation=50)
```

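The randomization step can be sketched in plain Python. Each mobile body is rotated uniformly at random about its centroid, shifted by a random vector of length at most `max_translation`, and redrawn if it comes closer than `min_separation` to a fixed or already placed body. This is a minimal sketch under those assumptions, not the algorithm inside `IMP.pmi.tools.shuffle_configuration`. All names are hypothetical, and clashes are checked between points rather than spheres.

```python
import math
import random
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def random_unit_quaternion(rng: random.Random) -> Tuple[float, float, float, float]:
    """Uniformly distributed rotation (Shoemake's method), returned as (w, x, y, z)."""
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    a, b = math.sqrt(1.0 - u1), math.sqrt(u1)
    return (a * math.sin(2 * math.pi * u2), a * math.cos(2 * math.pi * u2),
            b * math.sin(2 * math.pi * u3), b * math.cos(2 * math.pi * u3))

def rotate(q, v: Vec) -> Vec:
    """Rotate v by unit quaternion q = (w, x, y, z): v + w*t + u x t, with t = 2 u x v."""
    w, u = q[0], (q[1], q[2], q[3])
    t = tuple(2.0 * c for c in cross(u, v))
    ut = cross(u, t)
    return tuple(v[i] + w * t[i] + ut[i] for i in range(3))

def random_vector_in_sphere(rng: random.Random, radius: float) -> Vec:
    """Rejection-sample a point uniformly inside a sphere of the given radius."""
    while True:
        v = tuple(rng.uniform(-radius, radius) for _ in range(3))
        if sum(c * c for c in v) <= radius * radius:
            return v

def shuffle_bodies(bodies: List[List[Vec]], max_translation: float,
                   min_separation: float, fixed_points: Sequence[Vec] = (),
                   seed: int = 0, max_tries: int = 100) -> List[List[Vec]]:
    """Randomly rotate and translate each body, retrying on clashes."""
    rng = random.Random(seed)
    placed = list(fixed_points)   # fixed bodies are never moved, only avoided
    result = []
    for body in bodies:
        centroid = tuple(sum(p[i] for p in body) / len(body) for i in range(3))
        for _ in range(max_tries):
            q = random_unit_quaternion(rng)
            shift = random_vector_in_sphere(rng, max_translation)
            moved = []
            for p in body:
                r = rotate(q, tuple(p[i] - centroid[i] for i in range(3)))
                moved.append(tuple(centroid[i] + r[i] + shift[i] for i in range(3)))
            if all(math.dist(a, b) >= min_separation for a in moved for b in placed):
                break  # clash-free pose found; otherwise the last attempt is kept
        placed.extend(moved)
        result.append(moved)
    return result

bodies = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]]
print(shuffle_bodies(bodies, max_translation=50.0, min_separation=3.0, seed=1))
```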
### Set up Restraints

After defining the representation of the model, we build the restraints by which the individual structural models will be scored based on the input data.

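Conceptually, each restraint contributes a score, and the total score is their sum, with each restraint scaled by its weight. The toy function below uses hypothetical names and represents restraints as plain Python callables. It shows that bookkeeping and returns the per-term breakdown, which helps when checking which data a model violates.

```python
from typing import Callable, Dict, List, Tuple

# (label, weight, score function taking the model state); hypothetical layout
Restraint = Tuple[str, float, Callable[[dict], float]]

def total_score(model: dict, restraints: List[Restraint]) -> Tuple[float, Dict[str, float]]:
    """Return the weighted sum of restraint scores and the per-restraint terms."""
    terms = {label: weight * score(model) for label, weight, score in restraints}
    return sum(terms.values()), terms

toy_restraints = [("connectivity", 1.0, lambda m: m["x"]),
                  ("em", 0.5, lambda m: m["x"] ** 2)]
print(total_score({"x": 2.0}, toy_restraints))
```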
**Connectivity Restraint**

```python
import IMP.pmi.tools
import IMP.pmi.restraints.stereochemistry
outputobjects = []  # restraint objects collected for output
# Connectivity keeps things connected along the backbone (ignores if inside
# the same rigid body)
mols = IMP.pmi.tools.get_molecules(root_hier)
for mol in mols:
    molname = mol.get_name()
    IMP.pmi.tools.display_bonds(mol)
    cr = IMP.pmi.restraints.stereochemistry.ConnectivityRestraint(mol, scale=2.0)
    cr.add_to_model()
    cr.set_label(molname)
    outputobjects.append(cr)
```

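A simplified connectivity term penalizes consecutive beads whose surfaces drift apart, using a harmonic upper bound on the surface-to-surface gap, and skips pairs that lie in the same rigid body, as the comment in the script describes. The force constant `k` and the tuple layout are assumptions for illustration. This sketch does not reproduce PMI's `ConnectivityRestraint` or its `scale` parameter.

```python
import math
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]
Bead = Tuple[Vec, float, Optional[str]]  # (center, radius, rigid-body id or None)

def connectivity_score(chain: List[Bead], k: float = 1.0) -> float:
    """Sum of harmonic upper-bound penalties on the surface-to-surface gap of
    sequence-consecutive beads; pairs inside one rigid body are skipped."""
    score = 0.0
    for (c1, r1, rb1), (c2, r2, rb2) in zip(chain, chain[1:]):
        if rb1 is not None and rb1 == rb2:
            continue  # their distance is already fixed by the rigid body
        gap = math.dist(c1, c2) - (r1 + r2)
        if gap > 0:  # only penalize beads that have drifted apart
            score += 0.5 * k * gap * gap
    return score

chain = [((0.0, 0.0, 0.0), 1.0, "rb1"), ((2.0, 0.0, 0.0), 1.0, "rb1"),
         ((7.0, 0.0, 0.0), 1.0, None)]
print(connectivity_score(chain))
```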
**Excluded Volume Restraint**

```python
import IMP.pmi.restraints.stereochemistry
ev = IMP.pmi.restraints.stereochemistry.ExcludedVolumeSphere(
    included_objects=root_hier,
    resolution=10)
ev.add_to_model()
outputobjects.append(ev)
```

The excluded volume restraint is calculated at resolution 10 (10 residues per bead).

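A minimal excluded-volume sketch penalizes every pair of overlapping spheres with a harmonic term on the overlap depth. It loops over all pairs, which is one reason to evaluate the restraint on coarse beads: fewer beads means far fewer pairs. IMP's own implementation differs (for example, it can restrict the computation to nearby pairs), so treat the function name and force constant `k` as hypothetical.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]

def excluded_volume_score(spheres: List[Tuple[Vec, float]], k: float = 1.0) -> float:
    """Harmonic penalty on the overlap depth of every pair of spheres."""
    score = 0.0
    for i in range(len(spheres)):
        c1, r1 = spheres[i]
        for j in range(i + 1, len(spheres)):
            c2, r2 = spheres[j]
            overlap = (r1 + r2) - math.dist(c1, c2)
            if overlap > 0:
                score += 0.5 * k * overlap * overlap
    return score

print(excluded_volume_score([((0.0, 0.0, 0.0), 1.0), ((1.0, 0.0, 0.0), 1.0)]))  # 0.5
```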
**Crosslinking Restraint**

A crosslinking restraint is implemented as a distance restraint between two residues. The two residues are each defined by the protein (component) name and the residue number. The script here extracts the correct four columns that provide this information from the [input data file](@ref rnapolii_1).

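```python
import IMP.pmi.io.crosslink
import IMP.pmi.restraints.crosslinking

# Map the CSV column names onto protein names and residue numbers
xldbkwc = IMP.pmi.io.crosslink.CrossLinkDataBaseKeywordsConverter()
xldbkwc.set_protein1_key("pep1.accession")
xldbkwc.set_protein2_key("pep2.accession")
xldbkwc.set_residue1_key("pep1.xlinked_aa")
xldbkwc.set_residue2_key("pep2.xlinked_aa")

xl1db = IMP.pmi.io.crosslink.CrossLinkDataBase(xldbkwc)
xl1db.create_set_from_file(datadirectory + 'polii_xlinks.csv')

# Further arguments (length, slope, resolution, label) are not shown here
xl1 = IMP.pmi.restraints.crosslinking.CrossLinkingMassSpectrometryRestraint(
    root_hier=root_hier,
    CrossLinkDataBase=xl1db)
xl1.add_to_model()
outputobjects.append(xl1)
```
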
An object `xl1` for this crosslinking restraint is created and then added to the model. Its main parameters are:
* `length`: The maximum length of the crosslink
* `slope`: Slope of a linear energy term added to the sigmoidal restraint
* `resolution`: The resolution at which the restraint is evaluated (1 = residue level)
* `label`: A label for this set of crosslinks, helpful to identify them later in the stat file

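The roles of `length` and `slope` can be illustrated with a toy distance score. A smooth logistic-style penalty stays near zero below `length` and grows past it, and a linear term `slope * d` keeps a weak pull on residues that are far apart. This is not PMI's scoring function, which is based on a Bayesian model with its own parameters; the `width` parameter and the functional form are assumptions made only for this illustration.

```python
import math

def crosslink_energy(d: float, length: float, slope: float, width: float = 1.0) -> float:
    """Toy crosslink energy for a residue-residue distance d (Angstroms).

    softplus((d - length) / width) is -log of a logistic "satisfied" probability:
    close to 0 well below `length`, roughly linear well above it.
    """
    z = (d - length) / width
    penalty = math.log1p(math.exp(z)) if z < 30 else z  # avoid overflow for large z
    return penalty + slope * d  # linear term still pulls far-apart residues together

for d in (10.0, 20.0, 30.0):
    print(d, crosslink_energy(d, length=20.0, slope=0.01))
```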
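**EM Restraint**

```python
import IMP.pmi.restraints.em
em_components = IMP.pmi.tools.get_densities(root_hier)
# The slope and weight arguments described below are not shown in this excerpt
gemt = IMP.pmi.restraints.em.GaussianEMRestraint(em_components, target_gmm_file,
                                                 scale_target_to_mass=True)
gemt.add_to_model()
outputobjects.append(gemt)
```
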
The GaussianEMRestraint uses a density overlap function to compare the model to the data. First, the EM map is approximated with a Gaussian mixture model (done separately). Second, the components of the model are represented with Gaussians (forming the model GMM).
* `scale_target_to_mass`: ensures the total masses of model and map are identical
* `slope`: nudges the model closer to the map when it is far away
* `weight`: a heuristic, needed to calibrate the EM restraint with the other terms

The restraint is then added to the model and to the output objects.

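The overlap idea can be sketched with the closed-form overlap integral of two normalized isotropic 3D Gaussians, summed over all pairs of weighted components from the model and data mixtures. The normalized overlap below (1.0 for identical mixtures) is an illustrative choice, not the exact score IMP computes. The isotropic Gaussians and the function names are assumptions.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float, float]
Gauss = Tuple[float, Vec, float]  # (weight, center, isotropic sigma)

def pair_overlap(a: Gauss, b: Gauss) -> float:
    """Integral of the product of two weighted, normalized isotropic 3D Gaussians:
    wa * wb * (2*pi*(sa^2 + sb^2))^(-3/2) * exp(-|ca - cb|^2 / (2*(sa^2 + sb^2)))."""
    (wa, ca, sa), (wb, cb, sb) = a, b
    var = sa * sa + sb * sb
    d2 = sum((ca[i] - cb[i]) ** 2 for i in range(3))
    return wa * wb * (2 * math.pi * var) ** -1.5 * math.exp(-d2 / (2 * var))

def gmm_overlap(A: List[Gauss], B: List[Gauss]) -> float:
    """Overlap of two mixtures: sum over all component pairs."""
    return sum(pair_overlap(a, b) for a in A for b in B)

def normalized_overlap(model: List[Gauss], data: List[Gauss]) -> float:
    """1.0 for identical mixtures, approaching 0 as they stop overlapping."""
    return gmm_overlap(model, data) / math.sqrt(gmm_overlap(model, model) *
                                                 gmm_overlap(data, data))

data_gmm = [(1.0, (0.0, 0.0, 0.0), 2.0)]
model_gmm = [(1.0, (5.0, 0.0, 0.0), 2.0)]
print(normalized_overlap(model_gmm, data_gmm))
```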
Completion of these steps sets the energy function.
The next step is \ref rnapolii_3.