IMP Manual  for IMP version 2.10.1
Stage 2 - Representation of subunits and translation of the data into spatial restraints {#rnapolii_2}
========================================================================================

In this stage, we first define a representation of the system and then convert the data into spatial restraints. Both steps are performed by the script `rnapolii/modeling/modeling.py`, which uses the
[topology file](@ref IMP::pmi::topology::TopologyReader),
`topology.txt`, to define the system components and their representation
parameters.

### Setting up Model Representation in IMP

**Representation**
Very generally, the *representation* of a system is defined by all the variables that need to be determined based on input information, including the assignment of the system components to geometric objects (e.g. points, spheres, ellipsoids, and 3D Gaussian density functions).

Our RNA Pol II representation employs *spherical beads* of varying sizes and *3D Gaussians*, which coarsen domains of the complex using several resolution scales simultaneously. The *spatial restraints* will be applied to individual resolution scales as appropriate.

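To make the idea of resolution scales concrete, the following minimal sketch groups a residue range into beads of a fixed size, much as a coarse-grained representation assigns several residues to one sphere. The function `residues_to_beads` and the 45-residue domain are hypothetical and are not part of the IMP API; PMI builds its beads itself from the topology file.

```python
def residues_to_beads(first, last, residues_per_bead):
    """Split the inclusive residue range [first, last] into consecutive beads.

    Each bead is returned as an inclusive (start, end) residue range.
    """
    if residues_per_bead < 1:
        raise ValueError("residues_per_bead must be at least 1")
    beads = []
    start = first
    while start <= last:
        end = min(start + residues_per_bead - 1, last)
        beads.append((start, end))
        start = end + 1
    return beads


# Hypothetical 45-residue domain represented at two resolution scales.
fine = residues_to_beads(1, 45, 1)     # one bead per residue
coarse = residues_to_beads(1, 45, 10)  # up to 10 residues per bead
print(len(fine), len(coarse))
print(coarse[-1])
```

The last coarse bead is shorter than the others because the domain length is not a multiple of the bead size.
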
Beads and Gaussians of a given domain are arranged into either a rigid body or a flexible string, based on the crystallographic structures. In a *rigid body*, all the beads and the Gaussians of a given domain have their relative distances constrained during configurational sampling, while in a *flexible string* the beads and the Gaussians are restrained by the sequence connectivity.

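The defining property of a rigid body, that relative distances stay fixed while the body as a whole moves, can be checked with a small sketch in plain Python. The bead coordinates and the helper names are made up for illustration; IMP handles rigid bodies internally and does not use this code.

```python
import math


def rigid_transform(points, angle, translation):
    """Rotate points about the z axis by `angle` (radians), then translate."""
    c, s = math.cos(angle), math.sin(angle)
    tx, ty, tz = translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz)
            for x, y, z in points]


def pairwise_distances(points):
    """Distances between all unordered pairs of points."""
    return [math.dist(p, q)
            for i, p in enumerate(points) for q in points[i + 1:]]


# Three hypothetical beads that belong to one rigid body.
beads = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 1.0)]
moved = rigid_transform(beads, angle=0.3, translation=(4.0, -2.0, 1.5))

before = pairwise_distances(beads)
after = pairwise_distances(moved)
print(all(math.isclose(a, b) for a, b in zip(before, after)))
```
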
<img src="rnapolii_Multi-scale_representation.png" width="600px" />
_Multi-scale representation of the Rpb1 subunit of RNA Pol II_

The Gaussian mixture model (GMM) of a subunit is the set of all 3D Gaussians used to represent it; it is used to calculate the EM score. The GMM of a subunit can be calculated automatically from the
[topology file](@ref IMP::pmi::topology::TopologyReader).
For the purposes of this tutorial, we have already created these for Rpb4 and Rpb7 and placed them in the `rnapolii/data` directory in their respective `.mrc` and `.txt` files.

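As a rough illustration of what a GMM encodes, the sketch below evaluates the density of a mixture of 3D Gaussians at a point. It assumes isotropic Gaussians (one width per component) to stay short, whereas real GMMs generally use anisotropic Gaussians; the component values are invented and the functions are not IMP calls.

```python
import math


def gaussian_density(point, center, sigma):
    """Isotropic, normalized 3D Gaussian evaluated at `point`."""
    r2 = sum((p - c) ** 2 for p, c in zip(point, center))
    norm = (2.0 * math.pi * sigma ** 2) ** -1.5
    return norm * math.exp(-r2 / (2.0 * sigma ** 2))


def gmm_density(point, components):
    """Weighted sum of Gaussians; components are (weight, center, sigma)."""
    return sum(w * gaussian_density(point, c, s) for w, c, s in components)


# Hypothetical two-component GMM for a small domain (lengths in Angstroms).
gmm = [(0.6, (0.0, 0.0, 0.0), 5.0),
       (0.4, (12.0, 0.0, 0.0), 4.0)]
near = gmm_density((1.0, 0.0, 0.0), gmm)
far = gmm_density((60.0, 0.0, 0.0), gmm)
print(near > far)
```
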
**Dissecting the script**
The script `rnapolii/modeling/modeling.py` sets up the representation of the system and the restraint. (Subsequently it also performs [sampling](@ref rnapolii_3), but more on that later.)

**Header**
The first part of the script defines the files used in model building and restraint generation.

\code{.py}
#---------------------------
# Define Input Files
#---------------------------
datadirectory = "../data/"
topology_file = datadirectory+"topology.txt"
target_gmm_file = datadirectory+'emd_1883.map.mrc.gmm.50.txt'
\endcode

The first section defines where input files are located. The
[topology file](https://github.com/salilab/imp_tutorial/blob/pmi2/rnapolii/data/topology.txt)
defines how the system components are structurally represented. `target_gmm_file` stores the EM map for the entire complex, which has already been converted into a Gaussian mixture model.

**Build the Model Representation Using a Topology File**
Using the topology file we define the overall topology: we introduce the
molecules with their sequence and their known structure, and define the movers.
Each line in the file is a user-defined molecular **Domain**, and each column
contains the specifics needed to build the system.
See the [TopologyReader](@ref IMP::pmi::topology::TopologyReader) documentation
for a full description of the topology file format.

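The "one domain per line, one property per column" layout can be pictured with a small stand-alone parser. This is only a sketch: it assumes a pipe-delimited table with a header row, and the column names and values in `EXAMPLE` are chosen for illustration and may not match the real file. The authoritative format is the one described in the TopologyReader documentation, and `TopologyReader` itself should be used to read real topology files.

```python
def parse_pipe_table(text):
    """Parse a pipe-delimited table whose first non-comment row is the header.

    Returns one dict per data row, keyed by the header names.
    """
    header = None
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Drop the empty strings before the first and after the last pipe,
        # but keep empty fields in between.
        fields = [f.strip() for f in line.split("|")[1:-1]]
        if header is None:
            header = fields
        else:
            rows.append(dict(zip(header, fields)))
    return rows


# Hypothetical excerpt; column names and values are assumptions.
EXAMPLE = """
|molecule_name|pdb_fn|chain|residue_range|bead_size|
|Rpb4|example.pdb|D|1,221|20|
|Rpb7|example.pdb|G|1,171|20|
"""

domains = parse_pipe_table(EXAMPLE)
for domain in domains:
    print(domain["molecule_name"], domain["residue_range"])
```
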
\code{.py}
# Initialize model
m = IMP.Model()

# Read in the topology file.
# Specify the directory where the PDB files, FASTA files and GMM files are
topology = IMP.pmi.topology.TopologyReader(topology_file,
                                           pdb_dir=datadirectory,
                                           fasta_dir=datadirectory,
                                           gmm_dir=datadirectory)

# Use the BuildSystem macro to build states from the topology file
bs = IMP.pmi.macros.BuildSystem(m)

# Each state can be specified by a topology file.
bs.add_state(topology)
\endcode

**Building the System Representation and Degrees of Freedom**

Here we can set the **Degrees of Freedom** parameters, which should be
optimized according to Monte Carlo (MC) acceptance ratios. There are three
kinds of movers: Rigid Body, Bead, and Super Rigid Body (super rigid bodies
are sets of rigid bodies and beads that will move together in an additional
Monte Carlo move).

`max_rb_trans` and `max_rb_rot` are the maximum translation and rotation
of the Rigid Body mover, `max_srb_trans` and `max_srb_rot` are the maximum
translation and rotation of the Super Rigid Body mover, and `max_bead_trans`
is the maximum translation of the Bead mover.

Executing the macro returns the root hierarchy (`root_hier`)
and the degrees of freedom (`dof`) objects, both of which are used later on.

\code{.py}
root_hier, dof = bs.execute_macro(max_rb_trans=4.0,
                                  max_rb_rot=0.3,
                                  max_bead_trans=4.0,
                                  max_srb_trans=4.0,
                                  max_srb_rot=0.3)
\endcode

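The link between a maximum move size and the Monte Carlo acceptance ratio can be illustrated with a toy Metropolis sampler. The sketch below samples a one-dimensional harmonic well in plain Python; the energy function, the temperature, and the name `acceptance_ratio` are assumptions made for illustration and have nothing to do with the actual PMI movers, which act on rigid bodies and beads in 3D.

```python
import math
import random


def acceptance_ratio(max_trans, n_steps=5000, kt=1.0, seed=0):
    """Metropolis sampling of a 1D harmonic well E(x) = x**2 / 2.

    Each trial move shifts x by a uniform random amount in
    [-max_trans, max_trans]; the fraction of accepted moves is returned.
    """
    rng = random.Random(seed)
    x = 0.0
    energy = 0.5 * x * x
    accepted = 0
    for _ in range(n_steps):
        trial = x + rng.uniform(-max_trans, max_trans)
        trial_energy = 0.5 * trial * trial
        delta = trial_energy - energy
        # Always accept downhill moves; accept uphill moves with
        # probability exp(-delta / kT).
        if delta <= 0.0 or rng.random() < math.exp(-delta / kt):
            x, energy = trial, trial_energy
            accepted += 1
    return accepted / n_steps


for max_trans in (0.1, 1.0, 10.0):
    print(max_trans, round(acceptance_ratio(max_trans), 2))
```

Small moves are almost always accepted but explore slowly, while large moves are mostly rejected, which is why the maximum translations and rotations are tuned against the observed acceptance ratios.
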
Since we're interested in modeling the stalk, we will fix all subunits
except Rpb4 and Rpb7. Note that we are using `IMP.atom.Selection` to get the
particles that correspond to the fixed Molecules.

\code{.py}
# Fix all rigid bodies except those of Rpb4 and Rpb7 (the stalk)
# First select and gather all particles to fix.
fixed_particles=[]
for prot in ["Rpb1","Rpb2","Rpb3","Rpb5","Rpb6","Rpb8","Rpb9","Rpb10","Rpb11","Rpb12"]:
    fixed_particles+=IMP.atom.Selection(root_hier,molecule=prot).get_selected_particles()

# Fix the corresponding Rigid Body movers and Super Rigid Body movers using dof
# The flexible beads will still be flexible (fixed_beads is an empty list)!
fixed_beads,fixed_rbs=dof.disable_movers(fixed_particles,
                                         [IMP.core.RigidBodyMover,
                                          IMP.pmi.TransformMover])
\endcode

Finally, we randomize the initial configuration to remove any bias from the
starting configuration read from the input files. Since each subunit is
composed of rigid bodies (i.e., beads constrained in a structure) and flexible
beads, the configuration of the system is initialized by translating each
mobile rigid body and each bead randomly by up to 50 Angstroms and rotating it
randomly, while keeping the components far enough from each other to prevent
steric clashes.

The `excluded_rigid_bodies=fixed_rbs` argument excludes from the randomization
everything that was fixed above.

\code{.py}
# Randomize the initial configuration before sampling, of only the molecules
# we are interested in (Rpb4 and Rpb7)
IMP.pmi.tools.shuffle_configuration(root_hier,
                                    excluded_rigid_bodies=fixed_rbs,
                                    max_translation=50,
                                    verbose=False,
                                    cutoff=5.0,
                                    niterations=100)
\endcode

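The randomization step can be pictured as "move each component randomly, and retry if it lands too close to something already placed". The following sketch implements that idea for bare points in plain Python. It is a simplified stand-in, not the PMI algorithm: it ignores rotations and rigid bodies, and the way it uses `max_translation`, `cutoff` and `niterations` is an assumption modeled on the parameter names above.

```python
import math
import random


def random_shift(rng, max_translation):
    """Uniform random vector inside a sphere of radius max_translation."""
    while True:
        v = [rng.uniform(-max_translation, max_translation) for _ in range(3)]
        if math.hypot(*v) <= max_translation:
            return v


def shuffle_points(points, max_translation, cutoff, niterations, seed=0):
    """Displace each point randomly, retrying up to `niterations` times when
    it lands within `cutoff` of an already placed point."""
    rng = random.Random(seed)
    placed = []
    for p in points:
        for _ in range(niterations):
            shift = random_shift(rng, max_translation)
            q = tuple(a + b for a, b in zip(p, shift))
            if all(math.dist(q, other) >= cutoff for other in placed):
                placed.append(q)
                break
        else:
            raise RuntimeError("no clash-free placement found")
    return placed


# Ten hypothetical beads that all start at the origin.
start = [(0.0, 0.0, 0.0)] * 10
shuffled = shuffle_points(start, max_translation=50.0, cutoff=5.0,
                          niterations=100)
print(len(shuffled))
```
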
### Set up Restraints

After defining the representation of the model, we build the restraints by which the individual structural models will be scored based on the input data.

**Connectivity Restraint**
\code{.py}
# Connectivity keeps things connected along the backbone (it is ignored for
# residues inside the same rigid body)
# Note: outputobjects is assumed to be a list created earlier in the script;
# it collects the objects whose scores are reported in the output.
mols = IMP.pmi.tools.get_molecules(root_hier)
for mol in mols:
    molname=mol.get_name()
    IMP.pmi.tools.display_bonds(mol)
    cr = IMP.pmi.restraints.stereochemistry.ConnectivityRestraint(mol,scale=2.0)
    cr.add_to_model()
    cr.set_label(molname)
    outputobjects.append(cr)
\endcode

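One common way to keep consecutive beads connected is a harmonic upper bound on the gap between their surfaces: no penalty while the beads are close enough, and a quadratic penalty once they drift apart. The sketch below shows that functional form in plain Python. It assumes this general shape only; the force constant, the allowed gap and the function names are illustrative and are not taken from the PMI implementation.

```python
import math


def harmonic_upper_bound(distance, max_distance, k=1.0):
    """Zero up to max_distance, then a harmonic penalty beyond it."""
    excess = max(0.0, distance - max_distance)
    return 0.5 * k * excess ** 2


def connectivity_score(beads, max_gap, k=1.0):
    """Sum the penalty over consecutive beads of one chain.

    beads: list of (center, radius); the gap is the surface-to-surface
    distance between neighbouring beads.
    """
    score = 0.0
    for (c1, r1), (c2, r2) in zip(beads, beads[1:]):
        gap = math.dist(c1, c2) - r1 - r2
        score += harmonic_upper_bound(gap, max_gap, k)
    return score


# Hypothetical chain: the first two beads are close, the third has drifted away.
chain = [((0.0, 0.0, 0.0), 2.0),
         ((5.0, 0.0, 0.0), 2.0),
         ((20.0, 0.0, 0.0), 2.0)]
print(connectivity_score(chain, max_gap=3.6))
```

Only the second pair contributes here, because the gap between the first two beads is below the allowed maximum.
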
**Excluded Volume Restraint**
\code{.py}
ev = IMP.pmi.restraints.stereochemistry.ExcludedVolumeSphere(
                                         included_objects=root_hier,
                                         resolution=10)
ev.add_to_model()
outputobjects.append(ev)
\endcode

The excluded volume restraint is calculated at resolution 10 (10 residues per bead).

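An excluded volume term penalizes beads that overlap. A minimal sketch of such a term, assuming a simple harmonic penalty on the overlap depth of each pair of spheres, is given below. The functional form, the force constant and the brute-force loop over all pairs are assumptions for illustration; the actual restraint may score overlaps differently and uses more efficient pair detection.

```python
import math
from itertools import combinations


def excluded_volume_score(beads, k=1.0):
    """Sum of harmonic penalties over all overlapping sphere pairs.

    beads: list of (center, radius).
    """
    score = 0.0
    for (c1, r1), (c2, r2) in combinations(beads, 2):
        overlap = r1 + r2 - math.dist(c1, c2)
        if overlap > 0.0:
            score += 0.5 * k * overlap ** 2
    return score


# Hypothetical beads: the first two overlap by 2 Angstroms, the third is far away.
beads = [((0.0, 0.0, 0.0), 3.0),
         ((4.0, 0.0, 0.0), 3.0),
         ((30.0, 0.0, 0.0), 3.0)]
print(excluded_volume_score(beads))
```
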
**Crosslinks**

A crosslinking restraint is implemented as a distance restraint between two residues. The two residues are each defined by the protein (component) name and the residue number. The script here extracts the correct four columns that provide this information from the [input data file](@ref rnapolii_1).

\code{.py}
xldbkwc = IMP.pmi.io.crosslink.CrossLinkDataBaseKeywordsConverter()
xldbkwc.set_protein1_key("pep1.accession")
xldbkwc.set_protein2_key("pep2.accession")
xldbkwc.set_residue1_key("pep1.xlinked_aa")
xldbkwc.set_residue2_key("pep2.xlinked_aa")

xl1db = IMP.pmi.io.crosslink.CrossLinkDataBase(xldbkwc)
xl1db.create_set_from_file(datadirectory+'polii_xlinks.csv')

xl1 = IMP.pmi.restraints.crosslinking.CrossLinkingMassSpectrometryRestraint(
                                   root_hier=root_hier,
                                   CrossLinkDataBase=xl1db,
                                   length=21.0,
                                   slope=0.01,
                                   resolution=1.0,
                                   label="Trnka",
                                   weight=1.)

xl1.add_to_model()
\endcode

An object `xl1` for this crosslinking restraint is created and then added to the model.
* `length`: The maximum length of the crosslink
* `slope`: Slope of linear energy function added to sigmoidal restraint
* `resolution`: The resolution at which the restraint is evaluated. 1 = residue level
* `label`: A label for this set of crosslinks, helpful to identify them later in the stat file

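The column mapping and the role of `length` can be sketched without IMP, using only the standard `csv` module. In the example below, only the four column names come from the script above; the data rows, the coordinates, and the helper functions are invented for illustration. The hard distance cutoff is also a deliberate simplification of the sigmoidal restraint described above.

```python
import csv
import io
import math

# Hypothetical rows; only the four column names come from the script above.
CSV_TEXT = """pep1.accession,pep1.xlinked_aa,pep2.accession,pep2.xlinked_aa
Rpb4,10,Rpb7,25
Rpb1,100,Rpb2,200
"""


def read_crosslinks(handle):
    """Return ((protein1, residue1), (protein2, residue2)) pairs."""
    links = []
    for row in csv.DictReader(handle):
        links.append(((row["pep1.accession"], int(row["pep1.xlinked_aa"])),
                      (row["pep2.accession"], int(row["pep2.xlinked_aa"]))))
    return links


def is_satisfied(link, coords, length=21.0):
    """True if the two crosslinked residues lie within `length` Angstroms."""
    a, b = link
    return math.dist(coords[a], coords[b]) <= length


links = read_crosslinks(io.StringIO(CSV_TEXT))
# Made-up residue coordinates, in Angstroms.
coords = {("Rpb4", 10): (0.0, 0.0, 0.0), ("Rpb7", 25): (15.0, 0.0, 0.0),
          ("Rpb1", 100): (0.0, 0.0, 0.0), ("Rpb2", 200): (40.0, 0.0, 0.0)}
print([is_satisfied(link, coords) for link in links])
```
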
**EM Restraint**

\code{.py}
em_components = IMP.pmi.tools.get_densities(root_hier)

gemt = IMP.pmi.restraints.em.GaussianEMRestraint(em_components,
                                                 target_gmm_file,
                                                 scale_target_to_mass=True,
                                                 slope=0.000001,
                                                 weight=80.0)
gemt.add_to_model()
outputobjects.append(gemt)
\endcode

The GaussianEMRestraint uses a density overlap function to compare the model to the data. First, the EM map is approximated with a Gaussian mixture model (done separately). Second, the components of the model are represented with Gaussians (forming the model GMM). The restraint takes the following parameters:
* `scale_target_to_mass`: ensures the total mass of model and map are identical
* `slope`: nudges the model closer to the map when far away
* `weight`: heuristic, needed to calibrate the EM restraint with the other terms

The restraint is then added to the model and appended to `outputobjects`.

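The notion of a density overlap between two GMMs can be made concrete with the closed-form overlap integral of two Gaussians. The sketch below assumes isotropic, normalized Gaussians and simply sums the weighted pairwise overlaps; the component values are invented, and the real restraint may normalize and combine these terms differently.

```python
import math


def gaussian_overlap(c1, s1, c2, s2):
    """Overlap integral of two normalized isotropic 3D Gaussians.

    The product of two Gaussians integrates to a Gaussian in the distance
    between their centers, with variance s1**2 + s2**2.
    """
    var = s1 ** 2 + s2 ** 2
    r2 = sum((a - b) ** 2 for a, b in zip(c1, c2))
    return (2.0 * math.pi * var) ** -1.5 * math.exp(-r2 / (2.0 * var))


def gmm_overlap(model, target):
    """Sum of weighted pairwise overlaps; components are (weight, center, sigma)."""
    return sum(wm * wt * gaussian_overlap(cm, sm, ct, st)
               for wm, cm, sm in model
               for wt, ct, st in target)


# Hypothetical target (map) GMM and two candidate placements of a model GMM.
target = [(1.0, (0.0, 0.0, 0.0), 6.0), (1.0, (15.0, 0.0, 0.0), 6.0)]
fitted = [(1.0, (1.0, 0.0, 0.0), 5.0)]
displaced = [(1.0, (40.0, 0.0, 0.0), 5.0)]
print(gmm_overlap(fitted, target) > gmm_overlap(displaced, target))
```

A model placed inside the map density overlaps it far more than one placed outside, so a score based on this overlap rewards good fits.
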
---

Completion of these steps sets the energy function.
The next step is \ref rnapolii_3.