IMP Manual for IMP version 2.16.0

Applying %IMP to a new biological system {#biosystem}
========================================

We have already applied %IMP to solve the structures of many novel biological
systems, listed on the [biological systems page](https://integrativemodeling.org/systems/).
Each system on that page includes all of the files needed to reproduce the
results in the accompanying publication. For example, the list includes the
[modeling example from earlier in this manual](@ref rnapolii_stalk), as well
as [modeling of the Nup84 subcomplex of the Nuclear Pore Complex](https://salilab.org/nup84).
Each system is periodically rerun with the latest version of %IMP
to make sure that it still works correctly.

To apply %IMP to a new biological system, you are welcome to use one of the
existing systems, such as the [Nup84 model](https://salilab.org/nup84),
as your template, or you can write your own protocol from scratch using the
basic %IMP classes and/or the IMP::pmi higher-level interface. In either case,
we strongly recommend that you manage your application as a GitHub repository
so that
 - others can reproduce your published work
 - changes to the protocol can be documented or rolled back if necessary
 - your system can be added to [our list](https://integrativemodeling.org/systems/),
   so that we can test newer versions of %IMP to make sure we don't break something

We recommend the following contents for your repository (see the
[Nup84 repository](https://github.com/integrativemodeling/nup84)
for an example, or
[this journal article](https://doi.org/10.1002/1873-3468.14067) for more
general recommendations):

 - subdirectories containing
   - your modeling protocol (generally one or more Python scripts).
   - input files (e.g. PDB files, EM density maps, lists of crosslinks),
     especially if these files aren't in a database somewhere already.
     If these inputs are derived in some fashion (e.g. you use a PDB file as
     input that's a comparative model or docking result, or you use an EM map
     that's been segmented) then this needs to be described somewhere, with
     links to the original unmodified files (e.g. PDB IDs for templates of any
     comparative models, alignment files, Modeller scripts).
   - outputs (trajectories, clusters, analysis). Where including the outputs
     isn't possible due to size, we can host the larger files, such as
     trajectories, elsewhere (e.g. as a dataset in
     [Zenodo](https://zenodo.org)) and link to them from the repository.
     Aim to keep the repository below 1 GB in size so that it's manageable.
 - a top-level `%README.md` file describing the system and explaining how to
   run the protocol.
 - a top-level `LICENSE` file with the license for the data files and scripts.
   This doesn't need to be the same license (LGPL/GPL) that %IMP uses; in fact,
   for data files one of the [Creative Commons](https://creativecommons.org/)
   licenses probably makes more sense. We recommend the
   [CC BY-SA license](https://creativecommons.org/licenses/by-sa/4.0/),
   which allows anybody to use and modify the data under the same terms, as
   long as they cite the original work.
 - a `test` directory containing one or more Python scripts with names starting
   with `test`. It should be possible to run these scripts without any
   "special" setup (e.g. they should not require any input arguments or
   environment variables, or use hard-coded paths). These scripts should run
   as much of your modeling protocol as possible, and ideally test the results
   (e.g. by comparing models against 'known good' clusters). Each script
   should simply exit with a non-zero exit code (e.g. by raising an exception)
   if something fails; one easy way to do this nicely is to use Python's
   [unittest](https://docs.python.org/3/library/unittest.html) module. The
   tests should run in a "reasonable" amount of time (no more than 48 hours)
   on a single processor. If this is not enough time to run your entire
   protocol, run only a representative subset
   (e.g. the Nup84 modeling test passes a `--test` option to the modeling
   script, which has it perform fewer iterations of sampling).
 - to add your system to [our list](https://integrativemodeling.org/systems/),
   it will also need a `metadata` subdirectory (please also
   [contact us](https://integrativemodeling.org/contact.html) to let us know
   about it).
   This subdirectory should contain two files:
   - `thumb.png`: a small image used to represent your system on the page.
   - `metadata.yaml`: a file in [YAML](http://yaml.org/) format specifying
     (see also the [Nup84 example](https://github.com/integrativemodeling/nup84/blob/main/metadata/metadata.yaml)):
     - `title`: a short descriptive name for your system
     - `tags`: a list of tags to group your system with others that use
       similar methods or input data
     - `pmid`: the PubMed ID of the accompanying publication
     - `prereqs`: a list of any non-standard packages that are needed
       (in addition to %IMP and Python's standard library) to run the scripts
     - `runtime`: upper limit to the time the tests will take to run
     - `build`: which type of %IMP build to run the tests with
       (`release`, `fast` or `debug`); `release` is generally recommended
     - `parallel`: if set, the tests will be run in an MPI environment, with
       the given number of cores available (by default, a serial environment
       is used)
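
As an illustration of the `test` directory described in the list above, a
minimal test script might look something like the following sketch. It is not
taken from any existing system: the script location `modeling/modeling.py`,
the output file `output/stat.0.out`, and the class and method names are all
assumptions made for the example. Only the `--test` option comes from the
Nup84 description above.

```python
import os
import subprocess
import sys
import tempfile
import unittest


class ModelingTests(unittest.TestCase):
    # Assumption: this file is <repository>/test/test_modeling.py and the
    # protocol is <repository>/modeling/modeling.py. Set `script` to point
    # the test at a different file.
    script = None

    # Assumption: a successful run writes this file, relative to the
    # directory the script is run in.
    expected_output = os.path.join('output', 'stat.0.out')

    def get_script(self):
        """Return the absolute path of the modeling script to run."""
        if self.script is not None:
            return os.path.abspath(self.script)
        # Locate the script relative to this file, not via a hard-coded path
        here = os.path.dirname(os.path.abspath(__file__))
        return os.path.abspath(
            os.path.join(here, '..', 'modeling', 'modeling.py'))

    def test_short_run(self):
        """A shortened modeling run should succeed and produce output"""
        with tempfile.TemporaryDirectory() as scratch:
            # check_call raises CalledProcessError on a non-zero exit code,
            # which unittest reports as a test error
            subprocess.check_call(
                [sys.executable, self.get_script(), '--test'], cwd=scratch)
            self.assertTrue(
                os.path.exists(os.path.join(scratch, self.expected_output)))


# When saved as a script, end the file with the standard unittest entry
# point, which exits with a non-zero code if any test fails:
#
#     if __name__ == '__main__':
#         unittest.main()
```

Running the protocol in a scratch directory assumes that the modeling script
finds its input files relative to its own location rather than the current
directory; if that is not the case, run it from the directory it expects.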

Publication or deposition generally requires a [DOI](https://www.doi.org/).
We generally obtain one by uploading a snapshot of the GitHub repository to
[Zenodo](https://zenodo.org), alongside other input/output datasets that
aren't deposited in a specialist repository such as
[PDB](https://www.wwpdb.org/) or [EMDB](https://www.ebi.ac.uk/emdb/).
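
Before uploading a snapshot, it may help to confirm that the repository has
the contents recommended earlier on this page. The following is only a sketch
of such a check. `check_repository`, its messages, and the constant names are
hypothetical and not part of %IMP; the file names and the 1 GB guideline are
the ones given above.

```python
from pathlib import Path

# File names taken from the recommendations on this page
TOP_LEVEL_FILES = ("README.md", "LICENSE")
METADATA_FILES = ("thumb.png", "metadata.yaml")
SIZE_LIMIT = 10**9  # the 1 GB guideline, in bytes


def check_repository(top, for_listing=False):
    """Return a list of problems found in the repository rooted at `top`.

    If `for_listing` is true, also require the `metadata` files that are
    needed to add the system to the list of biological systems.
    """
    top = Path(top)
    problems = []
    for name in TOP_LEVEL_FILES:
        if not (top / name).is_file():
            problems.append("missing top-level file: " + name)

    test_dir = top / "test"
    if not test_dir.is_dir():
        problems.append("missing test directory")
    elif not any(test_dir.glob("test*.py")):
        problems.append("no scripts named test*.py in the test directory")

    if for_listing:
        for name in METADATA_FILES:
            if not (top / "metadata" / name).is_file():
                problems.append("missing metadata file: " + name)

    # Total size of every file under `top`, including version control data
    total = sum(p.stat().st_size for p in top.rglob("*") if p.is_file())
    if total > SIZE_LIMIT:
        problems.append("repository is larger than 1 GB")
    return problems
```

An empty list means that nothing was found to be missing; for example,
`check_repository(".", for_listing=True)` checks the current directory,
including the `metadata` files.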