E-Cell 4 Core Design

Authors: Koichi Takahashi
Date: 01/18/2007

This document is a work in progress.

Introduction

This document describes a proposed design for the new E-Cell simulator core (Version 4), with an emphasis on architectural and other important changes from the previous version (E-Cell 3). Frontend architecture is out of scope. However, functionality that was implemented as part of the core in the previous version but is proposed to move to the frontend is discussed.

Features

Takahashi identified one possible set of requisite features that simulators need in order to be truly useful for integrated modeling, simulation and analysis of highly complex, large-scale systems such as biological cells.

  1. Multi-algorithm simulation
  2. Multi-timescale simulation
  3. Object-oriented modeling
  4. Object-oriented simulation
  5. Runtime user interaction
  6. Dynamic model structure
  7. Spatial modeling and simulation

Requirements (1), (3), (4) and (5) were addressed in E-Cell 1. E-Cell 3 extended the multi-algorithm capability of E-Cell 1 and added a multi-timescale capability (2). Detailed discussion of the rationale and some possible approaches to the identified requirements can be found in section 2.4 of Takahashi's PhD thesis (2004) and in Takahashi et al. (2002).

In E-Cell 4 we would like to tackle the remaining two requirements, (6) Dynamic model structure and (7) Spatial modeling and simulation.

Dynamic model structure

The ability to represent a dynamically changing model structure is crucially important for cell simulators. Dynamic model structure here means the ability of a simulator to (1) run simulation models that program the conditions and manner of creation, deletion and changes in relations and connections of objects during simulation, and (2) properly handle any creation, deletion and changes in relations and connections of objects made by the user at any time during simulation, without destroying the model's consistency.

This feature will be useful in cases such as when

  1. the model needs to represent dynamic changes in the structure of reaction pathways in biochemical modeling (such as cell division).
  2. the model needs to represent dynamic creation and deletion of model components such as vesicles or cells.
  3. the model dynamically creates pathways because enumerating every possible reaction channel and type of complex is impractical (e.g. Molecular technology).
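
As a rough illustration of what "without destroying the model's consistency" could mean in practice, the following Python sketch removes every connection that touches an object when that object is deleted. The Model class and its create, connect and delete methods are hypothetical names invented for this sketch; they are not part of any E-Cell API.

```python
class Model:
    """Hypothetical container that keeps objects and connections consistent."""

    def __init__(self):
        self.objects = {}          # object ID -> arbitrary payload
        self.connections = set()   # (source ID, target ID) pairs

    def create(self, obj_id, payload=None):
        if obj_id in self.objects:
            raise KeyError(f"{obj_id!r} already exists")
        self.objects[obj_id] = payload

    def connect(self, src, dst):
        # Refuse connections to objects that do not exist.
        for obj_id in (src, dst):
            if obj_id not in self.objects:
                raise KeyError(f"unknown object {obj_id!r}")
        self.connections.add((src, dst))

    def delete(self, obj_id):
        # Drop every connection that touches the object so that no
        # dangling reference survives the deletion.
        del self.objects[obj_id]
        self.connections = {c for c in self.connections if obj_id not in c}


model = Model()
model.create("cell")
model.create("vesicle")
model.connect("cell", "vesicle")
model.delete("vesicle")   # e.g. the vesicle fuses and disappears mid-simulation
```

After the deletion the model contains only the cell and no connection refers to the removed vesicle.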

Spatial modeling and simulation

We would like E-Cell 4 to be able to drive multiple sub-models that use different types of spatial representation and interact with each other to constitute the whole simulation model. E-Cell 4 shall be capable of multi-spatial representation in this sense.

E-Cell 4 must support at least the following types of representations of space.

  • Particle space.
  • Microscopic lattice space.
  • Mesoscopic lattice space.
  • Compartmental space.
  • (Mesh space)

Support for mesh spatial representation, as commonly used in simulation algorithms that rely on partial differential equations, needs more discussion. The initial version of E-Cell 4 may not be released with support for mesh, because (1) fully supporting either structured or unstructured mesh and making it fully interoperable with other spatial representations may be a challenge by itself, and (2) PDE models of diffusion-reaction at the cellular level have become slightly less appealing as we understand how noise, diffusion and crowding manifest themselves in the intracellular milieu (Takahashi et al., FEBS Letters (2005)). However, the general spatial framework in E-Cell 4 must be designed with enough generality to allow a useful implementation of mesh space when it becomes necessary.

General strategies in design and implementation

  1. Do in the frontend whatever can be done in the frontend.
  • Minimize the number of lines of code in the C++ part for maintainability.
  • The only reason to do something in the C++ backend is performance. As we move into the era of spatial simulation, the cost of executing a simulation step increases significantly, so frontend overhead becomes relatively small and it gets easier to justify doing more things in the frontend.

Proposed changes in simulator core

Primitive classes

E-Cell 3 primitive classes:

PropertiedClass <----- Entity <------ Variable
                   |              |
                   |              |-- Process
                   |              |
                   |              |-- System
                   |
                   |-- Stepper

Proposed E-Cell 4 primitive classes:

PropertiedClass <----- Entity (was System)
                   |
                   |-- Process
                   |
                   |-- Stepper

Entity <>--- Variable

where <--- means inheritance, and <>--- means aggregation.

  • The E-Cell 3 Entity class is deprecated.
  • System is renamed to Entity.
  • The new Entity class holds Variable objects, each associated with an ID.
  • A Variable object is no longer a PropertiedClass and becomes lightweight.
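
The proposed relationship can be sketched in Python as follows. This is only an illustration of the class diagram above, assuming a dictionary keyed by ID for the aggregation; the attribute and method names (properties, variables, add_variable) are invented for the sketch.

```python
class PropertiedClass:
    """Stand-in for the property-bearing base class."""

    def __init__(self):
        self.properties = {}


class Variable:
    """Lightweight value holder; deliberately not a PropertiedClass."""

    __slots__ = ("value",)   # no per-instance dict, only the value itself

    def __init__(self, value=0.0):
        self.value = value


class Entity(PropertiedClass):
    """E-Cell 4 Entity (formerly System): aggregates Variables by ID."""

    def __init__(self, name):
        super().__init__()
        self.name = name
        self.variables = {}   # variable ID -> Variable

    def add_variable(self, variable_id, value=0.0):
        variable = Variable(value)
        self.variables[variable_id] = variable
        return variable


cytoplasm = Entity("Cytoplasm")
cytoplasm.add_variable("ATP", 1000.0)
```

Here the Entity carries the property machinery, while each Variable is reduced to a bare value that is looked up through its owning Entity.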

Support for multi-space

The discussion in this section is much more preliminary than in other sections.

The system defines a set of several different specifications of spatial representation. A spatial representation is defined as a view concept in the system (the name needs more consideration).

A Variable holds a multi-dimensional array of numbers (real numbers; whether integers should also be supported is an open question). This multi-array is associated with a view object. The view object provides the method by which this plain sequence of numbers is interpreted as a spatial distribution of quantities.
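
One possible reading of the view idea is sketched below in Python. It assumes a regular three-dimensional lattice stored in row-major order; the names LatticeView, SpatialVariable, flat_index and at are hypothetical and chosen only for this illustration.

```python
class LatticeView:
    """Hypothetical view: reads a flat sequence as a 3D lattice (row-major)."""

    def __init__(self, shape):
        self.shape = shape   # (nx, ny, nz)

    def flat_index(self, x, y, z):
        nx, ny, nz = self.shape
        if not (0 <= x < nx and 0 <= y < ny and 0 <= z < nz):
            raise IndexError((x, y, z))
        return (x * ny + y) * nz + z


class SpatialVariable:
    """Plain numbers plus the view that gives them a spatial meaning."""

    def __init__(self, values, view):
        self.values = list(values)
        self.view = view

    def at(self, x, y, z):
        return self.values[self.view.flat_index(x, y, z)]


view = LatticeView((2, 3, 4))
concentration = SpatialVariable(range(24), view)
```

The numbers themselves carry no spatial information; swapping in a different view object would reinterpret the same storage under another spatial representation.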

PropertyInterface and PropertySlot

Array type

Currently a PropertySlot handles a Polymorph as its property value. Polymorph is a class that can be one of the following four types and can be interconverted at any time.

  • Real
  • Integer
  • String
  • List (list of any mix of these four types including other lists, thus can be nested.)

To handle spatial algorithms more efficiently, we propose adding an Array type to this list of types.

An Array is an ordered sequence of values of the same type, either Real or Integer. An Array can be multi-dimensional in the same way as a Python numpy array.

The property typing system should be designed and implemented so that it is interoperable with the data structures of the frontend language (Python), including its numeric array support (numpy).
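
The following Python sketch shows how frontend values might be classified into the five proposed types. It is an assumption-laden illustration: the function polymorph_type is invented here, and the standard library array module (one-dimensional only) is used as a stand-in for numpy so that the example has no external dependencies.

```python
from array import array


def polymorph_type(value):
    """Return the name of the Polymorph type a Python value would map to."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it to avoid surprises.
        raise TypeError("bool is not a Polymorph type in this sketch")
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Real"
    if isinstance(value, str):
        return "String"
    if isinstance(value, array):
        # Homogeneous storage: 'd' holds doubles, 'q' signed 64-bit integers.
        if value.typecode not in ("d", "q"):
            raise TypeError(f"unsupported array typecode {value.typecode!r}")
        return "Array"
    if isinstance(value, (list, tuple)):
        for item in value:
            polymorph_type(item)   # validate nested items recursively
        return "List"
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

A List may mix types and nest, whereas an Array is checked once for its element type, which is what makes it cheaper to pass large spatial data across the C++/Python boundary.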

Property Slot Info

PropertySlotInfo needs to be reimplemented.

Other proposed architectural changes

See Takahashi PhD thesis Chapter 4 for implementation of E-Cell 3, on which discussions in this section are based.

Simulator architecture

E-Cell 3 implementation

Frontend / scripts
-------------------------
Python Interpreter
-------------------------
PyECS (functional)
-------------------------
E-Cell Microcore (LibEMC)
-------------------------
     LibECS

Proposed changes

  • Remake PyECS's flat, functional API into an object-oriented one that simply wraps the object classes in LibECS.
  • Get rid of the LibEMC layer.
  • Move the Simulator class from LibEMC to LibECS.

Proposed E-Cell 4 architecture

Frontend / scripts
-------------------------
Python Interpreter
-------------------------
PyECS (OO)
-------------------------
     LibECS
-------------------------

Rationale -- Removing LibEMC

The E-Cell Microcore (EMC) architecture was originally devised to make E-Cell portable in distributed environments. LibEMC hides the object-oriented details, implemented using many classes in libecs, behind a flat functional API (facade design pattern). LibEMC then isolates the interface (Simulator class) from the implementation (SimulatorImplementation class) by applying the bridge design pattern. By allowing subclassing of each side, LibEMC permits any combination of backend / frontend configurations to be used. For example, a running instance of the simulator on a remote computation node may be instantiated as a LocalSimulatorImplementation, a subclass of SimulatorImplementation, attached to a CORBASimulatorSkel, a subclass of Simulator that works as a 'skeleton' for the CORBA environment. This remote simulator instance may be used from another LibEMC Simulator object attached to another subclass of SimulatorImplementation, CORBASimulatorImplementation, which is a 'stub' that communicates with the CORBASimulatorSkel.
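
The bridge arrangement described above can be sketched in Python as follows. The real LibEMC classes are C++ and use CORBA; here SkeletonSimulator and StubSimulatorImplementation are hypothetical in-process stand-ins for CORBASimulatorSkel and CORBASimulatorImplementation, and the single step method is an invented placeholder for the actual API.

```python
class SimulatorImplementation:
    """Implementation side of the bridge."""

    def step(self):
        raise NotImplementedError


class Simulator:
    """Interface side of the bridge; delegates to an implementation."""

    def __init__(self, implementation):
        self.implementation = implementation

    def step(self):
        return self.implementation.step()


class LocalSimulatorImplementation(SimulatorImplementation):
    """Runs the simulation in the current process (toy clock only)."""

    def __init__(self):
        self.time = 0.0

    def step(self):
        self.time += 1.0
        return self.time


class SkeletonSimulator(Simulator):
    """Hypothetical stand-in for CORBASimulatorSkel (no real CORBA here)."""


class StubSimulatorImplementation(SimulatorImplementation):
    """Hypothetical stand-in for CORBASimulatorImplementation."""

    def __init__(self, skeleton):
        self.skeleton = skeleton

    def step(self):
        # A real stub would send this call over the network.
        return self.skeleton.step()


remote = SkeletonSimulator(LocalSimulatorImplementation())
client = Simulator(StubSimulatorImplementation(remote))
```

Calling client.step() is forwarded through the stub to the skeleton and finally to the local implementation, so the client code is unaware of where the simulation actually runs.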

An alternative approach is to implement this feature as part of the frontend. We did not take this approach in 2000 for two reasons: (1) we were fairly but not 100% sure about the Python language's long-term prospects, and (2) we were not sure about the language's performance when used as distributed communication middleware.

Now we think this frontend approach works better than the LibEMC approach: (1) The functional API approach in LibEMC is not quite compatible with the object-oriented API approach (see the next section). (2) The ECellSessionManager framework was developed successfully and offers essentially the same functionality at the frontend layer. (3) LibEMC has never been used for the purpose it was designed for. (4) In 2007 we are sufficiently confident about the use of Python as a reliable frontend language and about its performance. (5) We learned from experience in developing E-Cell 3 that simplifying the C++ portions of the system increases the maintainability of the code.

Rationale -- Object-oriented Programming Interface

In E-Cell 3, the LibEMC Simulator/SimulatorImplementation classes are used as a 'facade' to provide a flat and compact API to the object-oriented simulation engine implemented in LibECS. An object-oriented appearance is reclaimed at the Python layer by the use of 'Stub' objects.

One drawback of this is that it is not straightforward to keep track of backend objects (such as Processes, Variables and Loggers). This becomes a particular problem given that one of the design goals of the E-Cell 4 project is dynamic model structure, in which objects are dynamically created and deleted during simulation.

The advent of the Boost.Python library, which enables nearly seamless interoperability between C++ and Python, offers another possibility: exposing (part of) the object-oriented design in LibECS to the frontend layer as-is.
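
The difference between the two API styles can be illustrated with a small pure-Python sketch. The classes Model and FlatSimulator, their methods, and the ID string used below are all invented for illustration; they do not reproduce the actual PyECS or LibEMC interfaces.

```python
class Variable:
    def __init__(self, value):
        self.value = value


class Model:
    """Owns the backend objects."""

    def __init__(self):
        self.variables = {}   # ID string -> Variable

    def create_variable(self, full_id, value):
        variable = Variable(value)
        self.variables[full_id] = variable
        return variable       # object-oriented style: hand back the object


class FlatSimulator:
    """Flat facade: every call takes an ID string and resolves it again."""

    def __init__(self, model):
        self._model = model

    def get_value(self, full_id):
        return self._model.variables[full_id].value

    def set_value(self, full_id, value):
        self._model.variables[full_id].value = value


model = Model()
atp = model.create_variable("Variable:/CELL:ATP", 100.0)   # direct handle
flat = FlatSimulator(model)
flat.set_value("Variable:/CELL:ATP", 90.0)                 # ID-based call
```

With the flat facade the frontend holds only ID strings and must separately track whether the named object still exists, whereas an object-oriented binding hands the frontend a reference to the backend object itself.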

Logging

Unlike in E-Cell 3, there will no longer be Logger objects in the C++ backend; all logging tasks will be performed in the Python layer. The frontend code registers callback functions, which are called when the simulator changes the value of specified variables.

Accordingly, the following libecs classes will become obsolete:

  • Logger
  • PhysicalLogger
  • LoggerAdapter
  • LoggerBroker
  • DataPointVector
  • VVector

At least the following classes need to be modified to remove the old logging mechanism:

  • PropertiedClass
  • PropertySlotProxy
  • Model

However, some parts of the old mechanism may be reused to implement the proposed callback-based logging mechanism.
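
A minimal sketch of callback-based logging, written entirely in Python, might look like the following. The Simulator class here is a toy, and register_callback, set_value and the callback signature (time, variable ID, value) are assumptions made for the sketch rather than a proposed API.

```python
class Simulator:
    """Toy simulator that notifies the frontend when a variable changes."""

    def __init__(self):
        self.time = 0.0
        self.values = {}
        self.callbacks = {}   # variable ID -> list of callables

    def register_callback(self, variable_id, callback):
        self.callbacks.setdefault(variable_id, []).append(callback)

    def set_value(self, variable_id, value):
        changed = self.values.get(variable_id) != value
        self.values[variable_id] = value
        if changed:
            for callback in self.callbacks.get(variable_id, []):
                callback(self.time, variable_id, value)


log = []   # the frontend owns the log; the backend keeps no Logger object
sim = Simulator()
sim.register_callback("ATP", lambda t, vid, v: log.append((t, v)))

sim.set_value("ATP", 10.0)
sim.time = 1.0
sim.set_value("ATP", 10.0)   # unchanged, so no callback fires
sim.set_value("ADP", 5.0)    # no callback registered for ADP
sim.time = 2.0
sim.set_value("ATP", 8.0)
```

The log ends up holding one (time, value) pair per actual change of ATP, and how that data is stored, thinned or written out is decided entirely in the frontend.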

Other proposed changes

Coding style

New E-Cell coding standard is available here.

C++ Type Declarations

In E-Cell 3, use of type suffixes to indicate type modifiers was mandatory.

  • TYPEPtr

    Pointer type. (== TYPE*)

  • TYPECptr

    Const pointer type. (== const TYPE*)

  • TYPERef

    Reference type. (== TYPE&)

  • TYPECref

    Const reference type. (== const TYPE&)

Some types also defined Param suffixes, such as RealParam and IntegerParam.

Although these suffixes provided some cognitive support for programmers, the benefits were not evident enough to justify the increased coding and maintenance cost that accompanies them (mainly macros involving typedefs).

In E-Cell 4, these type suffixes are deprecated and normal C++ notation is used (like TYPE* or const TYPE&) for better interoperability with third-party code.

Relatedly, macros that declare iterator types (such as RealVectorIterator or RealListConstIterator) for standard containers will also be deprecated.

The following macros are deprecated:

  • DECLARE_TYPE

  • DECLARE_CLASS

  • DECLARE_SHAREDPTR

  • DECLARE_*

    where * is one of LIST, VECTOR, SET, MULTISET, MAP, MULTIMAP, QUEUE, ASSOCVECTOR, ASSOCVECTOR_TEMPLATE.

Use standard typedef and typename instead.

C++ naming conventions

Changes include:

  • Deprecate the 'a' and 'the' prefixes in variable/member names, which were mandatory.
  • Instead, mandate this-> when referring to member variables in member functions.

In this way it is possible to clearly distinguish local and member variables without the 'a' and 'the' prefixes. (The only drawback is that GCC does not have a compiler option to enforce this->.)

Other

  • Time type is deprecated. Use Real instead.

License

E-Cell 4 will be released under the original, unmodified GPL v2.

E-Cell 3 adopted a modified version of the GNU General Public License version 2. The COPYRIGHT file distributed with E-Cell 3 started with the following paragraphs.

This software is licensed under GPL, in which subclasses are
considered 'derived works'.

As a special exception, dynamically loadable subclasses of classes
defined in libecs, which are developed as part of simulation models or
algorithm modules, are not covered by the license even if resulting
binary includes code from libecs by inlining, template instantiation,
and subclassing. Modifications and additions which are included in
libecs itself are coverd by GPL.

This modification was useful for users who did not want the viral nature of the GPL to affect their models. For example, early versions of E-Cell required subclassing Process to implement a new reaction rate equation. Now that such ordinary details of the model can be described as part of model files, this exception to the GPL is no longer very meaningful.

(It is perhaps too early to think about GPL v3 while it is still a draft and v2 is working perfectly for us.)