@node Distributed
@chapter Distributed Simulation with MPI
@anchor{chap:Distributed}

@menu
* Current Implementation Details::
* Running Distributed Simulations::
* Tracing During Distributed Simulations::
@end menu

Parallel and distributed discrete event simulation allows a single
simulation program to be executed on multiple processors. By splitting
the simulation into logical processes (LPs), each LP can be executed by
a different processor. This simulation methodology enables very
large-scale simulations by leveraging increased processing power and
memory availability. To ensure proper execution of a distributed
simulation, message passing between LPs is required. To support
distributed simulation in ns-3, the standard Message Passing Interface
(MPI) is used, along with a new distributed simulator class. Currently,
dividing a simulation for distributed purposes in ns-3 can only occur
across point-to-point links.

@node Current Implementation Details
@section Current Implementation Details
During the course of a distributed simulation, many packets must cross
simulator boundaries. In other words, a packet that originated on one LP
is destined for a different LP, and in order to make this transition, a
message containing the packet contents must be sent to the remote LP.
Upon receiving this message, the remote LP can rebuild the packet and
proceed as normal. The process of sending and receiving messages between
LPs is handled by the new MPI interface in ns-3.

Along with simple message passing between LPs, a distributed simulator
is used on each LP to determine which events to process. It is important
to process events in time-stamped order to ensure proper simulation
execution. If an LP receives a message containing an event from the
past, this is clearly a problem, since that event could change other
events which have already been executed. To address this problem, a
conservative synchronization algorithm with lookahead is used in ns-3.
For more information on different synchronization approaches and on
parallel and distributed simulation in general, please refer to
"Parallel and Distributed Simulation Systems" by Richard Fujimoto.

@subsection Remote point-to-point links
As described in the introduction, dividing a simulation for distributed
purposes in ns-3 currently can only occur across point-to-point links;
therefore, the idea of remote point-to-point links is very important for
distributed simulation in ns-3. When a point-to-point link is installed
to connect two nodes, the point-to-point helper checks the system id, or
rank, of both nodes. The rank should be assigned during node creation
for a distributed simulation and signifies which LP a node belongs to.
If the two nodes have the same rank, a regular point-to-point link is
created. If, however, the two nodes have different ranks, then they are
intended for different LPs, and a remote point-to-point link is used. If
a packet is to be sent across a remote point-to-point link, MPI is used
to send the message to the remote LP.
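
For example, a sketch of how such a division looks in a simulation
script, assuming nodes n0 and n1 were created with ranks 0 and 1,
respectively (the helper attribute values are purely illustrative):

@verbatim
PointToPointHelper pointToPoint;
pointToPoint.SetDeviceAttribute ("DataRate", StringValue ("5Mbps"));
pointToPoint.SetChannelAttribute ("Delay", StringValue ("2ms"));

// Since n0 and n1 have different ranks, the helper transparently
// creates a remote point-to-point link between them.
NetDeviceContainer devices = pointToPoint.Install (n0, n1);
@end verbatim

The delay of such remote channels determines the lookahead available to
the conservative synchronization algorithm.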

@subsection Distributing the topology
Currently, the full topology is created on each rank, regardless of the
individual node system ids. Only the applications are specific to a
rank. For example, consider node 1 on LP 1 and node 2 on LP 2, with a
traffic generator on node 1. Both node 1 and node 2 will be created on
both LP 1 and LP 2; however, the traffic generator will only be
installed on LP 1. While this is not optimal for memory efficiency, it
does simplify routing, since all current routing implementations in ns-3
will work with distributed simulation.

@node Running Distributed Simulations
@section Running Distributed Simulations

@subsection Prerequisites
Ensure that MPI is installed, as well as mpic++. In the Ubuntu
repositories, the required packages are openmpi-bin, openmpi-common,
openmpi-doc, and libopenmpi-dev. In Fedora, they are openmpi and
openmpi-devel.
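
For example, the packages can be installed along these lines (package
names as listed above):

@verbatim
# Ubuntu
sudo apt-get install openmpi-bin openmpi-common openmpi-doc libopenmpi-dev

# Fedora
sudo yum install openmpi openmpi-devel
@end verbatim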

Note:
There is a conflict on some Fedora systems between libotf and openmpi. A
possible "quick-fix" is to yum remove libotf before installing openmpi.
This will remove the conflict, but it will also remove emacs.
Alternatively, the following steps can be taken to resolve the conflict:

@verbatim
1) Rename the tiny otfdump which emacs says it needs:

     mv /usr/bin/otfdump /usr/bin/otfdump.emacs-version

2) Manually resolve openmpi dependencies:

     sudo yum install libgfortran libtorque numactl

3) Download rpm packages:

     openmpi-1.3.1-1.fc11.i586.rpm
     openmpi-devel-1.3.1-1.fc11.i586.rpm
     openmpi-libs-1.3.1-1.fc11.i586.rpm
     openmpi-vt-1.3.1-1.fc11.i586.rpm

   from

     http://mirrors.kernel.org/fedora/releases/11/Everything/i386/os/Packages/

4) Force the packages in:

     sudo rpm -ivh --force openmpi-1.3.1-1.fc11.i586.rpm \
       openmpi-libs-1.3.1-1.fc11.i586.rpm \
       openmpi-devel-1.3.1-1.fc11.i586.rpm \
       openmpi-vt-1.3.1-1.fc11.i586.rpm
@end verbatim

Also, it may be necessary to add the openmpi bin directory to PATH in
order to execute mpic++ and mpirun from the command line. Alternatively,
the full path to these executables can be used. Finally, if openmpi
complains about the inability to open shared libraries, such as
libmpi_cxx.so.0, it may be necessary to add the openmpi lib directory to
LD_LIBRARY_PATH.
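
For example, shell commands along these lines may be needed (the openmpi
installation directory varies between systems; the paths below are only
an illustration):

@verbatim
export PATH=$PATH:/usr/lib/openmpi/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/openmpi/lib
@end verbatim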

@subsection Building and Running Examples
If you already built ns-3 without MPI enabled, you must re-build:
@verbatim
./waf distclean
@end verbatim

Configure ns-3 with the --enable-mpi option:
@verbatim
./waf -d debug configure --enable-mpi
@end verbatim

Ensure that MPI is enabled by checking the optional features shown in
the output of configure.

Next, build ns-3:
@verbatim
./waf
@end verbatim

After building ns-3 with MPI enabled, the example programs are ready to
run with mpirun. Here are a few examples (from the root ns-3 directory):

@verbatim
mpirun -np 2 ./waf --run simple-distributed
mpirun -np 4 -machinefile mpihosts ./waf --run 'nms-udp-nix --LAN=2 --CN=4 --nix=1 --tracing=0'
@end verbatim

The -np switch sets the number of logical processes to launch, and the
-machinefile switch names a file listing the machines to use. In order
to use -machinefile, the target file must exist (in this case mpihosts).
This can simply contain something like:

@verbatim
localhost
localhost
localhost
...
@end verbatim

Or, if you have a cluster of machines, you can name them.
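
For example, a machinefile for a cluster could list one hostname per
rank (these names are purely illustrative):

@verbatim
node0.cluster.example.com
node1.cluster.example.com
node2.cluster.example.com
node3.cluster.example.com
@end verbatim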

** NOTE: Some users have experienced issues using mpirun and waf
together. An alternative way to run distributed examples is shown below:

@verbatim
./waf shell
cd build/debug
mpirun -np 2 examples/mpi/simple-distributed
@end verbatim

@subsection Creating custom topologies
The example programs in examples/mpi give a good idea of how to create
different topologies for distributed simulation. The main points are
assigning system ids to individual nodes, creating point-to-point links
where the simulation should be divided, and installing applications only
on the LP associated with the target node.
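
In addition, the simulation script itself must enable MPI and select the
distributed simulator implementation before creating any nodes. A
minimal sketch of this setup, following the pattern used by the examples
in examples/mpi:

@verbatim
#include "ns3/core-module.h"
#include "ns3/mpi-interface.h"

using namespace ns3;

int
main (int argc, char *argv[])
{
  // Enable MPI with the command-line arguments.
  MpiInterface::Enable (&argc, &argv);

  // Select the distributed simulator implementation.
  GlobalValue::Bind ("SimulatorImplementationType",
                     StringValue ("ns3::DistributedSimulatorImpl"));

  // The rank of this LP and the total number of LPs; these are
  // typically used to decide what to create and install locally.
  uint32_t systemId = MpiInterface::GetSystemId ();
  uint32_t systemCount = MpiInterface::GetSize ();

  // ... create nodes with system ids, links, and applications here ...

  Simulator::Run ();
  Simulator::Destroy ();

  // Shut down MPI cleanly before exiting.
  MpiInterface::Disable ();
  return 0;
}
@end verbatim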

Assigning system ids to nodes is simple and can be handled in two
different ways. First, a NodeContainer can be used to create the nodes
and assign system ids:

@verbatim
NodeContainer nodes;
nodes.Create (5, 1); // Creates 5 nodes, all with system id 1.
@end verbatim

Alternatively, nodes can be created individually, assigned system ids,
and added to a NodeContainer. This is useful if a NodeContainer holds
nodes with different system ids:

@verbatim
NodeContainer nodes;
Ptr<Node> node1 = CreateObject<Node> (0); // Create node1 with system id 0.
Ptr<Node> node2 = CreateObject<Node> (1); // Create node2 with system id 1.
nodes.Add (node1);
nodes.Add (node2);
@end verbatim

Next, where the simulation is divided is determined by the placement of
point-to-point links. If a point-to-point link is created between two
nodes with different system ids, a remote point-to-point link is
created, as described in @ref{Current Implementation Details}.

Finally, installing applications only on the LP associated with the
target node is very important. For example, if a traffic generator is
to be placed on node 0, which is on LP 0, only LP 0 should install this
application. This is easily accomplished by first checking the simulator
system id and ensuring that it matches the system id of the target node
before installing the application, as in the sketch below.
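
A minimal sketch of this check (the onoff helper, the node index, and
the start and stop times are illustrative placeholders):

@verbatim
// Install the traffic generator only on the rank that owns node 0.
if (MpiInterface::GetSystemId () == 0)
  {
    ApplicationContainer apps = onoff.Install (nodes.Get (0));
    apps.Start (Seconds (1.0));
    apps.Stop (Seconds (10.0));
  }
@end verbatim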

@node Tracing During Distributed Simulations
@section Tracing During Distributed Simulations
Depending on the system id (rank) of the simulator, the traced
information will differ, since traffic originating on one simulator is
not seen by another simulator until it reaches nodes specific to that
simulator. The easiest way to keep the different traces apart is simply
to name the trace files or pcaps differently, based on the system id of
the simulator. For example, something like this should work well,
assuming all of these local variables were previously defined:

@verbatim
if (MpiInterface::GetSystemId () == 0)
  {
    pointToPoint.EnablePcapAll ("distributed-rank0");
    phy.EnablePcap ("distributed-rank0", apDevices.Get (0));
    csma.EnablePcap ("distributed-rank0", csmaDevices.Get (0), true);
  }
else if (MpiInterface::GetSystemId () == 1)
  {
    pointToPoint.EnablePcapAll ("distributed-rank1");
    phy.EnablePcap ("distributed-rank1", apDevices.Get (0));
    csma.EnablePcap ("distributed-rank1", csmaDevices.Get (0), true);
  }
@end verbatim