With fine-grained partition and load-adaptive scheduling, Unison lets users simulate models with multithreaded parallelization without further configuration.
Fine-grained partition reduces cache misses, and load-adaptive scheduling minimizes the time threads spend waiting for one another, resulting in efficient parallelization.
The simulation should finish in 4-5 minutes for `dctcp-example` and 1-2 minutes for `dctcp-example-mtp`, depending on your hardware and your build profile.
The output in `*.dat` should match the comments in the source file.
If you are interested in using it to simulate topologies like fat-tree, BCube and 2D-torus, please refer to [Running Evaluations](#running-evaluations).
To bring Unison to existing model code, include the `ns3/mtp-interface.h` header file and add the following line at the beginning of the `main` function:
If the thread count is omitted, the number of threads is chosen automatically and will not exceed the number of hardware threads available on your system.
If you want to enable Unison for distributed simulation on existing MPI programs for further speedup, place the above line before MPI initialization and do not explicitly specify the simulator implementation in your code.
For such hybrid simulation with MPI, the `--enable-mpi` option is also required when configuring ns-3.
Most of the time you don't need to handle these issues yourself, unless you have custom global statistics other than the built-in flow-monitor.
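As an illustration of that exception, here is a minimal sketch of a custom global statistic made safe for concurrent updates with `std::atomic`. The counter `g_droppedPackets` and the callback `OnPacketDropped` are hypothetical names, and plain `std::thread` workers stand in for the simulator's threads. It shows one possible approach, not necessarily the one Unison prescribes.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical custom global statistic. A plain uint64_t here could lose
// updates when several threads increment it at the same time.
std::atomic<uint64_t> g_droppedPackets{0};

// Hypothetical trace callback that may run on any simulation thread.
void OnPacketDropped()
{
    g_droppedPackets.fetch_add(1, std::memory_order_relaxed);
}

// Stand-in for parallel execution: each worker fires the callback
// eventsPerThread times; the total is read after all workers have joined.
uint64_t RunWorkers(unsigned numThreads, unsigned eventsPerThread)
{
    g_droppedPackets.store(0);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < numThreads; ++t)
    {
        workers.emplace_back([eventsPerThread] {
            for (unsigned i = 0; i < eventsPerThread; ++i)
            {
                OnPacketDropped();
            }
        });
    }
    for (auto& w : workers)
    {
        w.join();
    }
    return g_droppedPackets.load();
}
```

Relaxed ordering is enough here because only the final total matters and it is read after `join()`. A statistic that updates several related fields together would need a mutex or similar protection instead.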
To evaluate Unison, please switch to the [unison-evaluations](https://github.com/NASA-NJU/Unison-for-ns-3/tree/unison-evaluations) branch, which is based on ns-3.36.1.
This module contains three parts: a parallel simulator implementation `multithreaded-simulator-impl`, a user-facing interface `mtp-interface`, and `logical-process`, which represents logical processes (LPs) in parallel simulation.
All LPs and threads are stored in the `mtp-interface`.
It controls the simulation progress, schedules LPs to threads and manages the lifecycles of LPs and threads.
The interface also provides some methods and options for users to tweak the simulation.
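The scheduling role described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in `mtp-interface`: it assumes each LP's cost in the previous round is a usable estimate for the next round, and the names `OrderByCost` and `RunRound` are hypothetical. Expensive LPs are started first, and threads claim LPs from a shared atomic cursor so that a thread finishing early picks up more work.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <numeric>
#include <thread>
#include <vector>

// Return LP indices ordered from the most to the least expensive, using the
// cost measured in the previous round as an estimate for the next one.
std::vector<std::size_t> OrderByCost(const std::vector<uint64_t>& lastRoundCost)
{
    std::vector<std::size_t> order(lastRoundCost.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::stable_sort(order.begin(), order.end(),
                     [&](std::size_t a, std::size_t b) {
                         return lastRoundCost[a] > lastRoundCost[b];
                     });
    return order;
}

// Run one round: each thread repeatedly claims the next unprocessed LP from
// a shared atomic cursor, so every LP is processed exactly once.
void RunRound(const std::vector<std::size_t>& order,
              unsigned numThreads,
              const std::function<void(std::size_t)>& processLp)
{
    std::atomic<std::size_t> cursor{0};
    auto worker = [&] {
        for (;;)
        {
            const std::size_t i = cursor.fetch_add(1);
            if (i >= order.size())
            {
                return;
            }
            processLp(order[i]);
        }
    };
    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t)
    {
        threads.emplace_back(worker);
    }
    for (auto& th : threads)
    {
        th.join();
    }
}
```

Starting the longest jobs first keeps one large LP from being left until the end of a round while the other threads sit idle.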
Each LP's logic is implemented in `logical-process`. It contains most of the methods of the default sequential simulator plus some auxiliary methods for parallel simulation.
The simulator implementation `multithreaded-simulator-impl` derives from the base simulator class.
It converts calls to the base simulator into calls to logical processes based on the context of the current thread.
It also provides a partition method for automatic fine-grained topology partition.
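The idea of routing a simulator call to an LP based on the current thread's context can be sketched with a `thread_local` pointer. All names below (`LogicalProcessSketch`, `SimulatorFacade`, `t_currentLp`, `RunAll`) are hypothetical stand-ins rather than the actual ns-3 or Unison classes, and giving each LP its own thread is a simplification for the example.

```cpp
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical stand-in for a logical process: it only counts scheduled events.
struct LogicalProcessSketch
{
    uint64_t scheduledEvents = 0;
    void Schedule() { ++scheduledEvents; }
};

// Each thread remembers which LP it is currently executing.
thread_local LogicalProcessSketch* t_currentLp = nullptr;

// Hypothetical front end: model code calls one global entry point, and the
// call is forwarded to whichever LP the calling thread is running.
struct SimulatorFacade
{
    static void Schedule() { t_currentLp->Schedule(); }
};

// Run each LP on its own thread. The model code inside the loop never names
// an LP explicitly; the thread-local context selects it.
void RunAll(std::vector<LogicalProcessSketch>& lps, unsigned eventsPerLp)
{
    std::vector<std::thread> threads;
    for (auto& lp : lps)
    {
        threads.emplace_back([&lp, eventsPerLp] {
            t_currentLp = &lp; // set this thread's context
            for (unsigned i = 0; i < eventsPerLp; ++i)
            {
                SimulatorFacade::Schedule();
            }
            t_currentLp = nullptr;
        });
    }
    for (auto& th : threads)
    {
        th.join();
    }
}
```

Because the context lives in thread-local storage, existing model code can keep calling a single global interface without knowing which LP it belongs to.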
This simulator uses both `mtp-interface` and `mpi-interface` to coordinate local LPs and global MPI communications.
We also modified the module to make it locally thread-safe.
### 2. Modifications to ns-3 Architecture
In addition to the `mtp` and `mpi` modules, we also modified the following parts of the ns-3 architecture to make them thread-safe, and fixed some ns-3 bugs along the way.
Unison is fast because it divides the network into multiple fine-grained logical processes (LPs) and schedules them dynamically.
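To make the idea of dividing a network into LPs concrete, here is a minimal sketch of one possible partition rule. It assumes that links with a delay of at least `minCutDelayNs` may be cut and that nodes joined by shorter links must stay in the same LP. This rule, the threshold parameter, and the names `Link`, `Find` and `Partition` are assumptions for illustration; Unison's actual partition method may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical description of a point-to-point link between two nodes.
struct Link
{
    std::size_t a;
    std::size_t b;
    uint64_t delayNs;
};

// Union-find lookup with path halving.
std::size_t Find(std::vector<std::size_t>& parent, std::size_t x)
{
    while (parent[x] != x)
    {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

// Return, for each node, a dense LP id. Nodes connected by links shorter
// than minCutDelayNs are merged; longer links become LP boundaries.
std::vector<std::size_t> Partition(std::size_t numNodes,
                                   const std::vector<Link>& links,
                                   uint64_t minCutDelayNs)
{
    std::vector<std::size_t> parent(numNodes);
    std::iota(parent.begin(), parent.end(), std::size_t{0});
    for (const Link& l : links)
    {
        if (l.delayNs < minCutDelayNs)
        {
            const std::size_t ra = Find(parent, l.a);
            const std::size_t rb = Find(parent, l.b);
            parent[ra] = rb;
        }
    }
    // Relabel component roots to ids 0, 1, 2, ... in order of first appearance.
    std::vector<std::size_t> lpOfRoot(numNodes, numNodes); // numNodes = unassigned
    std::vector<std::size_t> lpOfNode(numNodes);
    std::size_t nextLp = 0;
    for (std::size_t n = 0; n < numNodes; ++n)
    {
        const std::size_t r = Find(parent, n);
        if (lpOfRoot[r] == numNodes)
        {
            lpOfRoot[r] = nextLp++;
        }
        lpOfNode[n] = lpOfRoot[r];
    }
    return lpOfNode;
}
```

In parallel discrete-event simulation, the delay of the cut links bounds how far LPs can advance independently, which is why this sketch cuts only the longer links.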
If you find the code useful, please consider citing [our paper](https://dl.acm.org/doi/10.1145/3627703.3629574).
```bibtex
@inproceedings{10.1145/3627703.3629574,
author = {Bai, Songyuan and Zheng, Hao and Tian, Chen and Wang, Xiaoliang and Liu, Chang and Jin, Xin and Xiao, Fu and Xiang, Qiao and Dou, Wanchun and Chen, Guihai},
title = {Unison: A Parallel-Efficient and User-Transparent Network Simulation Kernel},
year = {2024},
isbn = {9798400704376},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3627703.3629574},
doi = {10.1145/3627703.3629574},
abstract = {Discrete-event simulation (DES) is a prevalent tool for evaluating network designs. Although DES offers full fidelity and generality, its slow performance limits its application. To speed up DES, many network simulators employ parallel discrete-event simulation (PDES). However, adapting existing network simulation models to PDES requires complex reconfigurations and often yields limited performance improvement. In this paper, we address this gap by proposing a parallel-efficient and user-transparent network simulation kernel, Unison, that adopts fine-grained partition and load-adaptive scheduling optimized for network scenarios. We prototype Unison based on ns-3. Existing network simulation models of ns-3 can be seamlessly transitioned to Unison. Testbed experiments on commodity servers demonstrate that Unison can achieve a 40\texttimes{} speedup over DES using 24 CPU cores, and a 10\texttimes{} speedup compared with existing PDES algorithms under the same CPU cores.},
booktitle = {Proceedings of the Nineteenth European Conference on Computer Systems},
pages = {115–131},
numpages = {17},
keywords = {Data center networks, Network simulation, Parallel discrete-event simulation},