build, doc: add options to run with Heaptrack, Memray or Perf profilers

Gabriel Ferreira
2022-10-14 12:59:30 -03:00
parent 6e8642d294
commit 642ffe8361
2 changed files with 172 additions and 14 deletions


@@ -225,13 +225,13 @@ If you prefer to use the ``ns3`` wrapper, try:
.. sourcecode:: console
~ns-3-dev/$ ./ns3 run "wifi-he-network --simulationTime=0.3 --frequency=5 --useRts=1 --minExpectedThroughput=6 --maxExpectedThroughput=745" --command-template "heaptrack %s" --no-build
~ns-3-dev/$ ./ns3 run "wifi-he-network --simulationTime=0.3 --frequency=5 --useRts=1 --minExpectedThroughput=6 --maxExpectedThroughput=745" --heaptrack --no-build
In both cases, heaptrack prints the path of its output file to the terminal:
.. sourcecode:: console
~ns-3-dev/$ ./ns3 run "wifi-he-network --simulationTime=0.3 --frequency=5 --useRts=1 --minExpectedThroughput=6 --maxExpectedThroughput=745" --command-template "heaptrack %s" --no-build
~ns-3-dev/$ ./ns3 run "wifi-he-network --simulationTime=0.3 --frequency=5 --useRts=1 --minExpectedThroughput=6 --maxExpectedThroughput=745" --heaptrack --no-build
heaptrack output will be written to "~ns-3-dev/heaptrack.ns3-dev-wifi-he-network.210305.zst"
starting application, this might take some time...
MCS value Channel width GI Throughput
@@ -388,6 +388,79 @@ were removed, which translates to a 20% reduction. This resulted in a 1.07x spee
test suite with Valgrind (``./test.py -d -g``) and 1.02x speedup without it.
Memray
++++++
.. _Memray : https://bloomberg.github.io/memray/
`Memray`_ is a utility developed by Bloomberg that traces memory allocations of Python programs,
including allocations made by native code they call. Combined with the recorded stack traces, this helps
developers track down possible memory leaks and unnecessary allocations.
Note: Memray is ineffective for profiling the ns-3 Python bindings, since Cppyy hides
the calls into the ns-3 module libraries. However, it is still useful for Python scripts
in general, for example, scripts used to parse and consolidate simulation results.
The ``ns3`` script provides the ``--memray`` run option to launch Python programs with Memray;
the allocation records are saved to ``memray.output``. Memray can then produce different types
of reports, such as an HTML flamegraph or text reports (``summary`` and ``stats``).
.. sourcecode:: console

  ~/ns-3-dev/$ ./ns3 run sample-rng-plot.py --memray
  Writing profile results into memray.output
  Memray WARNING: Correcting symbol for aligned_alloc from 0x7fd97023c890 to 0x7fd97102fce0
  [memray] Successfully generated profile results.
  You can now generate reports from the stored allocation records.
  Some example commands to generate reports:
  /usr/bin/python3 -m memray flamegraph memray.output
  ~/ns-3-dev$ /usr/bin/python3 -m memray stats memray.output
  Total allocations:
  5364235
  Total memory allocated:
  10.748GB
  Histogram of allocation size:
  min: 0.000B
  ----------------------------------------------
  < 8.000B : 264149 |||
  < 78.000B : 2051906 |||||||||||||||||||||||
  < 699.000B : 2270941 |||||||||||||||||||||||||
  < 6.064KB : 608993 |||||||
  < 53.836KB : 165307 ||
  < 477.912KB: 2220 |
  < 4.143MB : 511 |
  < 36.779MB : 188 |
  < 326.492MB: 19 |
  <=2.830GB : 1 |
  ----------------------------------------------
  max: 2.830GB
  Allocator type distribution:
  MALLOC: 4647765
  CALLOC: 435525
  REALLOC: 277736
  POSIX_MEMALIGN: 2686
  MMAP: 523
  Top 5 largest allocating locations (by size):
  - include:/usr/local/lib/python3.10/dist-packages/cppyy/__init__.py:243 -> 8.814GB
  - <stack trace unavailable> -> 746.999MB
  - show:~/.local/lib/python3.10/site-packages/matplotlib/backends/backend_gtk4.py:340 -> 263.338MB
  - load_library:/usr/local/lib/python3.10/dist-packages/cppyy/__init__.py:235 -> 245.684MB
  - __init__:/usr/lib/python3.10/ctypes/__init__.py:374 -> 225.797MB
  Top 5 largest allocating locations (by number of allocations):
  - include:/usr/local/lib/python3.10/dist-packages/cppyy/__init__.py:243 -> 2246145
  - show:~/.local/lib/python3.10/site-packages/matplotlib/backends/backend_gtk4.py:340 -> 1264614
  - <stack trace unavailable> -> 1098543
  - __init__:~/.local/lib/python3.10/site-packages/matplotlib/backends/backend_gtk4.py:61 -> 89466
  - run:/usr/lib/python3/dist-packages/gi/overrides/Gio.py:42 -> 79582

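The report step can also be scripted. The sketch below only assembles the report command lines; it assumes the ``memray.output`` file name used by ``--memray`` and the ``flamegraph``, ``summary`` and ``stats`` reporters mentioned above, and the helper name ``memray_report_commands`` is hypothetical.

```python
import sys


def memray_report_commands(capture_file="memray.output", python=sys.executable):
    """Build (but do not run) Memray report commands for a capture file.

    Hypothetical helper: it mirrors the 'python3 -m memray <reporter> <file>'
    form that Memray prints after a profiled run.
    """
    reporters = ["flamegraph", "summary", "stats"]
    return {name: [python, "-m", "memray", name, capture_file] for name in reporters}


if __name__ == "__main__":
    for name, command in memray_report_commands().items():
        print(name, "->", " ".join(command))
```

Each list can be passed to ``subprocess.run(command, check=True)``; ``flamegraph`` writes an HTML file, while ``summary`` and ``stats`` print their results to the terminal.
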
Performance Profilers
*********************
@@ -458,6 +531,63 @@ to the ``perf.data`` output file.
~/ns-3-dev$ ./ns3 run "wifi-he-network --simulationTime=0.3 --frequency=5 --useRts=1 --minExpectedThroughput=6 --maxExpectedThroughput=745" --command-template "perf record -o ./perf.data --call-graph dwarf --event cycles,cache-misses,branch-misses --sample-cpu %s" --no-build
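Both ``--command-template`` examples (``"heaptrack %s"`` earlier and the ``perf record ... %s`` one above) rely on the ``%s`` placeholder being replaced by the program invocation. The following is a minimal sketch of that substitution, assuming plain ``%`` formatting followed by shell-style splitting with ``shlex``; it is not necessarily how ``ns3`` does it internally, and ``expand_command_template`` is a hypothetical name.

```python
import shlex


def expand_command_template(template, program, program_args=()):
    """Replace the '%s' placeholder of a command template and split it into argv.

    Assumption: the template contains exactly one '%s' and no other '%' signs.
    Each argument is quoted first, so arguments containing spaces survive the split.
    """
    invocation = " ".join(shlex.quote(arg) for arg in [program, *program_args])
    return shlex.split(template % invocation)


if __name__ == "__main__":
    argv = expand_command_template(
        "perf record -o ./perf.data --call-graph dwarf --sample-cpu %s",
        "build/examples/wireless/ns3-dev-wifi-he-network-default",
        ["--simulationTime=0.3", "--frequency=5"],
    )
    print(argv)
```
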
For ease of use, ``ns3`` also provides the ``--perf`` run option, which
includes the recommended settings.

.. sourcecode:: console

  ~/ns-3-dev$ ./ns3 run "wifi-he-network --simulationTime=0.3 --frequency=5 --useRts=1 --minExpectedThroughput=6 --maxExpectedThroughput=745" --perf --no-build

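Judging from the ``run_step`` changes in the ``ns3`` diff, ``--perf`` prepends ``perf record --call-graph dwarf -a -e`` and a fixed event list to the program command. The sketch below rebuilds that prefix; ``perf_prefix`` is a hypothetical name, and ``perf`` is assumed to be on ``PATH`` (``ns3`` itself resolves the path with ``check_program_installed``).

```python
PERF_EVENTS = [
    "cache-misses", "branch-misses", "cpu-cycles",
    "stalled-cycles-frontend", "stalled-cycles-backend", "context-switches",
]


def perf_prefix(perf_path="perf", events=PERF_EVENTS):
    """Return the argv prefix used to record a program with perf.

    --call-graph dwarf: unwind call stacks using DWARF debug information
    -a: record on all CPUs (system-wide)
    -e: comma-separated list of events to sample
    """
    return [perf_path, "record", "--call-graph", "dwarf", "-a", "-e", ",".join(events)]


if __name__ == "__main__":
    argv = perf_prefix() + ["build/examples/wireless/ns3-dev-wifi-he-network-default"]
    print(" ".join(argv))
```

Because ``-a`` records on every CPU, the profile also covers other processes running at the same time, and system-wide recording is restricted for unprivileged users unless ``perf_event_paranoid`` is 0 or lower.
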
When running for the first time, you may receive the following error:
.. sourcecode:: console

  ~/ns-3-dev$ ./ns3 run "wifi-he-network --simulationTime=0.3 --frequency=5 --useRts=1 --minExpectedThroughput=6 --maxExpectedThroughput=745" --perf --no-build
  Error:
  Access to performance monitoring and observability operations is limited.
  Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
  access to performance monitoring and observability operations for processes
  without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
  More information can be found at 'Perf events and tool security' document:
  https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
  perf_event_paranoid setting is 1:
    -1: Allow use of (almost) all events by all users
        Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
  >= 0: Disallow raw and ftrace function tracepoint access
  >= 1: Disallow CPU event access
  >= 2: Disallow kernel profiling
  To make the adjusted perf_event_paranoid setting permanent preserve it
  in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
  Command 'build/examples/wireless/ns3-dev-wifi-he-network-default record --call-graph dwarf -a -e cache-misses,branch-misses,cpu-cycles,instructions,context-switches build/examples/wireless/ns3-dev-wifi-he-network-default -n=100' returned non-zero exit status 255.

This error occurs because the user lacks permission to access performance events from the kernel and CPU.
As the message suggests, access can be granted for the current session
by lowering the ``perf_event_paranoid`` setting, for example with ``echo 0 > /proc/sys/kernel/perf_event_paranoid``.
The change can be made permanent by setting ``kernel.perf_event_paranoid`` in ``/etc/sysctl.conf``, but
this is not recommended. Administrative permissions (e.g. ``sudo su``) are required in both cases.
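The current level can be checked before profiling. The sketch below reads ``/proc/sys/kernel/perf_event_paranoid`` (Linux only) and interprets it with the thresholds listed in the error message; it assumes that the system-wide recording (``-a``) used by ``--perf`` needs a level of 0 or lower for an unprivileged user, and both function names are hypothetical.

```python
from pathlib import Path

PARANOID_FILE = Path("/proc/sys/kernel/perf_event_paranoid")


def read_paranoid_level(path=PARANOID_FILE):
    """Return the perf_event_paranoid level, or None if it cannot be read."""
    try:
        return int(path.read_text().strip())
    except (OSError, ValueError):
        return None


def allows_system_wide_profiling(level):
    """Interpret the level using the thresholds printed by perf.

    -1: (almost) all events allowed, >= 0: no raw/ftrace tracepoints,
    >= 1: no CPU event access (needed by 'perf record -a'), >= 2: no kernel profiling.
    """
    return level is not None and level <= 0


if __name__ == "__main__":
    level = read_paranoid_level()
    if allows_system_wide_profiling(level):
        print(f"perf_event_paranoid={level}: system-wide recording should work")
    else:
        print(f"perf_event_paranoid={level}: lower it to 0 as root, or use CAP_PERFMON")
```
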
After the program finishes, perf prints its recording statistics below the simulation output:

.. sourcecode:: console

  MCS value Channel width GI Throughput
  0 20 MHz 3200 ns 6.01067 Mbit/s
  0 20 MHz 1600 ns 5.936 Mbit/s
  ...
  11 160 MHz 1600 ns 493.397 Mbit/s
  11 160 MHz 800 ns 534.016 Mbit/s
  [ perf record: Woken up 9529 times to write data ]
  Warning:
  Processed 517638 events and lost 94 chunks!
  Check IO/CPU overload!
  Warning:
  1 out of order events recorded.
  [ perf record: Captured and wrote 2898,307 MB perf.data (436509 samples) ]

Results saved in ``perf.data`` can be reviewed with the ``perf report`` command.
`Hotspot`_ is a GUI for Perf that makes performance profiling more
enjoyable and productive. It can parse the ``perf.data`` file and show in

ns3

@@ -30,7 +30,7 @@ def exit_handler(dry_run):
return
if print_buffer == "":
return
print_buffer = print_buffer.replace('\\','/').replace('//','/').replace('/', os.sep)
print_buffer = print_buffer.replace('\\', '/').replace('//', '/').replace('/', os.sep)
if dry_run:
print("The following commands would be executed:")
elif run_verbose:
@@ -245,6 +245,15 @@ def parse_args(argv):
    parser_run.add_argument('-g', '--valgrind',
                            help='Change the default command template to run programs with valgrind',
                            action="store_true", default=None)
    parser_run.add_argument('--memray',
                            help='Use Memray memory profiler for Python scripts. Output will be saved to memray.output',
                            action="store_true", default=None)
    parser_run.add_argument('--heaptrack',
                            help='Use Heaptrack memory profiler for C++',
                            action="store_true", default=None)
    parser_run.add_argument('--perf',
                            help='Use Linux\'s perf to profile a program',
                            action="store_true", default=None)
    parser_run.add_argument('--vis', '--visualize',
                            help='Modify --run arguments to enable the visualizer',
                            action="store_true", dest="visualize", default=None)
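The new options follow the same pattern as ``--valgrind``: ``store_true`` flags whose default is ``None``. The self-contained sketch below attaches only these definitions to a stand-alone ``run`` subparser (the surrounding parser setup and the positional ``program`` argument are assumptions) to show what they produce when parsed.

```python
import argparse


def build_run_parser():
    """Stand-alone approximation of the 'run' subcommand's profiler flags."""
    parser = argparse.ArgumentParser(prog="ns3")
    subparsers = parser.add_subparsers(dest="command")
    parser_run = subparsers.add_parser("run")
    parser_run.add_argument("program")
    parser_run.add_argument("--memray",
                            help="Use Memray memory profiler for Python scripts. "
                                 "Output will be saved to memray.output",
                            action="store_true", default=None)
    parser_run.add_argument("--heaptrack",
                            help="Use Heaptrack memory profiler for C++",
                            action="store_true", default=None)
    parser_run.add_argument("--perf",
                            help="Use Linux's perf to profile a program",
                            action="store_true", default=None)
    return parser


if __name__ == "__main__":
    args = build_run_parser().parse_args(["run", "wifi-he-network", "--perf"])
    print(args.perf, args.heaptrack, args.memray)  # True None None
```

With ``default=None``, a flag that was not given stays ``None`` instead of becoming ``False``.
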
@@ -706,17 +715,16 @@ def get_program_shortcuts(build_profile, ns3_version):
# Add an additional shortcut with .exe suffix when running on Windows
if sys.platform == "win32":
ns3_program_map[shortcut_path.replace("\\","/")] = [program]
ns3_program_map[shortcut_path+".exe"] = [program]
ns3_program_map[shortcut_path.replace("\\","/")+".exe"] = [program]
ns3_program_map[shortcut_path.replace("\\", "/")] = [program]
ns3_program_map[shortcut_path + ".exe"] = [program]
ns3_program_map[shortcut_path.replace("\\", "/") + ".exe"] = [program]
if source_shortcut:
cc_shortcut_path = shortcut_path + ".cc"
ns3_program_map[cc_shortcut_path] = [program]
if sys.platform == "win32":
ns3_program_map[cc_shortcut_path] = [program]
ns3_program_map[cc_shortcut_path.replace("\\","/")] = [program]
ns3_program_map[cc_shortcut_path.replace("\\", "/")] = [program]
# Store longest shortcut path for collisions
if cc_shortcut_path not in longest_shortcut_map:
@@ -1008,6 +1016,7 @@ def check_program_installed(program_name: str) -> str:
exit(-1)
return program_path
def check_module_installed(module_name: str):
import importlib
try:
@@ -1016,6 +1025,7 @@ def check_module_installed(module_name: str):
print("Python module '{module}' was not found".format(module=module_name))
exit(-1)
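Only fragments of ``check_program_installed`` and ``check_module_installed`` appear in this diff. As a rough stand-in, assuming ``shutil.which`` for executables and ``importlib.util.find_spec`` for modules (the real bodies may differ), the checks that guard ``--heaptrack``, ``--perf`` and ``--memray`` could be written as below. ``find_program`` and ``module_available`` are hypothetical names, and unlike the originals they return a value instead of calling ``exit(-1)``.

```python
import importlib.util
import shutil


def find_program(program_name):
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(program_name)


def module_available(module_name):
    """Return True if a top-level Python module can be imported."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for tool in ("heaptrack", "perf", "valgrind"):
        print(tool, "->", find_program(tool) or "not found")
    print("memray importable:", module_available("memray"))
```
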
def run_step(args, target_to_run, target_args):
libdir = "%s/lib" % out_dir
@@ -1046,13 +1056,23 @@ def run_step(args, target_to_run, target_args):
target_args = [target_to_run] + target_args
target_to_run = "python3"
# running with memray?
if args.memray:
check_module_installed("memray")
target_args = ["-m", "memray", "run", "-o", "memray.output", "--native"] + target_args
# running from ns-3-dev (ns3_path) or cwd
if args.cwd:
working_dir = args.cwd
# running with heaptrack?
if args.heaptrack:
debugging_software.append(check_program_installed("heaptrack"))
# running valgrind?
if args.valgrind:
debugging_software.extend([check_program_installed("valgrind"), "--leak-check=full", "--show-leak-kinds=all"])
debugging_software.extend(
[check_program_installed("valgrind"), "--leak-check=full", "--show-leak-kinds=all"])
# running gdb?
if args.gdb:
@@ -1065,6 +1085,14 @@ def run_step(args, target_to_run, target_args):
    if args.lldb:
        debugging_software.extend([check_program_installed("lldb"), "--"])

    # running with perf?
    if args.perf:
        debugging_software.extend([
            check_program_installed("perf"),
            "record", "--call-graph", "dwarf", "-a", "-e",
            "cache-misses,branch-misses,cpu-cycles,stalled-cycles-frontend,stalled-cycles-backend,context-switches"
        ])

    # running with the visualizer?
    if args.visualize:
        target_args.append("--SimulatorImplementationType=ns3::VisualSimulatorImpl")
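Taken together, the ``run_step`` hunks above build the final command from an optional profiler prefix (``heaptrack``, ``valgrind``, ``perf``, ...), the program or ``python3``, and the arguments, with ``--memray`` injecting ``-m memray run -o memray.output --native`` in front of the script. The function below is a simplified, hypothetical reconstruction for ``--memray`` and ``--heaptrack`` only; recognizing Python targets by their ``.py`` suffix and placing the prefix first are assumptions, since the code that joins the pieces is not part of this diff.

```python
def assemble_command(target, target_args, memray=False, heaptrack=False):
    """Simplified sketch of how run_step combines a target with profilers.

    Assumptions: Python targets are recognized by their '.py' suffix, and the
    profiler prefix is placed in front of the program and its arguments.
    """
    target_args = list(target_args)
    prefix = []
    if target.endswith(".py"):
        # Python scripts are executed through the interpreter
        target_args = [target] + target_args
        target = "python3"
        if memray:
            # Memray runs as a module and also traces native allocations
            target_args = ["-m", "memray", "run", "-o", "memray.output", "--native"] + target_args
    if heaptrack:
        # Heaptrack wraps the program binary
        prefix.append("heaptrack")
    return prefix + [target] + target_args


if __name__ == "__main__":
    print(assemble_command("sample-rng-plot.py", [], memray=True))
    print(assemble_command("build/examples/wireless/ns3-dev-wifi-he-network-default",
                           ["--simulationTime=0.3"], heaptrack=True))
```
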
@@ -1099,10 +1127,10 @@ def run_step(args, target_to_run, target_args):
try:
subprocess.run(program_arguments, env=proc_env, cwd=working_dir, shell=use_shell, check=True)
except subprocess.CalledProcessError as e:
# Replace full path to binary to relative path
e.cmd[0] = os.path.relpath(target_to_run, ns3_path)
# Replace list of arguments with a single string
e.cmd = " ".join(e.cmd)
# Replace full path to binary to relative path
e.cmd = e.cmd.replace(os.path.abspath(target_to_run), os.path.relpath(target_to_run, ns3_path))
# Print error message and forward the return code
print(e)
exit(e.returncode)
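The reworked ``CalledProcessError`` handler matters when a profiler prefix is present. With ``--perf`` or ``--heaptrack``, the first element of ``e.cmd`` is the profiler rather than the program, so overwriting ``e.cmd[0]`` replaced the profiler's name with the program path; the perf error quoted in the documentation, whose command starts with ``build/examples/wireless/ns3-dev-wifi-he-network-default record``, is consistent with that. The sketch below contrasts both approaches on made-up paths (``/home/user/ns-3-dev`` is hypothetical, as are the function names) and uses ``posixpath`` so it behaves the same on any OS.

```python
import posixpath


def relative_cmd_old(cmd, target, ns3_path):
    """Previous behavior: always rewrite the first element of the command."""
    cmd = list(cmd)
    cmd[0] = posixpath.relpath(target, ns3_path)
    return " ".join(cmd)


def relative_cmd_new(cmd, target, ns3_path):
    """New behavior: replace the absolute target path wherever it appears."""
    joined = " ".join(cmd)
    return joined.replace(posixpath.abspath(target), posixpath.relpath(target, ns3_path))


if __name__ == "__main__":
    ns3_path = "/home/user/ns-3-dev"
    target = ns3_path + "/build/examples/wireless/ns3-dev-wifi-he-network-default"
    cmd = ["/usr/bin/perf", "record", "-a", target, "--simulationTime=0.3"]
    print(relative_cmd_old(cmd, target, ns3_path))  # perf is lost, absolute path stays
    print(relative_cmd_new(cmd, target, ns3_path))  # perf kept, program path relative
```
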
@@ -1130,7 +1158,7 @@ def non_ambiguous_program_target_list(programs: dict) -> list:
def print_targets_list(ns3_modules: list, ns3_programs: dict) -> None:
def list_to_table(l: list) -> str:
def list_to_table(targets_list: list) -> str:
# Set column width and check how much space is left at the end
columnwidth = 30
try:
@@ -1140,10 +1168,10 @@ def print_targets_list(ns3_modules: list, ns3_programs: dict) -> None:
dead_space = terminal_width % columnwidth
# Filter the targets with names longer than the column width
large_items = list(filter(lambda x: len(x) >= columnwidth, l))
large_items = list(filter(lambda x: len(x) >= columnwidth, targets_list))
# Then filter the targets with names shorter than the column width
small_items = sorted(list(set(l) - set(large_items)))
small_items = sorted(list(set(targets_list) - set(large_items)))
prev_new_line = 0
output = "\n"