Bottleneck in 072 and 074 fuzzers
The data dumping process for fuzzers 072 and 074 accounts for a large share of the run-time, especially for big parts (e.g. Artix 200T).
For fuzzer 074, the run-time needed to get the data is split between the tiles and nodes jobs:
- Vivado start time: 10:48:25
- Tiles job start time: 10:48:34
- Tiles job end time / Nodes job start time: 11:32:58
- Nodes job end time: 11:36:14
- Vivado end time / reduction start time: 11:36:14
The above numbers are for the zynq7010 part.
This is an issue, as it prevents scaling to bigger parts. We need a more efficient way to dump all the data required by the reduction step.

Hi, the last comments may be a bit old, but the issue is still real 😉 This applies especially to 074; I have not looked at the results of 072. My observations come from experimenting with Virtex-7, chip 330T, the smallest one of that series.
Disk usage of 074 is 82 GB, and it is almost exclusively the 174k very tiny json5 files. I did an experiment: concatenate all of these and compress with lz4 at the fastest compression level => the result is one 4 GB file (compared to 40+ GB of file contents and 82 GB of actual disk usage). So a reduction of 40x. Given the very low CPU usage during most of 074 (1-3%, with peaks at 8% of one CPU), I think that at least one of the issues is disk access. Yes, I have a spinning HDD so this is exacerbated, but at least the issue is revealed 😉
Looking casually into the Python code, these json files appear to be read in bulk with processing interleaved, so the packed+compressed storage could also make sense for the CPU. Everything would fit cached in RAM, too 😃 Perhaps use one compressed file per type of FPGA element (slice, DSP, PIP, etc.) if that better fits how the code accesses them, no problem.
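To make the idea concrete, here is a minimal sketch of such packed+compressed storage, assuming the python-lz4 package. The directory layout, file names and record structure below are made up for illustration; the real fuzzer output tree may differ.

```python
# Sketch: pack many tiny .json5 dumps into one lz4-compressed stream,
# one JSON record per line, so the reduction step can iterate over it
# without touching 174k individual files on disk.
import json
from pathlib import Path

import lz4.frame


def pack_json_files(src_dir: str, out_file: str) -> None:
    """Concatenate all small .json5 dumps into a single lz4 archive."""
    with lz4.frame.open(out_file, mode="wt", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.json5")):
            record = {"name": path.name, "data": path.read_text()}
            out.write(json.dumps(record) + "\n")


def iter_packed(archive: str):
    """Stream the records back, one at a time, for the reduction step."""
    with lz4.frame.open(archive, mode="rt", encoding="utf-8") as inp:
        for line in inp:
            record = json.loads(line)
            yield record["name"], record["data"]
```

The same pattern could be applied per element type (one archive for slices, one for PIPs, and so on) if that matches the access pattern of the reduction code better.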
Another scalability issue: I monitored RAM usage and saw up to 66.5 GB of virtual memory. Given the raw number of FPGA elements and configuration bits, this looks excessive to me. Casually looking into the Python code again, I think the issue is in how the database is represented. Don't hesitate to tell me if I'm wrong, but generic maps indexed by strings are very costly in Python in terms of RAM usage (and, indirectly, speed). Converting these computations to C++ could be appropriate, though only after evaluating the packed+compressed disk storage.
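As a rough illustration of the per-object overhead of string-keyed dicts, here is a small comparison with a slotted class plus interned strings. Class and field names are invented for the example; this is not the actual database layout.

```python
# Compare a per-node dict with a slotted, frozen dataclass.
# Requires Python >= 3.10 for dataclass(slots=True).
import sys
from dataclasses import dataclass

# Typical pattern: one dict per node, with the same key strings repeated.
node_as_dict = {"tile": "CLBLL_L_X2Y3", "wire": "CLBLL_LL_A1", "speed_index": 42}


@dataclass(frozen=True, slots=True)
class NodeWire:
    tile: str
    wire: str
    speed_index: int


# sys.intern() makes repeated tile/wire names share a single string object.
node_as_slots = NodeWire(sys.intern("CLBLL_L_X2Y3"), sys.intern("CLBLL_LL_A1"), 42)

print(sys.getsizeof(node_as_dict))   # per-instance dict overhead
print(sys.getsizeof(node_as_slots))  # fixed-size slotted object, much smaller
```

Multiplied by millions of wires/PIPs, that per-instance difference is where a large part of the 66.5 GB presumably goes.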
EDIT: fuzzer 074 took 20 days to finish xD There was some swapping involved, hence my focus on 074.
What do you think of these observations?
Ok, so rather than writing out the full timing info, just write the speed index. Then merge all the tile jsons (i.e. merge the speed indices), and finally create a tcl script to back-annotate the speed indices with the timing data originally dumped by the tcl script.
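A minimal sketch of the merge step, assuming each tile json carries a per-wire "speed_index" field; the file naming and json structure here are assumptions, not the actual fuzzer output format. The tcl back-annotation step would then consume the merged file.

```python
# Sketch: collect only the speed indices from the per-tile json dumps
# and merge them into one file for later back-annotation.
import json
from pathlib import Path


def merge_speed_indices(tile_json_dir: str, out_file: str) -> None:
    merged = {}
    for path in sorted(Path(tile_json_dir).glob("tile_*.json")):
        tile = json.loads(path.read_text())
        # Assumed structure: {"wires": {wire_name: {"speed_index": N}, ...}}
        for wire, info in tile.get("wires", {}).items():
            merged[wire] = info["speed_index"]
    Path(out_file).write_text(json.dumps(merged, indent=1))
```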