Bottleneck in 072 and 074 fuzzers
The data dumping process for fuzzers 072 and 074 accounts for a large share of the run-time, especially for big parts (e.g. Artix 200T).
For fuzzer 074, the run-time needed to get the data is split between the tiles and nodes jobs:
- Vivado start time: 10:48:25
- Tiles job start time: 10:48:34
- Tiles job end time / Nodes job start time: 11:32:58
- Nodes job end time: 11:36:14
- Vivado end time / reduction start time: 11:36:14
The above numbers are for the zynq7010 part.
This is an issue, as it prevents scaling to bigger parts. We need a more efficient way to dump all the data required by the reduction step.

Hi, the last comments may be a bit old, but the issue is still real 😉 This applies especially to 074; I have not looked at the results of 072. My observations come from experimenting with Virtex-7, chip 330T, the smallest one of that series.
Disk usage of 074 is 82 GB, and it is almost exclusively the 174k very tiny json5 files. I did an experiment: concatenate all of these and compress with lz4 at the fastest compression level => the result is one 4 GB file (compared to 40+ GB of file contents and 82 GB of actual disk usage). So a reduction of 40x. Given the very low CPU usage during most of 074 (1-3%, with peaks at 8% of one CPU), I think that at least one of the issues is disk access. Yes, I have a spinning HDD so this is exacerbated, but at least the issue is revealed 😉
Looking casually into the Python code, these json files appear to be read in bulk with processing interleaved, so the packed+compressed storage could also make sense for the CPU. Everything would fit cached in RAM, too 😃 Perhaps use one compressed file per type of FPGA element (slice, DSP, PIP, etc.) if that better fits how the code accesses them, no problem.
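To make the idea concrete, here is a minimal sketch of such packed+compressed storage, assuming the python-lz4 package. The directory layout, file names and record structure below are made up for illustration; the real fuzzer output tree may differ.

```python
# Sketch: pack many tiny .json5 dumps into one lz4-compressed stream,
# one JSON record per line, so the reduction step can iterate over it
# without touching 174k individual files on disk.
import json
from pathlib import Path

import lz4.frame


def pack_json_files(src_dir: str, out_file: str) -> None:
    """Concatenate all small .json5 dumps into a single lz4 archive."""
    with lz4.frame.open(out_file, mode="wt", encoding="utf-8") as out:
        for path in sorted(Path(src_dir).glob("*.json5")):
            record = {"name": path.name, "data": path.read_text()}
            out.write(json.dumps(record) + "\n")


def iter_packed(archive: str):
    """Stream the records back, one at a time, for the reduction step."""
    with lz4.frame.open(archive, mode="rt", encoding="utf-8") as inp:
        for line in inp:
            record = json.loads(line)
            yield record["name"], record["data"]
```

The same pattern could be applied per element type (one archive for slices, one for PIPs, and so on) if that matches the access pattern of the reduction code better.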
Another scalability issue: I monitored RAM usage and saw up to 66.5 GB of virtual memory. Given the raw number of FPGA elements and configuration bits, this looks excessive to me. Casually looking into the Python code again, I think the issue is in how the database is represented. Don't hesitate to tell me if I'm wrong, but generic maps indexed by strings are very costly in Python in terms of RAM usage (and, indirectly, speed). Converting these computations to C++ could be appropriate, though only after evaluating the packed+compressed disk storage.
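As a rough illustration of the per-object overhead of string-keyed dicts, here is a small comparison with a slotted class plus interned strings. Class and field names are invented for the example; this is not the actual database layout.

```python
# Compare a per-node dict with a slotted, frozen dataclass.
# Requires Python >= 3.10 for dataclass(slots=True).
import sys
from dataclasses import dataclass

# Typical pattern: one dict per node, with the same key strings repeated.
node_as_dict = {"tile": "CLBLL_L_X2Y3", "wire": "CLBLL_LL_A1", "speed_index": 42}


@dataclass(frozen=True, slots=True)
class NodeWire:
    tile: str
    wire: str
    speed_index: int


# sys.intern() makes repeated tile/wire names share a single string object.
node_as_slots = NodeWire(sys.intern("CLBLL_L_X2Y3"), sys.intern("CLBLL_LL_A1"), 42)

print(sys.getsizeof(node_as_dict))   # per-instance dict overhead
print(sys.getsizeof(node_as_slots))  # fixed-size slotted object, much smaller
```

Multiplied by millions of wires/PIPs, that per-instance difference is where a large part of the 66.5 GB presumably goes.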
EDIT: fuzzer 074 took 20 days to finish xD There was some swapping involved, hence my focus on 074.
What do you think of these observations?
Ok, so rather than writing out the full timing info, just write the speed index. Then merge all the tile jsons (i.e. merge the speed indices), and finally create a tcl script to back-annotate the speed indices with the timing data originally dumped by the tcl script.
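A minimal sketch of the merge step, assuming each tile json carries a per-wire "speed_index" field; the file naming and json structure here are assumptions, not the actual fuzzer output format. The tcl back-annotation step would then consume the merged file.

```python
# Sketch: collect only the speed indices from the per-tile json dumps
# and merge them into one file for later back-annotation.
import json
from pathlib import Path


def merge_speed_indices(tile_json_dir: str, out_file: str) -> None:
    merged = {}
    for path in sorted(Path(tile_json_dir).glob("tile_*.json")):
        tile = json.loads(path.read_text())
        # Assumed structure: {"wires": {wire_name: {"speed_index": N}, ...}}
        for wire, info in tile.get("wires", {}).items():
            merged[wire] = info["speed_index"]
    Path(out_file).write_text(json.dumps(merged, indent=1))
```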