Speed accessing single resource
Hi,
I’m really impressed by classgraph - classloading is tricky, and I’m glad to see a library like this make it easier 😃.
Is accessing a single non-class resource via classgraph an intended use-case? It’s clearly possible, but I’m finding that it is about an order of magnitude slower than loading directly from a ClassLoader (on a variety of classpath sizes), and startup speed is an important concern for my current situation. I’ve tried a variety of whitelisting options and disabled different types of scanning, but I’m not seeing the speed that I’m hoping to achieve.
private void processResourceClassloader(ClassLoader loader, String name) throws IOException {
    final Enumeration<URL> urls = loader.getResources(name);
    while (urls.hasMoreElements()) {
        final URL url = urls.nextElement();
        try (final InputStream in = url.openStream()) {
            doFastOperation(in);
        }
    }
}

private void processResourceClassgraph(ClassLoader loader, String name) throws IOException {
    try (final ScanResult scan = new ClassGraph()
            .overrideClassLoaders(loader)
            .disableNestedJarScanning()
            .disableModuleScanning()
            .whitelistClasspathElementsContainingResourcePath(name)
            //.whitelistPaths(name)
            .scan()) {
        for (final Resource resource : scan.getResourcesWithPath(name)) {
            try (final InputStream in = resource.open()) {
                doFastOperation(in);
            }
        }
    }
}
Issue Analytics
- State:
- Created 4 years ago
- Comments:8 (6 by maintainers)
Top GitHub Comments
I’ve done a bunch of profiling / JVM optimizations over the years. JMH is a good tool for micro-benchmarking, and probably the best place to start, but it might not be the best for capturing startup speed behavior. Trying to profile application startup (or cold-path code) is a tougher problem. There are ways to attach JVM agents (https://github.com/jvm-profiling-tools/perf-map-agent is one example), or Java Flight Recorder, to get timing data out, which can then be analyzed using a number of different methods (flame graphs are popular, but I don’t have too much experience with that style).
I’m happy to help dig in as timing permits. Cheers!
I did some extensive profiling of ClassGraph, and found a number of opportunities for optimization, mostly in how zipfile entries were being parsed (e.g. it was possible to defer parsing of modification date and permission information until the user requests that info for a Resource). ClassGraph is a bit faster now. (Released in 4.8.55.)
However, unfortunately I don’t see any other major avenues for optimization. ClassGraph has to do additional work on startup that a regular classloader does not, for example starting up a thread pool, querying all classloaders using reflection to find classpath entries, and reading any directories, jarfile central directory entries, and modules on the classpath and module path for every scan performed (whereas classloaders can cache this information).
I tried to create an apples-to-apples comparison, by copying a jarfile 500x on disk (to prevent a Java classloader from cheating by caching the jarfile central directory info somewhere, either in the JRE, or the classloader, or the classloader’s parent). ClassGraph was called with overrideClassLoaders(classLoader) for a single target classLoader, which disables scanning of modules and other classloaders.
The results indicate that ClassGraph 4.8.55 is about 4.5x slower than URLClassLoader at loading a single resource (this varies between 3.5-7x for large jars). I suspect the performance gap would shrink as you load more and more resources from the same jars, since the cost of reading the jarfile central directory can be amortized across all the resources read.
Right now the biggest bottleneck in ClassGraph is a method that reads an unsigned short from a byte[] array using ((arr[i + 1] & 0xff) << 8) | (arr[i] & 0xff), in order to read the jarfile central directory header for the jar. This takes almost 30% of the total scan time, and collectively all the central directory parsing code takes about half the total scan time. There’s no way to speed this up other than rewriting in native code, unfortunately. The JRE’s own ZipFile.getEntry() does this work in native code, which is one of the main speed advantages. Unfortunately the JRE method can’t be used, because the ZipFile API uses a mutex around all API calls, and ClassGraph is parallelized.
So I think this comes down to a question of whether you value ClassGraph’s flexibility (its ability to work with a wide range of different classloaders and classpath specification mechanisms), or the JRE’s own speed advantage, obtained by calling into native code.
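For readers unfamiliar with the expression quoted above, here is a minimal standalone sketch of a little-endian unsigned 16-bit read from a byte[]. The class and method names are hypothetical and are not ClassGraph's actual code; the ByteBuffer variant is included only as a cross-check of the bit arithmetic.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LittleEndianShort {
    /** Reads a little-endian unsigned 16-bit value starting at arr[i]. */
    public static int readUnsignedShort(byte[] arr, int i) {
        // Java bytes are signed, so mask each one with 0xff to undo sign
        // extension before combining: low byte first, high byte shifted by 8.
        return ((arr[i + 1] & 0xff) << 8) | (arr[i] & 0xff);
    }

    /** Same read via ByteBuffer, used here only to cross-check the result. */
    public static int readUnsignedShortViaBuffer(byte[] arr, int i) {
        return ByteBuffer.wrap(arr)
                .order(ByteOrder.LITTLE_ENDIAN)
                .getShort(i) & 0xffff; // getShort is signed; mask to unsigned
    }

    public static void main(String[] args) {
        // Zip structures are little-endian; these bytes are illustrative only.
        byte[] data = {0x50, 0x4b, (byte) 0xff, (byte) 0xff};
        System.out.println(Integer.toHexString(readUnsignedShort(data, 0)));
        System.out.println(readUnsignedShort(data, 2)
                == readUnsignedShortViaBuffer(data, 2));
    }
}
```

The operation itself is only a couple of masks and a shift; per the comment above, the cost comes from how often it runs while parsing central directory entries, not from any single call.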
However, if you are performing more than one scan, you can at least save some startup time in ClassGraph by providing your own ExecutorService to the scan method. (Don’t forget to shut down the ExecutorService when you have finished with it.) More than 1 thread probably won’t help if you’re pulling a single resource from a single jarfile, but if you’re pulling multiple resources from multiple jarfiles, or additionally scanning classfiles, more threads should help. You can see this usage pattern in the benchmark code below.
Unfortunately I think that’s about all that can be done for now unless native code is written to accelerate zipfile central directory parsing. Sorry about that!
You’re welcome to profile this further though if you want to see if anything else can be sped up for your specific usecase, and I’ll reopen this if you find more things that can be optimized.
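The ExecutorService suggestion above can be sketched with the standard library alone. This is not the benchmark code from the thread: the class, method, and resource names are hypothetical, and the per-job body is a stand-in. With ClassGraph, each job would instead call the scan overload that accepts an ExecutorService and a thread count.

```java
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SharedExecutorSketch {
    /** Runs one job per name on a single shared pool; returns jobs completed. */
    public static int runJobs(List<String> names, int numThreads) {
        // Create the pool once, so thread startup is not paid again per job.
        ExecutorService executor = Executors.newFixedThreadPool(numThreads);
        int completed = 0;
        try {
            for (String name : names) {
                // Stand-in for a scan (assumption): with ClassGraph this would
                // be roughly new ClassGraph()...scan(executor, numThreads).
                Callable<Integer> job = name::length;
                executor.submit(job).get();
                completed++;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("job failed", e);
        } finally {
            // The caller owns the pool, so the caller must shut it down.
            executor.shutdown();
            try {
                executor.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return completed;
    }

    public static void main(String[] args) {
        System.out.println(runJobs(List.of("a.properties", "b/c.xml"), 1));
    }
}
```

The point of the pattern is ownership: whoever creates the pool reuses it across scans and shuts it down in a finally block, rather than letting each scan create and tear down its own threads.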