Exclude Pulsar Functions Worker dependencies from Pulsar IO .nar files

Is your enhancement request related to a problem? Please describe.

The Pulsar IO .nar files are currently very large: together they take up 1952MB. Breakdown: https://gist.github.com/lhotari/810a543524e25457b521ac666913ad3c

Describe the solution you’d like

Exclude all Pulsar Functions Worker dependencies from Pulsar IO .nar files.

For example, these are the dependencies bundled in the pulsar-io-data-generator .nar file:

$ unzip -l ~/.m2/repository/org/apache/pulsar/pulsar-io-data-generator/2.8.0-SNAPSHOT/pulsar-io-data-generator-2.8.0-SNAPSHOT.nar |grep META-INF/bundled-dependencies | sort -k 4,4
        0  02-12-2021 07:04   META-INF/bundled-dependencies/
   183117  02-12-2021 07:04   META-INF/bundled-dependencies/aircompressor-0.16.jar
     4467  02-12-2021 07:04   META-INF/bundled-dependencies/aopalliance-1.0.jar
   449146  02-12-2021 07:04   META-INF/bundled-dependencies/async-http-client-2.12.1.jar
     9909  02-12-2021 07:04   META-INF/bundled-dependencies/async-http-client-netty-utils-2.12.1.jar
   566992  02-12-2021 07:04   META-INF/bundled-dependencies/avro-1.9.1.jar
    25683  02-12-2021 07:04   META-INF/bundled-dependencies/avro-protobuf-1.9.1.jar
   887800  02-12-2021 07:04   META-INF/bundled-dependencies/bcpkix-jdk15on-1.68.jar
  6031548  02-12-2021 07:04   META-INF/bundled-dependencies/bcprov-ext-jdk15on-1.68.jar
  5961178  02-12-2021 07:04   META-INF/bundled-dependencies/bcprov-jdk15on-1.68.jar
   146056  02-12-2021 07:04   META-INF/bundled-dependencies/bookkeeper-common-4.12.1.jar
    16852  02-12-2021 07:04   META-INF/bundled-dependencies/bookkeeper-common-allocator-4.12.1.jar
    19351  02-12-2021 07:04   META-INF/bundled-dependencies/bookkeeper-stats-api-4.12.1.jar
 11082557  02-12-2021 07:04   META-INF/bundled-dependencies/bouncy-castle-bc-2.8.0-SNAPSHOT-pkg.jar
   214381  02-12-2021 07:04   META-INF/bundled-dependencies/checker-qual-3.5.0.jar
    65366  02-12-2021 07:04   META-INF/bundled-dependencies/circe-checksum-4.12.1.jar
   284184  02-12-2021 07:04   META-INF/bundled-dependencies/commons-codec-1.10.jar
   615064  02-12-2021 07:04   META-INF/bundled-dependencies/commons-compress-1.19.jar
   362679  02-12-2021 07:04   META-INF/bundled-dependencies/commons-configuration-1.10.jar
   208700  02-12-2021 07:04   META-INF/bundled-dependencies/commons-io-2.5.jar
   284220  02-12-2021 07:04   META-INF/bundled-dependencies/commons-lang-2.6.jar
   494856  02-12-2021 07:04   META-INF/bundled-dependencies/commons-lang3-3.6.jar
    61829  02-12-2021 07:04   META-INF/bundled-dependencies/commons-logging-1.2.jar
  2213560  02-12-2021 07:04   META-INF/bundled-dependencies/commons-math3-3.6.1.jar
    23508  02-12-2021 07:04   META-INF/bundled-dependencies/cpu-affinity-4.12.1.jar
    13879  02-12-2021 07:04   META-INF/bundled-dependencies/error_prone_annotations-2.3.4.jar
     4617  02-12-2021 07:04   META-INF/bundled-dependencies/failureaccess-1.0.1.jar
   240255  02-12-2021 07:04   META-INF/bundled-dependencies/gson-2.8.6.jar
  2862361  02-12-2021 07:04   META-INF/bundled-dependencies/guava-30.1-jre.jar
   674028  02-12-2021 07:04   META-INF/bundled-dependencies/guice-4.1.0.jar
    42873  02-12-2021 07:04   META-INF/bundled-dependencies/guice-assistedinject-4.1.0.jar
    45012  02-12-2021 07:04   META-INF/bundled-dependencies/iban4j-3.2.1.jar
     8781  02-12-2021 07:04   META-INF/bundled-dependencies/j2objc-annotations-1.3.jar
    68167  02-12-2021 07:04   META-INF/bundled-dependencies/jackson-annotations-2.11.1.jar
   351575  02-12-2021 07:04   META-INF/bundled-dependencies/jackson-core-2.11.1.jar
  1419800  02-12-2021 07:04   META-INF/bundled-dependencies/jackson-databind-2.11.1.jar
    46983  02-12-2021 07:04   META-INF/bundled-dependencies/jackson-dataformat-yaml-2.11.1.jar
    79295  02-12-2021 07:04   META-INF/bundled-dependencies/jackson-module-jsonSchema-2.11.1.jar
   780265  02-12-2021 07:04   META-INF/bundled-dependencies/javassist-3.25.0-GA.jar
    78030  02-12-2021 07:04   META-INF/bundled-dependencies/javax.activation-1.2.0.jar
     2497  02-12-2021 07:04   META-INF/bundled-dependencies/javax.inject-1.jar
   127509  02-12-2021 07:04   META-INF/bundled-dependencies/javax.ws.rs-api-2.1.jar
     2254  02-12-2021 07:04   META-INF/bundled-dependencies/jcip-annotations-1.0.jar
   252020  02-12-2021 07:04   META-INF/bundled-dependencies/jctools-core-2.1.2.jar
   566323  02-12-2021 07:04   META-INF/bundled-dependencies/jetty-util-9.4.35.v20201120.jar
   273528  02-12-2021 07:04   META-INF/bundled-dependencies/jfairy-0.5.9.jar
   640724  02-12-2021 07:04   META-INF/bundled-dependencies/joda-time-2.10.1.jar
    19936  02-12-2021 07:04   META-INF/bundled-dependencies/jsr305-3.0.2.jar
     2199  02-12-2021 07:04   META-INF/bundled-dependencies/listenablefuture-9999.0-empty-to-avoid-conflict-with-guava.jar
    24995  02-12-2021 07:04   META-INF/bundled-dependencies/memory-0.8.3.jar
   289921  02-12-2021 07:04   META-INF/bundled-dependencies/netty-buffer-4.1.51.Final.jar
   320174  02-12-2021 07:04   META-INF/bundled-dependencies/netty-codec-4.1.51.Final.jar
    61345  02-12-2021 07:04   META-INF/bundled-dependencies/netty-codec-dns-4.1.51.Final.jar
    36193  02-12-2021 07:04   META-INF/bundled-dependencies/netty-codec-haproxy-4.1.51.Final.jar
   617948  02-12-2021 07:04   META-INF/bundled-dependencies/netty-codec-http-4.1.51.Final.jar
   625057  02-12-2021 07:04   META-INF/bundled-dependencies/netty-common-4.1.51.Final.jar
   456702  02-12-2021 07:04   META-INF/bundled-dependencies/netty-handler-4.1.51.Final.jar
    21842  02-12-2021 07:04   META-INF/bundled-dependencies/netty-reactive-streams-2.0.4.jar
    33158  02-12-2021 07:04   META-INF/bundled-dependencies/netty-resolver-4.1.51.Final.jar
   151765  02-12-2021 07:04   META-INF/bundled-dependencies/netty-resolver-dns-4.1.51.Final.jar
  4017922  02-12-2021 07:04   META-INF/bundled-dependencies/netty-tcnative-boringssl-static-2.0.33.Final.jar
   473222  02-12-2021 07:04   META-INF/bundled-dependencies/netty-transport-4.1.51.Final.jar
   152317  02-12-2021 07:04   META-INF/bundled-dependencies/netty-transport-native-epoll-4.1.51.Final-linux-x86_64.jar
    33062  02-12-2021 07:04   META-INF/bundled-dependencies/netty-transport-native-unix-common-4.1.51.Final.jar
    56446  02-12-2021 07:04   META-INF/bundled-dependencies/netty-transport-native-unix-common-4.1.51.Final-linux-x86_64.jar
  1660960  02-12-2021 07:04   META-INF/bundled-dependencies/protobuf-java-3.11.4.jar
    73874  02-12-2021 07:04   META-INF/bundled-dependencies/protobuf-java-util-3.11.4.jar
    47021  02-12-2021 07:04   META-INF/bundled-dependencies/pulsar-client-admin-api-2.8.0-SNAPSHOT.jar
   141344  02-12-2021 07:04   META-INF/bundled-dependencies/pulsar-client-api-2.8.0-SNAPSHOT.jar
   657161  02-12-2021 07:04   META-INF/bundled-dependencies/pulsar-client-original-2.8.0-SNAPSHOT.jar
   877274  02-12-2021 07:04   META-INF/bundled-dependencies/pulsar-common-2.8.0-SNAPSHOT.jar
    38477  02-12-2021 07:04   META-INF/bundled-dependencies/pulsar-config-validation-2.8.0-SNAPSHOT.jar
    21681  02-12-2021 07:04   META-INF/bundled-dependencies/pulsar-functions-api-2.8.0-SNAPSHOT.jar
    23202  02-12-2021 07:04   META-INF/bundled-dependencies/pulsar-io-core-2.8.0-SNAPSHOT.jar
    28200  02-12-2021 07:04   META-INF/bundled-dependencies/pulsar-package-core-2.8.0-SNAPSHOT.jar
     9037  02-12-2021 07:04   META-INF/bundled-dependencies/pulsar-transaction-common-2.8.0-SNAPSHOT.jar
    11369  02-12-2021 07:04   META-INF/bundled-dependencies/reactive-streams-1.0.3.jar
   130999  02-12-2021 07:04   META-INF/bundled-dependencies/reflections-0.9.11.jar
   421509  02-12-2021 07:04   META-INF/bundled-dependencies/sketches-core-0.8.3.jar
    41203  02-12-2021 07:04   META-INF/bundled-dependencies/slf4j-api-1.7.25.jar
   284338  02-12-2021 07:04   META-INF/bundled-dependencies/snakeyaml-1.18.jar
    21782  02-12-2021 07:04   META-INF/bundled-dependencies/swagger-annotations-1.6.2.jar
    63777  02-12-2021 07:04   META-INF/bundled-dependencies/validation-api-1.1.0.Final.jar

pulsar-io-data-generator has only one dependency of its own, jfairy. This means that about 45MB of the dependencies bundled in each pulsar-io .nar file are redundant.
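
The split between connector-specific and redundant jars can be checked with a few lines of Java. The following is only a sketch: the class name `NarSizeReport` and the idea of classifying jars by a file-name prefix such as `jfairy-` are assumptions made for illustration. A real check would compare the bundled jars against the actual Pulsar Functions Worker classpath. The sizes reported are uncompressed sizes, the same values that `unzip -l` prints.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class NarSizeReport {
    static final String BUNDLED_DIR = "META-INF/bundled-dependencies/";

    /**
     * Sums the uncompressed sizes of the jars bundled in a .nar file.
     * Returns {bytesKept, bytesOther}: jars whose file name starts with one of
     * keepPrefixes (an illustrative allowlist) versus all other bundled jars.
     */
    static long[] summarize(String narPath, Set<String> keepPrefixes) throws IOException {
        long kept = 0;
        long other = 0;
        try (ZipFile nar = new ZipFile(narPath)) {
            Enumeration<? extends ZipEntry> entries = nar.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (entry.isDirectory() || !name.startsWith(BUNDLED_DIR)) {
                    continue; // only bundled jars are of interest
                }
                String jar = name.substring(BUNDLED_DIR.length());
                if (keepPrefixes.stream().anyMatch(jar::startsWith)) {
                    kept += entry.getSize();
                } else {
                    other += entry.getSize();
                }
            }
        }
        return new long[] {kept, other};
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java NarSizeReport.java <path-to-nar>");
            return;
        }
        // "jfairy-" is the assumed connector-specific prefix for the data generator.
        long[] sizes = summarize(args[0], Set.of("jfairy-"));
        System.out.printf("connector-specific: %d bytes, candidates for exclusion: %d bytes%n",
                sizes[0], sizes[1]);
    }
}
```

Run against the .nar file from the listing above, this would print the two totals in bytes. Any transitive dependencies of jfairy would have to be added to the prefix set by hand.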

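These redundant jars are never used for classloading. Classloaders use parent-first lookups (by default, and also in Pulsar Functions Worker), so any class that the Pulsar Functions Worker’s system classloader can already resolve is loaded from there and the copy bundled in the .nar file is ignored. It is therefore safe to remove all dependencies that are part of Pulsar Functions Worker’s system classloader.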
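
The parent-first behavior can be observed with the JDK alone. The sketch below uses a hypothetical `RecordingLoader` that records every call to `findClass`, the hook a classloader uses to define a class from its own resources. It illustrates the default delegation model of `java.lang.ClassLoader` only. It is not Pulsar's actual .nar classloader, and the class name `com.example.OnlyInNar` is made up.

```java
import java.util.ArrayList;
import java.util.List;

class ParentFirstDemo {
    /** A child loader that records every class name it is asked to define itself. */
    static class RecordingLoader extends ClassLoader {
        final List<String> findCalls = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            // The default loadClass() reaches this method only after the
            // parent chain has failed to resolve the name.
            findCalls.add(name);
            throw new ClassNotFoundException(name);
        }
    }

    public static void main(String[] args) throws Exception {
        RecordingLoader child = new RecordingLoader(ClassLoader.getSystemClassLoader());

        // The parent chain can resolve this class, so the child is never consulted.
        Class<?> resolved = child.loadClass("java.util.ArrayList");
        System.out.println("defined by child? " + (resolved.getClassLoader() == child));
        System.out.println("findClass calls so far: " + child.findCalls);

        // Only a class unknown to the parent chain falls through to the child.
        try {
            child.loadClass("com.example.OnlyInNar"); // hypothetical class name
        } catch (ClassNotFoundException expected) {
            System.out.println("findClass calls now: " + child.findCalls);
        }
    }
}
```

Under this model, a jar bundled in a .nar file is consulted only for classes the parent cannot supply, which is the reasoning behind treating the duplicated jars as dead weight.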

Additional context

Reducing the size of the Pulsar IO .nar files would also help reduce the size of the pulsar-all Docker image. The Pulsar (core) build would benefit as well, although PIP-62 covers moving the Pulsar IO connectors from the apache/pulsar repository to apache/pulsar-connectors.

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 7 (7 by maintainers)

Top GitHub Comments

freeznet commented, Feb 18, 2021 (1 reaction)

@lhotari thanks for your detailed description. @sijie sure, I will create a PR to fix this issue.

tisonkun commented, Dec 7, 2022 (0 reactions)

Closed as stale; it also seems to be resolved. I checked the latest data generator nar, and it is 11M in size.

Please open a new issue if it’s still relevant to the maintained versions.
