NullPointerException with multiple network interfaces and NAT
I am getting the following NullPointerException in ice4j when running the Jitsi stable-4627-1 Docker stack. As a result, no communication through the video bridge is possible. I’ve verified that the same happens when replacing the bundled ice4j library with the latest ice4j version (commit 7016902b218fb354a008482a1b0584055c4b9612).
SEVERE: [54] org.ice4j.stack.NetAccessManager.handleFatalError: Unexpected Error!
java.lang.NullPointerException
at org.ice4j.socket.MergingDatagramSocket.initializeActive(MergingDatagramSocket.java:577)
at org.ice4j.ice.ComponentSocket.propertyChange(ComponentSocket.java:174)
at org.ice4j.ice.IceMediaStream.firePairPropertyChange(IceMediaStream.java:870)
at org.ice4j.ice.CandidatePair.nominate(CandidatePair.java:629)
at org.ice4j.ice.Agent.nominate(Agent.java:1847)
at org.ice4j.ice.DefaultNominator.strategyNominateFirstValid(DefaultNominator.java:144)
at org.ice4j.ice.DefaultNominator.propertyChange(DefaultNominator.java:120)
at org.ice4j.ice.IceMediaStream.firePairPropertyChange(IceMediaStream.java:870)
at org.ice4j.ice.CandidatePair.validate(CandidatePair.java:667)
at org.ice4j.ice.IceMediaStream.addToValidList(IceMediaStream.java:668)
at org.ice4j.ice.Agent.validatePair(Agent.java:1811)
at org.ice4j.ice.ConnectivityCheckClient.processSuccessResponse(ConnectivityCheckClient.java:638)
at org.ice4j.ice.ConnectivityCheckClient.processResponse(ConnectivityCheckClient.java:405)
at org.ice4j.stack.StunClientTransaction.handleResponse(StunClientTransaction.java:314)
at org.ice4j.stack.StunStack.handleMessageEvent(StunStack.java:1040)
at org.ice4j.stack.MessageProcessingTask.run(MessageProcessingTask.java:196)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The jvb Docker container runs behind a NAT and has two network interfaces:
- bridge: where it can communicate with the internet
- meet.jitsi: internal network where it communicates with the other Jitsi components
In this setup I found one configuration where the exception does not happen:
- Disable org.ice4j.ice.harvest.STUN_MAPPING_HARVESTER_ADDRESSES
- Set org.ice4j.ice.harvest.NAT_HARVESTER_PUBLIC_ADDRESS to the public IP
- Set org.ice4j.ice.harvest.NAT_HARVESTER_LOCAL_ADDRESS to the internal IP of the bridge interface
Notably, the exception still occurs when NAT_HARVESTER_LOCAL_ADDRESS is set to the IP of the meet.jitsi interface.
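For reference, the working configuration described above can be written down as a set of properties. This is only a sketch: the property names are the ones listed in this report, but the class, the helper method, and both IP addresses are made-up placeholders, and where the properties are actually loaded from (for example a properties file mounted into the container) depends on the deployment.

```java
import java.util.Properties;

public class NatHarvesterConfigSketch {
    // Property names as given in this report.
    static final String STUN_KEY = "org.ice4j.ice.harvest.STUN_MAPPING_HARVESTER_ADDRESSES";
    static final String PUBLIC_KEY = "org.ice4j.ice.harvest.NAT_HARVESTER_PUBLIC_ADDRESS";
    static final String LOCAL_KEY = "org.ice4j.ice.harvest.NAT_HARVESTER_LOCAL_ADDRESS";

    /**
     * Hypothetical helper: builds the one combination that avoided the exception.
     * The STUN mapping harvester is "disabled" here simply by leaving its key unset.
     */
    static Properties workingConfiguration(String publicIp, String bridgeInterfaceIp) {
        Properties props = new Properties();
        props.setProperty(PUBLIC_KEY, publicIp);
        // Must be the address of the interface that carries media (bridge),
        // not the address of the internal meet.jitsi interface.
        props.setProperty(LOCAL_KEY, bridgeInterfaceIp);
        return props;
    }

    public static void main(String[] args) {
        // Placeholder addresses, not taken from the report.
        Properties props = workingConfiguration("203.0.113.10", "172.17.0.2");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Printing the entries gives lines in `key=value` form that can be pasted into a properties file.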
I’ve done some debugging with the STUN server enabled and NAT_HARVESTER_PUBLIC_ADDRESS / NAT_HARVESTER_LOCAL_ADDRESS unset.
Before the exception occurs, I noticed the following in ConnectivityCheckClient::processSuccessResponse():
- checkedPair contains the correct local IP:
  CandidatePair (State=In-Progress Priority=7961835276064522239): LocalCandidate=candidate:2 1 udp 2130706431 <local IP of bridge interface> 10000 typ host RemoteCandidate=candidate:10000 1 udp 1853759231 <CLIENT-IP> 9952 typ prflx
- validPair contains the IP of the wrong (meet.jitsi) interface:
  CandidatePair (State=Frozen Priority=7205771497833250302): LocalCandidate=candidate:3 1 udp 1677724415 <PUBLIC-IP> 10000 typ srflx raddr <local IP of meet.jitsi interface> rport 10000 RemoteCandidate=candidate:10000 1 udp 1853759231 <CLIENT-IP> 9952 typ prflx
Later in the stack trace, ComponentSocket::propertyChange() calls localCandidate.getCandidateIceSocketWrapper(remoteAddress), which is SinglePortUdpHarvester$MyCandidate::getCandidateIceSocketWrapper(). The candidateSockets in that object is empty, thus it returns null. This eventually leads to the NullPointerException.
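The failure mode can be illustrated with a small standalone model. This is not ice4j code: every class and method name below is a hypothetical stand-in, and it assumes that candidateSockets behaves like a map keyed by remote address. It only shows why a lookup on an empty map yields null, and how a caller that checks for null avoids the NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

public class EmptySocketMapSketch {
    /** Hypothetical stand-in for a socket wrapper; not an ice4j type. */
    static class SocketWrapper {
        final String name;

        SocketWrapper(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for a local candidate holding one socket per remote address. */
    static class LocalCandidateModel {
        private final Map<String, SocketWrapper> candidateSockets = new HashMap<>();

        void addSocket(String remoteAddress, SocketWrapper socket) {
            candidateSockets.put(remoteAddress, socket);
        }

        /** Returns null when no socket was registered for the remote address. */
        SocketWrapper getSocketFor(String remoteAddress) {
            return candidateSockets.get(remoteAddress);
        }
    }

    /** Caller that checks the lookup result instead of dereferencing it blindly. */
    static String activate(LocalCandidateModel candidate, String remoteAddress) {
        SocketWrapper socket = candidate.getSocketFor(remoteAddress);
        if (socket == null) {
            // Without this check, socket.name below would throw a NullPointerException.
            return "no socket for " + remoteAddress;
        }
        return "active: " + socket.name;
    }

    public static void main(String[] args) {
        // Candidate whose socket map was never populated (the case seen in this report).
        LocalCandidateModel wrongInterface = new LocalCandidateModel();
        System.out.println(activate(wrongInterface, "client:9952"));

        // Candidate that actually received traffic from the remote address.
        LocalCandidateModel bridgeInterface = new LocalCandidateModel();
        bridgeInterface.addSocket("client:9952", new SocketWrapper("bridge-socket"));
        System.out.println(activate(bridgeInterface, "client:9952"));
    }
}
```

A null check like this only turns the crash into a reportable condition. The underlying question in this report remains why the pair on the meet.jitsi interface gets validated in the first place.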
Please let me know if I can help in any way to further debug this.
Top GitHub Comments
Correct, with the PR applied I can only reproduce the NPE when ice4j is misconfigured.
The PR will also need https://github.com/jitsi/ice4j/pull/207 applied first, to remove the (unused) function that’s giving the ambiguous reference.
The NPE now only happens if you explicitly lie to the config, i.e. tell it the local address is one that can’t actually be used as the source of media, right? If so, I’m not too worried about that causing problems.