Nearest function not working for all cases on coffea 0.7.20

Describe the bug

Code that uses the nearest function with NanoEvents objects appears to have broken in the latest image. The same code worked with an earlier version of coffea (0.7.12).

To Reproduce

Steps to reproduce the behavior:

  1. Here is a code snippet:
from coffea.nanoevents import NanoEventsFactory
import awkward as ak

import warnings
import numpy as np
warnings.filterwarnings("ignore", message="Found duplicate branch ")
warnings.filterwarnings("ignore", category=DeprecationWarning)
warnings.filterwarnings("ignore", message="Missing cross-reference index ")
warnings.filterwarnings("ignore", message="divide by zero encountered in log")
np.seterr(invalid="ignore")

events = NanoEventsFactory.from_root("root://cmsxrootd.fnal.gov//store/user/lpcpfnano/cmantill/v2_2/2017/HWW/GluGluHToWW_Pt-200ToInf_M-125_TuneCP5_MINLO_13TeV-powheg-pythia8/GluGluHToWW_Pt-200ToInf_M-125\
/220629_181652/0000/nano_mc2017_1-1.root").events()

higgs = events.GenPart[(abs(events.GenPart.pdgId) == 25) & events.GenPart.hasFlags(['fromHardProcess', 'isLastCopy'])]
higgs = higgs[ ak.all(abs(higgs.children.pdgId) == 24, axis=2) ]
jet = ak.firsts(events.FatJet)

# this works
print(events.FatJet.nearest(higgs))

# but this doesn't
print(jet.nearest(higgs))
print(jet.nearest(ak.firsts(higgs)))
  2. Run on local multiprocessing
  3. See error:
Traceback (most recent call last):
  File "test.py", line 19, in <module>
    print(jet.nearest(ak.firsts(higgs)))
  File "/opt/conda/lib/python3.8/site-packages/coffea/nanoevents/methods/vector.py", line 730, in nearest
    mval, (a, b) = self.metric_table(other, axis, metric, return_combinations=True)
  File "/opt/conda/lib/python3.8/site-packages/coffea/nanoevents/methods/vector.py", line 694, in metric_table
    awkward.cartesian([self, other], axis=axis, nested=True)
  File "/opt/conda/lib/python3.8/site-packages/awkward/operations/structure.py", line 3562, in cartesian
    out = ak._util.broadcast_and_apply(
  File "/opt/conda/lib/python3.8/site-packages/awkward/_util.py", line 1172, in broadcast_and_apply
    out = apply(broadcast_pack(inputs, isscalar), 0, user)
  File "/opt/conda/lib/python3.8/site-packages/awkward/_util.py", line 925, in apply
    outcontent = apply(nextinputs, depth + 1, user)
  File "/opt/conda/lib/python3.8/site-packages/awkward/_util.py", line 881, in apply
    outcontent = apply(nextinputs, depth, user)
  File "/opt/conda/lib/python3.8/site-packages/awkward/_util.py", line 925, in apply
    outcontent = apply(nextinputs, depth + 1, user)
  File "/opt/conda/lib/python3.8/site-packages/awkward/_util.py", line 881, in apply
    outcontent = apply(nextinputs, depth, user)
  File "/opt/conda/lib/python3.8/site-packages/awkward/_util.py", line 1077, in apply
    raise ValueError(
ValueError: cannot broadcast records because keys don't match:
    DDX_jetNSecondaryVertices, DDX_jetNTracks, DDX_tau1_flightDistance2dSig, DDX_tau1_trackEtaRel_0, DDX_tau1_trackEtaRel_1, DDX_tau1_trackEtaRel_2, DDX_tau1_trackSip3dSig_0, DDX_tau1_trackSip3dSig_1, DDX_tau1_vertexDeltaR, DDX_tau1_vertexEnergyRatio, DDX_tau1_vertexMass, DDX_tau2_flightDistance2dSig, DDX_tau2_trackEtaRel_0, DDX_tau2_trackEtaRel_1, DDX_tau2_trackEtaRel_3, DDX_tau2_trackSip3dSig_0, DDX_tau2_trackSip3dSig_1, DDX_tau2_vertexEnergyRatio, DDX_tau2_vertexMass, DDX_trackSip2dSigAboveBottom_0, DDX_trackSip2dSigAboveBottom_1, DDX_trackSip2dSigAboveCharm, DDX_trackSip3dSig_0, DDX_trackSip3dSig_1, DDX_trackSip3dSig_2, DDX_trackSip3dSig_3, DDX_z_ratio, Proba, area, btagCSVV2, btagDDBvLV2, btagDDCvBV2, btagDDCvLV2, btagDeepB, btagDeepB_b, btagDeepB_bb, btagDeepL, btagHbb, deepTagMD_H4qvsQCD, deepTagMD_HbbvsQCD, deepTagMD_TvsQCD, deepTagMD_WvsQCD, deepTagMD_ZHbbvsQCD, deepTagMD_ZHccvsQCD, deepTagMD_ZbbvsQCD, deepTagMD_ZvsQCD, deepTagMD_bbvsLight, deepTagMD_ccvsLight, deepTag_H, deepTag_QCD, deepTag_QCDothers, deepTag_TvsQCD, deepTag_WvsQCD, deepTag_ZvsQCD, electronIdx3SJ, eta, genJetAK8Idx, genJetAK8IdxG, hadronFlavour, jetId, lsf3, mass, msoftdrop, muonIdx3SJ, n2b1, n3b1, nBHadrons, nCHadrons, nConstituents, pFCandsIdxG, particleNetMD_QCD, particleNetMD_Xbb, particleNetMD_Xcc, particleNetMD_Xqq, particleNet_H4qvsQCD, particleNet_HbbvsQCD, particleNet_HccvsQCD, particleNet_QCD, particleNet_TvsQCD, particleNet_WvsQCD, particleNet_ZvsQCD, particleNet_mass, phi, pt, rawFactor, subJetIdx1, subJetIdx1G, subJetIdx2, subJetIdx2G, subJetIdxG, tau1, tau2, tau3, tau4
    childrenIdxG, distinctChildrenIdxG, distinctParentIdxG, eta, genPartIdxMother, genPartIdxMotherG, mass, pdgId, phi, pt, status, statusFlags

Expected behavior

I expected nearest to return the GenParticle nearest to my candidate jet.
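
A side note on the higgs selection in the snippet above: when a comparison is combined with a flag mask, the comparison needs its own parentheses, because Python's `&` binds more tightly than `==`. The sketch below uses plain Python scalars as hypothetical stand-ins for a single particle's pdgId and flag result; the same grouping applies to array operands. It is unrelated to the ValueError in the traceback and only illustrates the precedence rule.

```python
import ast

# Hypothetical stand-ins for one particle's pdgId and its flag result.
pdg_id = 25
has_flags = True

# Without parentheses, `&` is evaluated first: abs(pdg_id) == (25 & has_flags)
unparenthesized = abs(pdg_id) == 25 & has_flags
# With parentheses, the comparison is evaluated first, then combined with the flag.
parenthesized = (abs(pdg_id) == 25) & has_flags

print(unparenthesized)  # False, because 25 & True is 1
print(parenthesized)    # True

# The parse tree confirms the grouping: the top-level node is a comparison
# whose right-hand side is the bitwise AND.
tree = ast.parse("abs(x) == 25 & flags", mode="eval").body
print(type(tree).__name__, type(tree.comparators[0]).__name__)  # Compare BinOp
```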

Issue Analytics

  • State: open
  • Created 10 months ago
  • Comments: 20

Top GitHub Comments

1 reaction
lgray commented, Nov 17, 2022

I’m able to reproduce this even with the test files in coffea, and something really strange is happening with ak.firsts in this case. However, I think I found a more robust way to do your selection in the future: use indices rather than ak.firsts, which I think has some knock-on effects for what you’re trying to accomplish.

from coffea.nanoevents import NanoEventsFactory
import awkward as ak

import warnings
import numpy as np

events = NanoEventsFactory.from_root("root://cmsxrootd.fnal.gov//store/user/lpcpfnano/cmantill/v2_2/2017/HWW/GluGluHToWW_Pt-200ToInf_M-125_TuneCP5_MINLO_13TeV-powheg-pythia8/GluGluHToWW_Pt-200ToInf_M-125\
/220629_181652/0000/nano_mc2017_1-1.root").events()

higgs = events.GenPart[(abs(events.GenPart.pdgId) == 25) & events.GenPart.hasFlags(['fromHardProcess', 'isLastCopy'])]
higgs = higgs[ ak.all(abs(higgs.children.pdgId) == 24, axis=2) ]
jet = events.FatJet[:, :1]

# this works
print(events.FatJet.nearest(higgs))

# and this also works
print(jet.nearest(higgs))
print(jet.nearest(higgs[:, :1]))

This will always work and does exactly what you were doing before.

0 reactions
lgray commented, Nov 18, 2022

@cmantill FYI - I tried running the reproducer in coffeateam/coffea-base-cc7:0.7.20-fastjet-3.4.0.0rc2 and it failed in the same way as with the latest image.
