Unable to recreate onnx speedups demonstrated in 04-onnx-export.ipynb on mac or linux
Environment info
- transformers version: 3.1.0
- Platform: Mac OS Mojave + Ubuntu 18.04.4
- Python version: 3.7.7
- PyTorch version (GPU?): 1.6.0
- Tensorflow version (GPU?): na
- Using GPU in script?: no
- Using distributed or parallel set-up in script?: no
Who can help
Information
Model I am using (Bert, XLNet …): bert-base-uncased
The problem arises when using:
- the official example scripts: (give details below)
- my own modified scripts: (give details below)
I am running the /notebooks/04-onnx-export.ipynb example
The task I am working on is:
- an official GLUE/SQUaD task: (give the name)
- my own task or dataset: (give details below)
I am using the example data in the notebook
To reproduce
Steps to reproduce the behavior:
1. Within the notebook add
torch.set_num_threads(1)
2. Replace
environ["OMP_NUM_THREADS"] = str(cpu_count(logical=True))
with
environ["OMP_NUM_THREADS"] = "1"
3. Run the 04-onnx-export.ipynb example notebook
I am trying to recreate the speedups shown in this example notebook.
Note that without step 1 above, I found PyTorch to be considerably faster than ONNX, presumably because it was using more threads than ONNX Runtime. Step 2 doesn't seem to affect the results, but I set it for completeness, so that everything runs on the same number of threads.
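For reference, the two thread settings from the steps above can be combined into one cell. This is only a minimal sketch, not the notebook's actual code: the `try`/`except` guard and the `torch_threads` variable are my own additions so that the snippet also runs where PyTorch is not installed.

```python
import os

# Set this before importing any library that reads it at load time
# (for example an OpenMP-based onnxruntime build).
os.environ["OMP_NUM_THREADS"] = "1"

try:
    import torch

    # Limit PyTorch intra-op parallelism to a single thread.
    torch.set_num_threads(1)
    torch_threads = torch.get_num_threads()
except ImportError:
    # PyTorch is not installed here, so there is nothing to pin.
    torch_threads = None

print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
print("torch threads   =", torch_threads)
```

With both settings in place, the PyTorch and ONNX timings should at least be measured on the same thread budget.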
Actual results on a Macbook Pro:
with hardware:
machdep.cpu.max_basic: 22
machdep.cpu.max_ext: 2147483656
machdep.cpu.vendor: GenuineIntel
machdep.cpu.brand_string: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz
machdep.cpu.family: 6
machdep.cpu.model: 94
machdep.cpu.extmodel: 5
machdep.cpu.extfamily: 0
machdep.cpu.stepping: 3
machdep.cpu.feature_bits: 9221959987971750911
machdep.cpu.leaf7_feature_bits: 43806655 0
machdep.cpu.leaf7_feature_bits_edx: 2617255424
machdep.cpu.extfeature_bits: 1241984796928
machdep.cpu.signature: 329443
machdep.cpu.brand: 0
machdep.cpu.features: FPU VME DE PSE TSC MSR PAE MCE CX8 APIC SEP MTRR PGE MCA CMOV PAT PSE36 CLFSH DS ACPI MMX FXSR SSE SSE2 SS HTT TM PBE SSE3 PCLMULQDQ DTES64 MON DSCPL VMX EST TM2 SSSE3 FMA CX16 TPR PDCM SSE4.1 SSE4.2 x2APIC MOVBE POPCNT AES PCID XSAVE OSXSAVE SEGLIM64 TSCTMR AVX1.0 RDRAND F16C
machdep.cpu.leaf7_features: RDWRFSGS TSC_THREAD_OFFSET SGX BMI1 HLE AVX2 SMEP BMI2 ERMS INVPCID RTM FPU_CSDS MPX RDSEED ADX SMAP CLFSOPT IPT MDCLEAR TSXFA IBRS STIBP L1DF SSBD
machdep.cpu.extfeatures: SYSCALL XD 1GBPAGE EM64T LAHF LZCNT PREFETCHW RDTSCP TSCI
machdep.cpu.logical_per_package: 16
machdep.cpu.cores_per_package: 8
machdep.cpu.microcode_version: 220
machdep.cpu.processor_flag: 5
machdep.cpu.mwait.linesize_min: 64
machdep.cpu.mwait.linesize_max: 64
machdep.cpu.mwait.extensions: 3
machdep.cpu.mwait.sub_Cstates: 286531872
machdep.cpu.thermal.sensor: 1
machdep.cpu.thermal.dynamic_acceleration: 1
machdep.cpu.thermal.invariant_APIC_timer: 1
machdep.cpu.thermal.thresholds: 2
machdep.cpu.thermal.ACNT_MCNT: 1
machdep.cpu.thermal.core_power_limits: 1
machdep.cpu.thermal.fine_grain_clock_mod: 1
machdep.cpu.thermal.package_thermal_intr: 1
machdep.cpu.thermal.hardware_feedback: 0
machdep.cpu.thermal.energy_policy: 1
machdep.cpu.xsave.extended_state: 31 832 1088 0
machdep.cpu.xsave.extended_state1: 15 832 256 0
machdep.cpu.arch_perf.version: 4
machdep.cpu.arch_perf.number: 4
machdep.cpu.arch_perf.width: 48
machdep.cpu.arch_perf.events_number: 7
machdep.cpu.arch_perf.events: 0
machdep.cpu.arch_perf.fixed_number: 3
machdep.cpu.arch_perf.fixed_width: 48
machdep.cpu.cache.linesize: 64
machdep.cpu.cache.L2_associativity: 4
machdep.cpu.cache.size: 256
machdep.cpu.tlb.inst.large: 8
machdep.cpu.tlb.data.small: 64
machdep.cpu.tlb.data.small_level1: 64
machdep.cpu.address_bits.physical: 39
machdep.cpu.address_bits.virtual: 48
machdep.cpu.core_count: 4
machdep.cpu.thread_count: 8
machdep.cpu.tsc_ccc.numerator: 216
machdep.cpu.tsc_ccc.denominator: 2
I obtained even worse results on a linux machine:
with hardware:
processor : 11
vendor_id : GenuineIntel
cpu family : 6
model : 63
model name : Intel(R) Core(TM) i7-5820K CPU @ 3.30GHz
stepping : 2
microcode : 0x43
cpu MHz : 1199.433
cache size : 15360 KB
physical id : 0
siblings : 12
core id : 5
cpu cores : 6
apicid : 11
initial apicid : 11
fpu : yes
fpu_exception : yes
cpuid level : 15
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts md_clear flush_l1d
bugs : cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf mds swapgs itlb_multihit
bogomips : 6596.76
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
Expected behavior
I expected to see a speedup from using ONNX, as in the example:
I know this is hardware specific, but having tested it on two machines, I wonder whether there is some config not included in the example that I am missing, or some other issue.
@erees1, your observation is correct.
It is recommended to use the default setting (do not set the option intra_op_num_threads) for general usage.
The onnxruntime-gpu package is not built with OpenMP, so OMP_NUM_THREADS has no effect. If the machine has 16 or more CPU cores, the user might try setting intra_op_num_threads = 16 explicitly.
For the onnxruntime package,
options.intra_op_num_threads = 1
was advised for version 1.2.0 at the time the notebook was created. Users could set OMP_NUM_THREADS and related environment variables before importing onnxruntime to control the intra-op thread count. For version >= 1.3.0, it is recommended to use the default intra_op_num_threads.
@mfuntowicz, could you help update the setting in the notebook like the following?
Before:
After:
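As a rough sketch of the version-dependent advice above (this is not the notebook's actual before/after code), the thread option could be set only for older releases. The helpers `parse_version` and `intra_op_threads_for` are hypothetical names made up for illustration, and treating every release below 1.3 like 1.2.0 is an assumption.

```python
def parse_version(version):
    """Return (major, minor) from a version string such as '1.3.0'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def intra_op_threads_for(version):
    """Hypothetical helper: thread count to set, or None to keep the default."""
    # 1 thread was advised for 1.2.0; from 1.3.0 on, keep the default.
    # Assumption: releases older than 1.2.0 are treated like 1.2.0.
    return 1 if parse_version(version) < (1, 3) else None


try:
    import onnxruntime as ort

    options = ort.SessionOptions()
    threads = intra_op_threads_for(ort.__version__)
    if threads is not None:
        options.intra_op_num_threads = threads
    # The options would then be passed to the session, for example:
    # session = ort.InferenceSession("model.onnx", options)  # hypothetical path
except ImportError:
    # onnxruntime is not installed; only the helper logic is available.
    options = None
```

On onnxruntime 1.3.0 or later this leaves `intra_op_num_threads` untouched, which matches the recommendation to use the default.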
Thanks for the help, I think that clears things up! - closing the issue.