fp16 compatibility StopIteration on Multiple GPUs: Text Classification of MultiNLI Sentences using BERT
Hello! I hope you’re doing great. I ran into this issue while running the Jupyter notebook Text Classification of MultiNLI Sentences using BERT.
Environment:
- On-premises computer: MacBook Pro 16"
- CPU: Intel i9-9980HK
- RAM: 64 GB
- GPU: 2 x TITAN RTX (24 GB of VRAM each)
- GPU enclosure: 2 x Razer Core X Chroma over Thunderbolt 3 (one connected on the left side, one on the right)
- Conda version: conda 4.8.3
- Python version: Python 3.7.7
Packages Installed:
Name Version Build Channel
_anaconda_depends 2020.02 py37_0
_pytorch_select 1.1.0 cpu
_r-mutex 1.0.0 anacondar_1
_tflow_select 2.1.0 gpu
absl-py 0.9.0 py37_0
alabaster 0.7.12 py37_0
anaconda custom py37_1
anaconda-client 1.7.2 py37_0
anaconda-project 0.8.4 py_0
argh 0.26.2 py37_0
asn1crypto 1.3.0 py37_0
astor 0.8.0 py37_0
astroid 2.3.3 py37_0
astropy 4.0.1.post1 py37he774522_1
atomicwrites 1.4.0 py_0
attrs 19.3.0 py_0
autopep8 1.4.4 py_0
babel 2.8.0 py_0
backcall 0.1.0 py37_0
backports 1.0 py_2
backports.shutil_get_terminal_size 1.0.0 py37_2
bcrypt 3.1.7 py37he774522_0
beautifulsoup4 4.9.0 py37_0
bitarray 1.2.1 py37he774522_0
bkcharts 0.2 py37_0
blas 1.0 mkl
bleach 3.1.4 py_0
blinker 1.4 py37_0
blis 0.4.1 pypi_0 pypi
blosc 1.16.3 h7bd577a_0
bokeh 2.0.2 py37_0
boto 2.49.0 py37_0
boto3 1.13.13 pypi_0 pypi
botocore 1.16.13 pypi_0 pypi
bottleneck 1.3.2 py37h2a96729_0
brotli 1.0.7 pypi_0 pypi
bzip2 1.0.8 he774522_0
ca-certificates 2020.4.5.1 hecc5488_0 conda-forge
cached-property 1.5.1 pypi_0 pypi
cachetools 3.1.1 py_0
catalogue 1.0.0 pypi_0 pypi
certifi 2020.4.5.1 py37hc8dfbb8_0 conda-forge
cffi 1.14.0 py37h7a1dbc1_0
chardet 3.0.4 py37_1003
click 7.1.2 py_0
cloudpickle 1.4.1 py_0
clyent 1.2.2 py37_1
colorama 0.4.3 py_0
comtypes 1.1.7 py37_0
console_shortcut 0.1.1 4
contextlib2 0.6.0.post1 py_0
cryptography 2.9.2 py37h7a1dbc1_0
cssselect 1.1.0 pypi_0 pypi
cudatoolkit 10.1.243 h74a9793_0
cudnn 7.6.5 cuda10.1_0
curl 7.69.1 h2a8f88b_0
cycler 0.10.0 py37_0
cymem 2.0.3 pypi_0 pypi
cython 0.29.17 py37ha925a31_0
cytoolz 0.10.1 py37he774522_0
dash 1.12.0 pypi_0 pypi
dash-core-components 1.10.0 pypi_0 pypi
dash-cytoscape 0.1.1 pypi_0 pypi
dash-html-components 1.0.3 pypi_0 pypi
dash-renderer 1.4.1 pypi_0 pypi
dash-table 4.7.0 pypi_0 pypi
dask 2.16.0 py_0
dask-core 2.16.0 py_0
decorator 4.4.2 py_0
defusedxml 0.6.0 py_0
diff-match-patch 20181111 py_0
dill 0.3.1.1 pypi_0 pypi
distributed 2.16.0 py37_0
docutils 0.15.2 pypi_0 pypi
entrypoints 0.3 py37_0
et_xmlfile 1.0.1 py37_0
fastcache 1.1.0 py37he774522_0
filelock 3.0.12 py_0
flake8 3.7.9 py37_0
flask 1.1.2 pypi_0 pypi
flask-compress 1.5.0 pypi_0 pypi
freetype 2.9.1 ha9979f8_1
fsspec 0.7.1 py_0
future 0.18.2 py37_0
gast 0.2.2 py37_0
get_terminal_size 1.0.0 h38e98db_0
gevent 20.5.0 pypi_0 pypi
glob2 0.7 py_0
google-auth 1.14.1 py_0
google-auth-oauthlib 0.4.1 py_2
google-pasta 0.2.0 py_0
greenlet 0.4.15 py37hfa6e2cd_0
grpcio 1.27.2 py37h351948d_0
h5py 2.10.0 py37h5e291fa_0
hdf5 1.10.4 h7ebc959_0
heapdict 1.0.1 py_0
html5lib 1.0.1 py37_0
hypothesis 5.11.0 py_0
icc_rt 2019.0.0 h0cc432a_1
icu 58.2 ha925a31_3
idna 2.9 py_1
imagecodecs 2020.2.18 pypi_0 pypi
imageio 2.8.0 py_0
imagesize 1.2.0 py_0
importlib_metadata 1.5.0 py37_0
intel-openmp 2020.1 216
interpret 0.1.22 pypi_0 pypi
interpret-community 0.11.1 pypi_0 pypi
interpret-core 0.1.21 pypi_0 pypi
interpret-text 0.1.1 pypi_0 pypi
intervaltree 3.0.2 py_0
ipykernel 5.1.4 py37h39e3cac_0
ipython 7.13.0 py37h5ca1d4c_0
ipython_genutils 0.2.0 py37_0
ipywidgets 7.5.1 py_0
isort 4.3.21 py37_0
itsdangerous 1.1.0 py37_0
jdcal 1.4.1 py_0
jedi 0.15.2 py37_0
jinja2 2.11.2 py_0
jmespath 0.10.0 pypi_0 pypi
joblib 0.14.1 py_0
jpeg 9b hb83a4c4_2
json5 0.9.4 py_0
jsonschema 3.2.0 py37_0
jupyter 1.0.0 py37_7
jupyter_client 6.1.3 py_0
jupyter_console 6.1.0 py_0
jupyter_contrib_core 0.3.3 py_2 conda-forge
jupyter_contrib_nbextensions 0.5.1 py37_0 conda-forge
jupyter_core 4.6.3 py37_0
jupyter_highlight_selected_word 0.2.0 py37_1000 conda-forge
jupyter_latex_envs 1.4.4 py37_1000 conda-forge
jupyter_nbextensions_configurator 0.4.1 py37_0 conda-forge
jupyterlab 1.2.6 pyhf63ae98_0
jupyterlab_server 1.1.1 py_0
keras 2.3.1 0
keras-applications 1.0.8 py_0
keras-base 2.3.1 py37_0
keras-preprocessing 1.1.0 py_1
keyring 21.1.1 py37_2
kiwisolver 1.2.0 py37h74a9793_0
krb5 1.17.1 hc04afaa_0
lazy-object-proxy 1.4.3 py37he774522_0
libarchive 3.3.3 h0643e63_5
libcurl 7.69.1 h2a8f88b_0
libiconv 1.15 h1df5818_7
liblief 0.10.1 ha925a31_0
libpng 1.6.37 h2a8f88b_0
libprotobuf 3.11.4 h7bd577a_0
libsodium 1.0.16 h9d3ae62_0
libspatialindex 1.9.3 h33f27b4_0
libssh2 1.9.0 h7a1dbc1_1
libtiff 4.1.0 h56a325e_0
libxml2 2.9.9 h464c3ec_0
libxslt 1.1.33 h579f668_0
lime 0.2.0.0 pypi_0 pypi
llvmlite 0.32.1 py37ha925a31_0
locket 0.2.0 py37_1
lxml 4.5.1 pypi_0 pypi
lz4-c 1.8.1.2 h2fa13f4_0
lzo 2.10 he774522_2
m2w64-bwidget 1.9.10 2
m2w64-bzip2 1.0.6 6
m2w64-expat 2.1.1 2
m2w64-fftw 3.3.4 6
m2w64-flac 1.3.1 3
m2w64-gcc-libgfortran 5.3.0 6
m2w64-gcc-libs 5.3.0 7
m2w64-gcc-libs-core 5.3.0 7
m2w64-gettext 0.19.7 2
m2w64-gmp 6.1.0 2
m2w64-gsl 2.1 2
m2w64-libiconv 1.14 6
m2w64-libjpeg-turbo 1.4.2 3
m2w64-libogg 1.3.2 3
m2w64-libpng 1.6.21 2
m2w64-libsndfile 1.0.26 2
m2w64-libsodium 1.0.10 2
m2w64-libtiff 4.0.6 2
m2w64-libvorbis 1.3.5 2
m2w64-libwinpthread-git 5.0.0.4634.697f757 2
m2w64-libxml2 2.9.3 4
m2w64-mpfr 3.1.4 4
m2w64-openblas 0.2.19 1
m2w64-pcre 8.38 2
m2w64-speex 1.2rc2 3
m2w64-speexdsp 1.2rc3 3
m2w64-tcl 8.6.5 3
m2w64-tk 8.6.5 3
m2w64-tktable 2.10 5
m2w64-wineditline 2.101 5
m2w64-xz 5.2.2 2
m2w64-zeromq 4.1.4 2
m2w64-zlib 1.2.8 10
markdown 3.1.1 py37_0
markupsafe 1.1.1 py37he774522_0
matplotlib 3.1.3 py37_0
matplotlib-base 3.1.3 py37h64f37c6_0
mccabe 0.6.1 py37_1
menuinst 1.4.16 py37he774522_0
mistune 0.8.4 py37he774522_0
mkl 2020.1 216
mkl-service 2.3.0 py37hb782905_0
mkl_fft 1.0.15 py37h14836fe_0
mkl_random 1.1.0 py37h675688f_0
mock 4.0.2 py_0
more-itertools 8.2.0 py_0
mpmath 1.1.0 py37_0
msgpack-python 1.0.0 py37h74a9793_1
msys2-conda-epoch 20160418 1
multipledispatch 0.6.0 py37_0
murmurhash 1.0.2 pypi_0 pypi
nbconvert 5.6.1 py37_0
nbformat 5.0.6 py_0
networkx 2.4 py_0
ninja 1.9.0 py37h74a9793_0
nltk 3.4.5 py37_0
nose 1.3.7 py37_2
notebook 6.0.3 py37_0
numba 0.49.1 py37h47e9c7a_0
numexpr 2.7.1 py37h25d0782_0
numpy 1.18.1 py37h93ca92e_0
numpy-base 1.18.1 py37hc3f5095_1
numpydoc 0.9.2 py_0
oauthlib 3.1.0 py_0
olefile 0.46 py37_0
openpyxl 3.0.3 py_0
openssl 1.1.1g he774522_0 conda-forge
opt_einsum 3.1.0 py_0
packaging 20.4 pypi_0 pypi
pandas 1.0.3 py37h47e9c7a_0
pandoc 2.2.3.2 0
pandocfilters 1.4.2 py37_1
paramiko 2.7.1 py_0
parsel 1.6.0 pypi_0 pypi
parso 0.5.2 py_0
partd 1.1.0 py_0
path 13.1.0 py37_0
path.py 12.4.0 0
pathlib2 2.3.5 py37_0
pathtools 0.1.2 py_1
patsy 0.5.1 py37_0
pep8 1.7.1 py37_0
pexpect 4.8.0 py37_0
pickleshare 0.7.5 py37_0
pillow 5.4.1 pypi_0 pypi
pip 20.0.2 py37_3
pkginfo 1.5.0.1 py37_0
plac 1.1.3 pypi_0 pypi
plotly 4.7.1 pypi_0 pypi
pluggy 0.13.1 py37_0
ply 3.11 py37_0
powershell_shortcut 0.0.1 3
preshed 3.0.2 pypi_0 pypi
prometheus_client 0.7.1 py_0
prompt-toolkit 3.0.4 py_0
prompt_toolkit 3.0.4 0
protobuf 3.11.4 py37h33f27b4_0
psutil 5.7.0 py37he774522_0
py 1.8.1 py_0
py-lief 0.10.1 py37ha925a31_0
pyasn1 0.4.8 py_0
pyasn1-modules 0.2.7 py_0
pycodestyle 2.5.0 py37_0
pycosat 0.6.3 py37he774522_0
pycparser 2.20 py_0
pycrypto 2.6.1 py37hfa6e2cd_9
pycurl 7.43.0.5 py37h7a1dbc1_0
pydantic 1.5.1 pypi_0 pypi
pydocstyle 4.0.1 py_0
pyflakes 2.1.1 py37_0
pygments 2.6.1 py_0
pyjwt 1.7.1 py37_0
pylint 2.4.4 py37_0
pynacl 1.3.0 py37h62dcd97_0
pyodbc 4.0.30 py37ha925a31_0
pyopenssl 19.1.0 py37_0
pyparsing 2.4.7 py_0
pyqt 5.9.2 py37h6538335_2
pyreadline 2.1 py37_1
pyrsistent 0.16.0 py37he774522_0
pysocks 1.7.1 py37_0
pytables 3.6.1 py37h1da0976_0
pytest 5.4.2 py37_0
pytest-arraydiff 0.3 py37h39e3cac_0
pytest-astropy 0.8.0 py_0
pytest-astropy-header 0.1.2 py_0
pytest-doctestplus 0.5.0 py_0
pytest-openfiles 0.5.0 py_0
pytest-remotedata 0.3.2 py37_0
python 3.7.7 h81c818b_4
python-dateutil 2.8.1 py_0
python-jsonrpc-server 0.3.4 py_0
python-language-server 0.31.10 py37_0
python-libarchive-c 2.9 py_0
python_abi 3.7 1_cp37m conda-forge
pytorch 1.5.0 py3.7_cuda101_cudnn7_0 pytorch
pytorch-pretrained-bert 0.6.2 pypi_0 pypi
pytz 2020.1 py_0
pywavelets 1.1.1 py37he774522_0
pywin32 227 py37he774522_1
pywin32-ctypes 0.2.0 py37_1000
pywinpty 0.5.7 py37_0
pyyaml 5.3.1 py37he774522_0
pyzmq 18.1.1 py37ha925a31_0
qdarkstyle 2.8.1 py_0
qt 5.9.7 vc14h73c81de_0
qtawesome 0.7.0 py_0
qtconsole 4.7.4 py_0
qtpy 1.9.0 py_0
r-askpass 1.0 r36_0
r-assertthat 0.2.1 r36h6115d3f_0
r-backports 1.1.4 r36h6115d3f_0
r-base 3.6.1 hf18239d_1
r-base64enc 0.1_3 r36h6115d3f_4
r-bh 1.69.0_1 r36h6115d3f_0
r-boot 1.3_20 r36h6115d3f_0
r-broom 0.5.2 r36h6115d3f_0
r-callr 3.2.0 r36h6115d3f_0
r-caret 6.0_83 r36h6115d3f_0
r-cellranger 1.1.0 r36h6115d3f_0
r-class 7.3_15 r36h6115d3f_0
r-cli 1.1.0 r36h6115d3f_0
r-clipr 0.6.0 r36h6115d3f_0
r-cluster 2.0.8 r36h6115d3f_0
r-codetools 0.2_16 r36h6115d3f_0
r-colorspace 1.4_1 r36h6115d3f_0
r-crayon 1.3.4 r36h6115d3f_0
r-curl 3.3 r36h6115d3f_0
r-data.table 1.12.2 r36h6115d3f_0
r-dbi 1.0.0 r36h6115d3f_0
r-dbplyr 1.4.0 r36h6115d3f_0
r-dichromat 2.0_0 r36h6115d3f_4
r-digest 0.6.18 r36h6115d3f_0
r-dplyr 0.8.0.1 r36h6115d3f_0
r-ellipsis 0.1.0 r36h6115d3f_0
r-essentials 3.6.0 r36_0
r-evaluate 0.13 r36h6115d3f_0
r-fansi 0.4.0 r36h6115d3f_0
r-forcats 0.4.0 r36h6115d3f_0
r-foreach 1.4.4 r36h6115d3f_0
r-foreign 0.8_71 r36h6115d3f_0
r-formatr 1.6 r36h6115d3f_0
r-fs 1.2.7 r36h6115d3f_0
r-generics 0.0.2 r36h6115d3f_0
r-ggplot2 3.1.1 r36h6115d3f_0
r-glmnet 2.0_16 r36h6115d3f_0
r-glue 1.3.1 r36h6115d3f_0
r-gower 0.2.0 r36h6115d3f_0
r-gtable 0.3.0 r36h6115d3f_0
r-haven 2.1.0 r36h6115d3f_0
r-hexbin 1.27.2 r36h6115d3f_0
r-highr 0.8 r36h6115d3f_0
r-hms 0.4.2 r36h6115d3f_0
r-htmltools 0.3.6 r36h6115d3f_0
r-htmlwidgets 1.3 r36h6115d3f_0
r-httpuv 1.5.1 r36h6115d3f_0
r-httr 1.4.0 r36h6115d3f_0
r-ipred 0.9_8 r36h6115d3f_0
r-irdisplay 0.7.0 r36h6115d3f_0
r-irkernel 0.8.15 r36_0
r-iterators 1.0.10 r36h6115d3f_0
r-jsonlite 1.6 r36h6115d3f_0
r-kernsmooth 2.23_15 r36h6115d3f_4
r-knitr 1.22 r36h6115d3f_0
r-labeling 0.3 r36h6115d3f_4
r-later 0.8.0 r36h6115d3f_0
r-lattice 0.20_38 r36h6115d3f_0
r-lava 1.6.5 r36h6115d3f_0
r-lazyeval 0.2.2 r36h6115d3f_0
r-lubridate 1.7.4 r36h6115d3f_0
r-magrittr 1.5 r36h6115d3f_4
r-maps 3.3.0 r36h6115d3f_0
r-markdown 0.9 r36h6115d3f_0
r-mass 7.3_51.3 r36h6115d3f_0
r-matrix 1.2_17 r36h6115d3f_0
r-mgcv 1.8_28 r36h6115d3f_0
r-mime 0.6 r36h6115d3f_0
r-modelmetrics 1.2.2 r36h6115d3f_0
r-modelr 0.1.4 r36h6115d3f_0
r-munsell 0.5.0 r36h6115d3f_0
r-nlme 3.1_139 r36h6115d3f_0
r-nnet 7.3_12 r36h6115d3f_0
r-numderiv 2016.8_1 r36h6115d3f_0
r-openssl 1.3 r36h6115d3f_0
r-pbdzmq 0.3_3 r36h6115d3f_0
r-pillar 1.3.1 r36h6115d3f_0
r-pkgconfig 2.0.2 r36h6115d3f_0
r-plogr 0.2.0 r36h6115d3f_0
r-plyr 1.8.4 r36h6115d3f_0
r-prettyunits 1.0.2 r36h6115d3f_0
r-processx 3.3.0 r36h6115d3f_0
r-prodlim 2018.04.18 r36h6115d3f_0
r-progress 1.2.0 r36h6115d3f_0
r-promises 1.0.1 r36h6115d3f_0
r-ps 1.3.0 r36h6115d3f_0
r-purrr 0.3.2 r36h6115d3f_0
r-quantmod 0.4_14 r36h6115d3f_0
r-r6 2.4.0 r36h6115d3f_0
r-randomforest 4.6_14 r36h6115d3f_0
r-rbokeh 0.6.3 r36_0
r-rcolorbrewer 1.1_2 r36h6115d3f_0
r-rcpp 1.0.1 r36h6115d3f_0
r-rcpproll 0.3.0 r36h6115d3f_0
r-readr 1.3.1 r36h6115d3f_0
r-readxl 1.3.1 r36h6115d3f_0
r-recipes 0.1.5 r36h6115d3f_0
r-recommended 3.6.0 r36_0
r-rematch 1.0.1 r36h6115d3f_0
r-repr 0.19.2 r36h6115d3f_0
r-reprex 0.2.1 r36h6115d3f_0
r-reshape2 1.4.3 r36h6115d3f_0
r-rlang 0.3.4 r36h6115d3f_0
r-rmarkdown 1.12 r36h6115d3f_0
r-rpart 4.1_15 r36h6115d3f_0
r-rstudioapi 0.10 r36h6115d3f_0
r-rvest 0.3.3 r36h6115d3f_0
r-scales 1.0.0 r36h6115d3f_0
r-selectr 0.4_1 r36h6115d3f_0
r-shiny 1.3.2 r36h6115d3f_0
r-sourcetools 0.1.7 r36h6115d3f_0
r-spatial 7.3_11 r36h6115d3f_4
r-squarem 2017.10_1 r36h6115d3f_0
r-stringi 1.4.3 r36h6115d3f_0
r-stringr 1.4.0 r36h6115d3f_0
r-survival 2.44_1.1 r36h6115d3f_0
r-sys 3.2 r36h6115d3f_0
r-tibble 2.1.1 r36h6115d3f_0
r-tidyr 0.8.3 r36h6115d3f_0
r-tidyselect 0.2.5 r36h6115d3f_0
r-tidyverse 1.2.1 r36h6115d3f_0
r-timedate 3043.102 r36h6115d3f_0
r-tinytex 0.12 r36h6115d3f_0
r-ttr 0.23_4 r36h6115d3f_0
r-utf8 1.1.4 r36h6115d3f_0
r-uuid 0.1_2 r36h6115d3f_4
r-viridislite 0.3.0 r36h6115d3f_0
r-whisker 0.3_2 r36h6115d3f_4
r-withr 2.1.2 r36h6115d3f_0
r-xfun 0.6 r36h6115d3f_0
r-xml2 1.2.0 r36h6115d3f_0
r-xtable 1.8_4 r36h6115d3f_0
r-xts 0.11_2 r36h6115d3f_0
r-yaml 2.2.0 r36h6115d3f_0
r-zoo 1.8_5 r36h6115d3f_0
regex 2020.5.14 pypi_0 pypi
requests 2.23.0 py37_0
requests-oauthlib 1.3.0 py_0
retrying 1.3.3 pypi_0 pypi
rope 0.17.0 py_0
rsa 4.0 py_0
rtree 0.9.4 py37h21ff451_1
ruamel_yaml 0.15.87 py37he774522_0
s3transfer 0.3.3 pypi_0 pypi
sacremoses 0.0.43 pypi_0 pypi
salib 1.3.11 pypi_0 pypi
scikit-image 0.17.2 pypi_0 pypi
scikit-learn 0.22.1 py37h6288b17_0
scipy 1.4.1 py37h9439919_0
scrapbook 0.2.0 pypi_0 pypi
seaborn 0.10.1 py_0
send2trash 1.5.0 py37_0
sentencepiece 0.1.90 pypi_0 pypi
setuptools 46.4.0 py37_0
shap 0.29.3 pypi_0 pypi
simplegeneric 0.8.1 py37_2
singledispatch 3.4.0.3 py37_0
sip 4.19.8 py37h6538335_0
six 1.14.0 py37_0
snappy 1.1.7 h777316e_3
snowballstemmer 2.0.0 py_0
sortedcollections 1.1.2 py37_0
sortedcontainers 2.1.0 py37_0
soupsieve 2.0 py_0
spacy 2.2.4 pypi_0 pypi
sphinx 3.0.3 py_0
sphinxcontrib 1.0 py37_1
sphinxcontrib-applehelp 1.0.2 py_0
sphinxcontrib-devhelp 1.0.2 py_0
sphinxcontrib-htmlhelp 1.0.3 py_0
sphinxcontrib-jsmath 1.0.1 py_0
sphinxcontrib-qthelp 1.0.3 py_0
sphinxcontrib-serializinghtml 1.1.4 py_0
sphinxcontrib-websupport 1.2.1 py_0
spyder 4.1.3 py37_0
spyder-kernels 1.9.1 py37_0
sqlalchemy 1.3.16 py37he774522_0
sqlite 3.31.1 h2a8f88b_1
srsly 1.0.2 pypi_0 pypi
statsmodels 0.11.0 py37he774522_0
sympy 1.5.1 py37_0
tbb 2020.0 h74a9793_0
tblib 1.6.0 py_0
tensorboard 2.1.0 py3_0
tensorflow 2.1.0 gpu_py37h7db9008_0
tensorflow-base 2.1.0 gpu_py37h55f5790_0
tensorflow-estimator 2.1.0 pyhd54b08b_0
tensorflow-gpu 2.1.0 h0d30ee6_0
termcolor 1.1.0 py37_1
terminado 0.8.3 py37_0
testpath 0.4.4 py_0
thinc 7.4.0 pypi_0 pypi
tifffile 2020.5.11 pypi_0 pypi
tk 8.6.8 hfa6e2cd_0
tokenizers 0.0.11 pypi_0 pypi
toolz 0.10.0 py_0
torchvision 0.6.0 py37_cu101 pytorch
tornado 6.0.4 py37he774522_1
tqdm 4.46.0 py_0
traitlets 4.3.3 py37_0
transformers 2.4.1 pypi_0 pypi
treeinterpreter 0.2.2 pypi_0 pypi
typing_extensions 3.7.4.1 py37_0
ujson 1.35 py37hfa6e2cd_0
unicodecsv 0.14.1 py37_0
urllib3 1.25.8 py37_0
vc 14.1 h0510ff6_4
vs2015_runtime 14.16.27012 hf0eaf9b_1
w3lib 1.22.0 pypi_0 pypi
wasabi 0.6.0 pypi_0 pypi
watchdog 0.10.2 py37_0
wcwidth 0.1.9 py_0
webencodings 0.5.1 py37_1
werkzeug 1.0.1 pypi_0 pypi
wheel 0.34.2 py37_0
widgetsnbextension 3.5.1 py37_0
win_inet_pton 1.1.0 py37_0
win_unicode_console 0.5 py37_0
wincertstore 0.2 py37_0
winpty 0.4.3 4
wrapt 1.12.1 py37he774522_1
xgboost 1.1.0 pypi_0 pypi
xlrd 1.2.0 py37_0
xlsxwriter 1.2.8 py_0
xlwings 0.19.0 py37_0
xlwt 1.3.0 py37_0
xz 5.2.5 h62dcd97_0
yaml 0.1.7 hc54c509_2
yapf 0.28.0 py_0
zeromq 4.3.1 h33f27b4_3
zict 2.0.0 py_0
zipp 3.1.0 py_0
zlib 1.2.11 h62dcd97_4
zstd 1.3.7 h508b16e_0
When running this cell:
with Timer() as t:
    classifier.fit(token_ids=tokens_train,
                   input_mask=mask_train,
                   labels=labels_train,
                   num_epochs=NUM_EPOCHS,
                   batch_size=BATCH_SIZE,
                   verbose=True)

print("[Training time: {:.3f} hrs]".format(t.interval / 3600))
I got the following stack trace:
t_total value of -1 results in schedule not being applied
Iteration: 0%| | 0/79 [00:00<?, ?it/s]
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
<ipython-input-34-18e84990dbbe> in <module>
5 num_epochs=NUM_EPOCHS,
6 batch_size=BATCH_SIZE,
----> 7 verbose=True)
8
9 print("[Training time: {:.3f} hrs]".format(t.interval / 3600))
C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\interpret_text\experimental\common\utils_bert.py in fit(self, token_ids, input_mask, labels, token_type_ids, num_gpus, num_epochs, batch_size, lr, warmup_proportion, verbose)
550 token_type_ids=token_type_ids_batch,
551 attention_mask=mask_batch,
--> 552 labels=None,
553 )
554 loss = loss_func(y_h, y_batch).mean()
C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\torch\nn\modules\module.py in __call__(self, *input, **kwargs)
548 result = self._slow_forward(*input, **kwargs)
549 else:
--> 550 result = self.forward(*input, **kwargs)
551 for hook in self._forward_hooks.values():
552 hook_result = hook(self, input, result)
C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\torch\nn\parallel\data_parallel.py in forward(self, *inputs, **kwargs)
153 return self.module(*inputs[0], **kwargs[0])
154 replicas = self.replicate(self.module, self.device_ids[:len(inputs)])
--> 155 outputs = self.parallel_apply(replicas, inputs, kwargs)
156 return self.gather(outputs, self.output_device)
157
C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\torch\nn\parallel\data_parallel.py in parallel_apply(self, replicas, inputs, kwargs)
163
164 def parallel_apply(self, replicas, inputs, kwargs):
--> 165 return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
166
167 def gather(self, outputs, output_device):
C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\torch\nn\parallel\parallel_apply.py in parallel_apply(modules, inputs, kwargs_tup, devices)
83 output = results[i]
84 if isinstance(output, ExceptionWrapper):
---> 85 output.reraise()
86 outputs.append(output)
87 return outputs
C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\torch\_utils.py in reraise(self)
393 # (https://bugs.python.org/issue2651), so we work around it.
394 msg = KeyErrorMessage(msg)
--> 395 raise self.exc_type(msg)
StopIteration: Caught StopIteration in replica 0 on device 0.
Original Traceback (most recent call last):
  File "C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\torch\nn\parallel\parallel_apply.py", line 60, in _worker
    output = module(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 989, in forward
    _, pooled_output = self.bert(input_ids, token_type_ids, attention_mask, output_all_encoded_layers=False)
  File "C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\torch\nn\modules\module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "C:\ProgramData\Anaconda3\envs\EmilioDL\lib\site-packages\pytorch_pretrained_bert\modeling.py", line 727, in forward
    extended_attention_mask = extended_attention_mask.to(dtype=next(self.parameters()).dtype) # fp16 compatibility
StopIteration
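The last frame reads the model's dtype with `next(self.parameters())`. `next()` raises `StopIteration` whenever the iterator it receives is empty, so the error means the replica on device 0 reported no registered parameters. The toy generator below is a hypothetical stand-in for `Module.parameters()` (no PyTorch needed); it shows the failure and the two-argument form of `next()`, which returns a default instead of raising:

```python
def parameters_of(registered):
    """Hypothetical stand-in for nn.Module.parameters(): yields each registered parameter."""
    yield from registered


# Parameters registered: next() returns the first one.
first = next(parameters_of(["weight", "bias"]))
print(first)  # weight

# Nothing registered, as in the failing replica: next() raises StopIteration.
try:
    next(parameters_of([]))
    raised = False
except StopIteration:
    raised = True
print(raised)  # True

# With a default argument, next() returns it instead of raising.
fallback = next(parameters_of([]), None)
print(fallback)  # None
```

A default only hides the symptom; the dtype still has to come from somewhere else when no parameters are registered.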
With one GPU the code runs flawlessly, but with two GPUs it fails with the error above.
Please let me know if you need additional information.
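Since a single GPU works, one stop-gap is to avoid multi-GPU training on affected PyTorch versions. The helper below is a hypothetical sketch that caps the GPU count at one for PyTorch 1.5 and later; extending the cap past 1.5 is an assumption that the behavior persists in later releases. Hiding the second card with the `CUDA_VISIBLE_DEVICES` environment variable before CUDA is initialized is an alternative.

```python
def safe_num_gpus(torch_version: str, available_gpus: int) -> int:
    """Hypothetical helper: number of GPUs to use for training.

    Returns at most 1 on PyTorch 1.5 or later, assuming the multi-GPU
    StopIteration seen with 1.5.0 also affects later releases.
    """
    # Drop local build tags such as "+cu101" and keep "major.minor".
    release = torch_version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    if (major, minor) >= (1, 5):
        return min(available_gpus, 1)
    return available_gpus


print(safe_num_gpus("1.5.0", 2))  # 1
print(safe_num_gpus("1.4.0", 2))  # 2
print(safe_num_gpus("1.5.0+cu101", 0))  # 0
```

In the notebook, the result could be passed to the `num_gpus` parameter that appears in the `fit` signature in the traceback, e.g. `num_gpus=safe_num_gpus(torch.__version__, torch.cuda.device_count())`. That `fit` then skips `DataParallel` for a single GPU is an assumption about `utils_bert.py`, not something shown here.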
Top GitHub Comments
Hi @EVMartinez,
Apologies for the delayed response. I haven’t had a chance to reproduce the error, but I did a quick search in the huggingface repo and it looks like an issue with the PyTorch version (1.5): https://github.com/huggingface/transformers/issues/4189
There are 2 options:
Note: Feel free to open new issues related to the nlp_recipes GitHub repo in that repo!
I will also close this issue soon, as it is not related to the explainers themselves!
Janhavi
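If downgrading PyTorch is not an option, the failing line in `pytorch_pretrained_bert/modeling.py` could be patched instead. The sketch below is one possible approach, not the fix proposed in this thread: `module_dtype` (a hypothetical helper) returns the dtype of the first registered parameter and, when there is none, falls back to the first tensor stored as a plain module attribute, which is where PyTorch 1.5 `DataParallel` replicas are assumed to keep their weight copies, and finally to a default.

```python
import torch
import torch.nn as nn


def module_dtype(module: nn.Module, default: torch.dtype = torch.float32) -> torch.dtype:
    """Hypothetical helper: dtype of a module's weights without raising StopIteration."""
    # Normal case: the module has registered parameters.
    first_param = next(module.parameters(), None)
    if first_param is not None:
        return first_param.dtype
    # Fallback: tensors stored as plain attributes (assumed to be how
    # PyTorch 1.5 DataParallel replicas hold their weight copies).
    for submodule in module.modules():
        for value in vars(submodule).values():
            if torch.is_tensor(value):
                return value.dtype
    # Nothing found: use the caller's default.
    return default


# A regular fp16 layer: dtype comes from its registered parameters.
print(module_dtype(nn.Linear(2, 2).half()))  # torch.float16

# Simulated replica: no registered parameters, only a plain tensor attribute.
replica_like = nn.ReLU()
replica_like.weight = torch.zeros(2, 2, dtype=torch.float16)
print(next(replica_like.parameters(), None))  # None
print(module_dtype(replica_like))  # torch.float16
```

The patched line would then read `extended_attention_mask.to(dtype=module_dtype(self))`. Editing an installed package is fragile, so treat this as a local workaround.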
Thank you for your suggestion, @janhavi13. I tried the sample notebook: when executing step 12, it started using both GPUs, then suddenly stopped, and I got the following stack trace: