
[Simultaneous Machine Translation-MMA]: Building 'alignment_train_cuda_binding' extension


Bug

Building ‘alignment_train_cuda_binding’ extension, CUB building problem – Simultaneous Machine Translation Example

To Reproduce & Code sample

There is an example of how to use the MMA model (Simultaneous Machine Translation) on this page: https://github.com/pytorch/fairseq/blob/main/examples/simultaneous_translation/docs/ende-mma.md

First, I followed the main page to install fairseq:

    git clone https://github.com/pytorch/fairseq
    cd fairseq
    pip install --editable ./

I successfully installed fairseq.

Then I tried to run the example command:

    fairseq-train \
    data-bin/wmt17_en_de \
    --simul-type waitk \
    --waitk-lagging 3 \
    --mass-preservation \
    --criterion label_smoothed_cross_entropy \
    --max-update 50000 \
    --arch transformer_monotonic_iwslt_de_en \
    --save-dir checkpoints/monotonic_wmt_en_de \
    --optimizer adam \
    --adam-betas '(0.9, 0.98)' \
    --lr-scheduler 'inverse_sqrt' \
    --warmup-init-lr 1e-7 \
    --warmup-updates 4000 \
    --lr 5e-4 \
    --stop-min-lr 1e-9 \
    --clip-norm 0.0 \
    --weight-decay 0.0001 \
    --dropout 0.3 \
    --label-smoothing 0.1 \
    --max-tokens 3584

But after a recent commit it fails with this error:

    ModuleNotFoundError: No module named 'alignment_train_cuda_binding'

(I noticed that this module was added within the last 30 days.)
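A quick way to check whether the extension is visible to the interpreter that runs fairseq-train is to ask importlib for its module spec without actually importing it. This is a minimal diagnostic sketch: it only reports whether the module can be found on sys.path for this particular interpreter, not whether it was compiled correctly.

```python
import importlib.util
import sys


def extension_status(module_name: str) -> str:
    """Report whether `module_name` can be found by this interpreter."""
    spec = importlib.util.find_spec(module_name)
    if spec is None:
        # Not on sys.path: either the extension was never built, or it was
        # built in place in a directory this interpreter does not search.
        return f"{module_name}: not found (python={sys.executable})"
    return f"{module_name}: found at {spec.origin}"


if __name__ == "__main__":
    print(extension_status("alignment_train_cuda_binding"))
```

Running it from the same virtualenv used for training rules out the common case of building the extension with one Python and training with another.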

To build the extension package so that ‘from alignment_train_cuda_binding import alignment_train_cuda’ works, I set CUDA_HOME in ~/.bashrc and ran the following in the terminal:

     python setup.py build_ext --inplace
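When a build may be picking up the wrong toolkit (or none), it helps to see which CUDA installation it is likely to use. The sketch below approximates a common lookup order (CUDA_HOME, then CUDA_PATH, then two directories above the nvcc found on PATH, then /usr/local/cuda); that order is an assumption modelled on PyTorch-style extension builds, not the exact logic of fairseq's setup.py. Note also that a variable exported in ~/.bashrc is only seen by shells started after the edit, or after running source ~/.bashrc.

```python
import os
import shutil
from typing import Optional


def guess_cuda_home() -> Optional[str]:
    """Best-effort guess of the CUDA toolkit root (assumed lookup order)."""
    # 1. Explicit environment variables win.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        value = os.environ.get(var)
        if value:
            return value
    # 2. Derive the root from nvcc on PATH: <root>/bin/nvcc -> <root>.
    nvcc = shutil.which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(os.path.realpath(nvcc)))
    # 3. Fall back to the conventional default location, if it exists.
    default = "/usr/local/cuda"
    return default if os.path.isdir(default) else None


if __name__ == "__main__":
    print("CUDA home:", guess_cuda_home())
```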

but building the ‘alignment_train_cuda_binding’ extension failed with:

    fatal error: cub/cub.cuh: No such file or directory
    #include <cub/cub.cuh>
    compilation terminated.

I searched for this error, and someone said that the CUB package was probably missing.

So I cloned the latest version of CUB and put it at ‘/usr/local/cuda-10.2/targets/x86_64-linux/include/cub’:

    git clone https://github.com/NVIDIA/cub
    # headers placed under: /usr/local/cuda-10.2/targets/x86_64-linux/include/cub
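For '#include <cub/cub.cuh>' to resolve, one of nvcc's include directories must contain a 'cub' folder with 'cub.cuh' directly inside it. The NVIDIA/cub repository keeps its headers in a nested 'cub' subdirectory, so the clone root is not itself the folder that belongs on the include path. The sketch below checks a list of candidate include directories; the two paths in the example are assumptions taken from this report.

```python
import os
from typing import Iterable, List


def find_cub_header(include_dirs: Iterable[str]) -> List[str]:
    """Return the include dirs under which <cub/cub.cuh> would resolve."""
    hits = []
    for inc in include_dirs:
        # The compiler resolves <cub/cub.cuh> as <include_dir>/cub/cub.cuh.
        if os.path.isfile(os.path.join(inc, "cub", "cub.cuh")):
            hits.append(inc)
    return hits


if __name__ == "__main__":
    # Assumed candidates, taken from the paths in this issue.
    candidates = [
        "/usr/local/cuda-10.2/include",
        "/usr/local/cuda-10.2/targets/x86_64-linux/include",
    ]
    print(find_cub_header(candidates) or "cub/cub.cuh not found")
```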

Then I ran the build again:

     python setup.py build_ext --inplace

This time a different error occurred:

    building 'alignment_train_cuda_binding' extension
    /usr/local/cuda-10.2/bin/nvcc -I xxx
    ...
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/block/../iterator/cache_modified_input_iterator.cuh(116):error: a class or namespace qualified name is required

    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/block/../iterator/cache_modified_input_iterator.cuh(116):error: qualified name is not allowed

    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/block/../iterator/cache_modified_input_iterator.cuh(116):error: expected a ";"

    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/agent/agent_merge_sort.cuh(80):error: a class or namespace qualified name is required

    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/agent/agent_merge_sort.cuh(80):error: qualified name is not allowed
    ...

    error: command '/usr/local/cuda-10.2/bin/nvcc' failed with exit status 1
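CUB became part of the CUDA Toolkit with CUDA 11.0, so on CUDA 10.2 the only '<cub/...>' headers nvcc sees are the ones copied in by hand. Compiler errors deep inside CUB headers, like the ones above, are consistent with a CUB release that is newer than this nvcc can handle; treat that as a hypothesis to check against CUB's release notes, not a confirmed diagnosis. Recent CUB releases record their version in 'cub/version.cuh' as a CUB_VERSION macro encoded as major * 100000 + minor * 100 + subminor, and the sketch below decodes it. The exact '#define' layout it matches is an assumption.

```python
import os
import re
from typing import Optional, Tuple

# Matches a line such as "#define CUB_VERSION 101500" (assumed layout).
_CUB_VERSION_RE = re.compile(r"^\s*#\s*define\s+CUB_VERSION\s+(\d+)", re.MULTILINE)


def cub_version(include_dir: str) -> Optional[Tuple[int, int, int]]:
    """Decode CUB_VERSION from <include_dir>/cub/version.cuh, if present."""
    path = os.path.join(include_dir, "cub", "version.cuh")
    if not os.path.isfile(path):
        return None  # very old CUB releases may not ship version.cuh
    with open(path, encoding="utf-8", errors="replace") as f:
        match = _CUB_VERSION_RE.search(f.read())
    if match is None:
        return None
    v = int(match.group(1))
    # Encoding: major * 100000 + minor * 100 + subminor
    return v // 100000, v // 100 % 1000, v % 100


if __name__ == "__main__":
    print(cub_version("/usr/local/cuda-10.2/targets/x86_64-linux/include"))
```

Knowing the exact version makes it easier to pick an older release to try, which is the open question below.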

Expected behavior

Please help me solve this issue: how can I get this extension to build? Thanks a lot!

My guesses:

Maybe CUDA 10.2 does not support this module?
Should I download an older version of the CUB library instead, and if so, which version?
Or is there another approach? Maybe I could install an older fairseq release (0.10.0) that does not need the ‘alignment_train_cuda_binding’ module.

Environment

fairseq Version : main branch; 1.0.0a0+2380a6e (the version number is confusing)
PyTorch Version : 1.10+cu
OS : Ubuntu 18.04
How you installed fairseq : pip install --editable ./
Python version : 3.6.8 virtualenv
CUDA/cuDNN version : CUDA 10.2 / cuDNN (left empty for now)
GPU models and configuration : Quadro RTX 5000
Any other relevant information :
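Most of the fields above can be collected automatically, which avoids ambiguous entries such as an incomplete PyTorch version string (PyTorch also ships a fuller report via python -m torch.utils.collect_env). The sketch below uses only the standard library and adds PyTorch's version and CUDA build if torch is importable; torch.__version__ and torch.version.cuda are real attributes, but the report layout and keys are illustrative.

```python
import os
import platform
import sys
from typing import Dict


def environment_report() -> Dict[str, str]:
    """Collect the fields asked for in the issue template (illustrative)."""
    report = {
        "Python version": platform.python_version(),
        "OS": platform.platform(),
        "Executable": sys.executable,
        "CUDA_HOME": os.environ.get("CUDA_HOME", "<unset>"),
    }
    try:
        import torch  # optional: only reported if installed
        report["PyTorch version"] = torch.__version__
        # torch.version.cuda is None for CPU-only builds.
        report["PyTorch CUDA build"] = str(torch.version.cuda)
    except ImportError:
        report["PyTorch version"] = "<not installed>"
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key} : {value}")
```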

Additional context

Sorry, for privacy reasons I cannot upload my code or screenshots of the error.


Top GitHub Comments


EricLina commented, Mar 10, 2022

I found the reason! I was using the units_to_segment from enja.agent, but I should have used yours:

    def units_to_segment(self, unit_queue, states):
        """
        queue: stores bpe tokens.
        server: accepts words.

        Therefore, we need to merge subwords into words. We find the first
        subword that starts with BOW_PREFIX, merge the subwords prior to
        it, remove them from the queue, and send the word to the server.
        """

        # Merge sub word to full word.
        tgt_dict = self.dict["tgt"]

        # if segment starts with eos, send EOS
        if tgt_dict.eos() == unit_queue[0]:
            return DEFAULT_EOS

        # if force finish, there will be None's
        segment = []
        if None in unit_queue.value:
            unit_queue.value.remove(None)

        src_len = len(states.units.source)
        if (
            (len(unit_queue) > 0 and tgt_dict.eos() == unit_queue[-1])
            or len(states.units.target) > self.max_len
        ):
            hyp = tgt_dict.string(
                unit_queue,
                "sentencepiece",
            )
            if self.pre_tokenizer is not None:
                hyp = self.pre_tokenizer.decode(hyp)
            return [hyp] + [DEFAULT_EOS]

        for index in unit_queue:
            token = tgt_dict.string([index])
            if token.startswith(BOW_PREFIX):
                if len(segment) == 0:
                    segment += [token.replace(BOW_PREFIX, "")]
                else:
                    for j in range(len(segment)):
                        unit_queue.pop()

                    string_to_return = ["".join(segment)]

                    if tgt_dict.eos() == unit_queue[0]:
                        string_to_return += [DEFAULT_EOS]

                    return string_to_return
            else:
                segment += [token.replace(BOW_PREFIX, "")]

        return None
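The docstring above describes a simple rule for turning SentencePiece pieces into whole words: a piece that starts with the word-boundary marker begins a new word, so everything buffered before it forms a complete word. The standalone sketch below shows only that rule on a plain list of strings, assuming the marker is SentencePiece's U+2581 ('▁'). The function name is hypothetical, and this is a simplification for illustration, not a drop-in replacement for the agent method, which also handles EOS, forced finishing, and the target dictionary.

```python
from typing import List, Tuple

BOW_PREFIX = "\u2581"  # assumed: SentencePiece word-boundary marker "▁"


def pop_complete_words(pieces: List[str]) -> Tuple[List[str], List[str]]:
    """Split pieces into complete words and the still-open trailing pieces.

    A word is complete once the next piece starts with BOW_PREFIX.
    """
    words: List[str] = []
    current: List[str] = []
    for piece in pieces:
        if piece.startswith(BOW_PREFIX) and current:
            # A new word begins, so the buffered pieces form a full word.
            words.append("".join(p.replace(BOW_PREFIX, "") for p in current))
            current = []
        current.append(piece)
    # The last word may still grow, so its raw pieces stay pending.
    return words, current


if __name__ == "__main__":
    print(pop_complete_words(["\u2581Hel", "lo", "\u2581wor", "ld"]))
```

Keeping the pending pieces in raw form (marker included) means they can simply be prepended to the next batch of pieces.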


me301 commented, Feb 4, 2022

I found that resetting to an older commit solves this problem:

    git reset --hard dd3bd3c0497ae9a7ae7364404a6b0a4c501780b3

Instead of git reset, git checkout is better, since it leaves the local main branch untouched:

    git checkout dd3bd3c0497ae9a7ae7364404a6b0a4c501780b3

Run git checkout main to go back to the main branch.
