[Simultaneous Machine Translation-MMA]:Building 'alignment_train_cuda_binding' extension


Bug

Building the ‘alignment_train_cuda_binding’ extension fails (CUB build problem) in the Simultaneous Machine Translation example

To Reproduce & Code sample

There is an example of how to use the MMA model (Simultaneous Machine Translation) on this page: https://github.com/pytorch/fairseq/blob/main/examples/simultaneous_translation/docs/ende-mma.md

  1. First, I followed the main page to install fairseq:
    git clone https://github.com/pytorch/fairseq
    cd fairseq
    pip install --editable ./

I successfully installed fairseq.

  2. Then, I tried to run the example code:
    fairseq-train \
        data-bin/wmt17_en_de \
        --simul-type waitk \
        --waitk-lagging 3 \
        --mass-preservation \
        --criterion label_smoothed_cross_entropy \
        --max-update 50000 \
        --arch transformer_monotonic_iwslt_de_en \
        --save-dir checkpoints/monotonic_wmt_en_de \
        --optimizer adam \
        --adam-betas '(0.9, 0.98)' \
        --lr-scheduler 'inverse_sqrt' \
        --warmup-init-lr 1e-7 \
        --warmup-updates 4000 \
        --lr 5e-4 \
        --stop-min-lr 1e-9 \
        --clip-norm 0.0 \
        --weight-decay 0.0001 \
        --dropout 0.3 \
        --label-smoothing 0.1 \
        --max-tokens 3584
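Long shell commands like this one are easy to break with line-continuation mistakes, such as a missing space before `\` or a stray `\` on the last line. A minimal sketch of one way to avoid that is to keep the arguments as a Python list and launch the command with `subprocess`. It assumes `fairseq-train` is on `PATH`; the flags are copied verbatim from the command above and are not validated against fairseq's argument parser here. The helper names `to_shell_command` and `run_training` are hypothetical.

```python
import shlex
import subprocess
from typing import List

# Flags copied verbatim from the command above; "data-bin/wmt17_en_de" is the
# reporter's data directory and will differ on other machines.
TRAIN_ARGS: List[str] = [
    "fairseq-train",
    "data-bin/wmt17_en_de",
    "--simul-type", "waitk",
    "--waitk-lagging", "3",
    "--mass-preservation",
    "--criterion", "label_smoothed_cross_entropy",
    "--max-update", "50000",
    "--arch", "transformer_monotonic_iwslt_de_en",
    "--save-dir", "checkpoints/monotonic_wmt_en_de",
    "--optimizer", "adam",
    "--adam-betas", "(0.9, 0.98)",
    "--lr-scheduler", "inverse_sqrt",
    "--warmup-init-lr", "1e-7",
    "--warmup-updates", "4000",
    "--lr", "5e-4",
    "--stop-min-lr", "1e-9",
    "--clip-norm", "0.0",
    "--weight-decay", "0.0001",
    "--dropout", "0.3",
    "--label-smoothing", "0.1",
    "--max-tokens", "3584",
]


def to_shell_command(args: List[str]) -> str:
    """Quote each argument so the printed command can be pasted into a shell."""
    # shlex.join would do the same but needs Python 3.8+; this works on 3.6.
    return " ".join(shlex.quote(a) for a in args)


def run_training(args: List[str] = TRAIN_ARGS) -> int:
    """Run fairseq-train (assumed to be on PATH) and return its exit code."""
    return subprocess.run(args, check=False).returncode
```

Calling `print(to_shell_command(TRAIN_ARGS))` shows exactly what will run, which is also handy to paste into a bug report.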

But on the current commit it fails with the following error:

    ModuleNotFoundError: No module named 'alignment_train_cuda_binding'

(I noticed that this module was added within the last 30 days.)
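Before launching training, a quick way to tell whether the compiled extension is visible to the current interpreter is `importlib.util.find_spec`. This is a minimal sketch; the module name is taken from the error above, and `extension_available` is a hypothetical helper name.

```python
import importlib.util


def extension_available(name: str = "alignment_train_cuda_binding") -> bool:
    """Return True if a top-level module called `name` can be found on sys.path.

    find_spec only locates the module; it does not load the compiled .so,
    so a present-but-broken build would still report True here.
    """
    return importlib.util.find_spec(name) is not None
```

If this returns False in the same virtualenv that runs `fairseq-train`, the extension was either never built or was built into a directory that is not on `sys.path`.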

  3. To build the extension package so that ‘from alignment_train_cuda_binding import alignment_train_cuda’ succeeds, I set the CUDA_HOME path in ~/.bashrc and ran the following in a terminal:
     python setup.py build_ext --inplace

But building the ‘alignment_train_cuda_binding’ extension fails with this error:

     fatal error: cub/cub.cuh:no such file or directory
     #include <cub/cub.cuh>
     compilation terminated.

I searched for this error, and some people said that the CUB package is missing.
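The error means nvcc could not find `cub/cub.cuh` on its include path. Since CUDA 11.0 the CUDA Toolkit bundles CUB, so a CUDA 10.2 install is expected to lack it. A small sketch to check which include directory, if any, actually contains the header; the directory layout (`include/` and `targets/x86_64-linux/include/`) mirrors the paths in this report, and the helper names are hypothetical.

```python
import os
from pathlib import Path
from typing import Iterable, List, Optional


def candidate_include_dirs(cuda_home: Optional[str] = None) -> List[Path]:
    """Include directories nvcc is likely to search for this CUDA install.

    Falls back to $CUDA_HOME, then /usr/local/cuda. The x86_64-linux target
    name matches this report; other platforms use different target names.
    """
    home = Path(cuda_home or os.environ.get("CUDA_HOME", "/usr/local/cuda"))
    return [home / "include", home / "targets" / "x86_64-linux" / "include"]


def find_cub_header(include_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first cub/cub.cuh found, or None if CUB is missing."""
    for d in include_dirs:
        header = Path(d) / "cub" / "cub.cuh"
        if header.is_file():
            return header
    return None
```

For example, `find_cub_header(candidate_include_dirs("/usr/local/cuda-10.2"))` returns None until CUB headers are placed under one of those directories.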

  4. So I cloned the latest version of CUB and put it in ‘/usr/local/cuda-10.2/targets/x86_64-linux/include/cub’:
    git clone https://github.com/NVIDIA/cub
  5. Then I ran the build again:
     python setup.py build_ext --inplace

But a different problem appeared:

    building 'alignment_train_cuda_binding' extension
    /usr/local/cuda-10.2/bin/nvcc -I xxx
    ...
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/block/../iterator/cache_modified_input_iterator.cuh(116):error: a class or namespace qualified name is required
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/block/../iterator/cache_modified_input_iterator.cuh(116):error: qualified name is not allowed
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/block/../iterator/cache_modified_input_iterator.cuh(116):error: expected a ";"
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/agent/agent_merge_sort.cuh(80):error: a class or namespace qualified name is required
    /usr/local/cuda-10.2/bin/../targets/x86_64-linux/include/cub/agent/agent_merge_sort.cuh(80):error: qualified name is not allowed
    ...
    error: command '/usr/local/cuda-10.2/bin/nvcc' failed with exit status 1
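These errors are reported inside the freshly cloned CUB headers rather than in fairseq's own sources, which suggests (but does not prove) that the latest CUB is newer than what nvcc 10.2 can compile. When choosing an older release, it helps to know exactly which version was cloned. A sketch, assuming `cub/version.cuh` defines `CUB_VERSION` with the encoding major * 100000 + minor * 100 + subminor; very old CUB releases may not ship `version.cuh`, in which case the helper returns None. The function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Matches e.g. "#define CUB_VERSION 101500" but not "CUB_VERSION_MAJOR".
_VERSION_RE = re.compile(r"^\s*#\s*define\s+CUB_VERSION\s+(\d+)", re.MULTILINE)


def parse_cub_version(text: str) -> Optional[Tuple[int, int, int]]:
    """Decode the CUB_VERSION macro into (major, minor, subminor)."""
    m = _VERSION_RE.search(text)
    if m is None:
        return None
    v = int(m.group(1))
    return (v // 100000, v // 100 % 1000, v % 100)


def read_cub_version(cub_dir: Path) -> Optional[Tuple[int, int, int]]:
    """Read <cub_dir>/version.cuh, returning None if it is absent or unparsable."""
    header = Path(cub_dir) / "version.cuh"
    if not header.is_file():
        return None
    return parse_cub_version(header.read_text(encoding="utf-8", errors="replace"))
```

For example, `read_cub_version(Path("/usr/local/cuda-10.2/targets/x86_64-linux/include/cub"))` reports the version that nvcc is currently picking up.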

Expected behavior

Please help me solve this issue. How can I fix it? Thanks a lot!

My guesses:

  1. Does CUDA 10.2 not support this module?
  2. Should I download an older version of the CUB library, and if so, which version?
  3. Is there another approach? Maybe I could install an older fairseq version (0.10.0), which does not need the ‘alignment_train_cuda_binding’ module.

Environment

  • fairseq Version: main branch; 1.0.0a0+2380a6e (the version number is confusing)
  • PyTorch Version: 1.10+cu
  • OS: Ubuntu 18.04
  • How you installed fairseq: pip install --editable ./
  • Python version: 3.6.8 (virtualenv)
  • CUDA/cuDNN version: CUDA 10.2 / cuDNN left empty for now
  • GPU models and configuration: Quadro RTX 5000
  • Any other relevant information:
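PyTorch ships `python -m torch.utils.collect_env`, which prints a full environment report. For just the fields asked for above (including the cuDNN version that was left empty), a minimal sketch using standard PyTorch attributes; `collect_env` here is a hypothetical helper name, and it degrades gracefully when torch or a GPU is unavailable.

```python
import platform
from typing import Dict


def collect_env() -> Dict[str, str]:
    """Gather the version info requested in the issue template."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    try:
        import torch
    except ImportError:
        info["torch"] = "not installed"
        return info
    info["torch"] = torch.__version__
    # CUDA version PyTorch was built with (None for CPU-only builds).
    info["torch_cuda"] = str(torch.version.cuda)
    # cuDNN version as an integer, e.g. 7605, or None if unavailable.
    info["cudnn"] = str(torch.backends.cudnn.version())
    info["gpu"] = torch.cuda.get_device_name(0) if torch.cuda.is_available() else "none"
    return info
```

Comparing `torch_cuda` with the nvcc that builds the extension (here CUDA 10.2) is a quick sanity check, since a mismatch between the two is a common source of extension build trouble.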

Additional context

Sorry, for privacy reasons I cannot upload my code or screenshots of the error.

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 15

Top GitHub Comments

2 reactions
EricLina commented, Mar 10, 2022

I found the reason! I was using units_to_segment from the enja.agent; I should have used yours:

    # BOW_PREFIX (the SentencePiece word-start marker) and DEFAULT_EOS are
    # assumed to be module-level constants defined elsewhere in the agent.
    def units_to_segment(self, unit_queue, states):
        """
        queue: stores BPE tokens.
        server: accepts words.

        Therefore, we need to merge subwords into words. We find the first
        subword that starts with BOW_PREFIX, then merge the subwords prior
        to this subword, remove them from the queue, and send the word to
        the server.
        """

        # Merge subwords into full words.
        tgt_dict = self.dict["tgt"]

        # Nothing to send yet (also guards the unit_queue[0] lookup below).
        if len(unit_queue) == 0:
            return None

        # If the segment starts with EOS, send EOS.
        if tgt_dict.eos() == unit_queue[0]:
            return DEFAULT_EOS

        # On a forced finish the queue may contain None entries; drop all of them.
        while None in unit_queue.value:
            unit_queue.value.remove(None)

        if (
            (len(unit_queue) > 0 and tgt_dict.eos() == unit_queue[-1])
            or len(states.units.target) > self.max_len
        ):
            # End of sentence (or length limit): detokenize everything left.
            hyp = tgt_dict.string(
                unit_queue,
                "sentencepiece",
            )
            if self.pre_tokenizer is not None:
                hyp = self.pre_tokenizer.decode(hyp)
            return [hyp] + [DEFAULT_EOS]

        segment = []
        for index in unit_queue:
            token = tgt_dict.string([index])
            if token.startswith(BOW_PREFIX):
                if len(segment) == 0:
                    segment += [token.replace(BOW_PREFIX, "")]
                else:
                    # A new word has started: remove the finished word's pieces.
                    for _ in range(len(segment)):
                        unit_queue.pop()

                    string_to_return = ["".join(segment)]

                    if tgt_dict.eos() == unit_queue[0]:
                        string_to_return += [DEFAULT_EOS]

                    return string_to_return
            else:
                segment += [token.replace(BOW_PREFIX, "")]

        # No complete word yet; wait for more target units.
        return None
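The core rule in this method is that a word is complete only once the next word-initial piece arrives. A standalone sketch of just that rule, using plain lists of SentencePiece strings instead of fairseq dictionary indices and SimulEval queues; `BOW_PREFIX = "\u2581"` (the SentencePiece "▁" marker) is an assumption here, and `pop_first_word` is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

# Assumed value: SentencePiece marks word starts with U+2581 ("▁").
BOW_PREFIX = "\u2581"


def pop_first_word(pieces: List[str]) -> Tuple[Optional[str], List[str]]:
    """Return (word, remaining_pieces) once a full word is available.

    Mirrors the loop above: pieces are accumulated until a second
    word-initial piece appears, which closes the first word.
    Returns (None, pieces) while the first word may still continue.
    """
    segment: List[str] = []
    for i, piece in enumerate(pieces):
        if piece.startswith(BOW_PREFIX) and segment:
            return "".join(segment), pieces[i:]
        segment.append(piece.replace(BOW_PREFIX, ""))
    return None, pieces
```

For instance, `["▁Hel", "lo", "▁wor", "ld"]` yields `"Hello"` with `["▁wor", "ld"]` left over, while `["▁wor", "ld"]` alone yields None because "world" might still continue.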

1 reaction
me301 commented, Feb 4, 2022

I found that this command solves the problem:

    git reset --hard dd3bd3c0497ae9a7ae7364404a6b0a4c501780b3

Instead of git reset, git checkout is better, because it does not move the main branch or throw away local work:

    git checkout dd3bd3c0497ae9a7ae7364404a6b0a4c501780b3

Then run git checkout main to go back to the main branch.

