[BUG] KeyError 'start' when getting captions from a video


I keep getting KeyError: 'start' when I try to get a caption from a video in a playlist.

To Reproduce

Here is the code I am trying to test:

import pytube
from pytube import Playlist, YouTube

url = "https://www.youtube.com/watch?v=vKA4w2O61Xo&list=PLkahZjV5wKe8WFEwvs69V7JO-Cx57rZ8W"
p = Playlist(url)

for v in p.videos[:3]:
    print("trying to get captions for:", v.title)
    print(v.captions["a.en"].generate_srt_captions())

This code used to print the caption before updating pytube, but now it breaks with the following trace:

KeyError                                  Traceback (most recent call last)
~\test_pytube.py in <module>
     10 for v in p.videos[:3]:
     11     print("trying to get captions for:", v.title)
---> 12     print(v.captions["a.en"].generate_srt_captions())

~\AppData\Roaming\Python\Python38\site-packages\pytube\captions.py in generate_srt_captions(self)
     49         recompiles them into the "SubRip Subtitle" format.
     50         """
---> 51         return self.xml_caption_to_srt(self.xml_captions)
     52
     53     @staticmethod

~\AppData\Roaming\Python\Python38\site-packages\pytube\captions.py in xml_caption_to_srt(self, xml_captions)
     81             except KeyError:
     82                 duration = 0.0
---> 83             start = float(child.attrib["start"])
     84             end = start + duration
     85             sequence_number = i + 1  # convert from 0-indexed to 1.

KeyError: 'start'

System information:

  • Python version: Python 3.8.5
  • Pytube version: 11.0.0
  • Command used to install pytube: pip install -U pytube

Issue Analytics

  • State: open
  • Created 2 years ago
  • Reactions: 4
  • Comments: 18 (1 by maintainers)

Top GitHub Comments

16 reactions
maksimbolonkin commented, Sep 26, 2021

Apparently YouTube changed their captions format.
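A minimal sketch of why the lookup fails, assuming the older caption XML carried a start attribute on each child of the root, while the newer one wraps <p t="..." d="..."> elements in a <body>. Both XML strings are made-up samples (not real YouTube responses), and first_start_old_parser is a hypothetical helper that only mimics the failing line from the traceback:

```python
from xml.etree import ElementTree

# Hypothetical samples; real caption tracks carry more metadata.
OLD_STYLE = '<transcript><text start="1.5">hello</text></transcript>'
NEW_STYLE = (
    '<timedtext><head/><body>'
    '<p t="1500" d="2000"><s>hello</s></p>'
    '</body></timedtext>'
)


def first_start_old_parser(xml_captions):
    """Mimic the failing lookup: read 'start' from the root's first child."""
    root = ElementTree.fromstring(xml_captions)
    child = list(root)[0]
    return float(child.attrib["start"])


print(first_start_old_parser(OLD_STYLE))

try:
    first_start_old_parser(NEW_STYLE)
except KeyError as exc:
    # The first child is <head/>, which has no 'start' attribute.
    print("KeyError:", exc)
```

With the assumed newer layout, the root's direct children are no longer caption lines, so any parser that reads start from them raises the KeyError shown in the report.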

Here’s my version of the xml_caption_to_srt function in captions.py:

def xml_caption_to_srt(self, xml_captions: str) -> str:
    """Convert xml caption tracks to "SubRip Subtitle (srt)".

    :param str xml_captions:
        XML formatted caption tracks.
    """
    # Relies on ElementTree and unescape being imported in captions.py.
    segments = []
    # The second child of the root is assumed to hold the caption lines.
    root = ElementTree.fromstring(xml_captions)[1]
    i = 0
    for child in list(root):
        if child.tag == 'p':
            caption = ''
            # Skip <p> elements that have no child elements.
            if len(list(child)) == 0:
                continue
            for s in list(child):
                if s.tag == 's':
                    # s.text is None for an empty <s> element.
                    caption += ' ' + (s.text or '')
            caption = unescape(caption.replace("\n", " ").replace("  ", " "))
            try:
                # "d" and "t" are divided by 1000 to convert to seconds.
                duration = float(child.attrib["d"]) / 1000.0
            except KeyError:
                duration = 0.0
            start = float(child.attrib["t"]) / 1000.0
            end = start + duration
            sequence_number = i + 1  # convert from 0-indexed to 1.
            line = "{seq}\n{start} --> {end}\n{text}\n".format(
                seq=sequence_number,
                start=self.float_to_srt_time_format(start),
                end=self.float_to_srt_time_format(end),
                text=caption,
            )
            segments.append(line)
            i += 1
    return "\n".join(segments).strip()
6 reactions
geomags3 commented, Oct 24, 2021

@maksimbolonkin I changed your code a bit, so now it works for me as well 👌 The issue I found was that some captions are located directly inside the <p> tag, with no <s> children.

    def xml_caption_to_srt(self, xml_captions: str) -> str:
        """Convert xml caption tracks to "SubRip Subtitle (srt)".

        :param str xml_captions:
            XML formatted caption tracks.
        """
        segments = []
        root = ElementTree.fromstring(xml_captions)
        i=0
        for child in list(root.iter("body"))[0]:
            if child.tag == 'p':
                caption = ''
                if len(list(child))==0:
                    # instead of 'continue'; child.text is None for an empty <p>
                    caption = child.text or ''
                for s in list(child):
                    if s.tag == 's':
                        # s.text is None for an empty <s>
                        caption += ' ' + (s.text or '')
                caption = unescape(caption.replace("\n", " ").replace("  ", " "),)
                try:
                    duration = float(child.attrib["d"])/1000.0
                except KeyError:
                    duration = 0.0
                start = float(child.attrib["t"])/1000.0
                end = start + duration
                sequence_number = i + 1  # convert from 0-indexed to 1.
                line = "{seq}\n{start} --> {end}\n{text}\n".format(
                    seq=sequence_number,
                    start=self.float_to_srt_time_format(start),
                    end=self.float_to_srt_time_format(end),
                    text=caption,
                )
                segments.append(line)
                i += 1
        return "\n".join(segments).strip()

So to fix this bug, we can just replace xml_caption_to_srt in the Caption class in pytube/captions.py with the code above. Hope it works for everyone 👍
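For anyone who wants to try the conversion logic without editing the installed package, here is a standalone sketch of the same idea. It is not pytube code: xml_to_srt and srt_time are hypothetical names, SAMPLE is made-up XML, and the layout (<body> containing <p t="..." d="..."> with optional <s> children, times in milliseconds) is assumed from the two functions above. Unlike those versions, it also skips <p> elements that end up with no text.

```python
from html import unescape
from xml.etree import ElementTree


def srt_time(seconds):
    """Format seconds as HH:MM:SS,mmm (hypothetical helper, not pytube's)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def xml_to_srt(xml_captions):
    """Convert <p t=... d=...> caption XML (assumed layout) to SRT text."""
    root = ElementTree.fromstring(xml_captions)
    body = root.find("body")
    paragraphs = body.findall("p") if body is not None else []
    segments = []
    for p in paragraphs:
        # Text may sit directly in <p> or be split across <s> children.
        parts = [p.text or ""] + [s.text or "" for s in p.findall("s")]
        # Join the pieces and collapse runs of whitespace and newlines.
        caption = unescape(" ".join(" ".join(parts).split()))
        if not caption:
            continue  # nothing to show for this entry
        start = float(p.attrib["t"]) / 1000.0
        duration = float(p.attrib.get("d", 0)) / 1000.0
        number = len(segments) + 1  # SRT sequence numbers start at 1
        segments.append(
            f"{number}\n{srt_time(start)} --> "
            f"{srt_time(start + duration)}\n{caption}\n"
        )
    return "\n".join(segments).strip()


# Made-up sample covering <s> children, plain text, and an empty <p>.
SAMPLE = (
    '<timedtext><head/><body>'
    '<p t="0" d="1500"><s>hello</s><s> world</s></p>'
    '<p t="1500" d="2000">plain text</p>'
    '<p t="3500"></p>'
    '</body></timedtext>'
)
print(xml_to_srt(SAMPLE))
```

Using attrib.get for the duration replaces the try/except KeyError pattern, and numbering from len(segments) keeps the sequence contiguous when empty entries are skipped.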
