IndexError: single positional indexer is out-of-bounds

Describe the bug

I was trying to train a custom object detector when I got the error in the title. I think it’s because my folders with the images and labels are not being read. I’ve tried 'labels', '/labels' and '/labels/'.

Code and Data

from detecto import core, utils, visualize

# Images and XML files in separate folders
dataset = core.Dataset('labels/', 'images/')

image, target = dataset[0]
print(image, target)

model = core.Model(['bat', 'batter', 'pitch', 'field', 'player', 'scoreboard'])

model.fit(dataset)


# Specify the path to your image
image = utils.read_image('images/image0.jpg')
predictions = model.predict(image)

# predictions format: (labels, boxes, scores)
labels, boxes, scores = predictions

print(labels) 
print(boxes)
print(scores)

Stacktrace

Traceback (most recent call last):
  File "c:/Users/julis/Documents/ap-cricket/functions/train.py", line 9, in <module>
    image, target = dataset[0]
  File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\detecto\core.py", line 148, in __getitem__
    img_name = os.path.join(self._root_dir, self._csv.iloc[idx, 0])
  File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 873, in __getitem__
    return self._getitem_tuple(key)
  File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1443, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 702, in _has_valid_tuple
    self._validate_key(k, i)
  File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1352, in _validate_key
    self._validate_integer(key, axis)
  File "C:\Users\julis\AppData\Local\Programs\Python\Python38\lib\site-packages\pandas\core\indexing.py", line 1437, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

Environment:

OS: Windows 10
Python version: 3.8
Detecto version:
torch version: 1.5.0
torchvision version : 0.6.0

Additional context

Image name is: ‘image0.jpg’
Label name is: ‘image0.xml’
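A quick check that may help narrow this down: the traceback fails at `self._csv.iloc[idx, 0]` for index 0, which is what pandas raises when position 0 does not exist, for example when the table built from the labels has no rows. One possible cause (an assumption, not confirmed in this thread) is that no XML files were found in the labels folder, or that the files contain no `<object>` entries. The sketch below uses only the standard library; `count_labelled_objects` is a hypothetical helper, not part of Detecto, and it assumes Pascal VOC-style XML files.

```python
import glob
import os
import xml.etree.ElementTree as ET


def count_labelled_objects(xml_folder):
    """Map each XML file name in xml_folder to its number of <object> elements.

    Hypothetical diagnostic helper (not a Detecto function). Assumes
    Pascal VOC-style annotations where each box is an <object> element.
    """
    counts = {}
    for xml_path in sorted(glob.glob(os.path.join(xml_folder, '*.xml'))):
        root = ET.parse(xml_path).getroot()
        counts[os.path.basename(xml_path)] = len(root.findall('object'))
    return counts
```

Calling `count_labelled_objects('labels/')` from the directory the training script runs in should return at least one file with a non-zero count. An empty dict would suggest the relative path does not resolve to the folder you expect.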

Issue Analytics

State:
Created 3 years ago
Comments:6 (2 by maintainers)

Top GitHub Comments

bgriffen commented, Feb 25, 2022

I’m also caught up at the moment but will take a look when I can too!

bgriffen commented, Feb 24, 2022

This has come up again while revisiting detecto. Out of curiosity, @alankbi, is there a way to relax the strict requirement that image_id be both sequential and start at 0 in both the training and test data? I would have thought that after train_test_split you could just give both dataframes to DataLoader, but it seems I then have to reassign all of the image_ids to satisfy that requirement. Presumably the image_ids could be abstracted away and created on the fly from the filename column of each train/test dataframe going to the DataLoader, i.e.

df['image_id'] = df.groupby(['filename']).ngroup()
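To illustrate what that one-liner does, here is a minimal sketch with made-up rows standing in for one half of a train/test split. The frame name `train_df` and the starting image_id values are assumptions for illustration only; the file names are borrowed from the CSVs shown further down.

```python
import pandas as pd

# Rows as they might look after a train/test split: the image_id values
# (3, 7, 12) are neither contiguous nor zero-based.
train_df = pd.DataFrame({
    'filename': ['Position046.jpg', 'Position046.jpg',
                 'Position028.jpg', 'Position001.jpg'],
    'image_id': [7, 7, 12, 3],
})

# ngroup() numbers each distinct filename from 0 to n-1 (in sorted key
# order by default), so all rows of one image share the same new id.
train_df['image_id'] = train_df.groupby('filename').ngroup()

print(train_df)
```

After this step the ids are sequential and start at 0 within the frame, which is the property the comment above is asking for. The same line would need to be applied separately to the test frame.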

I’m likely missing something behind the scenes that doesn’t allow this, though. On my most recent issue relating to this (I think)…

Begin iterating over training dataset
  0%|                                                                                                        | 0/20 [00:00<?, ?it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-0d2a0526aafd> in <module>
      7       d.df[column] = d.df[column].astype('int32')
      8 
----> 9 d.train()

myprogram.py in train(self, epochs)
    146 
    147     def train(self,epochs=10):
--> 148         self.losses = self.model.fit(self.train_loader,self.test_loader,epochs=epochs,verbose=True)
    149 
    150     def save(self):

~/anaconda3/lib/python3.8/site-packages/detecto-1.2.1-py3.8.egg/detecto/core.py in fit(self, dataset, val_dataset, epochs, learning_rate, momentum, weight_decay, gamma, lr_step_size, verbose)
    516 
    517             iterable = tqdm(dataset, position=0, leave=True) if verbose else dataset
--> 518             for images, targets in iterable:
    519                 self._convert_to_int_labels(targets)
    520                 images, targets = self._to_device(images, targets)

~/anaconda3/lib/python3.8/site-packages/tqdm/std.py in __iter__(self)
   1176 
   1177         try:
-> 1178             for obj in iterable:
   1179                 yield obj
   1180                 # Update and possibly print the progressbar.

~/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py in __next__(self)
    519             if self._sampler_iter is None:
    520                 self._reset()
--> 521             data = self._next_data()
    522             self._num_yielded += 1
    523             if self._dataset_kind == _DatasetKind.Iterable and \

~/anaconda3/lib/python3.8/site-packages/torch/utils/data/dataloader.py in _next_data(self)
    559     def _next_data(self):
    560         index = self._next_index()  # may raise StopIteration
--> 561         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    562         if self._pin_memory:
    563             data = _utils.pin_memory.pin_memory(data)

~/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

~/anaconda3/lib/python3.8/site-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

~/anaconda3/lib/python3.8/site-packages/detecto-1.2.1-py3.8.egg/detecto/core.py in __getitem__(self, idx)
    151         object_entries = self._csv.loc[self._csv['image_id'] == idx]
    152 
--> 153         img_name = os.path.join(self._root_dir, object_entries.iloc[0, 0])
    154         image = read_image(img_name)
    155 

~/anaconda3/lib/python3.8/posixpath.py in join(a, *p)
     88                 path += sep + b
     89     except (TypeError, AttributeError, BytesWarning):
---> 90         genericpath._check_arg_types('join', a, *p)
     91         raise
     92     return path

~/anaconda3/lib/python3.8/genericpath.py in _check_arg_types(funcname, *args)
    150             hasbytes = True
    151         else:
--> 152             raise TypeError(f'{funcname}() argument must be str, bytes, or '
    153                             f'os.PathLike object, not {s.__class__.__name__!r}') from None
    154     if hasstr and hasbytes:

TypeError: join() argument must be str, bytes, or os.PathLike object, not 'int64'

test.csv

width	height	cluster	xmin	ymin	xmax	ymax	image_id	filename
53	133	1	453	274	505	407	0	Position001.jpg
410	189	0	145	238	555	427	1	Position046.jpg
62	127	0	444	273	506	400	1	Position046.jpg
…

and train.csv

width	height	cluster	xmin	ymin	xmax	ymax	image_id	filename
53	133	1	453	274	505	407	0	Position001.jpg
410	189	0	145	238	555	427	1	Position046.jpg
62	127	0	444	273	506	400	1	Position046.jpg
56	123	0	200	265	256	388	1	Position046.jpg
413	192	0	148	226	562	418	2	Position028.jpg
…

As a check…

In [2]: d.train_dataset
Out[2]: <detecto.core.Dataset at 0x7f2af6a5b5e0>

In [3]: d.train_dataset[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-c58fa18afada> in <module>
----> 1 d.train_dataset[0]

~/anaconda3/lib/python3.8/site-packages/detecto-1.2.1-py3.8.egg/detecto/core.py in __getitem__(self, idx)
    151         object_entries = self._csv.loc[self._csv['image_id'] == idx]
    152 
--> 153         img_name = os.path.join(self._root_dir, object_entries.iloc[0, 0])
    154         image = read_image(img_name)
    155 

~/anaconda3/lib/python3.8/posixpath.py in join(a, *p)
     88                 path += sep + b
     89     except (TypeError, AttributeError, BytesWarning):
---> 90         genericpath._check_arg_types('join', a, *p)
     91         raise
     92     return path

~/anaconda3/lib/python3.8/genericpath.py in _check_arg_types(funcname, *args)
    150             hasbytes = True
    151         else:
--> 152             raise TypeError(f'{funcname}() argument must be str, bytes, or '
    153                             f'os.PathLike object, not {s.__class__.__name__!r}') from None
    154     if hasstr and hasbytes:

TypeError: join() argument must be str, bytes, or os.PathLike object, not 'int64'

These being assigned here in my program…

        self.train_dataset = Dataset(self.fout_train,transform=self.default_trans)
        self.test_dataset = Dataset(self.fout_test) 
        
        self.train_loader = DataLoader(self.train_dataset, batch_size=batch_size, shuffle=True)
        self.test_loader = DataLoader(self.test_dataset, shuffle=True)

Update: it comes down to iloc indexing. When the comments and the linked docs say the CSV file contains…

CSV file contains: filename, width, height, class, xmin, ymin, xmax, ymax

The actual order of those columns matters because of

--> 153 img_name = os.path.join(self._root_dir, object_entries.iloc[0, 0]) in core.py
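As a workaround on the user side, the columns can be reordered before the CSV is written. The sketch below rebuilds two rows laid out like the test.csv above (filename last) and moves them into the documented order. Placing image_id at the end and renaming 'cluster' to 'class' are assumptions made for this example, not something the thread confirms.

```python
import pandas as pd

# Column order from the docs quoted above, with image_id appended
# (its position is an assumption; it is not in the quoted list).
EXPECTED_ORDER = ['filename', 'width', 'height', 'class',
                  'xmin', 'ymin', 'xmax', 'ymax', 'image_id']

# Two rows laid out like the test.csv shown earlier (filename last).
df = pd.DataFrame({
    'width': [53, 410], 'height': [133, 189], 'cluster': [1, 0],
    'xmin': [453, 145], 'ymin': [274, 238], 'xmax': [505, 555],
    'ymax': [407, 427], 'image_id': [0, 1],
    'filename': ['Position001.jpg', 'Position046.jpg'],
})

# Position [0, 0] is the width here, an integer, which is what
# os.path.join rejects in the traceback.
print(type(df.iloc[0, 0]))

# Rename 'cluster' to 'class' (assumed to hold the class label) and
# reorder so that position 0 is the filename.
fixed = df.rename(columns={'cluster': 'class'})[EXPECTED_ORDER]
print(fixed.iloc[0, 0])
```

Writing `fixed` out with `to_csv(..., index=False)` would then give a file whose first column is the filename.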

General comment though: shouldn’t this just be object_entries.loc[0, 'filename'], so it’s agnostic to what order the columns are put in? Also, I’m curious about people’s thoughts on running the ngroup() function in the backend to generate the image_ids.
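One caveat on the name-based lookup, sketched below with made-up rows: `.loc[0, 'filename']` selects the row whose index label is 0, and after filtering on image_id the remaining rows keep their original labels, so label 0 is often absent. Selecting the column by name and then the first row by position avoids both problems. This is an illustration of the pandas behaviour, not a patch taken from Detecto.

```python
import pandas as pd

df = pd.DataFrame({
    'width': [53, 410, 62],
    'image_id': [0, 1, 1],
    'filename': ['Position001.jpg', 'Position046.jpg', 'Position046.jpg'],
})

# Same filter as in the traceback, here for image 1: the surviving rows
# keep their original index labels (1 and 2), so label 0 is absent.
object_entries = df.loc[df['image_id'] == 1]

try:
    object_entries.loc[0, 'filename']
except KeyError:
    print("loc[0, 'filename'] fails: no row labelled 0")

# Select the column by name, then take the first row by position.
img_file = object_entries['filename'].iloc[0]
print(img_file)
```

This form does not depend on where the filename column sits in the CSV.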