Duplicates returned when listing nodes files

This was originally reported in https://github.com/ropensci/osfr/issues/150 by @doomlab.

The reprex here shows that the same file, 421_Lu.pdf, is returned twice when listing files in the Local IRB directory within this project.

I’ve confirmed that the duplicate entries are coming from the OSF API, across different pages of results:

#!/usr/bin/env bash

set -e

TOKEN="$OSF_PAT"
NODE="ycn7z"
ID="6113d75ae3801305b39612a8"
LIMIT=2

# Retrieve name and path attributes from JSON response
JQ_FILTER='.data[].attributes | "\(.name) \(.path)"'

for i in $(seq 1 $LIMIT); do
  echo "Retrieving page $i"
  curl --silent \
    "https://api.osf.io/v2/nodes/$NODE/files/osfstorage/$ID/?page=$i" \
    -H "Authorization: Bearer $TOKEN" \
    -H 'Accept: application/vnd.api+json' \
    -H 'Content-Type: application/json; charset=utf-8' \
    | jq "$JQ_FILTER"
done

## Retrieving page 1
## "97_Pfuhl.pdf /6163f0e5fd5b230191983824"
## "1897_Parker.pdf /616440dfc5565801d34b71bf"
## "1698_Butt.pdf /616513bbc5565802014b9ae6"
## "1970_Pavlović.pdf /617436dae572ea00b13a7285"
## "1560_Irrazabal.pdf /618281a0a30f8100cdaa071d"
## "1867_Oner.pdf /6184db04bfb47d00a3ef50dd"
## "169_Montefinese.pdf /6186148c25f90a004a0f6aa6"
## "87_Vaughn.docx /619548800b0c1e01a27fdae5"
## "35_Stewart.pdf /6197ca37ef62980009f5c789"
## "421_Lu.pdf /6161fcd9fd5b2301429849b3"                  <-- copy 1
##
## Retrieving page 2
## "423_Arriaga.pdf /619d017da83c2001650e8e53"
## "761_Papadatou-Pastou.pdf /619df2886977cd010f496498"
## "712_Davis.pdf /61a7d30d4d4ce5018476e569"
## "1574_Al-Hoorie.pdf /61b89ac6da0b1b0488d05546"
## "206_Ergiyen.pdf /61cc42f3da632006e1fe6f4a"
## "437_Peker.pdf /61fc2630370e6c002bf3d6cc"
## "104_Stieger.pdf /620e3a2511da1c05cdf57647"
## "238_Martínez.pdf /620f7666d9b6cf0144b90449"
## "1052_Parzuchowski.pdf /6220fbccc064270378d90ce5"
## "421_Lu.pdf /6161fcd9fd5b2301429849b3"                  <-- copy 2

The waterbutler IDs are identical so this does seem like a possible bug.
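
To check a listing like the one above for repeated IDs without scanning it by eye, the collected lines can be piped through a small filter. This is a minimal sketch, not part of the original report: `find_duplicate_paths` is a hypothetical helper, and it assumes the input is one `name path` pair per line, as `jq -r` would print it with the same filter.

```shell
#!/usr/bin/env bash
# Sketch: report paths that occur more than once in "name path" lines.
# find_duplicate_paths is a hypothetical helper, not part of any OSF tooling.
find_duplicate_paths() {
  # The path is the last whitespace-separated field, so file names
  # containing spaces are still handled.
  awk '{ print $NF }' | sort | uniq -d
}

# Example with four entries taken from the listing above
printf '%s\n' \
  '35_Stewart.pdf /6197ca37ef62980009f5c789' \
  '421_Lu.pdf /6161fcd9fd5b2301429849b3' \
  '423_Arriaga.pdf /619d017da83c2001650e8e53' \
  '421_Lu.pdf /6161fcd9fd5b2301429849b3' \
  | find_duplicate_paths
# prints: /6161fcd9fd5b2301429849b3
```

An empty result means no path appeared more than once in the pages that were collected.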

Let me know if you need any more information.

Issue Analytics

  • State: open
  • Created a year ago
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

doomlab commented, Oct 22, 2022 (1 reaction)

Thanks @aaronwolen - I will note that when I run the same code I get a different file duplicated. And the duplicated file sometimes changes, usually when I update/upload a new file. You can see my reprex here.

Johnetordoff commented, Oct 24, 2022 (0 reactions)

@aaronwolen

> I didn’t even know about the sort param. Is it documented somewhere and I missed it?

It is not documented. Unfortunately, this param is not implemented consistently across all the endpoints it’s applied to. Some queries, legacy endpoints and attributes haven’t been QA’ed for accurate sorting, so they remain undocumented.

> What other attributes can we sort on?

The default sorting behavior for list views is to allow the user to sort on any of the attributes returned in the JSON payload. For example, https://api.osf.io/v2/users/ allows you to sort on full_name, given_name, middle_names, family_name, suffix, date_registered, active, timezone, locale, social, employment and education.

> Will sorting on any attribute solve the issue?

I did not check. As I’ve written, this behavior is not guaranteed to be accurate or consistent.
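
Since server-side sorting is not guaranteed to be consistent, one client-side option is to drop repeated entries after all pages have been collected. The following is a minimal sketch under the same assumption as the original reprex output (one `name path` pair per line, as `jq -r` would print it); `dedupe_by_path` is a hypothetical helper, not an OSF or osfr function.

```shell
#!/usr/bin/env bash
# Sketch: keep only the first line seen for each path, preserving input order.
# dedupe_by_path is a hypothetical helper, not an OSF or osfr function.
dedupe_by_path() {
  # $NF is the last field (the path); a line is printed only the
  # first time its path is encountered.
  awk '!seen[$NF]++'
}

# Example: the repeated 421_Lu.pdf entry is printed once
printf '%s\n' \
  '421_Lu.pdf /6161fcd9fd5b2301429849b3' \
  '423_Arriaga.pdf /619d017da83c2001650e8e53' \
  '421_Lu.pdf /6161fcd9fd5b2301429849b3' \
  | dedupe_by_path
```

This only hides the repeated entry. If unstable ordering also causes an entry to be skipped between pages, deduplicating on the client would not recover it.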
