Duplicates returned when listing a node's files
This was originally reported in https://github.com/ropensci/osfr/issues/150 by @doomlab.
The reprex here shows that the same file, `421_Lu.pdf`, is returned twice when listing files in the Local IRB directory within this project.
I’ve confirmed that the duplicate entries are coming from the OSF API, across different pages of results:
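```bash
#!/usr/bin/env bash
set -euo pipefail

TOKEN="$OSF_PAT"                  # OSF personal access token, read from the environment
NODE="ycn7z"                      # project (node) ID
ID="6113d75ae3801305b39612a8"     # waterbutler ID of the folder being listed (Local IRB)
LIMIT=2                           # number of result pages to fetch

# Retrieve name and path attributes from JSON response
JQ_FILTER='.data[].attributes | "\(.name) \(.path)"'

for i in $(seq 1 "$LIMIT"); do
  echo "Retrieving page $i"
  curl --silent --show-error --fail \
    "https://api.osf.io/v2/nodes/$NODE/files/osfstorage/$ID/?page=$i" \
    -H "Authorization: Bearer $TOKEN" \
    -H 'Accept: application/vnd.api+json' \
    -H 'Content-Type: application/json; charset=utf-8' \
    | jq "$JQ_FILTER"   # quoted so jq receives the whole filter as one argument
done
```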
```
## Retrieving page 1
## "97_Pfuhl.pdf /6163f0e5fd5b230191983824"
## "1897_Parker.pdf /616440dfc5565801d34b71bf"
## "1698_Butt.pdf /616513bbc5565802014b9ae6"
## "1970_Pavlović.pdf /617436dae572ea00b13a7285"
## "1560_Irrazabal.pdf /618281a0a30f8100cdaa071d"
## "1867_Oner.pdf /6184db04bfb47d00a3ef50dd"
## "169_Montefinese.pdf /6186148c25f90a004a0f6aa6"
## "87_Vaughn.docx /619548800b0c1e01a27fdae5"
## "35_Stewart.pdf /6197ca37ef62980009f5c789"
## "421_Lu.pdf /6161fcd9fd5b2301429849b3" <-- copy 1
##
## Retrieving page 2
## "423_Arriaga.pdf /619d017da83c2001650e8e53"
## "761_Papadatou-Pastou.pdf /619df2886977cd010f496498"
## "712_Davis.pdf /61a7d30d4d4ce5018476e569"
## "1574_Al-Hoorie.pdf /61b89ac6da0b1b0488d05546"
## "206_Ergiyen.pdf /61cc42f3da632006e1fe6f4a"
## "437_Peker.pdf /61fc2630370e6c002bf3d6cc"
## "104_Stieger.pdf /620e3a2511da1c05cdf57647"
## "238_Martínez.pdf /620f7666d9b6cf0144b90449"
## "1052_Parzuchowski.pdf /6220fbccc064270378d90ce5"
## "421_Lu.pdf /6161fcd9fd5b2301429849b3" <-- copy 2
```
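Rather than comparing the pages by eye, the loop's output can be piped through a small awk filter that counts how often each waterbutler path occurs. This is a sketch that assumes the output format shown above (jq's quoted "name path" lines interleaved with "Retrieving page N" lines); the function name `find_duplicate_paths` is hypothetical and not part of osfr or the OSF API.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report entries whose waterbutler path appears more
# than once in the combined page listing. Expects lines in the format the
# jq filter prints, e.g. "421_Lu.pdf /6161fcd9fd5b2301429849b3" (quotes included).
find_duplicate_paths() {
  awk '
    /^"/ {                      # only jq lines; skips "Retrieving page N"
      line = $0
      gsub(/"/, "", line)       # drop the JSON string quotes
      n = split(line, parts, " ")
      path = parts[n]           # path is the last space-separated field
      count[path]++
      entry[path] = line
    }
    END {
      for (p in count)
        if (count[p] > 1) print count[p] "x " entry[p]
    }
  '
}
```

Applied to the two pages above, this should print a single line, `2x 421_Lu.pdf /6161fcd9fd5b2301429849b3`. Keying on the path rather than the name avoids false positives from different files that happen to share a name.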
The waterbutler IDs are identical, so both entries refer to the same file and this does seem like a possible bug.
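Because both copies carry the same waterbutler path, a client could drop repeats by keeping only the first line seen for each path. This is a hedged sketch of such a workaround, assuming the same jq output format as above; `dedupe_by_path` is a hypothetical name, not an osfr function.

```shell
#!/usr/bin/env bash
# Hypothetical workaround: keep the first jq line seen for each path.
# The key is the last whitespace-separated field (the path plus its closing
# quote), which is identical for both copies of a duplicated entry.
# Lines that do not start with a double quote (e.g. "Retrieving page N") are dropped.
dedupe_by_path() {
  awk '/^"/ && !seen[$NF]++'
}
```

This only hides the repeat. If pages overlap, some other file may be missing from the listing altogether, and client-side deduplication cannot recover it.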
Let me know if you need any more information.
Comments
Thanks @aaronwolen. I'll note that when I run the same code, a different file is duplicated, and the duplicated file sometimes changes, usually after I update or upload a new file. You can see my reprex here.
@aaronwolen
It is not documented. Unfortunately, this param is not implemented consistently across all the endpoints it's applied to. Some queries, legacy endpoints and attributes haven't been QA'ed for accurate sorting, so they remain undocumented.
The default sorting behavior for list views is to allow the user to sort on any of the `attributes` returned in the JSON payload. For example, https://api.osf.io/v2/users/ allows you to sort on `full_name`, `given_name`, `middle_names`, `family_name`, `suffix`, `date_registered`, `active`, `timezone`, `locale`, `social`, `employment` and `education`. I did not check, and as I've written, this behavior is not guaranteed to be accurate or consistent.