What decides the file loading order in the file input plugin?

When I use embulk-input-file (the embedded plugin) with the following files:

imitation_matter__c_load1_4_all.csv
imitation_matter__c_load1_4_minimal_null.csv
imitation_matter__c_load2_10000_minimal_null.csv
imitation_matter__c_load3_11000_minimal_null.csv

the “Loading files” order differs between server A and server B. (Both environments are CentOS 6.4 / Java 1.8.0_112 / embulk 0.8.15.)

  • Server A
imitation_matter__c_load1_4_all.csv
imitation_matter__c_load1_4_minimal_null.csv
imitation_matter__c_load2_10000_minimal_null.csv
imitation_matter__c_load3_11000_minimal_null.csv
  • Server B
imitation_matter__c_load1_4_all.csv
imitation_matter__c_load2_10000_minimal_null.csv
imitation_matter__c_load3_11000_minimal_null.csv
imitation_matter__c_load1_4_minimal_null.csv

The “Loading files” order seems to match the output of the find command (find <path> -name "imitation_matter__c_*"), and the find output also differs between server A and server B.

I expected the “Loading files” order to be decided by filename, but it is not. So what decides the file loading order? (Or does the order differ only in the logs?)

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

1 reaction
noissefnoc commented, Dec 1, 2016
  • locale settings: same (checked by locale command)
  • max_threads: 1: same result (so it is not a multi-threading issue)
  • filesystem: same (ext4)

I also checked other servers and found that the sort order differs when the filesystem mount options differ.

  • ? : Server A (I don’t have permission to check) and Windows 7
imitation_matter__c_load1_4_all.csv
imitation_matter__c_load1_4_minimal_null.csv
imitation_matter__c_load2_10000_minimal_null.csv
imitation_matter__c_load3_11000_minimal_null.csv
  • default : Server B order
imitation_matter__c_load1_4_all.csv
imitation_matter__c_load2_10000_minimal_null.csv
imitation_matter__c_load3_11000_minimal_null.csv
imitation_matter__c_load1_4_minimal_null.csv
  • default,nobarrier :
imitation_matter__c_load3_10004_minimal_null.csv
imitation_matter__c_load2_10000_minimal_null.csv
imitation_matter__c_load1_4_minimal_null.csv
imitation_matter__c_load3_11000_minimal_null.csv
imitation_matter__c_load1_4_all.csv
  • default,relatime,nobarrier :
imitation_matter__c_load2_10000_minimal_null.csv
imitation_matter__c_load3_11000_minimal_null.csv
imitation_matter__c_load1_4_all.csv
imitation_matter__c_load1_4_minimal_null.csv
imitation_matter__c_load3_10004_minimal_null.csv

I think the mount options may affect the sort order.

Anyway

  • This may not be an Embulk issue.
  • With a single thread, the “Loading files” order follows the find (and ls -U) order.

So I am closing this issue. Thanks, @hiroyuki-sato and @uu59.

0 reactions
hiroyuki-sato commented, Dec 1, 2016

@noissefnoc

Thank you for sharing the information.

Could you clarify what “Loading files” refers to?

Is it the following line in the embulk run output?

Loading files [/private/tmp/hoge/csv/sample_01.csv.gz, /private/tmp/hoge/csv/sample_02.csv.gz]

I think this list is sorted after the log line is output.

“Loading files” output part: https://github.com/embulk/embulk/blob/master/embulk-standards/src/main/java/org/embulk/standards/LocalFileInputPlugin.java#L64

Sort part: https://github.com/embulk/embulk/blob/master/embulk-standards/src/main/java/org/embulk/standards/LocalFileInputPlugin.java#L92
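
To illustrate the difference discussed in this thread between raw directory order (what find and ls -U show) and filename order, here is a minimal Java sketch. It is not Embulk's actual code: the class ListingOrderSketch and the methods listRaw and listSorted are hypothetical names made up for this example. It relies only on the documented behaviour of java.nio.file.DirectoryStream, whose elements are returned in no specific order, and shows that an explicit sort is what makes the order the same on every machine.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; not part of Embulk.
class ListingOrderSketch {

    // Returns matching file names in whatever order the directory stream yields them.
    // The DirectoryStream Javadoc states that elements come back in no specific order,
    // so this result can differ between machines and filesystems.
    static List<String> listRaw(Path dir, String glob) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                names.add(p.getFileName().toString());
            }
        }
        return names;
    }

    // Returns the same names sorted lexicographically (String natural order),
    // which does not depend on how the filesystem stores directory entries.
    static List<String> listSorted(Path dir, String glob) throws IOException {
        List<String> names = listRaw(dir, glob);
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Recreate the four file names from the question in a temporary directory.
        Path dir = Files.createTempDirectory("listing-order");
        String[] files = {
            "imitation_matter__c_load3_11000_minimal_null.csv",
            "imitation_matter__c_load1_4_minimal_null.csv",
            "imitation_matter__c_load2_10000_minimal_null.csv",
            "imitation_matter__c_load1_4_all.csv",
        };
        for (String f : files) {
            Files.createFile(dir.resolve(f));
        }
        System.out.println("raw:    " + listRaw(dir, "imitation_matter__c_*"));
        System.out.println("sorted: " + listSorted(dir, "imitation_matter__c_*"));
    }
}
```

The sorted list is the filename order reported for Server A in the question. The raw list may or may not match it, depending on the filesystem, which is consistent with the differing find output described above.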

