
[SSH] Error while indexing content from /home/administrateur : Auth fail


(I am taking the liberty of writing this post in French, as it will be simpler for me to explain myself.) Hello, First of all, thank you for the work you have put into this tool, Mr. Pilato.

To replace the solution currently in place at my company (SolR + Manifold), we would like to use the Elastic Stack + fscrawler, since we are working with an indexing volume of more than 10M documents and more than twice that in emails.

I am therefore currently running tests on a Debian machine. The problem I am running into is the following:

administrateur@srv-elastic-pack-test:~/.fscrawler/test$ fscrawler test --trace
11:17:13,162 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [1/doc.json] already exists
11:17:13,164 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [1/folder.json] already exists
11:17:13,164 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [1/_settings.json] already exists
11:17:13,164 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [2/doc.json] already exists
11:17:13,165 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [2/folder.json] already exists
11:17:13,165 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [2/_settings.json] already exists
11:17:13,165 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [5/doc.json] already exists
11:17:13,165 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [5/folder.json] already exists
11:17:13,165 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [5/_settings.json] already exists
11:17:13,171 DEBUG [f.p.e.c.f.FsCrawler] Starting job [test]...
11:17:13,296 TRACE [f.p.e.c.f.FsCrawler] settings used for this crawler: [{
  "name" : "test",
  "fs" : {
    "url" : "/home/administrateur",
    "update_rate" : "1m",
    "includes" : [ "*.doc" ],
    "json_support" : false,
    "filename_as_id" : false,
    "add_filesize" : true,
    "remove_deleted" : true,
    "add_as_inner_object" : false,
    "store_source" : false,
    "index_content" : true,
    "attributes_support" : false,
    "raw_metadata" : true,
    "xml_support" : false,
    "index_folders" : true,
    "lang_detect" : false
  },
  "server" : {
    "hostname" : "192.168.37.41",
    "port" : 22,
    "username" : "administrateur",
    "protocol" : "ssh"
  },
  "elasticsearch" : {
    "nodes" : [ {
      "host" : "127.0.0.1",
      "port" : 9200,
      "scheme" : "HTTP"
    } ],
    "type" : "doc",
    "bulk_size" : 100,
    "flush_interval" : "5s"
  },
  "rest" : {
    "scheme" : "HTTP",
    "host" : "127.0.0.1",
    "port" : 8080,
    "endpoint" : "fscrawler"
  }
}]
11:17:13,300 INFO  [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
11:17:13,300 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
11:17:13,699 DEBUG [f.p.e.c.f.c.ElasticsearchClient] findVersion()
11:17:13,774 TRACE [f.p.e.c.f.c.ElasticsearchClient] get server response: {name=node-0, cluster_name=es-test, cluster_uuid=E_iblWbUTU6xjk3eqgC1hA, version={number=5.1.2, build_hash=c8c4c16, build_date=2017-01-11T20:18:39.146Z, build_snapshot=false, lucene_version=6.3.0}, tagline=You Know, for Search}
11:17:13,775 DEBUG [f.p.e.c.f.c.ElasticsearchClient] findVersion() -> [5.1.2]
11:17:13,775 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Using elasticsearch >= 5, so we use [stored_fields] as fields option
11:17:13,775 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Using elasticsearch >= 5, so we can use ingest node feature
11:17:13,776 DEBUG [f.p.e.c.f.c.BulkProcessor] Creating a bulk processor with size [100], flush [5s], pipeline [null]
11:17:13,779 DEBUG [f.p.e.c.f.c.ElasticsearchClient] findVersion()
11:17:13,781 TRACE [f.p.e.c.f.c.ElasticsearchClient] get server response: {name=node-0, cluster_name=es-test, cluster_uuid=E_iblWbUTU6xjk3eqgC1hA, version={number=5.1.2, build_hash=c8c4c16, build_date=2017-01-11T20:18:39.146Z, build_snapshot=false, lucene_version=6.3.0}, tagline=You Know, for Search}
11:17:13,781 DEBUG [f.p.e.c.f.c.ElasticsearchClient] findVersion() -> [5.1.2]
11:17:13,782 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] FS crawler connected to an elasticsearch [5.1.2] node.
11:17:13,782 DEBUG [f.p.e.c.f.c.ElasticsearchClient] create index [test]
11:17:13,782 TRACE [f.p.e.c.f.c.ElasticsearchClient] index settings: [{
  "settings": {
    "analysis": {
      "analyzer": {
        "fscrawler_path": {
          "tokenizer": "fscrawler_path"
        }
      },
      "tokenizer": {
        "fscrawler_path": {
          "type": "path_hierarchy"
        }
      }
    }
  }
}
]
11:17:13,859 TRACE [f.p.e.c.f.c.ElasticsearchClient] index already exists. Ignoring error...
11:17:13,860 DEBUG [f.p.e.c.f.c.ElasticsearchClient] is existing type [test]/[doc]
11:17:13,864 TRACE [f.p.e.c.f.c.ElasticsearchClient] get index metadata response: {test={aliases={}, mappings={folder={properties={encoded={type=keyword, store=true}, name={type=keyword, store=true}, real={type=keyword, store=true}, root={type=keyword, store=true}, virtual={type=keyword, store=true}}}, doc={properties={attachment={type=binary}, attributes={properties={group={type=keyword}, owner={type=keyword}}}, content={type=text}, file={properties={checksum={type=keyword}, content_type={type=keyword}, extension={type=keyword}, filename={type=keyword}, filesize={type=long}, indexed_chars={type=long}, indexing_date={type=date, format=dateOptionalTime}, last_modified={type=date, format=dateOptionalTime}, url={type=keyword, index=false}}}, meta={properties={author={type=text}, date={type=date, format=dateOptionalTime}, keywords={type=text}, language={type=keyword}, title={type=text}}}, object={type=object}, path={properties={encoded={type=keyword}, real={type=keyword, fields={tree={type=text, analyzer=fscrawler_path, fielddata=true}}}, root={type=keyword}, virtual={type=keyword, fields={tree={type=text, analyzer=fscrawler_path, fielddata=true}}}}}}}}, settings={index={number_of_shards=5, provided_name=test, creation_date=1486462496211, analysis={analyzer={fscrawler_path={tokenizer=fscrawler_path}}, tokenizer={fscrawler_path={type=path_hierarchy}}}, number_of_replicas=1, uuid=9uoBSiB0TbGTjREyIWTFdw, version={created=5010299}}}}}
11:17:13,865 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Mapping [test]/[doc] already exists.
11:17:13,865 DEBUG [f.p.e.c.f.c.ElasticsearchClient] is existing type [test]/[folder]
11:17:13,869 TRACE [f.p.e.c.f.c.ElasticsearchClient] get index metadata response: {test={aliases={}, mappings={folder={properties={encoded={type=keyword, store=true}, name={type=keyword, store=true}, real={type=keyword, store=true}, root={type=keyword, store=true}, virtual={type=keyword, store=true}}}, doc={properties={attachment={type=binary}, attributes={properties={group={type=keyword}, owner={type=keyword}}}, content={type=text}, file={properties={checksum={type=keyword}, content_type={type=keyword}, extension={type=keyword}, filename={type=keyword}, filesize={type=long}, indexed_chars={type=long}, indexing_date={type=date, format=dateOptionalTime}, last_modified={type=date, format=dateOptionalTime}, url={type=keyword, index=false}}}, meta={properties={author={type=text}, date={type=date, format=dateOptionalTime}, keywords={type=text}, language={type=keyword}, title={type=text}}}, object={type=object}, path={properties={encoded={type=keyword}, real={type=keyword, fields={tree={type=text, analyzer=fscrawler_path, fielddata=true}}}, root={type=keyword}, virtual={type=keyword, fields={tree={type=text, analyzer=fscrawler_path, fielddata=true}}}}}}}}, settings={index={number_of_shards=5, provided_name=test, creation_date=1486462496211, analysis={analyzer={fscrawler_path={tokenizer=fscrawler_path}}, tokenizer={fscrawler_path={type=path_hierarchy}}}, number_of_replicas=1, uuid=9uoBSiB0TbGTjREyIWTFdw, version={created=5010299}}}}}
11:17:13,869 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Mapping [test]/[folder] already exists.
11:17:13,871 DEBUG [f.p.e.c.f.FsCrawlerImpl] creating fs crawler thread [test] for [/home/administrateur] every [1m]
11:17:13,879 INFO  [f.p.e.c.f.FsCrawlerImpl] FS crawler started for [test] for [/home/administrateur] every [1m]
11:17:13,880 DEBUG [f.p.e.c.f.FsCrawlerImpl] Fs crawler thread [test] is now running. Run #1...
11:17:13,881 DEBUG [f.p.e.c.f.f.FileAbstractor] Opening SSH connection to administrateur@192.168.37.41
11:17:19,162 WARN  [f.p.e.c.f.FsCrawlerImpl] Error while indexing content from /home/administrateur: Auth fail
11:17:19,162 WARN  [f.p.e.c.f.FsCrawlerImpl] Error while closing the connection: java.lang.NullPointerException
11:17:19,163 DEBUG [f.p.e.c.f.FsCrawlerImpl] Fs crawler is going to sleep for 1m
^C11:18:09,811 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [test]
11:18:09,813 DEBUG [f.p.e.c.f.FsCrawlerImpl] Fs crawler is now waking up again...
11:18:09,813 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread [test] is now marked as closed...
11:18:09,813 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
11:18:09,814 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler Rest service stopped
11:18:09,814 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
11:18:09,814 DEBUG [f.p.e.c.f.c.BulkProcessor] Closing BulkProcessor
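
One detail the trace makes visible: the "server" section of the job settings carries only hostname, port, username and protocol, with no credential of any kind, which on its own is enough for the SSH layer to answer with "Auth fail". As a minimal sketch, assuming the "password" and "pem_path" options described in the FSCrawler documentation (all values below are placeholders), ~/.fscrawler/test/_settings.json could declare a credential like this:

  {
    "name" : "test",
    "fs" : {
      "url" : "/home/administrateur",
      "update_rate" : "1m",
      "includes" : [ "*.doc" ]
    },
    "server" : {
      "hostname" : "192.168.37.41",
      "port" : 22,
      "username" : "administrateur",
      "password" : "REPLACE_ME",
      "protocol" : "ssh"
    }
  }

For key-based authentication, the "password" line would be replaced by something like "pem_path" : "/home/administrateur/.ssh/id_rsa" (a hypothetical path). Running a plain "ssh administrateur@192.168.37.41" from the crawling machine is also a quick way to check that the credentials are accepted outside of fscrawler. In this particular case, however, the failure turned out to be a bug fixed by #329, as the maintainer notes below.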

I have made several different attempts:

  • With the hostname instead of the IP address
  • Tests against the production and test servers
  • A test against a local machine with no firewall (the test above)
  • Tests with different paths
  • Tests with fscrawler 2.3-SNAPSHOT and with 2.2

My versions are as follows:

  • Java SDK 1.8.0_121
  • Elastic Stack 5.1.2
  • fscrawler 2.2 (failed), then tested with 2.3-SNAPSHOT (failed)

Two questions:

1. Is it possible to specify several URLs for the same job?
2. Does the crawler traverse the complete tree under the specified path, or only the files directly inside the specified folder? (See the sketch after the diagrams below.)

That is, does it index:

  /principal_folder
    /folder
      files
    files

or

  /principal_folder
    files
Thank you in advance for the help you can give me! Regards, MS

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
MaxenceSAUNIER commented, Mar 1, 2017

I can confirm that it works. Thank you!

0 reactions
dadoonet commented, Feb 21, 2017

This should normally be fixed by #329.

Please try the latest SNAPSHOT version to confirm. Thanks!


