[SSH] Error while indexing content from /home/administrateur : Auth fail
(I hope you don't mind me writing this post in French; it is easier for me to explain myself that way.)

Hello,

First of all, thank you, Mr. Pilato, for the work you have put into this tool.

To replace the solution currently in place at my company (Solr + Manifold), we would like to use the Elastic Stack + FSCrawler, since we are working with an indexing volume of more than 10M documents, and more than twice that in emails.

I am therefore currently running tests on a Debian machine. The problem I am running into is the following:
administrateur@srv-elastic-pack-test:~/.fscrawler/test$ fscrawler test --trace
11:17:13,162 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [1/doc.json] already exists
11:17:13,164 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [1/folder.json] already exists
11:17:13,164 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [1/_settings.json] already exists
11:17:13,164 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [2/doc.json] already exists
11:17:13,165 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [2/folder.json] already exists
11:17:13,165 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [2/_settings.json] already exists
11:17:13,165 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [5/doc.json] already exists
11:17:13,165 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [5/folder.json] already exists
11:17:13,165 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] Mapping [5/_settings.json] already exists
11:17:13,171 DEBUG [f.p.e.c.f.FsCrawler] Starting job [test]...
11:17:13,296 TRACE [f.p.e.c.f.FsCrawler] settings used for this crawler: [{
"name" : "test",
"fs" : {
"url" : "/home/administrateur",
"update_rate" : "1m",
"includes" : [ "*.doc" ],
"json_support" : false,
"filename_as_id" : false,
"add_filesize" : true,
"remove_deleted" : true,
"add_as_inner_object" : false,
"store_source" : false,
"index_content" : true,
"attributes_support" : false,
"raw_metadata" : true,
"xml_support" : false,
"index_folders" : true,
"lang_detect" : false
},
"server" : {
"hostname" : "192.168.37.41",
"port" : 22,
"username" : "administrateur",
"protocol" : "ssh"
},
"elasticsearch" : {
"nodes" : [ {
"host" : "127.0.0.1",
"port" : 9200,
"scheme" : "HTTP"
} ],
"type" : "doc",
"bulk_size" : 100,
"flush_interval" : "5s"
},
"rest" : {
"scheme" : "HTTP",
"host" : "127.0.0.1",
"port" : 8080,
"endpoint" : "fscrawler"
}
}]
11:17:13,300 INFO [f.p.e.c.f.FsCrawlerImpl] Starting FS crawler
11:17:13,300 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler started in watch mode. It will run unless you stop it with CTRL+C.
11:17:13,699 DEBUG [f.p.e.c.f.c.ElasticsearchClient] findVersion()
11:17:13,774 TRACE [f.p.e.c.f.c.ElasticsearchClient] get server response: {name=node-0, cluster_name=es-test, cluster_uuid=E_iblWbUTU6xjk3eqgC1hA, version={number=5.1.2, build_hash=c8c4c16, build_date=2017-01-11T20:18:39.146Z, build_snapshot=false, lucene_version=6.3.0}, tagline=You Know, for Search}
11:17:13,775 DEBUG [f.p.e.c.f.c.ElasticsearchClient] findVersion() -> [5.1.2]
11:17:13,775 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Using elasticsearch >= 5, so we use [stored_fields] as fields option
11:17:13,775 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Using elasticsearch >= 5, so we can use ingest node feature
11:17:13,776 DEBUG [f.p.e.c.f.c.BulkProcessor] Creating a bulk processor with size [100], flush [5s], pipeline [null]
11:17:13,779 DEBUG [f.p.e.c.f.c.ElasticsearchClient] findVersion()
11:17:13,781 TRACE [f.p.e.c.f.c.ElasticsearchClient] get server response: {name=node-0, cluster_name=es-test, cluster_uuid=E_iblWbUTU6xjk3eqgC1hA, version={number=5.1.2, build_hash=c8c4c16, build_date=2017-01-11T20:18:39.146Z, build_snapshot=false, lucene_version=6.3.0}, tagline=You Know, for Search}
11:17:13,781 DEBUG [f.p.e.c.f.c.ElasticsearchClient] findVersion() -> [5.1.2]
11:17:13,782 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] FS crawler connected to an elasticsearch [5.1.2] node.
11:17:13,782 DEBUG [f.p.e.c.f.c.ElasticsearchClient] create index [test]
11:17:13,782 TRACE [f.p.e.c.f.c.ElasticsearchClient] index settings: [{
"settings": {
"analysis": {
"analyzer": {
"fscrawler_path": {
"tokenizer": "fscrawler_path"
}
},
"tokenizer": {
"fscrawler_path": {
"type": "path_hierarchy"
}
}
}
}
}
]
11:17:13,859 TRACE [f.p.e.c.f.c.ElasticsearchClient] index already exists. Ignoring error...
11:17:13,860 DEBUG [f.p.e.c.f.c.ElasticsearchClient] is existing type [test]/[doc]
11:17:13,864 TRACE [f.p.e.c.f.c.ElasticsearchClient] get index metadata response: {test={aliases={}, mappings={folder={properties={encoded={type=keyword, store=true}, name={type=keyword, store=true}, real={type=keyword, store=true}, root={type=keyword, store=true}, virtual={type=keyword, store=true}}}, doc={properties={attachment={type=binary}, attributes={properties={group={type=keyword}, owner={type=keyword}}}, content={type=text}, file={properties={checksum={type=keyword}, content_type={type=keyword}, extension={type=keyword}, filename={type=keyword}, filesize={type=long}, indexed_chars={type=long}, indexing_date={type=date, format=dateOptionalTime}, last_modified={type=date, format=dateOptionalTime}, url={type=keyword, index=false}}}, meta={properties={author={type=text}, date={type=date, format=dateOptionalTime}, keywords={type=text}, language={type=keyword}, title={type=text}}}, object={type=object}, path={properties={encoded={type=keyword}, real={type=keyword, fields={tree={type=text, analyzer=fscrawler_path, fielddata=true}}}, root={type=keyword}, virtual={type=keyword, fields={tree={type=text, analyzer=fscrawler_path, fielddata=true}}}}}}}}, settings={index={number_of_shards=5, provided_name=test, creation_date=1486462496211, analysis={analyzer={fscrawler_path={tokenizer=fscrawler_path}}, tokenizer={fscrawler_path={type=path_hierarchy}}}, number_of_replicas=1, uuid=9uoBSiB0TbGTjREyIWTFdw, version={created=5010299}}}}}
11:17:13,865 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Mapping [test]/[doc] already exists.
11:17:13,865 DEBUG [f.p.e.c.f.c.ElasticsearchClient] is existing type [test]/[folder]
11:17:13,869 TRACE [f.p.e.c.f.c.ElasticsearchClient] get index metadata response: {test={aliases={}, mappings={folder={properties={encoded={type=keyword, store=true}, name={type=keyword, store=true}, real={type=keyword, store=true}, root={type=keyword, store=true}, virtual={type=keyword, store=true}}}, doc={properties={attachment={type=binary}, attributes={properties={group={type=keyword}, owner={type=keyword}}}, content={type=text}, file={properties={checksum={type=keyword}, content_type={type=keyword}, extension={type=keyword}, filename={type=keyword}, filesize={type=long}, indexed_chars={type=long}, indexing_date={type=date, format=dateOptionalTime}, last_modified={type=date, format=dateOptionalTime}, url={type=keyword, index=false}}}, meta={properties={author={type=text}, date={type=date, format=dateOptionalTime}, keywords={type=text}, language={type=keyword}, title={type=text}}}, object={type=object}, path={properties={encoded={type=keyword}, real={type=keyword, fields={tree={type=text, analyzer=fscrawler_path, fielddata=true}}}, root={type=keyword}, virtual={type=keyword, fields={tree={type=text, analyzer=fscrawler_path, fielddata=true}}}}}}}}, settings={index={number_of_shards=5, provided_name=test, creation_date=1486462496211, analysis={analyzer={fscrawler_path={tokenizer=fscrawler_path}}, tokenizer={fscrawler_path={type=path_hierarchy}}}, number_of_replicas=1, uuid=9uoBSiB0TbGTjREyIWTFdw, version={created=5010299}}}}}
11:17:13,869 DEBUG [f.p.e.c.f.c.ElasticsearchClient] Mapping [test]/[folder] already exists.
11:17:13,871 DEBUG [f.p.e.c.f.FsCrawlerImpl] creating fs crawler thread [test] for [/home/administrateur] every [1m]
11:17:13,879 INFO [f.p.e.c.f.FsCrawlerImpl] FS crawler started for [test] for [/home/administrateur] every [1m]
11:17:13,880 DEBUG [f.p.e.c.f.FsCrawlerImpl] Fs crawler thread [test] is now running. Run #1...
11:17:13,881 DEBUG [f.p.e.c.f.f.FileAbstractor] Opening SSH connection to administrateur@192.168.37.41
11:17:19,162 WARN [f.p.e.c.f.FsCrawlerImpl] Error while indexing content from /home/administrateur: Auth fail
11:17:19,162 WARN [f.p.e.c.f.FsCrawlerImpl] Error while closing the connection: java.lang.NullPointerException
11:17:19,163 DEBUG [f.p.e.c.f.FsCrawlerImpl] Fs crawler is going to sleep for 1m
^C11:18:09,811 DEBUG [f.p.e.c.f.FsCrawlerImpl] Closing FS crawler [test]
11:18:09,813 DEBUG [f.p.e.c.f.FsCrawlerImpl] Fs crawler is now waking up again...
11:18:09,813 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread [test] is now marked as closed...
11:18:09,813 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler thread is now stopped
11:18:09,814 DEBUG [f.p.e.c.f.FsCrawlerImpl] FS crawler Rest service stopped
11:18:09,814 DEBUG [f.p.e.c.f.c.ElasticsearchClientManager] Closing Elasticsearch client manager
11:18:09,814 DEBUG [f.p.e.c.f.c.BulkProcessor] Closing BulkProcessor
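One thing stands out in the trace above: the `server` section contains no `password` (and no PEM key), which is a common cause of a JSch `Auth fail` when `protocol` is `ssh`. As a point of comparison only, a minimal sketch of the job file (`~/.fscrawler/test/_settings.json`) with password authentication could look like the fragment below; the `password` placeholder value is of course hypothetical, and the FSCrawler SSH documentation also describes a `pem_path` option for key-based authentication:

```json
{
  "name" : "test",
  "fs" : {
    "url" : "/home/administrateur",
    "update_rate" : "1m",
    "includes" : [ "*.doc" ]
  },
  "server" : {
    "hostname" : "192.168.37.41",
    "port" : 22,
    "username" : "administrateur",
    "password" : "<ssh-password-here>",
    "protocol" : "ssh"
  }
}
```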
I have made several different attempts:
- Using the hostname instead of the IP address
- Tests against both the production and test servers
- A test against a local machine with no firewall (the test above)
- Tests with different paths
- Tests with FSCrawler 2.3-SNAPSHOT and with 2.2
My versions are as follows:
- Java SDK 1.8.0_121
- Elastic Stack 5.1.2
- fscrawler 2.2 (failed), then 2.3-SNAPSHOT (failed)
Questions:
1. Is it possible to specify several URLs for a single job?
2. Does the crawler traverse the entire directory tree under the specified path, or only the files directly inside the specified folder?
/principal_folder
    /folder
        files
    files
or
/principal_folder
    files
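On question 2: by default, FSCrawler walks the directory tree under `fs.url` recursively rather than indexing only the top-level folder. To picture what that means for the two layouts sketched above, here is a small illustrative recursive walk in Python (this is not FSCrawler code, just an analogy for the traversal behavior):

```python
import os
import tempfile

# Build a tiny tree like the first layout above:
# principal_folder/folder/a.doc and principal_folder/b.doc
root = tempfile.mkdtemp(prefix="principal_folder_")
os.makedirs(os.path.join(root, "folder"))
open(os.path.join(root, "folder", "a.doc"), "w").close()
open(os.path.join(root, "b.doc"), "w").close()

# A recursive walk visits files at every depth, which is
# analogous to how FSCrawler traverses fs.url by default.
found = []
for dirpath, dirnames, filenames in os.walk(root):
    for name in filenames:
        found.append(os.path.relpath(os.path.join(dirpath, name), root))

print(sorted(found))  # both b.doc and folder/a.doc are picked up
```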
Thank you in advance for your help! Best regards, MS
Issue Analytics
- State:
- Created 7 years ago
- Comments: 7 (4 by maintainers)
Top GitHub Comments
I confirm that it works. Thank you!
Normally this is fixed by #329.
You would need to try the latest SNAPSHOT version to confirm. Thanks!