content is null when creating the most simple job

Hello. Thank you for this package. I’m trying it out, but the content field is always null, even for a plain-text file that contains nothing but the text “shadi”. Could you give me some pointers on how to get the content indexed? Besides plain-text files, I’d also like to index .xlsx, .xls, and .pdf files.

Here is my job settings file:

{
  "name" : "sic_list",
  "fs" : {
    "url" : "/data/fscrawler/files",
    "update_rate": "1m",
    "indexed_chars": "100%"
  },
  "elasticsearch" : {
    "index" : "sic_list",
    "type": "doc",
    "nodes" : [
      { "host" : "myhost.com", "port" : 9200 }
    ]
  }
}

And here is an excerpt from my --trace output:

10:18:31,678 DEBUG [f.p.e.c.f.u.FsCrawlerUtil] filename = [test.txt], includes = [null], excludes = [null]
10:18:31,679 TRACE [f.p.e.c.f.u.FsCrawlerUtil] no rules
10:18:31,679 DEBUG [f.p.e.c.f.FsCrawlerImpl] [test.txt] can be indexed: [true]
10:18:31,679 DEBUG [f.p.e.c.f.FsCrawlerImpl]   - file: test.txt
10:18:31,680 DEBUG [f.p.e.c.f.FsCrawlerImpl] fetching content from [/data/fscrawler/files],[test.txt]
10:18:31,680 DEBUG [f.p.e.c.f.FsCrawlerImpl] Indexing in ES sic_list, doc, 57e81419ed4fa6aa86d668bb9e28674
10:18:31,681 TRACE [f.p.e.c.f.FsCrawlerImpl] JSon indexed : {
  "content" : null,
  "attachment" : null,
  "meta" : {
    "author" : null,
    "title" : null,
    "date" : null,
    "keywords" : null,
    "raw" : null
  },
  "file" : {
    "content_type" : null,
    "last_modified" : "2017-01-21T10:10:03Z",
    "indexing_date" : "2017-01-21T10:18:31.680Z",
    "filesize" : null,
    "filename" : "test.txt",
    "url" : "file:///data/fscrawler/files/test.txt",
    "indexed_chars" : null,
    "checksum" : null
  },
  "path" : {
    "encoded" : "6113a2c108ffc50c1fd761817d96ca7",
    "root" : "6113a2c108ffc50c1fd761817d96ca7",
    "virtual" : "",
    "real" : "/data/fscrawler/files/test.txt"
  },
  "attributes" : null
}

I’m running FSCrawler in Docker, using this Dockerfile:

FROM openjdk:alpine
RUN apk add --update openssl
RUN  wget https://repo1.maven.org/maven2/fr/pilato/elasticsearch/crawler/fscrawler/2.1/fscrawler-2.1.zip \
  && unzip fscrawler-2.1.zip
RUN mkdir ~/.fscrawler
WORKDIR ./fscrawler-2.1
ENTRYPOINT cp /data/fscrawler/home/* ~/.fscrawler -r \
        && bin/fscrawler --trace sic_list

and building and starting it with the following Docker commands:

  docker build -t fscrawler build/fscrawler/
  docker run -it --rm --name fscrawler-siclist \
    -v /home/shadi/sic_lists/:/data/fscrawler/files/:ro \
    -v "${PWD}"/home/:/data/fscrawler/home/:ro \
    fscrawler

All the files are readable by the same user that launches FSCrawler.

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 15 (15 by maintainers)

Top GitHub Comments

dadoonet commented, Jan 24, 2017 (1 reaction)

Ha! Indeed, that setting is true by default, but only when you generate the job with FSCrawler. If you write the job settings by hand, it defaults to false.

Thanks a lot for finding this nasty bug.

I know of some other bugs too, which I’m going to fix now.
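Until a fixed release is available, a hand-written job file can work around the bug by setting the flag explicitly. The comment above doesn’t name the setting; the sketch below assumes it is fs.index_content, the option FSCrawler’s documentation describes for turning content extraction on or off. The helper names are hypothetical and not part of FSCrawler; this is a minimal sketch, not the project’s own fix.

```python
import copy
import json
from pathlib import Path


def enable_content_indexing(settings: dict) -> dict:
    """Return a copy of a job settings dict with fs.index_content set to true.

    ASSUMPTION: the thread never names the setting; "index_content" under
    "fs" is taken from FSCrawler's documented content-extraction option.
    Hypothetical helper, not part of FSCrawler.
    """
    patched = copy.deepcopy(settings)  # leave the caller's dict untouched
    patched.setdefault("fs", {})["index_content"] = True
    return patched


def patch_job_file(path: Path) -> None:
    """Rewrite a hand-written job settings file in place with the flag set."""
    settings = json.loads(path.read_text(encoding="utf-8"))
    patched = enable_content_indexing(settings)
    path.write_text(json.dumps(patched, indent=2) + "\n", encoding="utf-8")
```

In the setup above, the home/ directory is mounted read-only into the container, so you would patch the job file on the host and then restart the container. The manual equivalent is adding "index_content": true inside the "fs" block of the job settings.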

shadiakiki1986 commented, Jan 24, 2017 (1 reaction)

Btw, I took the liberty of opening separate issues for a couple of other problems I noticed while testing FSCrawler. I hope you don’t mind 😃
