Introduce Attachment type for attachment mapping


The mapper-attachment plugin allows indexing and searching of documents in various formats such as PDF, Word, etc.

This is exposed in NEST with AttachmentProperty, which can be used to map the properties of a CLR object to the attachment field mapping.

In its simplest form, indexing an attachment only requires passing a base64-encoded string of the document as the attachment field:

PUT http://localhost:9200/docs/document/1?pretty=true&refresh=true 
{
  "id": 1,
  "title": "Some document",
  "file": "some base64 encoded string"
}
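
As a rough illustration of the simple form, the request body could be assembled as follows. This is a minimal sketch in Python using only the standard library; the helper name `build_simple_document` is hypothetical, and the field names `id`, `title` and `file` are taken from the example request above.

```python
import base64
import json


def build_simple_document(doc_id: int, title: str, raw: bytes) -> dict:
    """Build an index request body with the attachment sent as a bare base64 string."""
    return {
        "id": doc_id,
        "title": title,
        # The attachment field ("file" in the example request) holds the encoded bytes.
        "file": base64.b64encode(raw).decode("ascii"),
    }


body = build_simple_document(1, "Some document", b"NEST mapper attachment")
print(json.dumps(body, indent=2))
```

The resulting dictionary would then be serialized and sent as the body of the PUT request.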

If the attachment mapping specifies other metadata fields to be indexed, such as content_type, language, etc., these will be extracted from the content field and indexed as requested (NOTE: they do not exist in the source, only in the index).

The plugin also allows explicit metadata field values to be passed when indexing, by using the name of the metadata field prefixed with an underscore:

PUT http://localhost:9200/docs/document/2?pretty=true&refresh=true 
{
  "id": 2,
  "title": "Another document",
  "file": {
    "_content":  "some base64 encoded string",
    "_content_type": "text/plain"
  }
}
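
The underscore convention can be sketched as below. This is an illustrative Python helper, not part of NEST or the plugin: `build_attachment_field` is a hypothetical name, and it simply prefixes each supplied metadata name with an underscore and puts the encoded content under `_content`.

```python
import base64


def build_attachment_field(raw: bytes, **metadata: str) -> dict:
    """Build the object form of the attachment field.

    Each keyword argument is a metadata field name (for example content_type)
    and is emitted with a leading underscore.
    """
    field = {"_content": base64.b64encode(raw).decode("ascii")}
    for name, value in metadata.items():
        field["_" + name] = value
    return field


doc = {
    "id": 2,
    "title": "Another document",
    "file": build_attachment_field(b"plain text body", content_type="text/plain"),
}
```

Calling the helper with no keyword arguments still produces the object form, with `_content` as its only key.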

Having explicit metadata fields affects the structure of the source returned in search results, for example:


POST http://localhost:9200/docs/document/_search?pretty=true 
{
  "query": {
    "match": {
      "file.content": {
        "query": "NEST mapper"
      }
    }
  }
}

Status: 200
{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.01125201,
    "hits" : [ {
      "_index" : "docs",
      "_type" : "document",
      "_id" : "2",
      "_score" : 0.01125201,
      "_source" : {
        "id" : 2,
        "title" : "Another document",
        "file" : {
          "_content" : "some base64 encoded string",
          "_content_type" : "text/plain"
        }
      }
    }, {
      "_index" : "docs",
      "_type" : "document",
      "_id" : "1",
      "_score" : 0.01125201,
      "_source" : {
        "id" : 1,
        "title" : "Some document",
        "file" : "some base64 encoded string"
      }
    } ]
  }
}

Here, the first document has an explicit metadata field for _content_type, and has therefore passed the attachment content to be indexed as _content. The second document passed the attachment content against the name of the attachment field.

(Using explicit metadata fields does not affect the attachment mapping, i.e. new fields beginning with an underscore are not added to the mapping.)
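
Because the attachment field in `_source` can come back either as a bare string or as an object with underscore-prefixed keys, client code has to cope with both shapes. A minimal Python sketch of one way to normalize them follows; `read_attachment` is a hypothetical helper and the sample sources mirror the two hits shown above.

```python
def read_attachment(source: dict, field: str = "file") -> dict:
    """Return the attachment as a dict keyed by metadata name, without underscores."""
    value = source[field]
    if isinstance(value, str):
        # Simple form: the field value is the base64 content itself.
        return {"content": value}
    # Object form: drop the single leading underscore from each explicit key.
    return {
        (key[1:] if key.startswith("_") else key): val
        for key, val in value.items()
    }


explicit = {
    "id": 2,
    "title": "Another document",
    "file": {"_content": "some base64 encoded string", "_content_type": "text/plain"},
}
simple = {"id": 1, "title": "Some document", "file": "some base64 encoded string"}

print(read_attachment(explicit))
print(read_attachment(simple))
```

Both calls yield a dictionary with a `content` key, so downstream code can treat the two document shapes uniformly.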

The extracted values are available in the hits when they are requested through fields:

POST http://localhost:9200/docs/document/_search?pretty=true 
{
  "fields": [
    "file.content_type"
  ],
  "query": {
    "match": {
      "file.content_type": {
        "query": "pdf"
      }
    }
  }
}

Status: 200
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 0.19178301,
    "hits" : [ {
      "_index" : "docs",
      "_type" : "document",
      "_id" : "1",
      "_score" : 0.19178301,
      "fields" : {
        "file.content_type" : [ "application/pdf" ]
      }
    } ]
  }
}
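
Reading the extracted values back out of such a response could look like the sketch below. It is plain Python over the parsed JSON; `extracted_values` is a hypothetical helper, and the sample response is a trimmed copy of the one above.

```python
def extracted_values(response: dict, field: str) -> dict:
    """Map each hit's _id to the values returned for `field` under "fields"."""
    return {
        hit["_id"]: hit.get("fields", {}).get(field, [])
        for hit in response["hits"]["hits"]
    }


response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "docs",
                "_type": "document",
                "_id": "1",
                "fields": {"file.content_type": ["application/pdf"]},
            }
        ],
    }
}

print(extracted_values(response, "file.content_type"))
```

Hits that do not carry the requested field map to an empty list rather than raising an error.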

I think we should introduce an Attachment type to make using the mapper-attachment plugin easier with NEST. It would:

  1. Handle the serialization of properties when indexing
  2. Handle the deserialization of content along with any additional explicit metadata fields
  3. Allow any of the Attachment type's properties to be used for strongly typed access to fields

Thoughts?

/cc @Mpdreamz, @gmarz

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 16 (9 by maintainers)

Top GitHub Comments

1 reaction
russcam commented, Jun 16, 2016

@nasreekar, the mapper attachments plugin extracts metadata fields from the file where they are available and the plugin is able to extract them. Some of the metadata fields can also be sent explicitly when indexing the document; as far as I know, these are the following (in addition to content, which is sent as _content):

  • content type (sent explicitly as _content_type)
  • name (sent explicitly as _name)
  • language (sent explicitly as _language)

@dadoonet, is this correct?

If you wish to index other data alongside an attachment, you can create other properties on your Document class, as you have done.

I’ve just merged a PR into 2.x that should make it a little easier to use the attachment plugin with NEST. This will go into the next 2.x release.


Top Results From Across the Web

The Future of Attachments for Elasticsearch and .NET
From NEST 2.3.3 onwards, we've introduced an Attachment type to make working with attachments a much smoother experience.