[proposal] Specifying input/output formats and (natural) languages

I am currently missing the ability to specify what types of files a SoftwareApplication consumes or produces. I think this is important software metadata. I would like to propose adding something like:

  • inputFormat - (Text) - Media type, typically MIME format of a file consumed as input (in whatever way) by the application
  • outputFormat - (Text) - Media type, typically MIME format of a file produced as output (in whatever way) by the application

Also, if the input concerns any kind of human text or speech, adding a language identifier is very desirable, for which I’d suggest something like:

  • inputLanguage (Text or schema:Language) - Supported natural language for input data
  • outputLanguage (Text or schema:Language) - Supported natural language for output data
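
To make the proposal concrete, here is a minimal Python sketch of what a record using these four properties might look like. The property names `inputFormat`, `outputFormat`, `inputLanguage` and `outputLanguage` are only the ones proposed above and are not part of codemeta or schema.org; the tool name, media types, language codes and the `describe_tool` helper are hypothetical and exist only for illustration.

```python
import json


def describe_tool(name, input_formats, output_formats,
                  input_languages, output_languages):
    """Build a codemeta-like dict using the *proposed* properties.

    None of the four input/output properties is standardized; this is
    only an illustration of the shape suggested in the proposal.
    """
    return {
        "@type": "SoftwareApplication",
        "name": name,
        # Proposed: media (MIME) types consumed and produced
        "inputFormat": list(input_formats),
        "outputFormat": list(output_formats),
        # Proposed: natural languages supported for input and output
        "inputLanguage": list(input_languages),
        "outputLanguage": list(output_languages),
    }


# Hypothetical NLP tool: reads plain text, writes JSON
record = describe_tool(
    name="example-tokenizer",
    input_formats=["text/plain"],
    output_formats=["application/json"],
    input_languages=["nl", "en"],
    output_languages=["nl", "en"],
)
print(json.dumps(record, indent=2))
```

Plain lists of strings keep the description deliberately high level: they say which formats and languages a tool can handle, not which particular input maps to which output.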

Context: I’m producing codemeta metadata for a lot of NLP tools.

Producing complex profiles of input and output is most probably well beyond the scope of the codemeta initiative and best left to things like OpenAPI/swagger, but I think some very simple basics should be in place.

What do you think?

Issue Analytics

  • State: open
  • Created: 5 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
proycon commented, May 24, 2022

Yes, you’re right that you need to know which of the inputs/outputs accepts/produces which files if you really want the specification to be actionable. This was indeed intended more as a high-level description that users can use, for example, to discover tools and to assess whether a tool is suitable for them based on its possible input/output data.

And then most of the time, having the output format is not useful. For instance, I can select that the output is “CSV” and you would not know what to do with it, besides opening it.

Well, that’s already something; at least you can use it to decide what to open it with.

Yeah, add it to the profile as well if you want.

I’ll draft up something in our https://github.com/SoftwareUnderstanding/software_types repo

1 reaction
proycon commented, Jun 15, 2018

@cboettig Thanks for your reaction! I understand the need to stay as close to schema.org as possible, and you’re undoubtedly more at home in their conventions than I am. I really like your idea of having an attribute (inputData/outputData? consumesData/producesData?) take the full CreativeWork (or derivative) types; that may be more elegant than what I suggested. Then in turn I can indeed just use inLanguage and encodingFormat, so I’d gladly go along with that.
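
A rough sketch of how that alternative could look, written in Python for illustration. The `inLanguage` and `encodingFormat` properties are existing schema.org properties of CreativeWork; the attribute names `consumesData` and `producesData` are just one of the candidate spellings floated above and are not defined anywhere, and the tool name, media types and `creative_work` helper are hypothetical.

```python
import json


def creative_work(encoding_format, language=None):
    """Describe one kind of data as a schema.org CreativeWork."""
    work = {"@type": "CreativeWork", "encodingFormat": encoding_format}
    if language is not None:
        # inLanguage is only meaningful for text or speech data
        work["inLanguage"] = language
    return work


# Hypothetical tool: Dutch audio in, Dutch plain text out.
# "consumesData" / "producesData" are assumed names, not agreed terms.
tool = {
    "@type": "SoftwareApplication",
    "name": "example-speech-recognizer",
    "consumesData": [creative_work("audio/wav", "nl")],
    "producesData": [creative_work("text/plain", "nl")],
}
print(json.dumps(tool, indent=2))
```

Compared with four flat properties, this keeps each format paired with its language, at the cost of a slightly more nested record.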

Alternatively, schema.org has availableLanguage (A language someone may use with or at the item, service or place), which could perhaps be stretched to mean what I suggested (but still leaves the MIME type issue open). I also found that EntryPoint (see also #183) does have contentType and encodingType (for describing web API endpoints), which in a way is already more specific (and too specific for my use) than what I propose.

I wouldn’t want to go the entire way of describing the entire software API of course, but the notion of software consuming some kind of data and producing another (either one or more) is so central (also outside of NLP) that I think it wouldn’t be out of place.

For the moment, listing input or output MIME types in the keywords might make them more visible.

Yeah, but I’m more concerned about proper semantics than visibility. I developed a codemeta-based portal (example: https://webservices-lst.science.ru.nl, source: https://github.com/proycon/labirinto), so I can easily implement whatever proper solution is agreed on.
