[proposal] Specifying input/output formats and (natural) languages
I am currently missing the ability to specify what types of files the SoftwareApplication consumes or produces. I think this is important software metadata. I would like to propose adding something like:
inputFormat
- (Text) - Media type, typically MIME format of a file consumed as input (in whatever way) by the application
outputFormat
- (Text) - Media type, typically MIME format of a file produced as output (in whatever way) by the application
Also, if the input concerns any kind of human text or speech, adding a language identifier is very desirable, for which I’d suggest something like:
inputLanguage
- (Text or schema:Language) - Supported natural language for input data
outputLanguage
- (Text or schema:Language) - Supported natural language for output data
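To make the proposal concrete, here is a minimal sketch of what a codemeta record using these four properties might look like, built as a Python dictionary and serialised to JSON-LD. The tool name and the chosen values are hypothetical, and the four input*/output* keys are only the properties proposed above; they are not defined by codemeta or schema.org.

```python
import json

# Hypothetical codemeta record for an imaginary NLP tool.
# inputFormat, outputFormat, inputLanguage and outputLanguage are the
# properties PROPOSED in this issue, not existing codemeta/schema.org terms.
record = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareApplication",
    "name": "example-tokenizer",  # hypothetical tool name
    "inputFormat": "text/plain",
    "outputFormat": ["application/json", "text/tab-separated-values"],
    "inputLanguage": ["nl", "en"],
    "outputLanguage": ["nl", "en"],
}

# Serialise the record the way it would appear in a codemeta.json file.
print(json.dumps(record, indent=2))
```

Allowing either a single value or a list keeps the simple case short while still covering tools that accept several formats or languages.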
Context: I’m producing codemeta metadata for a lot of NLP tools.
Producing complex profiles of input and output is most probably well beyond the scope of the codemeta initiative and best left to things like OpenAPI/swagger, but I think some very simple basics should be in place.
What do you think?
Issue Analytics
- State:
- Created 5 years ago
- Comments:6 (1 by maintainers)
Top GitHub Comments
Yes, you’re right that you need to know which of the inputs/outputs accepts/produces which files if you really want the specification to be actionable. This was indeed intended more as a high-level description that users can, for example, use to discover tools and to assess whether a tool is suitable for them based on its possible input/output data.
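As a rough illustration of that discovery use case, the sketch below filters a small catalogue of records by accepted media type and language. The catalogue entries, the helper names (as_list, accepts) and the use of the proposed inputFormat/inputLanguage keys are all assumptions made for the example, not an existing API.

```python
def as_list(value):
    """Normalise a missing, single, or list-valued property to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def accepts(record, media_type, language=None):
    """Return True if the record declares the given input media type
    (and, when requested, the given input language)."""
    if media_type not in as_list(record.get("inputFormat")):
        return False
    return language is None or language in as_list(record.get("inputLanguage"))


# Hypothetical catalogue using the properties proposed in this issue.
catalogue = [
    {"name": "example-tokenizer", "inputFormat": "text/plain",
     "inputLanguage": ["nl", "en"]},
    {"name": "example-asr", "inputFormat": ["audio/mpeg", "audio/ogg"],
     "inputLanguage": "nl"},
]

matches = [r["name"] for r in catalogue if accepts(r, "text/plain", "en")]
print(matches)  # ['example-tokenizer']
```

This only answers "could this tool handle my data at all?"; it says nothing about which specific input or parameter takes which file.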
Well, that’s already something, at least you can use it to decide with what to open it.
I’ll draft up something in our https://github.com/SoftwareUnderstanding/software_types repo
@cboettig Thanks for your reaction! I understand the need to stay as close to schema.org as possible, and you’re undoubtedly more at home in their conventions than I am. I really like your idea of having an attribute (inputData/outputData? consumesData/producesData?) take the full CreativeWork (or derivatives) types; that may be more elegant than what I suggested. Then in turn I can indeed just use the inLanguage and encodingFormat, so I’d gladly go along with that.
Alternatively, schema.org has availableLanguage (A language someone may use with or at the item, service or place) which could perhaps be stretched to mean what I suggested (but still leaves the MIME type issue open). I also found that EntryPoint (see also #183) does have contentType and encodingType (for describing web API endpoints), which in a way is already more specific (and too specific for my use) than what I propose.
I wouldn’t want to go the entire way of describing the entire software API of course, but the notion of software consuming some kind of data and producing another (either one or more) is so central (also outside of NLP) that I think it wouldn’t be out of place.
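A minimal sketch of the CreativeWork-based alternative might look as follows. Here encodingFormat and inLanguage are existing schema.org properties of CreativeWork, while consumesData/producesData are just one of the candidate name pairs floated above and remain undecided; the tool name and the data_profile helper are hypothetical.

```python
import json


def data_profile(encoding_format, languages):
    """Describe a class of data as a schema.org CreativeWork."""
    return {
        "@type": "CreativeWork",
        "encodingFormat": encoding_format,  # existing schema.org property
        "inLanguage": languages,            # existing schema.org property
    }


record = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareApplication",
    "name": "example-tokenizer",  # hypothetical tool name
    # consumesData / producesData are candidate names only (assumption).
    "consumesData": [data_profile("text/plain", ["nl", "en"])],
    "producesData": [data_profile("application/json", ["nl", "en"])],
}

print(json.dumps(record, indent=2))
```

Compared with flat format and language properties, this ties each format to its languages, at the cost of a slightly more verbose record.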
Yeah, but I’m more concerned about proper semantics than visibility. I developed a codemeta-based portal (example: https://webservices-lst.science.ru.nl, source: https://github.com/proycon/labirinto), so I can easily implement whatever proper solution is agreed on.