Parser specification recommendations
Hey Alex,
Thanks for putting this up on HN! I threw together a really basic parser last night and ran into a couple things that I think are worth clarifying:
1. The spec should require that every item is a line.
That means each line must end with a trailing \n. This helps resolve ambiguity in the specification so far. This also implies that every item, including the last item in the file, needs a trailing \n to parse. That’s pretty standard for just about any text editor and many other text file formats, so it should be fine.
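Here's a rough sketch of what that rule could look like in a parser. The function name `split_items` is made up for illustration, and raising an error on a missing final \n is just one possible policy (a lenient parser could append it instead):

```python
def split_items(text: str) -> list[str]:
    # Hypothetical helper: enforce "every item is a line ending in \n".
    if text and not text.endswith("\n"):
        raise ValueError("last item is missing its trailing newline")
    # Every line is terminated, so drop the final "\n" before splitting
    # to avoid a spurious empty string at the end of the result.
    return text[:-1].split("\n") if text else []


lines = split_items("Item Y\nItem Z\n")
```

With this rule there is no special case for the last line of the file: it is parsed exactly like every other line.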
2. Regarding annotations in the following form:
Items:
- Item Y
[I belong to Item Y]
- [I belong to Item Z]
Item Z
I think the second case, an annotation “belonging to” its child, is problematic. My recommendation is to drop it, because I think it’s difficult for humans to understand, let alone a parser.
If you did not want to drop that, then I would propose it be reinterpreted as follows:
Items:
- Item Y
[I belong to Item Y]
- [I am an item with no value, only this annotation]
I'm a child of this value-less item.
Again I think just requiring that a line have a value makes more sense, but if you feel it’s critical to keep the format shown above then I think a “valueless” item makes more sense than inverting the relationship of values and annotations. It’s confusing to both humans and machines.
3. Annotation types:
UPDATE: based on the discussion below I ended up reconsidering a bunch of these details and instead landed on there being only one kind of annotation, an object like {key: "optional", value: "required"}. Tasks only make sense as a special type of Item. Thus any readers who made it this far can skip past this part 😃
In your guidebook you show examples like this:
I'm a plain old item
[and I'm an annotation]
and
The Crying of Lot 49
[author: Thomas Pynchon]
[publication year: 1966]
[publisher: J. B. Lippincott & Co.]
But in your sample area, your examples show annotations stored as a hash, with the “index” as the keys.
This leaves several cases unclear.
- How should a note with no “index” (i.e. something before a colon) be stored?
- What happens if two annotations are given with the same “index”?
BTW, as an aside, I think you might want to call those “tags” or “keys” because “index” is a little confusing, but… that’s not critical.
Combining this with the other annotation types, it becomes difficult to reason about how the underlying data should be represented. I would propose the following:
- There are three kinds of Annotations: note, index, and task.
- All annotations are enclosed in square brackets [...]
- A task is a square bracket enclosing exactly one character: [ ], [x], [✓], followed by any amount of text and terminated by a \n
- An “index” (again, name is weird lol), is an annotation with characters separated by a colon: [index: content]
- An “index” may occur only once per item. If an index is repeated, the last occurrence will be used.
- Any other annotation is considered a note.
That would result in the following:
Big Item
[this is just a note]
[this is another note]
[author: me]
[category: code example]
[category: margin example]
[x] Put this on Github
[ ] Get it adopted?
Everything above is an annotation on Big Item, but I am a child.
Of course big item can have multiple children.
And the JSON representation of this would look like this (omitting the raw stuff):
{
  "value": "Big Item",
  "annotations": {
    "notes": ["this is just a note", "this is another note"],
    "indices": {
      "author": "me",
      "category": "margin example"
    },
    "tasks": [
      { "done": true, "value": "Put this on Github" },
      { "done": false, "value": "Get it adopted?" }
    ]
  },
  "children": [
    { "value": "Everything above is an annotation on Big Item, but I am a child." },
    { "value": "Of course big item can have multiple children." }
  ]
}
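To make the proposed rules concrete, here's a minimal sketch of how the annotation lines under Big Item could be sorted into notes, indices, and tasks. The names `TASK_RE`, `ANNOTATION_RE`, and `collect_annotations` are made up for this example, and I'm assuming the only valid task marks are the three shown above (space, x, ✓):

```python
import re

# Assumption: a task is "[ ]", "[x]" or "[✓]" followed by free text.
TASK_RE = re.compile(r"^\[([ x✓])\]\s*(.*)$")
# Any other line fully wrapped in square brackets is an annotation.
ANNOTATION_RE = re.compile(r"^\[(.*)\]$")


def collect_annotations(lines):
    result = {"notes": [], "indices": {}, "tasks": []}
    for line in lines:
        line = line.strip()
        task = TASK_RE.match(line)
        annotation = ANNOTATION_RE.match(line)
        if task:
            mark, text = task.groups()
            result["tasks"].append({"done": mark != " ", "value": text})
        elif annotation:
            body = annotation.group(1)
            if ":" in body:
                key, value = body.split(":", 1)
                # A repeated index overwrites the earlier one (last wins).
                result["indices"][key.strip()] = value.strip()
            else:
                result["notes"].append(body.strip())
        # Lines matching neither pattern (e.g. children) are ignored here.
    return result


annotations = collect_annotations([
    "[this is just a note]",
    "[this is another note]",
    "[author: me]",
    "[category: code example]",
    "[category: margin example]",
    "[x] Put this on Github",
    "[ ] Get it adopted?",
])
```

Checking for a task before checking for a generic annotation matters, since a task line also starts with a square bracket.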
I have a few other thoughts but those were the big ones and I know this is a lot to digest, so let me know what you think about all the above. Thanks for sharing your project, this is very cool 😃
Top GitHub Comments
Additionally, the parser should probably strip leading and trailing whitespace in annotation values so that [a: b] is parsed as a and b instead of a and •b.
@mtsknn and others on this thread, I got far enough with my own implementation to have a working parser that can read Margin and write JSON. It’s not baked enough for wide distribution yet, but it has tests and covers all the cases we’ve talked about. I went with the key, value structure for annotations we discussed above.
The code is here: https://github.com/burlesona/margin-rb
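For anyone following along, here's a small sketch of the key, value shape with the whitespace stripping suggested in the earlier comment. This is written in Python for illustration and is not taken from margin-rb; `parse_annotation` is a hypothetical name, and using `None` for a missing key is my assumption:

```python
def parse_annotation(raw: str) -> dict:
    # Hypothetical helper: turn "[key: value]" or "[value]" into a dict.
    text = raw.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError("an annotation must be wrapped in square brackets")
    body = text[1:-1]
    # Split on the first colon only, so values may contain colons.
    key, separator, value = body.partition(":")
    if not separator:
        # No colon: the key is optional, so only the value is set.
        return {"key": None, "value": body.strip()}
    return {"key": key.strip(), "value": value.strip()}


keyed = parse_annotation("[a: b]")
plain = parse_annotation("[and I'm an annotation]")
```

Because each annotation is its own object, two annotations with the same key can both be kept in a list rather than one overwriting the other.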
The divergences from Alex’s implementation that I’m aware of ended up being as follows:
Note that I realized, once I was further into implementing this, that it doesn’t really make sense for a task to be considered an annotation; rather, a task is just an item with an extra done field.
To make this easy for consumers to work with I added the type field on items to indicate if it’s a regular item or a task.
Happy to hear any feedback you all have. I’ve got reasonable test coverage now but will likely add tests for more cases soon, as well as a CLI.
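As a rough illustration of the task-as-item idea (again in Python, not lifted from margin-rb), an item line could be parsed like this. The `type` and `done` field names come from the comment above; `parse_item`, `TASK_PREFIX`, and the set of accepted task marks are assumptions:

```python
import re

# Assumption: a task line starts with "[ ]", "[x]" or "[✓]".
TASK_PREFIX = re.compile(r"^\[([ x✓])\]\s*(.*)$")


def parse_item(line: str) -> dict:
    text = line.strip()
    match = TASK_PREFIX.match(text)
    if match:
        mark, value = match.groups()
        # A task is a regular item plus a "done" flag.
        return {"type": "task", "done": mark != " ", "value": value}
    return {"type": "item", "value": text}


task = parse_item("[x] Put this on Github")
item = parse_item("Big Item")
```

Consumers can then branch on `type` without inspecting the raw text for a checkbox.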