D★Mark
Denis Defreyne <denis@stoneship.org>
CAUTION: D★Mark is experimental — use at your own risk!
_D★Mark_ is a language for marking up prose. It facilitates writing semantically meaningful text, without limiting itself to the semantics provided by HTML or Markdown.
Here’s an example of D★Mark:
- source
h2. Patterns
para. Patterns are used to find items and layouts based on their identifier. They come in three varieties:
list.
  item. glob patterns
  item. regular expression patterns
  item. legacy patterns
para. A glob pattern that matches every item is %pattern{/**/*}. A glob pattern that matches every item/layout with the extension %filename{md} is %glob{/**/*.md}.
Samples
The `samples/` directory contains some sample D★Mark files. They can be processed by invoking the appropriate script with the same filename. For example:
....
% bundle exec ruby samples/trivial.rb
<p>I’m a trivial example!</p>
....
Structure of a D★Mark document
_D★Mark_ knows two constructs:
- Block-level elements
-
Every non-blank line of a D★Mark document corresponds to a block. A block can be a paragraph, a list, a header, a source code listing, or more. Each block starts with the name of the element, followed by a period, a space character, and the content. For example:
+
- source
para. Patterns are used to find items and layouts based on their identifier. They come in three varieties.
- Inline elements
-
Inside a block, text can be marked up using inline elements, which start with a percentage sign, the name of the element, and the content within braces. For example, `%emph{crazy}` is an `emph` element with the content `crazy`.
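The two constructs can be illustrated with a minimal sketch in plain Ruby. It is not D★Mark's parser: the regular expressions and the `split_block` and `inline_elements` helper names are assumptions made for illustration, and nesting, attributes, and escaping are ignored.

```ruby
# Hypothetical helpers, for illustration only; not part of the D★Mark library.

# A block line: element name, a period, a space character, then the content.
BLOCK_LINE = /\A(?<name>[a-z0-9]+)\. (?<content>.*)\z/

# An inline element: percent sign, element name, content within braces.
# Nested braces are not handled in this sketch.
INLINE_ELEMENT = /%(?<name>[a-z0-9]+)\{(?<content>[^{}]*)\}/

# Returns [name, content] for a block line, or nil if the line does not match.
def split_block(line)
  match = BLOCK_LINE.match(line)
  match && [match[:name], match[:content]]
end

# Returns a list of [name, content] pairs, one per inline element found.
def inline_elements(text)
  text.scan(INLINE_ELEMENT)
end

name, content = split_block('para. This is %emph{crazy} and %code{fun}.')
puts name
inline_elements(content).each do |element_name, element_content|
  puts "#{element_name}: #{element_content}"
end
```

Running the sketch prints the block name `para`, followed by one line per inline element (`emph: crazy` and `code: fun`).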
Block-level elements can be nested. To do so, indent the nested block two spaces deeper than the enclosing block. For example, the following defines a `list` element with three `item` elements inside it:
- source
list.
  item. glob patterns
  item. regular expression patterns
  item. legacy patterns
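The indentation rule can be sketched as a small tree builder. This is an illustration under simplifying assumptions, not D★Mark's parser: the `Block` struct and the `build_tree` name are hypothetical, every non-blank line is assumed to be a well-formed block, and plain-text content inside a block is not handled.

```ruby
# Hypothetical stand-in for a parsed block; not D★Mark's node class.
Block = Struct.new(:name, :content, :children)

# Builds a tree of blocks from source text, nesting by two-space indentation.
def build_tree(source)
  root = Block.new('root', '', [])
  stack = [root] # stack[depth] is the most recent block at that depth

  source.each_line do |line|
    line = line.chomp
    next if line.strip.empty?

    indent = line[/\A */].size
    depth = indent / 2 + 1 # two spaces per nesting level
    name, content = line.strip.split('.', 2)

    block = Block.new(name, content.to_s.strip, [])
    stack = stack.first(depth) # drop blocks that are deeper than this one
    stack.last.children << block
    stack << block
  end

  root.children
end

tree = build_tree(<<~DMARK)
  list.
    item. glob patterns
    item. regular expression patterns
    item. legacy patterns
DMARK

list = tree.first
puts list.name
puts list.children.map(&:content)
```

The stack keeps one entry per nesting level, so a line indented two spaces deeper than its predecessor becomes a child of it, while a line at the same depth becomes a sibling.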
Block-level elements can also include plain text. In this case, the content is not wrapped inside a nested block-level element. This is particularly useful for source code listings. For example:
- source
identifier = Nanoc::Identifier.new('/about.md')
identifier.without_ext # => "/about"
identifier.ext # => "md"
Block-level elements and inline elements are identical in the tree representation of D★Mark. This means that any inline element can be rewritten as a block-level element.
NOTE: To do: Elaborate on the distinction and similarity of block-level and inline elements.
NOTE: To do: Describe escaping rules.
Attributes
Both block and inline elements can also have attributes. Attributes are enclosed in square brackets after the element name, as a comma-separated list of key-value pairs, with each key separated from its value by an equals sign. The value, along with the equals sign, can be omitted, in which case the value will be equal to the key name.
For example:
-
`%code[lang=ruby]{Nanoc::VERSION}` is an inline `code` element with the `lang` attribute set to `ruby`.
-
`%only[web]{Refer to the release notes for details.}` is an inline `only` element with the `web` attribute set to `web`.
-
`h2[id=donkey]. All about donkeys` is a block-level `h2` element with the `id` attribute set to `donkey`.
-
`para[print]. This is a paragraph that only readers of the book will see.` is a block-level `para` element with the `print` attribute set to `print`.
NOTE: The behavior of keys with missing values might change to default to booleans rather than to the key name.
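The attribute rules described above can be sketched as follows. The `parse_attributes` helper is hypothetical and only illustrates the comma-separated key-value format and the current default for omitted values. It is not D★Mark's implementation, and it does not handle escaping or values that contain commas.

```ruby
# Hypothetical helper, for illustration only; not part of the D★Mark library.
# Parses the text between the square brackets, e.g. "lang=ruby" or "web".
def parse_attributes(text)
  text.split(',').each_with_object({}) do |pair, attributes|
    key, value = pair.strip.split('=', 2)
    # An omitted value currently defaults to the key name.
    attributes[key] = value || key
  end
end

attributes = parse_attributes('id=donkey,print')
puts attributes['id']    # donkey
puts attributes['print'] # print
```

If keys with missing values later default to booleans, only the `value || key` fallback in this sketch would need to change.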
Goals
- Be extensible
-
D★Mark defines only the syntax of the markup language, and doesn’t bother with semantics. It does not prescribe which element names are valid in the context of a vocabulary, because it does not come with a vocabulary.
- Be simple
-
Simplicity implies being easy to write and easy to parse. D★Mark eschews ambiguity and aims to have a short formal syntactical definition. This also means that it is easy to syntax highlight.
- Be compact
-
Introduce as little extra syntax as possible.
Comparison with other languages
D★Mark takes inspiration from a variety of other languages.
- HTML
-
HTML is syntactically unambiguous, but comparatively more verbose than other languages. It also prescribes only a small set of elements, which makes it awkward to use for prose that requires more thorough markup. It is possible to use `span` or `div` elements with custom classes, but this approach turns an already verbose language into something even more verbose.
+
- source,html
<p>A glob pattern that matches every item is <span class="pattern attr-kind-glob">/**/*</span>.</p>
+
- source,d-mark
para. A glob pattern that matches every item is %pattern{/**/*}.
- XML
-
Similar to HTML, with the major difference that XML does not prescribe a set of elements.
+
- source,xml
<para>A glob pattern that matches every item is <pattern kind="glob">/**/*</pattern>.</para>
+
- source,d-mark
para. A glob pattern that matches every item is %pattern{/**/*}.
- Markdown
-
Markdown has a compact syntax, but is complex and ambiguous, as evidenced by the many different mutually incompatible implementations. It prescribes a small set of elements (smaller even than HTML). It supports embedding raw HTML, which in theory makes it possible to combine the best of both worlds, but in practice leads to markup that is harder to read than either Markdown or HTML separately, and occasionally trips up the parser and syntax highlighter.
+
- source
A glob pattern that matches every item is <span class="glob attr-kind-glob">/**/*</span>.
+
- source,d-mark
para. A glob pattern that matches every item is %pattern{/**/*}.
- AsciiDoc
-
AsciiDoc and its Asciidoctor variant are syntactically unambiguous but complex languages. They prescribe a comparatively large set of elements, which translates well to DocBook and HTML. They do not support custom markup or embedding raw HTML, which makes them harder to use for prose that requires more complex markup.
+ _(No example, as this example cannot be represented with AsciiDoc.)_
- TeX, LaTeX
-
TeX is a Turing-complete programming language intended for typesetting, as opposed to a markup language. This makes it impractical as a source format for conversion to other formats. Its syntax is simple and compact, and served as an inspiration for D★Mark.
+
- source,latex
A glob pattern that matches every item is \pattern[glob]{/**/*}.
+
- source,d-mark
para. A glob pattern that matches every item is %pattern{/**/*}.
- JSON, YAML
-
JSON and YAML are data interchange formats rather than markup languages, and thus are not well-suited for marking up prose.
+
- source,json
[
"A glob pattern that matches every item is ", ["pattern", {"kind": "glob"}, ["/**/*"]], "."
]
+
- source,d-mark
para. A glob pattern that matches every item is %pattern{/**/*}.
Specification
NOTE: To do: write this section.
Programmatic usage
Handling a D★Mark file consists of two stages: parsing and translating.
The parsing stage converts text into a list of nodes. Construct a parser with the source text as input, and call `#run` to get the list of nodes.
- source,ruby
# Read the file named by the first command-line argument.
content = File.read(ARGV[0])
nodes = DMark::Parser.new(content).run
The translating stage is not the responsibility of D★Mark. A translator is part of the domain of the source text, and D★Mark only deals with syntax rather than semantics. A translator will run over the tree and convert it into something else (usually another string). To do so, handle each node type (`DMark::ElementNode` or `String`). For example, the following translator will convert the tree into something that resembles XML:
- source,ruby
class MyXMLLikeTranslator < DMark::Translator
  def handle(node)
    case node
    when String
      # Plain text is emitted as-is.
      out << node
    when DMark::Parser::ElementNode
      # Elements are wrapped in an opening and a closing tag.
      out << "<#{node.name}>"
      handle_children(node)
      out << "</#{node.name}>"
    end
  end
end

result = MyXMLLikeTranslator.new(nodes).run
puts result
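To make the traversal concrete without depending on the library, here is a self-contained sketch of the same idea. The `Element` struct and the `to_xml_like` function are hypothetical stand-ins, and D★Mark's own node and translator classes may differ. Attributes and escaping of special characters are left out.

```ruby
# Hypothetical stand-in for an element node; D★Mark's own class may differ.
Element = Struct.new(:name, :children)

# Recursively converts a list of nodes (strings or elements) into a string.
# Text is not escaped, so this is only suitable as an illustration.
def to_xml_like(nodes)
  nodes.map do |node|
    case node
    when String
      node
    when Element
      "<#{node.name}>#{to_xml_like(node.children)}</#{node.name}>"
    end
  end.join
end

tree = [
  Element.new('para', [
    'A glob pattern that matches every item is ',
    Element.new('pattern', ['/**/*']),
    '.'
  ])
]

puts to_xml_like(tree)
```

Because each element's children are handled by the same function, nested elements of any depth are converted without extra code.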
-