class MarkdownIt::HTMLEntities

TODO: this list needs to be brought up to the same level as the W3C document www.w3.org/TR/xml-entity-names/byalpha.html

Constants

MAPPINGS

This table was added by Philip (flip) Kromer <flip@infochimps.org> using the mapping by John Cowan <cowan@ccil.org> (25 July 1997) at

ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MISC/SGML.TXT

The following table maps SGML character entities from various public sets (namely, ISOamsa, ISOamsb, ISOamsc, ISOamsn, ISOamso, ISOamsr, ISObox, ISOcyr1, ISOcyr2, ISOdia, ISOgrk1, ISOgrk2, ISOgrk3, ISOgrk4, ISOlat1, ISOlat2, ISOnum, ISOpub, ISOtech, HTMLspecial, HTMLsymbol) to corresponding Unicode characters.

The table has five tab-separated fields:

:bare   => SGML character entity name
:hex    => Unicode 2.0 character code
:entity => SGML character entity
:type   => SGML public entity set
:udesc  => Unicode 2.0 character name (UPPER CASE)

Entries which don't have Unicode equivalents have “0x????” for :hex and a lower case :udesc (from the public entity set DTD).
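As a rough illustration of the five fields above, the sketch below splits one tab-separated row into a Hash keyed by the documented field names and checks whether the row has a Unicode equivalent. The helper names parse_entity_row and unicode_equivalent? are hypothetical, and the sample row is illustrative rather than copied from MAPPINGS.

```ruby
# Hypothetical helper: split one tab-separated row into a Hash
# keyed by the five field names listed in the documentation.
def parse_entity_row(line)
  fields = [:bare, :hex, :entity, :type, :udesc]
  values = line.chomp.split("\t", fields.size)
  fields.zip(values).to_h
end

# Hypothetical helper: true when :hex holds a real code point
# rather than the "0x????" placeholder.
def unicode_equivalent?(row)
  row[:hex].match?(/\A0x\h+\z/)
end

# Illustrative row, laid out in the documented field order.
row = parse_entity_row("amp\t0x0026\t&amp;\tISOnum\tAMPERSAND")
row[:bare]                # => "amp"
row[:hex].hex             # => 38
unicode_equivalent?(row)  # => true
```

String#hex accepts the leading "0x", so the :hex field can be converted to an Integer without stripping the prefix first.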

For reasons I (flip) don't understand, the source file mapped &apos; to 0x02BC rather than to its XML definition of 0x0027. I've added a line specifying 0x0027; the 'original' is commented out. See en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
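The difference between the two code points can be seen with a small sketch. The helper char_for is hypothetical and is not part of this class; it only converts a "0x...." string into the corresponding UTF-8 character.

```ruby
# Hypothetical helper: turn a "0x...." table value into a
# one-character UTF-8 string.
def char_for(hex)
  [hex.hex].pack("U")
end

xml_apos    = char_for("0x0027")  # APOSTROPHE, the XML definition of &apos;
source_apos = char_for("0x02BC")  # MODIFIER LETTER APOSTROPHE, as in the source file

xml_apos == "'"     # => true
source_apos == "'"  # => false
```

Text decoded with the source file's value would therefore not compare equal to an ordinary ASCII apostrophe.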

The mapping is not reversible, because many distinctions are unified away in Unicode, particularly between mathematical symbols. To make it reversible, one entity was arbitrarily chosen for each hex code when encoding, using these rules:

  • if it's also an XHTML 1.0 entity, use its XHTML reverse mapping;

  • otherwise, just use the first entity encountered, avoiding the &b.foo; type entities.
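The selection rules above can be sketched as follows. This is a minimal sketch under stated assumptions, not the code this class actually uses: build_reverse_mapping and the xhtml_names argument are hypothetical, the sample rows are illustrative, and falling back to a &b.foo; entity when nothing else exists is an assumption of this sketch.

```ruby
# Hypothetical helper: pick one entity name per hex code for encoding.
#   rows        - Array of Hashes with :bare and :hex keys, in table order
#   xhtml_names - names assumed to be XHTML 1.0 entities (illustrative input)
def build_reverse_mapping(rows, xhtml_names)
  reverse = {}
  rows.group_by { |r| r[:hex] }.each do |hex, candidates|
    next if hex == "0x????" # no Unicode equivalent, nothing to encode

    # Rule 1: prefer an entity that is also an XHTML 1.0 entity.
    chosen = candidates.find { |r| xhtml_names.include?(r[:bare]) }
    # Rule 2: otherwise the first entity encountered, skipping &b.foo; names.
    chosen ||= candidates.find { |r| !r[:bare].start_with?("b.") }
    # Assumption of this sketch: fall back to a b. entity if nothing else exists.
    chosen ||= candidates.first

    reverse[hex] = chosen[:bare]
  end
  reverse
end

# Illustrative rows, in case-blind name order.
rows = [
  { bare: "agr",    hex: "0x03B1" },
  { bare: "alpha",  hex: "0x03B1" },
  { bare: "b.beta", hex: "0x03B2" },
  { bare: "bgr",    hex: "0x03B2" }
]
build_reverse_mapping(rows, ["alpha"])
# => {"0x03B1"=>"alpha", "0x03B2"=>"bgr"}
```

Because group_by keeps rows in their original order, "first entity encountered" here means first in table order.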

The table is sorted case-blind by SGML character entity name.

The contents of this table are drawn from various sources, and are in the public domain.

SKIP_DUP_ENCODINGS