In traditional printing environments, clients rely on font downloads when they are not sure that a given character is embedded in the printer. As printing moves to small clients, downloading may not be an option, and clients need to know which characters are available in a given device.
There are many published named character repertoires, and a small client will not know about them all. For interoperability, this document defines a small set of character repertoires as "preferred," so that a conforming client and printer can interoperate knowing only those repertoires. It also defines a naming convention so that a printer may advertise support for these and other named repertoires.
The primary target of this document is printing using languages based on XML or HTML (for example, XHTML-Print). It will be less applicable to traditional PDLs (PCL, PostScript, etc.) because they tend to have very language-specific mechanisms for managing character repertoires.
In Unicode and W3C documents, the term "character set" usually refers to a method of encoding a (possibly very large) set of characters, e.g. UTF-8. A character set tells how to encode a given character if it is present, but does not define which characters in that space are actually in use.
The term "character repertoire" is used here to indicate a subset of characters that is actually present. It is convenient to specify a character repertoire using Unicode characters; however, in principle, a character repertoire could be expressed in a different encoding.
This specification derives repertoire names from several sources. To avoid ambiguity, the PWG name for a repertoire indicates the source.
The categories of naming are:
Source | Examples |
Unicode chart | Unicode Latin 1, Unicode Cyrillic |
Unicode Unihan database | Unihan JIS X 0208, Unihan KPS 10721-2000 |
IANA charset registry | Charset iso-8859-1 |
Vendor extension | Vendor Oak Floral |
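As an illustration of how a client might read the source out of a name, the following minimal sketch splits a repertoire name at its first separator and looks up the leading keyword. The keywords are taken from the examples in the table above; the function name, the descriptive labels, and the decision to accept a space, hyphen, or underscore as the first separator (in line with the matching rules given later in this document) are assumptions made for the example, not requirements of this specification.

```python
import re

# Leading keywords as they appear in the example names above.
# The labels on the right are descriptive only (an assumption of this sketch).
SOURCE_PREFIXES = {
    "unicode": "Unicode chart",
    "unihan": "Unicode Unihan database",
    "charset": "IANA charset registry",
    "vendor": "Vendor extension",
}


def split_repertoire_name(name):
    """Return (source label, remainder) for a repertoire name.

    Returns (None, name) when the leading keyword is not recognized
    or when nothing follows it.
    """
    # Split once, at the first space, underscore, or hyphen.
    parts = re.split(r"[ _-]", name.strip(), maxsplit=1)
    label = SOURCE_PREFIXES.get(parts[0].lower())
    if label is None or len(parts) < 2:
        return None, name
    return label, parts[1]
```

For example, `split_repertoire_name("Charset iso-8859-1")` yields the IANA label together with `"iso-8859-1"`, while a name with an unknown leading keyword is returned unchanged with a `None` label.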
Unicode charts are as described in:
http://www.unicode.org/charts
Unicode Unihan database mapped character set names are as described in:
http://www.unicode.org/charts/unihan.html
IANA charset names are as described in:
http://www.iana.org/assignments/character-sets
Note that IANA charsets are in a variety of encodings, not necessarily Unicode. If a non-Unicode repertoire is used in a Unicode context, the implication is that the corresponding Unicode codepoints are used. Mappings are available for most IANA charsets, but this is outside the scope of this document.
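Although the mappings themselves are out of scope, the idea of "the corresponding Unicode codepoints" can be sketched briefly. The example below assumes a single-byte charset and uses the codec tables shipped with Python as the mapping source; both choices, and the function name, are assumptions of this illustration rather than anything this document specifies. Note that the result includes control characters wherever the codec maps them.

```python
def charset_repertoire(codec_name):
    """Return the set of Unicode code points reachable from a
    single-byte charset, using Python's codec for that charset."""
    codepoints = set()
    for byte in range(256):
        try:
            ch = bytes([byte]).decode(codec_name)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in the charset
        if len(ch) == 1:
            codepoints.add(ord(ch))
    return codepoints


latin1 = charset_repertoire("iso-8859-1")
greek = charset_repertoire("iso-8859-7")
```

Here `latin1` covers U+0000 through U+00FF, and `greek` contains Greek letters such as U+03B1 that `latin1` lacks. Multi-byte charsets would need a different enumeration strategy.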
In matching names, the client should consider these rules:
Names are case-insensitive, so a letter matches its uppercase or lowercase equivalent.
Space, hyphen, and underscore characters are interchangeable.
Individual transport protocols may place further restrictions on the use of upper/lower case, and the use of space, hyphen, and underscore characters.
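The two matching rules above can be sketched as a normalization step followed by a plain comparison. This is a minimal illustration, not a normative algorithm: the function names are hypothetical, the choice of hyphen as the canonical separator is arbitrary, and the sketch does not collapse runs of separators or apply any transport-specific restrictions.

```python
def normalize_repertoire_name(name):
    """Fold case and map space, hyphen, and underscore to one separator."""
    folded = name.strip().casefold()
    # Any of the three interchangeable separators becomes a hyphen.
    return "".join("-" if ch in " _-" else ch for ch in folded)


def names_match(a, b):
    """Compare two repertoire names under the matching rules."""
    return normalize_repertoire_name(a) == normalize_repertoire_name(b)
```

Under these assumptions, `names_match("Unicode Latin 1", "unicode_latin-1")` is true, while `names_match("Unicode Latin 1", "Unicode Cyrillic")` is false.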
The following are the PWG Preferred Repertoires:
The last four support PRC, Japan, Korea, and Taiwan respectively.
A conforming printer must follow these rules:
As a result, a client may get good results knowing only the preferred repertoires.
There is no requirement that every supported character is represented in some repertoire; a printer may support specific characters without advertising them. In some languages (e.g. those based on XHTML) certain characters are implicitly supported (e.g. as built-in character entities), without being advertised in any repertoire.
Printing protocols (outside of this document) specify how a print client learns about the supported repertoires in a printer. Once it knows, a client may choose to use this knowledge in any of these ways:
Registration Procedures for Additional Names
Internationalization Considerations
Security Considerations
References
Author's Address
Additional Contributors