March 17, 2003
Copyright 2003 Printer Working Group, All Rights Reserved.
XHTML is a trademark of the World Wide Web Consortium.
In traditional printing environments, clients rely on font downloads when they are not sure a given character is embedded in the printer. As printing moves to small clients, downloading may not be an option, and clients need to know which characters are available in a given device.
There are many published named character repertoires, and a small client will not know about them all.
To improve interoperability, this document defines:
The primary target of this document is printing using languages based on XML or HTML (for example, XHTML-Print). It will be less applicable to traditional PDLs (PCL, PostScript, etc.) because they tend to have very language-specific mechanisms for managing character repertoires.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. The latest status of this document series is maintained at the PWG.
All sections of this document are normative unless noted as informative.
This document is a working draft and only a working draft. It is currently being reviewed by PWG Members but has not been approved. It is not a stable document and may not be used as reference material nor cited as a normative reference from another document.
Public discussion of PWG Character Repertoires takes place on the mailing list: cr@pwg.org. To subscribe send an email to majordomo@pwg.org with the words subscribe cr in the body. You must be subscribed to the mailing list to post there. Please report errors in this document to the editor listed above or on the mailing list.
A list of current PWG Standards and other technical documents can be found at http://www.pwg.org/standards.html
This document defines a data element called "repertoires-supported". This element is intended to be incorporated into higher level description schemes, such as the PWG Semantic Model [PWG-SM], as well as protocols based on those schemes.
Inside the scope of this document are:
Some areas outside the scope of this document are:
In Unicode and W3C documents, the term character set usually refers to a method of encoding a (possibly very large) set of characters, e.g. UTF-8. This tells how to encode a given character if it is present, but doesn't define which characters in that space are actually in use.
The term character repertoire is used here to indicate a subset of characters that is actually present. It is convenient to specify a character repertoire using Unicode characters; however, in principle a character repertoire could be expressed in a different encoding.
The keywords "MUST", "SHALL", "MUST NOT", "SHALL NOT", "REQUIRED", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" when used in this document are to be interpreted as described in RFC 2119 [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.
[PWG-SM] defines semantic elements for a printer to use in advertising its capabilities (among other things). We use the Model to let a printer advertise its supported repertoires; the union of all characters in all advertised repertoires tells the client what characters it may safely use. (Note that a printer is free to implement additional characters beyond those listed in the supported repertoires.) A printer might also use "repertoires-ready," in the usual manner described by the Semantic Model, to indicate repertoires that are available without any operator intervention (such as inserting a DIMM).
A client references characters in whatever encoding is present, without reference to a particular repertoire. In other words, repertoires are (possibly overlapping) sets of characters, but a repertoire is not needed to reference a character. Therefore, there are no semantic elements for default, current, or actual repertoire values.
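As an informative illustration, the union rule described above could be sketched as follows. The table HYPOTHETICAL_REPERTOIRES and both function names are assumptions made for this example, not part of the Semantic Model; a real client would load repertoire contents from the referenced sources. The Basic Latin entry is deliberately limited to the printable range.

```python
# Hypothetical, truncated repertoire contents used only for illustration.
HYPOTHETICAL_REPERTOIRES = {
    "Unicode: Basic Latin": {chr(c) for c in range(0x20, 0x7F)},
    "Unicode: Cyrillic": {chr(c) for c in range(0x0400, 0x0500)},
}


def safe_characters(advertised, known=HYPOTHETICAL_REPERTOIRES):
    """Union of the characters in every advertised repertoire the client knows.

    Repertoire names the client does not recognize contribute nothing,
    since the client cannot tell which characters they contain.
    """
    safe = set()
    for name in advertised:
        safe |= known.get(name, set())
    return safe


def can_print_safely(text, advertised, known=HYPOTHETICAL_REPERTOIRES):
    """True if every character of text is covered by an advertised repertoire."""
    safe = safe_characters(advertised, known)
    return all(ch in safe for ch in text)
```

A False result only means the client cannot confirm support from the advertised repertoires; the printer may still implement the character.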
This document specifies how to reference repertoires defined elsewhere. "repertoires-supported" contains one or more values, with each value constructed as follows:
Source | Form of each value | Example |
IANA charset registry as defined in [IANA-Charsets] | IANA: name | IANA: iso-8859-1 |
Unicode code chart as defined in [Unicode-Charts] | Unicode: name | Unicode: Basic Latin 1 |
Unicode Unihan database as defined in [Unihan] | Unihan: name | Unihan: JIS X 0208 |
Vendor specific | Vendor: vendor: name | Vendor: Oak: Floral |
Note that these sources are in a variety of encodings, not necessarily Unicode. If a non-Unicode repertoire is used in a Unicode context, the implication is that the corresponding Unicode codepoints are used. Such mappings are outside the scope of this document (but are commonly available in most cases).
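As an informative illustration, a client might split a "repertoires-supported" value into its parts along the lines of the table above. This is a minimal sketch: the function name, the returned tuple, and the choice to accept the source keyword in any letter case are assumptions of this example.

```python
# Canonical spellings of the four sources listed in the table above.
SOURCES = {"iana": "IANA", "unicode": "Unicode", "unihan": "Unihan", "vendor": "Vendor"}


def parse_repertoire_value(value):
    """Split one value into (source, vendor, name); vendor is None unless
    the source is "Vendor". Raises ValueError for malformed values."""
    source_part, sep, rest = value.partition(":")
    source = SOURCES.get(source_part.strip().lower())
    if not sep or source is None:
        raise ValueError(f"unrecognized repertoire source in {value!r}")

    vendor = None
    if source == "Vendor":
        # Vendor values carry an extra "vendor:" field before the name.
        vendor_part, sep, rest = rest.partition(":")
        vendor = vendor_part.strip()
        if not sep or not vendor:
            raise ValueError(f"missing vendor field in {value!r}")

    name = rest.strip()
    if not name:
        raise ValueError(f"missing repertoire name in {value!r}")
    return source, vendor, name
```

For example, parse_repertoire_value("Vendor: Oak: Floral") yields ("Vendor", "Oak", "Floral").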
In matching names, the client should consider these rules:
As a result, all of the following are equivalent:
Unicode: Latin-1 Supplement
unicode:Latin1Supplement
unicode: latin_1 supplement
Individual transport protocols may place further restrictions on the use of upper/lower case, and the use of space, hyphen, and underscore characters.
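As an informative illustration, the equivalence of the three forms above is consistent with a comparison that ignores letter case and drops space, hyphen, and underscore characters. The sketch below assumes exactly that behaviour, inferred from the examples; the function names are hypothetical.

```python
def normalize_repertoire_name(value):
    """Fold case and drop spaces, hyphens, and underscores."""
    return "".join(ch for ch in value.lower() if ch not in " -_")


def same_repertoire(a, b):
    """True if two repertoire values match after normalization."""
    return normalize_repertoire_name(a) == normalize_repertoire_name(b)
```

Under this assumption, all three example forms normalize to the same string, unicode:latin1supplement.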
The semantic element "repertoires-supported" does not correlate with particular fonts. If a character is present in an advertised repertoire, then the printer must be able to render that character regardless of the currently selected font. However, renderings in different fonts need not be distinct. A common approach is for the printer to implement a system default font with all advertised characters, and to implement a fall-through mechanism that will render a character from the default font if it is not available in the currently selected font.
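As an informative illustration, the fall-through mechanism might look like the sketch below. The coverage table, the font names, and the function name are all hypothetical; actual printers are free to implement font selection in any way that satisfies the rendering requirement.

```python
def font_for_character(ch, selected_font, default_font, coverage):
    """Pick the font used to render ch.

    coverage maps a font name to the set of characters it contains
    (a hypothetical structure for this sketch). The selected font is
    preferred; otherwise the system default font is used. None means
    neither font contains the character.
    """
    if ch in coverage.get(selected_font, set()):
        return selected_font
    if ch in coverage.get(default_font, set()):
        return default_font
    return None


# Hypothetical coverage: the default font holds every advertised character.
COVERAGE = {
    "Script": set("ABCabc"),
    "SystemDefault": set("ABCabc") | {"\u0416"},  # adds CYRILLIC CAPITAL LETTER ZHE
}
```

With this table, a request for U+0416 in the "Script" font falls through to "SystemDefault".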
In order to promote interoperability, this document designates a small number of repertoires as "basic". In this way a print client that only knows the names of the basic repertoires can get useful results.
The repertoires designated as basic are:
Latin Extended-A is used primarily by Latin-based languages in Eastern Europe. The last four support PRC, Japan, Korea, and Taiwan respectively.
A conforming printer must advertise a basic repertoire whenever it advertises similar repertoires. For example, any printer advertising any Cyrillic repertoire must also advertise "Unicode: Cyrillic". In this way a client that does not recognize a large number of repertoires can still recognize that basic Cyrillic printing is possible on this device.
Printers will often support larger repertoires. If a printer supports a repertoire that is a superset of a basic repertoire, then it must advertise the basic repertoire in addition to the superset.
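As an informative illustration, a printer implementation could check its advertised list against the rule above. The mapping HYPOTHETICAL_BASIC_FOR is an assumption of this sketch: this document does not define a table of which repertoires imply which basic repertoire, so an implementer would need to supply one. Names are compared as exact strings here for brevity.

```python
# Hypothetical mapping from an advertised repertoire to the basic
# repertoire it is similar to or a superset of.
HYPOTHETICAL_BASIC_FOR = {
    "IANA: koi8-r": "Unicode: Cyrillic",
    "IANA: iso-8859-5": "Unicode: Cyrillic",
}


def missing_basic_repertoires(advertised, basic_for=HYPOTHETICAL_BASIC_FOR):
    """Return the basic repertoires that must be added to the advertised
    list for it to satisfy the rule above."""
    advertised_set = set(advertised)
    required = {basic_for[name] for name in advertised_set if name in basic_for}
    return required - advertised_set
```

An empty result means the advertised list already satisfies the rule for every repertoire the mapping knows about.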
A printer may implement several types of extensions without losing conformance with this document. Examples include:
A conforming printer must follow these rules:
There is no requirement that every supported character is represented in some repertoire; a printer may support specific characters without advertising them. In some languages (e.g. those based on XHTML) certain characters are implicitly supported (e.g. as built-in character entities), without being advertised in any repertoire.
Printing protocols (outside of this document) specify how a print client learns about the supported repertoires in a printer. Once it knows, a client may choose to use this knowledge in any of these ways:
This document was prepared with input and assistance from:
(To be written.)
(To be written.)