RETS Change Proposal 36: Field Data Encoding
Author: Leo Bijnagte
Organization: Fidelity National Information Systems
Telephone: (612)661-1087
Address: 100 Washington Sq. #900, Minneapolis, MN 55410
Email: lbijnagte@fnis.com
Status: Proposal
Date: 06/26/2003
Proposal Version: 1.0
1. Synopsis
This proposal supplements the current requirement that all transaction responses must be well-formed XML, in order to address the problem of ambiguous characters in the <DATA> and <COLUMNS> tags of COMPACT and COMPACT-DECODED responses to the Search transaction.
2. Rationale
Per section 7.6, the DELIMITER value attribute must be HEX HEX (see section 7.2.1 for the definition of field-delimiter), representing the character (OCTET) used to separate each field in the <COLUMNS> tag (<COLUMNS> SystemNames ... </COLUMNS>) and each field-data in the <DATA> tag (<DATA> Content ... </DATA>). Servers should select a delimiter that does not occur in the field (SystemNames) or field-data (Content), but may not be able to do so if all OCTETS are possible. For those servers, if the delimiter OCTET occurs in the Content (or in a SystemName, if the delimiter is an ALPHANUM), it needs to be escaped so that it is unambiguously not the delimiter and unambiguously not part of the literal content. Doubling the delimiter does not work, because a column may have no Content and two adjacent delimiters already mean an empty column.
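The ambiguity described above can be illustrated with a short Python sketch. It is only an illustration: the function name and the sample values are hypothetical and do not come from the RETS specification.

```python
def naive_split(row: str, delimiter_hex: str) -> list:
    """Split a row at the delimiter given as two hex digits (e.g. "09" for TAB)."""
    delimiter = chr(int(delimiter_hex, 16))
    return row.split(delimiter)


# Two columns of content; the second value happens to contain a TAB.
values = ["1001", "Pool\tSpa"]
row = "\t".join(values)

# The receiver sees three fields where the sender wrote two.
parsed = naive_split(row, "09")
print(len(values), len(parsed))
```

Because the extra field is indistinguishable from a real column boundary, the client cannot recover the original two values without some escaping convention.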

XML escaping the field (SystemNames) or field-data (Content) cannot produce unambiguous content, because a sequence such as &#9; could itself be a literal string in the data, even though that is unlikely. A commonly used method such as URI escaping/unescaping has a number of tools available and, when applied only to the delimiter, imposes minimal overhead.

3. Proposal
3.1. Specification Changes

Section 7.6 needs to be modified to add support for an optional decode attribute:

delimiter-tag ::= <DELIMITER value="field-delimiter" [ decode="decode-type" ] />CRLF

with the definition

decode-type ::= "uri"

A server that specifies a decode-type of uri MUST encode the delimiter value and "%" in field-data or field using the escaped encoding (section 2.4.1) of RFC 2396. A server MAY use the escaped encoding for any OCTET in field or field-data if it specifies a decode-type of uri. Servers MUST NOT specify a DELIMITER value of 25 (the "%" character) when specifying a decode-type of uri, and MUST specify the decode-type of uri if uri escaped encoding was used on the field and field-data. Servers that do not specify a decode-type MUST NOT include the delimiter value in the field or field-data.
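As a rough server-side illustration of the minimal encoding required above, the following Python sketch escapes only "%" and the delimiter. The function names are hypothetical, the data is assumed to be ASCII, and row framing and XML escaping of the element body are left out. The sketch also rejects hexadecimal-digit delimiters; that restriction is an assumption of this sketch and not part of the proposal, made because the escape sequences themselves would then contain the delimiter.

```python
import string
from typing import List


def uri_escape_field(value: str, delimiter: str) -> str:
    """Escape "%" and the delimiter as %XX; leave every other character alone."""
    out = []
    for ch in value:
        if ch == "%" or ch == delimiter:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def encode_fields(fields: List[str], delimiter_hex: str) -> List[str]:
    """Encode each field for a response that declares decode="uri"."""
    if delimiter_hex == "25":
        # The proposal forbids "%" (hex 25) as the delimiter with decode="uri".
        raise ValueError("DELIMITER value 25 is not allowed with decode=uri")
    delimiter = chr(int(delimiter_hex, 16))
    if delimiter in string.hexdigits:
        # Assumption of this sketch only: "%25" or "%3x" would contain the delimiter.
        raise ValueError("hex-digit delimiters are not handled by this sketch")
    return [uri_escape_field(f, delimiter) for f in fields]


encoded = encode_fields(["50% off", "", "Pool\tSpa"], "09")
print("\t".join(encoded))
```

Escaping in a single pass over the characters avoids re-escaping the "%" that introduces each escape sequence, and empty fields pass through unchanged.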

3.2. Implementation Notes

Servers should minimize the impact on bandwidth by minimizing the character set that is uri escaped. Clients must be careful to interpret data in the correct sequence:
1. XML unescape the element body
2. Break fields at the delimiter
3. uri unescape the field or field-data
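
The three client steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name is hypothetical, the element is handed in as a standalone XML string, ElementTree is used for the XML unescaping step, and urllib.parse.unquote stands in for the uri unescaping (it decodes escapes as UTF-8 by default, which is equivalent to OCTET decoding only for ASCII data).

```python
import xml.etree.ElementTree as ET
from typing import List, Optional
from urllib.parse import unquote


def decode_data_element(xml_text: str, delimiter_hex: str,
                        decode: Optional[str] = None) -> List[str]:
    """Return the fields of one <DATA> or <COLUMNS> element."""
    delimiter = chr(int(delimiter_hex, 16))
    # Step 1: XML unescape the element body (done by the XML parser).
    body = ET.fromstring(xml_text).text or ""
    # Step 2: break fields at the delimiter.
    fields = body.split(delimiter)
    # Step 3: uri unescape each field, only if the server declared decode="uri".
    if decode == "uri":
        fields = [unquote(f) for f in fields]
    return fields


row = "<DATA>a&amp;b\tPool%09Spa\t100%25</DATA>"
print(decode_data_element(row, "09", "uri"))
```

The order matters: if the uri unescape ran before the split, the escaped %09 in the second field would turn back into a TAB and be mistaken for a column boundary.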
3.3. New ServerInformation transaction

This change proposal introduces an extensible mechanism for obtaining server settings. The settings may be system-wide, or may be associated with a specific resource and class.

4. Development Impact
Clients using COMPACT and COMPACT-DECODED formats will need to recognize and process the new decode attribute of DELIMITER to be compliant. Servers must either choose a delimiter value that does not occur in the data or specify the decode attribute, so that the field and field-data are unambiguous.

The addition of the ServerInformation transaction represents a mandatory enhancement for 1.6-compatible servers.

5. Compatibility
Because this change involves an addition that allows servers to specify a change to the interpretation of the reply, an earlier version of a RETS client may break when a server utilizes the new attribute.
6. Proof/Need of Concept Examples
None.