RETS Change Proposal 48: Compact Format Simplification
|
| Author: David Terrell |
| Organization: Center for REALTOR® Technology |
| Telephone: |
| Address: 430 N. Michigan Ave., Floor 6, Chicago IL, 60611-4087 |
| Email: dterrell@crt.realtors.org |
| Status: Proposal |
| Date: 11/03/2003 |
| Proposal Version: 1.7 |
| 1. Synopsis |
| This proposal replaces the current XML-based compact format with a new, non-XML compact format in order to address issues with the current version. |
| 2. Rationale |
| The compact format for search results is intended to provide a bandwidth-efficient way to transport large datasets. It is essentially delimited data within an XML tag. Problems of how to include the delimiter within the data have already been addressed (RPC-035). In the end, it seems as if XML is just getting in the way, and this change proposal is for a non-XML based compact format.
The issues can best be seen from a client perspective. Because the compact format is specified as well-formed XML, a client is most likely using an off-the-shelf XML parser. After receiving the data from the XML parser, the client must decode it as per RPC-035, if it is encoded. Finally, the client must parse the data again to split it at the delimiters. A client may therefore need up to three passes over the data to parse it correctly. The server is also affected, as it must encode the data in up to two steps: first substituting XML entities, and then optionally encoding as per RPC-035. This change proposal keeps the bandwidth used to transfer data small, while allowing a client to parse the data in a single pass. |
| 3. Proposal |
| The new compact format is line based and uses no XML tags. It is defined by the following augmented BNF (as described in Section 2 of RFC 2616):
compact-data = *data-row
data-row = interpretation 1*(HT data) CRLF
interpretation = *1(token)
data = text
text = *(TEXT)
To include control characters inside data, the following backslash escapes are recognized:
| Escape Sequence | ASCII Equivalent |
| \a | 7 (BEL) |
| \b | 8 (BS) |
| \t | 9 (HT) |
| \n | 10 (LF) |
| \v | 11 (VT) |
| \f | 12 (FF) |
| \r | 13 (CR) |
| \\ | 92 ("\", backslash; octal 134) |
| \(octal) | Character whose ASCII encoding is "octal" |
| \u(hex) | Character whose Unicode encoding is "hex" |
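As an illustration of the escape table above, a client might decode one data element in a single left-to-right scan, roughly as sketched below. The function name `unescape` is hypothetical. The proposal does not say how many digits the `\(octal)` and `\u(hex)` forms take, so this sketch assumes one to three octal digits and exactly four hex digits; it also assumes that an unrecognized escape is an error.

```python
# Sketch only: decodes the backslash escapes listed in the table above.
SIMPLE_ESCAPES = {
    "a": "\a", "b": "\b", "t": "\t", "n": "\n",
    "v": "\v", "f": "\f", "r": "\r", "\\": "\\",
}
OCTAL_DIGITS = "01234567"
HEX_DIGITS = "0123456789abcdefABCDEF"


def unescape(field: str) -> str:
    """Decode one data element in a single pass over its characters."""
    out = []
    i = 0
    n = len(field)
    while i < n:
        ch = field[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        i += 1  # move past the backslash
        if i >= n:
            raise ValueError("dangling backslash at end of field")
        code = field[i]
        if code in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[code])
            i += 1
        elif code in OCTAL_DIGITS:
            # ASSUMPTION: one to three octal digits, as in C string literals.
            j = i
            while j < n and j - i < 3 and field[j] in OCTAL_DIGITS:
                j += 1
            out.append(chr(int(field[i:j], 8)))
            i = j
        elif code == "u":
            # ASSUMPTION: exactly four hexadecimal digits follow the "u".
            digits = field[i + 1:i + 5]
            if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
                raise ValueError("expected four hex digits after \\u")
            out.append(chr(int(digits, 16)))
            i += 5
        else:
            # ASSUMPTION: escapes not listed in the table are rejected.
            raise ValueError("unknown escape: \\" + code)
    return "".join(out)
```

For example, `unescape("John\\tMcCormick")` yields the name with a real HT character between the two words.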
The interpretation rule provides context information to a data row. The currently known interpretations are as follows:
| Token Value | Token Meaning |
| "RETS" | The first data element is the reply code, and the second element is the reply text. |
| "Columns" | This row's data specifies column names. |
| "Count" | The first data element is the total number of data rows returned. |
| "Error" | Contains four data elements: field name, error-num, error-offset, and error-text as defined in Section 10.5. |
| "" (no token) | This row's data specifies actual uninterpreted data. |
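A client could dispatch on the interpretation token along the following lines. This is a minimal sketch, not part of the proposal: the names `split_row` and `parse_compact` and the shape of the returned dictionary are hypothetical, unknown tokens are assumed to be collected rather than rejected, and backslash escapes inside data elements are deliberately left undecoded.

```python
from typing import List, Tuple


def split_row(line: str) -> Tuple[str, List[str]]:
    """Split one row (without its CRLF) into (interpretation, data elements)."""
    interpretation, sep, rest = line.partition("\t")
    if not sep:
        # The grammar requires at least one HT-prefixed data element.
        raise ValueError("row has no data element")
    return interpretation, rest.split("\t")


def parse_compact(payload: str) -> dict:
    """Collect the rows of a compact-format payload by interpretation."""
    result = {
        "reply": None, "columns": [], "count": None,
        "errors": [], "rows": [], "unknown": [],
    }
    for line in payload.split("\r\n"):
        if not line:
            continue  # empty remainder after the final CRLF
        kind, data = split_row(line)
        if kind == "RETS":
            result["reply"] = (int(data[0]), data[1])
        elif kind == "Columns":
            result["columns"] = data
        elif kind == "Count":
            result["count"] = int(data[0])
        elif kind == "Error":
            # field name, error-num, error-offset, error-text
            result["errors"].append(tuple(data[:4]))
        elif kind == "":
            result["rows"].append(data)
        else:
            # ASSUMPTION: tokens not in the table are kept, not rejected.
            result["unknown"].append((kind, data))
    return result
```

Because HT and CR LF can only appear inside a data element in escaped form, splitting on the raw characters is safe before any escape decoding takes place.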
Here is an example of the new compact format:
"RETS" HT "0" HT "Success" CR LF
"Columns" HT "STATUS" HT "OWNER" CR LF
"Count" HT "3" CR LF
HT "A" HT "Joe Schmoe" CR LF
HT "X" HT "Mary Jane" CR LF
HT "Z" HT "John\tMcCormick" CR LF
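On the server side, a response like the example above might be produced as sketched here. The sketch assumes that the quoted strings in the example are BNF-style literals, so no quote characters appear on the wire. The helper names `escape`, `format_row`, and `format_results` are hypothetical, and only the named escapes from the table are emitted; other control characters would need the octal or Unicode forms.

```python
# Sketch only: maps each character with a named escape to its escaped form.
ESCAPES = {
    "\\": "\\\\", "\a": "\\a", "\b": "\\b", "\t": "\\t",
    "\n": "\\n", "\v": "\\v", "\f": "\\f", "\r": "\\r",
}


def escape(value: str) -> str:
    """Escape the backslash and the control characters named in the table."""
    return "".join(ESCAPES.get(ch, ch) for ch in value)


def format_row(interpretation: str, data) -> str:
    """Build one data-row: interpretation 1*(HT data) CRLF."""
    fields = list(data)
    if not fields:
        raise ValueError("a row needs at least one data element")
    return interpretation + "".join("\t" + escape(f) for f in fields) + "\r\n"


def format_results(columns, rows) -> str:
    """Build a successful reply with column names, a count, and data rows."""
    rows = list(rows)
    lines = [
        format_row("RETS", ["0", "Success"]),
        format_row("Columns", columns),
        format_row("Count", [str(len(rows))]),
    ]
    lines.extend(format_row("", row) for row in rows)
    return "".join(lines)
```

Calling `format_results(["STATUS", "OWNER"], [["A", "Joe Schmoe"], ["X", "Mary Jane"], ["Z", "John\tMcCormick"]])` reproduces the six rows of the example, with the embedded tab in the last owner name written as the two characters `\t`.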
|
| 4. Development Impact |
| This would require new development for all clients and servers. However, this implementation is much simpler than the existing compact format. |
| 5. Compatibility |
| This format is not compatible with the existing compact format. |
| 6. Proof/Need of Concept Examples |
| None. |