Enhanced Multi-byte Charset Support in the Ingres GCC
From Ingres Community Wiki
This is an Ingres 10.0 Connectivity project to enhance multi-byte character set support in the Ingres GCC.
References
This project is a preliminary subset of the UTF8 transliteration project for Ingres/Net. More information can be found here.
The design document is available here.
Existing Implementation
Existing charset support includes single-byte and double-byte transliteration. The double-byte implementation is mostly an add-on to the single-byte processing, wrapped in DOUBLEBYTE conditional compilation (#ifdef DOUBLEBYTE). Ingres double-byte charsets have some characteristics in common with multi-byte charsets, in that they include both 1-byte and 2-byte character encodings. Transliterating a double-byte string can therefore shrink or grow its byte length, depending on the byte lengths of the corresponding characters in the target charset. The existing implementation is also organized by data type, at a level of processing where storage structure is the more significant attribute.
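The following is a minimal sketch of why character and byte lengths diverge in a double-byte charset. It assumes a hypothetical EUC-style encoding in which bytes 0x81-0xFE lead a 2-byte character and all other bytes are 1-byte characters; the names `LEAD_BYTES`, `split_chars`, and `transliterate` are illustrative, not Ingres GCC functions.

```python
from __future__ import annotations

# Hypothetical double-byte charset (illustration only): a byte in
# 0x81-0xFE starts a 2-byte character; any other byte is a 1-byte character.
LEAD_BYTES = range(0x81, 0xFF)

def char_length(data: bytes, pos: int) -> int:
    """Byte length of the character starting at data[pos]."""
    return 2 if data[pos] in LEAD_BYTES else 1

def split_chars(data: bytes) -> list[bytes]:
    """Split a double-byte string into whole characters."""
    chars, pos = [], 0
    while pos < len(data):
        n = char_length(data, pos)
        chars.append(data[pos:pos + n])
        pos += n
    return chars

def transliterate(data: bytes, table: dict[bytes, bytes]) -> bytes:
    """Map each character through a table; unmapped characters pass through."""
    return b"".join(table.get(ch, ch) for ch in split_chars(data))

# A 2-byte character mapped to a 1-byte one shrinks the string:
# 3 characters in 4 bytes become 3 characters in 3 bytes.
to_single = {b"\xa3\xc1": b"A"}
print(transliterate(b"\xa3\xc1BC", to_single))
```

The reverse table (`{b"A": b"\xa3\xc1"}`) grows the string instead, which is why any buffer sized in bytes has to be revisited after transliteration.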
Proposed Implementation
This project moves the operations currently implemented in the double-byte code into GCO processing compilations as discrete processing operations. The GCO data type definitions map character data types to common storage-structure processing compilations, and the new processing operations are applied to all charset types.
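The idea of keying processing on storage structure rather than data type can be sketched as a lookup from type to structure, and from structure to an operation list. Everything here is hypothetical: the `Storage` categories, the type names chosen, and the operation names are illustrative and are not taken from the GCO source.

```python
from __future__ import annotations
from enum import Enum

class Storage(Enum):
    # Hypothetical storage-structure categories (illustration only).
    FIXED = "fixed-length, blank padded"
    COUNTED = "length indicator followed by data"
    SEGMENTED = "stream of length-prefixed segments"

# Hypothetical mapping of character data types to storage structures.
TYPE_TO_STORAGE = {
    "char": Storage.FIXED,
    "c": Storage.FIXED,
    "varchar": Storage.COUNTED,
    "text": Storage.COUNTED,
    "long varchar": Storage.SEGMENTED,
}

# Hypothetical per-structure operation lists ("processing compilations").
# None of them depend on the charset: the charset only changes which
# transliteration function and table the "transliterate" step uses.
STORAGE_OPS = {
    Storage.FIXED: ["transliterate", "adjust_padding"],
    Storage.COUNTED: ["save_length", "transliterate", "update_length"],
    Storage.SEGMENTED: ["transliterate", "carry_partial_char", "update_segment_length"],
}

def compile_ops(data_type: str) -> list[str]:
    """Return the operation list for a data type via its storage structure."""
    return STORAGE_OPS[TYPE_TO_STORAGE[data_type]]

print(compile_ops("varchar"))
```

The design point is that types sharing a storage structure (for example char and c) share one compiled operation list, so a new charset type needs no per-type code.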
Operations being added are as follows:
- Generalize basic character transliteration so that it is defined by a processing function and a transliteration table or map.
- Add character processing functions for single-byte and identity mappings.
- Track character and byte lengths of strings separately.
- Generalize character lengths beyond 1 and 2 bytes.
- Adjust padding length to compensate for changes in string length.
- Save and update explicit string length indicators.
- Save and update meta-data lengths.
- Detect partial characters split across string segments and carry them over to the next segment.
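Two of the operations above, partial-character carry-over and padding adjustment, can be sketched as follows. This assumes the same hypothetical double-byte encoding (0x81-0xFE as lead bytes); `SegmentTransliterator` and `adjust_padding` are illustrative names, not GCC functions, and real segment handling would also have to update segment length indicators.

```python
from __future__ import annotations

# Hypothetical double-byte charset (illustration only): bytes 0x81-0xFE
# lead a 2-byte character; all other bytes are 1-byte characters.
LEAD_BYTES = range(0x81, 0xFF)

def char_len(data: bytes, pos: int) -> int:
    return 2 if data[pos] in LEAD_BYTES else 1

class SegmentTransliterator:
    """Transliterate a segmented string, holding back a partial trailing
    character until the next segment completes it."""

    def __init__(self, table: dict[bytes, bytes]):
        self.table = table
        self.carry = b""  # incomplete character left over from the last segment

    def feed(self, segment: bytes) -> bytes:
        data = self.carry + segment
        out, pos = [], 0
        while pos < len(data):
            n = char_len(data, pos)
            if pos + n > len(data):  # character continues in the next segment
                break
            ch = data[pos:pos + n]
            out.append(self.table.get(ch, ch))
            pos += n
        self.carry = data[pos:]
        return b"".join(out)

def adjust_padding(data: bytes, field_len: int) -> bytes:
    """Fit a transliterated fixed-length value back into field_len bytes:
    blank-pad if it shrank, truncate on a character boundary if it grew."""
    pos = 0
    while pos < len(data):
        n = char_len(data, pos)
        if pos + n > field_len:
            break
        pos += n
    return data[:pos] + b" " * (field_len - pos)

t = SegmentTransliterator({b"\xa3\xc1": b"A"})
print(t.feed(b"x\xa3"), t.feed(b"\xc1y"))  # lead byte held back, then completed
```

Truncating on a character boundary (rather than at the raw byte limit) avoids emitting a dangling lead byte when a string grows past its fixed length.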
In addition to single-byte and double-byte transliteration, a third mapping type, identity, is being added for situations where the local and network charsets are the same. Identity mappings allow character data to be handled in bulk with copy functions rather than character by character.
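A minimal sketch of selecting a mapping by charset pair: when both ends use the same charset the data is copied in one operation, otherwise a single-byte table is applied. The function names and the charset name strings are hypothetical; only the identity-versus-table distinction comes from the text above. The single-byte case uses Python's `bytes.translate` with a 256-entry table.

```python
from __future__ import annotations
from typing import Callable

def identity_map(data: bytes) -> bytes:
    # Same charset on both ends: copy the whole buffer in one operation.
    return bytes(data)

def make_single_byte_map(table: bytes) -> Callable[[bytes], bytes]:
    # 256-entry table: one byte in, one byte out, so lengths never change.
    if len(table) != 256:
        raise ValueError("single-byte table must have 256 entries")
    return lambda data: data.translate(table)

def select_mapping(local_cs: str, net_cs: str,
                   table: bytes | None) -> Callable[[bytes], bytes]:
    """Choose identity when the charsets match, else a table-driven map."""
    if local_cs == net_cs:
        return identity_map
    if table is None:
        raise ValueError("a transliteration table is required")
    return make_single_byte_map(table)

# Hypothetical charset names and a toy table that upper-cases a, b, c.
f = select_mapping("LOCAL_CS", "NET_CS", bytes.maketrans(b"abc", b"ABC"))
print(f(b"abcd"))
```

Because the identity path never inspects individual characters, it also needs no partial-character or padding handling.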
Notes
Ingres Enhancement Number
SIR 123289
DDS Review Summary
Test Considerations
Extensive GCC testing is needed using combinations of single-byte and double-byte character sets, including at least the following:
- The same character set on client and server across heterogeneous platforms (or on homogeneous platforms with ingsetenv II_FORCE_HET true).
- Different single-byte character sets on client and server.
- Different double-byte character sets on client and server.
All character data types need to be tested. The byte, varbyte, and long varbyte types should also be tested to ensure the string enhancements have no side effects.
OS Dependencies
None

