Actian Community Blogs

Transliteration Feature Opens New Door to Vectorwise Big Data
Posted 2012-10-19 at 03:57 PM by teresa
Updated 2012-11-08 at 10:48 AM by teresa

Until recently, it was nearly impossible to bridge the UTF-8 (Unicode) world, which Vectorwise requires, with non-UTF-8 data in the Ingres family of products (Ingres DBMS, OpenROAD, etc.). Either both the client and the server installation had to use the UTF-8 character set, or neither could. This is no longer the case with the new transliteration support added in Vectorwise 2.5, which greatly improves interoperability.

Available Data Transformations

Vectorwise/Ingres Networking is responsible for network communications between Vectorwise/Ingres clients and servers on remote machines. For platform-dependent data formats, Vectorwise/Ingres Networking provides these data transformations:
  • Integer and float transformations based on hardware storage formats for these values
  • Character set transformations for transliteration between the Vectorwise/Ingres character sets configured for the client and server installations
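
The two kinds of transformation listed above can be illustrated with a small Python sketch. It is only a conceptual illustration and not how Vectorwise/Ingres Networking is implemented: the helper names `convert_int32` and `transliterate` are hypothetical, and Python codec names stand in for installation character sets.

```python
import struct

# The same 32-bit integer as stored by two different hardware byte orders.
value = 1025
big_endian = struct.pack(">i", value)     # bytes 00 00 04 01
little_endian = struct.pack("<i", value)  # bytes 01 04 00 00


def convert_int32(raw: bytes, sender_order: str, receiver_order: str) -> bytes:
    """Re-pack a 4-byte integer from the sender's byte order to the receiver's.

    Hypothetical helper: '>' means big-endian, '<' means little-endian.
    """
    (number,) = struct.unpack(sender_order + "i", raw)
    return struct.pack(receiver_order + "i", number)


def transliterate(raw: bytes, sender_charset: str, receiver_charset: str) -> bytes:
    """Decode bytes with the sender's character set, re-encode with the receiver's.

    Hypothetical helper: the arguments are Python codec names, not Ingres names.
    """
    return raw.decode(sender_charset).encode(receiver_charset)


# An integer sent by a big-endian host, as a little-endian host would store it.
print(convert_int32(big_endian, ">", "<") == little_endian)

# A single-byte (ISO 8859-1) string as a UTF-8 installation would receive it.
print(transliterate("café".encode("iso-8859-1"), "iso-8859-1", "utf-8"))
```

In both cases the logical value is unchanged; only its byte representation differs between the two ends of the connection.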

Expanded Unicode Transliteration Support

Prior to Unicode support, Ingres Networking provided transliteration for single- and double-byte character sets supported by Ingres. When the Unicode UTF8 character set support was introduced, the multi-byte transliteration capabilities required for UTF-8 were not available. Vectorwise/Ingres Networking supported connections between a client and server that were configured with UTF8 as the installation character set, but a UTF8 installation could not connect to an installation which wasn’t configured as UTF8.

Unicode transliteration support is expanded in Vectorwise 2.5. Vectorwise Networking uses the expanded support to enable UTF8 installations to connect with installations configured with a single- or double-byte character set, transliterating character data between the client and server installations. Clients and servers operate independently of the remote character set and need to deal only with the character set configured in the local installation.

Note that applications must still manage character data as represented by the Ingres installation configuration. An application running in a UTF8 installation may need to handle character data differently than when running in a single- or double-byte configured installation.
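
One reason an application may need to treat character data differently is that the same text occupies a different number of bytes in each character set. The following Python sketch illustrates this under the assumption that a length limit is counted in bytes; the helper `fits_in_bytes` and the sample name are hypothetical, and the Python codecs only stand in for installation character sets.

```python
text = "Zoë Müller"  # 10 characters, two of them non-ASCII

latin1_bytes = text.encode("iso-8859-1")  # single-byte character set
utf8_bytes = text.encode("utf-8")         # multi-byte character set

# One byte per character in ISO 8859-1, but two bytes each for "ë" and "ü" in UTF-8.
print(len(text), len(latin1_bytes), len(utf8_bytes))


def fits_in_bytes(value: str, charset: str, max_bytes: int) -> bool:
    """Hypothetical check: does the encoded value fit a byte-length limit?"""
    return len(value.encode(charset)) <= max_bytes


# The same string fits a 10-byte limit in one character set but not in the other.
print(fits_in_bytes(text, "iso-8859-1", 10))
print(fits_in_bytes(text, "utf-8", 10))
```

Buffer sizes, truncation logic, and string-length checks written for a single-byte installation are therefore worth reviewing before the same application runs in a UTF8 installation.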

The new transliteration support is backward compatible, allowing older clients to fully connect to new servers, and new clients to connect to older servers. A limitation exists when a new UTF8 client connects to an older non-UTF8 server: the server accepts only a limited number of potential character sets to be used for the connection, and the client has no way to determine what subset of the supported character sets will work with the target server. In this case, the character set configured in the server installation must be defined as an attribute of the client connection. This can be done by defining a Server Connection Definition (vnode) connection attribute with the name "character_set" and the server’s character set name as the attribute value. The attribute name may also be abbreviated as "charset". The attribute can also be defined as part of an explicit node in the connection string:

@host,port;charset=csname[user,password]::dbname
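
As a rough illustration, the target string above could be assembled in application code as follows. This is a minimal sketch: the function name and every sample value (host, port, character set name, credentials, database) are hypothetical placeholders, and only the layout of the string is taken from the format shown above.

```python
def build_connection_target(host: str, port: str, charset: str,
                            user: str, password: str, dbname: str) -> str:
    """Assemble a target of the form @host,port;charset=csname[user,password]::dbname.

    Hypothetical helper; it performs no validation or escaping of its arguments.
    """
    return f"@{host},{port};charset={charset}[{user},{password}]::{dbname}"


# All values below are placeholders chosen for the example.
target = build_connection_target(
    host="dbhost",
    port="VW7",
    charset="ISO88591",   # the character set configured in the older server installation
    user="appuser",
    password="secret",
    dbname="salesdb",
)
print(target)
```

The value passed as `charset` should be the character set name configured in the server installation, as described above.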


OpenROAD Considerations

OpenROAD Runtime in a non-UTF8 Ingres instance can use this new feature to access and update Vectorwise. OpenROAD Development, however, must access a Vectorwise, Ingres, or Enterprise Access instance whose character set matches that of the OpenROAD development instance. This means that a non-UTF8 OpenROAD development environment cannot use Vectorwise as its application repository, but a UTF8 OpenROAD development environment can.

.NET Data Provider and JDBC Driver Considerations

If the .NET Data Provider or the JDBC Driver is used to access Vectorwise, we recommend that the Data Access Server (DAS) be located in the same instance as Vectorwise. We do not recommend accessing the Vectorwise instance through a Data Access Server located in a remote Ingres instance via Ingres Net.

Transliteration in VDBA and Actian Director

Transliteration does not allow VDBA to display arbitrary (non-ASCII) Unicode characters from a UTF8 server. (They will be displayed as "?" if they cannot be displayed correctly.) Correct display and processing of Unicode data is available in the Director product.
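
The "?" substitution is typical of any lossy conversion into a character set that lacks the characters involved. The Python sketch below shows the analogous behavior with a standard codec. It is not VDBA's code, and the choice of ISO 8859-1 as the client-side character set is an assumption made only for the example.

```python
# Text as it might be stored in a UTF8 server installation.
utf8_from_server = "naïve 日本語".encode("utf-8")

# Decode it, then convert to a single-byte character set (assumed here: ISO 8859-1),
# substituting "?" for every character that has no representation in that set.
text = utf8_from_server.decode("utf-8")
shown = text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")

print(shown)  # "ï" survives, the three Japanese characters become "???"
```

The substitution cannot be reversed: once a character has been replaced by "?", the original value cannot be recovered from the displayed text.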

Note: This functionality is currently not available with Ingres 10S and earlier but will be provided in a future release.

© 2011 Actian Corporation. All Rights Reserved