Remote Synthesis

Category: Globalization

Mar 29, 2006

Open-Source I18N Tools

If working on globalizing Hasbro's web sites has taught me one thing, it is that the internationalization process can be very difficult and painful (so much so that I was hesitant even to write about it :| ). You also quickly learn that, while ColdFusion's support for internationalization (i18n) took dramatic leaps forward in both 6.1 and 7, there are a number of functions you may need that ColdFusion can't handle out of the box. Thankfully, Paul Hastings (through his company Sustainable GIS) offers a wealth of free, open-source i18n tools (at the moment, these include all but one of the open-source i18n projects on the ColdFusion Open-Source List). While I am certainly no expert in i18n, I wanted to share a beginner's experience using the Geolocator CFC and Resource Bundles CFC in a recent project.

(This is excerpted from a report I developed; portions have been removed where they may be irrelevant or confidential. Note that "we" and "our" refer to my employer.)

As discussed above, we face a number of challenges going forward with international web site development. Some of these, such as locales and character sets, are directly related to the addition of Asian and Eastern European languages. Others are related indirectly, inasmuch as the number of supported languages has a great impact on development cost and time as well as on long-term maintenance.


  • Collation – Different languages sort text differently, and in some places on our sites this may be an important consideration. In some cases we sort by non-string data (for instance, news is sorted by date). However, products are often sorted alphanumerically by display name (after taking priority into account). As mentioned earlier, this can be handled in SQL using COLLATE. Nonetheless, any places where ColdFusion's own sorting functionality is used will have to be accommodated (possibly by including IBM's ICU4J). I am not aware of any such cases at this point, however.

  • Searching – To my knowledge, we are not currently using ColdFusion's built-in Verity search, but it has been discussed as a way to overcome the limitations and inefficiencies of LIKE queries. Searching may therefore need to be accommodated in the long term, but it is not a pressing issue in the near term.

  • Calendars – Given the functionality on existing sites, accommodating non-Gregorian calendar systems may not be an urgent consideration; I am not aware of any items that are date-sensitive from a user's perspective. Still, should additional functionality be requested that requires calendar handling, it is important to note that locales such as Japan and China also use non-Gregorian calendar systems alongside the Gregorian calendar. This may also require leveraging the calendar capabilities built into ICU4J.

  • Time Zones – As with calendars, I do not believe we currently have any time-sensitive information on our sites. Should this be required in the longer term, the recommended solution is to store times in Greenwich Mean Time (UTC) and convert them to the user's time zone for display.

  • BIDI Locales – BIDI (bi-directional) refers to locales, such as Hebrew or Arabic, whose pages must be laid out right-to-left. As of this writing, I am not aware of any intention to support BIDI locales. Should this become a long-term goal, though, it will affect the entire design and layout process.

  • Addresses – I do not believe this is yet a consideration, given that contact information is collected via an outside vendor (and I do not believe we intend to support this for international sites at the moment). Should this change, address collection forms and databases will have to be built to accommodate non-U.S. address formats.

  • Measurement – While product information occasionally refers to an object's measurements, differences in measurement systems can currently be handled during translation because (I believe) measurement data is not stored separately.
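
The collation bullet notes that any sorting done inside ColdFusion, rather than in SQL with COLLATE, would need locale-aware comparison. As a minimal sketch, the JDK's own java.text.Collator does this (ICU4J, which the report suggests, ships a richer Collator of its own; this sketch does not use it). Because ColdFusion MX runs on Java, such a class can in principle be reached from CFML through createObject("java", ...). The class and method names and the sample Swedish words are illustrative assumptions.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

class CollationSketch {
    // Hypothetical helper: returns a copy of the list sorted with the
    // collation rules of the given locale rather than by raw character code.
    static List<String> sortForLocale(List<String> names, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(names);
        sorted.sort(collator);
        return sorted;
    }

    // Naive sort by UTF-16 code unit, which is what String.compareTo does.
    static List<String> sortByCodeUnit(List<String> names) {
        List<String> sorted = new ArrayList<>(names);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // The middle word is the Swedish "odla" with an o-umlaut, written as an escape
        // so the source file's encoding does not matter.
        List<String> words = Arrays.asList("zebra", "\u00f6dla", "orm");
        System.out.println(sortByCodeUnit(words));
        System.out.println(sortForLocale(words, Locale.ENGLISH));
        System.out.println(sortForLocale(words, Locale.forLanguageTag("sv-SE")));
    }
}
```

With the JDK's Swedish rules, ödla should sort after zebra, since ö is a separate letter at the end of the Swedish alphabet, while an English collator treats ö as a variant of o and should place ödla before zebra. A plain code-unit sort also puts ödla last, but only by accident of its character code, which is why it cannot stand in for real collation.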

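Regarding the calendar bullet, here is a short illustration of what a non-Gregorian display can involve: Java 8's java.time.chrono.JapaneseDate converts a Gregorian date to Japan's imperial era calendar. This sketch uses the JDK rather than ICU4J (which the report mentions and which covers more calendar systems), and java.time requires Java 8 or later, so it is newer than the JVMs of the ColdFusion 7 era. The helper name eraYear is hypothetical.

```java
import java.time.LocalDate;
import java.time.chrono.JapaneseDate;
import java.time.temporal.ChronoField;

class CalendarSketch {
    // Hypothetical helper: renders a Gregorian date as "<Era> <year of era>"
    // in the Japanese imperial calendar.
    static String eraYear(LocalDate gregorian) {
        JapaneseDate japanese = JapaneseDate.from(gregorian);
        // JapaneseEra's string form is its English name, e.g. "Heisei".
        return japanese.getEra() + " " + japanese.get(ChronoField.YEAR_OF_ERA);
    }

    public static void main(String[] args) {
        // The date of this post.
        System.out.println(eraYear(LocalDate.of(2006, 3, 29)));
        // The last day of the Showa era; Heisei year 1 began the next day.
        System.out.println(eraYear(LocalDate.of(1989, 1, 7)));
    }
}
```

For 29 March 2006 this should yield Heisei 18, and for 7 January 1989 Showa 64, showing that the same Gregorian year can fall in two different eras.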

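For the time-zone bullet, here is a minimal sketch of "store in GMT, convert on display" using Java 8's java.time (ColdFusion runs on Java, though older JVMs would need java.util.TimeZone and Calendar instead). UTC serves as the storage zone, which for this purpose is equivalent to GMT. The class and method names and the ISO-8601 string storage format are assumptions for illustration.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;

class TimeZoneSketch {
    // Persist: only the UTC instant is stored, serialized as ISO-8601
    // (for example "2006-03-29T15:00:00Z"); no viewer-specific offset is kept.
    static String toStorage(Instant moment) {
        return moment.toString();
    }

    // Render: convert the stored UTC instant to the viewer's zone at display time.
    // ZoneId carries each zone's daylight saving history, so past dates convert correctly.
    static ZonedDateTime forViewer(String storedUtc, ZoneId viewerZone) {
        return Instant.parse(storedUtc).atZone(viewerZone);
    }

    public static void main(String[] args) {
        String stored = toStorage(Instant.parse("2006-03-29T15:00:00Z"));
        System.out.println(forViewer(stored, ZoneId.of("America/New_York")));
        System.out.println(forViewer(stored, ZoneId.of("Asia/Tokyo")));
    }
}
```

The same stored instant, 15:00 UTC on 29 March 2006, renders as 10:00 that morning in New York (still on standard time, since U.S. daylight saving began on 2 April that year) and as midnight on 30 March in Tokyo. Even the calendar date differs by viewer, which is why the conversion belongs at display time rather than in the stored value.
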
A resource bundle is a collection of key/value pairs used to translate text throughout the site. It works in much the same way as our current Legal component, except that it separates the key/value pairs into locale-specific bundles and does not require knowledge of ColdFusion to maintain (in many respects, the current translation spreadsheets resemble resource bundles, though they are not directly usable within ColdFusion). In addition, using resource bundles opens up the possibility of using tools such as IBM's RB Manager to build and maintain them. There are also free, open-source tools available to assist in implementing resource bundles in ColdFusion.
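
As a minimal sketch of the key/value idea, the example below uses Java's standard PropertyResourceBundle (ColdFusion MX runs on Java, but this is not the API of the open-source CFCs mentioned above). The bundle contents, the ResourceBundleSketch and bundleFor names, and the hand-wired fallback are assumptions for illustration; ResourceBundle.getBundle would normally locate per-locale files and build the fallback chain itself. What the sketch shows is that a locale's bundle holds only the text it translates, and any missing key falls back to the base bundle.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

class ResourceBundleSketch {
    // Hypothetical bundle contents. In practice each would be its own file
    // (for example site.properties, site_fr.properties) edited by translators.
    static final String BASE = "greeting=Welcome\nbuy=Buy now\n";
    static final String FRENCH = "greeting=Bienvenue\n";
    static final String JAPANESE = "greeting=\u3088\u3046\u3053\u305d\n";

    // PropertyResourceBundle parses key=value text; subclassing only exposes the
    // protected setParent, so a key missing here is looked up in the parent bundle.
    static class ChainedBundle extends PropertyResourceBundle {
        ChainedBundle(String text, ResourceBundle parent) throws IOException {
            super(new StringReader(text));
            if (parent != null) {
                setParent(parent);
            }
        }
    }

    // Hypothetical lookup: pick the locale-specific bundle, chained to the base one.
    static ResourceBundle bundleFor(Locale locale) {
        try {
            ResourceBundle base = new ChainedBundle(BASE, null);
            switch (locale.getLanguage()) {
                case "fr": return new ChainedBundle(FRENCH, base);
                case "ja": return new ChainedBundle(JAPANESE, base);
                default:   return base;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ResourceBundle fr = bundleFor(Locale.FRANCE);
        System.out.println(fr.getString("greeting")); // key present in the French bundle
        System.out.println(fr.getString("buy"));      // no French entry: falls back to base
    }
}
```

Looking up greeting in the French bundle should return Bienvenue, while buy, which has no French entry yet, falls back to Buy now; a partially translated locale therefore still renders complete pages.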


As briefly mentioned above, character encoding is an issue when adding locales. However, the available documentation, from Macromedia and elsewhere, recommends using Unicode and, in particular, ColdFusion's default UTF-8 encoding, which is variable-length (ASCII characters take a single byte) and so limits bandwidth. Macromedia also recommends saving pages with a byte-order mark (BOM) that indicates the page is encoded in UTF-8 (Dreamweaver supports this via the Document Encoding property in Page Properties, as does Eclipse, though it does not appear to be the default in either). If this is not done, a cfprocessingdirective tag (for example, <cfprocessingdirective pageEncoding="utf-8">) is needed at the beginning of every page that uses a non-default encoding.
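
Both encoding points, variable-length UTF-8 and the BOM, can be seen directly at the byte level. This minimal Java sketch (class and method names are hypothetical, and it is not how ColdFusion itself reads templates) shows that ASCII stays at one byte per character while accented Latin, CJK and supplementary characters take two, three and four bytes, and that the UTF-8 BOM is the three bytes EF BB BF at the start of the file.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Utf8Sketch {
    // The UTF-8 byte-order mark: code point U+FEFF encoded as three bytes.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Number of bytes a string occupies once encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // True if the data starts with the UTF-8 BOM, the marker an editor
    // writes when it saves a page as UTF-8 "with signature".
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= UTF8_BOM.length
                && Arrays.equals(Arrays.copyOf(data, UTF8_BOM.length), UTF8_BOM);
    }

    // Decode page bytes as UTF-8, dropping a leading BOM if present
    // (Java's UTF-8 decoder otherwise keeps it as a U+FEFF character).
    static String decodePage(byte[] data) {
        int start = hasUtf8Bom(data) ? UTF8_BOM.length : 0;
        return new String(data, start, data.length - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ASCII letter, Latin e-acute, a CJK ideograph, and an emoji (surrogate pair).
        for (String s : new String[] {"A", "\u00e9", "\u65e5", "\ud83d\ude00"}) {
            System.out.println(s + " -> " + utf8Length(s) + " byte(s)");
        }
        byte[] page = "\ufeff<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(hasUtf8Bom(page) + " " + decodePage(page));
    }
}
```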
