13 August 2006

On the varieties of Translation Memory mismatches

A close look at two different versions of the API Reference and the TM was enlightening.

Whereas the early version of the API Reference had multiple instances of

"These are filled by

Interface A and can be examined in the callback function."

and the current version has

"These are filled by
Interface A and can be examined in the callback function."

the TM has

"These are filled by Interface A and can be examined in the callback function."

Ordinarily, the segmentation rules in a TM tool ignore whitespace and CRLFs and treat the sentence as a single translation unit, but it appears that this TM tool cannot (or will not) ignore all of the whitespace. It may be that ignoring it correctly here would have adverse effects elsewhere in the document. The tool is likewise only more or less successful at ignoring "unimportant" differences inside HTML tags.
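To make the mismatch concrete, here is a minimal sketch in Python of the kind of whitespace normalization one would expect before a TM lookup. It is not how any particular TM tool works: the function names normalize_segment and matches_tm are hypothetical, and the CRLF line endings in the sample strings are an assumption about how the three variants above are stored.

```python
import re


def normalize_segment(text: str) -> str:
    """Collapse every run of whitespace (spaces, tabs, CR, LF) into one space."""
    return re.sub(r"\s+", " ", text).strip()


def matches_tm(segment: str, tm_source: str) -> bool:
    """Hypothetical exact-match check: compare segments after normalization."""
    return normalize_segment(segment) == normalize_segment(tm_source)


# The three variants from this post; the line endings are assumed to be CRLF.
early = (
    "These are filled by\r\n\r\n"
    "Interface A and can be examined in the callback function."
)
current = (
    "These are filled by\r\n"
    "Interface A and can be examined in the callback function."
)
tm_source = (
    "These are filled by Interface A and can be examined in the callback function."
)

for name, segment in (("early", early), ("current", current)):
    print(name, "matches TM:", matches_tm(segment, tm_source))
```

With this normalization both versions match the TM entry. The catch is that collapsing the blank line in the early version also merges text that a segmenter would normally treat as two separate paragraphs, which is exactly the sort of adverse effect elsewhere in the document that a tool might be trying to avoid.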

In any event, because the engineers are constantly tweaking the Perl scripts, the extracted HTML files keep changing, while the source header files remain much more stable. We should figure out a way to localize the former and generate the latter using scripts, as we do for English.
