29 May 2008

Localizing Robohelp Files - The Basics

We get a lot of search engine queries like "localize Robohelp file" and "translate help project." I'm pretty sure that most of them come from technical writers who have used Robohelp to create help projects (Compiled HTML Help Format), and who have suddenly received the assignment to get the projects localized.

The short answer
Find a localization company that can demonstrate to your satisfaction that it has done this before, and hand off the entire English version of your project - .hhp, .hhc, .hhk, .htm/.html and, of course, the .chm. Then go back to your regularly scheduled crisis. You should give the final version a quick smoke test before releasing it, for your own edification as well as to see whether anything is conspicuously missing or wrong.

The medium answer
Maybe you don't have the inclination or budget to have this done professionally, and you want to localize the CHM in house. Or perhaps you're the in-country partner of a company whose product needs localizing, and you've convinced yourself that it cannot be that much harder than translating a text file, so why not try it?

You're partially right: it's not impossible. In fact, it's even possible to decompile all of the HTML pages out of the binary CHM and start work from there. But your best bet is to obtain the entire help project mentioned above and then use translation memory software to simplify the process. Once you've finished translating, you'll need to compile the localized CHM using Robohelp or another help-authoring product (even hhc.exe).
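For the in-house route, the two engineering steps (unpack the CHM, then recompile the translated project) can be scripted. Here is a minimal sketch in Python that only builds the command lines; it assumes a Windows machine with hh.exe and hhc.exe on the PATH, and the file names (help.chm, help_ja.hhp) are made up for illustration.

```python
def decompile_command(chm_file, output_dir):
    """Build the command that unpacks a CHM into its HTML, .hhc and .hhk files."""
    # hh.exe is the HTML Help viewer; its -decompile switch takes
    # the output folder first, then the CHM file.
    return ["hh.exe", "-decompile", output_dir, chm_file]


def compile_command(project_file, compiler="hhc.exe"):
    """Build the command that compiles an HTML Help project (.hhp) into a CHM."""
    # hhc.exe is the command-line compiler from HTML Help Workshop.
    return [compiler, project_file]


if __name__ == "__main__":
    # Hypothetical file names; on Windows, pass each list to subprocess.run().
    print(decompile_command("help.chm", "help_src"))
    print(compile_command("help_ja.hhp"))
```

If you run these through subprocess, check that the localized .chm actually appears rather than trusting the exit status; hhc.exe has a reputation for returning a nonzero code even when the build succeeds.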

The long answer
This is the medium answer with a bit more detail and several warnings.
  • There may be a way to translate inside the compiled help file, but I wouldn't trust it. Fundamentally, it's necessary to translate all of the HTML pages, then recompile the CHM; thus, it requires translation talent and some light engineering talent. If you lack either one, then stop and go back to The Short Answer.
  • hhc.exe is the Microsoft HTML Help compiler. It does not come with Windows; it's part of the HTML Help Workshop freely available from Microsoft. This workshop is not an authoring environment like Robohelp, but it offers the engineering muscle to create a CHM once you have created all of the HTML content. If you have to localize a CHM without recourse to the original project, you can decompile all of the HTML pages out of the CHM with the workshop's decompile command or with hh.exe, the help viewer that does come with Windows (hh.exe -decompile).
  • Robohelp combines an authoring environment for creating the HTML pages and the hooks to the HTML Help compiler. As such, it is the one-stop shopping solution for creating a CHM. However, it is known to introduce formatting and features that confuse the standard compiler, such that some Robohelp projects need to be compiled in Robohelp.
  • Robohelp was developed by BlueSky Software, which morphed into eHelp, which was acquired by Macromedia, which Adobe bought. Along the way it made some decisions about Asian languages that resulted in the need to compile Asian language projects with the Asian language version of Robohelp. This non-international approach was complicated by the fact that not all English versions of Robohelp were available for Asian languages. Perhaps Adobe has dealt with this by now, but if you're still authoring in early versions, be prepared for your localization vendor to tell you that it needs to use an even earlier Asian-language version.
  • Because the hierarchical table of contents is not HTML, you may find that you need to assign to it a different encoding from that of the HTML pages for everything to show up properly in the localized CHM, especially in double-byte languages.
  • The main value in a CHM lies in the links from one page to another. In a complex project, these links can get quite long. Translators should stay away from them, and the best way to accomplish that is with translation memory software such as Déjà Vu, SDL Trados, across or Wordfast. These tools insulate tags and other untouchable elements from even novice translators.
We've marveled at how many search engine queries there are about localizing these projects, and we think that Robohelp and the other authoring environments have done a poor job explaining what's involved.

If you liked this article, have a look at "Localizing Robohelp Projects."


05 October 2007

"Why are you charging me for that?" - Part 1

Have you ever asked your localization vendor this question? Or, if you're a vendor, has any client ever asked it of you?

For a few clients, we manage large documentation projects, notably HTML Help and Robohelp localization. When the vendor translated 800 HTML pages for version 1.0 of the product, a particular client swallowed hard and paid for all non-matches, because it was the first time localizing the product.

By version 2.0, the Help had grown to 1400 pages. Many of the original 800 pages had no translatable changes, but Trados dutifully scooped up all of those words, dropped them into the "100%" or "95-99%" buckets, and the vendor charged us for them, even if at a greatly discounted rate.

"Why are you charging me for that?" I asked. I'll have more on this topic in an upcoming post, but for now:

If you're on the vendor-side, do you have a good answer for that question? If you're on the client-side, have you ever received an answer to that question that satisfied you?


10 August 2007

Localization - Top 5 Web searches

What is your most frequently used Web search regarding localization? Are there search phrases you check every now and again to see what new results they yield?

Over the last couple of years, I've tuned the keywords on this blog and on our Web site, www.1-for-all.com, for both pay-per-click and search engine optimization. I have a pretty good idea of which search topics bring people to this blog, and here are the top five topics, with my comments:
  1. Localization of HTML help projects (Robohelp, CHM, etc.). I can't tell whether people have trouble with this, or whether they're poking around to find out whether they are going to have trouble with it once they undertake it. My hunch is that Robohelp, the dominant product for creating HTML help, either doesn't do a good job creating localized help systems or doesn't do a good job in explaining how to create them. Our experience has been that double-byte localization requires a specifically enabled, separate version of Robohelp, which strikes me as silly, but perhaps Adobe has addressed this by now.
  2. How much to charge/pay for translation. Everybody wants to know this. Responding to the frequency of these queries, I wrote an article called "Going Global Without Going Broke" to help people who want a few benchmark figures from which to cobble together a budget. If you're any further along than that, you should just contact a vendor, push your files to him and get an estimate. If you're a translator or want to become one, phone a localization company, tell them what you can do and find out what they'll pay you.
  3. Localization project manager/management. I would guess that about half of these are vendors (a.k.a. localization service providers, or LSPs) and half are companies with localization needs to fill.
  4. Localization jobs. Most of these queries come from Ireland. There's a relatively high concentration of localization talent in that country, and perhaps a high rate of turnover as well.
  5. What is localization? Again, the frequency of these queries prompted me to write articles called "Opening the Black Box." I'm glad to see people asking this question, because it demonstrates continuing interest in this specialty. At the same time, however, I notice that some of these queries come from China and India, suggesting to me that the IT shop which has just promised you it can localize your software for one-seventh the price you've gotten from other vendors is now trying to figure out what's involved in fulfilling that promise.
At the other end of this list are the searches we're not seeing: questions we believe people should be asking but aren't.


11 May 2007

Localizing RoboHelp projects

Is it time for you to localize your RoboHelp projects? What's involved?

"RoboHelp project" is shorthand for "compiled help system." When this lives on a Windows client computer, it is usually an HTML Help (CHM) file. There are other variations like Web Help, which are also compiled HTML, but which do not run on the client.

The projects are a set of HTML files, authored in a tool such as--but not limited to--RoboHelp, then compiled into a binary form that allows for indexing, hierarchy and table of contents. Other platforms (Mac OS, Linux, Java) require a different compiler, but the theory is the same.

If you've done localization before, you'll find that RoboHelp projects are relatively easy, compared to a software project. RoboHelp (or whatever your authoring/compilation environment may be) creates a directory structure and file set that is easy to archive and hand off. It includes a main project file, table of contents file and index file. In fact, it's even possible in a pinch to simply hand off the compiled file, and have the localizers decompile it; the files they need will fall into place as a result of the decompilation.

Although you may think of the project as a single entity for localization purposes, each HTML page is a separate component. There may be large numbers of these pages that don't change from one version of your product to the next; nevertheless, you need to hand them off with the project, and you'll likely be charged for a certain amount of "touching" that the localizer's engineers will need to do. You may be able to save them some work and yourself some money by analyzing the project and determining which pages have no translatable changes, but by and large you should consider the costs for touching unchanged pages an unavoidable expense.
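One way to do that analysis yourself is to compare the visible text of each page between the old and new English versions, ignoring markup and line wrapping. What follows is a minimal sketch, not a vendor-grade tool: the function names are my own, it assumes both versions are available as folders of UTF-8 HTML files with matching relative paths, and it ignores translatable attributes such as alt and title.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the bodies of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def text_fingerprint(html):
    """Hash the page's text with all runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(" ".join(parser.parts).split())
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def unchanged_pages(old_dir, new_dir, encoding="utf-8"):
    """Return relative paths of pages whose text is identical in both versions."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    unchanged = []
    for new_page in sorted(new_dir.rglob("*.htm*")):
        rel = new_page.relative_to(new_dir)
        old_page = old_dir / rel
        if not old_page.is_file():
            continue  # page is new in this version
        old_fp = text_fingerprint(old_page.read_text(encoding=encoding, errors="replace"))
        new_fp = text_fingerprint(new_page.read_text(encoding=encoding, errors="replace"))
        if old_fp == new_fp:
            unchanged.append(rel.as_posix())
    return unchanged
```

The list it returns is a starting point for a conversation with the vendor, not a substitute for their analysis; a page with identical text may still need engineering attention if its markup or links changed.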

The biggest problem with these projects is in-country review. There's no easy way for an in-country reviewer to make changes or post comments in the compiled localized version. We've found that MS Excel is the worst way of doing this (except for all the others), so we've learned to live with it.

In theory, the translators are not mucking about with any tags, so the compiled localized version should work the same as the original. Yeah, right. All the links need to be checked--they do break sometimes--and the index and table of contents should be validated. And, don't forget to try a few searches to make sure they work; your customers surely will, and you want to spare them any unpleasant surprises.
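The link check, at least, can be partly automated before anyone opens the compiled file. Below is a rough sketch that walks a folder of localized HTML pages and reports local href and src targets that don't exist on disk. It assumes the pages sit in a folder (for example, the translated project before compilation), that they are UTF-8, and it skips external URLs and same-page anchors; the function name is hypothetical.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlsplit


class _LinkCollector(HTMLParser):
    """Gathers every href and src attribute value on a page."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.targets.append(value)


def broken_links(root, encoding="utf-8"):
    """Return (page, target) pairs whose local target file is missing."""
    root = Path(root)
    broken = []
    for page in sorted(root.rglob("*.htm*")):
        collector = _LinkCollector()
        collector.feed(page.read_text(encoding=encoding, errors="replace"))
        for target in collector.targets:
            parts = urlsplit(target)
            if parts.scheme or parts.netloc or not parts.path:
                continue  # external link, mailto:, or a same-page #anchor
            # Resolve relative to the page's own folder; ignore the #fragment.
            resolved = page.parent / unquote(parts.path)
            if not resolved.exists():
                broken.append((page.relative_to(root).as_posix(), target))
    return broken
```

This only confirms that target files exist. It won't tell you whether an anchor inside the target page survived translation, or whether the index and search behave, so the manual checks still apply.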

Remember:
  • If you've included graphics in your help project, you'll need to obtain the original source files. These are not GIFs or JPEGs; they will be the application files from which the GIFs and JPEGs were generated. You'll need to hand off files from applications like Adobe Illustrator, or Flash or even PowerPoint, so that the translators can properly edit the text in them. Engineers often do quick mock-ups in Microsoft Word's Word Art that end up in the final product, and it takes a while to track them down.
  • Encoding can be thorny. Some compilers behave oddly if you try to impose the same encoding on both the HTML pages and the table of contents, especially in Japanese, in our experience.


20 April 2007

Localization Testbenches, Part IV (Online Help)

What are you using to test your localized products? If you're handing them to your domestic QA team and expecting that they'll intuitively test them with correct language locale settings, you may be in for an unpleasant surprise.

3) Help files
Your online documentation also deserves some testing. After its contents (usually HTML pages or XML documents) have been translated - in the correct encoding for the target language - the help project will be compiled, in the same way that software applications are compiled. This compilation step needs to account for the correct language, locale and encoding, and this doesn't happen by itself, no matter how lucky you may feel today.

Again, it's important to test the help file in an environment that closely matches your customers' environment. Run your Greek help file on a native Greek operating system. Be sure to test the main window, the contents pane and the index for properly displayed characters. Above all, perform a few searches using native characters in the Find field to ensure that your help file's index was properly created and encoded; if your searches are successful, then your customers' searches will probably be successful as well.

Note: HTML Help under Windows has some idiosyncrasies when it comes to the table of contents (TOC) pane and the main window. Most tools like RoboHelp will properly encode the TOC and main pane content for, say, Japanese, when all of the content resides in the same project. However, if you're building your HTML help files with your own tools (e.g., Perl scripts and hhc.exe), you may find that encoding sauce for the goose is not encoding sauce for the gander. We've found, for example, that the HTML pages displayed in the main window are happy with UTF-8, whereas the TOC pane won't support UTF-8 but will support Shift-JIS.
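If you hit that situation in a home-grown build, one workaround is to leave the HTML pages in UTF-8 and re-encode only the TOC file before compiling. Here is a minimal sketch in Python; the file names are hypothetical, and "shift_jis" is Python's name for the codec (you may find you need "cp932", the Windows variant, instead).

```python
from pathlib import Path


def reencode_toc(toc_path, out_path,
                 source_encoding="utf-8-sig", target_encoding="shift_jis"):
    """Rewrite a TOC (.hhc) file in a legacy encoding; return bytes written."""
    # Decode from raw bytes so the original line endings are preserved;
    # "utf-8-sig" also drops a leading byte-order mark if one is present.
    text = Path(toc_path).read_bytes().decode(source_encoding)
    # errors="strict" makes a character the target encoding can't
    # represent fail loudly instead of silently turning into "?".
    data = text.encode(target_encoding, errors="strict")
    Path(out_path).write_bytes(data)
    return len(data)


if __name__ == "__main__":
    # Hypothetical file names for a Japanese build.
    reencode_toc("toc_utf8.hhc", "toc.hhc")
```

Whichever encoding you settle on, test the result in the TOC pane on a native system; the compiler won't warn you about mojibake.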


26 February 2007

Translation non-savings, Part I

How far will you go to improve your localization process?

Because of how localization is viewed in many companies, the best improvements are the ones that lower cost. Low cost helps keep localization inconspicuous, which is how most companies want it.

But if a big improvement didn't save any obvious money, would your organization go for it?

Elsewhere in this blog I relate the saga of the compiled help file with 3500+ HTML pages in it. These pages come from a series of Perl scripts that we run on the header files to extract all of the information about the product's API and wrap it up in a single, indexed, searchable CHM. In a variety of experiments, we've sought to move the focus of translation from the final HTML files to a point further upstream, at or near the header files themselves. If the raw content were translated, we believe, all downstream changes in the Perl scripts, which get revised quite often, would be imposed automatically on the localized CHM.

One of the biggest cost items - we have suspected - is due to changes in line wrapping and other HTML variations that confuse TM tools into thinking that matches are fuzzier than they really are. The false positives look like untranslated words when analyzed, so the wordcounts rise, and not in our favor.

"If we work with raw text, before HTML formatting," our thinking goes, "the match rate will rise."

Not.

I'll describe my experiment shortly.


30 January 2007

Localization Train slowing

We're seeing the localization juggernaut lose some steam.

In the early years, this client localized its flagship software package for developers in China, Japan and Korea (CJK), then added Brazil. It took small, reference applications into as many as 10 languages (including Hebrew and Thai) as those markets showed promise. The budget was pretty fat, the localized products were freshened frequently, and the developers were happy to have software and doc in their own language.

I suppose it was to be expected that this would peter out with time, because markets change, business cases wax and wane, and some regions never return the investment.

The new stressor on localization was less easy to anticipate: bulk. Each generation of improvements to the product brings several hundred more pages of documentation. All of this new documentation is, of course, "free" in English, but somebody has to pull out a checkbook to deal with it in other languages, and that checkbook comes out more slowly and with more misgivings these days.

Engineering and Product Management furrow their brows nowadays when I walk in with cost estimates. I've adapted to this change in attitude with a few techniques:
  1. The Technical Reference is the fattest target and the source of most of the expansion. It lives in a compiled help file (CHM) that is no longer written by Tech Pubs, but generated by Perl scripts from header files written by the engineers. Our modus localizandi has been to hand off the finished help project, now comprising 3700 HTML files, and have the HTML translated. In an effort to lower cost, I'm attempting a proof-of-concept to localize the header files themselves, then tune the scripts to convert them into localized HTML. This should lower our localization engineering costs considerably.
  2. I agitate for interim localization updates, peeling off documentation deltas every few weeks and handing them off for translation, even if there are no plans to release them yet. This reduces the sticker shock and time-to-market delay that come from getting an estimate on a release only when necessary, which may be a 10- to 18-month interval. Product Management and Engineering, who only think about localization when it's absolutely unavoidable, find the tsunami of untranslated text depressing.
  3. Although it's not a very clean way of doing things, I screen from the localization handoff those items that I know have little to be translated. Sometimes I go to the level of resource files, but more often I take documents to which only a few minor changes have been made from one English version to the next, hand off changed text, then place the translations myself. This is not for the faint of heart, nor for those who don't really know the languages involved, but it can save some money.
  4. I try to keep global plates spinning, in the hope that more people will consider the global dimension of what we do, and the fact that localization is the necessary step for making your product acceptable to people whose use of your product will make you money, if you make it easy for them.
  5. I never impart bad news on Friday.
