We all [heart] PDFs!
Any good localization manager (vendor- or client-side) knows that there's very little you can do with a PDF as a source file. Yet time and again, we confront the best intentions of our customers and co-workers who say, "It's not a very large file, so it shouldn't cost much to translate. I'll send it to you." They send us a PDF.
This has happened to me with two new clients this week. We'd all like more translation business, and it's convenient that PDF exists as a lingua-franca format for us, but it is something of a double-edged sword.
PDFs contain everything we need to view a file, but not everything we need to extract its text, formatting, callouts, frames, tags, etc. Creating a localization estimate from a PDF is asking for trouble, because the PDF smooths over a multitude of issues that we'll encounter once we have the source files. Most of these concern text that we know requires translation, but which is not "live" in the PDF and may not be live in the source file from which the PDF came. It's the equivalent of hard-coded strings in software, or localizing a binary without the .properties or resource files.
There are, of course, utilities for converting PDF to RTF to capture the live text and formatting, and that's better than nothing, but it's probably still a far cry from the Quark or InDesign or even MS Word file from which you started. I've sent one of my new clients back to the drawing board several times this week already:
- He gave me a PDF and I asked for the source file.
- He found the source file (Quark) and I asked for the Photoshop files from which the text-bearing graphics had originated.
- There were tables in the Quark file that were Illustrator objects, because these looked much better than Quark's native tables.
- Another PDF of a Word document contains eight graphs created by engineers all over the building. He said he'd try to obtain the original artwork (probably PowerPoint, every engineer's favorite Etch-a-Sketch), but I'll be surprised if he can find it.
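For anyone who wants a quick sanity check before quoting on a PDF, here is a rough sketch of how one might test whether a file contains any "live" text at all. It is only a heuristic under stated assumptions: it uses nothing but the Python standard library, it only understands uncompressed and Flate-compressed streams, it only counts the `Tj` and `TJ` text-showing operators, and it can be fooled by encrypted files, other filters, or binary data that happens to contain those bytes. The function names `count_text_operators` and `looks_live` are my own, not part of any tool.

```python
import re
import zlib

# Grabs the raw bytes between "stream" and "endstream" in a PDF file.
STREAM_RE = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)
# Tj and TJ are PDF operators that paint text strings on the page.
TEXT_OP_RE = re.compile(rb"(?<![A-Za-z])T[jJ](?![A-Za-z])")


def count_text_operators(pdf_bytes: bytes) -> int:
    """Count text-showing operators found in the PDF's streams (heuristic)."""
    total = 0
    for match in STREAM_RE.finditer(pdf_bytes):
        raw = match.group(1)
        try:
            # Most content streams are Flate-compressed; a decompressobj
            # tolerates the trailing newline that precedes "endstream".
            data = zlib.decompressobj().decompress(raw)
        except zlib.error:
            # Not Flate data (or another filter): scan the raw bytes instead.
            data = raw
        total += len(TEXT_OP_RE.findall(data))
    return total


def looks_live(pdf_bytes: bytes) -> bool:
    """True if at least one text-showing operator was found."""
    return count_text_operators(pdf_bytes) > 0
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `looks_live` gives a first hint. A result of `False` suggests the pages are images or outlined text, which is exactly the case where the estimate will be wrong. A result of `True` only says that some text is live, not that all of it is, so it is no substitute for getting the source files.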
Labels: documentation localization, localization graphics, PDF localization