Extracting Word Format Source Files from PDF
May 11, 2011
Nearly every customer finds themselves in the embarrassing situation of not having a critical source file for translation in the appropriate desktop publishing (DTP) format. Sometimes, all that is available is a PDF file derived from a "lost" source file in FrameMaker, InDesign or Word format.
Regardless of the reason, you and your translation company have several tools and methods available to create an editable source file from a PDF. This blog post covers two of the more effective techniques that our production staff at GPI has discovered while performing multilingual desktop publishing.
1. Save the PDF as an MS Word file.
This option is best suited for PDF files of about 20-30 pages that were created by printing from editable files in applications such as InDesign, FrameMaker and MS Word. Adobe Acrobat X Pro and Nitro PDF Professional are two programs that allow you to do this.
The resulting MS Word file will preserve the text formatting (font family, size, and color) and the graphics. If your other source files are FrameMaker or InDesign, you can easily import the resulting MS Word file into these applications. This will enable you to recreate a properly formatted, editable source file that is suitable for language translation.
2. Convert the PDF using a standalone PDF to MS Word conversion tool.
You have a wide range of tools to choose from, including ABBYY FineReader and deskUNPDF. In addition, there are some tools which can be used directly online, like Zamzar, PDFonline, and PDF to Word Converter.
See the list of 30+ Tools of PDF Converter, PDF Creator and PDF Reader for more options.
When converting a PDF file using one of these tools:
- We can edit every page individually
- We can indicate which parts of the page should be converted as a graphic or a table
- We can even indicate the format of the text before saving the conversion in Word format
These options are highly useful in the event that the application didn't detect the correct format in the first place.
Option to indicate PDF text language
Another great thing about these tools is that we can indicate the language of the text in the PDF file before starting the conversion process.
We can create a list of languages we use most often and the program will remember them, or we can specify the language(s) for the file we are converting on an as-needed basis.
Specifying the language generally improves the results, especially with scans and with large files that contain multiple graphics and complex formatting.
Optimizing Word files for translation
The results are not always perfect, regardless of the method used. You will usually need to prepare the Word documents for translation before submitting them to your translation agency.
Some paragraphs might have their lines split by hard or soft returns (as indicated in the screen capture below this paragraph). The text on some of the graphics might be converted to editable text.
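The split-line cleanup can be partly automated once the converted text has been exported as plain text. The Python sketch below is only an illustration and not a GPI tool: the function name merge_split_lines is hypothetical, it assumes a soft return appears as a vertical tab character, and it assumes that a line ending in sentence punctuation closes a paragraph. That last assumption is a rough heuristic, so the output should always be reviewed by hand.

```python
# Hypothetical helper: rejoin lines that a PDF conversion broke mid-sentence.
# Assumption: a line that ends with one of these characters closes a paragraph.
TERMINAL = (".", "!", "?", ":", ";")


def merge_split_lines(text: str) -> list[str]:
    """Return a list of paragraphs with mid-sentence line breaks removed."""
    # Assumption: soft returns arrive as vertical tabs; treat them as newlines.
    lines = text.replace("\x0b", "\n").split("\n")
    paragraphs: list[str] = []
    current = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            # A blank line always closes the paragraph being built.
            if current:
                paragraphs.append(current)
                current = ""
            continue
        current = f"{current} {line}" if current else line
        if line.endswith(TERMINAL):
            paragraphs.append(current)
            current = ""
    if current:
        paragraphs.append(current)
    return paragraphs


sample = (
    "This paragraph was split\x0bacross three lines\n"
    "by the converter.\n"
    "Second paragraph."
)
for paragraph in merge_split_lines(sample):
    print(paragraph)
```

A paragraph that contains a line ending in a period partway through will be split in two by this heuristic, so it works best on files where the converter broke lines at the page margin rather than at sentence boundaries.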
Find the best tool for your files
To obtain the best results with the least amount of work, we need to make sure we are using the best tool for the type of PDF file we need to convert. Begin the process by analyzing the PDF file to determine its source, layout and format complexity.
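The guidance in this post can be written down as a simple rule of thumb. The Python sketch below is an illustration only: the names PdfProfile and suggest_method are hypothetical, the 30-page cutoff is our reading of the "20-30 pages" guideline above, and the three properties have to be judged by a person looking at the file.

```python
from dataclasses import dataclass


@dataclass
class PdfProfile:
    """Hypothetical summary of a PDF, filled in after a manual review."""
    pages: int
    is_scan: bool          # True if the PDF was produced by scanning paper
    complex_layout: bool   # True for many graphics or complex formatting


def suggest_method(profile: PdfProfile) -> str:
    """Suggest one of the two techniques described in this post."""
    # Assumption: scans, complex layouts and long files benefit from a
    # standalone tool that lets us adjust pages, regions and language.
    if profile.is_scan or profile.complex_layout or profile.pages > 30:
        return "standalone conversion tool"
    # Otherwise the PDF is short and came from an editable file.
    return "save as MS Word"


print(suggest_method(PdfProfile(pages=25, is_scan=False, complex_layout=False)))
print(suggest_method(PdfProfile(pages=120, is_scan=True, complex_layout=True)))
```

The rule is deliberately coarse. It is a starting point for choosing a tool, not a replacement for testing a few pages with each method.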
More resources regarding multilingual DTP
Globalization Partners International has extensive experience translating documentation in all common authoring products from Microsoft, Adobe and other vendors. You may wish to review recommended steps used by GPI in DTP projects in Multilingual Desktop Publishing. You may also find our previous blog on "What You Need To Know About Graphic Localization" useful. You may also benefit from two of our recent blogs on desktop publishing: "8 Ways Unstructured FrameMaker 10 helps Translation" and "8 Steps to Optimize InDesign files for translation."
Please contact GPI at firstname.lastname@example.org or at 866-272-5874 with your specific questions about PDF files, Microsoft Word and your project goals. A complimentary Document Translation Quote for your project is also available upon request.
Oana Diaconu - Senior Desktop Publishing Specialist
Oana has extensive experience in multilingual publishing and document analysis for translation projects for content authored with Microsoft and Adobe publishing applications. She is an expert user of Trados linguistic software and has managed many DTP translation projects with source documents that exceed 1,000 pages in length in over 20 languages. Oana is originally from Iasi, Romania and speaks English, Romanian and some French and Italian.