Extracting Word Format Source Files from PDF

May 11, 2011

Nearly every customer finds themselves in the embarrassing situation of not having a critical source file for translation in the appropriate desktop publishing (DTP) format. Sometimes, all that is available is a PDF file derived from a "lost" source file in FrameMaker, InDesign or Word format.

Regardless of the reason, you and your translation company have several tools and methods available to create an editable source file from a PDF. This blog post covers two of the more effective techniques that our production staff at GPI has found while performing multilingual desktop publishing.

1. Save the PDF as an MS Word file.

This option is best suited for PDF files of about 20-30 pages that were created by printing from editable files such as InDesign, FrameMaker and MS Word. Adobe Acrobat X Pro and Nitro PDF Professional are two programs that allow you to do this.

The resulting MS Word file will preserve the text formatting (font family, size, and color) and the graphics. If your other source files are FrameMaker or InDesign, you can easily import the resulting MS Word file into these applications. This will enable you to recreate a properly formatted, editable source file that is suitable for language translation.

2. Convert the PDF using a standalone PDF to MS Word conversion tool.

You have a wide range of tools to choose from, including ABBYY FineReader and deskUNPDF. In addition, there are some tools which can be used directly online, like Zamzar, PDFonline, and PDF to Word Converter.

See "30+ Tools of PDF Converter, PDF Creator and PDF Reader" for a more complete list.

When converting a PDF file using one of these tools:

  • We can edit every page individually
  • We can indicate which part of the page should be converted as a graphic or a table
  • We can even specify the format of the text before saving the conversion in Word format

These options are highly useful in the event that the application didn't detect the correct format in the first place.

Option to indicate PDF text language

Another great thing about these tools is that we can indicate the language of the text in the PDF file before starting the conversion process.

We can create a list of languages we use most often and the program will remember them, or we can specify the language(s) for the file we are converting on an as-needed basis.

Specifying the language improves the results, especially with scans and large files with multiple graphics and complex formatting.

Optimizing Word files for translation

The results are not always perfect, regardless of the method used. You will usually need to prepare the Word documents for translation before submitting them to your translation agency.

Some paragraphs might have their lines split by hard or soft returns (as indicated in the screen capture below this paragraph). The text on some of the graphics might be converted to editable text.

*
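As a rough illustration of this kind of cleanup, the sketch below rejoins lines that were split inside a paragraph once the converted text has been exported as plain text. It is not part of any tool mentioned in this post. The function name rejoin_split_lines is hypothetical, and the sketch assumes that paragraphs are separated by blank lines, that hard returns appear as newline characters, and that soft returns appear as vertical tabs (U+000B) or line separators (U+2028).

```python
import re

# Assumption: soft returns show up as vertical tabs (U+000B) or
# line separators (U+2028) in the exported text; hard returns are "\n".
SOFT_RETURNS = re.compile(r"[\x0b\u2028]")


def rejoin_split_lines(text: str) -> str:
    """Merge lines that were split inside a paragraph.

    Paragraphs are assumed to be separated by one or more blank lines.
    """
    # Treat soft returns the same way as hard returns.
    text = SOFT_RETURNS.sub("\n", text)
    # Split into paragraphs on blank (or whitespace-only) lines.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = []
    for para in paragraphs:
        # Drop empty fragments and join the remaining lines with one space.
        lines = [line.strip() for line in para.split("\n") if line.strip()]
        joined.append(" ".join(lines))
    return "\n\n".join(joined)


sample = (
    "The results are not\nalways perfect,\x0bregardless of the method used."
    "\n\nSecond paragraph."
)
print(rejoin_split_lines(sample))
```

This sketch does not repair words hyphenated across a line break, and it will also merge lines that were meant to stay separate (addresses or list items, for example), so the output still needs a manual check before translation.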

Find the best tool for your files

To obtain the best results with the least amount of work, we need to make sure we are using the best tool for the type of PDF file we need to convert. Begin by analyzing the PDF file to determine its source, layout and format complexity.
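The guidance in this post can be summarized as a small decision helper. This is only a sketch: the PdfProfile fields, the suggest_method function and the 30-page cut-off are illustrative assumptions drawn from the descriptions of the two methods above, not a rule used by any particular tool.

```python
from dataclasses import dataclass


@dataclass
class PdfProfile:
    """Hypothetical summary of what the analysis of a PDF found."""
    page_count: int
    is_scan: bool          # True if the PDF is a scan rather than a print from an editable file
    complex_layout: bool   # True if it has many graphics or complex formatting


def suggest_method(profile: PdfProfile) -> str:
    """Suggest one of the two conversion methods described in this post."""
    # Assumption: scans, complex layouts and files longer than about
    # 30 pages are better served by a standalone conversion tool.
    if profile.is_scan or profile.complex_layout or profile.page_count > 30:
        return "standalone conversion tool"
    # Otherwise, saving the PDF directly as an MS Word file is the simpler route.
    return "save as MS Word"


print(suggest_method(PdfProfile(page_count=25, is_scan=False, complex_layout=False)))
print(suggest_method(PdfProfile(page_count=120, is_scan=True, complex_layout=True)))
```

In practice the decision is a judgment call, and it is often worth trying more than one tool on a few sample pages before converting the whole file.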

More resources regarding multilingual DTP

Globalization Partners International has extensive experience translating documentation in all common authoring products from Microsoft, Adobe and other vendors. You may wish to review recommended steps used by GPI in DTP projects in Multilingual Desktop Publishing. You may also find our previous blog on "What You Need To Know About Graphic Localization" useful. You may also benefit from two of our recent blogs on desktop publishing: "8 Ways Unstructured FrameMaker 10 helps Translation" and "8 Steps to Optimize InDesign files for translation."

Please contact GPI at info@globalizationpartners.com or at 866-272-5874 with your specific questions about PDF files, Microsoft Word and your project goals. A complimentary Document Translation Quote for your project is also available upon request.

Category:
Document Translation
Tags:
Desktop Publishing, source files, PDF, Adobe, Word, FrameMaker

Comments

  • On May 20, cathy said:
    You can always use the old cut and paste from PDF into Word.
  • On May 20, Ajay kumar said:
    Free online tool to convert one file at a time.

    http://www.pdfonline.com/pdf-to-word-converter/
  • On May 20, Catherine Guilliaumet said:
    Thank you.
    Excellent summary.
    For a comparison of several free PDF to DOC converters, see here:
    http://www.freewaregenius.com/2010/03/06/how-to-convert-pdf-to-word-doc-for-free-a-comparative-test/
  • On May 23, Oana Diaconu said:
    Cut/paste can be used when you have a small file, not so complicated in layout. When the file that needs to be converted is bigger and has a more complex layout, it is recommended to use a special tool, which allows us to preserve the text formatting and graphics, and also recognizes the language of the text.
    Web-based converters may be helpful and handy in some situations, but we should never use them for converting confidential documents.
  • On Jul 01, Manu said:
    Cut & Paste is in the past. It's more expensive, takes more time, it's a repetitive task, you can introduce mistakes, you will surely need someone to check your work, and your quote will be higher.
    You practically have no advantages using this method.
  • On Jul 01, Oana Diaconu said:
    You are right, Manu, cut and paste is time consuming and error prone, but can be used in small documents, like a letter, when the layout is not important or complex.

    There is enough technology today which helps us extract text/graphics from a pdf with very good results.

    Acrobat X is a great tool for exporting text and graphics from a pdf. You can even edit the text and graphics in a pdf by opening them directly from Acrobat in an application for editing (like Illustrator), and the pdf is updated after you close the file in the respective application.
  • On Sep 09, Dhiraj Aggarwal said:
    The only method is a combination of software tools and manual formatting, i.e., first use software to extract the text/images, and then do a touch-up manually. We are a DTP company, and specialize in such work.