An Introduction to Machine Translation

January 02, 2013

This is the first in a series of Machine Translation (MT) blogs by Globalization Partners International. The MT technology discussed in this series will be of interest to both new and sophisticated translation users, especially those who face the daunting task of translating tremendous amounts of content with very tight budgets and time constraints.

We will start the series with a brief introduction to MT, discussing its history, explaining how the technology works and providing an overview of its use in business today.

History of Machine Translation

The modern history of automated translation begins primarily in the post-World War II and Cold War era, when the race for information and technology motivated researchers and scholars to find ways to translate information quickly.

In 1949, an American mathematician and scientist named Warren Weaver published a memorandum to his peers outlining his belief that a computer could render one language into another using logic, cryptography, frequencies of letter combinations, and linguistic patterns. Fueled by the exciting potential of this concept, universities began MT research programs, which eventually gave rise to the field of computational linguistics.

In the 1950s, one such research program, at Georgetown University, teamed up with IBM for an MT experiment. In January 1954, they demonstrated their technology to a keenly interested public. Even though the machine translated only a few dozen phrases from Russian into English, the project was hailed as a success: it showcased the possibilities of fully automated MT and provoked interest in, and funding of, MT research worldwide.

The optimism of the Georgetown-IBM experiment gave way to pessimism in the 1960s, as researchers and scholars grew frustrated at the lack of progress in computational linguistics despite heavy funding. In 1966, the Automatic Language Processing Advisory Committee (ALPAC), a special committee formed by the United States government, reported that MT could not and would not ever be comparable to human translation, and that it was therefore an expensive venture that would never yield usable results.

In the 1970s and 1980s, researchers shifted their focus to developing tools that would facilitate the translation process rather than replace human translators, leading to the development of Translation Memory (TM) and other Computer Assisted Translation (CAT) tools that are still an integral part of the localization process today.

In the 1990s, globalization, the proliferation of the internet, access to cheap and powerful computers, and advancements in speech recognition software were among the factors that fueled the progress of MT.

Today, despite advancements in the field, MT is still seen by most professionals and firms requiring accurate translations as an inadequate substitute for human translation teams. However, many companies are embracing MT and applying the technology in their localization workflows, allowing them to get more out of their translation budgets, provided post-editing can be completed in a cost-effective manner (more on this in later blogs). In fact, if companies use MT in the right way, quality can actually be improved compared to pure Human Translation.

How does Machine Translation work?

Machine Translation is software that automatically renders text from one language into another. There are many different types of MT software, but for the sake of brevity, we will cover only Rule-Based Machine Translation (RBMT) and Statistical Machine Translation (SMT), two commonly used and very different approaches.

Rule-based Machine Translation is built on the premise that a language is governed by sets of grammatical and syntactic rules. To translate a segment, the software needs a robust bilingual dictionary for the specified language pair, a carefully outlined set of linguistic rules for the sentence structure of each language, and a set of transfer rules linking the two sentence structures together. These resources are time-consuming and expensive to create, and the work must be repeated for each new language pair. A key benefit of RBMT over other MT approaches is that it can produce better quality for language pairs with very different word orders (for example, English to Japanese).
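The rule-based pipeline described above (bilingual dictionary, per-language structure rules, and transfer rules linking them) can be illustrated with a deliberately tiny sketch. The two-entry dictionary and the single English-to-Japanese word-order rule below are invented toy examples, not data or logic from any real RBMT engine:

```python
# Toy RBMT sketch: bilingual dictionary lookup plus one transfer rule.
# The dictionary entries and the SVO -> SOV reordering rule are invented
# illustrations for a hypothetical English -> Japanese pair.

DICTIONARY = {
    "i": "watashi-wa",
    "read": "yomimasu",
    "books": "hon-o",
}

def reorder_svo_to_sov(words):
    """Transfer rule: English is Subject-Verb-Object, Japanese is
    Subject-Object-Verb, so move the verb to the end."""
    if len(words) == 3:
        subject, verb, obj = words
        return [subject, obj, verb]
    return words

def rbmt_translate(sentence):
    words = sentence.lower().split()
    reordered = reorder_svo_to_sov(words)          # apply structure rule
    return " ".join(DICTIONARY.get(w, w) for w in reordered)  # apply dictionary

print(rbmt_translate("I read books"))  # -> watashi-wa hon-o yomimasu
```

Real RBMT systems encode thousands of such rules and far richer dictionaries, which is exactly why building them for each new language pair is so expensive.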

Statistical Machine Translation is built on the premise of probabilities. For each segment of source text, there are a number of possible target segments, each with a varying probability of being the correct translation; the software selects the candidate with the highest statistical probability. SMT is generating the most interest in the field today, as it can be applied to a wide range of languages and is not as resource-intensive as RBMT.
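At its simplest, the statistical approach just scores candidate target segments and keeps the most probable one. The candidate table and probability values below are made-up numbers purely for illustration (real SMT systems learn these probabilities from large bilingual corpora):

```python
# Toy SMT sketch: choose the candidate translation with the highest
# probability. The phrase table and its probabilities are invented.

PHRASE_TABLE = {
    "the house": [
        ("la casa", 0.72),    # most probable candidate
        ("el hogar", 0.23),
        ("la vivienda", 0.05),
    ],
}

def smt_translate(segment):
    # Fall back to the source text when the segment is unknown.
    candidates = PHRASE_TABLE.get(segment, [(segment, 1.0)])
    best, _prob = max(candidates, key=lambda pair: pair[1])
    return best

print(smt_translate("the house"))  # -> la casa
```

In a production system the score for each candidate combines several statistical models (such as a translation model and a target-language model) rather than a single stored probability, but the core decision rule is the same: pick the highest-scoring candidate.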

Machine Translation in Business Today

Companies these days are being forced to do more with less. Organizations are struggling to handle huge amounts of content in an era when shrinking budgets and demands for instant gratification create tremendous pressure. It is simply too expensive and time-consuming to tackle these mountains of content with the standard human translation process, so it's no surprise that translation buyers are looking to MT for answers.

Yes, Machine Translation quality is imperfect, and it may never be perfect in our lifetime, but it is continuously improving. In experienced and informed hands, it is a useful linguistic tool for translating large volumes of the right content.

Some of the world's largest companies and most recognizable brand names are putting MT technology to use in their localization workflows, in conjunction with Translation Memory, glossaries, style guides and human translators. This process allows a leading e-commerce company to make it simple for customers to buy and sell across borders. It enables a large global equipment manufacturer to send translated operations manuals to its technicians out in the field. It gives a multinational computer technology corporation the means to provide online support to foreign customers in dozens of languages.

Conclusion

This Machine Translation series will continue with more in-depth blogs on the technology, implementation, use, benefits, and limitations of Machine Translation. We will also discuss business situations where Machine Translation would and would not be a practical solution (with supporting case studies) and will explain how Machine Translation can be built into a translation/localization workflow in a way that helps organizations achieve their translation goals.


To further understand the entire globalization process, you can download our PDF Language Globalization Guides. You may also benefit from our previous blogs.

GPI, a premier translation agency, provides comprehensive globalization and translation services. GPI will be happy to assist you. Request a Translation Quote online, or contact GPI at info@globalizationpartners.com or at 866-272-5874 with your specific questions about your target global markets and your project goals.

Category:
Document Translation
Tags:
Translation, Document Translation, Machine Translation


Comments

  • On Jan 02, Emma Goldsmith said:
    Thanks, Lauren, I enjoyed your brief history of MT. By chance I've recently been reading some articles by Muriel Vasconcellos, who was involved in the Georgetown Univ research in the late 1950s. http://www.murieltranslations.com/linguistics_mt_articles.html#mt

    However, I'm surprised that you boldly state "...if companies use MT in the right way, quality can actually be improved compared to pure Human Translation." Unless you're talking about substandard human translation, I cannot think of a case where MT will produce a better result than a human. Customised MT solutions, with strict terminology control and limited source-language scope, may be reaching a point where post-editing (by a human) starts to make sense, but without these steps I cannot agree that MT is comparable with or better than a human translator.
  • On Jan 02, GPI said:
    Hi Emma and thanks for reading!

    I completely understand your sentiment. A couple of years ago, I never would have been so bold as to suggest that Machine Translation could produce a better-quality result than a human translation. (Indeed, raw Machine Translation output alone likely can't.) Over the past couple of years, however, I have heard client-side MT users from large global organizations speak about their MT experiences at various industry events (such as Localization World and TAUS), and their comments are very surprising.

    For example, one Translation Department Manager explained that the quality scores of technical manuals that were Machine Translated then Post-Edited were higher than the scores of the same manuals that were done by pure Human Translation. Surely this is the exception rather than the rule, but it's still within the realm of possibility and I think MT technology & the way we use it will only get better. It's not necessarily that the human translation is sub-standard, but I think there are some legitimate cases where software can take some of the room for subjectivity and human error out of the equation. I am planning to expand on this topic in subsequent blogs, so I hope you'll check back!

    Also, thanks for sharing the great link! I've bookmarked it and will definitely use some of those articles as resources in the future.
  • On Jan 10, Roberto A. Haas said:
    If a machine-translated and then post-edited technical (or specialized) translation is better than a pure human translation, my opinion is that the translators employed were not duly specialized in the corresponding field. The problem is that too many "general translators" accept (for the money) specialized jobs that are well above their knowledge and thus commit many technical translation errors. I am an English-Spanish translator with 30 years' experience and an MBA, and I have never accepted translations outside my field of professional experience and knowledge, so my customers have never had to complain about my work.
  • On Jan 11, Arvi Hurskainen said:
    Thank you, Lauren, for starting a discussion on this very important topic. I want to comment on the observation, or claim, that MT combined with human post-editing can produce better results than human translation alone. The main reason for this may be that MT is systematic, for good and for bad, while people tend to solve the same translation problem in two or more alternative ways. Human translation is more unsystematic, or, to put it more charitably, more creative than MT.

    There are, however, big differences in the performance of MT systems. Google Translate, for example, relies mainly on statistical methods, and its translation errors are difficult to trace and correct. Rule-based MT systems, on the other hand, make use of grammars and lexicons and have control of multi-word expressions, word order, inflection, etc. As a result, the translation output is predictable. When the MT result is systematic, a human translator capable of using post-editing tools can automate much of the post-editing task.
  • On Jan 11, GP said:
    Thank you for your comments, Arvi! Very well said and I agree with you completely. I don't think that companies are getting "better" results because MT is inherently better (indeed, not!). I think that the application of MT as part of the process can help automate some aspects of the translation process (handling tags, for example) in a way that allows the translator/post-editor to focus their time on the more important elements of the work, like style, flow, accuracy, fluency and tone.
  • On Jan 14, Dion Wiggins said:
    This is a good intro for those not yet familiar with MT, or whose only experience is with older or non-customized MT. We have a number of customers whose clients have come back surprised that the quality was better than a human-only approach. The difference in quality comes from the fact that multiple humans differ in their writing style and preferred terminology choices. Not every project has a large quality assurance budget, and it is easy to get inconsistent terminology.

    When a high-quality MT engine is customized, we are able to normalize both terminology and writing style. The engine is trained on previous translations for the same or similar clients. When editors are trained to edit only when there is an error, not when something merely goes against their preference, the quality increases notably. The early problem is "red pen syndrome," where editors feel they must change something. We have many customers who have had this experience and are able to achieve, on a regular basis, in excess of 50 percent of segments where no editing is required at all. The key is to build normalization into the process and train editors properly, not just throw them into post-editing.

    I am very pleased to see that this message is starting to be heard, even if there are some skeptics. MT has matured a lot in recent years and it will take a long time to change mindsets, but there are more proof points every day.
  • On Jan 15, GPI said:
    Hello Dion and thank you very much for reading and for your comments! My intent is to delve into more complicated and advanced Machine Translation topics as this blog series continues, focusing largely on the topics you mentioned: machine training/customization and the post-editing process. I hope you will check back! I appreciate your input and agree completely with all your points.
  • On Jan 15, Dion Wiggins said:
    Hi Lauren, we are covering a lot of these topics in our webinar next week. The title of the webinar is "Effective MT Post-Editing - A Discussion on Best Practice and Fair Compensation for Post Editors". This includes a real-world case study. The synopsis and registration are at https://www1.gotomeeting.com/register/919970008 in case you are interested.
  • On Jan 16, GPI said:
    Thanks Dion, I signed up! Looking forward to the webinar.
  • On Jan 18, Enrique Woll Battistini said:
    One issue not dealt with here, perhaps because it is technically outside the scope of this article, is the quality of the original text to be translated: its grammatical and orthographic correctness, its consistency of terminology and style, and its freedom from ambiguity and meaningless sequences of words. These roadblocks to translation (whether purely human or MT-aided) occur frequently in the work I face translating reports prepared by staff writers in NGOs, and they signal the clear need for entities of all sorts to edit their own work carefully before submitting it for translation. I believe the situation described can significantly increase the time and cost, and decrease the quality and usefulness, of the resulting translated documents.
  • On Jan 18, GPI said:
    Enrique, thanks very much for reading and for your comment. Indeed, as with most things: "Garbage in, garbage out." It is imperative to have good-quality source material that is simple, grammatically correct, free of slang and jargon, and built of well-structured sentences. I hope to cover this more in future blogs. Thanks again!
  • On Jan 20, Doris Ganser said:
    In addition to the impeccable source copy to which Enrique Woll Battistini referred above (don't we all wish it existed), I am wondering whether many of these much-praised "better" machine translations are indeed translations "from scratch," or whether they are perhaps based on excellent existing translation memories produced by humans.
  • On Jan 21, GPI said:
    Doris, most likely the engines that produced those results were trained using very robust, high quality Translation Memories, which indeed are the product of good, professional, human translators. Good machine translation engines become that way because they are built and trained using very good data from humans, which we'll cover more in blogs to come :)


Federico has over 10 years' experience as a globalization engineer managing a wide range of software and website globalization projects (internationalization I18n + localization L10n). His expertise spans software and website internationalization and localization processes, standards and tools as well as locale specific SEO. Federico has completed hundreds of successful globalization engagements serving as lead I18n architect involving different programming languages. He is a certified developer in several content management systems and helps clients create world-ready applications, utilizing development practices that are faster, more economical, and more localization-friendly.
