Machine translation

{{Refimprove|date=June 2008}}
{{TOCright}}
'''Machine translation''', sometimes referred to by the abbreviation '''MT''', is a sub-field of [[computational linguistics]] that investigates the use of [[computer software]] to [[translation|translate]] text or speech from one [[natural language]] to another. At its most basic level, MT performs simple [[substitution]] of words in one natural language for words in another. Using [[corpus linguistics|corpus]] techniques, more complex translations may be attempted, allowing for better handling of differences in [[linguistic typology]], phrase [[recognition]], and translation of [[idiom]]s, as well as the isolation of anomalies.

Current machine translation software often allows for customisation by domain or [[profession]] (such as [[meteorology|weather reports]]), improving output by limiting the scope of allowable substitutions. This technique is particularly effective in domains where formal or formulaic language is used. It follows that machine translation of government and legal documents more readily produces usable output than machine translation of conversation or less standardised text.

Improved output quality can also be achieved by human intervention: for example, some systems are able to translate more accurately if the user has [[word sense disambiguation|unambiguously identified]] which words in the text are names. With the assistance of these techniques, MT has proven useful as a tool to assist human translators, and in some cases can even produce output that can be used "as is". However, current systems are unable to produce output of the same quality as a human translator, particularly where the text to be translated uses casual language.

==History==
{{main|History of machine translation}}
The history of machine translation begins in the 1950s, after [[World War II]].
The [[Georgetown-IBM experiment|Georgetown experiment]] (1954) involved fully automatic translation of over sixty [[Russian language|Russian]] sentences into [[English language|English]]. The experiment was a great success and ushered in an era of substantial funding for machine-translation research. The authors claimed that within three to five years, machine translation would be a solved problem. Real progress was much slower, however, and after the [[ALPAC|ALPAC report]] (1966), which found that ten years of research had failed to fulfil expectations, funding was greatly reduced. Beginning in the late 1980s, as [[computation]]al power increased and became less expensive, more interest was shown in [[statistical machine translation|statistical models for machine translation]].

The idea of using digital computers for translation of natural languages was proposed as early as 1946 by A. D. Booth and possibly others. The Georgetown experiment was by no means the first such application: in 1954 a rudimentary translation of English into French was demonstrated on the APEXC machine at Birkbeck College (University of London). Several papers on the topic were published at the time, and even articles in popular journals (see, for example, Cleave and Zacharov, ''Wireless World'', September 1955). A similar application, also pioneered at Birkbeck College at the time, was reading and composing Braille texts by computer.

More recently, the Internet has emerged as a global information infrastructure, revolutionising access to information as well as its rapid transfer and exchange. With the Internet and e-mail, people need to communicate rapidly over long distances and across continental boundaries. Not all of these Internet users, however, can use their own language to communicate with people who speak other languages.
Machine translation software may therefore, in the near future, allow people around the world to communicate with one another in their own mother tongues.<ref>Hary Gunarto, "Building Dictionary as Basic Tool for Machine Translation in Natural Language Processing Applications", ''Journal of Ritsumeikan Studies in Language and Culture'', Vol. 15, No. 3, Kyoto, February 2004, pp. 177-185.</ref>

==Translation process==
{{main|Translation process}}
The [[translation process]] may be stated as:
# [[Decoding]] the [[meaning (linguistic)|meaning]] of the [[source text]]; and
# Re-[[encoding]] this [[meaning (linguistic)|meaning]] in the [[target language]].

Behind this ostensibly simple procedure lies a complex [[cognitive]] operation. To decode the meaning of the [[source text]] in its entirety, the translator must interpret and analyse all the features of the text, a process that requires in-depth knowledge of the [[grammar]], [[semantics]], [[syntax]], [[idiom]]s, etc., of the [[source language]], as well as the [[culture]] of its speakers. The translator needs the same in-depth knowledge to re-encode the meaning in the [[target language]].

Therein lies the challenge in machine translation: how to program a computer that will "understand" a text as a person does, and that will "create" a new text in the [[target language]] that "sounds" as if it has been written by a person. This problem may be approached in a number of ways.

==Approaches==
[[Image:Direct translation and transfer translation pyramind.svg|thumb|right|300px|Pyramid showing comparative depths of intermediary representation, [[interlingual machine translation]] at the peak, followed by transfer-based, then direct translation.]]
Machine translation can use a method based on [[Expert System|linguistic rules]], which means that words are translated according to linguistic rules: the most suitable words of the target language replace those of the source language.
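The idea of replacing source-language words with target-language words according to linguistic rules can be illustrated with a deliberately tiny sketch. The Python code below is not taken from any real system: the English-Spanish lexicon, the part-of-speech tags and the single reordering rule are all hypothetical, chosen only to contrast plain word-for-word substitution with substitution plus one structural rule. It ignores morphology, gender agreement and word-sense ambiguity, all of which a real rule-based system must handle.

```python
# Toy English -> Spanish lexicon (hypothetical, for illustration only):
# each entry maps an English word to (Spanish word, part-of-speech tag).
LEXICON = {
    "the": ("el", "DET"),
    "black": ("negro", "ADJ"),
    "red": ("rojo", "ADJ"),
    "cat": ("gato", "NOUN"),
    "car": ("coche", "NOUN"),
    "sees": ("ve", "VERB"),
}


def lookup(word):
    """Return (target word, tag); unknown words pass through untranslated."""
    return LEXICON.get(word.lower(), (word, "UNK"))


def word_for_word(sentence):
    """Plain substitution: replace each word and keep the source word order."""
    return " ".join(lookup(w)[0] for w in sentence.split())


def reorder_adjectives(tagged):
    """One hypothetical rule: English ADJ NOUN becomes Spanish NOUN ADJ."""
    out = list(tagged)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == "ADJ" and out[i + 1][1] == "NOUN":
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # do not reconsider the pair that was just swapped
        else:
            i += 1
    return out


def translate(sentence):
    """Look up every word, then apply the reordering rule before output."""
    tagged = [lookup(w) for w in sentence.split()]
    return " ".join(word for word, _tag in reorder_adjectives(tagged))


print(word_for_word("the black cat sees the red car"))  # el negro gato ve el rojo coche
print(translate("the black cat sees the red car"))      # el gato negro ve el coche rojo
```

The reordering rule only produces correct Spanish here because every noun in the toy lexicon is masculine; a phrase such as "the red house" would also need an agreement rule to yield "la casa roja", which is one reason rule-based systems end up with large rule sets.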
It is often argued that the success of machine translation requires the problem of [[natural language processing|natural language understanding]] to be solved first.

Generally, rule-based methods parse a text, usually creating an intermediary, symbolic representation, from which the text in the target language is generated. According to the nature of the intermediary representation, an approach is described as [[interlingual machine translation]] or [[transfer-based machine translation]]. These methods require extensive [[lexicon]]s with [[morphology (linguistics)|morphological]], [[syntax|syntactic]], and [[semantics|semantic]] information, and large sets of rules.

Given enough data, machine translation programs often work well enough for a [[native speaker]] of one language to get the approximate meaning of what is written by a native speaker of another. The difficulty is getting enough data of the right kind to support the particular method. For example, the large multilingual [[Text corpus|corpus]] of data needed for statistical methods to work is not necessary for the grammar-based methods. The grammar-based methods, however, need a skilled linguist to carefully design the grammar that they use.

To translate between closely related languages, a technique referred to as [[shallow-transfer machine translation]] may be used.

===Rule-based===
{{main|Rule-based machine translation}}
The rule-based machine translation paradigm includes transfer-based, interlingual and dictionary-based machine translation.

'''''Transfer-based machine translation'''''
{{main|Transfer-based machine translation}}

'''''Interlingual'''''
{{main|Interlingual machine translation}}
Interlingual machine translation is one of the rule-based approaches to machine translation. In this approach, the source text, i.e. the text to be translated, is transformed into an interlingua, i.e. a representation that is independent of both the source and the target language.
The target-language text is then generated from the [[interlinguistics|interlingua]].

'''''Dictionary-based'''''
{{main|Dictionary-based machine translation}}
Machine translation can use a method based on [[dictionary]] entries, which means that words are translated as a dictionary translates them.

===Statistical===
{{main|Statistical machine translation}}
Statistical machine translation tries to generate translations using [[statistical methods]] based on bilingual text corpora, such as the [[Hansard#Canadian hansard and machine translation|Canadian Hansard]] corpus (the English-French record of the Canadian Parliament) and [[EUROPARL]] (the record of the [[European Parliament]]). Where such corpora are available, impressive results can be achieved when translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was [[CANDIDE]] from [[IBM]]. Google used [[SYSTRAN]] for several years, but switched to a statistical translation method in October 2007. Google has also improved its translation capabilities by training its system on approximately 200 billion words from [[United Nations]] materials, and the accuracy of its translations has improved as a result.<ref>[http://blog.outer-court.com/archive/2005-05-22-n83.html Google Translator: The Universal Language]</ref>

===Example-based===
{{main|Example-based machine translation}}
The example-based machine translation (EBMT) approach is often characterised by its use of a bilingual [[corpus]] as its main knowledge base at run time. It is essentially translation by [[analogy]] and can be viewed as an implementation of the [[case-based reasoning]] approach to [[machine learning]].

==Major issues==

===Disambiguation===
{{main|Word sense disambiguation}}
Word sense disambiguation concerns finding a suitable translation when a word can have more than one meaning.
The problem was first raised in the 1950s by [[Yehoshua Bar-Hillel]].<ref>[http://ourworld.compuserve.com/homepages/WJHutchins/Miles-6.htm Milestones in machine translation - No.6: Bar-Hillel and the nonfeasibility of FAHQT] by John Hutchins</ref> He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word.<ref>Bar-Hillel (1960), "Automatic Translation of Languages". Available online at http://www.mt-archive.info/Bar-Hillel-1960.pdf</ref> His best-known example is the sentence "The box was in the pen", where only knowledge of the world tells a reader that ''pen'' means an enclosure rather than a writing instrument.

Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches. Shallow approaches assume no knowledge of the text; they simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful. {{Fact|date=April 2007}}

===Named entities===
Handling named entities is related to [[named entity recognition]] in [[information extraction]].

==Applications==
There are now many [[software]] programs for translating natural language, several of them [[online]], such as the [[SYSTRAN]] system, which powers [[AltaVista]]'s [[Babel Fish (website)|Babel Fish]] and formerly powered [[Google]] Translate, and [[Promt]], which powers the online translation services at Voila.fr and Orange.fr. Although no system provides the holy grail of "fully automatic high quality machine translation" (FAHQMT), many systems produce reasonable output. Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the [[European Commission]]. [[Toggletext]] uses a transfer-based system (known as Kataku) to translate between [[English language|English]] and [[Indonesian language|Indonesian]].
[[Google]] has claimed that promising results were obtained using a proprietary statistical machine translation engine.<ref>[http://googleblog.blogspot.com/2005/08/machines-do-translating.html Google Blog: The machines do the translating] (by [[Franz Och]])</ref> In tests conducted by the National Institute of Standards and Technology (NIST) in summer 2006, the statistical translation engine used in the [[Google tools#anchor_language_tools|Google language tools]] for Arabic <-> English and Chinese <-> English achieved an overall BLEU-4 score of 0.4281, ahead of the runner-up, IBM, with 0.3954.<ref>[http://ieeexplore.ieee.org/iel5/2/32474/01516048.pdf?arnumber=1516048 Geer, David, "Statistical Translation Gains Respect", pp. 18-21, IEEE Computer, October 2005]</ref><ref>[http://www.wired.com/wired/archive/14.12/translate.html Ratcliff, Evan "Me Translate Pretty One Day", Wired December 2006]</ref><ref>[http://www.nist.gov/speech/tests/mt/mt06eval_official_results.html "NIST 2006 Machine Translation Evaluation Official Results", November 1, 2006]</ref>

[[Uwe Muegge]] has implemented a demo website<ref>[http://www.muegge.cc This demo website uses a controlled language in combination with the Google engine]</ref> that uses a [[controlled language]] in combination with the [[Google tools#anchor_language_tools|Google tool]] to produce fully automatic, high-quality machine translations of his English, German, and French web sites.

With the recent focus on terrorism, military sources in the United States have been investing significant amounts of money in natural language engineering. ''In-Q-Tel''<ref>[http://www.in-q-tel.com In-Q-Tel]</ref> (a [[venture capital]] fund, largely funded by the US Intelligence Community, to stimulate new technologies through private sector entrepreneurs) has backed companies such as [[Language Weaver]]. Currently the military community is interested in translation and processing of languages such as [[Arabic language|Arabic]], [[Pashto language|Pashto]], and [[Dari language|Dari]].
{{Fact|date=February 2007}} The Information Processing Technology Office at [[DARPA]] hosts programs such as [[DARPA TIDES program|TIDES]] and [[Babylon translator|Babylon Translator]]. The US Air Force has awarded a $1 million contract to develop language translation technology.<ref>[http://www.gcn.com/vol1_no1/defense-technology/23450-1.html GCN: Air Force wants to build a universal translator]</ref>

== Evaluation ==
{{main|Evaluation of machine translation}}
There are various means for evaluating the performance of machine-translation systems. The oldest is the use of human judges<ref>[http://www.morphologic.hu/public/mt/2008/compare12.htm Compare MT systems by human evaluation, May 2008]</ref> to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable way to compare different systems, such as rule-based and statistical systems. [[Automate]]d means of evaluation include [[Bilingual evaluation understudy|BLEU]], [[NIST (metric)|NIST]] and [[METEOR]].

Relying exclusively on machine translation ignores the fact that communication in [[natural language|human language]] is [[wiktionary:context|context]]-embedded, and that it takes a human to adequately comprehend the context of the original text. Even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be of publishable quality and useful to a human, it must be reviewed and edited by a human. It has, however, been asserted that in certain applications, e.g. product descriptions written in a [[controlled language]], a [[dictionary-based machine translation|dictionary-based machine-translation]] system has produced satisfactory translations that require no human intervention.<ref>Muegge (2006), "Fully Automatic High Quality Machine Translation of Restricted Text: A Case Study," in ''Translating and the computer 28. Proceedings of the twenty-eighth international conference on translating and the computer, 16-17 November 2006, London'', London: Aslib. ISBN 978-0-85142-483-5.</ref>

== See also ==
* [[Artificial Intelligence]]
* [[Comparison of Machine translation applications]]
* [[Computational linguistics]]
* [[Computer-assisted translation]]
* [[Controlled natural language]]
* [[History of machine translation]]
* [[Human Language Technology]]
* [[List of emerging technologies]]
* [[List of research laboratories for machine translation]]
* [[Pseudo-translation]]
* [[Translation]]
* [[Universal translator]]
* [[Wiktionary:Translations]]

==References==
{{reflist}}
*{{cite book | last = Hutchins | first = W. John | authorlink = John Hutchins | coauthors = Harold L. Somers | year = 1992 | title = An Introduction to Machine Translation | url = http://www.hutchinsweb.me.uk/IntroMT-TOC.htm | publisher = Academic Press | location = London | id = ISBN 0-12-362830-X}}

== External links==
{{Wikiversity|Topic:Computational linguistics}}
*[http://www.eamt.org/iamt.html International Association for Machine Translation (IAMT)]
*[http://www.essex.ac.uk/linguistics/clmt/MTbook/ Machine Translation], an introductory guide to MT by D. J. Arnold et al. (1994)
*[http://www.mt-archive.info Machine Translation Archive] by [[John Hutchins]]. An electronic repository (and bibliography) of articles, books and papers in the field of machine translation and computer-based translation technology
*[http://www.hutchinsweb.me.uk/ Machine translation (computer-based translation)]: publications by John Hutchins (includes [[PDF format|PDF]]s of several books on machine translation)
*[http://bowland-files.lancs.ac.uk/monkey/ihe/mille/paper2.htm Machine Translation and Minority Languages]
*[http://www.foreignword.com/Technology/art/Hutchins/hutchins99.htm John Hutchins 1999]

===Software===
*[http://xixona.dlsi.ua.es/apertium-www/ Apertium - an open-source shallow-transfer machine translation engine and toolbox]

{{Approaches to machine translation}}

[[Category:Artificial intelligence applications]]
[[Category:Computational linguistics]]
[[Category:Machine translation|*]]
[[Category:Natural language processing]]

[[af:Outomatiese vertaling]]
[[ar:ترجمة آلية]]
[[be-x-old:Машынны пераклад]]
[[bg:Компютърен превод]]
[[ca:Traducció automàtica]]
[[cs:Strojový překlad]]
[[cy:Peiriant cyfieithu]]
[[da:Maskinoversættelse]]
[[de:Maschinelle Übersetzung]]
[[es:Traducción automática]]
[[eo:Maŝintradukado]]
[[eu:Itzulpengintza automatikoa]]
[[fa:ترجمه ماشینی]]
[[fr:Traduction automatique]]
[[ko:기계 번역]]
[[hi:मशीनी अनुवाद]]
[[hr:Strojno prevođenje]]
[[id:Terjemahan mesin]]
[[he:תרגום מכונה]]
[[lt:Automatinis vertimas]]
[[hu:Gépi fordítás]]
[[ms:Terjemahan mesin]]
[[nl:Computervertaling]]
[[ja:機械翻訳]]
[[no:Maskinoversettelse]]
[[oc:Traduccion automatica]]
[[pl:Tłumaczenie automatyczne]]
[[pt:Tradução automática]]
[[ro:Traducere automată]]
[[ru:Машинный перевод]]
[[simple:Machine translation]]
[[sk:Strojový preklad]]
[[sr:Машинско превођење]]
[[fi:Konekääntäminen]]
[[sv:Maskinöversättning]]
[[th:การแปลภาษาอัตโนมัติ]]
[[tg:Тарҷумаи мошинӣ]]
[[uk:Машинний переклад]]
[[wuu:机器翻译]]
[[zh-yue:機械翻譯]]
[[zh:机器翻译]]