Masaryk University Faculty of Arts

Machine translation and the Czech language

3.3 Machine translation and the Czech language

The use of machine translation in the Czech environment faces two serious obstacles: a limited number of users (speakers) and a very complex grammatical system.

Statistical (or data-driven) approaches obtain linguistic knowledge from vast collections of concrete example texts. While it is sufficient to use text in a single language for training, e. g., a spell checker, parallel texts in two (or more) languages have to be available for training a machine translation system. The machine learning algorithm then learns patterns of how words, short phrases and complete sentences are translated. This statistical approach usually requires millions of sentences to boost performance quality. This is one reason why search engine providers are eager to collect as much written material as possible. (Bojar, Rehm, and Uszkoreit 43)

Statistical MT systems work well for large languages, but Czech is a smaller language with a limited number of speakers, and the statistical method alone is not sufficient for accurate machine translation. Instead, Czech requires a combination of statistical and rule-based machine translation systems.
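As a rough illustration of how a data-driven system can learn word correspondences from parallel text, the following minimal sketch runs the expectation-maximisation procedure of IBM Model 1, a classic word-alignment model, on a three-sentence toy corpus. The corpus, the function names, and the choice of IBM Model 1 are assumptions made here for illustration only; nothing in this text states which models Google Translate or Bing Translator actually use.

```python
# Toy parallel corpus (hypothetical; invented for this illustration only).
corpus = [
    ("open the door", "otevřete dveře"),
    ("close the door", "zavřete dveře"),
    ("open the window", "otevřete okno"),
]
pairs = [(en.split(), cs.split()) for en, cs in corpus]


def train_ibm1(pairs, iterations=20):
    """Estimate t(cs_word | en_word) with IBM Model 1 EM (no NULL word)."""
    # Start from the same value for every word pair that co-occurs.
    t = {(c, e): 1.0 for en, cs in pairs for c in cs for e in en}
    for _ in range(iterations):
        count = {}
        total = {}
        # E-step: distribute each Czech word over the English words
        # of the same sentence pair, in proportion to the current t.
        for en, cs in pairs:
            for c in cs:
                norm = sum(t[(c, e)] for e in en)
                for e in en:
                    frac = t[(c, e)] / norm
                    count[(c, e)] = count.get((c, e), 0.0) + frac
                    total[e] = total.get(e, 0.0) + frac
        # M-step: renormalise the expected counts per English word.
        t = {(c, e): value / total[e] for (c, e), value in count.items()}
    return t


def best_translation(t, en_word):
    """Return the Czech word with the highest probability for en_word."""
    candidates = {c: p for (c, e), p in t.items() if e == en_word}
    return max(candidates, key=candidates.get)


t = train_ibm1(pairs)
for word in ("open", "close", "door", "window"):
    print(word, "->", best_translation(t, word))
```

Even in this tiny example the function word "the" has no Czech counterpart and stays ambiguous, which hints at why real systems need millions of sentence pairs and why a smaller language with less parallel text is at a disadvantage.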

4 The use of machine translation in technical translation from English into Czech

This part of the thesis analyzes translations of technical texts (user guides, user instructions, documentation, and web pages) to highlight the common errors produced by statistical machine translation systems. The machine translation systems used are the free online translation services Google Translate by Google and Bing Translator by Microsoft, as these systems are available to millions of users and are therefore widely used.

Professional translators work with texts translated with the help of different MT systems using different corpora and, most importantly, different term bases (glossaries) which contain the allowed and approved translations of certain terms. For this reason the analyses will not deal with the accuracy of individual terms (nouns), unless they are completely wrong, and will rather focus on morphology and sentence structure.

The technical texts used are available in both Czech and English versions to ensure that the quality of translation can be assessed correctly.

4.1 Examples of technical machine-translated texts

The first three examples originate from a user manual for a dishwasher. They are selected to represent different sentence lengths, moods, and phrases. The language of user manuals and user guides for home appliances is very simple: it uses relatively short sentences and clear instructions, and individual terms recur frequently. The aim of such a manual is to provide the user with all necessary information and to avoid any possible confusion. User manuals are therefore a good example of texts that are suitable for machine translation.

4.1.1 User manual

Table 1

Electrolux ESI64030 Dishwasher user manual (sentences with simple text)

English original text:

To switch on the appliance or to set a washing programme, turn the programme knob clockwise or counterclockwise.

Czech human translation:

Zapnutí spotřebiče nebo nastavení mycího programu se provádí otočením voliče programu doprava nebo doleva.

Google Translate:

Pro zapnutí přístroje nebo nastavit mycí program, otočte voličem programu ve směru hodinových ručiček nebo proti směru hodinových ručiček.

Bing Translator:

Přístroj nebo nastavení pracího programu, otočte knoflík program doprava nebo doleva.

The first example sentence originates from a user manual of an Electrolux dishwasher. Both automated translations contain significant errors. The first part of the sentence produced by Google Translate requires a correction of the verb “nastavit”, as it should correspond with the form of the preceding verb “zapnout (zapnutí)”. However, the phrase “Pokud chcete spotřebič zapnout nebo nastavit mycí program” or the human translation used here, “Zapnutí spotřebiče nebo nastavení mycího programu”, sounds better and is more understandable to a Czech user. The second part of the sentence is translated correctly, but Czech users are accustomed to turning knobs “doprava” (to the right) or “doleva” (to the left). Bing Translator omitted the verb “to switch on” completely. The second part of its sentence is very good.

Table 2

Electrolux ESI64030 Dishwasher user manual (very short sentences)

English original text:

Open the door. Remove the lower basket.

Czech human translation:

Otevřete dveře. Odstraňte dolní koš.

Google Translate:

Otevřít dveře. Vyjměte dolní koš.

Bing Translator:

Otevřete dveře. Odstraňte spodní koš.

The translated sentences in Table 2 are very short and contain only a predicate (verb) and an object. Bing Translator produced practically the same translation as the human translator. Google Translate did not recognize the imperative in the first sentence and used the infinitive form of the verb, “otevřít”. The structure of both translations is correct.
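The comparison in Table 2 can also be made roughly quantitative. The sketch below counts how many word-level edits (insertions, deletions, substitutions) separate each machine translation from the human translation. The choice of token-level edit distance is an assumption made here for illustration; it is only a crude proxy, since it does not distinguish a lexical difference (“dolní” vs. “spodní”) from a grammatical one (“Otevřete” vs. “Otevřít”).

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


# Sentences taken from Table 2, split on whitespace.
human = "Otevřete dveře. Odstraňte dolní koš.".split()
google = "Otevřít dveře. Vyjměte dolní koš.".split()
bing = "Otevřete dveře. Odstraňte spodní koš.".split()

print(edit_distance(human, google))  # 2
print(edit_distance(human, bing))    # 1
```

The numbers agree with the observation above that the Bing Translator output is closer to the human translation, but a post-editor would still need to judge which differences are actual errors.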

Table 3

Electrolux ESI64030 Dishwasher user manual (longer sentences)

English original text:

If the hot water comes from alternative sources of energy that are more environmentally friendly (e.g. solar or photovoltaic panels and aeolian), use a hot water supply to decrease energy consumption.

Czech human translation:

Pokud odebíráte horkou vodu z alternativních zdrojů, které jsou šetrnější k životnímu prostředí (např. solární či fotovoltaické panely, nebo větrná energie), použijte horkou vodu ke snížení spotřeby energie.

Google Translate:

Dostane-li se teplá voda z alternativních zdrojů energie, které jsou šetrnější k životnímu prostředí (např. Solární či fotovoltaické panely a aeolian), použijte dodávku teplé vody pro snížení spotřeby energie.

Bing Translator:

Je-li teplá voda pochází z alternativních zdrojů energie, které jsou k životnímu prostředí (např. solární a Fotovoltaické panely a Liparské), pomocí horké vody snížit spotřebu energie.

The third example from the dishwasher user manual is a longer sentence, which would normally imply more extensive post-editing by a human translator. However, Google Translate produced a surprisingly accurate translation. A human translator or reviewer would in this case only have to choose the correct equivalents for the words “comes” and “aeolian” and could leave the rest of the sentence untouched, since it is correct in terms of morphology and sentence structure. The translation provided by Bing Translator, on the contrary, contains serious errors. The attribute (modifier) “friendly (šetrný)” is omitted, and once again the imperative is not recognized. The translated text fails the most important requirement: it is not understandable.

4.1.2 Instruction manual

The second technical text used for this analysis is the instruction manual for Picture Style Editor, software provided by Canon, Inc. The previous user manual contained mostly very short sentences intended to convey clear instructions that could be understood by the general public. This instruction manual is intended for professionals and enthusiasts in the field of digital photography. The text consists of longer and more complex sentences with terminology from the worlds of both software and photography.

Table 4

Picture style creating software – Picture Style Editor version 1.16 – Instruction Manual (sentences with simple text)

English original text:

Square brackets are used to indicate items such as menu names, button names and window names that appear on the computer screen.

Czech human translation:

Hranaté závorky se používají k označení položek, jako jsou například názvy nabídek, názvy tlačítek a názvy oken, které se zobrazí na obrazovce počítače.

Google Translate:

Hranaté závorky se používají k označení položky, jako jsou názvy menu, názvy tlačítek a názvy oken, které se objeví na obrazovce počítače.

Bing Translator:

Hranaté závorky se používají k označení položky jako menu názvy, názvy tlačítek a okno zobrazené na obrazovce počítače.

The first sentence selected from the instruction manual is very simple and contains no technical information or terminology. All the words are in very common use, and the translations reflect this fact. Google Translate made an error in the number of the noun “položka”: it used the genitive singular “položky” instead of the correct plural. However, this is the only error in this translation. The human translator would correct the number and could use the rest of the translation as it stands (the approved translation of the word “menu” in the software field is “nabídka” according to the widely used Microsoft Style Guide, but this would be handled by an associated glossary). The correct use of commas in the object is worth noticing. Bing Translator made the same error in the number of the noun “položka” and also of the noun “okno”, and it translated the phrase “such as” mechanically as “jako” without the verb “to be”. The first part of the sentence is obviously very common, and the system translated it correctly; unfortunately, it made many mistakes in the second part.

Table 5

Picture style creating software – Picture Style Editor version 1.16 – Instruction Manual (shorter sentences)

English original text:

The adjustments made with the [Tool palette] are immediately applied to the image in the main window, enabling you to check the results as you work.

Czech human translation:

Úpravy provedené pomocí okna [Tool palette/Paleta nástrojů] se okamžitě projeví na snímku v hlavním okně, takže můžete kontrolovat výsledky během práce.

Google Translate:

Úpravy provedené pomocí palety nástrojů [] se okamžitě aplikuje na obraz v hlavním okně, což vám umožní zkontrolovat výsledky, jak si práci.

Bing Translator:

Úpravy s [paletu nástrojů] se okamžitě projeví obrázek v hlavním okně, což umožňuje kontrolovat výsledky při práci.

Both sentences produced by the MT systems contain errors that require human involvement. The basic structure of the original sentence is preserved and the meaning is clear. Subject-verb agreement is violated by Google Translate (the adjustments are applied > úpravy se aplikuje). The agreement appears to be preserved in the translation by Bing Translator; however, in this case the form of the verb “projevit se” is the same for the 3rd person singular and the 3rd person plural, so a possible error cannot be detected.

Table 6

Picture style creating software – Picture Style Editor version 1.16 – Instruction Manual (longer sentence with a more complex syntax)

English original text:

In addition to displaying colors before and after adjustment, the adjustment color list (p.8) shows overlapped range of effect of adjusted colors, and has a checkbox for specifying whether applying adjusted colors or not.

Czech human translation:

V seznamu upravených barev (str. 8) se kromě barev před úpravou a po úpravě zobrazí symbol překrytí platných rozsahů upravených barev a zaškrtávací políčka umožňující určit, zda budou upravené barvy použity nebo nikoli.

Google Translate:

Kromě zobrazení barev před a po očištění se seznam nastavení barev (str.8) ukazuje překrývající spektrum účinku upravených barev a má políčko určující, zda použití upravených barev, nebo ne.

Bing Translator:

Kromě zobrazení barvy před a po úpravě, úprava seznamu Barva (p.8) ukazuje překrývající spektrum účinku upravená barev a má políčka určující, zda použití upravit barvy nebo ne.

The third sentence selected from the instruction manual shows a weakness of machine translation: the translation of longer and more complex sentences. The texts produced by Google Translate and Bing Translator are very similar (they share the segments “kromě zobrazení barev/barvy před a po” and “ukazuje překrývající spektrum účinku”) but confusing. The systems know how to translate some collocations but fail to compose a meaningful sentence. The resulting translation is poor and requires significant edits by the translator, who can use only the last part of the sentence, and even this needs correction.

4.1.3 Technical documentation

The last analyzed text comes from the technical documentation of Domat Control System, a company that produces room controls, regulators, and other similar peripheral devices. The text is intended for designers and contains a mixture of simple and more complex sentences with highly specialized terminology. It provides detailed descriptions and instructions.

Table 7

Domat Control System – RcWare Vision Function Overview (short sentence)

English original text:

RcWare is a SCADA system with rich possibilities of integration.

Czech human translation:

RcWare Vision je vizualizační systém s bohatými možnostmi integrace.

Google Translate:

RcWare je SCADA systém s bohatými možnostmi integrace.

Bing Translator:

RcWare je SCADA systém s bohatými možnostmi integrace.

Google Translate and Bing Translator produced identical, correct translations. The only difference between the texts produced by the MT services and the human translation is that the translator used the collocation “vizualizační systém” instead of the abbreviation SCADA (Supervisory Control and Data Acquisition).

Table 8

Domat Control System – RcWare Vision Function Overview (short sentence with complicated content)

English original text:

The system is configured by defining and editing communication channels (serial lines, remote RS232 over Ethernet ports, OPC, etc.) and data points.

Czech human translation:

V systému se nejprve definují komunikační kanály (sériové linky, RS232 přes porty Ethernet, OPC, atd.) a datové body.

Google Translate:

Systém je nakonfigurován definováním a editaci komunikační kanály (sériové linky, dálkový RS232 přes porty Ethernet, OPC, atd) a datových bodů.

Bing Translator:

Systém je nakonfigurován pomocí definování a úpravy komunikačních kanálů (sériové linky, Vzdálená RS232 přes Ethernet porty, OPC, atd.) a datové body.

The output of both machine translation systems is acceptable in this case too, despite the more complicated sentence. The MT systems could not handle the declension of the collocation “communication channels”, and the spelling also needs correction. However, the translation as a whole is not bad.

Table 9

Domat Control System – RcWare Vision Function Overview (sentence with highly technical content)

English original text:

The SQL database is open for 3rd party programs so that the RcWare Vision station can be used as a data integrator, providing actual values e.g. over an OPC server, and history readouts over a SQL database or/and automatic export files.

Czech human translation:

Databáze SQL je pro ostatní programy otevřená, takže stanice RcWare Vision může pracovat jako koncentrátor dat z různých systémů a poskytovat aktuální hodnoty např. přes OPC server a historická data přes databázi SQL a automaticky exportované soubory.

Google Translate:

Databáze SQL je otevřena pro programy 3. stran, takže stanice RcWare Vision může být použit jako datový integrátor a poskytovat aktuální hodnoty např. přes OPC server a historická data přes databázi SQL a / nebo automaticky exportované soubory.

Bing Translator:

Databáze SQL je otevřené pro 3rd strana programy tak, že stanice RcWare Vision slouží jako údaje integrátor, poskytuje skutečné hodnoty například přes OPC server a historie komor přes databázi SQL nebo / a Automatický export souborů.

The last selected sentence is an example of a longer, highly technical sentence; nevertheless, Google Translate produced a very good translation. There are only a few minor issues (“3. stran” and “a / nebo”) that require correction. The translation provided by Bing Translator contains more problematic segments: an untranslated ordinal number (“3rd”), incorrect subject-verb agreement (“databáze je otevřené”), and the needlessly capitalized word “Automatický”.
