Google Translate and the Struggle for Accurate Machine Translations

Using a pioneering approach to machine translation (MT), search behemoth Google now provides translations from 52 languages through its Google Translate service. Google has capitalized on its access to unfathomable amounts of data, largely in the form of transcripts from the proceedings at the United Nations, which have been rendered into some 23 languages by professional human translators. Google Translate trawls this invaluable source of data, along with text from the Google Books scanning project and additional Internet resources, for likely translation matches. Internet users access the tool tens of millions of times each day to translate information as they surf the web.

While Google Translate has made impressive strides in improving our ability to understand and communicate with the rest of the world, what does the future hold for the service and other machine translation programs? According to the leader of Google’s machine translation team, Franz Och, “This technology can make the language barrier go away.” Other linguistics experts contend that MT will strengthen linguistic diversity by freeing the world from the need to focus on dominant languages such as English. Ironically, one potential consequence of the widespread use of tools like Google Translate is decreased incentive for individuals to learn English and/or become multilingual.

Though some experts claim that Google Translate’s results will improve with time, researchers and computer scientists working on the project note that the system is unlikely to improve dramatically with the addition of more data. “We are now at this limit where there isn’t that much more data in the world that we can use,” notes Andreas Zollmann, a Google Translate team member, “so now it is much more important again to add on different approaches and rules-based models.”

Of course, detractors state that regardless of the technological advances made, machine translation will never learn to pick up on the cultural undertones and subtleties at play in language. Jokes, idioms and wordplay are largely lost on Google Translate, which fails to capture the “flavor” of the text. According to author Douglas Hofstadter, “There is no attempt at creating understanding, and therefore Google Translate is doomed to the same kind of failure forever.”

The Spanish Language in Brazil

The popularity of Spanish as a foreign language continues to grow in Brazil, the only Portuguese-speaking nation on a continent dominated by Spanish. Brazil shares a border with seven Spanish-speaking countries, and it conducts a substantial amount of trade with countries where Spanish is spoken (1/4 of exports and 1/5 of imports).

A significant number of non-Brazilian Spanish speakers, estimated at about 1 million people, call the nation home, mostly as the result of immigration from surrounding countries. Sephardic Jews – who speak both Ladino and Spanish – settled in Brazil and now compose a small portion of the country’s Spanish-speaking peoples.

With an eye toward more fully integrating Brazil with its Spanish-speaking neighbors and partners in the South American trade bloc Mercosur, the Brazilian Congress passed an education bill in 2005 requiring all secondary schools to offer Spanish as a second language. This legislation spurred an increase in resources dedicated to Spanish, and the number of Brazilian students studying español has increased from one million to five million in a period of just five years. A recent agreement between Spain’s Cervantes Institute, an organization devoted to promoting the Spanish language worldwide, and the Brazilian Ministry of Education provides for the training of 26,000 Spanish teachers to manage the increased demand sparked by the 2005 bill.

Spanish in the World

With approximately 400 million native speakers worldwide, Spanish is currently the second most widely spoken language, just behind Mandarin Chinese and ahead of English and Hindi/Urdu. Spanish is also the second most commonly used language on the Internet, trailing English. Most linguistic studies indicate that English, Spanish and Chinese will dominate as the languages of international communication and commerce in the 21st century.

The image of the Spanish language seems to have undergone a makeover in the last few years, resulting in its growth as a language of international communication. Many now view Spanish as a practical, useful language thanks to its demographic power. The use of the language in over 20 countries as well as its foothold in key places such as the United States provides incentive for people to learn Spanish as an investment in their professional futures, especially in the case of young people.

The Spanish language continues to grow at an astounding rate in the United States. Each year more than 1.5 million new speakers join the ranks. Brazil has also seen tremendous growth in the number of students choosing to study Spanish. The governments of countries such as Brazil, the Philippines, France and Italy have invested in high-quality Spanish language education for their citizens, recognizing the growing impact and importance of the language.

The Future of Spanish in the United States

Many scholars and researchers agree that the Spanish language’s future in the United States looks promising, though it seems highly unlikely that Spanish will unseat English as the nation’s dominant language. There are those, however, who argue that Spanish faces the possibility of diminishing influence in the United States over time, the result of a language shift seen previously in other immigrant groups.

A study commissioned by Hispanic USA, a market resource firm targeted at Latinos, estimates that by the year 2025, the tally of Spanish-speaking Latinos in the U.S. will climb to some 40 million. The study challenges the notion that the use of Spanish will decline as future generations of Latinos are born and raised in this country in the coming years. The authors of the Hispanic USA study claim that, unlike other immigrant groups, those born in the U.S. to Latino parents will continue to speak Spanish in exceptionally large numbers.

In a separate but related analysis, linguist Steve Schaufele notes: “Given the current health of the U.S. Hispanic community and the level of its emotional investment in its distinctive culture, I would say that American Spanish as one of the principal vehicles of that culture has an excellent chance of surviving indefinitely.”

In addition, the pool of new Spanish-speaking immigrants favors the continued importance of the language in the United States. Over half of the legal immigrants who arrive annually hail from Spanish-speaking countries, and the percentage is even higher for undocumented immigrants.

With the emergence of the Internet as a tool for international communication, the increasing growth and importance of the global economy, and the sheer number of Spanish speakers worldwide, there are more chances to use the language in the United States and greater economic incentives to retain and promote the use of Spanish.

However, some are not as optimistic about the future of the Spanish language in the U.S. They point to conflicting studies that reveal a failure on the part of second- and third-generation Hispanics to preserve the language, a trend which is gradually diminishing the pool of Spanish speakers. Additionally, recent studies by sociologists indicate a rapid shift to English among the children of immigrants.

Only time will tell the fate of the Spanish language in the United States, but it appears that, by most accounts, Spanish is here to stay.

Welsh language influence on English

From arctic birds to nicknames, the influence of Wales on the English language has been underestimated, says a Celtic Studies expert.

Compilers of the new online version of the Oxford English Dictionary (OED) say penguin, Taffy and cariad are examples of Welsh words adopted by English.

Poet Dylan Thomas is also responsible for 635 entries, they said.

Prof John Koch of the University of Wales said: “The two languages have lived side-by-side for 1,500 years.”

The OED, first published in 1884, this week relaunched itself online.

It claims to be the only English dictionary that tries to trace the first known use of every sense of every word in the English language.

And to prove the point its compilers have pointed to the number of entries that originated from Welsh.

The earliest recorded use of ‘penguin’ can be traced back to Wales, they said.

Apparently in spite of the fact that most penguins have black heads, the OED’s compilers said Welsh coined the term from pen meaning head and gwyn meaning white.

OED quotes the first written citation from 1577: “Infinite were the Numbers of the foule, wch the Welsh men name Pengwin & Maglanus tearmed them Geese.” [sic]

According to the OED, the word ‘Taffy’, a nickname for a Welshman, has its roots in the pronunciation of Dafydd.

‘Cariad’, a Welsh term of affection, is referenced as far back as the 13th century, from caru, meaning to love or woo.

Edmund Weiner, deputy editor of the Oxford English Dictionary, said Dylan Thomas was one of the most cited authors in the OED Online.

“His rich use of language has resulted in his being acknowledged as the source of words and phrases such as ‘moochin’, a difficult or disagreeable person.

“The term to ‘prodnose’, meaning to pry or be inquisitive, is taken from Quite Early One Morning.”

‘Before English’

Prof John Koch of the University of Wales Centre for Advanced Welsh and Celtic Studies in Aberystwyth, said the influence of Welsh on the English language is surprising to many, but should not be.

“Before English was there, Welsh was there,” he said, “and you cannot say that about any other language that English has had contact with.

“The two languages have lived side-by-side for 1,500 years, so it shouldn’t be surprising.”

Prof Koch said that the English language behaved in many ways that could not be attributed to other Germanic languages and its contact with Welsh may be the reason.

“There has been an underestimation from the beginning of the Welsh component in English,” he said. “It probably isn’t massive like that of French or Latin. It’s more under the surface.”

Prof Koch said there were historical and political reasons behind the lack of credit given to the influence of Welsh.

He explained: “In the universities in which people studied the language most people who compiled the dictionaries in the first place did not know a lot about Wales, so it would not have been something they looked for.”

Prof Koch added that many proper names in England came from Wales and most of the names of major rivers in Britain are pre-English.

“That’s something that is well-known by experts but tends to be otherwise overlooked, but the influence of Welsh on English may yet come into its own as a subject.”

Source: BBC News

English words that derive from Welsh

  • Penguin – from pen and gwyn
  • Cwtch – to cuddle
  • Merchet – the Welsh word for daughter, merch, became an English term for a dowry
  • Cariad – commonly used by English speakers in Wales for sweetheart

Source: OED Online

Some History of the Welsh Language:

Welsh is a Celtic language, closely related to Cornish and Breton. The Welsh we speak today is directly descended from the language of the sixth century.

Until the mid-19th century, the majority of the Welsh population (over 80%) could speak Welsh. Over the past centuries several factors have affected people’s usage of the language. These are some of the most prominent:

• The 1536 and 1542 Acts of Union: These acts made English the language of law and government administration. Although the Welsh language was not banned, it lost its status, and this loss brought with it centuries of steady linguistic decline.
• Translation of the Bible in 1588 by Bishop William Morgan: This was a great boost to the language because it ensured that Welsh was the language of religion and worship, and kept the language alive within communities.
• 18th-19th century Industrial Revolution: This caused the biggest collapse in Welsh speakers because of the huge influx of people into the industrial areas. The number of Welsh speakers fell to 50% of the population.

This decline continued through the Twentieth Century for several reasons:

• migration patterns from rural to urban areas in search of work
• inward migration of English speakers to rural areas
• increased availability of English-language news and entertainment media
• a general secularization of society, leading to a decline in chapel attendance, on which so many traditional Welsh-medium activities were centered.

Present:

The 2001 Census shows that 20.8% of the population of Wales said that they could speak Welsh. Analysis, maps and briefing papers for the 2001 Census can be found in the publications library of the Welsh Language Board site. The next Census will take place in 2011 and it is likely that the results will be announced during 2013.

Source: Welsh Language Board

Read How many people speak Welsh?

The RAE Discards Some Proposed Spanish Spelling Reforms

The Real Academia Española (RAE) recently announced a number of proposed changes to spelling conventions, causing quite a stir throughout the Spanish-speaking world. Many academics and writers scoffed at the RAE’s plans to introduce these spelling reforms, which, according to some, were bound to create unnecessary confusion.

Following the academy’s recent meeting in Guadalajara, Mexico, linguists approved an 800-page document describing the various newly adopted reforms. The following proposed changes – some of the most hotly debated – were discarded by the RAE’s 22 academics:

»Writers may choose whether the word “sólo” as well as demonstrative pronouns such as “éste” or “ésa” carry an accent. Previously, the RAE had suggested that the accent be eliminated.

»Respecting the fact that the names of the letters “b,” “v,” “w,” and “y” vary among different Spanish-speaking countries, the RAE dismissed the suggestion of assigning just one name to these letters. For example, the name of the letter “b” will continue as “be alta,” “be larga,” or simply “be,” while the letter “y” will retain its historic designation as “i griega” alongside the newly admitted name “ye.”

The rest of the previously announced reforms remain in effect.

Medical Terminology: Prefixes and Suffixes in English and Spanish

Doctors and other medical professionals communicate information about their patients using medical terminology, the language of health care. A medical term is composed of 1) a root word, 2) a prefix, a group of letters attached to the beginning of the root word, and/or 3) a suffix, a group of letters attached to the end of the root word. Since virtually all prefixes and suffixes used in English and Spanish medical terminology are derived from Latin and Greek, the two sets of terms are extremely similar in many cases. Some would argue that the complexities of medical terminology are akin to those of a foreign language, but with a bit of knowledge and understanding of prefixes and suffixes, the vocabulary of medicine is greatly simplified.

We recently added a medical terminology section to the Transpanish website that includes prefixes and suffixes in both English and Spanish. Feel free to bookmark the page as a resource!

Some of the prefixes and suffixes you will find on our page:

gynec/o
-scopy
endo-
-gram
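
As a rough illustration of how a term breaks down into prefix, root and suffix, here is a minimal Python sketch. The function name `split_term` and the two small lookup tables are hypothetical, and the English glosses are the commonly given meanings of these word parts, supplied here for illustration rather than taken from the Transpanish page. A real glossary would hold far more entries and would need to handle combining vowels such as the "o" in gynec/o.

```python
# Hypothetical mini-glossary: only the word parts listed above, with
# their commonly given meanings (an assumption, not the site's data).
PREFIXES = {"endo": "within"}
SUFFIXES = {"scopy": "visual examination", "gram": "record"}


def split_term(term):
    """Split a medical term into (prefix, root, suffix).

    Any part that is not found in the tables comes back as "".
    """
    term = term.lower()
    # Take the first known prefix the term starts with, if any.
    prefix = next((p for p in PREFIXES if term.startswith(p)), "")
    rest = term[len(prefix):]
    # Take the first known suffix the remainder ends with, if any.
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    # Whatever is left between prefix and suffix is treated as the root.
    root = rest[:len(rest) - len(suffix)]
    return prefix, root, suffix


print(split_term("endoscopy"))
print(split_term("mammogram"))
```

With these tables, "endoscopy" splits into the prefix "endo" and the suffix "scopy" with no separate root, while "mammogram" has no known prefix, so "mammo" is left as the root before the suffix "gram".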

Visit our Glossary section for Medical Glossaries.

Project Management and Translation Vendors

As a Project Manager you will be coordinating multiple projects – each project will have a minimum of two outside vendors (translator and editor) with the possibility of many more.

Your vendors will usually work off-site as independent freelance vendors who have agreed to work with your agency on an independent basis. Your agency in turn has agreed to work with the vendor and has ensured that they have completed the appropriate tax forms, signed the Confidentiality Agreement, submitted their resume, references and details of their past experience and in some cases have taken an evaluation test. The Purchase Order you issue to the vendor will act as the agreement for the job.

Keep in mind that your vendors will likely have entered into similar agreements with other agencies so you will be competing for their time.

Vendors you will be working with in a translation project

Translators take the written text that your client provides and rewrite it in their native language, staying as faithful as possible to the source text, source format and provided reference. The translators you hire should be native speakers of the target language with subject area knowledge and translation experience.

Editors will have the task of polishing the translation and making the language flow as smoothly as possible. They should also be responsible for confirming that the translation is complete and for verifying consistency of terms and adherence to any supplied reference or glossaries. The editors you hire should be native speakers with subject area knowledge and they should have translation experience.

Proofreaders focus on the details. They need to ensure that all text is faithfully reproduced. Though their knowledge of the target language can help verify the quality of the translation or alert you to problems, proofreaders must be reminded that they are not to re-translate the text. If there are problems, the Project Manager should be told; the PM is responsible for contacting the translator and editor and developing the recovery plan. Ideally, proofreading will be done by internal staff working closely with the PM.

Typesetters will be responsible for laying out the approved translation into the client-supplied source layout file. They will need to have the appropriate software application and good knowledge of typesetting in foreign languages. Remember, language conventions vary! The Project Manager must be responsible for supplying them with the final source file and the translated file for typesetting. As clients can update files in the middle of the process and since the translation process involves multiple people, be sure to keep a close eye on the versions and always send the correct versions to the typesetter. If changes occur during the Desktop Publishing phase, be sure to communicate them to the typesetter and discuss them to make sure all instructions are clear.

Skills to look for when contracting translators and editors:

  • Native speakers
  • Subject area experience
  • Experience with Translation Memory software (Trados, Wordfast, etc.)
  • Up-to-date on technology
  • They should be willing to do basic research as necessary for a project (projects requiring extensive research should have the research phase included in the work flow both for scheduling and cost)
  • They should ask questions when needed and should point out problems in the source text when they find them
  • They should produce accurate and complete translations, while adhering to their deadlines
  • They should deliver on time and alert you to any potential delays as soon as they are aware of them

Also read Project Management in the Translation Industry.

New Spanish Spelling Reforms from the RAE

Spanish Spelling Rules Get a Makeover

Change is coming to Spanish orthographic conventions courtesy of the Real Academia Española (RAE), the organization that defines Spanish language standards. Last week, the RAE announced a number of planned changes prepared by 22 linguists from both Spain and Latin America. If all goes well, the changes to the Spanish language will be officially adopted on November 28 at the academy’s next meeting in Guadalajara, Mexico.

The following is a summary of some of the most important changes that are about to be implemented:

»The letters “ch” and “ll” have been considered a part of the Spanish alphabet since the 19th century, but no more. The Spanish alphabet will now consist of 27 letters.

»The names of the letters “b,” “v,” “w,” and “y” previously varied among different Spanish-speaking countries. The RAE seeks to further unify the language by assigning just one name to these different letters, e.g. the name of the letter “b” will change from “be alta” or “be larga” to simply “be.”

»The accent will be eliminated from the word “sólo” except in cases where its omission may lead to ambiguity. Previously, “sólo” was used to distinguish between the adverbial form of the word meaning “only” and the adjectival form “solo” meaning “alone.” Demonstrative pronouns such as “éste” or “ésa” will also cease to carry an accent.

»The RAE plans to eliminate “q” when it is used to represent the phoneme “k.” As such, Iraq will be written as “Irak” and quórum will become “cuórum.”

»Prefixes such as “ex” and “anti” will be joined to the word they precede. For example, ex-husband will appear as “exmarido” instead of “ex marido,” as it is currently written. Prefixes will continue to be written with a space when they precede two words, as in the case of “pro derechos humanos.”

»Words such as guión, huí, riáis, Sión and truhán will be considered monosyllabic, and therefore, will no longer be accented.

»The conjunction “o” used to be written with an accent when it appeared between two numbers (e.g. 3 ó 4) to avoid confusion with 0, but this rule will be eliminated.
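
To make two of these rules concrete, here is a small Python sketch that applies them to a piece of text. It is only an illustration: the function name `apply_reforms` is hypothetical, the word list covers just the five examples named above, and the unaccented forms assume that the accent is simply dropped. It does not attempt the other changes in the list.

```python
import re

# The five example words from the list above, mapped to the forms they
# would take once the accent is dropped (illustrative, not exhaustive).
MONOSYLLABLES = {
    "guión": "guion",
    "huí": "hui",
    "riáis": "riais",
    "Sión": "Sion",
    "truhán": "truhan",
}


def apply_reforms(text):
    """Apply two of the announced spelling changes to a string."""
    # The conjunction "ó" between two digits loses its accent: "3 ó 4" -> "3 o 4".
    text = re.sub(r"(?<=\d) ó (?=\d)", " o ", text)
    # Words now treated as monosyllabic lose their accent (whole words only).
    for old, new in MONOSYLLABLES.items():
        text = re.sub(rf"\b{old}\b", new, text)
    return text


print(apply_reforms("Leyó 3 ó 4 páginas del guión."))
```

The lookbehind and lookahead restrict the first rule to an "ó" standing between two numbers, so accented letters elsewhere in the sentence (as in "Leyó") are left alone.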

Please read The RAE Discards Some Proposed Spanish Spelling Reforms for the latest changes.


When was the first Spanish Grammar Book published?

In 1492, Antonio de Nebrija published Gramática de la lengua castellana, the first grammar book of the Spanish language. Works had previously been published on Latin usage, such as Lorenzo Valla’s De Elegantiis Latinae Linguae (1471), but Gramática was the first book to focus on the study of the rules of a Western European language besides Latin.

Digital version of Gramática de la lengua castellana.