This article originally appeared on the BeyeNETWORK.
In a recent TIME Magazine article about Iraq there was a disturbing quote. It concerned the number of captured documents that military intelligence could not analyze because they could not be translated. “You should see the warehouse in Qatar where we have this stuff,” said a high-ranking former military official. “We’ll never be able to get through it all.”
This highlights the importance of translation to business intelligence and the public sector.
As I have pointed out in previous articles, there is a clear transformation afoot in the business intelligence space. One of the critical factors leading to this transformation is the convergence of structured and unstructured data as sources for analysis. In the past, business intelligence work had been fairly restricted to the structured data in relational databases. These databases could in some way be manipulated with OLAP tools or spreadsheets. With the advent of the Internet and the availability of very large amounts of unstructured data, mainly in text format, the world of business intelligence has now changed forever. And like the spoken word it captures, text comes in many different languages.
Most experts agree that there are close to six thousand different languages spoken worldwide today. Although many of these have no formal writing systems, they are legitimate languages used for communication. Even if we reduce the number substantially by limiting it to languages most relevant for international trade or politics, we are still dealing with hundreds of languages. From these languages, significant information may be captured and disseminated in the form of unstructured data sources. These unstructured data sources are relevant to analysts and knowledge workers alike.
The federal government has been duly preoccupied with translation for two very different, yet equally important, reasons. One is outbound: the need to disseminate government-produced information. The other is inbound: the need to analyze information produced externally. The net result is that the translation effort in the federal government, while already considerable, will become much larger in the coming years.
Appropriately, significant attention has been paid to the need for translating foreign language “chatter” to combat terrorism. But the current key driver, in terms of magnitude, is Executive Order 13166 (8/11/2000). This order requires agencies to provide meaningful access to government information for Limited English Proficiency (LEP) citizens. While federal agencies have worked hard to fulfill E.O. 13166, they will likely face increasing pressure from Congress and the White House to better serve the increasing number of LEP citizens.
What exactly is Executive Order 13166? It states “…each federal agency shall examine the services it provides and develop and implement a system by which LEP persons can meaningfully access those services…” and “…ensure programs and activities they normally provide in English are accessible to LEP persons and thus do not discriminate on the basis of national origin…” These issues are handled in the Civil Rights Division of the Department of Justice. The IRS estimates that 76 percent of its LEP population is Spanish-speaking, after which the demand drops off significantly: Chinese (4%), Korean (2.5%), Vietnamese (2%) and Russian (0.5%). Hence, one can safely assume that the vast majority of the information made available by the U.S. government will be translated into Spanish.
We should primarily focus, however, on the need to analyze information from external sources in other languages. This is the more relevant factor for business intelligence. While numerous important tools can mine text almost independently of language, some level of translation will invariably be necessary to help knowledge workers do their jobs. (Though we are focusing on the English language, this commentary applies to any other “mother tongue.”) Because we have a long way to go, this area offers a very significant opportunity for learning.
We should briefly examine the translation industry. In many ways, the U.S. translation business is a cottage industry. While it is expected to be a $5.7 billion business by 2007, the industry consists of thousands of individual translators and more than 9,000 companies, most of them small.
The translation industry was relatively unknown in the U.S. until September 11, 2001. Since then, the PATRIOT Act has required the CIA Director to investigate the prospect of creating a National Virtual Translation Center. The federal government’s needs for translating foreign languages are currently addressed through contract services to support the war on terror. These needs may vary from year to year, but the emphasis on languages like Arabic, Urdu, Pashto and Dari is obvious, given our involvement in Iraq and Afghanistan.
The three main types of sources for translation are web content, voice, and documents. Most federal agencies seem to use three main translation approaches. Some agencies have employees translate the necessary documents. Other agencies contract with certified individual translators or companies, often using the GSA Schedule. (The Federal Supply Schedule 738 II, Language Services, provides contract support for translation, interpretation, and language training and/or educational material. The top five customers for the GSA Language Schedule are: the Department of Defense, the Department of the Interior, the Department of the Treasury, the Department of Justice and the Social Security Administration.)
Lastly, some agencies, such as the CIA, the Drug Enforcement Administration (DEA) and the Department of State, are increasingly using automated translation software. The FBI Languages Services Section, for example, has built the Law Enforcement & Intelligence Agency Linguistic Access System (LEILA), which is now operated by the National Virtual Translation Center. LEILA provides a web interface to a comprehensive database of language specialists, including detailed information about language skills and experience. LEILA is accessed by a number of law enforcement, intelligence, homeland security and defense agencies, including the DEA, the U.S. Citizenship and Immigration Services (DHS) and the CIA.
While there is still a long way to go, a great deal of progress has been made since the beginning of machine translation. This is evident from an old, yet appropriate, anecdote from the early days of automated translation. When the English sentence, “The spirit is willing but the flesh is weak,” was translated into Russian and back again, it read, “The vodka is good but the meat is rotten.”
Today, the industry has produced very sophisticated machine translation software. Some of this software was developed by large U.S. manufacturers in response to the need for non-English user manuals. Other tools have been developed to serve international institutions like the European Union, which must translate documents into the national languages of its member countries. Finally, other software addresses the needs of bilingual nations, such as Canada (French and English) or Belgium (French and Flemish), to maintain truly bilingual national platforms for all public affairs.
We have only begun to address the issue of translation in the federal government. Earlier this year, a special interest group tried to sue the U.S. Department of Health and Human Services over its E.O. 13166 translation policy. Similarly, a bill pending before the United States Senate would create the position of a foreign language director—or “czar”—to oversee a national foreign language strategy.
Where does this ultimately leave us in terms of business intelligence? The fact is that text mining has become a critical need in business intelligence, particularly in the federal government. As the volume of unstructured text increases exponentially, so will our need to translate between English and numerous other languages. I will discuss this topic further in future columns.
Dr. Barquin has been the President of Barquin International, a consulting firm, since 1994. He specializes in developing information systems strategies, particularly data warehousing, customer relationship management, business intelligence and knowledge management, for public and private sector enterprises. He has consulted for the U.S. Military, many government agencies and international governments and corporations.
Dr. Barquin is a member of the E-Gov (Electronic Government) Advisory Board, and chair of its knowledge management conference series; member of the Digital Government Institute Advisory Board; and has been the Program Chair for E-Government and Knowledge Management programs at the Brookings Institution. He was also the co-founder and first president of The Data Warehousing Institute, and president of the Computer Ethics Institute. His PhD is from MIT. Dr. Barquin can be reached at email@example.com.