This article originally appeared on the BeyeNETWORK
Mention grid computing and most people still think of SETI (Search for Extra-Terrestrial Intelligence), the long-running scientific experiment that uses Internet-connected computers to look for intelligent life somewhere else in the universe. Anyone can participate by running a free program that downloads and analyzes radio telescope data – massive amounts of it.
However, grid computing is an important concept that will eventually come into its own and provide powerful platforms for application-hungry users, especially in the public sector. To make the point, the Government Information Technology Executive Council (GITEC) featured a panel at its Information Processing Interagency Conference (IPIC) in Orlando, Florida, in March that offered an opportunity to explore the future of grid computing in the federal government. The session was chaired by Gary Danoff (NetApps) and featured panelists from industry and government, including Dan Kent (Cisco), Charles Church (Department of Homeland Security) and myself.
One of the panel's general conclusions was that a federal IT grid is bound to happen. There are already several efforts within the Department of Defense, notably the Global Information Grid (GIG) effort engineered by the Defense Information Systems Agency (DISA). Hence, the question is not so much “whether” but “when” it will happen. That said, enough other issues surround grid computing that substantial delays are likely, owing to agencies' reluctance to “lose control of their infrastructure.”
We believe it will eventually happen in the same way that enterprise architecture (EA), and especially service-oriented architecture (SOA), is happening. It will become de rigueur in order to enable orderly progress in each agency's effective and efficient operations. Funding will then follow the agencies’ willingness to buy into the equivalent of an outsourcing or ASP (application service provider) model for their basic IT infrastructure. It seems difficult to turn down the idea of infrastructure-on-demand. In terms of “what’s in it for me,” this is a very strong argument.
Grids offer a way to solve computation-intensive problems such as protein folding, financial modeling, earthquake simulation, or climate and weather modeling. For these types of problems, there is often no alternative to grid computing, which lets us pool the IT resources of many enterprises to obtain solutions in a reasonable time frame.
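To make the pooling idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a real grid implementation: the problem is a toy one (counting prime numbers), local threads stand in for grid nodes, and the names count_primes, split_into_work_units and run_on_grid are hypothetical. Threads on one machine will not deliver a real speedup here; the point is only the pattern of cutting a large job into independent work units, farming them out and combining the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def count_primes(bounds):
    """Work unit: count the primes in the half-open range [start, stop)."""
    start, stop = bounds
    count = 0
    for n in range(max(start, 2), stop):
        # n is prime if no d in 2..sqrt(n) divides it evenly.
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def split_into_work_units(limit, unit_size):
    """Cut [0, limit) into independent chunks that any node could process."""
    return [(lo, min(lo + unit_size, limit)) for lo in range(0, limit, unit_size)]


def run_on_grid(limit, unit_size=1000, nodes=4):
    """Hand the work units to `nodes` workers and add up the partial counts.

    Assumption: the thread pool is only a stand-in for remote grid nodes.
    """
    units = split_into_work_units(limit, unit_size)
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(count_primes, units))


if __name__ == "__main__":
    print(run_on_grid(10_000))
```

Because each work unit depends only on its own range, the units can be processed in any order and on any machine, which is the property that makes problems of this kind suitable for a grid.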
Functionally, one can classify grids into several types. First, there are the computational grids – the ones we have been discussing – that focus primarily on computationally intensive operations. However, there are also data grids that provide for the controlled sharing and management of large amounts of distributed data; hence, storage is the resource on demand. Lastly, there are equipment grids that have a primary piece of equipment, such as a telescope, where the surrounding grid is used to control the equipment remotely and to analyze the data produced.
Grid computing offers a way of using IT resources optimally inside an organization. It also provides a means of offering IT as a utility to commercial and non-commercial clients, with those clients paying only for what they use, as with electricity or water.
It is worth thinking about the advantages being touted for infrastructure-as-service and how the grid can facilitate them, especially as we focus on the issue of storage. The New York Times recently ran a story (3/6/07) blaring, “There's not enough storage space to hold it all.” It referred to the results of an IDC study that concluded that in 2006 the world produced 161 exabytes of data. This is, apparently, the information equivalent of 3 million times all the books ever written and would amount to 12 stacks of books reaching from the Earth to the Sun. The even more interesting conclusion, however, was that global data generation in 2010 is estimated at 988 exabytes, while we are expected to produce only 601 exabytes of storage that same year.
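The size of that gap is easy to work out from the figures quoted in the story. The short Python sketch below does the arithmetic; the function names are hypothetical, and the growth rate assumes, purely for illustration, that data generation compounds steadily between 2006 and 2010, which the IDC study as quoted here does not claim.

```python
# Figures quoted above from the IDC study, in exabytes.
DATA_2006_EB = 161
DATA_2010_EB = 988
STORAGE_2010_EB = 601


def storage_shortfall(data_eb, storage_eb):
    """Return the gap in exabytes and the gap as a fraction of data produced."""
    gap = data_eb - storage_eb
    return gap, gap / data_eb


def annual_growth_rate(start, end, years):
    """Compound annual growth rate implied by going from start to end.

    Assumption: steady compounding, used here only as an illustration.
    """
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    gap_eb, gap_fraction = storage_shortfall(DATA_2010_EB, STORAGE_2010_EB)
    growth = annual_growth_rate(DATA_2006_EB, DATA_2010_EB, years=4)
    print(f"2010 shortfall: {gap_eb} EB ({gap_fraction:.0%} of data produced)")
    print(f"Implied annual growth in data, 2006-2010: {growth:.0%}")
```

On these numbers the projected shortfall is 387 exabytes, or roughly 39 percent of the data expected to be generated in 2010.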
The more serious challenges will come on the security and transparency side, especially when dealing with the massive amounts of sensitive data we're talking about in the applications that run the Census, Medicare, the IRS or some of the applications that will be necessary to operate on the brontobytes of future databases around the intelligence and homeland security fronts.
If we think about the need for knowledge exploration of the massive amount of structured and unstructured data (signal, text, audio, video, imagery) being generated today, we can start to visualize that it can probably only happen in a distributed environment that is fairly open. That means that the grid will follow suit in order to enable the exploration of huge knowledge spaces.
There will have to be a significant number of knowledge management tools attached to a federal IT grid to achieve the level of security and transparency required. A good application to explore may be knowledge exploration itself. Danoff suggested possibly making the National Archives and Records Administration (NARA) a “content warehouse” for the whole government. But, ultimately, the job of analysts in the future will be very different from that of today because of the explosion in the sources, volume, types, structures, codes and protocols of data. We're dealing with exabytes that must be explored by the analyst – I don't mean just intelligence analysts; think clinical, financial, business, budget or any other kind of analyst – and the exploration of these knowledge spaces may be an ideal task for the federal IT grid.
Clearly, a federal computing grid will be required to be secure, reliable, fair and transparent. It all boils down to “trust.” If you have no ability to control that infrastructure, then very significant concerns arise about the trustworthiness of the person or organization that operates the grid. If you are going to rely on a third party to run your mission-critical applications, trust is an absolute prerequisite. That means that whoever runs the grid must demonstrate goodwill, encapsulate the interest of others and be competent to protect and manage your data.
Trust is a big issue today, especially in government. The general public, it seems, does trust the postal service (USPS) to deliver its mail (83%), but only 19% of people trust the NSA and 21% the CIA, according to Larry Ponemon of the Ponemon Institute.
How do we get started operating a federal grid under these conditions? It might be through vehicles the government IT community is already familiar with, such as service level agreements (SLAs). There might also be joint teams to manage the grid infrastructure, much as the military, when it conducts joint or combined operations, refers to “purple” activities as opposed to Army green, Navy white or Air Force blue.
There are going to be questions about who owns the data as well as the applications, and there are going to be political, cultural and human adoption factors that might necessitate what NetApps' Gary Danoff, the IPIC panel chair, calls “change ombudsmen.” I think this is a wise observation, since so much of what will be needed lies at exactly that level in order to overcome resistance to change and not-invented-here issues.
Dr. Barquin has been the President of Barquin International, a consulting firm, since 1994. He specializes in developing information systems strategies, particularly data warehousing, customer relationship management, business intelligence and knowledge management, for public and private sector enterprises. He has consulted for the U.S. Military, many government agencies and international governments and corporations.
Dr. Barquin is a member of the E-Gov (Electronic Government) Advisory Board, and chair of its knowledge management conference series; member of the Digital Government Institute Advisory Board; and has been the Program Chair for E-Government and Knowledge Management programs at the Brookings Institution. He was also the co-founder and first president of The Data Warehousing Institute, and president of the Computer Ethics Institute. His PhD is from MIT. Dr. Barquin can be reached at firstname.lastname@example.org.