NASA uses text analytics to bolster aviation safety

NASA relies on numeric data to explain what incidents occur, but text data from incident reports and warning signals reveals why they do.

Airline travel has a tendency to up the stress level of most passengers because of its lack of legroom, meager food options and long lines. And then there's the waiting and hurrying and wondering if that checked bag made the connection.

Travelers expect these experiences when they arrive at the departure area of a major airport. But something they may not be aware of is the amount of data generated by their flight and how it's being collected, mined and analyzed to improve aviation safety. While the majority of it is structured, some of the most important flight data exists in text form, generated through written reports, audio recordings and warning signals.

"It's no secret that the national airspace is very safe," said Ashok Srivastava, principal scientist for data sciences at NASA and lead for the System-Wide Safety and Assurance Technology (SSAT) Project. He illustrated his point by presenting the annual fatal accident rate from 1959 to 2006, which shows a steady decline over almost five decades.

"[B]ut the question is," he continued, "how can it be made safer, particularly in the context of increasing the amount of flights?"

Srivastava, a presenter at the eighth annual Text Analytics Summit held earlier this summer, said the number of flights could grow by as much as 2% to 3% per year, depending on the health of the economy. Based on reports received by its Bureau of Transportation Statistics, the U.S. Department of Transportation recorded 10 million commercial flights from national and international carriers operating within the country in 2011 alone. A 2% increase would equate to roughly 200,000 additional flights annually, adding to airport traffic.

"Given that context, if we can look for ways to improve safety, we can maintain this low accident rate," he said. "That's a big issue that NASA and others are working on right now. … The whole idea here is that we would like to proactively manage risk."

Text analytics helps to answer questions

One of the ways Srivastava and his team dissect how unusual events lead to accidents is to map them along a continuum, which begins with a safe state of operations. The team is particularly interested in the continuum's "anomalous state," which indicates something different enough is happening to warrant attention but also marks a final fork in the road for an operation to either return to a safe state or turn into an accident.

"Text," he said, "starts to play a crucial role in understanding what's going on."

The team analyzes operational data to discern what happened in the anomalous state, but they also turn to flight and ground crew reports. Those documents can shed light on causal factors or why something happened, Srivastava said.

It's similar to the financial industry, he explained. Numeric data can reveal what stocks are rising or falling on a second-by-second basis and by how much, but media reports push beyond the numbers and explain why those stocks are rising or falling.

"Figuring out why something happened is difficult to do with numeric data alone," he said.

As an example, Srivastava described an incident at JFK International Airport from a little over a year ago, when EgyptAir Flight 986 and Lufthansa Flight 411 almost collided in a "runway incursion," which could have been a catastrophic event, he said.

"But why did this happen? What was going on? How can we analyze the data associated with it?" he asked. Part of the investigation included analyzing audio recordings of the communications between the pilots and the control tower, which helped reveal that EgyptAir had made a wrong turn. NASA is in the early stages of analyzing audio recordings, which it does by transcribing the recordings into text.

Knowing the what and the why, Srivastava and his team aim to push beyond causal factors and into the world of prediction where they'll be able to stop similar instances from happening in the future.

"This is of critical importance to the safety of today's systems," he said, "and, most importantly, for tomorrow's systems."

Moving into prediction

A traditional approach to data analysis is to examine the numeric and text data separately and then combine the results for a big-picture perspective.

"That's an approach a lot of people have taken, and I think it's valuable," Srivastava said. "But the approach we're taking is really different from that."

They've rigged a system to analyze all of the data together by pushing it into one place, or what's called the kernel. That includes network data from radar and satellite systems; aircraft data from the software, engines and sensors; and text data from the ground and flight crews. The analysis is completed in a single framework using a one-class support vector machine, a machine learning model that learns what normal operations look like and flags patterns that deviate from them.
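
To make the idea of analyzing numeric and text data in one framework more concrete, here is a minimal Python sketch. It is not NASA's system. It assumes a weighted sum of two similarity functions (a Gaussian kernel on sensor readings and a cosine kernel on the wording of crew reports), and it scores how unusual a flight is by its average similarity to known-normal flights, a much simpler stand-in for a one-class support vector machine. The records, field names, weight and gamma values are all hypothetical.

```python
import math
from collections import Counter


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) similarity between two numeric feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def text_kernel(doc_a, doc_b):
    """Cosine similarity between the bag-of-words counts of two reports."""
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def combined_kernel(rec_a, rec_b, weight=0.5):
    """Weighted sum of the numeric and text similarities of two records."""
    numeric_sim = rbf_kernel(rec_a["numeric"], rec_b["numeric"])
    text_sim = text_kernel(rec_a["report"], rec_b["report"])
    return weight * numeric_sim + (1 - weight) * text_sim


def novelty_score(record, normal_records):
    """One minus the mean combined similarity to known-normal records.

    Higher values mean the record looks less like normal operations.
    This is an illustrative score, not a one-class SVM decision function.
    """
    sims = [combined_kernel(record, ref) for ref in normal_records]
    return 1.0 - sum(sims) / len(sims)


# Hypothetical records: two scaled sensor readings plus a crew report.
normal_flights = [
    {"numeric": [0.1, 0.2], "report": "routine approach and normal landing"},
    {"numeric": [0.2, 0.1], "report": "normal landing after routine approach"},
    {"numeric": [0.0, 0.3], "report": "routine approach smooth landing"},
]
typical = {"numeric": [0.1, 0.1], "report": "routine approach normal landing"}
unusual = {
    "numeric": [2.5, 3.0],
    "report": "wrong turn onto active runway aborted takeoff",
}

print("typical:", round(novelty_score(typical, normal_flights), 3))
print("unusual:", round(novelty_score(unusual, normal_flights), 3))
```

The unusual record scores far higher than the typical one because both its readings and its report wording differ from the normal flights. A weighted sum of two valid kernels is itself a valid kernel, so a combined similarity of this kind could in principle be handed to a real one-class support vector machine in place of the simple averaging used here.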

"[B]y putting all of the data together at once and simultaneously analyzing it," he said, "we can get very high accuracy rates as far as making predictions on real systems such as aviation systems."

Data, including text, can be analyzed in different ways using a variety of methods, such as algorithms and visualizations, which enables NASA and other agencies to keep tabs on what's trending, whether certain events are increasing or decreasing, and what might lead to possible runway incursions.

Srivastava compared this to the proactive stance businesses take when driving their own growth or mitigating their own risk: Like the business community, NASA and other agencies are attempting to develop hypotheses, strategies and techniques that can be implemented to help identify problems before they occur.

"And this whole system is dependent on analytics," he said.
