Fordham’s Clavius Distinguished Professor of Science, Frank Hsu, Ph.D., likes to point to the extraordinary precision of the Gregorian calendar in use today; he said it is probably close to 99.9 percent accurate.
The Gregorian calendar, it just so happens, was developed in the 16th century by the namesake of Hsu’s academic chair, Christopher Clavius, S.J., to prevent Catholic holidays from drifting relative to the seasons of the year (it earned the blessing of Pope Gregory XIII, who also blessed it with his name).
Like Father Clavius, Hsu specializes in accuracy. As director of Fordham’s Laboratory of Informatics and Data Mining in the Department of Computer and Information Science, he focuses his research on developing intelligent systems that help researchers turn data and information into knowledge and wisdom.
With advancing technology causing an explosion in the ease and variety of data collection, the field of informatics is finding applications across a wide array of academic and business disciplines.
“We are generating so much data in today’s world, but it is not clean,” Hsu said. “How do we process that into something people understand? That’s informatics.”
Problem 1: What Can You Do When You Are in a Karaoke Restaurant Trying to Order a Song, But You Only Know the First Few Notes?
Many data searches, Hsu explained, are done by converting a piece of information, such as the first phrase of a song, into a digital “query” that can effectively search a larger database. Internet technology lends itself to such projects.
Hsu and his students converted short musical phrases into digital information and sent them as queries to a database of thousands of digitized songs. The approach, he said, was “accurate and efficient,” reaching an accuracy rate of 87 percent, which is considered extremely high for complex classification problems.
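To make the idea of a musical query concrete, here is a minimal sketch in Python. It is not the method used in Hsu's lab, which the article does not describe. It assumes songs are stored as lists of MIDI pitch numbers, and the tiny `SONGS` table, `intervals` and `search_by_opening` are hypothetical names made up for illustration. Comparing the steps between notes rather than the notes themselves lets a singer start in any key.

```python
# Hypothetical toy database: each song's opening as MIDI pitch numbers.
SONGS = {
    "Twinkle Twinkle Little Star": [60, 60, 67, 67, 69, 69, 67],
    "Ode to Joy": [64, 64, 65, 67, 67, 65, 64, 62],
    "Happy Birthday": [60, 60, 62, 60, 65, 64],
}


def intervals(pitches):
    """Return the step between each pair of successive notes.

    Using steps instead of absolute pitches makes the query key-independent.
    """
    return [b - a for a, b in zip(pitches, pitches[1:])]


def search_by_opening(query_pitches, songs=SONGS):
    """Return song titles ordered by how many opening intervals match the query."""
    q = intervals(query_pitches)
    results = []
    for title, pitches in songs.items():
        opening = intervals(pitches)[:len(q)]
        matches = sum(1 for x, y in zip(q, opening) if x == y)
        results.append((matches, title))
    # Most matches first; ties are broken alphabetically by title.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [title for _, title in results]


# The first four notes of "Twinkle Twinkle", sung a whole step higher.
best_guess = search_by_opening([62, 62, 69, 69])[0]
```

A real system would also have to tolerate wrong or missing notes and rhythm differences, which this exact-match sketch ignores.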
“Computer information systems are never perfect,” he explained. “You might buy a computer or router and the salesman says it’s 99.9 percent reliable. But it’s very rarely that perfect. It’s only perfect relative to something else. In our lab, we try for maximum accuracy.”
The query concept is also used in the latest biomedical research. In a recent study, Hsu applied computer-aided drug design, a method of simulating drug-receptor interactions, and ran virtual screenings to find useful drug candidates among millions of chemical compounds. The process lets pharmaceutical companies inexpensively eliminate millions of “possibilities” and narrow the field to a pool of promising compounds for a particular disease.
“Companies can’t possibly try every chemical combination,” he said. “That’s why you need information science. It will eliminate the impossibilities and pick, for example, the best 50—then the company will try those top 50.”
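The last step Hsu describes, keeping only the best-scoring candidates, can be sketched in a few lines of Python. This assumes each compound has already been given a predicted score by some screening model, with higher meaning more promising; the compound IDs, the scores and the `top_candidates` helper are hypothetical, and the article does not say how the scores are computed.

```python
import heapq


def top_candidates(scores, k=50):
    """Return the k compound IDs with the highest predicted scores, best first."""
    return heapq.nlargest(k, scores, key=scores.get)


# Hypothetical screening scores for four compounds.
scores = {"cmpd-A": 0.12, "cmpd-B": 0.91, "cmpd-C": 0.47, "cmpd-D": 0.78}
shortlist = top_candidates(scores, k=2)
```

With millions of compounds, `heapq.nlargest` avoids sorting the whole collection just to keep a shortlist of 50.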
Problem 2: Do Crowds Have Wisdom?
Hsu’s laboratory applies a method called “information fusion” to devise and improve outcomes in fields such as management, finance and business, in essence by combining information. One combination method is called “decision fusion.”
For example, Hsu looked at the way that judges in international skating competitions combine their scores—assigning a number on a scale of 1 to 10, where 10 is the perfect score. He determined that there is a deficiency in their methodology.
“If some judge is biased in favor of his or her own player, even when you drop the highest and lowest scores, there’s still something you can’t take care of,” Hsu said. “It’s the human element of manipulation of the score numbers.”
By crunching the numbers on 10 skaters, Hsu discovered that if you use a rank combination rather than a score combination, the order of the top three skaters may change. Rank combination (1st, 2nd, 3rd, etc.) is less easily manipulated by a judge than score combination is, Hsu said.
“By rank, it becomes a fairer process,” Hsu said. “So apparently collective decisions are not always wise decisions. This is actually one of the most fundamental problems in information science.”
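The difference between the two ways of combining judges can be seen in a small Python sketch. The scores below are invented for illustration and are not Hsu's data; the functions `score_combination` and `rank_combination` are hypothetical names, and the sketch assumes no judge gives two skaters the same score. The third judge inflates skater B's score and marks down skater A.

```python
def score_combination(scores_by_judge):
    """Order skaters by mean score, highest first."""
    skaters = list(scores_by_judge[0])
    n = len(scores_by_judge)
    mean = {s: sum(judge[s] for judge in scores_by_judge) / n for s in skaters}
    return sorted(skaters, key=lambda s: -mean[s])


def rank_combination(scores_by_judge):
    """Order skaters by the sum of each judge's ranking (1 = best), lowest first."""
    skaters = list(scores_by_judge[0])
    rank_sum = {s: 0 for s in skaters}
    for judge in scores_by_judge:
        # Convert this judge's scores to ranks; ties are assumed not to occur.
        ordered = sorted(skaters, key=lambda s: -judge[s])
        for rank, skater in enumerate(ordered, start=1):
            rank_sum[skater] += rank
    return sorted(skaters, key=lambda s: rank_sum[s])


# Invented scores: judge 3 favors skater B.
judges = [
    {"A": 9.0, "B": 8.8, "C": 8.0, "D": 7.0},
    {"A": 9.1, "B": 8.9, "C": 8.2, "D": 7.5},
    {"A": 8.0, "B": 10.0, "C": 7.9, "D": 7.0},
]

by_score = score_combination(judges)  # B finishes ahead of A
by_rank = rank_combination(judges)    # A finishes ahead of B
```

In this example the biased judge can move B from second to first on his or her own card, but that is worth only one rank position, while a large enough score gap is enough to flip the averaged result.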
Hsu’s research concluded that consensus opinions are only better than individual opinions when certain conditions prevail: First, those individuals being combined for consensus all must be “very good.” Second, they all have to “think differently.”
“Otherwise, you might get a decision like the Bay of Pigs,” Hsu said, referring to President John F. Kennedy’s ill-fated attempt, on the recommendation of his advisors, to invade Cuba in 1961.
Problem 3: Do Customers Who Buy Cookies also Buy Milk?
A second component of “information fusion” is “data fusion,” which is also related to data mining. This is a bit like mining for gold, as scientists try to uncover knowledge that they didn’t expect to find. As an example, Hsu explained that the 7-Eleven stores in Japan collect detailed records of their sales. The company checked for patterns within those sales and discovered not only the expected cookie-milk correlation but also a surprising one: people who bought diapers also bought beer.
“Data mining can lead to results that go against your intuition,” Hsu said. “But why is it important? Because it can show fundamental societal changes and it can help retailers understand purchasing patterns.”
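One common way to check whether two products really sell together is to compute support, confidence and lift over a set of sales records. The Python sketch below uses eight invented shopping baskets, not 7-Eleven data, and the `association` function is a hypothetical helper; it is one standard way to measure co-purchase, not necessarily the analysis the company ran.

```python
def association(transactions, a, b):
    """Return (support, confidence, lift) for the rule "buys a -> buys b".

    support    = share of baskets containing both items
    confidence = share of baskets with a that also contain b
    lift       = confidence divided by the overall share of baskets with b
                 (above 1 means the items co-occur more often than chance)
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if a in t)
    n_b = sum(1 for t in transactions if b in t)
    n_ab = sum(1 for t in transactions if a in t and b in t)
    if n_a == 0 or n_b == 0:
        raise ValueError("both items must appear in at least one transaction")
    support = n_ab / n
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)
    return support, confidence, lift


# Invented baskets for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "beer", "cookies"},
    {"diapers", "milk"},
    {"cookies", "milk"},
    {"cookies", "milk", "bread"},
    {"bread"},
    {"beer", "chips"},
]

support, confidence, lift = association(baskets, "diapers", "beer")
```

Here three of the four diaper baskets also contain beer, while beer appears in only half of all baskets, so the lift comes out above 1.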
Hsu has lived in big cities such as Taipei, Tokyo, New York and Boston, and his interest in information science evolved from his interest in maps and transportation systems. “I have always been interested in traffic converging and diverging in a complex network,” he said.
Hsu served as the chairman of the CIS department at Fordham for 12 years, and is associate dean of the Graduate School of Arts and Sciences. He is also a co-founder of the Journal of Interconnection Networks and has served as editor-in-chief. Last semester Hsu taught an undergraduate course in biomedical informatics.
The fields of biomedicine and informatics, he said, have come together (most notably in the Human Genome Project) to further knowledge of living systems. To help position Fordham in the field, Hsu is helping to spearhead the University’s initiative to create an interdisciplinary science program in bioinformatics.
“We know more about living systems than we used to because of science and technology, but there’s still more to know,” he said. “Informatics can help.”