Seeking to use machine learning to advance the public good, a Fordham graduate student applied it to the data on blood tests for lead given to New York City children—and found a testing shortfall in some high-risk neighborhoods.

The study, published last month in the Journal of Urban Health, shows that the child populations in some neighborhoods are not being tested as completely as they should be, said Khalifa Afane, a student in the M.S. program in data science who wrote the study with his advisor, Juntao Chen, Ph.D., an assistant professor in the computer and information science department.

For the study, they used the city’s publicly available lead testing data, which Afane said “nobody has analyzed before” at the neighborhood level.

A Toxic Heavy Metal

Lead is a toxic heavy metal that can cause learning disabilities and behavior problems. Children pick it up from lead-based paint or contaminated dust, soil, and water. Lead exposure risk “remains persistent” among vulnerable groups including low-income and non-Hispanic Black children, the study says.

Khalifa Afane with his research poster at the Graduate School of Arts and Sciences Research Day last spring.

The city promotes blood lead level testing and awareness of lead poisoning in high-risk communities through a variety of educational efforts and partnerships.

But some high-risk neighborhoods still don’t get enough testing, Afane said. A case in point is Greenpoint in Brooklyn vs. South Beach in Staten Island. The study says that despite similar numbers of children and similar rates of lead testing, Greenpoint has consistently averaged eight times more cases: 97 out of 3,760 tests conducted in 2021, compared with just 12 out of 3,720 tests in South Beach that year.
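The comparison can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the study's code; it uses just the 2021 figures quoted above, and the function name `cases_per_1000` is a hypothetical helper introduced here.

```python
# 2021 figures quoted in the article: (elevated cases, tests conducted)
neighborhoods = {
    "Greenpoint": (97, 3760),
    "South Beach": (12, 3720),
}


def cases_per_1000(cases, tests):
    """Return the number of elevated-result cases per 1,000 tests."""
    return 1000 * cases / tests


# Case rate for each neighborhood, normalized by testing volume
rates = {name: cases_per_1000(c, t) for name, (c, t) in neighborhoods.items()}

# How many times higher Greenpoint's rate is than South Beach's
ratio = rates["Greenpoint"] / rates["South Beach"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} cases per 1,000 tests")
print(f"Greenpoint's rate is about {ratio:.1f} times South Beach's")
```

With nearly identical test counts, Greenpoint comes out at roughly 26 cases per 1,000 tests against roughly 3 in South Beach, which is where the "eight times" figure comes from.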

There should actually be more testing of children in Greenpoint, Afane said, because their risk is clearly higher. While testing efforts have expanded in the city, “it matters much more where these extra tests were actually conducted,” he said, since lead is more prevalent in some neighborhoods than in others.

More than 400 Cases May Have Been Missed

For the study, Afane analyzed test result data from 2005 to 2021, focusing on children under 6 years old who were found to have blood lead levels of 5 micrograms per deciliter. He applied a machine learning algorithm to the testing data and projected that another 410 children with elevated blood lead levels might be identified per year citywide, mostly in vulnerable areas, by expanding testing in neighborhoods that tend to have higher case rates.

The highest-risk neighborhoods are in Brooklyn, Queens, and the north shore of Staten Island, and they average about 12 cases per 1,000 tests, compared with fewer than four in low-risk neighborhoods, Afane said.

The city helps coordinate care for children with elevated levels and also works to reduce lead hazards. Since 2005, the number of New York City children under 6 years old with elevated blood lead levels has dropped 93%, a city report says.

Using a Data-Informed Strategy

But the study recommends a better, data-informed strategy to focus more lead testing on high-need areas. “What we wanted to highlight here is that this needs to be done and reported at the neighborhood level, not at the city level,” Afane said.

The study also recommends awareness campaigns in high-risk areas emphasizing early detection, and it calls on local authorities to step up monitoring of water quality and blood lead levels in pregnant women.

“Our main goal was to use data science and machine learning tools to genuinely improve the city,” Afane said. “Data analysis is a powerful skill that could be used much more often to make a positive impact in our communities.”


Chris Gosier is research news director for Fordham Now. He can be reached at (646) 312-8267.