Deep learning helps predict lung cancer survival

WRITTEN BY: Kathryn DeMuth Sullivan

A study from Penn State Great Valley researchers describes how a deep learning model could be used to predict the survival expectancy of patients with lung cancer. This knowledge could help medical caretakers navigate complex decision-making processes when determining the best treatment options for patients. Their findings are published in the International Journal of Medical Informatics.

"This is a high-performance system that is highly accurate and is aimed at helping doctors make these important decisions about providing care to their patients," comments Youakim Badr, who is an associate professor of data analytics. "Of course, this tool can't be used as a substitute for a doctor in making decisions on lung cancer treatments."

The model that Badr and colleagues developed predicted the survival expectancy of lung cancer patients with more than 71% accuracy, which is 10% more accurate than previously demonstrated machine learning models. It analyzes large quantities of data, including cancer type, tumor size, speed of tumor growth, and patient demographics, to predict patient survival.

"Deep learning is a machine-learning algorithm that makes associations between the data, itself, and the labels that we use to describe the data examples," explains Badr. "By making these associations, it learns from the data."

Robin G. Qiu, a professor of information science and engineering, adds that deep learning goes a step further than machine learning. "It improves performance tremendously. In deep learning we can go deeper, which is why they call it that. In traditional machine learning, you have a simple structure of layers of neural networks. In each layer, you have a group of cells. In deep learning, there are many layers of these cells that can be architected into a sophisticated structure to perform better feature transformation and extraction, which gives you the ability to further improve the accuracy of any model."
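
The difference in depth that Qiu describes can be sketched in a few lines of Python. This is an illustration only, not the architecture used in the study: the layer sizes, the tanh activation, the random untrained weights, and the four-number "patient" record are all assumptions made up for the example.

```python
import math
import random

def make_layer(n_in, n_out, rng):
    """Create one fully connected layer of cells: random weights, zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def build_network(layer_sizes, seed=0):
    """Stack layers so each layer's output feeds the next layer's input."""
    rng = random.Random(seed)
    return [make_layer(n_in, n_out, rng)
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]

def forward(layers, features):
    """Pass a feature vector through every layer in turn."""
    activations = features
    for weights, biases in layers:
        activations = [
            math.tanh(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations

# Hypothetical, pre-scaled patient features (values are made up)
patient = [0.2, -0.5, 0.9, 0.1]

shallow = build_network([4, 3, 1])        # one hidden layer of 3 cells
deep = build_network([4, 8, 8, 8, 1])     # three hidden layers of 8 cells each

print("shallow layers:", len(shallow), "output:", forward(shallow, patient))
print("deep layers:", len(deep), "output:", forward(deep, patient))
```

Both networks take the same four inputs and produce one output, but the deeper one transforms the features through more intermediate layers before reaching it. Because the weights here are random and untrained, the outputs carry no meaning; the sketch shows only the structure.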

A model of this kind is needed to process data sets as large as the one the team analyzed from the Surveillance, Epidemiology, and End Results (SEER) program, which includes data on nearly 35% of U.S. cancer patients. In all, SEER has roughly 800,000 to 900,000 entries on cancer patients in the U.S. Deep learning provides the tool needed to sort through all that data.

"One of the really good things about this data is that it covers a large section of the population and it's really diverse," concludes first study author Shreyesh Doppalapudi. "Another good thing is that it covers a lot of different features, which you can use for many different purposes. This becomes very valuable, especially when using machine learning approaches."

The team hopes to continue its investigations to improve the model’s accuracy and extend it to other kinds of cancer. "The accuracy rate is good so far -- but it's not perfect, so part of our future work is to improve the model," says Qiu.

Sources: International Journal of Medical Informatics, EurekAlert!

About the Author

Kathryn DeMuth Sullivan

Kathryn is a curious world-traveller interested in the intersection between nature, culture, history, and people. She has worked for environmental education non-profits and is a Spanish/English interpreter.