Free access
Electronic pre-publication
In a journal
Published online 15 May 2020

© SFRP, 2020

1 Introduction

Lung cancer remains the most common cancer and the leading cause of cancer-related deaths in China and worldwide (Ferlay et al., 2010; Chen et al., 2016). Currently, approximately 60–70% of patients receive radiation therapy during the course of the disease. Radiotherapy has both radical and palliative roles and is an important part of lung cancer treatment (Cannon et al., 2013). Radiation pneumonitis (RP), an inflammation of normal lung tissue that can occur after radiotherapy, is the most significant dose-limiting toxicity and affects 5–36% of patients (Medhora et al., 2012; Wang et al., 2012). As a result, some patients with thoracic cancer are unable to finish the entire treatment course because of RP. A previous study found that RP was an independent negative prognostic factor in non-small-cell lung cancer (NSCLC) patients (Farr et al., 2015). The underlying mechanisms of RP are still poorly understood. Numerous factors, including smoking status, pulmonary function, radiation dose, and irradiated lung volume, may be associated with RP (Das et al., 2007; Tsoutsou et al., 2006; Schallenkamp et al., 2007). However, these findings are not sufficiently informative about the mechanisms of RP development. Increasing evidence indicates that genetic and molecular factors might play a critical role in the development of RP and in the clinical outcomes of NSCLC (Li et al., 2016; Du et al., 2018; Wen et al., 2018). Therefore, identifying the genetic pathways involved is needed to understand the underlying mechanisms of RP.

With the spread of global communication and the convenience of the internet, the biomedical literature in public databases such as PubMed has grown exponentially (Hunter et al., 2006). Manually collecting the relevant published information on RP is extremely time-consuming. Furthermore, selecting specific genetic pathways from this volume of publications is labor-intensive, and some publications are likely to be overlooked. Text mining (TM) applies algorithms that analyze correlations and statistical patterns in unstructured text to extract specific information. In recent years, TM techniques have made it possible to extract information at large scale and discover knowledge at the same time (Simpson et al., 2012), and they have been applied successfully in medical research. For instance, TM has been used in cancer research, toxicogenomics, and studies of cancer drug effects and safety (Korhonen et al., 2012; Zhu et al., 2013; Harpaz et al., 2014; Lee et al., 2014; Baker et al., 2016).

In this study, we used TM, pathway analysis, and database analytical tools to identify genetic pathways potentially related to RP. We applied a proven natural language processing methodology and other computational resources (e.g., STRINGdb, GeneCodis) to perform in silico analysis (Shim et al., 2014). Biological knowledge can be categorized and classified using Gene Ontology (GO) hierarchical relationships (Andronis et al., 2011). Combined with TM, sets of genes can be further prioritized by assessing their impact on protein interaction networks. This approach is useful because genes involved in a given disease or pathological condition tend to cluster together in interaction networks, so analyzing those interactions can reveal novel relationships (Liu et al., 2014). This study aims to investigate the genetic pathways of RP by using computational methods and publicly available biological data to mine a list of high-priority target genes.

2 Materials and methods

2.1 Text mining

We searched PubMed using the terms “radiation”, “pneumonitis”, and “lung cancer”. All abstracts were retrieved and imported into R (version 3.4.4). The jieba package was used for word segmentation (the R script is provided in Supplementary File S1). We then extracted all unique gene hits from the results as the starting point for the next steps.
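The gene extraction step can be illustrated with a minimal sketch. It is written in Python rather than R and is not the authors' script (which is in Supplementary File S1). It assumes that gene hits are found by matching tokens from each abstract against a vocabulary of official gene symbols; the vocabulary, the tokenizer pattern, and the two example abstracts are hypothetical.

```python
import re
from collections import Counter

# Hypothetical vocabulary of official gene symbols. A real run would load a
# complete symbol list instead of this small illustrative sample.
GENE_SYMBOLS = {"PIK3CA", "PIK3CB", "PDGFRB", "CREBBP", "SERPINC1"}

# Assumed tokenizer: runs of letters, digits, and hyphens.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9-]+")


def extract_gene_hits(abstracts, vocabulary):
    """Count, for each gene symbol, how many abstracts mention it.

    Matching is case-sensitive, so ordinary words are less likely to be
    mistaken for gene symbols.
    """
    hits = Counter()
    for text in abstracts:
        tokens = set(TOKEN_PATTERN.findall(text))
        hits.update(tokens & vocabulary)
    return hits


# Hypothetical abstracts, for illustration only.
abstracts = [
    "Variants in PIK3CA were associated with radiation pneumonitis.",
    "PDGFRB and PIK3CA signalling was altered in irradiated lung tissue.",
]
gene_hits = extract_gene_hits(abstracts, GENE_SYMBOLS)
unique_genes = sorted(gene_hits)  # the unique gene list passed to the next step
```

Counting abstracts rather than raw token occurrences keeps a single abstract that repeats a symbol many times from dominating the list.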

2.2 Biological process and pathway analysis

GeneCodis is a web-based tool for interpreting genomic data by integrating various sources of GO and functional information (Nogales-Cadenas et al., 2009; Tabas-Madrid et al., 2012). The gene list from TM was used as input for enrichment analysis in GeneCodis. We first queried the GO biological processes (BPs) in which these genes are involved and selected the most highly enriched terms closely related to RP. The genes in these terms were then analyzed in GeneCodis against Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotations.
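Enrichment tools of this kind commonly score each annotation with a hypergeometric test. The exact statistical settings used in GeneCodis are not stated here, so the following Python sketch is only an illustration of the principle. The function name and the small numbers in the usage line are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """Upper-tail hypergeometric P value.

    Probability of seeing at least `overlap` annotated genes when `sample`
    genes are drawn without replacement from `population` genes, of which
    `annotated` carry the annotation of interest.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Toy example: 10 genes in the background, 4 annotated with the term,
# 3 genes in the query list, 2 of which carry the annotation.
p_value = hypergeom_enrichment_p(population=10, annotated=4, sample=3, overlap=2)
```

A small P value means the overlap between the query list and the annotation is larger than expected by chance, which is how terms are ranked as more or less enriched.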

2.3 Protein interaction network

Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) is a web-based tool that was used to analyze the protein-protein interaction network of the selected genes. The STRING database integrates TM in PubMed, experimental/biochemical evidence, co-expression networks, and database associations to provide an interactive platform in which the connections, associations, and interactions between proteins can be assessed (Szklarczyk et al., 2015). All the genes selected from the pathway enrichment analysis were used as the input set. To further narrow the list of candidate genes, a confidence score threshold of ≥ 0.90 was set within STRING to display gene interactions (Szklarczyk et al., 2015).
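The effect of the confidence threshold can be sketched as a simple filter over an edge list. The Python example below does not call STRING. The edges and their scores are invented for illustration, and the function name is hypothetical.

```python
# Hypothetical edge list: (protein_a, protein_b, combined confidence in [0, 1]).
# These scores are made up for illustration and are not STRING output.
edges = [
    ("PIK3CA", "PIK3CB", 0.99),
    ("PIK3CA", "PDGFRB", 0.95),
    ("CREBBP", "SERPINC1", 0.42),
]


def high_confidence_network(edges, threshold=0.90):
    """Keep edges at or above the threshold and list the proteins they connect."""
    kept = [(a, b, score) for a, b, score in edges if score >= threshold]
    nodes = sorted({protein for a, b, _ in kept for protein in (a, b)})
    return kept, nodes


kept_edges, network_genes = high_confidence_network(edges)
```

Genes whose only interactions fall below the threshold drop out of the network, which is how a strict cutoff narrows the candidate list.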

3 Results

Based on the data mining strategy described in Figure 1, 1099 articles were retrieved from PubMed. The abstracts were then imported into R for TM, and a list of 256 genes was identified as related to the search terms “lung”, “radiation” and “pneumonitis” (Supplementary Tab. S1).

The GO BP annotations showed that the most highly enriched terms were relevant to the search terms “lung”, “radiation” and “pneumonitis”, supporting the utility of the TM search. The enrichment analysis produced 47 sets of BP annotations containing a total of 156 unique genes. The top 15 enriched BP annotations are shown in Table 1. The three most enriched BP annotations were positive regulation of gene expression (P = 6.80E-07), blood coagulation (P = 9.60E-07), and signal transduction (P = 1.22E-06), containing 10, 18, and 29 genes from the query set, respectively (Tab. 1). Signal transduction is a broadly defined BP, and its higher gene representation and relatively low P value make this process particularly relevant to RP. Other highly enriched BP annotations included negative regulation of apoptotic process, cell adhesion, and acute-phase response.

The pathway enrichment analysis identified 24 pathways containing a total of 41 unique genes (Tab. 2). The three most significantly enriched pathways were focal adhesion (P = 5.22E-10), Jak-STAT signaling pathway (P = 8.00E-08), and complement and coagulation cascades (P = 2.98E-07), containing 13, 10, and 7 genes from the query set, respectively. Additional highly enriched relevant pathways were pathways in cancer, apoptosis, non-small-cell lung cancer, and chemokine signaling pathway (Tab. 2). STRING combined evidence from curated databases, gene fusion, gene co-expression, protein homology, and biological experiments to determine protein-protein interactions. This analysis yielded 23 genes (Fig. 2). Of these 23 genes, PIK3CG, PIK3CB, PIK3CD, PIK3CA, and PIK3R5 were PI3K isoforms; SERPINC1 and PDGFRB were molecules upstream of PI3K pathways; and CREBBP was a downstream molecule of PI3K pathways.

Fig. 1

Overall data mining strategy.

Table 1

Summary of the top 15 biological processes from gene set enrichment analysis.

Table 2

Summary of the KEGG pathways from gene set enrichment analysis.

Fig. 2

High-confidence protein-protein interaction network of the 23 target genes. Connecting line colors indicate the type of evidence used to infer each interaction, with the confidence threshold set at 0.90. Sky blue lines represent data from curated databases; purple lines represent experimentally determined interactions; black lines represent co-expression; orange lines represent protein homology.

4 Discussion

RP and subsequent radiation pulmonary fibrosis are the two main dose-limiting factors in thoracic radiation and can have severe implications for patients’ quality of life, especially when radiation is combined with tyrosine kinase inhibitors, immunotherapy, or chemotherapy. To explore the mechanism of RP in detail, the present study used TM to identify a list of 256 genes related to the search terms “lung”, “radiation” and “pneumonitis”. Gene set enrichment analysis and network analysis then narrowed this list to 23 genes forming a network. Among these 23 genes, PI3Ks formed the most prominent group. Five genes (PIK3CG, PIK3CB, PIK3CD, PIK3CA, and PIK3R5) were PI3K isoforms; three genes (SERPINC1, PDGFRB, and CREBBP) were molecules upstream or downstream of PI3K pathways.

PI3Ks participate in various BPs by phosphorylating the 3-hydroxyl group of phosphoinositides in cellular membranes, and they have been shown to contribute to the inflammatory response to injury by regulating several key events in inflammation (Hawkins et al., 2015). Zhang et al. showed that apoptosis in irradiated normal lung tissue correlated with inhibition of downstream PI3K signaling (Zhang et al., 2012). Radiation can induce a robust inflammatory response that contributes to the development of lung fibrosis and the resulting late morbidity of lung irradiation (Rubin et al., 1995). Several studies have shown that PI3K pathways play important roles in pulmonary fibrosis (Miyoshi et al., 2013; Yan et al., 2014). Tsoyi et al. found that syndecan-2 attenuates radiation-induced pulmonary fibrosis in mice by down-regulating PI3K signaling (Tsoyi et al., 2018). In addition, in the pathogenesis of RP, lung epithelial cells transdifferentiate into fibroblast-like cells through epithelial-to-mesenchymal transition (EMT) (Yarnold et al., 2010), and the PI3K/AKT signaling pathway is a critical mediator of EMT (Barber et al., 2015). Tang et al. found that genetic variations in PIK3CA were significantly associated with the occurrence of severe RP (HR = 0.132, 95% CI: 0.042–0.416, P = 0.001) (Tang et al., 2016). Although several studies have reported a correlation between radiation-induced injury and the PI3K pathway, the reported correlations are generally weak and indirect. Further well-designed studies are needed to explore the role of PI3Ks in the development of RP.

This study combined TM and bioinformatic approaches to explore the molecular mechanisms of RP. Its major drawback is the lack of experimental validation. It also has other limitations. Genes or pathways are often mentioned in abstracts even when the reported findings were negative, so some of the genes retrieved from PubMed may be irrelevant to RP. Conversely, potentially relevant genes that appear only in the full text and not in the abstract may have been missed. In addition, bioinformatic analysis depends on current databases, which may hold limited information on the function of a gene or its role in a pathway. Finally, not all gene interactions have been fully elucidated.

5 Conclusion

We presented a method for discovering genes and pathways relevant to RP. The method can be rerun at intervals as databases and analytical tools evolve and improve. In the present analysis, we identified 23 genes, among which members of the PI3K family might be associated with RP. Our group has established a mouse model of RP. We hypothesize that the expression of some PI3K family members changes as RP develops; lung tissue will be collected at different time points after ionizing radiation to measure the expression of PI3K signaling proteins. Further experimental results are needed to validate this hypothesis.

Conflicts of interest

The authors declare that they have no conflicts of interest in relation to this article.


This work was supported by the Science and Technology Project of Hangzhou Bureau (2018A20 and 2018A33), the Scientific Research Funds of Zhejiang Province Health Department (2017KY532), the Social Development Project of Hangzhou Municipal Science and Technology Commission (20170533B95), the National Natural Science Foundation of China (81803042), and the Natural Science Foundation of Zhejiang Province (LQ17H160003 and LQ20H160020). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.


References

  • Andronis C et al. 2011. Literature mining, ontologies and information visualization for drug repurposing. Brief. Bioinform. 12: 357–368.
  • Baker S et al. 2016. Automatic semantic classification of scientific literature according to the hallmarks of cancer. Bioinformatics 32: 432–440.
  • Barber AG et al. 2015. PI3K/AKT pathway regulates E-cadherin and desmoglein 2 in aggressive prostate cancer. Cancer Med. 4: 1258–1271.
  • Cannon DM et al. 2013. Dose-limiting toxicity after hypofractionated dose-escalated radiotherapy in non-small-cell lung cancer. J. Clin. Oncol. 31: 4343–4348.
  • Chen W et al. 2016. Cancer statistics in China, 2015. CA Cancer J. Clin. 66: 115–132.
  • Das SK et al. 2007. Predicting lung radiotherapy-induced pneumonitis using a model combining parametric Lyman probit with nonparametric decision trees. Int. J. Radiat. Oncol. Biol. Phys. 68: 1212–1221.
  • Du L et al. 2018. GSTP1 Ile105Val polymorphism might be associated with the risk of radiation pneumonitis among lung cancer patients in Chinese population: A prospective study. J. Cancer 9: 726–735.
  • Farr KP et al. 2015. Inclusion of functional information from perfusion SPECT improves predictive value of dose-volume parameters in lung toxicity outcome after radiotherapy for non-small cell lung cancer: A prospective study. Radiother. Oncol. 117: 9–16.
  • Ferlay J et al. 2010. Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int. J. Cancer 127: 2893–2917.
  • Harpaz R et al. 2014. Text mining for adverse drug events: The promise, challenges, and state of the art. Drug Saf. 37: 777–790.
  • Hawkins PT et al. 2015. PI3K signalling in inflammation. Biochim. Biophys. Acta 1851: 882–897.
  • Hunter L et al. 2006. Biomedical language processing: What’s beyond PubMed? Mol. Cell 21: 589–594.
  • Korhonen A et al. 2012. Text mining for literature review and knowledge discovery in cancer risk assessment and research. PLoS One 7: e33427.
  • Lee M et al. 2014. Of text and gene – using text mining methods to uncover hidden knowledge in toxicogenomics. BMC Syst. Biol. 8: 93.
  • Li P et al. 2016. Single nucleotide polymorphisms in CBLB, a regulator of T-cell response, predict radiation pneumonitis and outcomes after definitive radiotherapy for non-small-cell lung cancer. Clin. Lung Cancer 17: 253–262.e5.
  • Liu H et al. 2014. Integrating in silico resources to map a signaling network. Methods Mol. Biol. 1101: 197–245.
  • Medhora M et al. 2012. Dose-modifying factor for captopril for mitigation of radiation injury to normal lung. J. Radiat. Res. 53: 633–640.
  • Miyoshi K et al. 2013. Epithelial Pten controls acute lung injury and fibrosis by regulating alveolar epithelial cell integrity. Am. J. Respir. Crit. Care Med. 187: 262–275.
  • Nogales-Cadenas R et al. 2009. GeneCodis: Interpreting gene lists through enrichment analysis and integration of diverse biological information. Nucleic Acids Res. 37: W317–W322.
  • Rubin P et al. 1995. A perpetual cascade of cytokines postirradiation leads to pulmonary fibrosis. Int. J. Radiat. Oncol. Biol. Phys. 33: 99–109.
  • Schallenkamp JM et al. 2007. Incidence of radiation pneumonitis after thoracic irradiation: Dose-volume correlates. Int. J. Radiat. Oncol. Biol. Phys. 67: 410–416.
  • Shim JS et al. 2014. Recent advances in drug repositioning for the discovery of new anticancer drugs. Int. J. Biol. Sci. 10: 654–663.
  • Simpson MS et al. 2012. Biomedical text mining: A survey of recent progress. In: Mining text data. Springer, pp. 465–517.
  • Szklarczyk D et al. 2015. STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43: D447–D452.
  • Tabas-Madrid D et al. 2012. GeneCodis3: A non-redundant and modular enrichment analysis tool for functional genomics. Nucleic Acids Res. 40: W478–W483.
  • Tang Y et al. 2016. Genetic variants in PI3K/AKT pathway are associated with severe radiation pneumonitis in lung cancer patients treated with radiation therapy. Cancer Med. 5: 24–32.
  • Tsoutsou PG et al. 2006. Radiation pneumonitis and fibrosis: Mechanisms underlying its pathogenesis and implications for future research. Int. J. Radiat. Oncol. Biol. Phys. 66: 1281–1293.
  • Tsoyi K et al. 2018. Syndecan-2 attenuates radiation-induced pulmonary fibrosis and inhibits fibroblast activation by regulating PI3K/Akt/ROCK pathway via CD148. Am. J. Respir. Cell Mol. Biol. 58: 208–215.
  • Wang D et al. 2012. Functional dosimetric metrics for predicting radiation-induced lung injury in non-small cell lung cancer patients treated with chemoradiotherapy. Radiat. Oncol. 7: 69.
  • Wen J et al. 2018. Potentially functional variants of ATG16L2 predict radiation pneumonitis and outcomes in patients with non-small cell lung cancer after definitive radiotherapy. J. Thorac. Oncol. 13: 660–675.
  • Yan Z et al. 2014. Reviews and prospectives of signaling pathway analysis in idiopathic pulmonary fibrosis. Autoimmun. Rev. 13: 1020–1025.
  • Yarnold J et al. 2010. Pathogenetic mechanisms in radiation fibrosis. Radiother. Oncol. 97: 149–161.
  • Zhang Y et al. 2012. Oxidative stress mediates radiation lung injury by inducing apoptosis. Int. J. Radiat. Oncol. Biol. Phys. 83: 740–748.
  • Zhu F et al. 2013. Biomedical text mining and its applications in cancer research. J. Biomed. Inform. 46: 200–211.

Cite this article as: Zhu L, Zhang J, Xia B, Chen S, Xu Y. 2020. Identification of potential molecular mechanisms of radiation pneumonitis development in non-small-cell lung cancer treatment by data mining. Radioprotection,

Supplementary Material

Supplementary File S1.

Supplementary Tab. S1.

