Radioprotection
Volume 60, Number 3, July-September 2025
Page(s): 211-220
DOI: https://doi.org/10.1051/radiopro/2024056
Published online: 15 September 2025
Article
Producing nuclear disaster prevention materials with artificial intelligence chatbots: comparison of ChatGPT-3.5, Copilot, and Gemini output with google search results
1 School of Nursing, Kitasato University, 1-15-1, Kitasato, Minami-ku, Sagamihara-city, Kanagawa 252-0373, Japan
2 University hospital Medical Information Network (UMIN) Center, The University of Tokyo Hospital, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8655, Japan
3 Department of Health Communication, School of Public Health, Graduate School of Medicine, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113-8655, Japan
* Corresponding author: shinyai@nrs.kitasato-u.ac.jp
Received: 19 August 2024
Accepted: 24 November 2024
Objective: To compare the understandability, actionability, and readability of AI chatbot-generated text and webpage text about nuclear disasters. Methods: In this cross-sectional study, we compared the understandability, actionability, and readability of texts about radiation generated by ChatGPT-3.5, Copilot, and Gemini with those of web page texts. Keywords related to radiation were extracted using Google Trends. A Google search was performed using the extracted keywords, and the top 8 pages for each keyword were extracted. Each AI chatbot generated two types of text: normal level and 6th grade level. The Japanese version of the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) was used to rate the understandability and actionability of each text. Higher scores indicate greater perceived understandability and actionability, and the cutoff for both was set at 70%. jReadability was used to quantitatively assess the readability of the Japanese texts. Results: With regard to understandability, the 6th grade level texts from Copilot (n = 22, 71.0%) and Gemini (n = 26, 92.9%) had significantly higher percentages of texts scoring 70% or higher, while Google Search had a significantly lower percentage (n = 58, 32.8%; p < .05). Gemini at the normal level (n = 69, 55.2%) and Copilot (n = 74, 55.6%) and Gemini (n = 73, 56.2%) at the 6th grade level had significantly higher percentages of texts rated from very easy to read to somewhat difficult to read (p < .05). Conclusions: The Japanese sentences generated by the AI chatbots were easier to read than the Google search results, and the 6th grade level prompt further improved readability. Thus, AI chatbots can be an effective tool to promote understanding of radiation disaster prevention.
Key words: nuclear power / patient education / ChatGPT / Copilot / Gemini
© S. Ito et al., Published by EDP Sciences 2025
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
On March 11, 2011, many people were exposed to radiation as a result of the accident at TEPCO’s Fukushima Daiichi Nuclear Power Station (Tsubokura et al., 2012; United Nations Scientific Committee on the Effects of Atomic Radiation, 2015). At the time of the nuclear accident, information related to radiation and radioactivity was scarce, and explanations by experts and others were difficult for the public to understand, leading to anxiety and emotional distress among local residents (Rubin et al., 2012). In Japan today, nuclear disaster prevention pamphlets for local residents have been prepared in each region (Cabinet Office and Fire and Disaster Management Agency, 2013). Ito et al. evaluated the quality of Japanese-language nuclear emergency preparedness pamphlets available free of charge online and reported that, according to the Patient Education Materials Assessment Tool (PEMAT), 61.2% of the pamphlets were easy to understand and 49.1% were easy to act upon (Ito et al., in press). However, they also reported that 88.8% of these nuclear disaster preparedness pamphlets were written in a manner that required reading comprehension skills sufficient to understand substantially technical text. Given that the Internet is reported to be a useful way of distributing information to the general public in emergency situations such as nuclear disasters (Kanda et al., 2014), it is important to create nuclear disaster preparedness information that is easily understood by local residents (Gauntlett et al., 2019; Goto et al., 2018; Hellier et al., 2014; Ito et al., 2017; Ohno and Endo, 2015).
In recent years, artificial intelligence (AI) chatbots such as ChatGPT, Copilot, and Gemini have been released, and various studies have been conducted in the medical field (Ayers et al., 2023; Decker et al., 2023; Pan et al., 2023). In the area of nuclear disaster prevention, one report suggests that AI chatbots could serve as a decision support tool for humans in radiological emergency response (Chandra and Chakraborty, 2024). It is recommended that patient information materials be readable at a 6th to 8th grade level or lower (Centers for Disease Control and Prevention (U.S.) et al., 2009; Cotugna et al., 2005; Weiss, 2003). However, that report did not assess whether the information produced by AI chatbots is easier to understand and act upon than that on a web page, nor did it assess the reading level of the generated text. Therefore, it is not certain whether AI chatbots can be a useful source of information in the event of a nuclear disaster.
The aim of this study was to evaluate the understandability, actionability, and readability of texts related to nuclear disasters produced by an AI chatbot, and to investigate whether it could be a useful source of information in the event of a nuclear disaster. We also investigated whether the AI chatbot could produce texts that were easy to understand at the 6th grade level. To evaluate the AI chatbot output, we compared it to Google Search results. It is important to note that AI chatbots may generate content that includes misinformation, and verifying the reliability of the information they provide is crucial. Additionally, it is essential to understand that AI chatbots do not bear responsibility for the outcomes or interpretations arising from the generated content. This study does not assess the reliability of the generated text but focuses solely on its understandability, actionability, and readability.
2 Methods
2.1 Information generation and website selection
In this study, we conducted a systematic quantitative content analysis of online materials using a cross-sectional design. From March 25 to April 25, 2024, web pages were searched using the Google search engine, and text was generated by the ChatGPT-3.5 (Chat Generative Pretrained Transformer, model 3.5; OpenAI, San Francisco, CA, USA), Microsoft Copilot (Microsoft Corporation, Redmond, WA, USA), and Google Gemini (Alphabet Inc., Mountain View, CA, USA) AI chatbots. Ito had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. The keyword selection process was as follows. First, Google Trends was used to search for the top 25 most frequently searched keywords in Fukushima Prefecture during the week of March 11 to March 18, 2011. Second, among those top 25 keywords, we extracted five keywords related to the nuclear power plant accident: “nuclear power plant,” “radioactivity,” “radiation,” “nuclear power,” and “Chernobyl.” Third, Google Trends was used to search for these five keywords in Fukushima Prefecture during the same week, and the top 25 most frequently searched words for each of these five keywords were extracted. The reason for the two-stage keyword extraction using Google Trends is that a single-stage extraction process yields many keywords related to the Great East Japan Earthquake and few keywords related to the nuclear power plant accident. In addition, proper nouns such as company and individual names were excluded.
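To make the two-stage extraction concrete, the sketch below outlines the filtering logic in Python. It is purely illustrative: top_searches, related_queries, and is_proper_noun are hypothetical helpers standing in for Google Trends lookups and for the manual proper-noun exclusion; the study itself used the Google Trends web interface.

```python
# Illustrative sketch of the two-stage keyword extraction described above.
# Both lookup helpers are hypothetical stand-ins for Google Trends exports.

GEO = "Fukushima Prefecture"
TIMEFRAME = "2011-03-11 to 2011-03-18"
NUCLEAR_SEEDS = {"nuclear power plant", "radioactivity", "radiation",
                 "nuclear power", "Chernobyl"}

def top_searches(geo: str, timeframe: str, limit: int = 25) -> list[str]:
    """Hypothetical: top `limit` search keywords for a region and period."""
    raise NotImplementedError

def related_queries(keyword: str, geo: str, timeframe: str, limit: int = 25) -> list[str]:
    """Hypothetical: top `limit` queries related to `keyword`."""
    raise NotImplementedError

def is_proper_noun(query: str) -> bool:
    """Placeholder for the manual exclusion of company and personal names."""
    return False

def extract_keywords() -> list[str]:
    # Stage 1: among the top 25 searches during the accident week,
    # keep only the seed keywords related to the nuclear accident.
    seeds = [kw for kw in top_searches(GEO, TIMEFRAME) if kw in NUCLEAR_SEEDS]
    # Stage 2: gather the top 25 related queries for each seed,
    # dropping duplicates and proper nouns.
    selected: list[str] = []
    for seed in seeds:
        for query in related_queries(seed, GEO, TIMEFRAME):
            if query not in selected and not is_proper_noun(query):
                selected.append(query)
    return selected
```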
First, we describe the procedure for Google Search. The extracted keywords were entered into Google Search and evaluated in the order of the search engine results. Search results were sorted by keyword relevance, so pages with lower rankings (those appearing lower on the search results page) were considered less relevant to the keywords. According to a previous study, when the general public uses search engines to look for health-related information, the average number of web pages viewed is about eight, and 66% of users reported viewing fewer than five documents (Jansen and Spink, 2006). Based on this, eight web pages were extracted per keyword. Among the search results, videos, advertisements, broken links, inappropriate content, sites requiring subscriptions or fees, pages consisting only of external link lists or numbers, and professional pages were excluded from the evaluation. Next, we describe the sentence generation procedure for the AI chatbots. Two prompts were used for sentence generation: “Tell me about that keyword” (normal level) and “Tell me about that keyword at a 6th grade level” (6th grade level). To ensure that information from the search history did not influence the results, the search history was erased before prompting the chatbots. Because responses can differ even when the same prompt is used, each AI chatbot was prompted four times for each keyword. Responses that contained obviously incorrect information were excluded from the evaluation.
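The prompting procedure can likewise be sketched as a small loop. The following Python fragment is illustrative only: ask_chatbot is a hypothetical function representing a query to one of the chatbot interfaces (ChatGPT-3.5, Copilot, or Gemini) with the history cleared, and the prompt strings are English renderings of the prompts used in the study.

```python
# Illustrative sketch of the prompt design: 2 prompt levels x 4 repetitions
# per keyword and per chatbot. ask_chatbot() is a hypothetical stand-in for a
# query to the ChatGPT-3.5, Copilot, or Gemini interface with history cleared.

PROMPTS = {
    "normal": "Tell me about {keyword}.",
    "6th_grade": "Tell me about {keyword} at a 6th grade level.",
}
CHATBOTS = ["ChatGPT-3.5", "Copilot", "Gemini"]
REPETITIONS = 4  # responses vary between runs, so each prompt was repeated

def ask_chatbot(chatbot: str, prompt: str) -> str:
    """Hypothetical: send one prompt to the named chatbot and return its reply."""
    raise NotImplementedError

def generate_texts(keywords: list[str]) -> list[dict]:
    records = []
    for keyword in keywords:
        for chatbot in CHATBOTS:
            for level, template in PROMPTS.items():
                for run in range(1, REPETITIONS + 1):
                    records.append({
                        "keyword": keyword,
                        "chatbot": chatbot,
                        "level": level,
                        "run": run,
                        "text": ask_chatbot(chatbot, template.format(keyword=keyword)),
                    })
    return records  # e.g., 34 keywords x 3 chatbots x 2 levels x 4 runs = 816 texts
```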
This study was not subjected to ethical review because it did not involve any human subjects or interventions.
2.2 Evaluators
A researcher in the field of nuclear disaster prevention read each document and independently scored all documents except those whose content was clearly inappropriate. Next, a physician who is not a specialist in radiation or nuclear hazards scored a randomly selected 20% of the texts from each condition group using the Patient Education Materials Assessment Tool for Printable Materials (PEMAT-P) (Furukawa et al., 2022; Shoemaker et al., 2014). In cases of disagreement, consensus was reached through discussion.
2.3 Understandability and actionability
The understandability and actionability of the content of each text were evaluated using the Japanese version of the PEMAT-P (Furukawa et al., 2022; Shoemaker et al., 2014). The Japanese version of the PEMAT-P consists of 23 items and assesses the following domains: “content,” “word choice and style,” “use of numbers,” “organization,” “layout and design,” and “use of visual aids.” Evaluators respond to each item by selecting either 0 (disagree) or 1 (agree). The understandability and actionability scores range from 0% to 100%, with higher scores indicating higher levels of perceived understandability and ease of action. The cutoff value was set at 70% for both scores. For the text generated by the AI chatbots, only the fourth response was evaluated.
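As a worked example of this scoring rule (a minimal sketch; the item ratings shown are invented, not study data):

```python
def pemat_percentage(item_ratings: list[int]) -> float:
    """PEMAT-P domain score: agreed items / rated items x 100.
    Items judged not applicable should be dropped before calling."""
    return 100.0 * sum(item_ratings) / len(item_ratings)

# Invented example: 12 of 15 applicable understandability items rated "agree".
understandability = pemat_percentage([1] * 12 + [0] * 3)   # 80.0
meets_cutoff = understandability >= 70.0                     # True (cutoff used in this study)
```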
2.4 Readability
In this study, the jReadability text measurement system was used to quantitatively assess the readability of Japanese text (Lee, 2016; Lee et al., 2023). This index calculates readability based on the average length of a sentence, word difficulty, percentage of grammatical parts of speech, and type of characters per sentence. Scores range from 0.5 to 6.4, with higher scores indicating that the sentences are relatively easy to read: 0.5–1.4, very difficult to read; 1.5–2.4, difficult to read; 2.5–3.4, somewhat difficult to read; 3.5–4.4, neutral; 4.5–5.4, easy to read; and 5.5–6.4, very easy to read. All four results from each AI chatbot were evaluated.
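The score bands above translate directly into a lookup; a minimal sketch:

```python
def jreadability_band(score: float) -> str:
    """Map a jReadability score (0.5-6.4) to its difficulty band."""
    if not 0.5 <= score <= 6.4:
        raise ValueError("jReadability scores range from 0.5 to 6.4")
    for upper, label in [(1.5, "very difficult to read"),
                         (2.5, "difficult to read"),
                         (3.5, "somewhat difficult to read"),
                         (4.5, "neutral"),
                         (5.5, "easy to read"),
                         (6.5, "very easy to read")]:
        if score < upper:
            return label

jreadability_band(2.9)  # 'somewhat difficult to read'
```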
2.5 Statistical analysis
Three types of statistical analyses were performed in this study. First, within each rating scale, a one-factor analysis of variance with Tukey’s method was conducted to compare means, and a chi-square test or Fisher’s exact test was used to compare proportions. In the residual analysis, the post hoc analysis of the chi-square test, an adjusted standardized residual greater than 1.96 or less than −1.96 was considered significant. Second, the intraclass correlation coefficient was calculated to assess the agreement between evaluators. A significance level of 0.05 was used in this study. All analyses were performed using IBM SPSS (Ver. 25.0 for Windows; IBM SPSS Japan, Tokyo, Japan).
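To illustrate the post hoc residual analysis (a sketch with invented counts, not the study data; it assumes NumPy and SciPy are available, whereas the study used SPSS):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented 2x4 contingency table: rows = materials below / at-or-above the 70%
# cutoff, columns = four source groups. Not the study data.
observed = np.array([[40, 9, 2, 119],
                     [18, 22, 26, 58]])

chi2, p, dof, expected = chi2_contingency(observed)

# Adjusted standardized residuals; |residual| > 1.96 marks cells that deviate
# significantly from independence, as in the residual analysis described above.
n = observed.sum()
row_frac = observed.sum(axis=1, keepdims=True) / n
col_frac = observed.sum(axis=0, keepdims=True) / n
adj_residuals = (observed - expected) / np.sqrt(expected * (1 - row_frac) * (1 - col_frac))
significant_cells = np.abs(adj_residuals) > 1.96
```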
3 Results
The keyword search with Google Trends extracted 19 words from “nuclear power plant,” 8 words each from “radioactivity” and “radiation,” 6 words from “nuclear power,” and 2 words from “Chernobyl” (Tab. 1). Of these, the two words extracted from “Chernobyl” were identical to two words extracted from “nuclear power plant,” meaning that this keyword produced no additional results. The search results for “map” (number of deleted keywords: 1), “dose of radiation” (2), “nuclear power, information” (1), and “Fukushima City, radiation” (1) were excluded because their web pages contained only figures and numbers and almost no text. A keyword containing “breaking news” (1) was excluded because the web page was no longer displayed. A keyword indicating a specific company name was excluded (1) because the web pages were mostly company introductions and publications. Ultimately, 34 words were extracted.
These 34 words were input to the three AI chatbots (ChatGPT-3.5, Copilot, and Gemini) to generate sentences four times for two reading levels (normal level and 6th grade level), and a total of 816 sentences were generated. A total of 48 (5.9%) sentences contained obvious errors, 26 from ChatGPT-3.5 (3.2%), 5 from Copilot (0.6%), and 17 from Gemini (2.1%), and these were excluded from the analysis. Examples of errors included a sentence in which “Fukushima No. 2 Nuclear Power Plant” was mistaken for “Unit 2 of Fukushima Daiichi Nuclear Power Plant,” a sentence in which information on “Onagawa Nuclear Power Plant” was mixed with information on “Fukushima Daiichi Nuclear Power Plant,” and a sentence in which “radiation” and “radioactivity” were confused in their descriptions. Therefore, a total of 768 (94.1%) sentences obtained from 34 keywords were evaluated in this survey.
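The counts above follow directly from the study design; a quick arithmetic check (figures taken from the text):

```python
keywords, chatbots, levels, repetitions = 34, 3, 2, 4
total = keywords * chatbots * levels * repetitions        # 816 generated sentences
errors = {"ChatGPT-3.5": 26, "Copilot": 5, "Gemini": 17}
excluded = sum(errors.values())                            # 48 sentences with obvious errors
round(100 * excluded / total, 1)                           # 5.9 (% excluded)
round(100 * (total - excluded) / total, 1)                 # 94.1 (% analyzed)
```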
For the web search using Google, the top 8 web pages in search order for the 34 keywords were extracted from the web pages that met the inclusion criteria. Of the 272 web pages extracted, 177 pages were included in the analysis after excluding duplicate pages. The intraclass correlation coefficient (ICC) was calculated to quantify the degree of agreement between evaluators, with ICC = 0.66 for PEMAT-P understandability and ICC = 0.40 for actionability.
Results of the keyword searches with Google Trends.
3.1 Understandability and actionability
Tables 2 and 3 show the comparisons of scale scores for the AI chatbots and Google Search. Table 4 shows the comparison of PEMAT-P item scores. First, we discuss the percentage of materials that met the cutoff score of 70 or higher on the PEMAT-P (Tab. 2). With regard to understandability, the 6th grade level texts from Copilot (n = 22, 71.0%) and Gemini (n = 26, 92.9%) had significantly higher percentages of scores of 70 or higher, while Google Search had a significantly lower percentage (n = 58, 32.8%). Conversely, with regard to actionability, there were no materials with a score of 70 or higher from the AI chatbots and only one from Google Search, with no significant differences between the groups. Next, we discuss the comparisons of the PEMAT-P scale scores (Tab. 3). For understandability, the 6th grade level sentences from Gemini had the highest scale scores (M = 84.1, SD = 7.4), significantly higher than those of the normal level ChatGPT-3.5 (M = 63.8, SD = 13.3) and Copilot (M = 71.2, SD = 10.9), and the 6th grade level ChatGPT-3.5 (M = 70.4, SD = 8.5) and Google Search (M = 60.8, SD = 17.9). For actionability, the scores were significantly higher for the normal level Gemini texts (M = 6.9, SD = 16.3) than for Google Search (M = 1.4, SD = 8.4), although the scores were low in both groups.
Table 4 shows the results of the PEMAT-P item score comparison among AI chatbots and Google Search. The following items from Google Search had significantly lower percentages of applicable materials than the other groups.
Comparison of item scores among artificial intelligence chatbots and web search (categorical variables).
Comparison of scale scores for artificial intelligence chatbots and web search (continuous variables).
PEMAT-P item scores among artificial intelligence chatbots and web search.
3.2 Readability
The percentages of jReadability difficulty levels are given in Table 2. Gemini at the normal level and Copilot and Gemini at the 6th grade level had significantly higher percentages of texts rated from very easy to read to somewhat difficult to read. Conversely, ChatGPT-3.5 and Copilot at the normal level and Google Search had significantly lower percentages of such texts. The results of the comparison of jReadability scores (continuous variables) and sentence length are given in Table 3. For the jReadability score, the normal level Gemini (M = 1.7, SD = 0.5) and the 6th grade level Copilot (M = 2.7, SD = 1.2) and Gemini (M = 2.9, SD = 0.9) showed significantly lower reading difficulty than the normal level ChatGPT-3.5 (M = 1.7, SD = 0.5), Copilot (M = 2.1, SD = 1.0), and Google Search (M = 1.5, SD = 3.0).
4 Discussion
In this study, we evaluated the understandability, actionability, and readability of radiation-related texts generated by AI chatbots and investigated whether AI chatbots can be a useful tool for disseminating radiation-related disaster prevention information. The results showed that, compared with the web pages from the Google Search results, the sentences produced by the AI chatbots were easier to understand, the difficulty level of the Japanese sentences was lower, and the purpose of the document was clearer. The results also suggested that when the prompt “Please teach me at a 6th grade level” was included, the difficulty level of the Japanese text often decreased compared to text generated without this prompt. The results of this study of radiation-related text are similar to those of previous studies that evaluated the quality of patient education materials about medical care produced by AI chatbots using PEMAT-P and other methods (Ayoub et al., 2023; Dihan et al., 2024; Musheyev et al., 2024; Pan et al., 2023). However, AI chatbots have the following problems: they cannot present effective diagrams and pictures to explain key concepts, and they produced obvious errors in 48 sentences (5.9% of the material produced), which raises questions about the reliability of the information. Although AI chatbots may be a useful tool to help local residents understand radiation disaster prevention, there are issues with the reliability of information and with effective ways to present numbers and images. Further research, including the creation of effective prompts, may be needed to provide more understandable, actionable, and readable materials.
This section describes the results of the comparison between documents created by the AI chatbots and web page documents from the Google Search results. For radiation-related terms, the documents generated by the AI chatbots were easier to understand than the web-based documents, and their Japanese sentences were also easier to read. However, for the normal level ChatGPT-3.5, there was no significant difference from the web page documents in the Google Search results. The results of this study differed from those of previous studies. For example, in terms of text difficulty, a study by Shen et al. that used ChatGPT-3.5 and Google Search to answer patients’ questions about medical practice guidelines reported that ChatGPT-3.5 produced text more difficult to understand than the Google Search results (Shen et al., 2024). In terms of comprehensibility, the study by Shen et al. reported no difference between ChatGPT-3.5 and Google sentences (Shen et al., 2024). Also, a study by Ayoub et al. evaluated the general medical knowledge of ChatGPT (M = 68.2%, SD = 4.4) and Google Search pages (M = 89.4%, SD = 5.9) and reported that the Google results were easier to understand (Ayoub et al., 2023). The differences from these previous studies may depend on the topic studied. For medical knowledge, there are many websites aimed at the general public. While web pages created by medical institutions can be difficult to understand due to the highly technical nature of the text, many other web pages, created by companies for example, are reported to be easy to understand (Ito and Furukawa, 2024). However, with regard to nuclear disaster prevention, many nuclear-related materials for the general public have been published online in Japan since the accident at TEPCO’s Fukushima Daiichi Nuclear Power Station, but the difficulty level of the text of these online materials is reported to be high (Ito et al., in press). Therefore, for the present topic, it is possible that web pages found with Google are more difficult to understand and harder to read than the text produced by AI chatbots.
Next, we discuss the effect of the prompt, “Please teach me at a 6th grade level.” When this prompt was given, there was no significant change in the understandability of the sentences generated by the AI chatbots. However, the addition of the prompt significantly decreased the difficulty level of the Japanese for ChatGPT-3.5 and Copilot, and Gemini produced the lowest difficulty level of Japanese both with and without the prompt. The finding that the AI chatbots in this study decreased the difficulty of the Japanese text in response to a reading level prompt is similar to the findings of previous studies (Cheong et al., 2024; Dihan et al., 2024; Shen et al., 2024). For example, Dihan et al. used ChatGPT-3.5, ChatGPT-4.0, and Google Bard to assess the comprehensibility and text difficulty of pediatric glaucoma patient education materials (Dihan et al., 2024). Only ChatGPT-4.0 improved comprehensibility when prompted to respond at a 6th grade level, but the sentence difficulty decreased for all AI chatbots. The study by Shen et al. used ChatGPT-3.5 and Google Search to generate answers to patients’ questions about medical guidelines (Shen et al., 2024). When prompted to respond at a 6th grade level, sentence difficulty decreased more than without prompting. A study by Cheong et al. evaluated patient education materials for obstructive sleep apnea generated using ChatGPT-3.5 and Google Bard and reported that Google Bard was able to produce 5th grade level text (Cheong et al., 2024). These results suggest that, to varying degrees depending on the type of AI chatbot, AI chatbots may be able to reduce the difficulty of sentences when prompted to produce 5th or 6th grade level documents, but may have difficulty improving comprehensibility. Unless specifically instructed by prompts, the sentences produced by the chatbots follow fixed patterns of composition and layout. Therefore, the prompt “teach me at a 6th grade level” may have decreased the difficulty of the sentences, but may not have improved comprehensibility because the composition and layout were unchanged.
This study had various strengths and limitations. One strength is that it is the first study to quantitatively evaluate the understandability, actionability, and difficulty of documents related to nuclear power and radiation disaster prevention produced by AI chatbots. It is also the first study to comprehensively extract the most frequently searched radiation-related keywords in Fukushima Prefecture during the week of the Fukushima Daiichi Nuclear Power Station accident. Next, we discuss the limitations of this study. First, the prompt, “Please teach me at a 6th grade level,” was simple and did improve readability, but it was not sufficient to confirm the usefulness of AI chatbots for helping the general public obtain radiation disaster prevention information. Without prompting, AI chatbots do not actively display layouts and designs that promote comprehension, charts and summaries, suggestions for action, or explanations of technical terms. We believe that only those who can enter sophisticated queries will be able to obtain useful information through AI chatbots and search engines, which could widen the knowledge gap. Prompts should therefore be designed so that many people can get the maximum benefit from AI chatbots. Second, it is difficult to compare the results for AI chatbots with those of previous studies: new versions of AI chatbots are released every year, and their performance varies greatly depending on the version and type. In recent years, guidelines for papers using AI chatbots have been published, but before that, many papers did not specify the time of the study or the version of the AI chatbot. Therefore, it is difficult to determine to what extent the results of quality evaluations of texts created with AI chatbots are attributable to the prompt, the type of material evaluated, the type of AI chatbot, or the chatbot version. We believe it is important to develop and adhere to guidelines regarding the use of AI chatbots.
Third, the generalizability of these findings to other languages is uncertain. Some time after the Fukushima Daiichi Nuclear Power Station accident, many radiation-related materials for the general public were created and published online in Japan. However, immediately after the nuclear accident, there were almost no easy-to-understand radiation-related materials for the general public, and the Web contained a great deal of misinformation, making it difficult to judge the appropriateness of the information. Therefore, it may be difficult to use AI chatbots to obtain appropriate radiation-related information in countries and languages where radiation-related information for the general public is not published online. Prompts requesting translations and summaries from AI chatbots may be important when radiation-related information for the general public is needed. Finally, AI chatbots present challenges in terms of information reliability, as they occasionally produce incorrect information. In this study, 48 (5.9%) of the sentences generated by the AI chatbots contained outright errors. Prior studies have analyzed academic literature and specialized books with AI tools to build large language models tailored to the medical field, enabling reliable information extraction and generation (Wu et al., 2024); one example is the PMC-LLaMA model, which was developed using medical papers and textbooks. While concerns remain about the reliability of text generated by AI chatbots, these studies provide a foundation for generating trustworthy content. Future research should focus on mitigating the risks of misinformation and further improving the accuracy and reliability of generated content.
5 Conclusions
The radiation-related sentences generated by the AI chatbots were easier to understand, and the Japanese sentences they generated were easier to read, than the sentences on the web pages. We also found that the prompt, “Please teach me at a 6th grade level,” could further reduce the reading difficulty of the Japanese documents. These results suggest that AI chatbots can be an effective tool for promoting the public’s understanding of radiation disaster prevention. However, further research, including the creation of effective prompts, is needed to provide more understandable, actionable, and readable materials.
Funding
This work was supported by the Japan Society for the Promotion of Science under KAKENHI Grant Number JP 24K06389 and the Program of the Network-type Joint Usage/Research Center for Radiation Disaster Medical Science.
Conflicts of interest
The authors declare that they have no conflict of interest.
Data availability statement
Ito had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.
Author contribution statement
Ito conceived the paper, collected and analyzed data, and wrote the paper; Okuhara, Okada, and Kiuchi conceived and designed the study and revised the paper based on critical peer review; Furukawa conceived the study, collected data, and revised the paper based on critical peer review. All authors have reviewed the manuscript and agreed to accept responsibility for all aspects of the work.
References
- Ayers JW, Poliak A, Dredze M, Leas EC, Zhu Z, Kelley JB, Faix DJ, Goodman AM, Longhurst CA, Hogarth M, Smith DM. 2023. Comparing physician and artificial intelligence chatbot responses to patient questions posted to a public social media forum. JAMA Internal Med 183: 589–596. [Google Scholar]
- Ayoub NF, Lee YJ, Grimm D, Divi V. 2023. Head-to-head comparison of ChatGPT versus Google Search for medical knowledge acquisition. Otolaryngol Head Neck Surg 8. https://doi.org/10.1002/ohn.465 [Google Scholar]
- Cabinet Office, Fire and Disaster Management Agency. 2013. Manual for Preparation of Regional Disaster Prevention Plans (Nuclear Disaster Preparedness). Retrieved from https://www8.cao.go.jp/genshiryoku_bousai/keikaku/keikaku.html [Google Scholar]
- Centers for Disease Control and Prevention (U.S.), Office of the Associate Director for Communication, Strategic and Proactive Communication Branch. 2009. Scientific and technical information simply put. Retrieved from https://stacks.cdc.gov/view/cdc/11938/cdc_11938_DS1pdf [Google Scholar]
- Chandra A, Chakraborty A. 2024. Exploring the role of large language models in radiation emergency response. J Radiol Prot 44: 15, Article 011510. [Google Scholar]
- Cheong RC, Unadkat S, McNeillis V, Williamson A, Joseph J, Randhawa P, Andrews P, Paleri V. 2024. Artificial intelligence chatbots as sources of patient education material for obstructive sleep apnoea: ChatGPT versus Google Bard. Eur Arch Otorhinolaryngol 281: 985–993. [Google Scholar]
- Cotugna N, Vickery CE, Carpenter-Haefele KM. 2005. Evaluation of literacy level of patient education pages in health-related journals. J Community Health 30: 213–219. [CrossRef] [PubMed] [Google Scholar]
- Decker H, Trang K, Ramirez J, Colley A, Pierce L, Coleman M, Bongiovanni T, Melton GB, Wick E. 2023. Large language model-based chatbot vs surgeon-generated informed consent documentation for common procedures. JAMA Network Open 6: Article e2336997. [Google Scholar]
- Dihan Q, Chauhan MZ, Eleiwa TK, Hassan AK, Sallam AB, Khouri AS, Chang TC, Elhusseiny AM. 2024. Using large language models to generate educational materials on childhood glaucoma. Am J Ophthalmol 265: 28–38. [Google Scholar]
- Furukawa E, Okuhara T, Okada H, Shirabe R, Yokota R, Iye R, Kiuchi T. 2022. Translation, cross-cultural adaptation, and validation of the Japanese version of the Patient Education Materials Assessment Tool (PEMAT). Int J Environ Res Public Health 19. https://doi.org/10.3390/ijerph192315763 [Google Scholar]
- Gauntlett L, Amlôt R, Rubin GJ. 2019. How to inform the public about protective actions in a nuclear or radiological incident: a systematic review. Lancet Psychiatry 6: 72–80. [CrossRef] [PubMed] [Google Scholar]
- Goto A, Lai AY, Kumagai A, Koizumi S, Yoshida K, Yamawaki K, Rudd RE. 2018. Collaborative processes of developing a health literacy toolkit: a case from Fukushima after the nuclear accident. J Health Commun 23: 200–206. [CrossRef] [PubMed] [Google Scholar]
- Hellier E, Edworthy J, Newbold L, Titchener K, Tucker M, Gabe-Thomas E. 2014. Evaluating the application of research-based guidance to the design of an emergency preparedness leaflet. Appl Ergon 45: 1320–1329. [CrossRef] [PubMed] [Google Scholar]
- Ito S, Furukawa E. 2024. Evaluating the understandability and actionability of online information on anemia in Japanese. Am J Health Educ 1–10. [Google Scholar]
- Ito S, Furukawa E, Okuhara T, Okada H, Kiuchi T. In press. Evaluating the overall quality of online information on nuclear power plant accidents in Japanese. Radioprotection. [Google Scholar]
- Ito S, Goto A, Ishii K, Ota M, Yasumura S, Fujimori K. 2017. Fukushima mothers’ concerns and associated factors after the Fukushima nuclear power plant disaster. Asia Pac J Public Health 29: 151s–160s. [Google Scholar]
- Jansen BJ, Spink A. 2006. How are we searching the World Wide Web? A comparison of nine search engine transaction logs. Inf Process Manag 42: 248–263. [Google Scholar]
- Kanda H, Takahashi K, Sugaya N, Mizushima S, Koyama K. 2014. Internet usage and knowledge of radiation health effects and preventive behaviours among workers in Fukushima after the Fukushima Daiichi nuclear power plant accident. Emerg Med J 31: e60–e65. [Google Scholar]
- Lee J. 2016. Readability research for Japanese language education. Waseda Stud Jpn Lang Educ 21: 1–16. [Google Scholar]
- Lee J, Sunakawa Y, Hori K, Hasebe Y. 2023. jReadability PORTAL. Retrieved from http://jreadability.net [Google Scholar]
- Musheyev D, Pan A, Loeb S, Kabarriti AE. 2024. How well do artificial intelligence chatbots respond to the top search queries about urological malignancies? Eur Urol 85: 13–16. [Google Scholar]
- Ohno K, Endo K. 2015. Lessons learned from Fukushima Daiichi nuclear power plant accident: efficient education items of radiation safety for general public. Radiat Protect Dosimetry 165: 510–512. [Google Scholar]
- Pan A, Musheyev D, Bockelman D, Loeb S, Kabarriti AE. 2023. Assessment of artificial intelligence chatbot responses to top searched queries about cancer. JAMA Oncol 9: 1437–1440. [Google Scholar]
- Rubin GJ, Amlôt R, Wessely S, Greenberg N. 2012. Anxiety, distress and anger among British nationals in Japan following the Fukushima nuclear accident. Br J Psychiatry 201: 400–407. [Google Scholar]
- Shen SA, Perez-Heydrich CA, Xie DX, Nellis JC. 2024. ChatGPT vs. web search for patient questions: what does ChatGPT do better? Eur Arch Otorhinolaryngol 7. https://doi.org/10.1007/s00405-024-08524-0 [Google Scholar]
- Shoemaker SJ, Wolf MS, Brach C. 2014. Development of the Patient Education Materials Assessment Tool (PEMAT): a new measure of understandability and actionability for print and audiovisual patient information. Patient Educ Couns 96: 395–403. [CrossRef] [PubMed] [Google Scholar]
- Tsubokura M, Gilmour S, Takahashi K, Oikawa T, Kanazawa Y. 2012. Internal radiation exposure after the Fukushima nuclear power plant disaster. JAMA 308: 669–670. [Google Scholar]
- United Nations Scientific Committee on the Effects of Atomic Radiation. 2015. Developments since the 2013 UNSCEAR report on the levels and effects of radiation exposure due to the nuclear accident following the great east-Japan earthquake and tsunami. http://www.unscear.org/unscear/publications.html [Google Scholar]
- Weiss BD. 2003. Health Literacy: A Manual for Clinicians. American Medical Association, American Medical Foundation. https://www.yumpu.com/en/document/view/8189575/health-literacy-a-manual-for-clinicians [Google Scholar]
- Wu CY, Lin WX, Zhang XM, Zhang Y, Xie WD, Wang YF. 2024. PMC-LLaMA: toward building open-source language models for medicine. J Am Med Inform Assoc 31: 1833–1843. [Google Scholar]
Cite this article as: Ito S, Furukawa E, Okuhara T, Okada H, Kiuchi T. 2025. Producing nuclear disaster prevention materials with artificial intelligence chatbots: comparison of ChatGPT-3.5, Copilot, and Gemini output with google search results. Radioprotection 60(3): 211–220. https://doi.org/10.1051/radiopro/2024056
