Volume: 12
Issue: 6
Year: 2024
DOI: 10.55524/ijircst.2024.12.6.8 | DOI URL: https://doi.org/10.55524/ijircst.2024.12.6.8
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (CC BY 4.0) (http://creativecommons.org/licenses/by/4.0)
Zhongwen Zhou, Siwei Xia, Mengying Shu, Hong Zhou
Medical report generation demands accurate abnormality detection and precise description generation from CT images. While large language models have shown promising results in natural language processing tasks, their application in medical imaging analysis faces challenges due to the complexity of fine-grained feature detection and the requirement for domain-specific knowledge. This paper presents a novel framework integrating large language models with specialized medical image processing techniques for fine-grained abnormality detection and natural language description generation. Our approach incorporates a multi-modal knowledge enhancement module and a hierarchical attention mechanism to bridge the gap between visual understanding and textual description. The framework employs an adapter-based architecture for efficient domain adaptation and introduces a medical knowledge-enhanced loss function to improve description accuracy. Experimental results on three public datasets demonstrate the effectiveness of our approach, achieving 94.6% detection accuracy and a BLEU-4 score of 0.421 for description generation, surpassing current state-of-the-art methods. The system shows particular strength in handling subtle abnormalities, with a 91.2% average precision in fine-grained detection tasks. Comprehensive ablation studies validate the contribution of each component, while qualitative analysis demonstrates the clinical relevance of generated descriptions. The proposed framework represents a significant advancement in automated medical image analysis, offering potential benefits for clinical workflow optimization and diagnostic support.
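The BLEU-4 score quoted above measures n-gram overlap (n = 1 to 4) between a generated description and a reference report. The abstract does not say which BLEU implementation, tokenization, or smoothing the authors used, so the following is only a minimal sketch of the standard unsmoothed, single-reference, sentence-level definition. The function names `ngrams` and `bleu4` and the two example sentences are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(candidate, reference):
    """Unsmoothed sentence-level BLEU-4 against a single reference.

    Assumption: whitespace tokenization and no smoothing; the paper's
    actual evaluation setup is not specified in the abstract.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if total == 0 or matches == 0:
            return 0.0  # without smoothing, any zero precision gives zero
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the four modified precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / 4)


if __name__ == "__main__":
    # Hypothetical report sentences, for illustration only.
    reference = "small nodule in the right upper lobe"
    candidate = "small nodule in the left upper lobe"
    print(round(bleu4(candidate, reference), 3))
```

Because the score rewards surface overlap only, a single wrong word lowers it by the same amount whether or not the word is clinically important, which is one reason overlap metrics are usually reported alongside detection accuracy, as in the abstract.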
Computer Science, University of California, Berkeley, CA, USA