International Journal of Innovative Research in Computer Science and Technology - IJIRCST Journal
Volume: 13, Issue: 4, 2025
Pages : 51 - 67
Satyadhar Joshi
This paper surveys recent advances and evaluation techniques in prompt engineering for Large Language Models (LLMs), including chain-of-thought, tree-of-thought, and graph-of-thought techniques, and reviews over 100 contemporary sources on evaluation metrics, real-world applications, and risks. It presents a comprehensive review and evaluation of advanced prompt engineering techniques for financial decision-making. We systematically analyze Chain-of-Thought (CoT), Tree-of-Thought (ToT), and Graph-of-Thought (GoT) prompting methods across six critical financial tasks: risk assessment, portfolio optimization, fraud detection, regulatory compliance, earnings analysis, and derivative pricing. Furthermore, we examine prompt evaluation, discussing key quantitative and qualitative metrics and the tools available for assessing prompt effectiveness, relevance, and safety. The reviewed literature indicates that structured prompting approaches significantly outperform traditional methods: Graph-of-Thought is reported to achieve 15-25% higher accuracy than baseline approaches in complex financial reasoning tasks (20-25% in some of the reported results) while reducing hallucination rates by 25-30%.
We also comment on FINEVAL, a novel evaluation framework incorporating 12 financial-specific metrics spanning three dimensions: basic quality (accuracy, relevance, fluency), financial validity (regulatory compliance, risk sensitivity), and advanced reasoning (logical soundness, argument depth). The architecture described in the literature integrates real-time regulatory checks, dynamic prompt optimization, and domain-specific modules for financial applications, achieving 20-25 ms latency for CoT paths and 80-90% GPU utilization for ToT operations. Key findings reveal that while 60-65% of surveyed financial institutions are experimenting with CoT, only 10-15% have explored GoT, owing to computational costs (0.12/query) and skill gaps. Based on the literature, we project that by 2030, 60-80% of Tier-1 banks will deploy GoT systems, yielding 30-40% faster M&A due diligence. The paper concludes with strategic recommendations for workforce upskilling (30-50 hour curricula) and risk management protocols, while highlighting emerging challenges in explainability, adversarial robustness, and cross-border compliance. This is a pure review paper, and all results are drawn from the cited literature.
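The three FINEVAL dimensions named in the abstract can be pictured as a grouped scoring rubric. The following is a minimal sketch, not FINEVAL's actual implementation: the function name `dimension_scores`, the [0, 1] score scale, and the unweighted mean within each dimension are illustrative assumptions, and only the seven metrics listed in the abstract are included (the framework is described as having 12).

```python
from statistics import mean

# Metric names come from the abstract; the grouping mirrors its three dimensions.
# Only 7 of the 12 metrics are named there, so this rubric is incomplete by design.
DIMENSIONS = {
    "basic_quality": ["accuracy", "relevance", "fluency"],
    "financial_validity": ["regulatory_compliance", "risk_sensitivity"],
    "advanced_reasoning": ["logical_soundness", "argument_depth"],
}


def dimension_scores(metric_scores):
    """Average per-metric scores (assumed to lie in [0, 1]) within each dimension.

    The unweighted mean is an assumption made for illustration only.
    """
    result = {}
    for dimension, metrics in DIMENSIONS.items():
        missing = [m for m in metrics if m not in metric_scores]
        if missing:
            raise KeyError(f"missing scores for {dimension}: {missing}")
        result[dimension] = mean(metric_scores[m] for m in metrics)
    return result
```

Calling `dimension_scores` with one score per named metric returns one averaged value per dimension, which makes it easy to see whether a prompt is weak on basic quality, financial validity, or reasoning.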