Multiplication has recently become a top priority in digital signal processing and machine learning applications, where parallel multiplier implementations must keep area, latency, and power under control to achieve good overall performance. As the number of multiplications grows, so does the number of additions and subtractions they generate, which increases logic size, lengthens critical paths, and raises power consumption. To address this issue, the proposed work presents an extensively optimized radix-4 multiplier circuit that combines the modified Booth algorithm with a Kogge-Stone adder, yielding shorter critical paths and better overall performance than Wallace tree and Dadda multipliers. Finally, the design is described in Verilog HDL, synthesized for a Xilinx FPGA, and compared in terms of area.
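To make the two named building blocks concrete, the following is a minimal behavioral sketch in Python, not the Verilog design itself. It assumes a signed two's-complement multiplier of even width `n`, and for simplicity it accumulates the partial products one at a time through the adder, which may differ from how the proposed circuit reduces them. The function names `booth_radix4_digits`, `kogge_stone_add`, and `booth_multiply` are hypothetical and chosen only for illustration.

```python
def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement multiplier into n/2 digits in {-2..2}."""
    assert n % 2 == 0, "this sketch assumes an even operand width"
    y &= (1 << n) - 1
    digits = []
    prev = 0  # implicit bit y[-1] = 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        # Radix-4 Booth digit: -2*y[i+1] + y[i] + y[i-1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits


def kogge_stone_add(a, b, n):
    """n-bit addition (mod 2**n) using Kogge-Stone parallel-prefix carries."""
    mask = (1 << n) - 1
    a &= mask
    b &= mask
    g = a & b          # generate signals
    p = a ^ b          # propagate signals
    half_sum = p
    d = 1
    while d < n:       # log2(n) prefix stages, span doubles each stage
        g = (g | (p & (g << d))) & mask
        p = (p & (p << d)) & mask
        d *= 2
    carries = (g << 1) & mask  # carry into bit i is the group generate of bits i-1..0
    return (half_sum ^ carries) & mask


def booth_multiply(x, y, n):
    """Signed n-bit by n-bit multiply, returning a signed 2n-bit product."""
    width = 2 * n
    mask = (1 << width) - 1
    acc = 0
    for i, digit in enumerate(booth_radix4_digits(y, n)):
        # Each digit selects 0, +/-x, or +/-2x, weighted by 4**i
        partial = ((digit * x) << (2 * i)) & mask
        acc = kogge_stone_add(acc, partial, width)
    # Reinterpret the 2n-bit result as a signed value
    if acc >> (width - 1):
        acc -= 1 << width
    return acc
```

Radix-4 recoding halves the number of partial products relative to a bit-by-bit scheme (four digits for an 8-bit multiplier), and the Kogge-Stone prefix network resolves all carries in a number of stages that grows with the logarithm of the word width, which is the source of the shorter critical path.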