FPGA Implementation of a Decimal Floating-Point Co-Processor with Accurate Scalar Product Unit 
(by Malte Baesler)

Scientific and engineering problems are usually modeled using real numbers but are solved on digital computers that approximate them by floating-point numbers according to the predominant standard, IEEE 754-1985. Although we generally use and think in decimal numbers, this arithmetic is binary because the corresponding circuits require less area, data is stored more densely, and binary arithmetic is better suited to scientific applications owing to its higher performance and better error characteristics.

However, many decimal fractions cannot be represented exactly in binary (e.g., 1/10 = 0.1 has no finite binary representation) and must be approximated, which introduces rounding errors. In numerous engineering, financial, and commercial applications these errors are not acceptable and may even violate legal accuracy requirements. Hence, the floating-point standard IEEE 754-2008, approved in 2008, also incorporates specifications for decimal arithmetic. Software libraries can implement this arithmetic, but they are usually 100 to 1000 times slower than equivalent floating-point operations in hardware. Therefore, fast hardware support for decimal arithmetic on modern computer architectures is desirable.
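The representation problem can be illustrated with a minimal sketch. It uses Python's standard `decimal` module, a software implementation chosen here only for illustration; it is not part of the co-processor described in this work.

```python
from decimal import Decimal

# Binary floating-point: 0.1 and 0.2 are stored as the nearest binary64 values,
# so their sum is not the binary64 value nearest to 0.3.
binary_sum = 0.1 + 0.2
binary_is_exact = (binary_sum == 0.3)

# Converting the float 0.1 to Decimal exposes the value that is actually stored.
stored_tenth = Decimal(0.1)

# Decimal floating-point: 0.1, 0.2, and 0.3 are all represented exactly.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = (decimal_sum == Decimal("0.3"))
```

The binary sum differs from 0.3 in the last place, whereas the decimal sum is exact because all three values have finite decimal representations.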

IEEE 754 floating-point arithmetic is well conceived for elementary operations, each of which incurs the smallest possible rounding error. However, due to cancellation, more complex operations in numerical algorithms might introduce serious errors and can even raise the question of whether the computed result solves the given problem at all. For instance, the scalar product is a widely used operation in numerical applications that is prone to cancellation. Hence, implementing the accurate scalar product in floating-point units can significantly increase the accuracy of many algorithms. Moreover, various scientific and engineering applications require information about the quality of the computed result. Interval arithmetic offers a method to yield reliable results by computing guaranteed enclosures of real-valued expressions. Unfortunately, interval arithmetic is poorly supported on modern floating-point units because switching the rounding mode requires many cycles. Therefore, efficient hardware support for interval operations requires that the rounding mode be inherent to each operation.
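The effect of cancellation on a scalar product can be sketched in software. The example below assumes a context with 16 digits of precision and the decimal64 exponent range, and it emulates the accurate scalar product by accumulating exactly with rational numbers and rounding only once. The names `DEC64`, `naive_dot`, and `accurate_dot` are hypothetical, and this is not the hardware algorithm of this work.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN
from fractions import Fraction

# Assumed stand-in for decimal64: 16 significant digits, exponent range of decimal64.
DEC64 = Context(prec=16, rounding=ROUND_HALF_EVEN, Emax=384, Emin=-383)

def naive_dot(xs, ys, ctx=DEC64):
    """Scalar product with rounding after every multiplication and addition."""
    acc = Decimal(0)
    for x, y in zip(xs, ys):
        acc = ctx.add(acc, ctx.multiply(x, y))
    return acc

def accurate_dot(xs, ys, ctx=DEC64):
    """Scalar product accumulated exactly and rounded a single time at the end."""
    exact = sum((Fraction(x) * Fraction(y) for x, y in zip(xs, ys)), Fraction(0))
    # Operands are exact integers; only the quotient is rounded to the context.
    return ctx.divide(Decimal(exact.numerator), Decimal(exact.denominator))

xs = [Decimal("1E16"), Decimal(1), Decimal("-1E16")]
ys = [Decimal(1), Decimal(1), Decimal(1)]
naive_result = naive_dot(xs, ys)
accurate_result = accurate_dot(xs, ys)
```

In the naive version the summand 1 is lost when it is added to 10^16, because the sum needs 17 digits, and the subsequent subtraction cancels everything that remains. The exactly accumulated version recovers the true result 1.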

In this thesis, new decimal fixed-point and floating-point algorithms were analyzed and a decimal floating-point co-processor was implemented. It supports the four elementary operations (addition, subtraction, multiplication, and division) as well as the accurate scalar product. The arithmetic units are fully combinational and can be pipelined by inserting a configurable number of pipeline registers, with the exception of the decimal divider, which works sequentially. The co-processor supports the decimal64 data format and is fully compliant with IEEE 754-2008. Furthermore, the rounding mode is inherent to each operation, which allows an efficient implementation of interval arithmetic for reliable computing based on decimal floating-point arithmetic. Finally, the algorithms were optimized for FPGA architectures and implemented on a Xilinx Virtex-5 FPGA.
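How a per-operation rounding mode supports interval arithmetic can be sketched as follows. The sketch assumes two software contexts with 16 digits of precision, one rounding toward negative infinity and one toward positive infinity, so that no global rounding mode has to be switched. The names `DOWN`, `UP`, `interval_div`, and `interval_add` are hypothetical and do not reflect the co-processor's instruction set.

```python
from decimal import Decimal, Context, ROUND_FLOOR, ROUND_CEILING
from fractions import Fraction

# Each context carries its own rounding direction (assumed precision: 16 digits).
DOWN = Context(prec=16, rounding=ROUND_FLOOR)
UP = Context(prec=16, rounding=ROUND_CEILING)

def interval_div(a, b):
    """Enclose a/b for positive scalars a and b: lower bound rounded down, upper bound up."""
    return (DOWN.divide(a, b), UP.divide(a, b))

def interval_add(x, y):
    """Add two intervals given as (lower, upper) pairs with outward rounding."""
    return (DOWN.add(x[0], y[0]), UP.add(x[1], y[1]))

third = interval_div(Decimal(1), Decimal(3))
two_thirds = interval_add(third, third)

# Exact rational check that the computed bounds enclose the true values.
encloses_third = Fraction(third[0]) <= Fraction(1, 3) <= Fraction(third[1])
encloses_two_thirds = Fraction(two_thirds[0]) <= Fraction(2, 3) <= Fraction(two_thirds[1])
```

Because every operation names its rounding direction, the lower and upper bounds can be computed back to back without any mode change in between.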