multiply_bsc: a VHDL multiplication module

The VHDL module "multiply_bsc" (see symbol) calculates the signed product of a multiplicand and a multiplier.

It uses the redundant number system "binary stored-carry" (BSC) as described in:
Behrooz Parhami, "Generalized Signed-Digit Number Systems: A Unifying Framework for Redundant Number Representations",
IEEE Transactions on Computers, vol. 39, no. 1, January 1990.

Using BSC has the advantage that addition can be implemented with limited carry propagation:
a carry is always propagated only to the next 2 sum bits and never to any further sum bit.

When using BSC, the addition operands are not numbers coded in bits but numbers coded in digits,
each of which can have the value "00", "01" or "10".
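
The limited carry propagation can be illustrated with a small behavioral model. The following Python sketch is not taken from the module's VHDL code. It assumes that each BSC digit (0, 1 or 2) is split into two bits whose sum is the digit, and it adds two BSC numbers with two layers of full adders. All function names are hypothetical.

```python
def to_bit_pair(digit):
    """Split a BSC digit (0, 1 or 2) into two bits whose sum is the digit."""
    if digit not in (0, 1, 2):
        raise ValueError("a BSC digit must be 0, 1 or 2")
    return (1 if digit >= 1 else 0, 1 if digit == 2 else 0)


def full_adder(a, b, c):
    """Return (sum bit, carry bit) of three input bits."""
    total = a + b + c
    return total & 1, total >> 1


def bsc_add(x, y):
    """Add two BSC numbers given as digit lists, least significant digit first.

    The result is one digit longer than the longer operand.
    """
    n = max(len(x), len(y))
    x = list(x) + [0] * (n + 1 - len(x))
    y = list(y) + [0] * (n + 1 - len(y))
    result = []
    carry1 = 0  # first-layer carry coming from position i-1
    carry2 = 0  # second-layer carry coming from position i-1
    # The loop only enumerates the positions. In hardware all positions
    # would be evaluated in parallel, because result digit i depends only
    # on the operand digits at positions i, i-1 and i-2.
    for i in range(n + 1):
        x_a, x_b = to_bit_pair(x[i])
        y_a, y_b = to_bit_pair(y[i])
        sum1, next_carry1 = full_adder(x_a, x_b, y_a)
        sum2, next_carry2 = full_adder(sum1, y_b, carry1)
        result.append(sum2 + carry2)  # the result digit is again 0, 1 or 2
        carry1, carry2 = next_carry1, next_carry2
    return result


def bsc_value(digits):
    """Return the integer value of a BSC digit list (least significant first)."""
    return sum(d << i for i, d in enumerate(digits))
```

For example, bsc_value(bsc_add([2, 1, 2], [1, 2, 2])) equals bsc_value([2, 1, 2]) + bsc_value([1, 2, 2]). Changing the least significant operand digit can only affect the three lowest result digits, which is what keeps the delay of one addition independent of the operand width.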

The limited carry leads to a circuit structure whose timing does not depend on the width of the operands,
but only on the number of consecutive additions which are executed in one clock period.
Compared to a carry-save structure (where the carry is not propagated to the next sum bit, but to the next addition),
the limited-carry structure has worse timing. This is due to the more complicated addition which has to be
performed in the limited-carry addition.

Note that if you are using an advanced synthesis tool such as Synopsys Design Compiler Ultra, neither the
"multiply_bsc" design nor a carry-save structure will give better timing than the "multiply" design (also
available from this website). Design Compiler Ultra already uses advanced arithmetic optimisations that
implement fast addition structures.

The widths of the multiplicand and the multiplier are configured by generics.
Product, multiplicand and multiplier are numbers in 2's complement format.
The module uses flip-flops only for storing the product (and for control).
For quick access to the multiplier bits, the multiplier is first stored in the product flip-flops and
is then replaced by the incoming product bits through shift operations.
The latency of the module can be configured by generics, independently of the width of the operands.

This means that the module can be configured by generics to fulfill any requirements regarding
the number of bits of the operands and any requirements regarding its latency.

However, there is no guarantee that timing closure can be reached with the selected values
for the generics, as the timing depends on the technology which is used for synthesis.

The module "multiply_bsc" was developed with HDL-SCHEM-Editor.

Ports:

Port name	Direction	Description
res_i	input	Asynchronous reset input, 1-active. Can be clamped to 0 when g_latency=0.
clk_i	input	Clock input. Can be clamped to 0 when g_latency=0.
start_i	input	Expects a 1-active pulse of 1 clock cycle width in order to start the calculation. When g_latency=0, back-to-back pulses can be used.
multiplicand_i(g_multiplicand_width-1:0)	input	Signed multiplicand (g_multiplicand_width is a generic). The input must be stable during the calculation.
multiplier_i(g_multiplier_width-1:0)	input	Signed multiplier (g_multiplier_width is a generic). The input is latched at start_i=1 and can be changed afterwards.
ready_o	output	1-active pulse of 1 clock cycle width when the calculation is ready (at latency 0 it becomes active in the same clock cycle in which start_i becomes active).
product_o(g_multiplicand_width+g_multiplier_width-1:0)	output	Signed product. Valid at ready_o=1. Not stable during calculation.

Generics:

Generic name	Minimum Value	Maximum Value	Description
g_multiplicand_width	2	none	Number of bits of the multiplicand. The most significant bit represents the sign, as the operands have to be coded in 2's complement.
g_multiplier_width	2	none	Number of bits of the multiplier. The most significant bit represents the sign, as the operands have to be coded in 2's complement.
g_latency_mul	0	none	Latency of the multiplication algorithm in clock cycles. When g_latency_mul is 0, the multiplication is a combinatorial design.
g_latency_convert	0	1	Latency of the submodule multiply_bsc_convert in clock cycles. This submodule converts the product from a BSC number back into a 2's complement number. When g_latency_convert is 0, multiply_bsc_convert is a combinatorial design.
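
The conversion from a BSC number back into a 2's complement number can be modelled as follows. This Python sketch is an illustration only and is not derived from multiply_bsc_convert. It assumes that the digits are weighted with powers of 2 and that negative values are represented by wrap-around modulo 2**width. The function name bsc_to_signed is hypothetical.

```python
def bsc_to_signed(digits, width):
    """Convert BSC digits (least significant first) into a signed integer.

    Assumption: the weighted digit sum is reduced modulo 2**width and the
    result is then read as a 2's complement number with `width` bits.
    """
    if any(d not in (0, 1, 2) for d in digits):
        raise ValueError("a BSC digit must be 0, 1 or 2")
    raw = sum(d << i for i, d in enumerate(digits))
    raw &= (1 << width) - 1          # keep only `width` bits
    if raw >= 1 << (width - 1):      # sign bit set: value is negative
        raw -= 1 << width
    return raw
```

In hardware, forming the weighted digit sum requires an addition with full carry propagation, which is one plausible reason why a separate latency generic exists for this step.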

The module "multiply_bsc" is a hierarchical module, which is built from 4 submodules and 1 package.

Submodule name	Functionality
multiply_bsc_package	The package multiply_bsc_package contains all needed type definitions and functions to handle "binary stored-carry" numbers.
multiply_bsc_negate	The "multiply_bsc_negate" submodule negates the multiplicand. The negated multiplicand is used by the "multiply_bsc_step" submodule. If g_latency_mul=0 or g_latency_mul=1 then the "multiply_bsc_negate" submodule is a combinatorial design. Otherwise the "multiply_bsc_negate" submodule will use a register to store the negated multiplicand, in order to relax the timing.
multiply_bsc_step	The "multiply_bsc_step" module processes 1 bit of the multiplier. It is instantiated as many times as multiplier bits are processed during 1 clock cycle (which depends on the generic g_latency_mul). Depending on the multiplier bit processed, 0 or the multiplicand is added to the partial product. If the processed multiplier bit is the sign bit and has a value of 0, 0 is added to the partial product. If the processed multiplier bit is the sign bit and has a value of 1, the multiplicand is subtracted from the partial product. This subtraction compensates for the error of treating the bits of the negative multiplier as if the multiplier were positive.
multiply_bsc_convert	The "multiply_bsc_convert" module converts the product from a BSC number into a 2's complement number. If g_latency_convert=0 then the "multiply_bsc_convert" submodule is a combinatorial design. Otherwise the "multiply_bsc_convert" submodule will use a register to store the converted product, in order to relax the timing at the product_o output.
multiply_bsc_control	The "multiply_bsc_control" module generates all the control signals which are needed. It enables the internal registers for the intermediate or final results. It identifies the clock period in which the sign bit of the multiplier is handled. It activates the ready_o output at the end of the calculation.
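
The handling of the multiplier's sign bit described for "multiply_bsc_step" can be checked with a small behavioral model. The following Python sketch uses ordinary integers instead of BSC numbers and processes one multiplier bit per loop iteration. It is not the module's implementation, and the function name signed_multiply is hypothetical.

```python
def signed_multiply(multiplicand, multiplier, multiplier_width):
    """Multiply two signed integers bit by bit (shift-and-add).

    `multiplier` must fit into `multiplier_width` bits in 2's complement.
    """
    low = -(1 << (multiplier_width - 1))
    high = (1 << (multiplier_width - 1)) - 1
    if not low <= multiplier <= high:
        raise ValueError("multiplier does not fit into multiplier_width bits")
    # 2's complement bit pattern of the multiplier
    bits = multiplier & ((1 << multiplier_width) - 1)
    product = 0
    for i in range(multiplier_width):
        bit = (bits >> i) & 1
        if bit == 0:
            continue  # 0 is added to the partial product
        if i == multiplier_width - 1:
            # The sign bit has the weight -2**(width-1), so the shifted
            # multiplicand is subtracted instead of added.
            product -= multiplicand << i
        else:
            product += multiplicand << i
    return product
```

For example, signed_multiply(5, -3, 4) treats -3 as the bit pattern 1101: it adds 5 and 5*4 and then subtracts 5*8.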

There are no limitations for the generics g_multiplicand_width and g_multiplier_width (except that they must be greater than 1).
These generics are usually determined by the environment in which the module multiply_bsc is used.

There is also no limitation for the generic g_latency_mul. But this generic determines not only the latency but also
how difficult it will be to reach timing closure: the smaller the chosen value, the harder it will be to reach timing closure.

If g_latency_mul is equal to g_multiplier_width-1, then one bit of the multiplier is processed in each clock cycle,
except for the first clock cycle, which always processes one bit more than all the others.

If g_latency_mul is smaller than g_multiplier_width-1, then more than 1 bit of the multiplier is processed in each clock cycle.
The number of bits processed in one cycle can be calculated by rounding up (g_multiplier_width-1)/g_latency_mul to the next integer.
However, note that processing more than one bit of the multiplier in a clock cycle may prevent timing closure.

If g_latency_mul is greater than g_multiplier_width-1, then the number of bits of the multiplier is increased to g_latency_mul+1.
Again 1 bit of the (extended) multiplier is processed in each clock cycle.
This, of course, leads to an internal product register with the same additional number of bits as the multiplier.
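
The three cases above can be summarized in a few lines of Python. This is only a sketch of the rules as stated in the text. The function names are hypothetical, and returning the full multiplier width for g_latency_mul=0 is an assumption based on the statement that the multiplication is then a combinatorial design.

```python
def effective_multiplier_width(g_multiplier_width, g_latency_mul):
    """Width of the (possibly extended) multiplier used internally."""
    # If g_latency_mul is greater than g_multiplier_width-1, the
    # multiplier is extended to g_latency_mul+1 bits.
    return max(g_multiplier_width, g_latency_mul + 1)


def multiplier_bits_per_cycle(g_multiplier_width, g_latency_mul):
    """Number of multiplier bits processed in one clock cycle.

    The additional bit processed in the first clock cycle is not counted.
    """
    if g_multiplier_width < 2:
        raise ValueError("g_multiplier_width must be at least 2")
    if g_latency_mul < 0:
        raise ValueError("g_latency_mul must not be negative")
    if g_latency_mul == 0:
        # Assumption: a combinatorial design handles all bits at once.
        return g_multiplier_width
    width = effective_multiplier_width(g_multiplier_width, g_latency_mul)
    # Round (width-1)/g_latency_mul up to the next integer.
    return -(-(width - 1) // g_latency_mul)
```

With g_multiplier_width=8, a g_latency_mul of 7 gives 1 bit per clock cycle, a g_latency_mul of 3 gives 3 bits per clock cycle, and a g_latency_mul of 10 extends the multiplier to 11 bits and again gives 1 bit per clock cycle.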

Source code for HDL-SCHEM-Editor and HDL-FSM-Editor for module "multiply_bsc" and its testbench.
With these files, the schematics and the state diagram of module multiply_bsc can be loaded into HDL-SCHEM-Editor or HDL-FSM-Editor, where they can easily be read and modified:

hdl_editor_designs.zip

All module VHDL-files of the module "multiply_bsc".
These files were generated by HDL-SCHEM-Editor and HDL-FSM-Editor:

VHDL_designs.zip

All testbench VHDL-files of the module "multiply_bsc".
These files were generated by HDL-SCHEM-Editor and HDL-FSM-Editor:

VHDL_testbenches.zip

Relocation hints:

You should extract all archives into a folder named "multiply_bsc".

Then you should load the toplevel (probably the testbench) into HDL-SCHEM-Editor.
When you navigate through the design hierarchy by double-clicking each symbol,
HDL-SCHEM-Editor will find the submodules on your disk and ask whether it may replace
the original path to the submodule with the new one on your disk.
After you have saved the changed modules, the relocation of the source files is complete
(alternatively, you can use a text editor to replace "M:/gesicherte Daten/Programmieren/VHDL/multiply_bsc" in all
"hdl_editor_designs/*.hse" source files with your path to this directory).

Now you can navigate through the design with HDL-SCHEM-Editor and generate HDL with HDL-SCHEM-Editor for
all modules except multiply_bsc_control, for which the HDL must be generated with HDL-FSM-Editor.
Of course, you only need to generate HDL if you change something in the modules, because the generated HDL is already included in VHDL_designs.zip and VHDL_testbenches.zip.

If you want to simulate or modify the modules with HDL-SCHEM-Editor, you must also adapt the information in the Control-tab of the toplevel you want to work on.
There you must define a "Compile through hierarchy command", an "Edit command", the path to your HDL-FSM-Editor and a "Working directory".

Change log:

Version 1.4 (16.05.2025):

When the first partial product is calculated, two multiplier bits are now always processed instead of one.
For example, if g_multiplier_width is 8 and g_latency_mul is 8, g_latency_mul can now be reduced to 7; in
all other clock cycles, still only one bit of the multiplier is processed.
The HDL code for connecting only the relevant product bits to the output was simplified (no change in functionality).

Version 1.3 (28.04.2025):

Removed the unneeded signal product_out from multiply_bsc_convert (no change of logic).

Version 1.2 (27.04.2025):

Improved submodule multiply_bsc_step for better readability (no change of logic).

Version 1.1 (16.04.2025):

Added some comments in multiply_bsc_package.vhd.

Version 1.0 (11.04.2025):

Initial version

If you detect any bugs or have any questions,
please send a mail to "matthias.schweikart@gmx.de".
