rotate: a VHDL module which implements the Cordic algorithm

The VHDL module "rotate" (see symbol) rotates vectors. Vector rotation is normally computed with the sine and cosine functions.
But these functions are not directly available in digital hardware, so the Cordic algorithm is used for the rotation instead.

The module supports the Cordic-rotation mode (rotation by a given angle) and the Cordic vectoring-mode (rotation to x-axis).
The module executes the maximum number of Cordic micro-rotations which are possible based on the number of bits the operands have (configured by the generics).
The module extends the operands by additional least significant bits if the generic g_improved_accuracy is set to true.
The latency of the module can be configured (by generics) independently from the number of micro-rotations.

This means, the module is configurable by generics in order

to fulfill any requirements regarding the number of bits of the operands and
to fulfill any requirements regarding its latency.

But of course there is no guarantee that timing closure can be reached with the selected values
for the generics, as the timing depends on the technology which is used at synthesis.

The module "rotate" was developed with HDL-SCHEM-Editor.

Ports:

Port name	Direction	Description
res_i	input	asynchronous reset input, 1-active
clk_i	input	clock input
vector_mode_i	input	When 1: the module works in vector-mode. When 0: the module works in rotation-mode. Must be stable during the calculation time.
start_i	input	This input expects a 1-active pulse with a width of 1 clock cycle in order to start the calculation.
x_coord_i(g_coordinate_width-1:0)	input	Signed x-coordinate of the vector to be rotated (g_coordinate_width is a generic). The input value is latched at start_i=1.
y_coord_i(g_coordinate_width-1:0)	input	Signed y-coordinate of the vector to be rotated (g_coordinate_width is a generic). The input value is latched at start_i=1.
rotation_angle_i(g_angle_width-1:0)	input	Signed angle by which the vector is rotated (used only in rotation-mode; g_angle_width is a generic). The smallest negative number is interpreted as -180 degrees, the biggest positive number is interpreted as just below +180 degrees. To enter 45 degrees, for example, the value is calculated as 45/360 * 2**g_angle_width. The input value is latched at start_i=1.
ready_o	output	1-active pulse with a width of 1 clock cycle when the calculation is finished (at latency 0 it is active in the same clock cycle in which start_i is active).
x_coord_o(g_coordinate_width:0)	output	Signed x-coordinate of the rotated vector (which has 1 bit more than the input x-coordinate). Valid at ready_o=1. Stable between 2 pulses at ready_o if g_latency_fix_cordic_length!=0 or g_latency_shorten_vector!=0.
y_coord_o(g_coordinate_width:0)	output	Signed y-coordinate of the rotated vector (which has 1 bit more than the input y-coordinate). Valid at ready_o=1. Stable between 2 pulses at ready_o if g_latency_fix_cordic_length!=0 or g_latency_shorten_vector!=0.
rotation_angle_o(g_angle_width-1:0)	output	In vector-mode this output shows the signed angle of the incoming vector, in rotation-mode it is 0 (or near 0). Valid at ready_o=1. Not stable between 2 pulses at ready_o, as it changes during the calculation. The output has the same coding as rotation_angle_i.
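
The angle coding described for rotation_angle_i and rotation_angle_o can be sketched in Python as follows. This is only an illustration for preparing stimulus values: the helper names encode_angle, decode_angle and to_bits are hypothetical, and rounding to the nearest code (with clamping just below +180 degrees) is an assumption that is not taken from the VHDL sources.

```python
def encode_angle(degrees, angle_width):
    """Return the signed integer code for an angle in [-180, +180) degrees."""
    if not -180.0 <= degrees < 180.0:
        raise ValueError("angle must lie in [-180, +180) degrees")
    # Formula from the port description: degrees/360 * 2**g_angle_width.
    code = round(degrees / 360.0 * 2**angle_width)  # rounding is an assumption
    # The biggest positive code is "just below +180 degrees".
    return min(code, 2**(angle_width - 1) - 1)


def decode_angle(code, angle_width):
    """Convert a signed angle code back into degrees."""
    return code * 360.0 / 2**angle_width


def to_bits(code, angle_width):
    """Two's complement bit string of a signed code, angle_width bits wide."""
    return format(code & ((1 << angle_width) - 1), "0{}b".format(angle_width))


if __name__ == "__main__":
    print(encode_angle(45.0, 8), to_bits(encode_angle(45.0, 8), 8))
```

With g_angle_width=8, for example, 45 degrees is coded as 32, which is the bit pattern "00100000".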

Generics:

Generic name	Minimum Value	Maximum Value	Description
g_angle_width	4	666	Number of bits of the angles. The maximum limit is caused by the constant c_cordic_correction_factor (module rotate_fix_cordic_length) and by the constant c_delta_phi (module rotate_by_cordic_step). This limit only applies if g_coordinate_width=663 and g_latency_rotate_by_cordic=664 (no dummy micro-rotations are needed, see "Configuration of latency").
g_coordinate_width	2	663	Number of bits of the coordinates. The maximum limit is caused by the constant c_cordic_correction_factor (module rotate_fix_cordic_length) and by the constant c_delta_phi (module rotate_by_cordic_step). This limit only applies if g_angle_width=666 and g_latency_rotate_by_cordic=664 (no dummy micro-rotations are needed, see "Configuration of latency").
g_latency_lengthen_vector	0	1	Latency of the sub-module which lengthens the vector to the possible maximum length. Values greater than 1 are treated as 1.
g_latency_rotate_by_90	0	1	Latency of the sub-module which rotates the vector by 90 degrees. Values greater than 1 are treated as 1.
g_latency_rotate_by_cordic	0	664	Latency of the sub-module which implements the Cordic algorithm.
g_latency_fix_cordic_length	0	1	Latency of the sub-module which corrects the distortion of the vector length introduced by the Cordic algorithm. Values greater than 1 are treated as 1.
g_latency_shorten_vector	0	1	Latency of the sub-module which shortens the vector to its original length. Values greater than 1 are treated as 1.
g_improved_accuracy	false	true	If set to true, then additional bits are used for the internal adders. How many additional bits are used is calculated by log2(number of micro-rotations).

The module "rotate" is a hierarchical module, which is built from several submodules.

Submodule name	Functionality
rotate_lengthen_vector	The Cordic algorithm changes the coordinates of the incoming vector at each iteration by a small delta. These deltas are calculated by shifting the previous coordinates to the right by a number which is incremented at each iteration. As soon as this delta has the value 0, no more iterations can be executed, even if iterations are left. But when iterations are not executed, the result has poor accuracy. To avoid this situation, the module rotate_lengthen_vector shifts both incoming coordinates to the left (by the same number of bits) in order to make them as big as possible before the Cordic algorithm is started. The number of left shifts is stored and is used at the end to reduce the resulting vector to its original length. The latency of the module can be configured to be 0 (combinatorial circuit) or 1 (sequential circuit with 1 flipflop stage).
rotate_by_90	The Cordic algorithm can only rotate vectors by -90 to +90 degrees. So there is a problem when in rotation-mode the rotation angle is smaller than -90 degrees or bigger than +90 degrees. The same problem exists when in vector-mode the incoming vector is in the 2nd or 4th quadrant. The problem is solved by first rotating the incoming vector by 90 degrees (in the correct direction) by simply swapping and negating its coordinates. The latency of the module can be configured to be 0 (combinatorial circuit) or 1 (sequential circuit with 1 flipflop stage).
rotate_by_cordic	This module is built by the two submodules rotate_control (small FSM) and rotate_by_cordic_step and by additionally necessary glue logic. The combinatorial submodule rotate_by_cordic_step executes 1 micro-rotation of the Cordic algorithm. Which micro-rotation is executed is configured by an input which is controlled by the submodule rotate_control. As only 1 micro-rotation is executed, the module has to be instantiated several times and/or has to be used again several times. The exact structure of the submodule rotate_by_cordic is automatically determined by the generics g_coordinate_width, g_angle_width and g_latency_rotate_by_cordic. The module contains the infrastructure to reuse the submodule rotate_by_cordic_step several times. The latency of the module can be configured to be 0 (combinatorial circuit) or different from 0 (sequential circuit). When the configured latency is 0, then for each micro-rotation a submodule rotate_by_cordic_step is instantiated. When the latency is identical to the needed number of micro-rotations, then the submodule rotate_by_cordic_step is instantiated only once. See tab "Configuration of latency" for some examples.
rotate_fix_cordic_length	Each Cordic micro-rotation changes the length of the vector, which is rotated. These changes are ignored in the submodule rotate_by_cordic and are fixed here. As all the changes sum up to a constant factor, the length of the vector can easily be fixed by a multiplication. The latency of the module can be configured to be 0 (combinatorial circuit) or 1 (sequential circuit with 1 flipflop stage).
rotate_shorten_vector	This submodule reverses the change of the vector length which was introduced by the submodule rotate_lengthen_vector. This is done by shifting the coordinates to the right by the same number of bits they were shifted to the left in submodule rotate_lengthen_vector. The latency of the module can be configured to be 0 (combinatorial circuit) or 1 (sequential circuit with 1 flipflop stage).
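
The data flow through the five submodules can be illustrated by a small behavioural model in Python. It is only a sketch under several assumptions and does not reproduce the VHDL bit by bit: it covers rotation-mode only, uses unlimited integer width and a floating-point length correction, and truncates the delta-angles. The function name rotate_model and its parameters are hypothetical.

```python
import math


def rotate_model(x, y, angle_code, coord_width=16, angle_width=19):
    """Rotation-mode sketch: lengthen, rotate by 90, Cordic, fix length, shorten."""
    n = max(angle_width - 2, coord_width + 1)  # number of micro-rotations

    # rotate_lengthen_vector: shift left while both coordinates still fit.
    shifts = 0
    limit = 1 << (coord_width - 1)
    while (x or y) and 2 * max(abs(x), abs(y)) < limit:
        x, y, shifts = 2 * x, 2 * y, shifts + 1

    # rotate_by_90: bring the remaining angle into -90..+90 degrees.
    quarter = 1 << (angle_width - 2)  # code of +90 degrees
    if angle_code > quarter:
        x, y, angle_code = -y, x, angle_code - quarter
    elif angle_code < -quarter:
        x, y, angle_code = y, -x, angle_code + quarter

    # rotate_by_cordic: n micro-rotations, direction from the sign of the angle.
    for i in range(n):
        delta = int(math.atan(2.0 ** -i) / (2 * math.pi) * 2 ** angle_width)
        d = 1 if angle_code >= 0 else -1
        x, y = x - d * (y >> i), y + d * (x >> i)
        angle_code -= d * delta

    # rotate_fix_cordic_length: undo the constant gain of all micro-rotations.
    gain = math.prod(math.sqrt(1 + 4.0 ** -i) for i in range(n))
    x, y = round(x / gain), round(y / gain)

    # rotate_shorten_vector: undo the left shifts from the first stage.
    return x >> shifts, y >> shifts, angle_code


if __name__ == "__main__":
    # 45 degrees with angle_width=19 is 45/360 * 2**19 = 65536.
    print(rotate_model(10000, 0, 65536))
```

Rotating the vector (10000, 0) by 45 degrees should give coordinates close to (7071, 7071); the last value returned is the remaining angle, which ends near 0 in rotation-mode.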

Configuration considerations for g_angle_width and g_coordinate_width:

The Cordic algorithm works by dividing the rotation of the vector into several micro-rotations.
The implemented number of micro-rotations depends on the two generics g_angle_width and g_coordinate_width.

Dependency on g_angle_width:

At each micro-rotation the new overall rotation-angle is updated by using the delta-angle of the current micro-rotation.
All angles are coded with g_angle_width bits in two's complement and represent a range between -180 and +180 degrees.
The delta-angles for the Cordic-algorithm are stored in a VHDL constant.
The number of bits which are used from this constant is equal to g_angle_width.
The delta-angle which is used at the first micro-rotation is always 45 degrees and has a bit representation of "00100..000".
At each following micro-rotation the delta-angle is approximately halved.
Therefore g_angle_width-2 micro-rotations can be executed, before the delta-angle is 0.
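
The count of g_angle_width-2 usable micro-rotations can be checked with a short Python sketch. It assumes that the delta-angles are atan(2**-i) scaled to the angle coding and truncated to g_angle_width bits; the function names are hypothetical and the real values come from the VHDL constant mentioned above.

```python
import math


def delta_angles(angle_width, count):
    """Delta-angle codes atan(2**-i), truncated to the angle coding."""
    # A full circle (360 degrees) corresponds to 2**angle_width codes.
    return [int(math.atan(2.0 ** -i) / (2 * math.pi) * 2 ** angle_width)
            for i in range(count)]


def usable_micro_rotations(angle_width):
    """Number of micro-rotations before the delta-angle becomes 0."""
    return sum(1 for d in delta_angles(angle_width, angle_width) if d != 0)


if __name__ == "__main__":
    print(delta_angles(8, 7))
    print(usable_micro_rotations(8))
```

For g_angle_width=8 the first delta-angle is 32 ("00100000", 45 degrees), each following value is roughly half of the previous one, and the seventh value is 0, so 6 micro-rotations remain.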

Dependency on g_coordinate_width:

At each micro-rotation the new x and y coordinates are calculated by adding or subtracting a delta-value to the previous coordinates.
These delta-values are determined by shifting the previous coordinates to the right by N bits,
where N is the number of micro-rotations which already took place.
The module rotate_by_cordic works with coordinates which have g_coordinate_width+2 bits (including 1 sign bit), because:
One additional bit is needed when a vector with maximum length is rotated to the x-axis, as the x-coordinate is then increased by a factor of about 1.4 (square root of 2).
A second additional bit is needed because the Cordic algorithm at first ignores the cos(phi) factors of the micro-rotations and therefore
the coordinates are increased by a factor of about 1.65.
As the previous coordinates are not shifted at the first micro-rotation, g_coordinate_width+1 micro-rotations can be executed
before the delta-values are 0.

As the module "rotate" can be used in vector-mode and also in rotation-mode,
the number of implemented micro-rotations is made equal to the maximum of g_angle_width-2 and g_coordinate_width+1.
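
The two limits and their maximum can be written down as a small Python sketch. The function names are hypothetical; coordinate_steps simply counts how often the largest positive value of a word with g_coordinate_width+2 bits can be shifted right before it becomes 0.

```python
def coordinate_steps(coordinate_width):
    """Micro-rotations possible before the shifted coordinate delta is 0."""
    internal_width = coordinate_width + 2       # width used inside rotate_by_cordic
    largest = (1 << (internal_width - 1)) - 1   # biggest positive signed value
    steps = 0
    while largest >> steps:                     # shift by 0 at the first micro-rotation
        steps += 1
    return steps


def micro_rotations(angle_width, coordinate_width):
    """Number of implemented micro-rotations as described in the text."""
    return max(angle_width - 2, coordinate_width + 1)


if __name__ == "__main__":
    print(coordinate_steps(10))      # limit from the coordinates
    print(micro_rotations(13, 10))   # balanced configuration
    print(micro_rotations(16, 10))   # limit from the angle dominates
```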

If a configuration fulfills the equation g_angle_width-2=g_coordinate_width+1, i.e. g_angle_width=g_coordinate_width+3,
neither delta-angle nor delta-values are 0 at any micro-rotation (this is true for delta-values only because of the help of
the sub-module "rotate_lengthen_vector") and the rotation-direction can always be correctly updated.

When this equation is not fulfilled, then there are 2 cases:

g_angle_width>g_coordinate_width+3:

In this case the delta-values for the coordinates are 0 before the last micro-rotation is executed.
In rotation-mode this does not matter, as the coordinates will hold their value and the overall
rotation-angle will change correctly to 0, as the directions of the remaining micro-rotations
still can be determined by the sign of the overall rotation angle.
But in vector-mode, when the rotation-direction is determined by the sign of the y-coordinate, the
rotation-direction of all the remaining micro-rotations will be the same and the overall rotation-angle
may be changed in a wrong way (with small errors) in some of these last micro-rotations.

g_angle_width<g_coordinate_width+3:

In this case the delta-angle for the overall rotation angle is 0 before the last micro-rotation is executed.
In vector-mode this does not matter, as the rotation-angle will hold its value and the calculations for
the coordinates work correctly as the rotation direction still can be determined by the sign of the y-coordinate.
But in rotation-mode, when the rotation-direction is determined by the sign of the overall rotation-angle, the
rotation-direction of all the remaining micro-rotations will be the same and the coordinates may be changed in a
wrong way (with small errors) by some of these last micro-rotations.

So in vector-mode the condition g_angle_width<=g_coordinate_width+3 should be fulfilled.

In rotation-mode the condition g_angle_width>=g_coordinate_width+3 should be fulfilled.
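
These two recommendations can be turned into a simple check that is run before a configuration is synthesized. The following Python sketch uses the hypothetical helper name widths_ok; it only restates the two conditions above and is not part of the module.

```python
def widths_ok(angle_width, coordinate_width, vector_mode):
    """Check the recommended relation between the two width generics."""
    balanced = coordinate_width + 3
    if vector_mode:
        # Vector-mode: the coordinate deltas must not reach 0 too early.
        return angle_width <= balanced
    # Rotation-mode: the delta-angle must not reach 0 too early.
    return angle_width >= balanced


if __name__ == "__main__":
    for aw, cw in [(13, 10), (16, 10), (12, 10)]:
        print(aw, cw, widths_ok(aw, cw, True), widths_ok(aw, cw, False))
```

Only g_angle_width=g_coordinate_width+3 satisfies both conditions, which is the case if the same instance is used in both modes.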

Configuration of latency:

The generics g_latency_lengthen_vector, g_latency_rotate_by_90, g_latency_fix_cordic_length, g_latency_shorten_vector
all work in the same way:

If the generic is 0, then the submodule to which the generic belongs is purely combinatorial.
If the generic is 1, then the submodule to which the generic belongs uses a register for its output values.

Which value to use for each of these generics depends on the connected logic and the target
technology and must be derived from the synthesis timing reports and from the requirements.

The generic g_latency_rotate_by_cordic for the submodule rotate_by_cordic works in a different way:

If the generic g_latency_rotate_by_cordic is 0, then the submodule is purely combinatorial and all micro-rotations are done in 1 clock cycle.
If the generic g_latency_rotate_by_cordic is 1, then the submodule uses a register for its output values and all micro-rotations are also done in 1 clock cycle.
If the generic g_latency_rotate_by_cordic is bigger than 1, then the submodule again uses a register for its output values, but the micro-rotations are not
all done in one clock cycle but split across several clock cycles. How many micro-rotations have to be done in 1 clock cycle is calculated by the
submodule itself, and based on the result a sufficient number of submodules rotate_by_cordic_step is instantiated.

If for example g_latency_rotate_by_cordic=max(g_angle_width-2, g_coordinate_width+1) then the latency is equal to the number of micro-rotations
and only 1 micro-rotation has to be done in 1 clock cycle, so the submodule rotate_by_cordic_step is only instantiated once.

If for example g_latency_rotate_by_cordic is 16 and the number of micro-rotations is 30, then 2 micro-rotations have to be done in 1 clock cycle,
so the submodule rotate_by_cordic_step is instantiated twice and the number of micro-rotations is increased to 32 (16 clock cycles with
2 micro-rotations each). In the last 2 unnecessary dummy micro-rotations no rotation is performed, but the submodule rotate_by_cordic_step passes the
operands unchanged.
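
The two examples can be reproduced with a short Python sketch of the schedule. The function name cordic_schedule is hypothetical, and the rounding up to a whole number of micro-rotations per clock cycle is inferred from the example with latency 16; configurations with a latency bigger than the number of micro-rotations are not covered by the text and not by this sketch.

```python
def cordic_schedule(latency, micro_rotations):
    """Return (instances of rotate_by_cordic_step, total steps, dummy steps)."""
    # Latency 0 and latency 1 both perform all micro-rotations in 1 clock cycle.
    cycles = max(latency, 1)
    # Micro-rotations per clock cycle, rounded up (ceiling division).
    per_cycle = -(-micro_rotations // cycles)
    total = per_cycle * cycles
    return per_cycle, total, total - micro_rotations


if __name__ == "__main__":
    print(cordic_schedule(16, 30))   # 2 instances, 32 steps, 2 dummy steps
    print(cordic_schedule(30, 30))   # 1 instance, no dummy steps
    print(cordic_schedule(0, 30))    # fully combinatorial, 30 instances
```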

An exact result would require an infinite number of micro-rotations, which is not possible.

The number of micro-rotations, which the module uses, depends on the number of bits of the coordinates and the angle.
If you increase the number of bits, the results are improved not only by the increased number of micro-rotations but also by the increased bit width.
If the latency of the module is not adapted to the new number of micro-rotations, then more micro-rotations might have to be done in a clock cycle.
And of course all adders in the module have to handle more bits. Both effects make it more difficult to reach timing closure.

In order to increase accuracy without increasing the number of bits you can set the generic g_improved_accuracy to "true".
Then the module rotate calculates how many additional bits should be used internally at the adders to improve accuracy.
The number of micro-rotations is not increased. So reaching timing closure is only affected by the adders with more bits.
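
As a rough illustration of the cost, the number of additional bits, log2(number of micro-rotations), can be estimated in Python. Rounding the logarithm up to the next integer is an assumption, as is the resulting internal adder width; both function names are hypothetical, and the exact calculation is defined by the VHDL sources.

```python
def additional_bits(micro_rotations):
    """log2(number of micro-rotations), rounded up (rounding is an assumption)."""
    return (micro_rotations - 1).bit_length()


def internal_adder_width(coordinate_width, angle_width, improved_accuracy):
    """Estimated coordinate adder width inside rotate_by_cordic."""
    n = max(angle_width - 2, coordinate_width + 1)
    extra = additional_bits(n) if improved_accuracy else 0
    return coordinate_width + 2 + extra


if __name__ == "__main__":
    print(additional_bits(17))
    print(internal_adder_width(16, 19, False), internal_adder_width(16, 19, True))
```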

Setting the generics in a way that the equation g_angle_width=g_coordinate_width+3 is fulfilled also helps to get
better accuracy, because then there will be no micro-rotation at which either the delta-values for the coordinates or
the delta-angle for the angle has the value 0.

Accuracy in vector-mode (if g_angle_width=g_coordinate_width+3 is fulfilled):

If g_improved_accuracy is "false", then:

You can expect the coordinates to be accurate to within ± 7 to ± 19 (when increasing the number of bits from 10 to 130 for the coordinates and from 13 to 133 for the angle).
You can expect the angle to be accurate to within ± 2 to ± 8 (when increasing the number of bits from 10 to 130 for the coordinates and from 13 to 133 for the angle).

If g_improved_accuracy is "true", then:

You can expect the coordinates to be accurate to within ± 2.
You can expect the angle to be accurate to within ± 1, in rare cases ± 2.

In this second case the accuracy is independent of the number of bits of the coordinates and the angle,
as the number of additional bits is also adapted when the number of bits of coordinates and angles is changed.

Accuracy in rotation-mode (if g_angle_width=g_coordinate_width+3 is fulfilled):

If g_improved_accuracy is "false", then:

You can expect the coordinates to be accurate to within ± 3 to ± 11 (when increasing the number of bits from 10 to 130 for the coordinates and from 13 to 133 for the angle).

If g_improved_accuracy is "true", then:

You can expect the coordinates to be accurate to within ± 2 (independent of the number of bits of the coordinates).

In this second case the accuracy is independent of the number of bits of the coordinates and the angle,
as the number of additional bits is also adapted when the number of bits of coordinates and angles is changed.

Source code for HDL-SCHEM-Editor and HDL-FSM-Editor for module "rotate" and its testbenches.
With these files the schematics and the state-diagram can be loaded into HDL-SCHEM-Editor or HDL-FSM-Editor and can be easily read and modified:

hdl_editor_designs.zip

All module VHDL-files of the module "rotate".
These files were generated by HDL-SCHEM-Editor and HDL-FSM-Editor:

VHDL_designs.zip

All testbench VHDL-files of the 2 testbenches of the module "rotate".
These files were generated by HDL-SCHEM-Editor and HDL-FSM-Editor:

VHDL_testbenches.zip

Relocation hints:

You should extract all archives into a folder named "rotate".
Then you must replace "M:/gesicherte Daten/Programmieren/VHDL/rotate" in all "hdl_editor_designs/*.hse" source files by your path to this directory.
Now you can navigate through the design with HDL-SCHEM-Editor and generate HDL with HDL-SCHEM-Editor for all modules except rotate_control,
for which the HDL must be generated with HDL-FSM-Editor.
Of course HDL only needs to be generated if you change something in the modules, because you can find the HDL in VHDL_designs.zip and VHDL_testbenches.zip.
If you want to simulate or modify the modules with HDL-SCHEM-Editor, you must also adapt the information in the Control-tab of the toplevel you want to work on.
There you must define a "Compile through hierarchy command", an "Edit command", the path to your HDL-FSM-Editor and a "Working directory".
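
The path replacement in the "*.hse" files can also be scripted. The following Python sketch assumes that the files store the path literally, with forward slashes and in an ASCII-compatible encoding; this has not been verified against the file format, so keep a copy of the archive before running it. The function name relocate_hse_files is hypothetical.

```python
from pathlib import Path

# Path stored in the published design files (taken from the hint above).
OLD_ROOT = "M:/gesicherte Daten/Programmieren/VHDL/rotate"


def relocate_hse_files(design_dir, new_root, old_root=OLD_ROOT):
    """Replace old_root by new_root in all *.hse files of design_dir.

    Returns the names of the files which were changed.
    """
    changed = []
    for path in sorted(Path(design_dir).glob("*.hse")):
        data = path.read_bytes()  # bytes, so other content is left untouched
        patched = data.replace(old_root.encode("utf-8"), new_root.encode("utf-8"))
        if patched != data:
            path.write_bytes(patched)
            changed.append(path.name)
    return changed


if __name__ == "__main__":
    # Example call; adapt both paths to your own installation.
    print(relocate_hse_files("rotate/hdl_editor_designs", "C:/work/rotate"))
```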

Change log:

Version 1.0 (31.01.2024):

Initial version

Version 1.1 (04.03.2024):

Fixed a bug in calculating the number of micro-rotations when g_latency_rotate_by_cordic was 0 or 1.
Accuracy is now improved, if generic g_improved_accuracy is set to true.
The testbenches now do not use big constant-arrays for checks, but the procedure rotate_by_cordic from testbench_rotate_package.vhd.

If you detect any bugs or have any questions,
please send a mail to "matthias.schweikart@gmx.de".
