-- Filename: square_root_e.vhd
-- Created by HDL-SCHEM-Editor at Sun Jan 12 11:54:05 2025
-- This module calculates the square root of a positive integer (radicand).
-- As the calculated root has a limited number of bits, it cannot represent the exact square root
-- of the radicand (except when the radicand itself is a square number).
-- The calculated root always fulfills this equation exactly:
--     radicand = root**2 + remainder,                 (0)
-- where remainder is never negative and is made as small as possible.
--
-- To find the smallest remainder, the algorithm tentatively sets each bit of the root to 1 (starting at the
-- most significant bit, MSB) and checks the sign of the resulting remainder after each set bit.
-- When the remainder is not negative, setting the bit was correct and the bit is kept.
-- The algorithm starts with root=0 and remainder=radicand, which always fulfills equation 0 (step i=0, see below).
-- In the next step the MSB of root is set and the sign of the new remainder is determined by calculating the difference radicand-root**2.
-- When the difference is not negative, setting the bit to 1 was correct; otherwise the bit must be reset to 0.
-- This procedure is then repeated for each following bit of root, until the last bit of root is determined.
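--
-- Worked example of the trial procedure (illustrative values, not taken from the implementation):
-- radicand = 27, root with n = 3 bits.
--     trial root = 100b = 4:  27 - 4**2 = 11  (not negative) => bit 2 is kept,  root = 4
--     trial root = 110b = 6:  27 - 6**2 = -9  (negative)     => bit 1 is reset, root = 4
--     trial root = 101b = 5:  27 - 5**2 =  2  (not negative) => bit 0 is kept,  root = 5
-- Result: root = 5, remainder = 2, and equation 0 holds: 27 = 5**2 + 2.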
--
-- Calculating the difference radicand-root**2 directly is costly, as it involves the multiplication root*root.
-- But this multiplication can be avoided:
-- The radicand has 2*n bits (if g_radicand_width is an odd number, an additional MSB with value 0 is added to the radicand),
-- where n is the number of bits of the root.
-- When in step i (i=1..n) of the algorithm a next bit of the root is determined, then
-- the previous result root[i-1] and remainder[i-1] already fulfill equation 0:
--     radicand = root[i-1]**2 + remainder[i-1]                                         (1)
-- After step i the newly determined root[i] must also fulfill equation 0:
--     radicand = root[i]**2 + remainder[i]                                             (2)
-- As the sign of remainder[i] must be checked, equation 2 is solved for remainder[i]:
--     remainder[i] = radicand - root[i]**2                                             (3)
-- The term root[i] is equal to root[i-1] with the bit under evaluation set to 1 (at step i=1 the MSB is set), which is equivalent to this addition:
--     root[i] = root[i-1] + 2**(n-i)
-- Now in equation 3 the term root[i] can be replaced:
--     remainder[i] = radicand - (root[i-1] + 2**(n-i))**2                              (4)
--     remainder[i] = radicand - root[i-1]**2 - 2*root[i-1]*2**(n-i) - 2**(2*n-2*i)     (5)
-- The difference "radicand - root[i-1]**2" is known from equation 1 and has the value remainder[i-1]:
--     remainder[i] = remainder[i-1] -  2*root[i-1]*2**(n-i)   - 2**(2*n-2*i)
--     remainder[i] = remainder[i-1] - (  root[i-1]*2**(n-i+1) + 2**(2*n-2*i))          (6)
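-- Worked example of equation 6 (illustrative values): radicand = 27, n = 3, root[0] = 0, remainder[0] = 27.
--     i=1: 27 - (0*2**3 + 2**4) = 27 - 16 = 11  (not negative) => root[1] = 4, remainder[1] = 11
--     i=2: 11 - (4*2**2 + 2**2) = 11 - 20 = -9  (negative)     => root[2] = 4, remainder[2] = 11
--     i=3: 11 - (4*2**1 + 2**0) = 11 -  9 =  2  (not negative) => root[3] = 5, remainder[3] = 2
-- The trial remainders are the same values as radicand - root[i]**2 from equation 3 (27-16, 27-36, 27-25),
-- but no multiplication root*root was needed.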
-- Adding 2**(2*n-2*i) to the product root[i-1]*2**(n-i+1) means that bit (2*n-2*i) of the product must be set to 1.
-- The product root[i-1]*2**(n-i+1) means that root[i-1] must be shifted left in each step by a decreasing number (n,n-1,...,1) of bits.
-- Instead of implementing these different left shifts, root[0] is implemented as a signal with 2n bits,
-- which is already shifted left by n bits and has all bits at value 0.
-- If in step i=1 the MSB of root has to be set, then root[1] is calculated by setting the MSB of root[0] (which has 2n bits) to 1.
-- Otherwise root[1] is identical to root[0].
-- Then root[1] is shifted right by 1 bit for the next step, which is the same as shifting left by (n-2+1) bits in the next step.
-- In the next step, when the next bit of root has to be set, the shifted position of root has to be taken into account:
-- in step i the index of the bit of root to be set is 2*n-1-2*(i-1).
-- Again, the newly calculated root is shifted right by 1 bit at the end of the step.
-- After the last step, root has been shifted right by 1 bit so often that it is at its correct position.
-- At each step the newly calculated remainder is only used if it is not negative; otherwise the remainder is kept unchanged.
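--
-- Worked trace of the shifted representation (illustrative values; "R" is a name used only in this example
-- for the 2n-bit root signal): radicand = 27 = 011011b, n = 3, R = 000000b, remainder = 27.
--     i=1: subtrahend = R with bit 4 set = 010000b = 16; 27-16 = 11 (not negative) => remainder = 11,
--          bit 5 of R is set: R = 100000b; shifted right: R = 010000b
--     i=2: subtrahend = R with bit 2 set = 010100b = 20; 11-20 = -9 (negative)     => remainder = 11,
--          R is unchanged:    R = 010000b; shifted right: R = 001000b
--     i=3: subtrahend = R with bit 0 set = 001001b =  9; 11- 9 =  2 (not negative) => remainder = 2,
--          bit 1 of R is set: R = 001010b; shifted right: R = 000101b
-- After the last step R = 000101b = 5 is the root at its correct position, and 27 = 5**2 + 2.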
--
-- When g_additional_root_bits is different from 0, additional root bits below the binary point are calculated and added to the result root_o.
-- If g_latency = (g_radicand_width + g_radicand_width mod 2)/2 + g_additional_root_bits, the module square_root_step is instantiated once.
-- If g_latency is bigger than this value, the module square_root_step is instantiated once and additional root bits are calculated (to extend the latency),
-- which are not propagated to the output root_o.
-- If g_latency is smaller than this value, as many instances of square_root_step are inserted as are needed to reach the latency (more than
-- 1 step of the algorithm is then performed per clock cycle).
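--
-- Example values for the generics (a sketch derived only from the formulas in this file):
--     g_radicand_width = 32, g_additional_root_bits = 16: (32 + 0)/2 + 16 = 32 => with g_latency = 32 one instance of
--                                                         square_root_step is used; root_o has 32 bits (31 downto 0).
--     g_radicand_width = 31, g_additional_root_bits = 16: (31 + 1)/2 + 16 = 32 => same latency value;
--                                                         root_o has 31/2-1+16+1 = 31 as upper index, again 32 bits.
-- Assuming root_o is read as a fixed-point number with g_additional_root_bits bits below the binary point
-- (an interpretation, not stated explicitly above), radicand = 2 with 2 additional root bits would give
-- root_o = 101b = 1.01b = 1.25, because 1.25**2 = 1.5625 <= 2 < 1.5**2 = 2.25.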
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity square_root is
    generic (
        constant g_radicand_width       : natural := 32; -- Allowed values: >0
        constant g_additional_root_bits : natural := 16; -- Allowed values: all
        constant g_latency              : natural := 32  -- Allowed values: all; Recommended: (g_radicand_width+g_radicand_width mod 2)/2 + g_additional_root_bits
    );
    port (
        clk_i      : in  std_logic;
        radicand_i : in  unsigned(g_radicand_width-1 downto 0);
        res_i      : in  std_logic;
        start_i    : in  std_logic;
        ready_o    : out std_logic;
        root_o     : out unsigned(g_radicand_width/2-1+g_additional_root_bits+g_radicand_width mod 2 downto 0)
    );
end entity square_root;
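--
-- Example instantiation (a minimal sketch; it assumes the entity is compiled into library "work", and the label and
-- all signal names on the right-hand side are hypothetical):
--     square_root_inst : entity work.square_root
--         generic map (
--             g_radicand_width       => 32,
--             g_additional_root_bits => 16,
--             g_latency              => 32  -- (32 + 32 mod 2)/2 + 16, the recommended value
--         )
--         port map (
--             clk_i      => clk,
--             radicand_i => radicand,       -- unsigned(31 downto 0)
--             res_i      => res,
--             start_i    => start,
--             ready_o    => ready,
--             root_o     => root            -- unsigned(31 downto 0): 16 integer bits and 16 additional bits
--         );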
