BASIC ARITHMETIC

Numbers
From the programmers perspective, there are four types of number from which more complicated numeric types such as complex numbers or multivectors can be constructed: integer, rational, fixed-point, and floating-point. Integeres are whole numbers, typically signed and in the range [-2_M-1,2_M-1) with _M typically 8, 16, 32 or 64.  Twos compliment form is usually used for negative values, with -x represented by x  ^  ~0 where ~0 has 1 in all _M bits.
For rational numbers, we represesent x with two integers p and q¹0 with the understanding that x=pq-1.
For fixed-point we represent x with a signed intgere z and the understanding that x=2-_Lz where L is a positive integer less than _M. With _M=16 and _L=14, for example, we can represent values in (-2,2) to within 2-14. One is represented by 0x4000, ½ by 0x2000 and so on. When multiplying two such numbers together, we must shift the 32-bit result down by 14 places. If _M=32 we acheive dynamic range (217,217).
For floating-point we represent x by a fixed-point  mantissa y and an integer  exponent z with the understanding that x=2zy. Typically we would have a seperate sign-bit for the sign of x, forcing y positive, and choose z so that yÎ[1,2), allowing us to store |y|-1 rather than y; enabling _L=_M for y.
However, if we wish to store groups of similar magnitude numbers (vector coordinates (_x1,_x2,_x3) say) it is often sensible to have just one common exponent for all of them. We then have x = 2z(_y1,_y2,_y3) where _y1,_y2, and _y3 are signed values in (-1,1). We would typically choose z so that at least one of the yi has |yi|³½.
Fixed-point numbers are arguably the most fundamental type. They are less forgiving in use, requring greater care with range and accuracy, than floating-point but are much faster to add and subtract. That said, a lot of silicon has been cut to provide fast floating point units, and a lot of standard library routines assume floating point.

Heximal Numbers
Serious programmers should be familiar with heximal numeric notation and recognise key values.Value
 Heximal Decimal ¼ .4000000 0.25000000 1/3 .5555555 0.33333333 ½ .8000000 0.50000000 4-1p .C90FDAA 0.78539816 3-1p 1.0C15238 1.04719755 2-1p 1.921FB54 1.57079633 p 3.243F6A8 3.14159265 2p 6.487ED51 6.28318531 2-½ .B504F33 0.70710678 2½ 1.6A09E66 1.41421356 3-½ .93CD3A2 0.57735027 3½ 1.BB67AE8 1.73205081 5-½ .727C971 0.44721360 5½ 2.3C6EF37 2.23606798 7-½ .60C2479 0.37796447 7½ 2.A54FF53 2.64575131

Multiplication and Division
The fastest way to multiply or divide two numbers is often logarithmically as described in the next section. Failing this, If the processor does not have multiply and divide instructions, they must be explicitly programmed.

Multiplication
We will assume we have an N-bit value P to be multiplied by an M-bit value Q to give a K-bit value R. A fully accurate result requires  K ³ N+M.
We will further assume that P is non-negative.
If we write pi for the ith bit of P (i=0 giving the LSB) then we have:  QP = Qåi=0N-1 2ipi
Let Rn=åi=0n Q2ipN-n+i. Rn is thus Q times the top n bits of P.
Further:
 Rn+1 = åi=0n+1 Q2ipN-n-1+i = åi=0n+1 Q2ipN-n-1+i + Q20pN-n-1 = åi=0n Q2i+1pN-n+i + QpN-n-1 = 2åi=0n Q2ipN-n+1 + QpN-n-1 = 2Rn + QpN-n-1

So our algorithm is
1. Set n = 0 ; Rn = 0
2. If pN-n-1=1 set Rn+1 = 2Rn + Q
Else set Rn+1 = 2Rn
3. Set n = n+1 ; If n < K_Mult goto (ii)
4. RK_Mult = MostSig K bits of PQ. If K=N+M this is exact.
which may be implemented as
1. Set n = 0 ; R = 0
2. If pN-n-1=1 set R = 2R + Q else set R = 2R
3. Set n = n+1; If n < K-M goto (ii)
4. R = MostSig K bits of PQ.

This algorithm works for any value of Q and any non-negative value of P. However, it is often worthwhile checking explicitly for zero valued P or Q before using the algorithm and exiting with R=0. This will give a marked performance improvement if the routine is frequently called with one or other argument zero (eg. multiplication of sparse matrices).
If we assume that Q is also non-negative then the addition of Q to R in step (ii) may be more efficiently codable. There is another advantage to having Q of low-width. If Q is no wider than P it is possible improove on a naive implimentation of the algorithm that shifts P leftwards to extract its bits in the required order for the test in (ii). Since R is also being shifted leftwards (with the occassional addition of Q) we can effectively do both shifts at once by bringing R into P from below. For this to work R must have width ³ N+M. We procede as follows:
• (i)   Set R = 2MP
• REPEAT K-M TIMES
• (ii)       Shift R left by one bit setting carry = resulting N+Mth bit
• (iii)      If carry is set set R = R + Q
• ENDREPEAT
• (end)  LeastSig K bits of R = MostSig K bits of PQ.

Note that if we do not perform N cycles of the loop there will still be some low bits of P at the top of our "result" R.
In step (iii) the addition is performed only if the carry is set. This is annoying on processors such as the 6502 without an "Add without Carry" instruction. Rather than clearing the carry everytime it is possible to set R = 2M times the compliment of P. Alternatively one can decrement Q before entering the loop and restore it later if necessary.
A general multiplication routine is typically of the form:
1. Check for easy values of P and/or Q like 0 or +1.
2. Set S = |P| ; T =|Q|
3. Set R = ST  (unsigned multiplication)
4. If P and Q have different sign negate R
5. R=result

A "narrow" multiplication operartion can be used for wider arguments by repeated application in accordance with the results
 (2ma+b)(2nc+d) = 2n+mac + 2mad + 2nbc + bd = 22nac + 2n(ad+bc) + bd if n=m = (22n+2n)ac + 2n(a-b)(d-c) +(2n+1)bd.

the latter rearrangement giving three rather than four "sub" multiplications.

Division
It is possible to perform division by successive multiplication. We can set P0, Q0 equal to P and Q scaled so that Q is in the range [1/2,1). Writing dn for 1-Qn we then have
Pn/Qn = Pn/(1-dn) = Pn(1+dn)/(1-dn2)
so if we set Pn+1 = Pn(1+dn) ; Qn+1 = 1-dn2 we have that
( i) Pn/Qn = P/Q for all n>0.
(ii) Qn ® 1 as n ® ¥.
Whence Pn ® P/Q as n ® ¥.
Division by specific constants can be done "by hand" for example
 x/3 = x(1+1/2)-1/2 = x(1 - 1/2 + 1/4 - 1/8 + ...)/2 = x(1/2 - 1/4 +1/8 - 1/16 + ...).

General division routines are usually based on binary long division. P is shifted left into a comparison field F and if F is then greater than Q, Q is subtracted from F and a bit is added to the result R.
We will assume for the moment that Q>P>0. The result P/Q is therefore fractional and we will calculate 2K(P/Q).
1. Set n = 0; Rn = 0 ; Fn = P
2. Set Fn = 2Fn
3. If Fn>Q Set Fn = Fn - Q ;  Set Rn = 2Rn + 1
Else Set Rn = 2Rn
4. Set n = n+1 ; If n < K goto (ii)
5. RK = 2K(P/Q)      ie. (2KP) DIV Q ;    FK = 2KP - QRK    ie. (2KP) MOD Q
As for multiplication, we can combine the two shifts and rotate R "into" F.
1. Set R = 2P
REPEAT
2. If R>2NQ Set R =2(R - 2NQ) + 1
Else     Set R =2R
K TIMES
3. LeastSig K bits of R = (2KP) DIV Q
Remainder of R = (2KP) MOD Q
For example, if P=&38, Q=&7E, K=8 since  38/7E = 0.71C71C we would obtain R8 = 71 ; F8 = 28(38-7E*.71) = 62.

The above will typically form the basis of a general division routine of the following form:
1. Check for special cases like Q=0, P=0, or Q=+1.
2. Set P1=|P|/2L ; Q1=|Q| where L is chosen so that P1 will be < Q1.
3. Set R1=(2KP1) DIV Q1 as above; R2=(2KP1) MOD Q1 if required.
4. If P and Q are of different sign negate R1.
5. If P is negative negate R2 if required.
6. R1=(2K-LP) DIV Q ; R2=(2K-LP) MOD Q.

Glossary   Contents   Author
Copyright (c) Ian C G Bell 1998
Web Source: www.iancgbell.clara.net/maths or www.bigfoot.com/~iancgbell/maths
18 Nov 2006.