There are four basic ways to computationally evaluate functions:
(i) Using a Look Up Table, possibly with some form of interpolation (see
Look Up Tables).
(ii) Using a "bit-shifting" algorithm similar to those above for
multiplication and division that "builds" the result.
(iii) Iterative Methods using a sequence that approaches the desired value
(see Recurrence Relations).
(iv) Evaluating an Approximation Function (typically a polynomial) which
is close to the desired function over the required range.
Look Up Tables
Lookup tables (LUTs) provide fast but often imprecise evaluation of
functions. They can take up a lot of memory but may vastly improve
performance. Designing look up tables and the code to access them is an
exercise in balancing storage and speed against accuracy.
To discuss look up tables in general we will take as a simple example the
problem of evaluating sin(x), where x is measured in 256ths of a rotation,
for any x ∈ Z255.
We note first of all that if we know sin(x) over the range [0,64] then we
can readily calculate sin(x) for any x ∈ Z255.
If we define functions s11: Z255 → Z64 and s22 by
s11(x) = x        if 0 ≤ x < 64
       = 128 - x  if 64 ≤ x < 128
       = x - 128  if 128 ≤ x < 192
       = 256 - x  if 192 ≤ x < 256
s22(y,x) = y      if 0 ≤ x < 128
         = -y     if 128 ≤ x < 256
then sin(x) = s22(sin(s11(x)), x) for all x ∈ Z255.
Just how accurate is this method of approximating sin(x)?
Suppose we want to calculate sin(5). Since we have only tabulated sin for
even arguments we will actually return sin(4), giving an absolute error of -0.02.
But this assumes that sin(4) is evaluated exactly, and this will be subject to an
absolute error of 2^-9 due to the limited width of the LUT entry.
This latter error is a thinness or quantisation error due to the limited accuracy of the
LUT entries. These are minimised by tabulating a function with as small a
variation as possible, scaled and translated to fully use the given LUT entry
width.
The former is a sparseness error and is due to the limited number of
LUT entries. These can be reduced by tabulating as small a segment of the
desired function as possible and by some form of interpolation on the
tabulated values.
Note however that the linear function s(x) = x/64 lies below and within
0.21052 of sin(x) over [0,64], and if we exploit this by tabulating
S[i] = [2^10·(sin(2i) - i/32)]  for i ∈ Z32
we can reduce the absolute error in sin(2i) from 2^-9 to 2^-11.
In general, we approximate a desired function F: X → Y via an N-entry LUT
by F(x) = f2(F[f1(x)], F[h(f1(x))], x)
where f1 is a contraction function mapping X → ZN;
f2 is an expansion function mapping ZM × ZM × X → Y;
h picks a neighbouring table index (eg. h(i) = i+1);
F[i] denotes the ith element of the table (assumed in ZM).
(This is not truly general since it is possible to use more than two LUT
entries for a given evaluation, eg. fitting a quadratic to three tabled
points.)
In choosing the contraction function f1 we first look for a
representative portion of X with respect to F.
This is a subset X* of X such that if we know F over X* we can readily
extend to X (in the sin example Z64 was a representative portion of Z255
with respect to sin).
f1 is then the composition of two maps:
f11: X → X*, defined so that F(x) is easily derived from F(f11(x)) ∀ x ∈ X;
f12: X* → ZN.
The expansion function is also the composition of two maps:
f21, mapping ZM × ZM × X to Y, which attempts to regain the accuracy lost
due to f12;
f22, mapping Y × X → Y, which attempts to compensate for f11.
The art of LUT definition lies in choosing expansion and contraction
functions that minimise the total error (arising from thinness and sparseness
errors) without requiring excessive calculation or memory.
Thinness Errors
To minimise the thinness errors we tabulate the deviation of F from a
(usually fairly crude) approximation function f: X → Y for F over X*
rather than F itself, preferably scaled up to use the full width of the table entries.
The simplest choice of this base function f is a constant, obvious choices
being either the minimum or the median value taken by F over X*.
We then tabulate
F[i] = [K·(F(f1^-1(i)) - f(f1^-1(i)))]  ∀ i ∈ ZN
where K may be chosen so that the largest F[i] is as large as possible within ZM.
Our sin example used base functions of s(x) = 0 (giving an absolute error in
sin(2i) of 2^-9) and s(x) = x/64 (giving an absolute error in sin(2i) of 2^-11),
the maximal deviation of sin from the latter occurring at (switching to radians)
x = cos^-1(2/π) ≈ 0.88 and giving a difference of approximately 0.21051.
Any base function used must be catered for in the first part of the
expansion function.
When s(x) = 0 we can get away with s22(y1,y2,x) = 2^-8·y1, but we
must compensate for s(x) = x/64 with
s22(y1,y2,x) = 2^-10·y1 + 2^-6·s11(x).
A suitable base function over X* = [a,b] is often given by
g(x) = f(a) + (x-a)(f(b)-f(a))/(b-a). The maximal deviation of g from f
then occurs at x = f'^-1((f(b)-f(a))/(b-a)).
Sparseness Errors
These are usually a greater problem than thinness errors. To halve
thinness errors it suffices to add a single bit to each LUT entry. To halve
sparseness errors one must double the number of entries.
If X* = [a,a+b] and we map it to ZN by f12(x) = [N(x-a)/b] then the
sparseness errors are less than f(x+b/N) - f(x), where x is that point in
[a,a+b] where f has maximal absolute gradient (approximately b·f'(x)/N
for small b/N).
In the sin example our sparseness errors are high. b/N is 2 and x=0, so
our sparseness errors are potentially S(2) - S(0). Since we will only actually
want S(n) for integral n, our worst sparseness error will actually be when S(1)
evaluates to S(0) = 0 instead of &0.0324.
We can reduce sparseness errors by incorporating linear interpolation
of the LUT values into f22. To do this we need an approximator G(x) for the
gradient f'(x) at our tabulated point. The interpolated value from a base
evaluation f(a) is then given by g(a+x) = f(a) + x·G(a). G(a) can be obtained
from a separate LUT for the derivative function, or obtained from the LUT of
the function itself. One approximator for f'(a) is given by
G(a) = (f(a+b) - f(a))/b, where b is the smallest value such that f(a+b) may
be evaluated from the table distinct from f(a). This particular stratagem
will be referred to here as naive linear interpolation.
Naive linear interpolation is very easy for s12(x) = [x/2], s(x) = 0. We
simply set
s22(y1,y2,x) = 2^-8·y1        if s11(x) is even
             = 2^-9·(y1+y2)   if s11(x) is odd.
The error this interpolation introduces follows from Taylor expansion:
f(a+x) - g(a+x) ≈ f(a) + x·f'(a) + x^2·f''(a)/2 + o(x^3) - f(a) - x·f'(a) - x·b·f''(a)/2 + o(b^3)
= f''(a)·x·(x-b)/2 + o(x^3, b^3).
In our example the error involved in calculating sin(2i+1) from sin(2i)
and sin(2i+2) is -(2π/256)^2·sin(2i)/2 < 2^-15·π^2 < &0.001A.
In summary, if our sin LUT is set up as S[i] = [2^10·(sin(2i) - i/32)] for
i ∈ Z32 and used by
S(x) = h(x)·(2^-10·S[s1(x)] + 2^-6·s1(x))                        if x is even
     = h(x)·(2^-11·(S[s1(x)] + S[s1(x)+1]) + 2^-6·s1(x) + 2^-7)  if x is odd
where h(x) is +1 for 0 ≤ x < 128 and -1 for 128 ≤ x < 256, and
s1(x) = x        if 0 ≤ x < 64
      = 128 - x  if 64 ≤ x < 128
      = x - 128  if 128 ≤ x < 192
      = 256 - x  if 192 ≤ x < 256.
Suppose now we can have 65 bytes in the LUT. Should we double the width
of each entry or the number of entries?
The former eliminates all the thinness errors because the width of the
LUT entry equals the required result width. Because of this, another advantage
is that s(x)=0 is just as good as s(x)=x/64.
The latter eliminates all the sparseness errors because the length of
the table then equals the length of X*.
But if we are using linear interpolation the sparseness errors are small
compared to the thinness ones so it is better to "widen" the LUT than
"lengthen" it in this as in many other cases.
Note that this is not the optimal way to code sine LUTs. We can do better
than the G described above by exploiting the fact that the derivative of sine
is cosine. This is described in Sine and Cosine Functions below.
Look up table methods are considered in depth later on. We will here consider how the other three methods may be used to evaluate some of the more common functions needed by games programmers.
Bit-Shifting Methods
"And let your speech be Yea, yea; Nay, nay; and whatsoever is more is of the evil one"
--- Matthew 5:37
We will use the notation x{i} to denote the value (0 or 1) of the bit of
the operand x corresponding to 2^i.
Thus x = Σ_i x{i}·2^i, the sum running over all bit positions i.
There are two basic sorts of bit shifting algorithm to evaluate y=f(x).
(a) Operand-Based
We build up y working from the bits of x. This approach requires an
identity of the form f(a+b) = g(a, b, f(a), f(b)). A typical algorithm would be:
(i) Set y0 = f(0); i = <lowest bit position for x>; s = 0.
(ii) If x{i} = 1 then yi+1 = g(s, 2^i, yi, f(2^i)) and s = s + 2^i; else yi+1 = yi.
(iii) i = i + 1. Repeat (ii) unless x{j} = 0 ∀ j ≥ i.
If g is of the form g(f(a), f(b)) then we do not need to maintain s. This
approach usually requires a small lookup table for f(2^i).
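As a concrete instance (my example, not the text's): f(x) = 2^x satisfies f(a+b) = f(a)·f(b), so g needs only f(a) and f(b), no running sum s is required, and the small table holds f(2^-i):

```python
# Table of f(2^-i) = 2^(2^-i) for fraction bits i = 1..16.
POW_TABLE = [2.0 ** (2.0 ** -i) for i in range(1, 17)]

def exp2_frac(x):
    """Operand-based evaluation of 2**x for x in [0, 1),
    working from the top 16 bits of x."""
    bits = int(x * (1 << 16))          # 16-bit fixed-point truncation of x
    y = 1.0                            # y0 = f(0)
    for i in range(1, 17):
        if bits & (1 << (16 - i)):     # test bit x{-i}
            y *= POW_TABLE[i - 1]      # g(f(a), f(b)) = f(a) * f(b)
    return y
```

Each set bit of the operand costs one multiplication by a tabled constant.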
(b) Result-Based
We build up the bits of y exploiting the monotonicity of f^-1.
If f^-1 is increasing we would use:
(i) Set y = 0; n = <maximal possible bit position for f(x)>.
(ii) If f^-1(y + 2^n) ≤ x then y = y + 2^n.
(iii) n = n - 1. Repeat (ii) unless n < <lowest desired bit position for y>.
This method is equivalent to a binary search for the value of y solving
f^-1(y) = x.
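For instance (an illustration of mine, not from the text), f(x) = x^(1/3) has the increasing inverse f^-1(y) = y^3, so the bits of the result can be found by exactly this binary search:

```python
def cuberoot_bits(x, frac_bits=16):
    """Result-based cube root of x in [0, 8): build y bit by bit,
    keeping y an underestimate throughout."""
    y = 0.0
    bit = 2.0                              # maximal bit position here: 2^1
    while bit >= 2.0 ** -frac_bits:
        trial = y + bit
        if trial * trial * trial <= x:     # f^-1(y + bit) <= x: keep the bit
            y = trial
        bit /= 2.0
    return y
```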
Iterative Methods
We iterate some sequence or sequences a number of times, where one
sequence tends to the desired result. See the chapter on Recurrence Relations.
Approximation Functions
We evaluate a different function, typically a polynomial, that
approximates the desired function over the desired domain. We may use
different approximation functions over different parts of the domain.
For any function p(x) which is meant to approximate f(x) over a given range [a,b] we can construct an error function e(x) = p(x) - f(x) which we would like to be as small as possible over [a,b].
If p is constrained to be a particular type of function (say, a polynomial of degree n) then we want the function of that type that "minimises" |e(x)| over [a,b] (rather than naively taking the first n terms of the Taylor series, for example). Unfortunately it is not immediately clear what it means to "minimise" a function over a range. The two most common criteria for error minimisation are minimising E∞ = Max_{x∈[a,b]}|e(x)| (the minimax approximator) and minimising E2 = ∫ab e(x)^2 dx (the least squares approximator). The former gives the smallest maximal deviation of p from f, while the latter minimises the "overall" deviation.
We will show below how to find the minimax and least squares polynomials of
degree n approximating a function f over [a,b].
Least Squares Polynomials
If we write ci for the coefficient of x^i in our polynomial p(x), then for p
to be a least squares approximator for f we require, for each i ∈ Zn+1,
dE2/dci = 0
⇔ d/dci [ ∫ab (p(x) - f(x))^2 dx ] = 0
⇔ ∫ab d/dci [ (p(x) - f(x))^2 ] dx = 0
⇔ ∫ab 2·(Σ_{j=0..n} cj·x^j - f(x))·x^i dx = 0
⇔ Σ_{j=0..n} cj·∫ab x^(i+j) dx = ∫ab x^i·f(x) dx
⇔ Σ_{j=0..n} cj·(b^(i+j+1) - a^(i+j+1))/(i+j+1) = ∫ab x^i·f(x) dx
⇔ A·c = y
where A is the (n+1)×(n+1) matrix with Aij = (b^(i+j+1) - a^(i+j+1))/(i+j+1),
c is the vector of coefficients cj, and yi = ∫ab x^i·f(x) dx.
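A direct (if numerically naive, since A is a Hilbert-type matrix) rendering of this: build A and y as above and solve A·c = y by Gaussian elimination. The midpoint-rule integration and the helper's name are my own choices:

```python
def least_squares_poly(f, a, b, n, samples=4096):
    """Degree-n least squares polynomial for f over [a,b]: solve A c = y,
    A[i][j] = (b^(i+j+1) - a^(i+j+1))/(i+j+1), y[i] = integral of x^i f(x)."""
    A = [[(b ** (i + j + 1) - a ** (i + j + 1)) / (i + j + 1)
          for j in range(n + 1)] for i in range(n + 1)]
    h = (b - a) / samples
    y = [sum(((a + (k + 0.5) * h) ** i) * f(a + (k + 0.5) * h)
             for k in range(samples)) * h for i in range(n + 1)]
    # Gaussian elimination with partial pivoting.
    for col in range(n + 1):
        piv = max(range(col, n + 1), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        y[col], y[piv] = y[piv], y[col]
        for r in range(col + 1, n + 1):
            m = A[r][col] / A[col][col]
            for c2 in range(col, n + 1):
                A[r][c2] -= m * A[col][c2]
            y[r] -= m * y[col]
    c = [0.0] * (n + 1)
    for r in range(n, -1, -1):
        c[r] = (y[r] - sum(A[r][j] * c[j] for j in range(r + 1, n + 1))) / A[r][r]
    return c   # c[i] is the coefficient of x^i
```

In practice an orthogonal (eg. Legendre) basis is much better conditioned than the raw power basis used here.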
Minimax Polynomials
The basic principle is that the error function of a minimax polynomial p
of degree n approximating a function f over [a,b] will achieve its greatest
magnitude h at exactly n+2 different values of x in the range [a,b] and,
further, that if these values are {h0, h1, ..., hn+1} where
a ≤ h0 < h1 < ... < hn+1 ≤ b then
e(hi) = (-1)^i·g,  i = 0,1,...,n+1, where |g| = h.
(The errors alternate in sign.)
Suppose we wanted to fit a polynomial of degree n to the n+2 points
(xi, f(xi)), i = 0,1,...,n+1.
Any polynomial that minimises Max_{i=0..n+1}|p(xi) - f(xi)| satisfies
p(xi) - f(xi) = (-1)^i·e,  i = 0,1,...,n+1
for some e, and, conversely, any polynomial of degree n that satisfies this
for some e does minimise Max_{i=0..n+1}|p(xi) - f(xi)|.
We must also have |e| ≤ h, since if |e| > h then the minimax polynomial
over [a,b] would give a smaller Max_{i=0..n+1}|p(xi) - f(xi)|.
If we happened to know what the values h0, h1, ..., hn+1 were then we could
solve the n+2 linear equations
p(hi) - f(hi) = (-1)^i·g,  i = 0,1,...,n+1
to find g and, more importantly, the n+1 coefficients of the minimax
polynomial p.
Since we do not know these values we choose any set S0 of n+2 values from [a,b]:
S0 = {xi : i = 0,1,...,n+1}  where  a ≤ x0 < x1 < ... < xn+1 ≤ b.
xi = a + i(b-a)/(n+1) would do but, for reasons that need not concern us
here, the Chebyshev points
xi = (a+b)/2 + (b-a)·cos((n+1-i+½)·π/(n+2))/2
are a more appropriate choice.
At stage M we find the minimax polynomial pM for f over SM by solving the
n+2 linear equations
pM(xi) - f(xi) = (-1)^i·eM,  i = 0,1,...,n+1,
by Gaussian elimination to find the n+1 coefficients of pM and the unknown
deviation eM.
We then find one or more values {x'j : j = 0,1,...,k} (where k ≥ 0) that
maximise |pM(x) - f(x)| over [a,b]. If these points equal some of the points
in SM then we have |eM| = h and pM is minimax over [a,b]. Otherwise we
replace one or more of the values in SM with these new values such that the
new set SM+1 still satisfies a ≤ x0 < x1 < ... < xn+1 ≤ b and that xi is only
replaced by x'j if pM(xi) - f(xi) and pM(x'j) - f(x'j) have the same sign.
Having obtained SM+1 we then find pM+1, use that to obtain SM+2, and so on.
Each stage will bring our set of values closer to the "worst possible" set
of values {h0, h1, ..., hn+1} and at every stage we will have |eM+1| ≥ |eM|.
The algorithm terminates at stage N when |eN+1| - |eN| < d for some small d,
since we will then have
Max_{i=0..n+1}|pN(xi) - f(xi)| > h - d.
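One stage of this exchange, solving pM(xi) - f(xi) = (-1)^i·eM by Gaussian elimination, can be sketched as follows (a minimal illustration of mine; a full implementation would wrap this in the point-swapping loop described above):

```python
def exchange_stage(f, points):
    """Given n+2 points, solve p(x_i) - f(x_i) = (-1)^i * e for the n+1
    polynomial coefficients and the deviation e."""
    m = len(points)                 # m = n + 2 unknowns: c_0..c_n and e
    n = m - 2
    # Row i: sum_j c_j x_i^j - (-1)^i e = f(x_i)
    M = [[x ** j for j in range(n + 1)] + [-((-1) ** i)]
         for i, x in enumerate(points)]
    rhs = [f(x) for x in points]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, m):
            k = M[r][col] / M[col][col]
            for c2 in range(col, m):
                M[r][c2] -= k * M[col][c2]
            rhs[r] -= k * rhs[col]
    sol = [0.0] * m
    for r in range(m - 1, -1, -1):
        sol[r] = (rhs[r] - sum(M[r][j] * sol[j] for j in range(r + 1, m))) / M[r][r]
    return sol[:-1], sol[-1]        # coefficients c_0..c_n, deviation e
```

For example, fitting degree 1 to f(x) = x^2 at {0, ½, 1} gives p(x) = x - 1/8 with alternating errors of magnitude 1/8.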
Minimax Rational Functions
If we allow a division in our approximation function then we have the full
range of rational functions available to us. A rational function over [a,b] is
one of the form r(x) = p(x)/q(x), where p(x) is a polynomial of degree m and
q(x) is a polynomial of degree n satisfying q(x) > 0 for x ∈ [a,b].
The exchange algorithm described above may be used to find minimax rational
approximation functions using sets of m+n+2 points, although it is no longer
the case that at stage M we can find the minimax rational function rM for f
over SM by solving a set of linear equations; instead we are faced with:
pM(xi) = [f(xi) - (-1)^i·eM]·qM(xi),  i = 0,1,...,m+n+1.
Though it is possible to solve these directly using eigenvalues (see
Approximation Theory and Methods) we instead adopt the tools of linear
programming such as the Primal Simplex Algorithm.
We choose an estimate h for the deviation eM and look for a rational
function rM satisfying |f(xi) - rM(xi)| ≤ h for i = 0,1,...,m+n+1. That is,
we look for polynomials p and q satisfying
p(xi) - f(xi)·q(xi) ≤ h·q(xi),   i = 0,1,...,m+n+1
f(xi)·q(xi) - p(xi) ≤ h·q(xi),   i = 0,1,...,m+n+1
q(xi) > 0,                       i = 0,1,...,m+n+1.
If we can find such p and q then we know that h ≥ eM. If such p and q can be
shown not to exist then we know that h < eM. This enables us to find eM to any
required accuracy (using a binary search technique) and thus pM and qM.
Finding coefficients of p and q to satisfy these inequalities is a linear
programming problem. However we can achieve better performance, in terms of
h's approach to eM, by using the differential correction algorithm.
We start stage M with h = eM-1 and look for polynomials p and q and a
scalar d ≥ 0 satisfying:
p(xi) - f(xi)·q(xi) ≤ h·q(xi) - d·qM-1(xi),   i = 0,1,...,m+n+1
f(xi)·q(xi) - p(xi) ≤ h·q(xi) - d·qM-1(xi),   i = 0,1,...,m+n+1
q(xk) = 1 for a fixed k ∈ {0,1,...,m+n+1}.
If such can be found with d = 0 then p = pM, q = qM, h = eM. Otherwise d > 0
and we replace h with Max_{i=0..m+n+1}|f(xi) - p(xi)/q(xi)| and repeat; h
will tend to eM from above.
For stage 1 we define e0 to be anything known to be larger than the optimal
minimax rational approximation error over [a,b], and q0(x) to be identically
equal to 1.
It only remains to reformulate the above inequalities into the standard
form usual for linear programming techniques. An appropriate reformulation is
obtained by writing the coefficient pr of x^r in p as pr0 - pr1 (r = 0,1,...,m)
and the coefficient qs of x^s in q as qs0 - qs1 (s = 0,1,...,n), and also
introducing slack variables yi and zi for i = 0,1,...,m+n+1. We then have the
linear programming problem:
Minimise Σ_{j=0..m+n+1} (yj + zj)
subject to the 2(m+n+2)+1 constraints
Σ_{r=0..m} (pr0-pr1)·xi^r - (f(xi)+h)·[Σ_{s=0..n} (qs0-qs1)·xi^s] + yi = -d·qM-1(xi),   i = 0,1,...,m+n+1
-Σ_{r=0..m} (pr0-pr1)·xi^r + (f(xi)-h)·[Σ_{s=0..n} (qs0-qs1)·xi^s] + zi = -d·qM-1(xi),   i = 0,1,...,m+n+1
Σ_{j=0..n} (qj0-qj1)·xk^j = 1
over pr0, pr1, qs0, qs1, yi, zi, d ≥ 0.
Basic Math Functions
Square Root function
Square roots are often required, occasionally in the form √(1+x).
In the presence of log and antilog functions one can exploit the trivial result
√x = antilog(log(x)/2).
Bit-shifting Methods
It is reasonably straightforward to evaluate square roots explicitly in time proportional to the accuracy required.
In the following discussion we will assume that the problem is to evaluate, to N-bit accuracy, the square root of a fractional operand x. We assume x to be a positive value strictly less than one. This means that the result y = √x will be of the same order no matter how wide x is. If x is four bytes wide we will regard it as being in (2^-32)ths; if it is two bytes wide we will regard it as being in (2^-16)ths. In both cases, if N=8 the result will be in (2^-8)ths.
This assumption is not limiting. If we wish x to represent, say, a 32-bit integer then y = √(x/2^32) need only be multiplied by 2^16 to give the correct result. Either algorithm with N=16 will return y in (2^-16)ths, which is effectively the same thing as 2^16·y.
Since √(a+b) is not easily expressible in terms of a and b we do not use an
operand-based bit-shifting algorithm. However, since f^-1(y) = y^2 is
monotonically increasing, we can use a result-based one.
Suppose at the nth stage the top n bits of yn match those of y = √x
but the rest of yn is zero (ie. yn = Σ_{i=-n..-1} y{i}·2^i),
so that En = x - yn^2 is a non-negative error value for yn^2
as an underapproximator of x. Then
y{-(n+1)} = 1
⇔ (yn + 2^-(n+1))^2 ≤ x
⇔ yn^2 + 2^-n·yn + 2^-2(n+1) ≤ x
⇔ 2^-n·yn + 2^-2(n+1) ≤ En,
with the comparison one of positive (unsigned) integers.
An algorithm exploiting this maintains yn and En together, testing the
condition at each bit position and subtracting 2^-n·yn + 2^-2(n+1) from En
whenever the bit is set.
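A direct sketch of that algorithm (in Python for readability; on target hardware yn and En would be held as scaled integers so the comparison really is an unsigned one):

```python
def sqrt_frac(x, nbits=16):
    """Result-based square root of fractional x in [0, 1): build the bits
    of y = sqrt(x), maintaining En = x - yn^2 as a running error value."""
    y, E = 0.0, x
    for n in range(nbits):
        bit = 2.0 ** -(n + 1)
        t = 2.0 ** -n * y + bit * bit      # 2^-n * yn + 2^-2(n+1)
        if t <= E:                         # set bit y{-(n+1)}
            y += bit
            E -= t
    return y
```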
An alternative approach is to operate on the bits of x in pairs by
shifting them into a comparison field. Each of these double shifts gives one
bit of result data.
At each stage we have yn containing an integral underestimate of the
square root of 4^n·x. We maintain an error value En = 4^n·x - yn^2.
How is En+1 related to En?
If yn+1 = 2yn then En+1 = 4^(n+1)·x - 4yn^2 = 4En.
If yn+1 = 2yn + 1 then En+1 = 4^(n+1)·x - 4yn^2 - 4yn - 1 = 4En - 4yn - 1.
So we set yn+1 = 2yn + 1 ⇔ 4En ≥ 4yn + 1,
ie. ⇔ En ≥ yn + 1/4.
The algorithm therefore shifts two bits of x into the error field at each
step, doubles yn, and adds one whenever 4En ≥ 4yn + 1.
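A sketch of this pairwise algorithm for integer operands (my Python rendering of the shift-two-bits loop; the comparison 4En ≥ 4yn + 1 appears directly):

```python
def isqrt_pairs(x):
    """Integer square root, consuming the bits of x two at a time."""
    y, E = 0, 0
    shift = (max(x.bit_length(), 1) + 1) // 2 * 2 - 2   # topmost bit pair
    while shift >= 0:
        E = 4 * E + ((x >> shift) & 3)   # shift the next two bits of x into E
        if E >= 4 * y + 1:               # En >= yn + 1/4, scaled by 4
            E -= 4 * y + 1
            y = 2 * y + 1
        else:
            y = 2 * y
        shift -= 2
    return y
```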
Iterative Methods
It is possible to obtain square roots using an iterative method involving
divisions. Consider the iteration xn+1 = (xn + y/xn)/2.
If en = xn - √y then we have
en+1 = en^2/(2xn).
So if xn > √y we must have 0 < en+1 < en^2/(2√y);
writing gn = en/√y for the fractional error in xn, we have
gn+1 ≤ gn^2/2  ⇒  gn ≤ g0^(2^n)/2^(2^n - 1).
Thus provided x0 differs from √y by less than 141% (g0 < √2),
xn → √y (from above) extremely rapidly.
If, for example, x0 is within 6% of √y, so that the most significant four
bits of x0 are accurate, then x1 will be accurate to nine bits, x2 to
nineteen bits, x3 to thirty-nine bits, x4 to seventy-nine bits, and so on
(though not beyond the accuracy to which the divisions are evaluated).
Since this iteration converges to the square root so fast, we rarely need
to do more than three iterations, and two will often suffice if we start from
a reasonable estimate for √y. If √y is close to one (eg. when normalising an
approximately normal vector) then we set x0 = 1 and get our first iteration
x1 = (1+y)/2 almost for free. To find the exact magnitude of a vector whose
magnitude is not necessarily close to unity we can sum the squares of the
components and then use a distance measure (see Distance Measures) to start
the iteration.
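The iteration itself is tiny; this sketch uses three fixed passes, which the error analysis above suggests is usually enough from a reasonable starting estimate:

```python
def newton_sqrt(y, x0, iterations=3):
    """Heron/Newton iteration x_{n+1} = (x_n + y/x_n)/2; the number of
    accurate bits roughly doubles each pass."""
    x = x0
    for _ in range(iterations):
        x = (x + y / x) * 0.5
    return x
```

For y near one, x0 = 1 makes the first pass x1 = (1+y)/2, almost for free.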
Similar iterations exist to find cube and higher roots. See Inverse Functions
Via Iteration.
Approximation Functions
We can reduce the domain from [0,1] to [0,½] by dividing any x > ½ by four
and then doubling its square root. Indeed, we can reduce to [0,2^-2m] by
dividing by 2^2m and then multiplying the square root by 2^m, although we
lose the least significant 2m input bits by doing so.
[Include some rational poly fits here ]
Power Function
Square roots are the b = ½ special case of the general power function x^b.
In the presence of log and antilog functions one can exploit the trivial result
x^b = antilog(b·log(x)).
Of particular interest is x^b for x ∈ [0,1] and a "large" positive b (around
twelve), eg. the (cos θ)^b arising in Phong Shading. The result is then also
in [0,1].
For known integer b > 0, x^b can be evaluated in fewer than b multiplications
in a "hardwired" manner. x^10, for example, requires four multiplications as
(((x^2)^2)^2)(x^2), with x^2 being evaluated once and used twice.
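The general pattern behind such chains is binary (square-and-multiply) exponentiation, which needs at most about 2·log2(b) multiplications; a small sketch:

```python
def pow_int(x, b):
    """Evaluate x**b for integer b > 0 by repeated squaring."""
    result = None
    square = x
    while b:
        if b & 1:
            result = square if result is None else result * square
        square = square * square     # x^2, x^4, x^8, ...
        b >>= 1
    return result
```

For a fixed small b an addition chain found by hand, as above, can still beat the generic pattern by a multiply or two.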
Differentiating power functions
If ' denotes ∂/∂x then we have (a^x)' = ln(a)·a^x and (x^a)' = a·x^(a-1) for a ≠ 0.
If f(x) = g(x)^h(x) then
d^-1·(f(x+d) - f(x)) = d^-1·(g(x+d)^h(x+d) - g(x+d)^h(x) + g(x+d)^h(x) - g(x)^h(x))
≈ ln(g(x+d))·g(x+d)^h(x)·h'(x) + h(x)·g(x)^(h(x)-1)·g'(x)
≈ ln(g(x))·g(x)^h(x)·h'(x) + h(x)·g(x)^(h(x)-1)·g'(x)
= ( ln(g(x))·g(x)·h'(x) + h(x)·g'(x) )·g(x)^(h(x)-1).
Approximation Functions
If the result is required only to within a fairly coarse error tolerance e
[typically e = 2^-8 for an 8-bit shade value] then a rough-and-ready
piecewise approximator is given by
f(x) = 0          for 0 ≤ x < d
f(x) = (Ax+B)^m   for d ≤ x ≤ 1
where d^b ≤ e (typically d = e^(1/b) to give equality);
A = (1-d)^-1;  B = 1-A = -d·(1-d)^-1.
This satisfies f(d) = 0 and f(1) = 1 for any m, and it remains only to
choose an integer m smaller than b (ideally a power of two) giving the
required accuracy.
For a given m, the optimal (minimax) values for A and B are given by the
minimax linear approximator to x^(b/m) over x ∈ [d,1], though this gives a
discontinuity at x = d and may deviate slightly from [0,1].
In general, a "power bulging" function curve f(x) over a given range
can be approximated by computing the mth power of an approximator to
the "flatter" curve f(x)^(1/m) for a prechosen "friendly" m.
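A sketch of the piecewise scheme under the simple endpoint fit (the minimax A and B would differ slightly); make_pow_approx and its defaults are my naming:

```python
def make_pow_approx(b, m, e=2.0 ** -8):
    """Piecewise approximator for x**b on [0,1]: zero below d, (A*x+B)**m
    above, with d = e**(1/b) so the truncated tail stays below e."""
    d = e ** (1.0 / b)
    A = 1.0 / (1.0 - d)              # endpoint fit: maps [d,1] onto [0,1]
    B = 1.0 - A
    def f(x):
        if x < d:
            return 0.0
        return (A * x + B) ** m
    return f
```

With b = 12 and m = 8, the inner linear map plus three squarings replace eleven general multiplications.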
Sine and Cosine Functions
Look Up Tables
We will assume we need a q-bit sine (that is, a result expressed in
(2^-q)ths) of a p-bit angle (that is, an angle expressed in (2^-p)ths of a
complete revolution (2π radians or 360°)); ie. we need
S(i) = 2^q·sin(2^(1-p)·π·i)
where i ∈ Z_{2^p-1}; and that we have room for a lookup table containing
N = 2^n m-bit entries.
We take X* to be the first quarter of X as before. Suitable base functions
are s(i) = 0 and s(i) = 2^(q+2-p)·i:
X = Z_{2^p-1};  X* = Z_{2^(p-2)-1}
S(i) = 2^q·sin(2^(1-p)·π·i);  s(i) = 0 or 2^(q+2-p)·i
Y = Z_{2^q-1}.
s11(i) = i            if 0 ≤ i < 2^(p-2)
       = 2^(p-1) - i  if 2^(p-2) ≤ i < 2^(p-1)
       = i - 2^(p-1)  if 2^(p-1) ≤ i < 3·2^(p-2)
       = 2^p - i      if 3·2^(p-2) ≤ i < 2^p
s12(i) = [2^(n-p+2)·i].
For the zero base function:
s2(y,i) = 2^(q-m)·y   if 0 ≤ i < 2^(p-1)
        = -2^(q-m)·y  if 2^(p-1) ≤ i < 2^p.
For the linear base function:
s2(y,i) = 2^(q-m-2)·y + 2^(q-p+2)·s11(i)   if 0 ≤ i < 2^(p-1)
        = -2^(q-m-2)·y - 2^(q-p+2)·s11(i)  if 2^(p-1) ≤ i < 2^p.
Our thinness errors come from an absolute error of 2^-1 in y ∈ Z_{2^m-1} and so translate to an absolute error in our final result of 2^(q-m) for s(i) = 0 and 2^(q-m-3) for s(i) = 2^(q+2-p)·i. If we are using an 8-bit table to give a 16-bit result (m=8, q=16), for example, then our absolute thinness error is 2^5 for the linear base function: only the top eleven bits of our result are accurate. To eliminate thinness errors we thus need m ≥ q-3.
Our sparseness errors derive from any difference between p-2 and n. If we
wish to eliminate sparseness errors we must either have n ≥ p-2 or use some
form of interpolation. Rather than using naive linear interpolation we note that
S'(x) ≈ 2^(q-p+1)·π·cos(2^(1-p)·π·x) = 2^(1-p)·π·S(2^(p-2) - x)
so that
S(j+k) ≈ S(j) + (2^(1-p)·π·k)·S(2^(p-2) - j).
Using the contraction function s12(i) = [2^(n-p+2)·i] results in the loss of
p-n-2 bits, so the k we are interested in will be in Z_{2^(p-n-2)-1}.
We therefore construct a corrective look up table C to approximate the function
c(k) = 2^(1-p)·π·k.
We will assume this table to have 2^(p-n-2) entries (giving a contraction
function c1(k) = k) which are r bits wide (giving a table
C[k] = [2^(r+n-p)·π·k]), and an expansion function c2(y,k) = 2^(1-r-n)·y if we
use a zero base function.
For any i ∈ X we can therefore approximate S by
S(i) ≈ s2(S[j], i) + 2^(1-r-n)·C[k]·s2(S[2^n - j], 2^(p-2) - i)
where j = i DIV 2^(p-n-2) and k = i MOD 2^(p-n-2).
Example
If we have n = r = 8 and p = q = m = 16 then s12(i) = [2^-6·i].
If we use a zero base function then:
S[] contains 2^8 16-bit entries: S[i] = [2^16·sin(2^-9·π·i)].
C[] contains 2^6 8-bit entries: C[k] = [π·k].
S(i) = S[j] + 2^-15·C[k]·S[2^8 - j]
where j = i DIV 2^6 and k = i MOD 2^6, for i ∈ Z_{2^14-1}.
If we use a linear base function then:
S[] contains 2^8 16-bit entries: S[i] = [2^18·(sin(2^-9·π·i) - 2^-8·i)].
C[] contains 2^6 8-bit entries: C[k] = [π·k].
S(i) = 2^-2·S[j] + 2^8·j + 2^-15·C[k]·(2^-2·S[2^8 - j] + 2^16 - 2^8·j)
where j = i DIV 2^6 and k = i MOD 2^6, for i ∈ Z_{2^14-1}.
The multiplication is of the form 2^-15·(8-bit)×(16-bit) and we only need the
top nine bits of the 24-bit result.
For a zero base function this system gives values for S(i) within ±2 of the
desired 16-bit scaled sine. If we only evaluate the multiplication to eight
bits then the errors become ±3. Raising r to 16 reduces the errors to ±1
except for just 14 (out of 2^14) +2's. Taking n=9, r=16 and a 9-bit multiply
gives errors of ±1 with only 75 +1's. Increasing n to 11 (a 4 Kbyte lookup
table) ensures that the value is only ever out by -1.
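The zero-base version of this example can be sketched directly. Assumptions of mine: Python floats and ints stand in for the fixed-point arithmetic, [·] is taken as truncation, and S[] is padded with a 257th entry so the cosine lookup S[2^8 - j] works at j = 0 (an edge the flattened formulas leave implicit):

```python
import math

# Quadrant table S[i] = [2^16 sin(2^-9 pi i)], padded to index 256.
S_TAB = [int(65536 * math.sin(math.pi * i / 512)) for i in range(257)]
# Corrective table C[k] = [pi k] for the six lost bits of angle.
C_TAB = [int(math.pi * k) for k in range(64)]

def sin_q(i):
    """16-bit sine of a first-quadrant angle i in 2^-16ths of a rotation."""
    j, k = i >> 6, i & 63
    return S_TAB[j] + (C_TAB[k] * S_TAB[256 - j] >> 15)
```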
Bit-Shifting Methods
The identity sin(a+b) = sin(a)cos(b) + cos(a)sin(b) allows an operand-based method (if we tabulate sin(2^i) and cos(2^i)) but this would be extremely costly, requiring two multiplications per non-zero operand bit.
The monotonicity of sin^-1(y) allows a result-based method, but only if sin^-1(y) is available.
Approximation Functions
The following tables give minimax polynomial approximations to sin(πx/2);
that is, we assume x = 1 corresponds to a right angle.
e1 is the maximal negative error in the approximation, and e2 the maximal
positive error. e3 and e4 give the maximal negative and positive errors if all
multiplications are only performed to four heximal places (sixteen-bit fractions).
Approximations to sin(πx/2) over [0,1].
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1(-ve) | e2 | e3 | e4(+ve) |
-.038CA7 | +1.D87297 | -.D1594A | -.038CA7 | +.038CA7 | -.038C28 | +.038E13 | ||||
-.005998 | +1.9C47AC | -.2CA6C2 | -.6FA0EA | -.005998 | +.005998 | -.00595D | +.005AF5 | |||
+.00070F | +1.90AF50 | +.0C0D72 | -.C915CE | +.2C4AEE | -.00070F | +.00070F | -.00070F | +.0009E9 | ||
+.000077 | +1.91FFC8 | +.0162BE | -.AAE54A | +.0980D0 | +.0E01F4 | -.000077 | +.000077 | -.000077 | +.0003FA | |
-.000007 | +1.922253 | -.002AE6 | -.A45511 | -.030FD3 | +.191CAC | -.03AF27 | -.000007 | +.000007 | -.000001 | +.000459 |
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
+.03D13F | +1.6A09E6 | -.03D13F | +.03D13F | -.03D13F | +.03D2BE | |||||
-.009760 | +1.A7DEC3 | -.76EEB9 | -.009760 | +.009760 | -.009730 | +.0098CA | ||||
-.00031C | +1.92D222 | -.060D8B | -.9705DA | -.00031C | +.00031C | -.00030C | +.0004B1 | |||
+.00004B | +1.92022C | +.01DD56 | -.AFF0E3 | +.18A0E6 | -.00004B | +.00004B | -.00004B | +.000209 | ||
+.000001 | +1.921F29 | +.000C12 | -.A5BD88 | +.0145CD | +.12B5CB | -.000001 | +.000001 | -.000001 | +.0001B8 | |
-.000000 | +1.921FC2 | -.0001B0 | -.A5495E | -.007687 | +.15C668 | -.020835 | -.000000 | +.000000 | 0 | +.0001DD |
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
+.730C08 | +.95F61A | -.090222 | +.090222 | -.090215 | +.0902C9 | |||||
-.2646B8 | +2.45AA3C | -1.1F23F8 | -.003F8C | +.003F8C | -.003F48 | +.004042 | ||||
-.0E8F7A | +1.E064E7 | -.934EDE | -.3E8E00 | -.000772 | +.000772 | -.0006E6 | +.00097A | |||
+.023D41 | +1.816DEE | +.31ECE6 | -.F10DB9 | +.3B7586 | -.00001F | +.00001F | 0 | +.000327 | ||
+.00AB6F | +1.8C8EBF | +.12D800 | -.C64B6C | +.1E795B | +.07BFE6 | -.000002 | +.000002 | 0 | +.0003D9 | |
-.0010BA | +1.92CAD4 | -.02ECBD | -.9E3EEA | -.0A7B74 | +.1DD4D2 | -.04E7D0 | -.000000 | +.000000 | 0 | +.0003DC |
Arctangent Function
Bit-Shifting Methods
The monotonicity of tan(y) makes result-based algorithms viable in the presence of a routine to evaluate tan.
Iterative Methods
The iteration xn+1 = (2xn - sin(2xn) + (1 + cos(2xn))·y)/2 will approach tan^-1(y) extremely rapidly. See Inverse Functions Via Iteration.
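This is Newton's method on tan(x) = y in disguise; a direct sketch (the iteration count of five is my choice):

```python
import math

def atan_iter(y, x0, iterations=5):
    """Iterate x_{n+1} = (2x_n - sin 2x_n + (1 + cos 2x_n) y)/2,
    which converges quadratically to atan(y) from a nearby start."""
    x = x0
    for _ in range(iterations):
        x = (2 * x - math.sin(2 * x) + (1 + math.cos(2 * x)) * y) / 2
    return x
```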
Approximation Functions
It is possible to find polynomials that approximate tan^-1(x) fairly well over x ∈ [0,1], or indeed over x ∈ [-1,1]. Unfortunately we need a polynomial of the form C1·x + C3·x^3 + C5·x^5 + C7·x^7 (requiring five multiplications) merely for an accuracy of ±&0.0006 (achieved by C1 = .9992150, C3 = -.3211819, C5 = .1462766, C7 = -.0389929).
The simplest and most efficient approach is to have two or more different
polynomials for different portions of the required domain, but alternatively
we can reduce the required domain to [0,√2-1] ≈ [0,&0.6A09] using the identity
tan^-1(x) = 2·tan^-1((√(1+x^2) - 1)/x)
which of course we need only use when x > √2-1.
Alternatively we use identities of the form
tan^-1(x) = a + tan^-1((x - tan(a))/(1 + x·tan(a)))
for various a ∈ (0,1).
Taking a = tan^-1(½) ≈ &0.76B2 and using
tan^-1(x) = tan^-1(½) + tan^-1((2x-1)/(2+x))
on any x > ½ will reduce the domain to [0,½].
If we then take a = tan^-1(¼) ≈ &0.3EB7 and use
tan^-1(x) = tan^-1(¼) + tan^-1((4x-1)/(4+x))
on any x > ¼ we will further reduce the domain to [0,¼].
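The two reductions compose; a sketch of mine (math.atan stands in for the trusted small-domain routine, so the checks exercise only the identities):

```python
import math

ATAN_HALF = math.atan(0.5)      # the text's &0.76B2
ATAN_QUARTER = math.atan(0.25)  # the text's &0.3EB7

def atan_reduced(x, base_atan):
    """tan^-1(x) for x in [0, 1], delegating only arguments in [0, 1/4]
    to base_atan."""
    total = 0.0
    if x > 0.5:
        total += ATAN_HALF
        x = (2 * x - 1) / (2 + x)      # now x <= 1/3
    if x > 0.25:
        total += ATAN_QUARTER
        x = (4 * x - 1) / (4 + x)      # now x <= 1/4
    return total + base_atan(x)
```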
While it is possible to continue this process to obtain an arbitrarily
small domain over which to use a simple approximation function (a linear
polynomial, for example) or a short look up table, it is probably more
computationally efficient to use a higher degree polynomial over a larger
range since, whereas each additional degree of a polynomial adds one
multiplication, each domain reduction may require one division.
The tables below give good approximation polynomials for tan^-1(x) over the ranges [-1,1], [0,1], [0,√2-1], [0,½], [½,1], [0,¼], [¼,½], [½,¾], and [¾,1]. Coefficients and errors are all given in heximal.
Approximations to tan^-1(x) over [-1,1].
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
0 | +1 | 0 | -1/3 | -.1E6530 | +.1E6530 | -.1E65DB | +.1E65DB | |||
0 | +1 | 0 | -1/3 | 0 | +1/5 | -.14CE03 | +.14CE03 | -.14CD25 | +.14CD25 | |
0 | +.F8EED5 | 0 | -.312382 | -.014488 | +.014488 | -.0144DB | +.0146DB | |||
0 | +.FECFC7 | 0 | -.49E799 | 0 | +.144F8F | -.0027E2 | +.0027E2 | -.0028CA | +.002A69 |
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
+.091A49 | +.C90FDB | -.091A49 | +.091A49 | -.091A49 | +.091B60 | |||||
-.00A0B5 | +1.10F8B3 | -.46A76D | -.00A0B5 | +.00A0B5 | -.00A067 | +.00A1F4 | ||||
-.004868 | +1.096170 | -.2F9CC5 | -.10B4D0 | -.004868 | +.004868 | -.004857 | +.00498E | |||
-.0006F6 | +1.00FB4D | -.04AE78 | -.568A15 | +.235B05 | -.0006F6 | +.0006F6 | -.0006D0 | +.000971 | ||
+.00015F | +0.FF8D86 | +.060E9D | -.737E60 | +.439E18 | -.0CAC00 | -.00015F | +.00015F | -.00015F | +.0004DA | |
+.00006B | +0.FFD785 | +.0279D9 | -.637579 | +.237551 | +.10D9B4 | -.0A1BE6 | -.00006B | +.00006B | -.00006B | +.000493 |
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
+.01B5FA | +.ED6338 | -.01B5FA | +.01B5FA | -.01B5FA | +.01B6D3 | |||||
-.003925 | +1.08E16D | -.35333F | -.003925 | +.003925 | -.0038FA | +.003A47 | ||||
-.000580 | +1.0147E9 | -.0B97C8 | -.386334 | -.000580 | +.000580 | -.000572 | +.0006B5 | |||
+.000041 | +.FFE25C | +.022CD9 | -.641216 | +.2B6F86 | -.000041 | +.000041 | -.000041 | +.000180 | ||
+.000010 | +.FFF6E6 | +.00D1F3 | -.5C29DE | +.18C5B2 | +.0F519B | -.000010 | +.000010 | -.000010 | +.000192 | |
+.000000 | +.FFFFF5 | -.0001C0 | -.5518BD | -.02860C | +.3FE832 | -.206E10 | -.000000 | +.000000 | -.000000 | +.000198 |
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
-.0243E7 | +1.18C9E2 | -.4D8472 | -.000E51 | +.000E51 | -.000DFD | +.000FC2 | ||||
-.07BC4A | +1.2FCF56 | -.6CD820 | +.0DD30D | -.0001E1 | +.0001E1 | -.000192 | +.0003AF | |||
-.03A604 | +1.188DAC | -.3C36BC | -.1E7BC7 | +.0EDAF8 | -.000042 | +.000042 | 0 | +.0003C2 | ||
-.004AC5 | +1.00C75F | +.0620D5 | -.79AA51 | +.4C9B9E | -.107EE6 | -.000003 | +.000003 | 0 | +.000378 | |
+.00B718 | +.F83E19 | +.23E7B1 | -.B065E7 | +.848836 | -.2E9F4F | +.06AFFA | -.000000 | +.000000 | 0 | +.00051A |
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
+.0106BD | +.F2B406 | -.0106BD | +.0106BD | -.0106BD | +.0107AB | |||||
-.0024B2 | +1.06ABCD | -.2E8926 | -.0024B2 | +.0024B2 | -.00249D | +.0025D7 | ||||
-.000268 | +1.00AAE7 | -.0731BF | -.40058E | -.000268 | +.000268 | -.000257 | +.0003AF | |||
+.000026 | +.FFEC79 | +.019FEC | -.61B5CC | +.285BA4 | -.000026 | +.000026 | -.000026 | +.00015B | ||
+.000005 | +.FFFC70 | +.0061C5 | -.591C11 | +.1031BB | +.17AC22 | -.000005 | +.000005 | -.000005 | +.000161 | |
+.000000 | +1.00001C | -.000519 | -.54FC8A | -.02F392 | +.40AD59 | -.20F2B8 | -.000000 | +.000000 | 0 | +.000166 |
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
+.003E90 | +.FADBB0 | -.003E90 | +.003E90 | -.003E90 | +.003F94 | |||||
-.00099D | +1.02C5A5 | -.1E7437 | -.00099D | +.00099D | -.000995 | +.000AA3 | ||||
-.00003A | +1.001A58 | -.01CD2D | -.4CB5D4 | -.00003A | +.00003A | -.000030 | +.000148 | |||
+.000005 | +.FFFC08 | +.00830D | -.5B4B77 | +.1CDD39 | -.000005 | +.000005 | -.000005 | +.00012C | ||
+.000000 | +.FFFFC6 | +.000A22 | -.55F79F | +.0460BD | +.27587F | -.000000 | +.000000 | -.000000 | +.000129 | |
+.000000 | +1.000004 | -.000121 | -.553915 | -.0153B5 | +.3B86E1 | -.1ACC2C | -.000000 | +.000000 | 0 | +.000129 |
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
+.074E6A | +.DFEAC1 | -.00922F | +.00922F | -.0091FF | +.009355 | |||||
-.026B6D | +1.16AEA2 | -.48D849 | -.000435 | +.000435 | -.00041C | +.000581 | ||||
-.00D40D | +1.091C8A | -.236D49 | -.216D4F | -.000060 | +.000060 | -.00003C | +.0001C2 | |||
+.00045C | +.FF8B21 | +.045CFE | -.698805 | +.30171E | -.000001 | +.000001 | 0 | +.000164 | ||
+.00113A | +.FED65A | +.084545 | -.742E60 | +.3E6582 | -.079517 | -.000000 | +.000000 | 0 | +.000173 | |
+.000315 | +.FFC6C9 | +.01B343 | -.5BF600 | +.0CC46B | +.2E07EF | -.17DD12 | -.000000 | +.000000 | 0 | +.000171 |
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
+.1B4083 | +.B82B84 | -.00A4A9 | +.00A4A9 | -.00A467 | +.00A59A | |||||
-.04377A | +1.1EF1EB | -.523CBC | -.0000B1 | +.0000B1 | -.00007C | +.00020B | ||||
-.056DBA | +1.24D1E0 | -.5BA27A | +.04F684 | -.000034 | +.000034 | 0 | +.0001F1 | |||
-.019D91 | +1.0BE19D | -.1EE41C | -.3C58F8 | +.1A2825 | -.000003 | +.000003 | 0 | +.0001FC | ||
+.004576 | +.FC78F8 | +.132F0E | -.8D4BD5 | +.5B40F6 | -.14D5C5 | -.000000 | +.000000 | 0 | +.000244 | |
+.00719F | +.FACA68 | +.19FD08 | -.9BEB52 | +.6CDBE0 | -.201827 | +.02FD16 | -.000000 | +.000000 | 0 | +.000242 |
x0 | x1 | x2 | x3 | x4 | x5 | x6 | e1 | e2 | e3 | e4 |
+.3851F9 | +.914D76 | -.008F94 | +.008F94 | -.008F3C | +.009082 | |||||
+.020481 | +1.0EC7EC | -.47BF10 | -.00027D | +.00027D | -.000197 | +.0004A6 | ||||
-.0B1716 | +1.3C2B04 | -.7BDC68 | +.13D848 | -.00000B | +.00000B | 0 | +.000273 | |||
-.07E796 | +1.2D6C09 | -.6259FE | +.004C9C | +.0598CC | -.000001 | +.000001 | 0 | +.0003AA | ||
-.0257DE | +1.0D4B72 | -.184E59 | -.54CF00 | +.3661EE | -.0B284E | -.000000 | +.000000 | 0 | +.000410 | |
+.0106A3 | +0.F5E41C | +.2B4CCC | -.BCC31E | +.901EC3 | -.3464A4 | +.07E150 | -.000000 | +.000000 | 0 | +.000450 |
The dynamic range of log_a(x) for x = 1,2,3,...,N is usually small for
reasonable N. To use logs efficiently it is often necessary to scale
them up.
For example, if a=2 and N=255, log_a(N) is less than
8, so storing ⌊log_2(x)⌋ only gives three bits of data and a corresponding fractional error of about
12%. If L(x) approximates 2^12·log_2(x), so that
L[255] ≈ 2^12 × 7.994 = &7FE7, 15 bits are used profitably.
We define L(x) = K·log_a(x) for x ∈ X and some convenient K.
A convenient representative portion of ℝ+ wrt L is an interval of the form
[1,a^t) for some positive (perhaps integral) t.
If x ∈ ℝ+ is expressed in the form x = a^m(a^q + r) where m is an integer
(the exponent) and r (the mantissa) ∈ [0, a^(t+q) − a^q)
(standard binary floating point format, for example, has a=2, t=1, q=−1) then,
writing s for r·a^(−q) ∈ [0, a^t − 1),
L(x) = L(a^m(a^q + r)) = L(a^(m+q)(1+s)) = K(m+q) + L(1+s).
A contraction function can thus be based on l11(a^m(a^q+r)) = 1 + r·a^(−q) and an
expansion function on l22(y, a^m(a^q+r)) = y + K(m+q).
We thus have:
X = ℝ+ ; X* = [1,a^t) ;
L(x) = K·log_a(x) ; l(x) = 0 , Kt(x−1)/(a^t−1) , or Kt·x·a^(−t)
l11(a^m(a^q+r)) = 1 + r·a^(−q)
l22(y, a^m(a^q+r)) = y + K(m+q).
Assuming that x = a^m(a^q+r), combining l11(x) = 1 + r·a^(−q) with l12(1+s) = ⌊N_L·s/(a^t−1)⌋
gives a contraction function l1(x) = ⌊N_L·r·a^(−q)/(a^t−1)⌋.
If we choose l(x)=0 and store L[i] = ⌊K·log_a(1 + i(a^t−1)/N_L)⌋ for i ∈ Z_N
(where K is chosen so that Kt is as close to (but not greater than) M_L) then,
setting
l2(y1,y2,x) = y1 + K(m+q),
we obtain
l2(L[l1(x)], y, x) = L[l1(x)] + K(m+q)
  = L[⌊N_L·r·a^(−q)/(a^t−1)⌋] + K(m+q)
  = K·log_a(1 + (N_L·r·a^(−q)/(a^t−1))·(a^t−1)/N_L) + K(m+q)
  = K·log_a(1 + r·a^(−q)) + K(m+q)
  = K·log_a((1 + r·a^(−q))·a^(m+q))
  = K·log_a(a^m(a^q+r))
  = K·log_a(x)
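As a concrete sketch of this scheme (an illustration only, assuming a=2, t=1, K=2^12 and N_L=256; the names `LUT` and `scaled_log2` are our own):

```python
import math

K = 1 << 12    # scale factor K = 2^12
NL = 256       # number of LUT entries N_L

# L[i] = floor(K * log_a(1 + i*(a^t - 1)/N_L)) with a = 2, t = 1
LUT = [math.floor(K * math.log2(1 + i / NL)) for i in range(NL)]

def scaled_log2(x):
    """Approximate K*log2(x) for integer x > 0."""
    m = x.bit_length() - 1        # exponent m: x = 2^m * (1 + s), s in [0,1)
    s = x / (1 << m) - 1.0        # mantissa fraction s = r*a^(-q)
    i = int(NL * s)               # contraction l1: quantise s to a LUT index
    return K * m + LUT[i]         # expansion l2: add back K*(m+q)
```

The sparseness error is bounded by roughly K·log_2(1 + 1/N_L), about 23 here.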
Any interval in ℝ is a representative portion of ℝ wrt A. We will
consider an interval [d,d+w) and suppose that a given x ∈ ℝ is readily
expressed in the form mS+r where m ∈ Z, r ∈ [d,d+w) and S is some constant
(typically d=0, w=S, so that m = x DIV S, r = x MOD S) and we have
A(x) = A(mS+r) = a^(mS/K)·A(r).
Thus a contraction function can be based on a11(mS+r) = r and an expansion
function on a22(y, mS+r) = a^(mS/K)·y, this latter only being really feasible if mS/K
is an integer.
Antilog generates the same non-uniform variation problem as log, but with the
low spread of values occurring at the beginning of the range. This is again
only a problem if a constant base function is used, and can be avoided even
then by using d=0, w=K, giving a dynamic range of [1,a).
The obvious constant base function for A over [d,d+w] is
a(x) = a^(d/K).
Typically we would take d=0, w=S=1 giving:
X = ℝ+ ; X* = [0,1)
A(x) = a^(x/K) ; a(x) = 1 , 1 + x(a^(1/K)−1) , or x·a^(1/K)
a11(z+x) = x (z ∈ Z, x ∈ [0,1))
a22(y, z+x) = a^(z/K)·y.
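A minimal antilog sketch under these conventions (assuming a=2 and a LUT of N_A=256 entries; `antilog2` and `ALUT` are illustrative names) splits x into integer and fractional parts:

```python
NA = 256   # number of LUT entries N_A

# ALUT[i] approximates 2^(i/N_A) over the representative portion [0,1)
ALUT = [2.0 ** (i / NA) for i in range(NA)]

def antilog2(x):
    """Approximate 2^x for real x >= 0."""
    z = int(x)                  # a11-style split: x = z + frac
    frac = x - z
    i = int(NA * frac)          # a12: quantise the fraction to a LUT index
    return (1 << z) * ALUT[i]   # a22: scale by 2^z
```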
Thinness Errors
The maximum deviation of A(x) = a^(x/K) from its linear approximation
g(x) = a^(d/K)(1 + (x−d)(a^(w/K)−1)/w), which overestimates
A(x) over (d,d+w), occurs at x0 = A′^(−1)((A(d+w)−A(d))/w).
Now A′(x) = ln(a)·a^(x/K)/K,
so A′^(−1)(y) = K·log_a(Ky/ln(a)).
Thus x0 = K·log_a(K·a^(d/K)(a^(w/K)−1)/(w·ln(a)))
  = d + K·log_a(K(a^(w/K)−1)/(w·ln(a)))
  = d + K·log_a(m),
writing m for K(a^(w/K)−1)/(w·ln(a)), and
A(x0) − g(x0) = a^(d/K)(m − 1 − K·log_a(m)(a^(w/K)−1)/w)
  = a^(d/K)(m − 1 − m·ln(a)·log_a(m))
  = a^(d/K)(m − 1 − m·ln(m)).
Sparseness Errors
Setting b = w/N_A and a12(r) = ⌊(r−d)/b⌋ will generate fractional sparseness
errors of approximately e = ln(a)·b/K. For a=2 this is approximately 0.69b/K.
One way to reduce sparseness errors is to note that A(x+d) = A(d)·A(x) and
use a corrective LUT of the form C(y) = A(y/b).
Alternatively we can use naive linear interpolation.
Since A″(x) = (ln(a))²·a^(x/K)/K²
has its greatest absolute value over [c,c+b] at x=c+b, where it equals
(ln(a)/K)²·A(c+b),
an estimate of the theoretical fractional sparseness error after
using naive linear interpolation is
e′ = (b·ln(a)/K)²/8 = e²/8.
For a=2 this is approximately 0.06(b/K)².
Linear interpolation, however, involves multiplication, so we can use L
and A again. Suppose c = d+yb for some y ∈ Z and x ∈ (c,c+b). Let z = (x−c)/b, which
we know lies inside (0,1). (If d=0 we have y = ⌊x/b⌋, z = (x/b)−y, c = yb.)
g(x) = A(c) + z(A(c+b)−A(c))
  = A(c) + z·A(c)(A(b)−1)
  = A(c) + a^(log_a(z) + c/K + log_a(A(b)−1))
  = A(c) + a^((K·log_a(z) + c + K·log_a(A(b)−1))/K)
  = A(c) + A(K·log_a(z) + c + K·log_a(A(b)−1))
Writing G for K·log_a(A(b)−1):
g(x) = A(c) + A(c + l21(L[l12(Rz)]) + G)
  ≈ a2(A[y], c) + a2(A[⌊y + l21(L[l12(Rz)])/b + G/b⌋], c + l21(L[l12(Rz)]) + G)
An example should make this clearer. With a=2, K=2^12, d=0, w=S=2^14 we have a11(x) = x MOD 2^14 and a22(y,x) = 2^(4(x DIV 2^14))·y. If we have N_A = 2^8 we need to complete our contraction function with a12(x) = ⌊2^(−6)·x⌋ and are then interested in interpolating for values of the form 2^6·i + j where j ∈ Z_(2^6−1). If we use a constant base function a(x) = 2^0 = 1 then, replacing c with 2^6·i, b with 2^6, and z with 2^(−6)·j (Rz = j), we have
g(2^6·i + j) = A(2^6·i) + A(2^6·i + 2^12·log_2(j) + G)
  = A(2^6·i) + A(2^6·i + L(j) + G)
Let us consider how approximating g in this manner affects our theoretical
sparseness error of e′ = e²/8.
A(c) will not be prone to sparseness errors since b divides c. However,
both it and the corrective term A(K·log_a(z) + c + K·log_a(A(b)−1)) are subject to
thinness errors in A, so we have effectively doubled the thinness errors. The
corrective term is bounded in magnitude by A(c+b)−A(c). But this is just the
original absolute sparseness error bound for A over [c,c+b) and equals e·A(c).
If the errors in L are small compared to the sparseness errors in A, we can
take all the errors in the corrective term as deriving from A, and we have a
maximal sparseness error of
e·(max corrective term) = e·(e·A(c)) = e²·A(c).
Our fractional sparseness error is then e²·A(c)/A(x) < e² since A(c) < A(x).
Adding this to the theoretical error e²/8 derived above gives a final
sparseness error in A of e′ = 9e²/8.
We have also doubled our thinness errors.
(The gradient of A is greatest at the end of the range, so our crude absolute
error bound is
A(d+w) − A(d+w−b) = a^((d+w)/K) − a^((d+w−b)/K) ≈ a^((d+w)/K)·(ln(a)·b/K).
We can similarly derive a maximal fractional error of ln(a)·b/K.)
Reciprocal Function
The function f(x) = 1/x is often tabulated to provide a way of doing
divisions using multiplications; x/y being evaluated as x×(1/y).
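For example (a sketch with an assumed 16-bit scaling; `R` and `approx_div` are illustrative names), dividing by a small integer y via a scaled reciprocal table:

```python
N = 256

# R[y] holds the scaled reciprocal floor(2^16 / y); R[0] is unused
R = [0] + [(1 << 16) // y for y in range(1, N)]

def approx_div(x, y):
    """Approximate x DIV y for 0 < y < 256 as x*(1/y) with one multiply."""
    return (x * R[y]) >> 16
```

Because the table entries are truncated, the result can undershoot the true quotient by one for large x; adding a rounding constant before the shift reduces this.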
Bit-Shifting Methods
Any of the bit-shifting division algorithms for P/Q can be used with P=1.
Gamma Function
The Gamma function is a generalisation of the integer factorial function to real x,
defined so that Γ(x+1) = xΓ(x). This enables us
to express any Γ(x) for x ≥ 0 as a multiple of Γ(x′) for x′ ∈ [½, 1½].
Γ(1) = 1 ensures Γ(k) = (k−1)! for positive integer k.
We also have
Γ(½) = π^½, providing
Γ(k+½) = 1×3×5×...×(2k−1)·2^(−k)·π^½.
Γ(x) can be extended over negative x via Γ(−x) = −π(x·sin(πx)·Γ(x))^(−1),
having sign (−1)^(k+1) between −k and −(k+1).
Γ(x) = ∫₀¹ (ln(t^(−1)))^(x−1) dt = ∫₀^∞ t^(x−1)·e^(−t) dt
provide alternative integration-based definitions of Γ(x) over x > 0.
Gauss's definition is Γ(x) ≡ lim_(n→∞) n!·n^(x−1)·(x(x+1)...(x+n−1))^(−1)
for noninteger real x.
Γ(−(k+d)) = (−1)^(k+1)·π·((k+d)·sin(πd)·Γ(k+d))^(−1)
is, for k > 3, small for most d ∈ (0,1),
ramping rapidly to ±∞ as d→0 or 1.
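The recurrence Γ(x+1) = xΓ(x) gives the usual range-reduction scheme: divide out factors until the argument lands in [½,1½], evaluate there, and multiply the factors back. A sketch (`gamma_reduced` is our name, and `math.gamma` stands in for whatever LUT or polynomial approximation would evaluate the reduced range):

```python
import math

def gamma_reduced(x):
    """Evaluate Gamma(x) for x >= 0.5 by reduction to [0.5, 1.5]."""
    factor = 1.0
    while x > 1.5:
        x -= 1.0
        factor *= x        # Gamma(x+1) = x*Gamma(x)
    # base evaluation over [0.5, 1.5]; a LUT or approximation would go here
    return factor * math.gamma(x)
```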
Dropoff functions have associated blendings.
If g satisfies g(1−t) = 1−g(t) over t ∈ [0,1] we say the blending is
½-symmetric, and we have g(½) = ½.
If g is ½-symmetric then g′(1−t) = g′(t). If g is both ½- and 0-symmetric we have
g(t−1) = 1−g(t)
; g′(t−1) = −g′(t)
; g″(t−1) = −g″(t).
Dropoff Functions
A dropoff function of order N is a function g:[−1,1]→[−1,1]
satisfying
g(0) = 1 , g(1) = 0 , g^(i)(0) = g^(i)(1) = 0
∀ 1 ≤ i ≤ N
where (i) denotes the ith derivative.
Though defined over [−1,1], we typically only require evaluation over [0,1].
If we have defined ƒ(x) = ƒ₀(x) for x ≤ a and ƒ(x) = h(x) for x ≥ b,
where a < b, then the blended function
ƒ(x) = g((x−a)/(b−a))·ƒ₀(x) + g((b−x)/(b−a))·h(x)
for a ≤ x ≤ b
ensures order-N continuity of ƒ over [a,b].
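A direct transcription of this blend (a sketch; `blend`, `f_lo` and `f_hi` are our names, and any dropoff g of sufficient order may be passed in):

```python
def blend(g, f_lo, f_hi, a, b, x):
    """Blend f_lo (used for x <= a) into f_hi (used for x >= b) over [a, b]."""
    if x <= a:
        return f_lo(x)
    if x >= b:
        return f_hi(x)
    t = (x - a) / (b - a)
    # g((x-a)/(b-a)) weights the low piece, g((b-x)/(b-a)) the high piece
    return g(t) * f_lo(x) + g(1 - t) * f_hi(x)
```

With the linear dropoff g(t) = 1−t this reduces to ordinary linear interpolation between the two pieces.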
If g satisfies g(−t) = g(t) over t ∈ [−1,1] we say the blending is
0-symmetric. 0-symmetric dropoff functions are also known as splines.
½-symmetrising
We can create a ½-symmetric dropoff function from any g(t) defined over [0,½]
with g(0) = 1 ; g(½) = ½ ; g′(0) = 0 ; g″(½) = 0
by defining g(t) = g(|t|) for t < 0 ;
g(t) = 1 − g(1−t) for ½ < t ≤ 1 ;
and g(t) = 0 for t > 1. We will refer to this as a ½-symmetrised dropoff function.
The g″(½) = 0 condition ensures continuity in g″ at ½, and we have
g′(t) = g′(1−t) and g″(t) = −g″(1−t) for t > ½.
There will be a kth order discontinuity at t = ½ for even k ≥ 4 if
g^(k)(½) ≠ 0.
Linear
g(t) = 1 − |t|
Linear blending is ½-symmetric of order 0. It gives first order discontinuities.
g′(t) = −1 ∀ t ∈ [0,1].
Cubic
g(t) = 1 − t²(3 − 2|t|)
Cubic blending is not ½-symmetric (although g(½) = ½)
and is only 0-symmetric if defined, as here, with a |t|. It has order 1,
and cubic blends are consequently often discontinuous in the second derivative.
g′(t) ≥ g′(½) = −3/2 ∀ t ∈ [0,1].
By ½-extending 1 − 3t² + 2t³ we obtain a ½-symmetric order 1
blend.
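In code the cubic dropoff is a single short expression (a sketch; the name `cubic_dropoff` is ours):

```python
def cubic_dropoff(t):
    """Order-1 cubic dropoff g(t) = 1 - t^2(3 - 2|t|) over [-1, 1]."""
    return 1.0 - t * t * (3.0 - 2.0 * abs(t))
```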
Quartic
g(t) = 1 − 8|t|³(1−|t|) over [−½,½], ½-symmetrised over [−1,1],
provides an order 2 ½-symmetric dropoff which can be used to construct blendings with third
order continuity (i.e. continuous second derivatives). There is a discontinuity in
g⁽⁴⁾(t) at t = ½.
g′(t) ≥ g′(½) = −2 ∀ t ∈ [0,1], so quartic dropoff is only a third
steeper than cubic dropoff at t = ½, though of course it requires one further multiply to evaluate.
|g′(t)| ≤ 2 and |g″(t)| ≤ 4.
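The ½-symmetrised quartic, written out branch by branch (a sketch; `quartic_dropoff` is our name):

```python
def quartic_dropoff(t):
    """Order-2 dropoff: 1 - 8|t|^3(1-|t|) on [0, 1/2], half-symmetrised."""
    t = abs(t)
    if t >= 1.0:
        return 0.0
    if t <= 0.5:
        return 1.0 - 8.0 * t ** 3 * (1.0 - t)
    return 8.0 * t * (1.0 - t) ** 3    # = 1 - g(1 - t)
```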
More generally, for integer k ≥ 3 we can ½-symmetrise the k-degree polynomial
g(t) = 1 − 2^(k−3)·t^(k−1)·(k − 2(k−2)t)
to obtain an order k−2 dropoff function with g′(½) = −k/2.
Squared Cosine
g(t) = ½(1 + cos(πt)) = cos²(πt/2)
Squared cosine blending is 0- and ½-symmetric and has order 1, but with
g^(i)(0) = g^(i)(1) = 0 for all odd i.
g′(t) = −½π·sin(πt), so
g′(t) ≥ g′(½) = −π/2 ∀ t ∈ [0,1].
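Squared-cosine dropoff as code (a sketch; `cos2_dropoff` is our name):

```python
import math

def cos2_dropoff(t):
    """0- and 1/2-symmetric order-1 dropoff (1 + cos(pi*t))/2 = cos^2(pi*t/2)."""
    return 0.5 * (1.0 + math.cos(math.pi * t))
```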
Bias
g_a(t) = Bias_a(1−t)
where Bias_a(t) ≡ t^(−log₂a).
a = ½ gives linear blending.
g_a(½) = a.
Bias blending has order 0 since g′(0) = log₂a ≠ 0.
Gain
g_b(t) = Gain_b(1−t)
where
Gain_b(t) ≡ Bias_(1−b)(2t)/2 if t ≤ ½ ;
1 − Bias_(1−b)(2−2t)/2 otherwise.
Gain blending is ½-symmetric. It has order Int((2b+1)/2).
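Bias and Gain as defined above translate directly into code (a sketch with our own function names; the corresponding dropoffs are Bias_a(1−t) and Gain_b(1−t)):

```python
import math

def bias(a, t):
    """Bias_a(t) = t^(-log2(a)); Bias_(1/2) is the identity."""
    return t ** (-math.log2(a))

def gain(b, t):
    """Gain_b(t): Bias_(1-b)(2t)/2 below 1/2, mirrored above."""
    if t <= 0.5:
        return bias(1.0 - b, 2.0 * t) / 2.0
    return 1.0 - bias(1.0 - b, 2.0 - 2.0 * t) / 2.0
```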
Slope1
Define Slope1_d(t), where d > 0, as
equal to
(d+1)/(1+t/d) − d = d(1−t)/(t+d)
for t ≤ 1, and 0 otherwise.
Slope blending has order 0 since
Slope1_d′(0) = −(d+1)/d ,
Slope1_d′(1) = −d/(d+1).
Of particular interest is
Slope1_(2^a−1)(t) = (2^a−1)(1−t)/(t+2^a−1),
which has gradients −2^a/(2^a−1)
and −(2^a−1)/2^a at t=0 and 1 respectively.
For large a these gradients are just below and just above −1.
As a→0 they tend to −∞ and 0.
For a=1 we have Slope1_1(t) = 2/(1+t) − 1,
with gradient −2 at t=0 and −½ at t=1.
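Slope1 in code (a sketch; `slope1` is our name, taking d as a parameter):

```python
def slope1(d, t):
    """Slope1_d(t) = d(1-t)/(t+d) for t <= 1, 0 otherwise (d > 0)."""
    return d * (1.0 - t) / (t + d) if t <= 1.0 else 0.0
```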
[ Insert graph image here ]
Sinc Function
g(t) = (kπt)^(−1)·sin(kπt) for integer k is a 0th order dropoff function because
g′(t) = t^(−1)(cos(kπt) − g(t)) has g′(0) = 0 but
g′(1) = (−1)^k. It has k zeroes in [0,1].
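Sinc dropoff as code (a sketch; `sinc_dropoff` is our name, with the removable singularity at t=0 handled explicitly):

```python
import math

def sinc_dropoff(k, t):
    """0th-order dropoff sin(k*pi*t)/(k*pi*t) for integer k; value 1 at t=0."""
    if t == 0.0:
        return 1.0
    return math.sin(k * math.pi * t) / (k * math.pi * t)
```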
Dropoff derivatives
| g(t) | g′(t) | g″(t) | g‴(t) | g′(½) |
Linear | 1−|t| | Sign(−t) | 0 | 0 | −1 |
Cubic | 1−3t²+2t³ | 6t(t−1) | 12t−6 | 12 | −3/2 |
Quartic | 1−8t³+8t⁴ for t≤½ ; 8t(1−t)³ for t≥½ | −8t²(4(1−t)−1) for t≤½ ; −8(1−t)²(4t−1) for t≥½ | −48t(1−2t) for t≤½ ; −48(1−t)(1−2t) for t≥½ | −48(1−4t) for t≤½ ; −48(4t−3) for t≥½ | −2 |
Polynomial | 1 − 2^(k−3)·x^(k−1)(k − 2(k−2)x) for x≤½ ; 2^(k−3)·(1−x)^(k−1)(4−k + 2(k−2)x) for x≥½ | −2^(k−3)·k·x^(k−2)((k−1) − 2(k−2)x) | | | −k/2 |
Squared Cosine | ½(1+cos(πt)) | −(π/2)sin(πt) | −(π²/2)cos(πt) = π²(½−g(t)) | (π³/2)sin(πt) | −π/2 |
Bias | (1−t)^(−log₂a) | (log₂a)(1−t)^(−(1+log₂a)) | (log₂a)(1+log₂a)(1−t)^(−(2+log₂a)) | (log₂a)(1+log₂a)(2+log₂a)(1−t)^(−(3+log₂a)) | 2a·log₂a |
Sinc | (kπt)^(−1)·sin(kπt) | t^(−1)(cos(kπt) − g(t)) | −t^(−1)(kπ·sin(kπt) + 2g′(t)) | | |