There are four basic ways to computationally evaluate functions:
(i) Using a Look Up Table, possibly with some form of interpolation (see
Look Up Tables).
(ii) Using a "bit-shifting" algorithm similar to those above for
multiplication and division that "builds" the result.
(iii) Iterative Methods using a sequence that approaches the desired value
(see Recurrence Relations).
(iv) Evaluating an Approximation Function (typically a polynomial) which
is close to the desired function over the required range.
Look Up Tables
Lookup tables (LUTs) provide fast but often imprecise evaluation of
functions. They can take up a lot of memory but may vastly improve
performance. Designing look up tables and the code to access them is an
exercise in balancing storage and speed against accuracy.
To discuss look up tables in general we will take as a simple example the
problem of evaluating sin(x), where x is measured in 256^{th}s of a rotation,
for any x ∈ Z_{255}.
We note first of all that if we know sin(x) over the range [0,64] then we
can readily calculate sin(x) for any x ∈ Z_{255}.
If we define a folding function s_{11}: Z_{255} → Z_{64} by
s_{11}(x) = x      if 0 ≤ x < 64
          = 128-x  if 64 ≤ x < 128
          = x-128  if 128 ≤ x < 192
          = 256-x  if 192 ≤ x < 256
and a sign-restoring function s_{22} by
s_{22}(y,x) = y   if 0 ≤ x < 128
            = -y  if 128 ≤ x < 256
then sin(x) = s_{22}(sin(s_{11}(x)),x) for any x ∈ Z_{255}. Suppose, then,
that we tabulate S[i] = [2^{8}sin(2i)] for i ∈ Z_{32} and evaluate
sin(x) ≈ s_{22}(2^{-8}S[[s_{11}(x)/2]],x).
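A minimal sketch of this folding scheme in Python (the 8-bit scale and round-to-nearest storage are assumptions; a real table would be 65 bytes for the quarter wave):

```python
import math

# Quarter-wave table: T[i] = round(2^8 * sin(i/256 of a turn)) for i = 0..64.
T = [round(256 * math.sin(2 * math.pi * i / 256)) for i in range(65)]

def fold(x):
    """s11: fold x in 256ths of a turn into the first quadrant 0..64."""
    x &= 255
    if x < 64:
        return x
    if x < 128:
        return 128 - x
    if x < 192:
        return x - 128
    return 256 - x

def lut_sin(x):
    """Approximate sin of (x/256 of a rotation) via the quarter-wave LUT."""
    y = T[fold(x)] / 256.0                  # s22 magnitude
    return y if (x & 255) < 128 else -y     # s22 sign
```

The error here is pure quantisation, about 2^{-9}, since every integral argument has its own entry.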
Just how accurate is this method of approximating sin(x)?
Suppose we want to calculate sin(5). Since we have only tabulated sin for
even arguments we will actually return sin(4) giving an absolute error of -0.02.
But this assumes that sin(4) is evaluated exactly and this will be subject to an
absolute error of 2^{-9} due to the limited width of the LUT entry.
This latter error is a thinness or quantisation error due to the limited accuracy of the
LUT entries. These are minimised by tabulating a function with as small a
variation as possible, scaled and translated to fully use the given LUT entry
width.
The former is a sparseness error and is due to the limited number of
LUT entries. These can be reduced by tabulating as small a segment of the
desired function as possible and by some form of interpolation on the
tabulated values.
Note however that the linear function s(x) = x/64 lies below and within
0.21052 of sin(x) over [0,64], and if we exploit this by tabulating
S[i] = [2^{10}(sin(2i)-i/32)] for i ∈ Z_{32} we can reduce
the absolute error in sin(2i) from 2^{-9} to 2^{-11}.
In general, we approximate a desired function F: X → Y via an N-entry LUT
by F(x) ≈ f_{2}(F[f_{1}(x)],F[h(f_{1}(x))],x)
where f_{1} is a contraction function mapping X → Z_{N};
f_{2} is an expansion function mapping Z_{M} × Z_{M} × X → Y;
F[i] denotes the i^{th} element of the table (assumed in Z_{M}).
(This is not truly general since it is possible to use more than two LUT
entries for a given evaluation, eg. fitting a quadratic to three tabled
points.)
In choosing the contraction function f_{1} we first look for a
representative portion of X with respect to F.
This is a subset X^{*} of X such
that if we know F over X^{*} we can readily extend to X
(in the sin example Z_{64} was a representative portion of Z_{255} with respect to sin).
f_{1} is then the composition of two maps:
f_{11}: X → X^{*} defined so that F(x) is easily derived from F(f_{11}(x)) ∀ x ∈ X;
f_{12}: X^{*} → Z_{N}.
The expansion function is also the composition of two maps:
f_{21} maps Z_{M} × Z_{M} × X to Y
and attempts to regain the accuracy lost due to f_{12};
f_{22} maps Y × X → Y and attempts to compensate for f_{11}.
The art of LUT definition lies in choosing expansion and contraction
functions that minimise the total error (arising from thinness and sparseness
errors) without requiring excessive calculation or memory.
Thinness Errors
To minimise the thinness errors we tabulate the deviation of F from a
(usually fairly crude) approximation function f: X → Y for F over X^{*}
rather than F itself, preferably scaled up to use the full width of the table entries.
The simplest choice of this base function f is a constant, obvious choices
being either the minimum or the median value taken by F over X^{*}.
We then tabulate
F[i] = [K(F(f_{1}^{-1}(i))-f(f_{1}^{-1}(i)))]    ∀ i ∈ Z_{N}
where K may be chosen so that the largest F[i] is as large as possible within Z_{M}.
Our sin example used base functions of s(x)=0 (giving an absolute error in
sin(2i) of 2^{-9}) and s(x)=x/64 (giving an absolute error in sin(2i) of 2^{-11});
the maximal deviation of sin(x) from x/64 is approximately 0.21051,
occurring at (switching to radians) x = cos^{-1}(2/π) ≈ 0.88.
Any base function used must be catered for in the first part of the
expansion function.
When s(x)=0 we can get away with s_{22}(y_{1},y_{2},x) = 2^{-8}y_{1} but we
must compensate for s(x)=x/64 with
s_{22}(y_{1},y_{2},x) = 2^{-10}y_{1}+2^{-6}s_{11}(x).
A suitable base function over X^{*}=[a,b] is often given by
g(x) = f(a)+(x-a)(f(b)-f(a))/(b-a). The maximal deviation of g from f then
occurs at x = f'^{-1}((f(b)-f(a))/(b-a)).
Sparseness Errors
These are usually a greater problem than thinness errors. To halve
thinness errors it suffices to add a single bit to each LUT entry. To halve
sparseness errors one must double the number of entries.
If X^{*} = [a,a+b] and we map it to Z_{N} by
f_{12}(x) = [N(x-a)/b] then the
sparseness errors are less than f(x+b/N)-f(x), where x is that point in [a,a+b]
at which f has maximal absolute gradient
- approximately bf'(x)/N for small b/N.
In the sin example our sparseness errors are high. b/N is 2 and x=0, so
our sparseness errors are potentially S(2)-S(0). Since we will only actually
want S(n) for integral n, our worst sparseness error will actually be when S(1)
evaluates to S(0)=0 instead of &0.0648.
We can reduce sparseness errors by incorporating linear interpolation
of the LUT values into f_{22}. To do this we need an approximator G(x) for the
gradient f'(x) at our tabulated point. The interpolated value from a base
evaluation f(a) is then given by g(a+x) = f(a) + xG(a). G(a) can be obtained
from a separate LUT for the derivative function, or obtained from the LUT of
the function itself. One approximator for f'(a) is given by
G(a) = (f(a+b)-f(a))/b, where b is the smallest value such that f(a+b) may be
evaluated from the table distinct from f(a). This particular stratagem will be
referred to here as naive linear interpolation.
Naive linear interpolation is very easy for s_{12}(x)=[x/2], s(x)=0. We
simply set
s_{22}(y_{1},y_{2},x) = 2^{-8}y_{1}        if s_{11}(x) is even
                      = 2^{-9}(y_{1}+y_{2}) if s_{11}(x) is odd
The error this introduces is
f(a+x)-g(a+x) ≈ f(a)+xf'(a)+x^{2}f''(a)/2+o(x^{3}) - f(a) - x(f'(a)+bf''(a)/2+o(b^{2}))
              = f''(a)x(x-b)/2 + o(x^{3},b^{3}).
In our example the error involved in calculating sin(2i+1) from sin(2i)
and sin(2i+2) is (2π/256)^{2}sin(2i)/2 ≤ 2^{-15}π^{2} < &0.001A.
In summary, if our sin LUT is set up as S[i] = [2^{10}(sin(2i)-i/32)] for i ∈ Z_{32}
and used by
S(x) = h(x)(2^{-10}S[j] + 2^{-6}s_{11}(x))            if s_{11}(x) is even
     = h(x)(2^{-11}(S[j]+S[j+1]) + 2^{-6}s_{11}(x))   if s_{11}(x) is odd
where j = [s_{11}(x)/2], s_{11} is the folding function defined earlier, and
h(x) is +1 for 0 ≤ x < 128 and -1 for 128 ≤ x < 256.
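The whole deviation-plus-interpolation scheme can be sketched in a few lines of Python (the 33rd table entry and round-to-nearest storage are assumptions):

```python
import math

def dev(i):
    """Deviation of sin (argument in 256ths of a turn) from the chord x/64, at x = 2i."""
    return math.sin(2 * math.pi * (2 * i) / 256) - (2 * i) / 64

# S[i] = round(2^10 * deviation); 2^10 * 0.21052 < 216, so each entry fits a byte.
S = [round(1024 * dev(i)) for i in range(33)]

def fold(x):
    x &= 255
    if x < 64:
        return x
    if x < 128:
        return 128 - x
    if x < 192:
        return x - 128
    return 256 - x

def lut_sin(x):
    j = fold(x)                          # folded angle, 0..64
    sign = 1 if (x & 255) < 128 else -1  # h(x)
    if j % 2 == 0:
        d = S[j // 2] / 1024.0           # tabulated deviation
    else:                                # naive linear interpolation of the deviation
        d = (S[(j - 1) // 2] + S[(j + 1) // 2]) / 2048.0
    return sign * (d + j / 64.0)         # add the chord back in
```

Since the chord is linear, interpolating the deviation and adding the chord is the same as interpolating sin itself, so the total error stays near 2^{-11}.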
Suppose now we can have 65 bytes in the LUT. Should we double the width
of each entry or the number of entries?
The former eliminates all the thinness errors because the width of the
LUT entry equals the required result width. Because of this, another advantage
is that s(x)=0 is just as good as s(x)=x/64.
The latter eliminates all the sparseness errors because the length of
the table then equals the length of X^{*}.
But if we are using linear interpolation the sparseness errors are small
compared to the thinness ones so it is better to "widen" the LUT than
"lengthen" it in this as in many other cases.
Note that this is not the optimal way to code sine LUTs. We can do better
than the G described above by exploiting the fact that the derivative of sine
is cosine. This is described in Sine and Cosine Functions below.
Look up table methods are considered in depth later on. We will here consider how the other three methods may be used to evaluate some of the more common functions needed by games programmers.
Bit-Shifting Methods
"And let your speach be Yea, yea; Nay, nay; and whatsoever is more is of the evil one"
--- Matthew 5:37
We will use the notation x_{{i}} to denote the value (0 or 1) of the bit of
the operand x corresponding to 2^{i}.
Thus x = Σ_{i=-∞}^{+∞} x_{{i}}2^{i}.
There are two basic sorts of bit shifting algorithm to evaluate y=f(x).
(a) Operand-Based
We build up y working from the bits of x. This approach requires an
identity of the form f(a+b)=g(a,b,f(a),f(b)). A typical algorithm would be:
(i) Set y_{0}=f(0); i=<lowest bit position for x>; s=0.
(ii) If x_{{i}}=1 then y_{i+1} = g(s,2^{i},y_{i},f(2^{i})), s=s+2^{i}; else y_{i+1} = y_{i}.
(iii) i=i+1. Repeat (ii) unless x_{{j}}=0 ∀ j ≥ i.
If g is of the form g(f(a),f(b)) then we do not need to maintain s. This approach usually requires a small lookup table for f(2^{i}).
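As a concrete illustration (the choice of f is mine, not the text's): take f(x) = 2^{x} for fractional x. The identity f(a+b) = f(a)f(b) means g needs only f(a) and f(b), so no running sum s is required, and the small table holds f(2^{-i}) = 2^{2^{-i}}:

```python
# Table of f(2^-i) = 2^(2^-i) for the 16 fractional bit positions.
POW = [2.0 ** (2.0 ** -i) for i in range(1, 17)]

def exp2_frac(x):
    """2**x for x in [0,1) held to 16 fractional bits, built bit by bit."""
    n = round(x * 65536)              # the 16-bit fixed-point operand
    y = 1.0                           # f(0)
    for i in range(16):               # examine bit 2^-(i+1) of the operand
        if (n >> (15 - i)) & 1:
            y *= POW[i]               # f(s + 2^-(i+1)) = f(s) * f(2^-(i+1))
    return y
```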
(b) Result-Based
We build up the bits of y exploiting the monotonicity of f^{-1}.
If f^{-1} is increasing we would use:
(i) Set y_{0}=0; n=0; N=<maximal possible bit position for y>.
(ii) If f^{-1}(y_{n}+2^{N-n}) ≤ x then y_{n+1} = y_{n}+2^{N-n} else y_{n+1} = y_{n}.
(iii) n=n+1. Repeat (ii) unless N-n < <lowest desired bit position for y>.
This method is equivalent to a binary search for the value of y solving
f^{-1}(y)=x.
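A sketch of the result-based pattern for the cube root (an illustrative choice; f^{-1}(y) = y^{3} is increasing, so underestimates never need undoing):

```python
def cbrt_frac(x, bits=16):
    """Result-based bit-shifting cube root: builds y approximating x**(1/3)
    from below for x in [0,1), one result bit per trial."""
    y = 0.0
    step = 0.5                     # bit weights 2^-1 down to 2^-bits
    for _ in range(bits):
        if (y + step) ** 3 <= x:   # compare f_inverse(candidate) with x
            y += step
        step /= 2
    return y
```

Because each accepted bit keeps y^{3} ≤ x, the final y is the truncation of the true root to the requested number of bits.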
Iterative Methods
We iterate some sequence or sequences a number of times, where one
sequence tends to the desired result. See the chapter on Recurrence Relations.
Approximation Functions
We evaluate a different function, typically a polynomial, that
approximates the desired function over the desired domain. We may use
different approximation functions over different parts of the domain.
For any function p(x) which is meant to approximate f(x) over a given range [a,b] we can construct an error function e(x)=p(x)-f(x) which we would like to be as small as possible over [a,b].
If p is constrained to be a particular type of function (say, a polynomial of degree n) then we want the function of that type that "minimises" |e(x)| over [a,b] (rather than naively taking the first n terms of the Taylor series, for example). Unfortunately it is not immediately clear what it means to "minimise" a function over a range. The two most common criteria for error minimisation are minimising E_{∞} = Max_{x∈[a,b]}|e(x)| (the minimax approximator) and minimising E_{2} = ∫_{a}^{b}e(x)^{2} dx (the least squares approximator). The former gives the smallest maximal deviation of p from f, while the latter minimises the "overall" deviation.
We will show below how to find the minimax and least squares polynomials of
degree n approximating a function f over [a,b].
Least Squares Polynomials
If we write c_{i} for the coefficient of x^{i} in our polynomial p(x) then for p to
be a least squares approximator for f we require
dE_{2}/dc_{i} = 0    i ∈ Z_{n+1}
⇔ d/dc_{i}[ ∫_{a}^{b}(p(x)-f(x))^{2} dx ] = 0    i ∈ Z_{n+1}
⇔ ∫_{a}^{b} d/dc_{i}[(Σ_{j=0..n}c_{j}x^{j} - f(x))^{2}] dx = 0    i ∈ Z_{n+1}
⇔ ∫_{a}^{b} 2(Σ_{j=0..n}c_{j}x^{j} - f(x))x^{i} dx = 0    i ∈ Z_{n+1}
⇔ Σ_{j=0..n} c_{j} ∫_{a}^{b} x^{i+j} dx = ∫_{a}^{b} x^{i}f(x) dx    i ∈ Z_{n+1}
⇔ Σ_{j=0..n} c_{j}(b^{i+j+1} - a^{i+j+1})/(i+j+1) = ∫_{a}^{b} x^{i}f(x) dx    i ∈ Z_{n+1}
⇔ Ac = y
where A_{ij} = (b^{i+j+1}-a^{i+j+1})/(i+j+1), c is the vector of coefficients
and y_{i} = ∫_{a}^{b} x^{i}f(x) dx; the coefficients then follow by Gaussian
elimination.
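A sketch of the normal-equations solve (numerical integration of the moments by Simpson's rule is an assumption; a real implementation might have closed forms for ∫x^{i}f(x)dx):

```python
def least_squares_poly(f, a, b, n, samples=2001):
    """Solve A c = y with A[i][j] = (b^(i+j+1)-a^(i+j+1))/(i+j+1) and
    y[i] = integral of x^i f(x) over [a,b], estimated by Simpson's rule."""
    def simpson(g):
        h = (b - a) / (samples - 1)          # samples must be odd
        s = g(a) + g(b)
        for k in range(1, samples - 1):
            s += (4 if k % 2 else 2) * g(a + k * h)
        return s * h / 3
    A = [[(b ** (i + j + 1) - a ** (i + j + 1)) / (i + j + 1) for j in range(n + 1)]
         for i in range(n + 1)]
    y = [simpson(lambda x, i=i: x ** i * f(x)) for i in range(n + 1)]
    # Gaussian elimination with partial pivoting.
    for col in range(n + 1):
        piv = max(range(col, n + 1), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        y[col], y[piv] = y[piv], y[col]
        for r in range(col + 1, n + 1):
            m = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= m * A[col][c]
            y[r] -= m * y[col]
    c = [0.0] * (n + 1)
    for i in range(n, -1, -1):
        c[i] = (y[i] - sum(A[i][j] * c[j] for j in range(i + 1, n + 1))) / A[i][i]
    return c          # coefficients of c[0] + c[1]x + ... + c[n]x^n
```

Fitting a function that already is a polynomial of degree n should recover it exactly, which makes a convenient check.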
Minimax Polynomials
The basic principle is that the error function of a minimax polynomial p
of degree n approximating a function f over [a,b] will achieve its greatest
magnitude h at exactly n+2 different values of x in the range [a,b] and,
further, that if these values are {h_{0},h_{1},...,h_{n+1}} where
a ≤ h_{0} < h_{1} < ... < h_{n+1} ≤ b then
e(h_{i}) = (-1)^{i}g    i=0,1,..,n+1, where |g|=h.
(The errors alternate in sign.)
Suppose we wanted to fit a polynomial of degree n to the n+2 points
(x_{i},f(x_{i})), i=0,1,..,n+1.
Any polynomial that minimises Max_{i=0..n+1}|p(x_{i})-f(x_{i})|
satisfies p(x_{i})-f(x_{i}) = (-1)^{i}e, i=0,1,..,n+1,
for some e, and, conversely, any
polynomial of degree n that satisfies this for some e does minimise
Max_{i=0..n+1}|p(x_{i})-f(x_{i})|.
We must also have |e| ≤ h, since if |e| > h then the
minimax polynomial over [a,b] would give a smaller
Max_{i=0..n+1}|p(x_{i})-f(x_{i})|.
If we happened to know what the values h_{0},h_{1},...,h_{n+1} were then we could solve
the n+2 linear equations
p(h_{i})-f(h_{i}) = (-1)^{i}g    i=0,1,..,n+1
to find g and, more importantly, the n+1 coefficients of the minimax
polynomial p.
Since we do not know these values we choose any set
S_{0} of n+2 values from [a,b]:
S_{0} = {x_{i} : i=0,1,..,n+1}
where
a ≤ x_{0} < x_{1} < ... < x_{n+1} ≤ b.
x_{i} = a+i(b-a)/(n+1) would do but,
for reasons that need not concern us here, the Chebyshev points
x_{i} = (a+b)/2 - (b-a)cos(iπ/(n+1))/2
are a more appropriate choice.
At stage M we find the minimax polynomial p_{M} for f over S_{M} by solving the
n+2 linear equations
p_{M}(x_{i})-f(x_{i}) = (-1)^{i}e_{M}    i=0,1,..,n+1
by Gaussian elimination to find the n+1 coefficients of p_{M} and the unknown
deviation e_{M}.
We then find one or more values {x_{j}: j=0,1,..,k} (where k ≥ 0) that maximise
|p_{M}(x)-f(x)| over [a,b].
If these points equal some of the points in S_{M} then we
have |e_{M}|=h and p_{M} is minimax over [a,b].
Otherwise we replace one or more of
the values in S_{M} with these new values such that the new set S_{M+1} still
satisfies a ≤ x_{0} < x_{1} < ... < x_{n+1} ≤ b
and that x_{i} is only replaced by x_{j} if
p_{M}(x_{i})-f(x_{i}) and p_{M}(x_{j})-f(x_{j})
have the same sign.
Having obtained S_{M+1} we then find p_{M+1}, use that to obtain S_{M+2}, and so on.
Each stage will bring our set of values closer to the "worst possible" set
of values {h_{0},h_{1},...,h_{n+1}} and at every stage we will have
|e_{M+1}| ≥ |e_{M}|.
The algorithm terminates at stage N when
|e_{N+1}| - |e_{N}| < δ for some small δ, since we
will then have
Max_{i=0..n+1}|p_{N}(x_{i})-f(x_{i})| < h+δ.
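A sketch of this exchange for the simplest interesting case, degree n=1 (the target function e^{x} over [0,1], the dense search grid, and the stopping tolerance are illustrative assumptions):

```python
def remez_linear(f, a, b, iters=10, grid=4001):
    """Exchange algorithm for n=1: solve the n+2 = 3 equations
    p(x_i) - f(x_i) = (-1)^i * e on the reference set, then swap in the
    point of worst error (preserving the sign pattern) and re-solve."""
    xs = [a, (a + b) / 2, b]                 # initial reference set S_0
    for _ in range(iters):
        # Solve  c0 + c1*x_i - (-1)^i * e = f(x_i)  by Cramer's rule.
        rows = [[1.0, x, -((-1.0) ** i)] for i, x in enumerate(xs)]
        rhs = [f(x) for x in xs]
        def det3(m):
            return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                  - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                  + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
        D = det3(rows)
        c0, c1, e = (det3([r[:c] + [rhs[i]] + r[c + 1:]
                           for i, r in enumerate(rows)]) / D for c in range(3))
        err = lambda t: c0 + c1 * t - f(t)
        pts = [a + (b - a) * k / (grid - 1) for k in range(grid)]
        worst = max(pts, key=lambda t: abs(err(t)))
        if abs(abs(err(worst)) - abs(e)) < 1e-12:
            break                            # |e_M| = h: minimax found
        # Replace the same-signed reference point nearest the new extremum.
        s = err(worst)
        cands = [i for i in range(3) if ((-1.0) ** i) * e * s > 0]
        if not cands:
            break
        i = min(cands, key=lambda i: abs(xs[i] - worst))
        xs[i] = worst
        xs.sort()
    return c0, c1, abs(e)
```

For e^{x} on [0,1] the minimax line is classical: slope e-1 and maximal deviation about 0.10593, reached after a single exchange of the interior point.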
Minimax Rational Functions
If we allow a division in our approximation function then we have the full
range of rational functions available to us. A rational function over [a,b] is
one of the form r(x) = p(x)/q(x) where p(x) is a polynomial of degree m and q(x)
is a polynomial of degree n satisfying q(x) > 0 for x ∈ [a,b].
The exchange algorithm described above may be used to find minimax rational approximation functions using sets of m+n+2 points, although it is no longer the case that at stage M we can find the minimax rational function r_{M} for f over S_{M} by solving a set of linear equations; instead we are faced with:
p_{M}(x_{i}) = [f(x_{i})-(-1)^{i}e_{M}]q_{M}(x_{i})    i=0,1,..,m+n+1.
Though it is possible to solve these directly using eigenvalues (see Approximation Theory and Methods) we instead adopt the tools of linear programming such as the Primal Simplex Algorithm.
We choose an estimate h for the deviation e_{M}
and look for a rational function r_{M} satisfying
|f(x_{i})-r_{M}(x_{i})| ≤ h for i=0,1,..,m+n+1. That is, we look for
polynomials p and q satisfying
p(x_{i}) - f(x_{i})q(x_{i}) ≤ hq(x_{i})    i=0,1,..,m+n+1
f(x_{i})q(x_{i}) - p(x_{i}) ≤ hq(x_{i})    i=0,1,..,m+n+1
q(x_{i}) > 0                               i=0,1,..,m+n+1.
If we can find such p and q then we know that h ≥ e_{M}. If such p and q can be
shown not to exist then we know that h < e_{M}.
This enables us to find e_{M} to any
required accuracy (using a binary search technique) and thus p_{M} and q_{M}.
Finding coefficients of p and q to satisfy these inequalities is a linear
programming problem. However we can achieve better performance in terms of h's
approach to e_{M} by using the differential correction algorithm.
We start stage M
with h=e_{M-1} and look for polynomials p and q and
scalar d ≥ 0 satisfying:
p(x_{i}) - f(x_{i})q(x_{i}) ≤ hq(x_{i}) - dq_{M-1}(x_{i})    i=0,1,..,m+n+1
f(x_{i})q(x_{i}) - p(x_{i}) ≤ hq(x_{i}) - dq_{M-1}(x_{i})    i=0,1,..,m+n+1
q(x_{k}) = 1
for a fixed k ∈ {0,1,..,m+n+1}.
If such can be found with d=0 then p=p_{M}, q=q_{M}, h=e_{M}.
Otherwise d>0 and we
replace h with Max_{i=0,..,m+n+1}|f(x_{i})-p(x_{i})/q(x_{i})| and repeat; h will tend to e_{M} from
above.
For stage 1 we define e_{0} to be anything known to be larger than the optimal
minimax rational approximation error over [a,b], and q_{0}(x) to be identically
equal to 1.
It only remains to reformulate the above inequalities into the standard
form usual for linear programming techniques. An appropriate reformulation is
obtained by writing the coefficient p_{r} of x^{r} in p as
p_{r0}-p_{r1} (r=0,1,..,m), the
coefficient q_{s} of x^{s} in q as
q_{s0}-q_{s1} (s=0,1,..,n), and also introducing slack
variables y_{i} and z_{i} for i=0,1,..,m+n+1. We then have the linear programming
problem:
Minimise Σ_{j=0}^{m+n+1} (y_{j}+z_{j})
subject to the constraints
Σ_{r=0}^{m} (p_{r0}-p_{r1})x_{i}^{r} - (f(x_{i})+h)[Σ_{s=0}^{n} (q_{s0}-q_{s1})x_{i}^{s}] + y_{i} = -dq_{M-1}(x_{i})    i=0,1,..,m+n+1
-Σ_{r=0}^{m} (p_{r0}-p_{r1})x_{i}^{r} + (f(x_{i})-h)[Σ_{s=0}^{n} (q_{s0}-q_{s1})x_{i}^{s}] + z_{i} = -dq_{M-1}(x_{i})    i=0,1,..,m+n+1
Σ_{s=0}^{n} (q_{s0}-q_{s1})x_{k}^{s} = 1
over p_{r0},p_{r1},q_{s0},q_{s1},y_{i},z_{i},d ≥ 0.
Basic Math Functions
Square Root function
Square roots are often required, occasionally in the form √(1+x).
In the presence of log and antilog functions one can exploit the trivial result
√x = antilog(log(x)/2).
Bit-shifting Methods
It is reasonably straightforward to evaluate square roots explicitly in time proportional to the accuracy required.
In the following discussion we will assume that the problem is to evaluate, to N bit accuracy, the square root of a fractional operand x. We assume x to be a positive value strictly less than one. This means that the result y=√x will be of the same order no matter how wide x is. If x is four bytes wide we will regard it as being in (2^{-32})ths; if it is two bytes wide we will regard it as being in (2^{-16})ths. In both cases, if N=8 the result will be in (2^{-8})ths.
This assumption is not limiting. If we wish x to represent, say, a 32-bit integer then y=Ö(x/2^{32}) need only be multiplied by 2^{16} to give the correct result. Either algorithm with N=16 will return y in (2^{-16})ths which is effectively the same thing as 2^{16}y.
Since √(a+b) is not easily expressible in terms of a and b we do not use an
operand-based bit-shifting algorithm. However since f^{-1}(y)=y^{2} is monotonically
increasing we can use a result-based one.
Suppose at the n^{th} stage the top n bits of y_{n} match those of y=√x
but the rest of y_{n} is zero (ie. y_{n} = Σ_{i=-n}^{-1} y_{{i}}2^{i}),
so that E_{n} = x-y_{n}^{2} is a non-negative error value for y_{n}^{2}
as an underapproximator of x. Then
y_{{-(n+1)}} = 1
⇔ (y_{n}+2^{-(n+1)})^{2} ≤ x
⇔ y_{n}^{2} + 2^{-n}y_{n} + 2^{-2(n+1)} ≤ x
⇔ 2^{-n}y_{n} + 2^{-2(n+1)} ≤ E_{n}, with the comparison one of
positive (unsigned) integers.
An algorithm exploiting this is
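The listing itself does not appear here, so what follows is a sketch (the 32-bit operand and 16-bit result widths are assumptions; everything is scaled by 2^{32} so that the test really is an unsigned integer comparison):

```python
def sqrt_frac(X, N=16):
    """Result-based square root: X is a fraction in 2^-32 ths (0 <= X < 2^32);
    returns Y, the root in 2^-16 ths, one result bit per step.
    Maintains E_n = x - y_n^2 and tests 2^-n*y_n + 2^-2(n+1) <= E_n."""
    Y = 0          # y_n in 2^-16 ths
    E = X          # (x - y_n^2) in 2^-32 ths
    for n in range(N):
        trial = (Y << (16 - n)) + (1 << (30 - 2 * n))   # 2^32*(2^-n*y + 2^-2(n+1))
        if trial <= E:
            E -= trial
            Y += 1 << (15 - n)     # set bit 2^-(n+1) of y
    return Y
```

Since the loop keeps Y as the largest 16-bit value with (Y·2^{-16})^{2} ≤ X·2^{-32}, i.e. Y^{2} ≤ X, the result is exactly the integer square root of X.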
An alternative approach is to operate on the bits of x in pairs by
shifting them into a comparison field. Each of these double shifts gives one
bit of result data.
At each stage we have y_{n} containing an integral underestimate of the
square root of 4^{n}x. We maintain an error value E_{n} = 4^{n}x-y_{n}^{2}.
How is E_{n+1} related to E_{n}?
If y_{n+1} = 2y_{n} then E_{n+1} = 4^{n+1}x-4y_{n}^{2} = 4E_{n}.
If y_{n+1} = 2y_{n}+1 then E_{n+1} = 4^{n+1}x-4y_{n}^{2}-4y_{n}-1 = 4E_{n}-4y_{n}-1.
So we set y_{n+1} = 2y_{n}+1 ⇔ 4E_{n} ≥ 4y_{n}+1,
ie. ⇔ E_{n} ≥ y_{n}+1/4.
The algorithm is therefore:
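The listing is missing here; a sketch for a 32-bit integer operand (the width is an assumption):

```python
def isqrt32(x):
    """Pair-shift square root: feed the bits of x into the error term E two
    at a time; each double shift yields one bit of the root."""
    y = 0
    E = 0                                    # E_n = (top 2n bits of x) - y_n^2
    for shift in range(30, -1, -2):
        E = (E << 2) | ((x >> shift) & 3)    # E -> 4E + next bit pair
        if E >= 4 * y + 1:                   # the test E_n >= y_n + 1/4, scaled by 4
            E -= 4 * y + 1
            y = 2 * y + 1                    # y_{n+1} = 2y_n + 1
        else:
            y = 2 * y                        # y_{n+1} = 2y_n
    return y
```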
Iterative Methods
It is possible to obtain square roots using an iterative method involving
divisions. Consider the iteration x_{n+1} = (x_{n}+y/x_{n})/2.
If e_{n} = x_{n}-√y then we have
e_{n+1} = e_{n}^{2}/2x_{n}.
So if x_{n} > √y we must have 0 < e_{n+1} < e_{n}^{2}/2√y;
writing g_{n} = e_{n}/√y for the
fractional error in x_{n}, we have
g_{n+1} ≤ g_{n}^{2}/2 ⇒ g_{n} ≤ g_{0}^{2^n}/2^{2^n-1}.
Thus provided x_{0} differs from √y by less than 141%
(g_{0} < √2)
then x_{n} → √y (from above) extremely rapidly.
If, for example, x_{0} is within 6% of √y so that the
most significant four bits of x_{0} are accurate, then x_{1} will be
accurate to nine bits, x_{2} to nineteen bits, x_{3} to thirty nine bits, x_{4} to seventy nine bits, and
so on (though not beyond the accuracy to which the divisions are evaluated).
Since this iteration converges to the square root so fast, we rarely need
to do more than three iterations and two will often suffice if we start from a
reasonable estimate for √y. If √y is close to one (eg. when normalising an
approximately normal vector), then we set x_{0}=1 and get our first iteration
x_{1}=(1+y)/2 almost for free. To find the exact magnitude of a vector whose
magnitude is not necessarily close to unity we can sum the squares of the
components and then use a distance measure (see Distance Measures) to start the
iteration.
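The iteration is a one-liner in practice (the starting estimates below are chosen to be within 6%, per the discussion above):

```python
def newton_sqrt(y, x0, iters=3):
    """Three passes of x_{n+1} = (x_n + y/x_n)/2."""
    x = x0
    for _ in range(iters):
        x = (x + y / x) / 2
    return x
```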
Similar iterations exist to find cube and higher roots. See Inverse Functions
Via Iteration.
Approximation Functions
We can reduce the domain from [0,1] to [0,½] by dividing
any x > ½ by four and then doubling its square root.
Indeed, we can reduce to [0,4^{-m}] by dividing by 4^{m}
and then multiplying the square root by 2^{m}, although we lose
the least-significant 2m input bits by doing so.
[Include some rational poly fits here ]
Power Function
Square roots are the b=½ special case of the general power function x^{b}.
In the presence of log and antilog functions one can exploit the trivial result
x^{b} = antilog(b log(x)).
Of particular interest is x^{b} for x ∈ [0,1] and a "large" positive b (around twelve), eg.
the (cos θ)^{b} arising in Phong Shading. The result is then also in [0,1].
For known integer b>0, x^{b} can be evaluated in fewer than b multiplications in a "hardwired" manner.
x^{18}, for example, requires five multiplications as ((((x^{2})^{2})^{2})^{2})(x^{2}) [
(x^{2}) being evaluated once and used twice ].
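Spelled out (this grouping is one of several equally cheap chains for 18):

```python
def pow18(x):
    """x^18 in five multiplications; x2 is built once and used twice."""
    x2 = x * x             # 1: x^2
    x4 = x2 * x2           # 2: x^4
    x8 = x4 * x4           # 3: x^8
    x16 = x8 * x8          # 4: x^16
    return x16 * x2        # 5: x^18
```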
Differentiating power functions
If ' denotes ∂/∂x then we have (a^{x})' = ln(a)a^{x} and (x^{a})' = ax^{a-1} for a ≠ 0.
If f(x) = g(x)^{h(x)} then
δ^{-1}(f(x+δ)-f(x)) = δ^{-1}(g(x+δ)^{h(x+δ)} - g(x+δ)^{h(x)} + g(x+δ)^{h(x)} - g(x)^{h(x)})
≈ ln(g(x+δ))g(x+δ)^{h(x)}h'(x) + h(x)g(x)^{h(x)-1}g'(x)
≈ ln(g(x))g(x)^{h(x)}h'(x) + h(x)g(x)^{h(x)-1}g'(x)
= ( ln(g(x))g(x)h'(x) + h(x)g'(x) )g(x)^{h(x)-1}
Approximation Functions
If the result is required only to within a fairly coarse error tolerance e
[ typically e=2^{-8} for an 8-bit shade value ]
then a rough-and-ready piecewise approximator
is given by
f(x) ≡ 0 for 0 ≤ x < d;
(Ax+B)^{m} for d ≤ x ≤ 1
where d^{b} ≤ e
(typically d = e^{1/b} to give equality);
A = (1-d)^{-1}; B = 1-A = -d(1-d)^{-1}.
This satisfies f(d)=0 and f(1)=1 for any m, and it remains only
to choose an integer m smaller than b (ideally a power of two) giving the required accuracy.
For a given m, the optimal (minimax) values for A and B are given by the minimax linear approximator
to x^{(b/m)} over x ∈ [d,1], though this gives a discontinuity
at x=d and may deviate slightly from [0,1].
In general, a "power bulging" function curve f(x) over a given range
can be approximated by computing the m^{th} power of an approximator to
the "flatter" curve f(x)^{1/m} for a prechosen "friendly" m.
Sine and Cosine functions
Look Up Tables
We will assume we need a q-bit sine (that is, a result expressed in
(2^{-q})ths) of a p-bit angle (that is, an angle expressed in (2^{-p})ths of a
complete revolution (2π radians or 360°)), ie. we need
S(i) = 2^{q}sin(2^{1-p}πi)
where i ∈ Z_{2^p-1}; and that we have room for a lookup table containing N=2^{n} m-bit
entries.
We take X^{*} to be the first quarter of X as before.
Suitable base functions are s(i)=0 and s(i)=2^{q+2-p}i:
X = Z_{2^p-1}; X^{*} = Z_{2^{p-2}-1}; Y = Z_{2^q-1}
S(i) = 2^{q}sin(2^{1-p}πi); s(i) = 0 or 2^{q+2-p}i.
s_{11}(i) = i          if 0 ≤ i < 2^{p-2}
          = 2^{p-1}-i  if 2^{p-2} ≤ i < 2^{p-1}
          = i-2^{p-1}  if 2^{p-1} ≤ i < 3·2^{p-2}
          = 2^{p}-i    if 3·2^{p-2} ≤ i < 2^{p}
s_{12}(i) = 2^{n-p+2}i.
For the zero base function:
s_{2}(y,i) = 2^{q-m}y   if 0 ≤ i < 2^{p-1}
           = -2^{q-m}y  if 2^{p-1} ≤ i < 2^{p}.
For the linear base function:
s_{2}(y,i) = 2^{q-m-2}y + 2^{q-p+2}s_{11}(i)   if 0 ≤ i < 2^{p-1}
           = -2^{q-m-2}y - 2^{q-p+2}s_{11}(i)  if 2^{p-1} ≤ i < 2^{p}.
Our thinness errors come from an absolute error of 2^{-1} in y ∈ Z_{2^m-1} and so translate to an absolute error in our final result of 2^{q-m-1} for s(i)=0 and 2^{q-m-3} for s(i)=2^{q+2-p}i. If we are using an 8-bit table to give a 16-bit result (m=8, q=16), for example, then our absolute thinness error is 2^{5} for the linear base function: only the top eleven bits of our result are accurate. To eliminate thinness errors we thus need m ≥ q-3.
Our sparseness errors derive from any difference between p-2 and n. If we
wish to eliminate sparseness errors we must either have n ≥ p-2 or use some form
of interpolation. Rather than using naive linear interpolation we note that
S'(x) ≈ 2^{q+1-p}πcos(2^{1-p}πx) = 2^{1-p}πS(2^{p-2}-x)
so that
S(j+k) ≈ S(j) + (2^{1-p}πk)S(2^{p-2}-j).
Using the contraction function s_{12}(i) = 2^{n-p+2}i results in the loss of p-n-2
bits, so the k we are interested in will be in Z_{2^{p-n-2}-1}.
We therefore construct a corrective look up table C to approximate the function
c(k) = 2^{1-p}πk.
We will assume this table to have 2^{p-n-2} entries (giving a contraction function c_{1}(k)=k),
with entries r bits wide (giving a table C[k] = [2^{r+n-p}πk]
and an expansion function
c_{2}(y,k) = 2^{1-r-n}y if we use a zero base function).
For any i ∈ X we can therefore define our approximation to S to be
S(i) = s_{2}(S[j],i) + 2^{1-r-n}C[k]s_{2}(S[2^{n}-j],2^{p}-i)
where j = i DIV 2^{p-n-2}, k = i MOD 2^{p-n-2}.
Example
If we have n=r=8 and p=q=m=16 then s_{12}(i) = 2^{-6}i.
If we use a zero base function then:
S[] contains 2^{8} 16-bit entries: S[i] = [2^{16}sin(2^{-9}πi)].
C[] contains 2^{6} 8-bit entries: C[k] = [πk].
S(i) = S[j] + 2^{-15}C[k]*S[2^{8}-j]
where j = i DIV 2^{6}, k = i MOD 2^{6}, for i ∈ Z_{2^14-1}.
If we use a linear base function then:
S[] contains 2^{8} 16-bit entries: S[i] = [2^{18}(sin(2^{-9}πi)-2^{-10}i)].
C[] contains 2^{6} 8-bit entries: C[k] = [πk].
S(i) = 2^{-2}S[j] + 2^{8}j + 2^{-15}C[k]*(2^{-2}S[2^{8}-j]+2^{16}-2^{8}j)
where j = i DIV 2^{6}, k = i MOD 2^{6}, for i ∈ Z_{2^14-1}.
The multiplication is of the form 2^{-15}(8-bit)*(16-bit) and we only need the top nine bits of the 24-bit result.
For a zero base function this system gives values for S(i) within ±2 of the desired 16-bit scaled sine. If we only evaluate the multiplication to eight bits then the errors become ±3. Raising r to 16 reduces the errors to ±1 except for just 14 (out of 2^{14}) +2s. Taking n=9, r=16 and a 9-bit multiply gives errors of ±1 with only 75 +1s. Increasing n to 11 (a 4 Kbyte lookup table) ensures that the value is only ever out by -1.
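The zero-base-function variant can be checked in a few lines (round-to-nearest tables, a full-precision multiply, and a notional 17-bit entry S[256]=2^{16} are assumptions):

```python
import math

# S[t] = round(2^16 * sin(2^-9 * pi * t)). S[256] = 2^16 needs a 17th bit,
# so a fixed-width implementation would special-case that entry.
S = [round(65536 * math.sin(math.pi * t / 512)) for t in range(257)]
C = [round(math.pi * k) for k in range(64)]     # C[k] = [pi*k], fits 8 bits

def sin16(i):
    """First-quadrant 16-bit sine of i in 2^-16 ths of a turn, 0 <= i < 2^14."""
    j, k = i >> 6, i & 63
    return S[j] + (C[k] * S[256 - j]) / 2 ** 15
```

The test below allows ±3 rather than the quoted ±2 to leave room for differing rounding conventions.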
Bit-Shifting Methods
The identity sin(a+b) = sin(a)cos(b) + cos(a)sin(b) allows an operand-based method (if we tabulate sin(2^{i}) and cos(2^{i})) but this would be extremely costly, requiring two multiplications per non-zero operand bit.
The monotonicity of sin^{-1}(y) allows a result-based method but only if sin^{-1}(y) is available.
Approximation Functions
The following tables give minimax polynomial approximations to sin(πx/2);
that is, we assume x=1 corresponds to a right angle.
e_{1} is the maximal negative error in the
approximation, e_{2} the maximal positive error.
e_{3} and e_{4} give the maximal negative and positive errors if all
multiplications are only performed to four hexadecimal places (sixteen bit fractions).
Approximations to sin(πx/2) over [0,1].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1}(-ve) | e_{2} | e_{3} | e_{4}(+ve) |
-.038CA7 | +1.D87297 | -.D1594A | -.038CA7 | +.038CA7 | -.038C28 | +.038E13 | ||||
-.005998 | +1.9C47AC | -.2CA6C2 | -.6FA0EA | -.005998 | +.005998 | -.00595D | +.005AF5 | |||
+.00070F | +1.90AF50 | +.0C0D72 | -.C915CE | +.2C4AEE | -.00070F | +.00070F | -.00070F | +.0009E9 | ||
+.000077 | +1.91FFC8 | +.0162BE | -.AAE54A | +.0980D0 | +.0E01F4 | -.000077 | +.000077 | -.000077 | +.0003FA | |
-.000007 | +1.922253 | -.002AE6 | -.A45511 | -.030FD3 | +.191CAC | -.03AF27 | -.000007 | +.000007 | -.000001 | +.000459 |
Approximations to sin(πx/2) over [0,½].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
+.03D13F | +1.6A09E6 | -.03D13F | +.03D13F | -.03D13F | +.03D2BE | |||||
-.009760 | +1.A7DEC3 | -.76EEB9 | -.009760 | +.009760 | -.009730 | +.0098CA | ||||
-.00031C | +1.92D222 | -.060D8B | -.9705DA | -.00031C | +.00031C | -.00030C | +.0004B1 | |||
+.00004B | +1.92022C | +.01DD56 | -.AFF0E3 | +.18A0E6 | -.00004B | +.00004B | -.00004B | +.000209 | ||
+.000001 | +1.921F29 | +.000C12 | -.A5BD88 | +.0145CD | +.12B5CB | -.000001 | +.000001 | -.000001 | +.0001B8 | |
-.000000 | +1.921FC2 | -.0001B0 | -.A5495E | -.007687 | +.15C668 | -.020835 | -.000000 | +.000000 | 0 | +.0001DD |
Approximations to sin(πx/2) over [½,1].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
+.730C08 | +.95F61A | -.090222 | +.090222 | -.090215 | +.0902C9 | |||||
-.2646B8 | +2.45AA3C | -1.1F23F8 | -.003F8C | +.003F8C | -.003F48 | +.004042 | ||||
-.0E8F7A | +1.E064E7 | -.934EDE | -.3E8E00 | -.000772 | +.000772 | -.0006E6 | +.00097A | |||
+.023D41 | +1.816DEE | +.31ECE6 | -.F10DB9 | +.3B7586 | -.00001F | +.00001F | 0 | +.000327 | ||
+.00AB6F | +1.8C8EBF | +.12D800 | -.C64B6C | +.1E795B | +.07BFE6 | -.000002 | +.000002 | 0 | +.0003D9 | |
-.0010BA | +1.92CAD4 | -.02ECBD | -.9E3EEA | -.0A7B74 | +.1DD4D2 | -.04E7D0 | -.000000 | +.000000 | 0 | +.0003DC |
Arctangent Function
Bit-Shifting Methods
The monotonicity of tan makes result-based algorithms viable in the presence of a routine to evaluate tan.
Iterative Methods
The iteration x_{n+1} = (2x_{n} - sin2x_{n} + (1+cos2x_{n})y)/2 (Newton's method applied to tan(x)=y) will approach tan^{-1}(y) extremely rapidly. See Inverse Functions Via Iteration.
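A sketch of the iteration (the crude starting estimate y/(1+0.28y^{2}) is an assumption, not part of the text):

```python
import math

def arctan_iter(y, iters=10):
    """Iterate x_{n+1} = (2x_n - sin(2x_n) + (1 + cos(2x_n))*y)/2,
    ie. Newton's method on tan(x) = y, converging to arctan(y)."""
    x = y / (1 + 0.28 * y * y)    # rough starting estimate
    for _ in range(iters):
        x = (2 * x - math.sin(2 * x) + (1 + math.cos(2 * x)) * y) / 2
    return x
```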
Approximation Functions
It is possible to find polynomials that approximate tan^{-1}(x) fairly well over x ∈ [0,1], or indeed over x ∈ [-1,1]. Unfortunately we need a polynomial of the form C_{1}x+C_{3}x^{3}+C_{5}x^{5}+C_{7}x^{7} (requiring five multiplications) merely for an accuracy of ±&0.0006 (achieved by C_{1}=.9992150, C_{3}=-.3211819, C_{5}=.1462766, C_{7}=-.0389929).
The simplest and most efficient approach is to have two or more different
polynomials for different portions of the required domain, but alternatively we
can reduce the required domain to [0,√2-1] ≈ [0,&0.6A09] using the identity
tan^{-1}(x) = 2tan^{-1}((√(1+x^{2})-1)/x)
which of course we need only use when x > √2-1.
Alternatively we use identities of the form
tan^{-1}(x) = a + tan^{-1}((x-tan(a))/(1+xtan(a)))
for various a ∈ (0,1).
Taking a = tan^{-1}(½) ≈ &0.76B2 and using
tan^{-1}(x) = tan^{-1}(½) + tan^{-1}((2x-1)/(2+x))
on any x > ½ will reduce the domain to [0,½].
If we then take a = tan^{-1}(1/4) ≈ &0.3EB7 and use
tan^{-1}(x) = tan^{-1}(1/4) + tan^{-1}((4x-1)/(4+x))
on any x > 1/4 we will further reduce the domain to [0,1/4].
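The two reductions compose as follows (the final library call stands in for whatever short polynomial is used over [0,1/4]):

```python
import math

def atan_reduced(x):
    """Evaluate arctan on [0,1] by the two domain reductions above,
    leaving only arguments in [0,1/4] for the final evaluation."""
    acc = 0.0
    if x > 0.5:
        acc += math.atan(0.5)            # the constant &0.76B2
        x = (2 * x - 1) / (2 + x)        # now x <= 1/3
    if x > 0.25:
        acc += math.atan(0.25)           # the constant &0.3EB7
        x = (4 * x - 1) / (4 + x)        # now x <= 1/4
    return acc + math.atan(x)            # stand-in for a short polynomial
```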
While it is possible to continue this process to obtain an arbitrarily
small domain over which to use a simple approximation function (a linear
polynomial for example) or a short look up table, it is probably more
computationally efficient to use a higher degree polynomial over a larger range
since, whereas each additional degree of a polynomial adds one multiplication,
each domain reduction may require one division.
The tables below give good approximation polynomials for tan^{-1}(x) over the ranges [-1,1], [0,1], [0,√2-1], [0,½], [½,1], [0,1/4], [1/4,½], [½,3/4], and [3/4,1]. Coefficients and errors are all given in hexadecimal.
Approximations to tan^{-1}(x) over [-1,1].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
0 | +1 | 0 | -1/3 | -.1E6530 | +.1E6530 | -.1E65DB | +.1E65DB | |||
0 | +1 | 0 | -1/3 | 0 | +1/5 | -.14CE03 | +.14CE03 | -.14CD25 | +.14CD25 | |
0 | +.F8EED5 | 0 | -.312382 | -.014488 | +.014488 | -.0144DB | +.0146DB | |||
0 | +.FECFC7 | 0 | -.49E799 | 0 | +.144F8F | -.0027E2 | +.0027E2 | -.0028CA | +.002A69 |
Approximations to tan^{-1}(x) over [0,1].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
+.091A49 | +.C90FDB | -.091A49 | +.091A49 | -.091A49 | +.091B60 | |||||
-.00A0B5 | +1.10F8B3 | -.46A76D | -.00A0B5 | +.00A0B5 | -.00A067 | +.00A1F4 | ||||
-.004868 | +1.096170 | -.2F9CC5 | -.10B4D0 | -.004868 | +.004868 | -.004857 | +.00498E | |||
-.0006F6 | +1.00FB4D | -.04AE78 | -.568A15 | +.235B05 | -.0006F6 | +.0006F6 | -.0006D0 | +.000971 | ||
+.00015F | +0.FF8D86 | +.060E9D | -.737E60 | +.439E18 | -.0CAC00 | -.00015F | +.00015F | -.00015F | +.0004DA | |
+.00006B | +0.FFD785 | +.0279D9 | -.637579 | +.237551 | +.10D9B4 | -.0A1BE6 | -.00006B | +.00006B | -.00006B | +.000493 |
Approximations to tan^{-1}(x) over [0,½].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
+.01B5FA | +.ED6338 | -.01B5FA | +.01B5FA | -.01B5FA | +.01B6D3 | |||||
-.003925 | +1.08E16D | -.35333F | -.003925 | +.003925 | -.0038FA | +.003A47 | ||||
-.000580 | +1.0147E9 | -.0B97C8 | -.386334 | -.000580 | +.000580 | -.000572 | +.0006B5 | |||
+.000041 | +.FFE25C | +.022CD9 | -.641216 | +.2B6F86 | -.000041 | +.000041 | -.000041 | +.000180 | ||
+.000010 | +.FFF6E6 | +.00D1F3 | -.5C29DE | +.18C5B2 | +.0F519B | -.000010 | +.000010 | -.000010 | +.000192 | |
+.000000 | +.FFFFF5 | -.0001C0 | -.5518BD | -.02860C | +.3FE832 | -.206E10 | -.000000 | +.000000 | -.000000 | +.000198 |
Approximations to tan^{-1}(x) over [½,1].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
-.0243E7 | +1.18C9E2 | -.4D8472 | -.000E51 | +.000E51 | -.000DFD | +.000FC2 | ||||
-.07BC4A | +1.2FCF56 | -.6CD820 | +.0DD30D | -.0001E1 | +.0001E1 | -.000192 | +.0003AF | |||
-.03A604 | +1.188DAC | -.3C36BC | -.1E7BC7 | +.0EDAF8 | -.000042 | +.000042 | 0 | +.0003C2 | ||
-.004AC5 | +1.00C75F | +.0620D5 | -.79AA51 | +.4C9B9E | -.107EE6 | -.000003 | +.000003 | 0 | +.000378 | |
+.00B718 | +.F83E19 | +.23E7B1 | -.B065E7 | +.848836 | -.2E9F4F | +.06AFFA | -.000000 | +.000000 | 0 | +.00051A |
Approximations to tan^{-1}(x) over [0,√2-1].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
+.0106BD | +.F2B406 | -.0106BD | +.0106BD | -.0106BD | +.0107AB | |||||
-.0024B2 | +1.06ABCD | -.2E8926 | -.0024B2 | +.0024B2 | -.00249D | +.0025D7 | ||||
-.000268 | +1.00AAE7 | -.0731BF | -.40058E | -.000268 | +.000268 | -.000257 | +.0003AF | |||
+.000026 | +.FFEC79 | +.019FEC | -.61B5CC | +.285BA4 | -.000026 | +.000026 | -.000026 | +.00015B | ||
+.000005 | +.FFFC70 | +.0061C5 | -.591C11 | +.1031BB | +.17AC22 | -.000005 | +.000005 | -.000005 | +.000161 | |
+.000000 | +1.00001C | -.000519 | -.54FC8A | -.02F392 | +.40AD59 | -.20F2B8 | -.000000 | +.000000 | 0 | +.000166 |
Approximations to tan^{-1}(x) over [0,1/4].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
+.003E90 | +.FADBB0 | -.003E90 | +.003E90 | -.003E90 | +.003F94 | |||||
-.00099D | +1.02C5A5 | -.1E7437 | -.00099D | +.00099D | -.000995 | +.000AA3 | ||||
-.00003A | +1.001A58 | -.01CD2D | -.4CB5D4 | -.00003A | +.00003A | -.000030 | +.000148 | |||
+.000005 | +.FFFC08 | +.00830D | -.5B4B77 | +.1CDD39 | -.000005 | +.000005 | -.000005 | +.00012C | ||
+.000000 | +.FFFFC6 | +.000A22 | -.55F79F | +.0460BD | +.27587F | -.000000 | +.000000 | -.000000 | +.000129 | |
+.000000 | +1.000004 | -.000121 | -.553915 | -.0153B5 | +.3B86E1 | -.1ACC2C | -.000000 | +.000000 | 0 | +.000129 |
Approximations to tan^{-1}(x) over [1/4,½].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
+.074E6A | +.DFEAC1 | -.00922F | +.00922F | -.0091FF | +.009355 | |||||
-.026B6D | +1.16AEA2 | -.48D849 | -.000435 | +.000435 | -.00041C | +.000581 | ||||
-.00D40D | +1.091C8A | -.236D49 | -.216D4F | -.000060 | +.000060 | -.00003C | +.0001C2 | |||
+.00045C | +.FF8B21 | +.045CFE | -.698805 | +.30171E | -.000001 | +.000001 | 0 | +.000164 | ||
+.00113A | +.FED65A | +.084545 | -.742E60 | +.3E6582 | -.079517 | -.000000 | +.000000 | 0 | +.000173 | |
+.000315 | +.FFC6C9 | +.01B343 | -.5BF600 | +.0CC46B | +.2E07EF | -.17DD12 | -.000000 | +.000000 | 0 | +.000171 |
Approximations to tan^{-1}(x) over [½,3/4].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
+.1B4083 | +.B82B84 | -.00A4A9 | +.00A4A9 | -.00A467 | +.00A59A | |||||
-.04377A | +1.1EF1EB | -.523CBC | -.0000B1 | +.0000B1 | -.00007C | +.00020B | ||||
-.056DBA | +1.24D1E0 | -.5BA27A | +.04F684 | -.000034 | +.000034 | 0 | +.0001F1 | |||
-.019D91 | +1.0BE19D | -.1EE41C | -.3C58F8 | +.1A2825 | -.000003 | +.000003 | 0 | +.0001FC | ||
+.004576 | +.FC78F8 | +.132F0E | -.8D4BD5 | +.5B40F6 | -.14D5C5 | -.000000 | +.000000 | 0 | +.000244 | |
+.00719F | +.FACA68 | +.19FD08 | -.9BEB52 | +.6CDBE0 | -.201827 | +.02FD16 | -.000000 | +.000000 | 0 | +.000242 |
Approximations to tan^{-1}(x) over [3/4,1].
x^{0} | x^{1} | x^{2} | x^{3} | x^{4} | x^{5} | x^{6} | e_{1} | e_{2} | e_{3} | e_{4} |
+.3851F9 | +.914D76 | -.008F94 | +.008F94 | -.008F3C | +.009082 | |||||
+.020481 | +1.0EC7EC | -.47BF10 | -.00027D | +.00027D | -.000197 | +.0004A6 | ||||
-.0B1716 | +1.3C2B04 | -.7BDC68 | +.13D848 | -.00000B | +.00000B | 0 | +.000273 | |||
-.07E796 | +1.2D6C09 | -.6259FE | +.004C9C | +.0598CC | -.000001 | +.000001 | 0 | +.0003AA | ||
-.0257DE | +1.0D4B72 | -.184E59 | -.54CF00 | +.3661EE | -.0B284E | -.000000 | +.000000 | 0 | +.000410 | |
+.0106A3 | +0.F5E41C | +.2B4CCC | -.BCC31E | +.901EC3 | -.3464A4 | +.07E150 | -.000000 | +.000000 | 0 | +.000450 |
Look Up Tables
The dynamic range of log_{a}(x) for x=1,2,3,...N is usually small for
reasonable N. In order to use logs efficiently it is often necessary to scale
them up.
For example, if a=2 and N=255, log_{a}(N) is less than
8, so storing ⌊log_{2}(x)⌋ only gives three bits of data and a corresponding fractional error of about
12%. If L(x) approximates 2^{12}log_{2}(x) so that
L[255] ≈ 2^{12}*7.994=&7FE7, 15 bits are used profitably.
We define L(x) = Klog_{a}(x) for x ∈ X and some convenient K.
A convenient representative portion of ℝ^{+} wrt L is an interval of the form
[1,a^{t}) for some positive (perhaps integral) t.
If x ∈ ℝ^{+} is expressed in the form x = a^{m}(a^{q}+r) where m is an integer
(the exponent) and r (the mantissa) ∈ [0,a^{t+q}-a^{q})
(standard binary floating point format, for example, has a=2, t=1, q=-1) then,
writing s for ra^{-q} ∈ [0,a^{t}-1),
L(x) = L(a^{m}(a^{q}+r)) = L(a^{m+q}(1+s)) = K(m+q) + L(1+s).
A contraction function can thus be based on l_{11}(a^{m}(a^{q}+r)) = 1+ra^{-q} and an
expansion function on l_{22}(y,a^{m}(a^{q}+r)) = y + K(m+q).
We thus have:
X = ℝ^{+} ; X^{*} = [1,a^{t}) ;
L(x) = Klog_{a}(x) ; l(x) = 0 , Kt(x-1)/(a^{t}-1) , or Ktxa^{-t}
l_{11}(a^{m}(a^{q}+r)) = 1+ra^{-q}
l_{22}(y,a^{m}(a^{q}+r)) = y + K(m+q).
Assuming that x=a^{m}(a^{q}+r), combining l_{11}(x) = 1+ra^{-q} with l_{12}(1+s)=⌊N_{L}s/(a^{t}-1)⌋
gives a contraction function l_{1}(x)=⌊N_{L}ra^{-q}/(a^{t}-1)⌋.
If we choose l(x)=0 and store L[i] = ⌊Klog_{a}(1+i(a^{t}-1)/N_{L})⌋ for i ∈ Z_{N_{L}}
(where K is chosen so that Kt is as close as possible to, but not greater than, M_{L}) then,
setting
l_{2}(y_{1},y_{2},x) = y_{1} + K(m+q),
we obtain
l_{2}(L[l_{1}(x)],y,x) | = | L[l_{1}(x)] + K(m+q) |
 | = | L[⌊N_{L}ra^{-q}/(a^{t}-1)⌋] + K(m+q) |
 | ≈ | Klog_{a}(1+(N_{L}ra^{-q}/(a^{t}-1))(a^{t}-1)/N_{L}) + K(m+q) |
 | = | Klog_{a}(1+ra^{-q}) + K(m+q) |
 | = | Klog_{a}((1+ra^{-q})a^{m+q}) |
 | = | Klog_{a}(a^{m}(a^{q}+r)) |
 | = | Klog_{a}(x) |
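The whole scheme can be sketched in Python, taking a=2, t=1, q=-1 (the standard floating-point split, so a^t-1 = 1); the values of K and N_L and the function names are illustrative choices, not from the text:

```python
import math

K = 2 ** 12    # scale factor: L(x) approximates K*log2(x)
N_L = 256      # LUT length

# L[i] = floor(K*log_a(1 + i(a^t - 1)/N_L)) with a=2, t=1
LUT = [math.floor(K * math.log2(1 + i / N_L)) for i in range(N_L)]

def scaled_log2(x):
    """Approximate K*log2(x) for x > 0 via the LUT."""
    f, e = math.frexp(x)         # x = f * 2^e with f in [0.5, 1), so m+q = e-1
    s = 2 * f - 1                # 1+s = 2f in [1, 2)
    i = int(N_L * s)             # contraction l_1(x)
    return LUT[i] + K * (e - 1)  # expansion l_2: add back K(m+q)
```

The absolute error is dominated by the sparseness of the table, roughly K/(N_L ln 2) plus one unit from the floor.
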
Look Up Tables
Any interval in ℝ is a representative portion of ℝ wrt A. We will
consider an interval [d,d+w) and suppose that a given x ∈ ℝ is readily
expressed in the form mS+r where m ∈ Z, r ∈ [d,d+w) and S is some constant
(typically d=0, w=S, so that m=x DIV S, r=x MOD S and we have
A(x) = A(mS+r) = a^{mS/K}A(r)).
Thus a contraction function can be based on a_{11}(mS+r)=r and an expansion
function on a_{22}(y,mS+r)=a^{mS/K}y, this latter only being really feasible if mS/K
is an integer.
Antilog generates the same non-uniform variation problem as log but with the
low spread of values occurring at the beginning of the range. This is again
only a problem if a constant base function is used and can be avoided even
then by using d=0,w=K giving a dynamic range of [1,a).
The obvious constant base function for A over [d,d+w) is
a(x)=a^{d/K}.
Typically we would take d=0, w=S=1 giving:
X = ℝ^{+} ; X^{*} = [0,1)
A(x) = a^{x/K} ; a(x) = 1 , 1+x(a^{1/K}-1) , or xa^{1/K}
a_{11}(z+x) = x (z ∈ Z, x ∈ [0,1))
a_{22}(y,z+x) = a^{z/K}y.
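A minimal Python sketch of this antilog scheme, taking a=2 and, for simplicity, K=1 (so A(x)=2^x and the representative portion is the unit interval); the table size and the names are my choices:

```python
import math

N_A = 256                                 # LUT entries over [0,1)
T = [2 ** (i / N_A) for i in range(N_A)]  # tabulated A over the representative portion

def antilog2(x):
    """Approximate 2^x by contracting to [0,1) and rescaling."""
    z = math.floor(x)           # a_11: split off the integer part
    i = int(N_A * (x - z))      # a_12: truncate the fraction to a LUT index
    return math.ldexp(T[i], z)  # a_22: multiply by the exact power 2^z
```

The fractional error is bounded by 2^(1/N_A) - 1, about ln(2)/N_A.
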
Thinness Errors
The maximum deviation of A(x)=a^{x/K} from its linear approximation
g(x)=a^{d/K}(1+(x-d)(a^{w/K}-1)/w), which overestimates
A(x) over (d,d+w), occurs at x_{0}=A'^{-1}((A(d+w)-A(d))/w).
Now A'(x)=ln(a)a^{x/K}/K
so A'^{-1}(y) = Klog_{a}(Ky/ln(a)).
Writing m for K(a^{w/K}-1)/(wln(a)), we have
x_{0} | = | Klog_{a}(Ka^{d/K}(a^{w/K}-1)/(wln(a))) |
 | = | d+Klog_{a}(K(a^{w/K}-1)/(wln(a))) |
 | = | d+Klog_{a}(m) |
and the maximal deviation is
A(x_{0})-g(x_{0}) | = | a^{d/K}(m-1-Klog_{a}(m)(a^{w/K}-1)/w) |
 | = | a^{d/K}(m-1-mln(a)log_{a}(m)) |
 | = | a^{d/K}(m-1-mln(m)). |
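This bound checks out numerically. The sketch below (the parameter values are arbitrary choices of mine) compares the predicted extreme deviation a^{d/K}(m-1-m ln(m)) with a brute-force search over the interval:

```python
import math

a, K, d, w = 2.0, 4096.0, 0.0, 64.0

def A(x):
    return a ** (x / K)

def chord(x):
    # linear approximation through the endpoints of [d, d+w]
    return A(d) * (1 + (x - d) * (a ** (w / K) - 1) / w)

m = K * (a ** (w / K) - 1) / (w * math.log(a))
predicted = A(d) * (m - 1 - m * math.log(m))   # predicted A(x0) - g(x0); negative

samples = [d + w * i / 10000 for i in range(10001)]
worst = min(A(x) - chord(x) for x in samples)  # most negative sampled deviation
```
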
Sparseness Errors
Setting b=w/N_{A} and a_{12}(r)=⌊(r-d)/b⌋ will generate fractional sparseness
errors of approximately e=ln(a)b/K. For a=2 this is approximately 0.69b/K.
One way to reduce sparseness errors is to note that A(x+d) = A(d)A(x) and
use a corrective LUT of the form C(y)=A(y/b).
Alternatively we can use naive linear interpolation.
Since A"(x) = ln(a)^{2}a^{x/K}/K^{2}
has its greatest absolute value over an interpolation interval [c,c+b] at x=c+b, where it equals
(ln(a)/K)^{2}A(c+b),
an estimate of the theoretical fractional sparseness error after
using naive linear interpolation is
e'=(bln(a)/K)^{2}/8=e^{2}/8.
For a=2 this is approximately 0.06(b/K)^{2}.
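A quick numerical check of e' = e^2/8 (again with arbitrary parameter values of mine):

```python
import math

K, b = 4096.0, 64.0
e = math.log(2) * b / K   # fractional sparseness error without interpolation

def worst_interp_err(c):
    """Worst sampled fractional error of the chord approximation to 2^(x/K) over [c, c+b]."""
    A = lambda x: 2 ** (x / K)
    chord = lambda x: A(c) + (x - c) * (A(c + b) - A(c)) / b
    xs = [c + b * i / 1000 for i in range(1001)]
    return max(abs(chord(x) - A(x)) / A(x) for x in xs)
```
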
Linear interpolation, however, involves multiplication, so we can use L
and A again. Suppose c=d+yb for some y ∈ Z and x ∈ (c,c+b). Let z=(x-c)/b, which
we know lies inside (0,1). (If d=0 we have y=⌊x/b⌋, z=(x/b)-y, c=yb.)
g(x) | = | A(c) + z(A(c+b)-A(c)) |
 | = | A(c) + zA(c)(A(b)-1) |
 | = | A(c) + a^{log_{a}(z)+c/K+log_{a}(A(b)-1)} |
 | = | A(c) + a^{(Klog_{a}(z)+c+Klog_{a}(A(b)-1))/K} |
 | = | A(c) + A(Klog_{a}(z)+c+Klog_{a}(A(b)-1)) |
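The last rewriting is an exact identity, as a short numerical check confirms (here K log_a is computed exactly rather than read from a LUT; names and parameters are mine):

```python
import math

a, K = 2.0, 4096.0
A = lambda x: a ** (x / K)
L = lambda y: K * math.log(y, a)   # exact K*log_a(y)

def interp_via_logs(c, b, z):
    """A(c) + z*(A(c+b)-A(c)) rewritten without the multiplication by z."""
    G = L(A(b) - 1)                # the constant K*log_a(A(b)-1)
    return A(c) + A(L(z) + c + G)
```
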
Writing G for Klog_{a}(A(b)-1), and reading Klog_{a}(z) from the log LUT as l_{21}(L[l_{12}(R_{z})]) where R_{z} is the representation of z presented to the log contraction, we have
g(x) | = | A(c) + A(c + l_{21}(L[l_{12}(R_{z})]) + G) |
 | ≈ | a_{2}(A[y],c) + a_{2}(A[⌊y + l_{21}(L[l_{12}(R_{z})])/b + G/b⌋], c + l_{21}(L[l_{12}(R_{z})]) + G) |
An example should make this clearer. With a=2, K=2^{12}, d=0, w=S=2^{14} we have a_{11}(x)=x MOD 2^{14}, a_{22}(y,x)=2^{4(x DIV 2^{14})}y. If we have N_{A}=2^{8} we need to complete our contraction function with a_{12}(x)=⌊2^{-6}x⌋ and are then interested in interpolating for values of the form 2^{6}i+j where j ∈ Z_{2^{6}}. If we use a constant base function a(x)=2^{0}=1 then replacing c with 2^{6}i, b with 2^{6}, and z with 2^{-6}j (R_{z}=j) we have
g(2^{6}i+j) | = | A(2^{6}i) + A(2^{6}i + 2^{12}(log_{2}(j)-6) + G) |
 | = | A(2^{6}i) + A(2^{6}i + L(j) - 6K + G) |
Let us consider how approximating g in this manner affects our theoretical
sparseness error of e'=e^{2}/8.
A(c) will not be prone to sparseness errors since b divides c. However
both it and the corrective term A(Klog_{a}(z)+c+Klog_{a}(A(b)-1)) are subject to
thinness errors in A, so we have effectively doubled the thinness errors. The
corrective term is bounded in magnitude by A(c+b)-A(c). But this is just the
original absolute sparseness error bound for A over [c,c+b) and equals eA(c).
If the errors in L are small compared to the sparseness errors in A we can
take all the errors in the corrective term as deriving from A and we have a
maximal sparseness error of
e(Max corrective term)=e(eA(c))=e^{2}A(c).
Our fractional sparseness error is then e^{2}A(c)/A(x) < e^{2} since A(c)<A(x).
Adding this to the theoretical error e^{2}/8 derived above gives a final
sparseness error in A of e'=9e^{2}/8.
We have also doubled our thinness errors.
(The gradient of A is greatest at the end of the range so our crude absolute
error bound is
A(d+w)-A(d+w-b)=a^{(d+w)/K}-a^{(d+w-b)/K} ≈ a^{(d+w)/K}(ln(a)b/K).
We can similarly derive a maximal fractional error of ln(a)b/K.)
Reciprocal function
The function f(x)=1/x is often tabulated to provide a way of performing
division using multiplication; x/y being evaluated as x*(1/y).
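A Python sketch of the idea for 8-bit divisors (the value of SHIFT and the ceiling-rounded entries are my choices; rounding each entry up rather than down avoids undershooting, and makes the result exactly x DIV y whenever x*y < 2^SHIFT):

```python
SHIFT = 16
# R[y] ~ 2^16 / y, rounded up; R[0] is a dummy entry
R = [0] + [((1 << SHIFT) + y - 1) // y for y in range(1, 256)]

def div_by_table(x, y):
    """Approximate x // y (y in 1..255) with one multiply and a shift."""
    return (x * R[y]) >> SHIFT
```
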
Bit-Shifting Methods
Any of the bit-shifting division algorithms for P/Q can be used with P=1.
Gamma Function
The Gamma function is a generalisation of the integer factorial function to real x,
defined so that Γ(x+1)=xΓ(x). This enables us
to express any Γ(x) for x>0 as a multiple of Γ(x') for x' ∈ [½,1½].
Γ(1)=1 ensures Γ(k) = (k-1)! for positive integer k.
We also have
Γ(½) = π^{½}, providing
Γ(k+½) = 1×3×5×...×(2k-1) 2^{-k} π^{½}.
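The half-integer formula can be checked directly against a library Gamma (the function name is mine):

```python
import math

def gamma_half(k):
    """Gamma(k + 1/2) = 1*3*5*...*(2k-1) * 2^-k * pi^(1/2), for integer k >= 0."""
    odd_product = 1
    for j in range(1, 2 * k, 2):   # the product of the first k odd numbers
        odd_product *= j
    return odd_product * 2.0 ** -k * math.sqrt(math.pi)
```
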
Γ(x) can be extended over negative x via Γ(-x) = -π(x sin(πx)Γ(x))^{-1},
having sign (-1)^{k+1} between -k and -(k+1).
Γ(x) = ∫_{0}^{1} ( ln(t^{-1}))^{x-1} dt = ∫_{0}^{∞} t^{x-1}e^{-t} dt
provide alternative integral definitions of Γ(x) over x>0.
Gauss's definition is Γ(x) ≡ Lim_{n→∞} n!n^{x-1}(x(x+1)...(x+n-1))^{-1}
for noninteger real x.
Γ(-(k+d)) = (-1)^{k+1}π((k+d) sin(πd)Γ(k+d))^{-1}
is, for k>3, small for most d ∈ (0,1),
ramping rapidly to ±∞ as d→0 and 1.
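The negative-argument identity is likewise easy to confirm against a library Gamma (the function name is mine):

```python
import math

def gamma_negative(k, d):
    """Gamma(-(k+d)) via the reflection identity; integer k >= 0, d in (0,1)."""
    x = k + d
    return (-1) ** (k + 1) * math.pi / (x * math.sin(math.pi * d) * math.gamma(x))
```
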
Dropoff Functions
A dropoff function of order N is a function g:[-1,1]→[-1,1]
satisfying
g(0)=1, g(1)=0, g^{(i)}(0)=g^{(i)}(1)=0
∀ 1 ≤ i ≤ N
where ^{(i)} denotes the i^{th} derivative.
Though defined over [-1,1], we typically only require evaluation over [0,1].
If g is ½-symmetric then g'(1-t) = g'(t). If g is both ½- and 0-symmetric we have
g(t-1) = 1-g(t) ; g'(t-1) = -g'(t) ; g"(t-1) = -g"(t).
½-symmetrising
We can create a ½-symmetric dropoff function from any g(t) defined over [0,½]
with g(0)=1 ; g(½)=½ ; g'(0)=0 ; g"(½)=0
by defining g(t)=g(|t|) for t<0 ;
g(t) = 1-g(1-t) for ½<t≤1 ;
and g(t)=0 for t>1. We will refer to this as a ½-symmetrised dropoff function.
The g"(½)=0 condition ensures continuity in g" at ½, and we have
g'(t) = g'(1-t) and g"(t) = -g"(1-t) for t>½.
There will be a k^{th} order discontinuity at t=½ for even k≥4 if
g^{(k)}(½)≠0.
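The construction translates directly into code. A sketch (function names mine), exercised here on the cubic half 1-3t^2+2t^3, which satisfies all four conditions:

```python
def half_symmetrise(g_half):
    """Extend g_half, defined on [0, 1/2], to a dropoff on [-1, 1] as in the text."""
    def g(t):
        t = abs(t)                        # g(t) = g(|t|) for t < 0
        if t <= 0.5:
            return g_half(t)
        if t <= 1.0:
            return 1.0 - g_half(1.0 - t)  # g(t) = 1 - g(1-t) for 1/2 < t <= 1
        return 0.0                        # g(t) = 0 for t > 1
    return g

cubic = half_symmetrise(lambda t: 1 - 3 * t * t + 2 * t ** 3)
```
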
Linear
g(t)= 1-|t|
Linear blending is ½-symmetric of order 0. It gives first order discontinuities.
g'(t)= -1 ∀ t ∈ (0,1].
Cubic
g(t)= 1-t^{2}(3-2|t|)
Cubic blending is not ½-symmetric (although g(½)=½)
and is only 0-symmetric if defined, as here, with |t|. It has order 1,
and cubic blends are consequently often discontinuous in the second derivative.
g'(t) ≥ g'(½) = -3/2 ∀ t ∈ [0,1].
By ½-symmetrising 1-3t^{2}+2t^{3} we obtain a ½-symmetric order 1
blend.
Quartic
g(t) = 1-8|t|^{3}(1-|t|) over [-½,½], ½-symmetrised over [-1,1],
provides an order 2 ½-symmetric dropoff which can be used to construct blendings with third
order continuity (ie. continuous second derivatives). There is a discontinuity in
g^{(4)}(t) at t=½.
g'(t) ≥ g'(½) = -2 ∀ t ∈ [0,1], so quartic dropoff is only a third
steeper than cubic dropoff at t=½, though of course it requires one further multiply to evaluate.
|g'(t)|≤2 and |g"(t)|≤6.
More generally, for integer k≥3 we can ½-symmetrise the k-degree polynomial
g(t)= 1 - 2^{k-3}t^{k-1}(k - 2t(k-2))
to obtain an order k-2 dropoff function with g'(½) = -½k.
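The defining properties of this polynomial family can be checked numerically (names are mine; derivatives are taken by central differences):

```python
def poly_half(k):
    """The degree-k half-dropoff 1 - 2^(k-3) t^(k-1) (k - 2(k-2)t) on [0, 1/2]."""
    return lambda t: 1 - 2 ** (k - 3) * t ** (k - 1) * (k - 2 * (k - 2) * t)

def deriv(f, t, h=1e-6):
    """Central-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)
```

For each k this confirms g(0)=1, g(½)=½, g'(0)=0 and g'(½) = -k/2.
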
Squared cosine
g(t)=½(1+ cos(πt)) = cos^{2}(πt/2)
Squared cosine blending is 0- and ½-symmetric and has order 1, but with
g^{(i)}(0)=g^{(i)}(1)=0 for all odd i.
g'(t) = -½π sin(πt) so
g'(t) ≥ g'(½) = -π/2 ∀ t ∈ [0,1].
Bias
g_{a}(t) = Bias_{a}(1-t)
where Bias_{a}(t) ≡ t^{-log_{2}a}.
a=½ gives linear blending.
g_{a}(½) = a.
Bias blending has order 0 since g'(0) = log_{2}a ≠ 0.
Gain
g_{b}(t) = Gain_{b}(1-t)
where
Gain_{b}(t) ≡ Bias_{1-b}(2t)/2 if t ≤ ½ ;
1-Bias_{1-b}(2-2t)/2 otherwise.
Gain blending is ½-symmetric. It has order Int((2b+1)/2).
Slope1
Define Slope1_{d}(t), where d > 0, as
equal to
(d+1)/(1+t/d) - d
= d(1-t)/(t+d)
for t ≤ 1, and 0 otherwise.
Slope1 blending has order 0 since
Slope1_{d}'(0) = -(d+1)/d and
Slope1_{d}'(1) = -d/(d+1).
Of particular interest is
Slope1_{2^{a}-1}(t)
= (2^{a}-1)(1-t)/(t+2^{a}-1),
which has gradients -2^{a}/(2^{a}-1)
and -(2^{a}-1)/2^{a} at t=0 and 1 respectively.
For large a these gradients are just below and just above -1.
As a→0 they tend to -∞ and 0.
For a=1 we have Slope1_{1}(t) = 2/(1+t) - 1,
with gradient -2 at t=0 and -½ at t=1.
[ Insert graph image here ]
Sinc Function
g(t) = (kπt)^{-1} sin(kπt) for integer k is a 0^{th} order dropoff function because
g'(t) = t^{-1}( cos(kπt) - g(t)) has g'(0)=0 but
g'(1)=(-1)^{k}. It has k zeroes in (0,1].
Dropoff derivatives
g(t) | g'(t) | g"(t) | g"'(t) | g'(½) |
Linear | 1-|t| | Sign(-t) | 0 | 0 | -1 |
Cubic | 1-3t^{2}+2t^{3} | 6t(t-1) | 12t-6 | 12 | -3/2 |
Quartic | 1-8t^{3}+8t^{4} for t≤½ ; 8t(1-t)^{3} for t≥½ | -8t^{2}(4(1-t)-1) for t≤½ ; -8(1-t)^{2}(4t-1) for t≥½ | -48t(1-2t) for t≤½ ; -48(1-t)(1-2t) for t≥½ | -48(1-4t) for t≤½ ; -48(4t-3) for t≥½ | -2 |
Polynomial | 1 - 2^{k-3}t^{k-1}(k - 2(k-2)t) for t≤½ ; 2^{k-3}(1-t)^{k-1}(4-k + 2(k-2)t) for t≥½ | -2^{k-3}kt^{k-2}(k-1 - 2(k-2)t) for t≤½ | | | -½k |
Squared Cosine | ½(1+ cos(πt)) | -(π/2) sin(πt) | -(π^{2}/2) cos(πt) = π^{2}(½-g(t)) | (π^{3}/2) sin(πt) | -½π |
Bias | (1-t)^{-log_{2}a} | (log_{2}a)(1-t)^{-(1+log_{2}a)} | (log_{2}a)(1+log_{2}a)(1-t)^{-(2+log_{2}a)} | (log_{2}a)(1+log_{2}a)(2+log_{2}a)(1-t)^{-(3+log_{2}a)} | 2alog_{2}a |
Sinc | (kπt)^{-1} sin(kπt) | t^{-1}( cos(kπt)-g(t)) | -kπt^{-1} sin(kπt) - 2t^{-1}g'(t) | | |