Chapter 5: Successive formation of energy basins as a model for transfer effects in learning.
(Updated: 18 Sept. 2002, 15 July, 20 Oct., 5,19 Nov. 2004, 13 May 2006)
Summary.
Introduction.
Part I: Energy basins.
Part II. Application in descriptions of concept learning.
Discussion.
Acknowledgement.
References.
Appendix: Lie brackets of interacting gradients.
Summary.
The change in slope required for the formation of a new energy basin in an energy landscape is studied for varying locations relative to an existing basin. A local maximum for this change is found when the new basin is located on the edge of the existing energy basin. Such a local maximum may explain cases in learning where previously acquired concepts impair the formation of new ones.
Introduction.
Consider a system dx/dt = f(x) with an energy function h, so that dxi/dt = fi(x) ≡ ∂h/∂xi (Goles 1995). The energy function induces slopes in an energy landscape:
dxi/dt = fi(x) ≡ ∂h/∂xi = tan αi.
To examine the change in energy landscape and its dependence on the
existing landscape, define:
anglei = arctan(∂h/∂xi).
For h → hnew, anglei → anglei,new.
The difference between arctan (x+Δ)
and arctan(x) + arctan(Δ)
expresses how a pre-existing landscape influences the ease with which the final
situation can be reached. Mathematically, the arctan
function is a natural way to express the non-additivity of gradients.
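The non-additivity described above can be illustrated with a minimal Python sketch. The function name angle_change and the numerical slopes are assumptions chosen for illustration only; they do not appear in the original text.

```python
import math

def angle_change(slope, delta):
    """Change in slope angle when `delta` is added to an existing slope."""
    return math.atan(slope + delta) - math.atan(slope)

delta = 1.0  # the same change in slope (tan of the angle) in all three cases

flat = angle_change(0.0, delta)        # no pre-existing slope
same_sign = angle_change(1.0, delta)   # steepening an existing slope
opposite = angle_change(-0.5, delta)   # reversing a slope of opposite sign

print(f"flat landscape:      {flat:.3f} rad")
print(f"same-sign slope:     {same_sign:.3f} rad")
print(f"opposite-sign slope: {opposite:.3f} rad")
```

The same increment in slope produces a smaller angle change on top of a slope of the same sign and a larger one against a slope of opposite sign, which is the asymmetry exploited in Part I.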
Here, the following question will be
studied: given an energy landscape with one basin, how difficult will it be to
form a new basin? The change required
to produce a new basin will vary with its location relative to the existing
basin. This relation will be explored in the first part of this Chapter, starting with simple basins where trigonometric relations of tan(x) can be used, and extending the results to more complicated forms.
In the second part of this Chapter, the formation of new basins will be compared to learning phenomena, where previously learned concepts qualitatively influence the formation of new ones.
Part I: Energy basins.
Theorem.
If an energy basin with radius r and center c2 is formed in a one-dimensional energy landscape with an existing basin (radius R ≥ r, center c1), then for a given volume of this new basin the total change in slope angle is minimal for c2 - c1 = 0, maximal for c2 - c1 = R and intermediate for c2 - c1 ≥ R + r, provided that the slope angle of the new basin is equal to that of the existing basin.
Proof. Consider symmetrical basins of
triangular shape in a one-dimensional
energy landscape. Three cases for the location of the new basin can be
examined:
Case I: new basin with the same center (c2 - c1 = 0) (Fig. 1a).
Case II: new basin centered on the edge of an existing basin (c2 - c1 = R) (Fig. 1b).
Case III: new basin centered outside existing basin (c2 - c1 ≥ R + r) (Fig. 1c).

Figure 1. Triangular basins. Original basin: R=50, slope
angle ß=0.17π, new basin:
r=20, slope angle α3 =
0.22π, as shown in (c). The combined slope angle in (a) is α1
= 0.30π. The combined slope angle on the flank in (b) is α2=
0.06π. Note: all angles and sums of angles are assumed to be between ‑π/2
and +π/2 radians.
In case I, the energy function changes from:
f(x) = -(R - |c1 - x|) . tan ß (for |c1 - x| ≤ R, and 0 elsewhere)
to:
fnew(x) = f(x) + ∆f(x),
where ∆f(x) = -(r - |c2 - x|) . tan α3 (for |c2 - x| ≤ r, and 0 elsewhere).
In the overlapping area, the resulting angle α1 is defined by setting:
fnew(x) = -(r - |c2 - x|) . tan α1 (assuming r < R and c1 = c2).
(α1, α2, α3 and ß are defined as positive; e.g. the left-hand side of the existing basin has angle -ß.)
The volume of the added basin is the area between the new and the old energy functions:
Δ area = 2r.(fnew - f) = r².(tan α1 - tan ß)
The total change in angle is calculated by taking the absolute value of the local change across the basin: |arctan(dfnew/dx) - arctan(df/dx)| = α1 - ß over the entire width of the new basin.
In case II:
Δ area = r².(tan α3)/2 + r².(tan ß)/2 + r².(tan α2)/2
The change in angle is α2 + ß over one half and α3 over the other half of the new basin.
In case III:
Δ area = r².(tan α3)
The change in angle is α3 over the entire new basin.
The change in area should be equal in all cases. By comparing case I and III:
tan α1 - tan ß = tan α3
Since tan(α3 + ß) > tan α3 + tan ß = tan α1 for positive angles, it follows that α3 > α1 - ß for all positive ß. This implies that the change in angle is greater in case III than in case I. For cases II and III, the requirement of equal change in area now determines:
tan α2 + tan ß = tan α3
If α3 > ß (for a given angle ß of the existing basin), then α2 > 0, so that:
tan (α2 + ß) > tan α2 + tan ß = tan α3
and therefore: (α2 + ß) > α3.
In total, the required change in angle is greater in case II than in case III.
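As a numerical illustration of the three cases, the following Python sketch evaluates the total angle change (angle change times width) from the relations above, using the values of Fig. 1 (ß = 0.17π, α3 = 0.22π, r = 20). The function name triangular_case_changes is an assumption introduced for this example.

```python
import math

def triangular_case_changes(beta, alpha3, r):
    """Total angle change (angle x width) for cases I, II and III.

    Uses the equal-area relations from the text:
      tan(alpha1) = tan(alpha3) + tan(beta)   (case I)
      tan(alpha2) = tan(alpha3) - tan(beta)   (case II)
    """
    tan_b, tan_a3 = math.tan(beta), math.tan(alpha3)
    alpha1 = math.atan(tan_a3 + tan_b)
    alpha2 = math.atan(tan_a3 - tan_b)
    case1 = 2 * r * (alpha1 - beta)             # alpha1 - beta over the full width
    case2 = r * (alpha2 + beta) + r * alpha3    # one half each
    case3 = 2 * r * alpha3                      # alpha3 over the full width
    return case1, case2, case3

case1, case2, case3 = triangular_case_changes(0.17 * math.pi, 0.22 * math.pi, 20.0)
print(f"case I: {case1:.2f}  case II: {case2:.2f}  case III: {case3:.2f}")
```

For these values the ordering case I < case III < case II is reproduced.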
Case I represents a global minimum and case II a global maximum. This can be shown by examining other locations for the new basin; for R > 2r, consider a location shifted ε to the right from case I, as shown in Fig. 2.

Figure 2: Shifted location of new basin compared to case
I.
In this case, the right half of the new basin follows case I. In Fig. 2a, the left part has 1-ε overlap with the left part of case I and ε with case II. The outcome is a weighted average between case I and II. The situation for ε = 1 is shown in Fig. 2b. In total, the change in angle for shifted cases can be summarized as shown in Fig. 3.
If the new basin is wider than in the examples given above (i.e. r > R/2), the different cases will overlap. For example, if r=R and the new basin is ε to the left of case II, the change is
2.ε.(case I) + 2.(1-ε).(case II)
Since (case I) < (case II), the maximal change is found for ε → 0. A similar reasoning applies to locations to the right of case II. Therefore, the location where a maximal change is required remains the same. Similarly, the location of minimal change will remain the center of the existing basin (case I).

Figure 3. Change in angle (in π/100 rad.) for
various positions of the new basin. R=50, r=20, ß=0.14π, α3=0.22π.
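The dependence on position can also be explored numerically. The sketch below is an illustration under stated assumptions, not the procedure used to produce Fig. 3: it integrates |arctan(f'new) - arctan(f')| over the width of the new triangular basin with a midpoint sum, for the parameter values quoted in the caption of Fig. 3 and with the existing basin centered at 0. The function names, the step size dx and the grid of shifts are chosen for this example.

```python
import math

def slope(x, center, radius, tan_angle):
    """Slope of the triangular basin -(radius - |center - x|) * tan_angle."""
    d = x - center
    if abs(d) >= radius:
        return 0.0
    return tan_angle if d > 0 else -tan_angle

def total_angle_change(shift, R=50.0, r=20.0,
                       beta=0.14 * math.pi, alpha3=0.22 * math.pi, dx=0.05):
    """Integrated absolute angle change when the new basin is centered at `shift`."""
    tan_b, tan_a = math.tan(beta), math.tan(alpha3)
    n = round(2 * r / dx)
    total = 0.0
    for k in range(n):
        x = shift - r + (k + 0.5) * dx           # midpoint of each grid cell
        old = slope(x, 0.0, R, tan_b)             # existing basin centered at 0
        new = old + slope(x, shift, r, tan_a)     # existing plus new basin
        total += abs(math.atan(new) - math.atan(old)) * dx
    return total

shifts = list(range(0, 91, 5))
changes = [total_angle_change(s) for s in shifts]
for s, c in zip(shifts, changes):
    print(f"c2 - c1 = {s:2d}: total angle change = {c:.2f}")
```

On this grid the smallest value is found at c2 - c1 = 0 and the largest at c2 - c1 = R = 50, with a constant intermediate value once the basins no longer overlap.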
For r > R, the analysis discussed above can be
applied with the existing basin and the new basin interchanged. The location
of maximal change (case II) will therefore be at c2 - c1
= r.
The result can be generalized to basins with shapes other than the triangular shape. For 'trapezium elements' as shown in Figure 4:

Figure 4: Trapezium element, case I. R=50, r=20,
ß=0.17π, α1=0.30π over horizontal distance z=10.
Δ area = 2.Δ area(triangle, with r replaced by z) + 2.(r - z).z.(tan α1 - tan ß)
It can be easily seen that the central areas (over which
there is no change in the angle of the existing basin) follow the same
proportionality as the triangles, and the same conditions apply as in the
triangular case. Asymmetrical cases can be treated as the average of two symmetrical trapezium elements. Since the proportionality is not affected by z, the result can be generalized to infinitesimal trapezium elements (z → 0) and their sum, and thereby to any shape for which α3 > ß everywhere. ∎
Remark: Under the condition discussed above: α3
> ß, case II represents a global maximum and case I a global minimum for
the change in angle. A different picture emerges if the condition is not
met.
If α3 = ß then case II = case III.
If α3 < ß then case II < case III.
In the latter situation, a small change in angle can be
easier on the edge of an existing basin than in neutral territory. However, as
soon as the slope is made more shallow (smaller new ß), the
condition α3 <
ß will be more restrictive. Eventually, the existing basin is neutralized
by successively shallower new basins.
These results also imply that the condition α3 > ß everywhere (that is: α3i > ßi for every stretch i) cannot be relaxed to a condition for the sum of angles, Σ α3i > Σ ßi, since this would not ensure that the change in angle in case II for the entire basin exceeds that in case III. The sufficient condition to ensure a case II maximum over a sum of stretches is
Σ |arctan(tan(-α3i) + tan ßi) - ßi| > Σ |α3i|
(Here, α3i and ßi may be positive as well as negative; note that in the overlapping area α3i has a sign opposite to ßi.)
Gaussian functions.
As an additional illustration, the analysis discussed above can also be applied to Gaussian functions. The extrema of the change in angle when one function is added to another,
f(x) = g(x) + h(x-s),
are given by:
d/ds ∫_{-∞}^{+∞} |arctan(g'(x) + h'(x-s)) - arctan(g'(x))| dx = 0,
where g'(x) = dg(x)/dx.
For Gaussian functions:
g(x) = - exp(-x²/2σ1²)
and
h(x-s) = - exp(-(x-s)²/2σ2²).
All terms in the integrand approach 0 for s → ±∞, which is the relative minimum discussed before as case III.
The equation for the extrema can be simplified to:
0 = ∫_{-∞}^{0} d/ds arctan(g'(x) + h'(x-s)) dx - ∫_{0}^{+∞} d/ds arctan(g'(x) + h'(x-s)) dx
By differentiating with respect to s, it can be seen that all functions and derivatives are even when s = 0, so that both terms cancel. The solution s = 0 represents the global minimum discussed before as case I.
Under certain conditions, there is also a global maximum, discussed before as case II. Numerical analysis shows that the location of this maximum is approximately:
s ≈ √(2π).(σ1 + σ2)/2.
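A numerical analysis of this kind can be sketched in Python as follows. This is a minimal illustration, not the original computation: the function names, the integration range half_width, the number of grid cells n and the amplitude parameters a1 and a2 are assumptions introduced here. The integral is approximated by a midpoint sum.

```python
import math

def d_gauss(x, sigma, amplitude=1.0):
    """Derivative of -amplitude * exp(-x^2 / (2 sigma^2))."""
    return amplitude * x / sigma**2 * math.exp(-x * x / (2 * sigma**2))

def total_angle_change(s, sigma1=1.0, sigma2=1.0, a1=1.0, a2=1.0,
                       half_width=12.0, n=4800):
    """Midpoint-rule estimate of the integral of
    |arctan(g'(x) + h'(x - s)) - arctan(g'(x))| over [-half_width, half_width]."""
    dx = 2 * half_width / n
    total = 0.0
    for k in range(n):
        x = -half_width + (k + 0.5) * dx
        old = d_gauss(x, sigma1, a1)
        new = old + d_gauss(x - s, sigma2, a2)
        total += abs(math.atan(new) - math.atan(old)) * dx
    return total

def predicted_shift(sigma1, sigma2):
    """Approximate location of the case II maximum quoted in the text."""
    return math.sqrt(2 * math.pi) * (sigma1 + sigma2) / 2

# Sweep the shift s; amplitudes a2 >= a1 can be tried to look for a maximum.
for s in [0.0, 1.0, 2.0, predicted_shift(1.0, 1.0), 3.0, 4.0, 6.0]:
    print(f"s = {s:.3f}: total angle change = {total_angle_change(s, a2=2.0):.4f}")
```

The sweep can be refined around predicted_shift(σ1, σ2) and repeated for different amplitude ratios to compare the numerical maximum with the approximation.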
This value for s differs from that where the
steepest parts of both functions are added (s = σ1 + σ2 ). However, the approximation for s can
be understood from the analogy with case II for the triangular basins. The condition: α3 = ß
with r = R is comparable to a constant sum:
g(x) + h(x-s) over the interval 0 to s, with σ1 = σ2 (see Fig. 5).
When the tail parts of the functions are ignored, the area defined by g+h in the interval 0 to s is:
width . maximal depth = s . 1,
so that:
s ≈ (area g)/2 + (area h)/2 = √(2π).(σ1 + σ2)/2.
The constant depth shown in Fig. 5 is the average for
the sum of two Gaussian functions. The symmetry of the two functions in Fig. 5 is
an approximation for Gaussian functions.

Figure 5. Schematic diagram of the sum of two functions
that are approximately Gaussian.
For steep functions (σ=0.4), the amplitude of the new function (A2) should be at least that of the existing function (A1) to produce a case II maximum. For shallower functions (greater σ), the tail parts of the function, where α3 < ß, are proportionally larger, so that the required amplitude increases. For example, when σ=1 a case II maximum occurs only when A2 ≥ 1.8 . A1.
The location of the maximum is not affected by amplitude factors. This can be seen by calculating the area between 0 and s (with A2 > A1):
area = s.A1 + ∫_{0}^{s} ( A2.h(x-s) + A1.g(x) - A1 ) dx
With g(x) + h(x-s) ≈ 1, the integrand is (A2 - A1).h(x-s). For both functions approximately half of the total area is between 0 and s, so that the right-hand side becomes:
s.A1 + (A2 - A1).√(2π).σ2/2,
and the left-hand side:
A1.√(2π).σ1/2 + A2.√(2π).σ2/2.
Therefore:
s.A1 ≈ A1.√(2π).σ1/2 + A1.√(2π).σ2/2,
s ≈ √(2π).σ1/2 + √(2π).σ2/2,
which is the same expression for s as found before.
When σ1 and σ2
are strongly different, the
approximation: g(x) + h(x-s) = constant cannot be used and has
to be replaced by symmetry in deviations above and below this constant level.
However, numerical analysis shows that also in these cases a case II maximum
occurs. In general, the steeper the new
function, the lower the amplitude ratio
A2/A1 has to be to give a case II maximum. The
location of this maximum is shifted
to higher values of s (by
some 10% for σ1=5.σ2 ), to offset
the lower angles in the tail portion of h.
Extension to higher‑dimensional cases.
The extension to higher-dimensional cases proceeds by treating each dimension separately:
tan(ηi) ≡ ∂ E/∂ xi → tan (ζi) ≡ ∂ Enew/∂ xi by ηi → ζi
For a case II maximum a sufficient condition is therefore that α3i > ßi holds for every dimension i.
[ This requirement can be relaxed - if a length is
defined - to: the length of the vector consisting of the angles of the new basin should be at least the length of the
vector with the angles of the existing basin:
║anglesnew║
> ║anglesexisting║.
For example: consider a
two-dimensional space and add a
new basin with slopes tan(p)
and tan(q) to either a
neutral territory, or the flank of an existing basin with slopes tan(x)
and tan(y).
The vector with elements ∆tanxi is the same in both cases and therefore
has the same length.
For the change in angle, set: c = arctan(tan(x)+tan(p)), d = arctan(tan(y)+tan(q)).
The change in angle becomes (c-x, d-y) vs. (p,q)
The length of the first vector in Euclidean space is
√((c-x)² + (d-y)²),
while the length of the second vector is √(p² + q²).
For a case II maximum p and q have to fulfill:
( arctan(tan(x)+tan(p)) - x )² + ( arctan(tan(y)+tan(q)) - y )² > p² + q² .
If |p| > |x| (the same condition as α3 > ß in the geometric examples) then, with p and x of opposite sign and x chosen positive, it follows that p < -x, so that, in absolute value:
|tan(x) + tan(p)| = |tan(x+p)| . (1 - tan(x)tan(p)) > |tan(x+p)|
and with a similar condition for q and y the above requirement is fulfilled.
Even if p is such that tan(x) + tan(p) < tan(x+p), q can be chosen arbitrarily greater than y to compensate for p. The exact compensation between different dimensions depends on the metric. Since it is not clear beforehand that a Euclidean metric is most appropriate, the stricter requirement, formulated above, will be used. ]
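The two-dimensional requirement can be checked for concrete angles with a short Python sketch. The function names and the sample angles (in radians) are assumptions for illustration; x and y are the existing slope angles, p and q those of the new basin, with opposite sign as on the overlapping flank.

```python
import math

def flank_change(existing, added):
    """Angle change when a slope tan(added) is added to a slope tan(existing)."""
    return math.atan(math.tan(existing) + math.tan(added)) - existing

def exceeds_neutral(x, y, p, q):
    """True if the squared Euclidean length of the angle change on the flank
    exceeds that in neutral territory (p^2 + q^2)."""
    on_flank = flank_change(x, p) ** 2 + flank_change(y, q) ** 2
    neutral = p ** 2 + q ** 2
    return on_flank > neutral

# New angles larger in magnitude than the existing ones: requirement met.
print(exceeds_neutral(0.3, 0.2, -0.5, -0.4))
# New angles smaller in magnitude than the existing ones: requirement not met.
print(exceeds_neutral(0.5, 0.5, -0.2, -0.2))
```

The first call corresponds to |p| > |x| and |q| > |y|; the second shows that the inequality can fail when the new basin is shallower than the existing flank.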
The requirement α3i > ßi can be demonstrated using geometric examples of energy basins with rotational symmetry, i.e. cones or 'hypercones'. Consider cone‑shaped energy basins over a two-dimensional space. The comparison of the cases I and III is analogous to that in one dimension, by the rotational symmetry of the cones. This can be seen as follows. Add the basins
h1(x,y) = -(R - √(x²+y²)) . tan ß and h2(x,y) = -(r - √(x²+y²)) . tan α .
Take ∂(h1+h2)/∂x at y=p; the result is: (tan ß + tan α) . x / √(x²+p²).
The result is analogous to that in the central plane (p=0) with a multiplicative factor that does not affect the comparison between case I and case III. In the y-direction, the same result is found, by the rotational symmetry.
In case II, the center of the new cone is located over
the edge of the existing cone (Fig. 6). As in one dimension, the angle of the
cone outside the cross‑section is equal to α3.

Figure 6. Cone-shaped energy basins, seen from below the
horizontal plane; case II.
In the plane through the centers of the two basins (the
''central plane''), the one-dimensional situation is recreated. However, for
the overall change, parallel planes have to be considered. In parallel planes,
the basins have a hyperbolic shape, the slopes of both basins are more shallow
and their overlap is smaller.
To examine such cases, the effect of a shift has to be studied. For two-dimensional shifted cases, align the x-dimension with the direction of the shift, so that the basins are:
h1(x,y) = -(R - √(x²+y²)) . tan ß, and
h2(x,y) = -(r - √((x-ε)²+y²)) . tan α.
The angle change in the x-direction is: |arctan(h'1+h'2) - arctan(h'1)|.
With h'1 = x.tan ß / √(x²+y²) and
h'1+h'2 = x.tan ß / √(x²+y²) + (x-ε).tan α / √((x-ε)²+y²),
h'1 < 0 for x < 0 and h'1 > 0 for x > 0,
h'2 < 0 for x < ε and h'2 > 0 for x > ε,
h'1+h'2 < 0 for x < ε/2 and h'1+h'2 > 0 for x > ε/2
(the symmetry for h'1+h'2 shifts to lower values of x depending on tan α > tan ß.)
Combining this information to evaluate the absolute value of the angle change, the total angle change becomes:
∫_{ε-r}^{ε} ( arctan(h'1) - arctan(h'1+h'2) ) dx + ∫_{ε}^{ε+r} ( arctan(h'1+h'2) - arctan(h'1) ) dx
Case I is a minimum if the angle change increases with ε. To test this, the expression can be differentiated with respect to ε. The result is:
2.arctan(h'1(ε)) - 2.arctan(h'1(ε)+h'2(ε))
+ arctan(h'1(ε+r)+h'2(ε+r)) + arctan(h'1(ε-r)+h'2(ε-r))
- arctan(h'1(ε+r)) - arctan(h'1(ε-r)).
Note that h'2 is antisymmetric around ε. Write +∆ for h'2(ε+r) and -∆ for h'2(ε-r). The first two terms cancel since h'2(ε) = 0. The remaining terms are:
arctan(h'1(ε+r) + ∆) + arctan(h'1(ε-r) - ∆) - arctan(h'1(ε+r)) - arctan(h'1(ε-r)).
The asymmetry in the first two terms is greater than that in the last two terms, due to the ±∆ terms. In conclusion:
∂(total angle change)/∂ε > 0.
This means that case I is a minimum for angle change in the x-direction.
In the y-direction, the angles are
∂h1/∂y = y.tan ß / √(x²+y²) and ∂h2/∂y = y.tan α / √((x-ε)²+y²)
For y > 0 both angles are positive. For y < 0 both angles are negative, with the same absolute value for the difference. Take y > 0; the total angle change is:
∫_{ε-r}^{ε+r} ( arctan(∂h1/∂y + ∂h2/∂y) - arctan(∂h1/∂y) ) dx
Taking the derivative of this expression with respect to ε:
∂(total angle change)/∂ε =
arctan(∂h1/∂y + ∂h2/∂y)│x=ε+r - arctan(∂h1/∂y + ∂h2/∂y)│x=ε-r
- arctan(∂h1/∂y)│x=ε+r + arctan(∂h1/∂y)│x=ε-r
Writing ∂h2/∂y as ∆, ∂h1/∂y│x=ε+r as p and ∂h1/∂y│x=ε-r as q, the expression becomes:
arctan(p+∆) - arctan(q+∆) - arctan(p) + arctan(q)
For ε > 0: p < q, and the above expression is positive, due to the decreasing slope of the arctan function. Therefore:
∂(total angle change)/∂ε > 0.
This confirms that case I is a minimum.
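The sign of the last expression can be checked for sample values with a small Python sketch. The function name and the numerical values of p, q and ∆ are assumptions for illustration (with 0 ≤ p < q and ∆ > 0, as for y > 0 and ε > 0).

```python
import math

def y_direction_derivative(p, q, delta):
    """arctan(p + delta) - arctan(q + delta) - arctan(p) + arctan(q)."""
    return (math.atan(p + delta) - math.atan(q + delta)
            - math.atan(p) + math.atan(q))

# Because arctan flattens for larger arguments, adding delta raises the
# angle more at the smaller slope p than at the larger slope q.
for p, q, delta in [(0.2, 0.6, 0.5), (0.0, 1.0, 0.3), (0.4, 0.5, 2.0)]:
    value = y_direction_derivative(p, q, delta)
    print(f"p={p}, q={q}, delta={delta}: {value:.4f}")
```

All sample values are positive, in line with the decreasing slope of the arctan function for positive arguments.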
For the x-direction the angle
change decreases again for ε
> R, approaching case
III. (More precisely, for every level of y, the trend reverses around
ε > Ry, where Ry is the width of the existing basin at the
given level of y.) The angle
change in the y-direction continues to increase up to case III.
Coming in from case III, two opposite trends are encountered. Since the angles in the x-direction are greatest at the edge of the new basin, while the angles in the y-direction are still close to 0, the increase in angle change in the x-direction will outweigh the facilitation in the y-direction. The initial trend will therefore be towards greater angle change.
The description above will vary with Ry and ry (as well as α and ß). For example, case II will not be a maximum since the angle change in parallel planes will still increase with further overlap. The exact location of the maximum is not easily calculated.
However, for the
purpose of the discussion in Part II, it is sufficient that there is
a maximum on the slope of the existing
basin.
Similar conclusions can be reached for n-dimensional
cases (n>2). Again, the comparison between case I and case III
holds, because of
rotational symmetry. The
calculations for the additional dimensions in shifted cases are similar to
those for y in the two-dimensional example.
For absolute coordinates (i.e. there is no rotation
possible to align ε-shift with x-direction), the reasoning
above proceeds with the ε-shift projected onto different
dimensions. Each of these dimensions is comparable to the x-direction in
the example above, with a shift equal to the projection of the ε-shift
onto this dimension.
Part II. Application in descriptions of concept learning.
Perceptual categories.
Perceptual categories, in which perceptual inputs are
grouped according to their similarity, can be considered basic concepts (Ashby
and Maddox 2005). The formation and the use of perceptual categories can be
studied in the context of neural network models. Here, classification of inputs is linked to the system dynamics,
by describing the attractors for the system (Amari 1983, Cohen and Grossberg
1983). Slightly differing inputs are classified as similar if they lead the
system to the same basin of attraction. For human perception this means that nearly closed figures are perceived as closed (i.e. the Gestalt principle of "closure", Köhler 1940, Kanizsa 1979), since there is no separate attractor for this type of figure.
In general, according to this description, perception is
guided by basins of attraction within the phase space of the perceiving
system.
Concepts.
Concepts can be considered more general than perceptual
categories, since they can include mental representations of input and more
abstracted items that are not necessarily linked to perception (Hilgard and
Bower 1975, p5-6, Schyns and Rodet 1995, Ashby and Maddox 2005). As a result,
the system-theoretical description will become more general: since a concept
can be vague and can incorporate other concepts, it corresponds to an
attracting region within the phase space rather than an attractor itself
(Dekker 1998, Chapter
4).
The absence of the appropriate concept may be deduced
from failure or errors in problem solving.
For example: failing to find an alternative function of an object
('functional fixedness', Duncker 1945)
or the failure to recognize a more efficient problem solving method
after repeated use of a less efficient one ('set', Luchins 1942).
Concept formation.
Concept formation or concept learning has been studied
experimentally under a variety of headings (Hilgard and Bower 1975, p6). Where
the task involves creating subsets of stimuli, the learning task is called
stimulus categorization or classification. This learning can be supervised,
with experimenter-defined categories, or unsupervised where the subject has to
produce a categorization. If the stimuli are patterns, the learning of one or
more categories of patterns is studied in pattern recognition. If the stimuli
have to be classified on a single dimension, the task is usually called
stimulus discrimination. In such tasks the response that is reinforced depends
on the presence of different stimuli.
In the process of concept formation, new inputs are
classified through a combination of (exemplar-based) generalization and
(rule-based) discrimination (Ashby and Maddox 2005, for examples see Roberts
1996, Saunders et al. 1996). These basic mechanisms also occur in stimulus category formation by animals (Roberts 1996, Zentall 1996).
Transfer.
In general, the formation of a new concept will be
influenced by concepts learned previously (transfer). Both positive and
negative transfer in concept formation or stimulus discrimination have been
demonstrated experimentally.
For example, in animals a difficult discrimination can
be learned faster after training on an easy version of the task than after an
equal amount of training on the task itself (transfer along a continuum,
Mackintosh 1974, p593-597).
An example of negative transfer is the finding that it
is hard to learn that a previously irrelevant attribute in categorization
becomes a valid predictor of category membership (Estes 1994, p163-166). A
similar phenomenon occurs in discrimination learning in animals. After
discriminating between compound stimuli, subsequent discrimination on a
previously redundant attribute is impaired ('blocking', Mackintosh 1974,
p582-583). This is even more dramatically demonstrated (both for animals and
for human subjects) by first explicitly training on one attribute of the
stimuli and then switching to a previously irrelevant attribute. This switch
(called extradimensional shift) is more difficult than learning a new - or even
reversed - discrimination for the trained attribute (intradimensional shift)
(Mackintosh 1974, p597-598).
Interestingly, the difference between
these two types of shift is reduced when the presentation of irrelevant
attributes is minimized during initial training (Mackintosh 1974, p598) or when
the extradimensional shift is to values different from those presented
previously (Kruschke 1996).
Negative transfer can involve completely new categories: it is difficult to form a subcategory of stimuli that have already been categorized as similar in a previous task (Estes 1994, p166), even though they can be distinguished by untrained subjects (Estes 1994, p166, compare blocking vs. concurrent training, Hilgard and Bower 1975, p572). This may also explain a finding from maze learning in rats: if one of two paths with the same starting point and end point is blocked, the rat will choose a completely different path, rather than distinguishing between the two (Deutsch and Clarkson 1959, Hilgard and Bower 1975, p147).
The strength of the previous categories can be
manipulated. With increasing categorization, stimuli within a category are
perceived as more similar (Homa et al. 1979). In addition, the dissimilarity
between categories increases (Homa et al. 1979). Changing the rules of categorization early or late in learning
does not affect the rate of learning itself, but the drop in performance (and
the extent of new learning that is required) is greater after a late change in
the categorization rule (Estes 1994, p62-63). Similarly, overtraining on a
discrimination will in general facilitate an intradimensional shift, but has a
neutral to negative effect on extradimensional shifts (Mackintosh 1974, p607).
Finally, it has been found that 'set' becomes stronger over trials: after
presentation of a greater number of examples in which only one solution could
be used, discovery of a more efficient solution in subsequent examples is less
likely (Luchins 1942).
Interpretation.
If the categorization of inputs is allowed to follow basins of attraction, it is expected that new inputs are accommodated within existing concepts (deepening the existing basins, case I minimum), or by the creation of new concepts in neutral territory (case III relative minimum). The modification of existing concepts will be avoided (case II maximum). Positive transfer is expected for strengthening existing concepts (case I minimum), while negative transfer is expected when new learning requires the formation of a concept on the edge of an existing one (case II maximum). The finding that extradimensional shift is easier if it involves shifting to values not present during training suggests that the new concepts do not overlap the existing ones on the previously irrelevant dimension (case III relative minimum). Finally, the positive transfer effect observed in transfer along a continuum may be explained by noting that a case II maximum can be avoided by starting the creation of the new basin in the region of the case III relative minimum.
Discussion.
The choice for arctan(x) has been
influenced by the tan(α) in
energy landscapes. This choice can be generalized to cases where there is a local energy function (but no global
energy function). At first glance, there would be no difference in taking arctan(fi(x)) even when the
system is not a gradient system, but this would no longer allow a geometric
interpretation.
The arctan function is neutral to the
mechanism of change of the system dynamics and can therefore be used when this
mechanism is not yet known. (In cases where it is allowed to make assumptions
about the changing process, a more familiar mathematical formulation can be
used, see below.)
The arctan function expresses the action of a process operating on f(x). Although one could identify f(x) with the input and arctan(f(x)) with the sigmoid activation function of a higher-order system, the nature of the changing process is left unspecified. At this point it is even left open whether the changing process is additive or gradient-sensitive. Consider for example the addition of energy:
E1 → E1 + E2
The gradient vector field can be resolved into additive components:
∂ (E1 + E2)/ ∂ xi = ∂ E1/∂ xi + ∂ E2/∂ xi,
and this is similar for case I, II or III. However, the length and direction of the gradient vector differ across cases I, II and III. If the changing process depends on E2, it is not gradient-sensitive; if it depends on the gradient vectors, it is gradient-sensitive.
Another example is the addition of g(x) to dx/dt = f(x) to produce dx/dt = f(x) + g(x). In other words:
fnew = fold + g.
The addition of g(x) is independent of f(x) and any gradient-sensitivity has to come from dx/dt and appear in the state-transition equations. (In contrast, if fnew = fold . g, the changing process would be gradient-sensitive at all levels.)
Brockett (1976) discusses systems of the form:
dx/dt = f(x) + u(t).g(x).
where u(t) is a 'control'; u(t) can be seen as an input, and g(x) the immediate result of this input. Under certain conditions, f(x) and g(x) remain separable, also at the level of the integrated system equations: the state-transition equations, where f(x) generates an additive term in the exponent exp( ....+ tf) x0. In these cases, there is no gradient-sensitivity. In other cases, the functions f and g interact: their Lie bracket [f,g] is not zero and generates additional terms in the state transition equation (Brockett 1976, Choquet-Bruhat et al. 1977, p158). Therefore, for systems of this form, gradient-sensitivity can be expressed as the Lie bracket between functions. Examples of calculations of the Lie brackets will be given in the Appendix.
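As a minimal illustration of interacting and non-interacting functions, the following Python sketch evaluates a one-dimensional Lie bracket numerically. The sign convention [f,g] = g'.f - f'.g, the central-difference step and the sample functions are assumptions made for this example and are not taken from the Appendix.

```python
def derivative(func, x, step=1e-5):
    """Central-difference estimate of d func / dx at x."""
    return (func(x + step) - func(x - step)) / (2 * step)

def lie_bracket(f, g, x):
    """One-dimensional Lie bracket, assumed convention: [f, g] = g'.f - f'.g."""
    return derivative(g, x) * f(x) - derivative(f, x) * g(x)

def f(x):
    return x

def g_proportional(x):
    return 2 * x      # proportional to f: no interaction expected

def g_quadratic(x):
    return x * x      # not proportional to f: non-zero bracket

x0 = 1.5
print(lie_bracket(f, g_proportional, x0))
print(lie_bracket(f, g_quadratic, x0))
```

In this sketch the bracket vanishes when g is a multiple of f and equals x² for g(x) = x², so that only the second pair would generate additional terms in the state-transition equation.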
For multidimensional cases, the length of a difference vector according to Euclidean metric has been selected. There is some experimental evidence that an alternative metric is more appropriate.
Blough (1972) studied generalization gradients for stimuli varying in one or two dimensions. The gradient for two-dimensional stimuli was shallower than for one-dimensional stimuli (Blough 1972, Fig. 1 vs. Fig. 8 or 10). Assuming that the intensity of a response to a stimulus is determined by its distance to a learned stimulus, these data provide information as to the metric involved in the calculation of distance. The dominance metric, in which the distance along one dimension can outweigh the distance along another:
ds² = max ( |x|, |y| )
could be excluded by these findings. The city-block metric:
ds² = |x| + |y|
and the Euclidean metric
ds² = x² + y²
gave a better fit, but the response rates suggested a multiplicative interaction between dimensions. Although not considered by Blough (1972), the metric for a non-orthogonal frame
ds² = x² + y² + a.|x|.|y| (a constant)
includes a multiplicative term and would therefore be more appropriate than the Euclidean metric.
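The four expressions can be compared directly in a short Python sketch. The function names and the sample values are assumptions for illustration; each function returns the right-hand side as written in the text above.

```python
def dominance(x, y):
    """Right-hand side of the dominance metric: max(|x|, |y|)."""
    return max(abs(x), abs(y))

def city_block(x, y):
    """Right-hand side of the city-block metric: |x| + |y|."""
    return abs(x) + abs(y)

def euclidean(x, y):
    """Right-hand side of the Euclidean metric: x^2 + y^2."""
    return x * x + y * y

def non_orthogonal(x, y, a):
    """Right-hand side for a non-orthogonal frame: x^2 + y^2 + a.|x|.|y|."""
    return x * x + y * y + a * abs(x) * abs(y)

x, y = 3.0, 4.0
print(dominance(x, y), city_block(x, y), euclidean(x, y), non_orthogonal(x, y, a=1.0))
```

Only the last expression contains a product of the two coordinates, so that a difference along one dimension changes the contribution of the other.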
The angle change is integrated over the
volume of the basin. This integration represents the total effect of the
changing process over all possible trajectories in the basin. The effect of
volume can be demonstrated experimentally: categories that occupy a wider
volume are more difficult to learn than narrower concepts (Homa and Cultice
1984).
Since the mechanism of change is not specified, the conclusions in part I can also be used to explore the formation of energy basins by an unknown system. If the ease with which a basin can be formed depends on its location relative to existing basins, it can be deduced that there is a sensitivity to the change required to create a new basin (case I ≠ case II ≠ case III). If the sensitivity to change avoids greater angle changes (in order of ease: case I > case III > case II), it is likely that there is a resistance to change. Note that passive diffusion would tend to reduce steeper slopes and produce results opposite to those discussed in the first part of this chapter. Resistance to change therefore points to control and/or feedback in the changing process. A similar reasoning can be applied to volume sensitivity, mentioned above. If it is more difficult to create a basin of greater volume, there is some feedback on the changing process that acts more strongly against changes over a greater volume.
Cone-shaped basins are an illustration of basins with a global minimum at a point. Their generalization to arbitrary energy basins with a global minimum at a single point includes energy basins around point attractors for dynamical systems. However, the description in part I applies also to changes in the slopes of basins around higher-dimensional attractors, for attracting regions in general and for basins around transient attractors. With these additional examples, the description can be applied to concept formation, discussed in part II, and to transfer effects in concept learning in general.
The experimental examples in Part II form an illustration, but no verification of the principles described in part I, since they do not provide measurement of the system variables. Similarly, verification will not be possible through simulation with models (e.g. neural network models, Buhmann 1995, Schyns and Rodet 1995). Although such models can re-create the principles (for example, similarity criteria for classifying new inputs will produce case I minima, while resistance to change in the system parameters (or inertia) can be introduced to create case II maxima), they do not show that such mechanisms occur in practice.
A more controlled test is possible by restricting the experimental situation to one in which the distance between inputs can be manipulated systematically along a single, well-defined dimension (e.g. unidimensional concept formation, Feldman 1997). By presenting selected examples, the formation of categories can be directed at various places along this dimension, and a specific difficulty is expected for the formation of a new category at the flank of an existing one. In addition, the size of a category can be manipulated, and greater difficulty is expected for the formation of a category that requires change over a greater volume. By manipulating the overlap between categories, additional aspects of the changing process can be detected. The difficulty will increase with the overlapping volume, i.e. it will vary with the nth power of the overlap distance R + r - |c1 - c2| in n dimensions (for overlap distance < r). This principle may be used to estimate the number of dimensions along which changes have to take place to form a new category.
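As a minimal sketch of how such an estimate could be made, the exponent n can be read off as the slope of log(difficulty) against log(overlap distance). The sketch assumes that a scalar difficulty measure is available and follows the power law stated above exactly; the function names and the synthetic data are hypothetical and serve only as illustration.

```python
import math


def overlap_distance(R, r, c1, c2):
    """Overlap distance R + r - |c1 - c2| of two categories (0 if disjoint)."""
    return max(0.0, R + r - abs(c1 - c2))


def estimate_dimensions(overlaps, difficulties):
    """Least-squares slope of log(difficulty) versus log(overlap distance).

    Under the assumed power law difficulty = k * overlap**n, the slope is n.
    """
    xs = [math.log(d) for d in overlaps]
    ys = [math.log(y) for y in difficulties]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data: existing category of radius R at c1, new category of
# radius r placed at several centres c2, difficulty generated with n = 3.
R, r, c1 = 2.0, 1.0, 0.0
centres = [2.2, 2.4, 2.6, 2.8]
overlaps = [overlap_distance(R, r, c1, c2) for c2 in centres]  # all below r
difficulties = [0.5 * d ** 3 for d in overlaps]

n_hat = estimate_dimensions(overlaps, difficulties)
print("estimated number of dimensions:", n_hat)
```

With measured rather than synthetic difficulties the slope will be noisy, so the estimate should be read as an indication of the order of n rather than an exact count.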
It would be interesting to measure functional neuroimaging responses during the classification of single stimuli; both qualitative and quantitative differences are expected depending on the location of the stimulus. Specifically, a higher energy use and/or activity in additional brain regions can be predicted for stimuli at the location of a case II maximum.
Acknowledgement.
The author gratefully acknowledges helpful suggestions by Prof. Dr. D. Siersma and Dr. E. P. van den Ban, who commented on previous drafts of this manuscript.
References.
Amari S-I. (1983): Field theory of self-organizing neural nets. IEEE Trans. Systems, Man Cybern. SMC 13, 741-748.
Ashby F. G., Maddox W. T. (2005): Human Category Learning. Ann. Rev. Psychol. 56, 149-178.
Blough D. S. (1972): Recognition by the pigeon of stimuli varying in two dimensions. J. Exp. Anal. Behav. 18, 345-367.
Brockett R. W. (1976): Nonlinear systems and differential geometry. Proc. IEEE 64, 61-72.
Buhmann J. M. (1995): Data clustering and learning. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks. The MIT Press, Cambridge, Massachusetts.
Choquet-Bruhat Y., Dewitt-Morette C., Dillard-Bleick M. (1977): Analysis, Manifolds and Physics. North Holland Publishing Co., Amsterdam.
Cohen M. A., Grossberg S. (1983): Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans. Syst. Man Cybern. 13, 815-826.
Dekker A. J. (1998): Neurochemical networks, nonlinear systems and functional neuroimaging. Report series of the faculty of mathematics and technical informatics, Delft University of Technology, Delft, the Netherlands.
Deutsch J. A., Clarkson J. K. (1959): Reasoning in the hooded rat. Quart. J. Exp. Psychol. 11, 150-154.
Duncker K. (1945): On problem solving. (Translated by L. S. Rees from the 1935 original). Psychol. Monogr. 58, no. 270.
Estes W. K. (1994): Classification and Cognition. Oxford University Press, Oxford.
Feldman J. (1997): The structure of perceptual categories. J. Math. Psychol. 41, 145-170.
Goles E. (1995): Energy functions for neural networks. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks. The MIT Press, Cambridge, Massachusetts.
Hilgard E. R., Bower G. H. (1975): Theories of Learning. Prentice-Hall Inc., Englewood Cliffs, New Jersey.
Homa D., Rhoads D., Chambliss D. (1979): Evolution of conceptual structure. J. Exp. Psychol. (Human Learning and Memory) 5, 11-23.
Homa D., Cultice J. (1984): Role of feedback, category size and stimulus distortion on the acquisition and utilization of ill-defined categories. J. Exp. Psychol. (Learning, Memory and Cognition) 10, 83-94.
Kanizsa G. (1979): Organization in Vision: Essays on Gestalt Perception. Praeger Publishers, New York.
Köhler W. (1940): Dynamics in Psychology. Liveright Publishing Corp., New York.
Kruschke J. K. (1996): Dimensional relevance shifts in category learning. Connection Science 8, 225-248.
Luchins A. S. (1942): Mechanization in problem solving: the effect of 'Einstellung'. Psychol. Monogr. 54, no. 248.
Mackintosh N. J. (1974): The Psychology of Animal Learning. Academic Press, New York.
Roberts W. A. (1996): Stimulus generalization and hierarchal structure in categorization by animals. In: Zentall T. R. and Smeets P. M. (Eds.) Stimulus Class Formation in Humans and Animals. (Advances in Psychology 117). Elsevier Science B.V., Amsterdam.
Saunders K. J., Williams D. C., Spradlin J. E. (1996): Derived stimulus control: are there differences among procedures and processes? In: Zentall T. R. and Smeets P. M. (Eds.) Stimulus Class Formation in Humans and Animals. (Advances in Psychology 117). Elsevier Science B.V., Amsterdam.
Schyns P. G., Rodet L. (1995): Concept learning. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks. The MIT Press, Cambridge, Massachusetts.
Zentall T. R. (1996): An analysis of stimulus class formation in animals. In: Zentall T. R. and Smeets P. M. (Eds.) Stimulus Class Formation in Humans and Animals. (Advances in Psychology 117). Elsevier Science B.V., Amsterdam.
Appendix. Lie brackets of interacting gradients.
Consider systems of the form: dx/dt = f(x), where a control or input is added: dx/dt = f(x) + u(t).g(x).
Let f(x) generate the vector field w and g(x) the vector field v; then $[f,g] = \mathcal{L}_v w$, with
$$(\mathcal{L}_v w)^i = \sum_q v^q \frac{\partial w^i}{\partial x^q} - \sum_q w^q \frac{\partial v^i}{\partial x^q}$$
(Choquet-Bruhat et al. 1977, p. 148).
[Note: superscript indices for contravariant variables.]
Case III.
If the basins are only defined within a certain radius, w and v are 0 outside that radius and $\mathcal{L}_v w$ equals 0 for case III. If the vector fields are defined as unit vector fields outside the basins, case III is purely additive.
Case I.
$\mathcal{L}_v w$ equals zero for identical vector fields in case I, and differs from zero if the added basin is steeper than the existing one. This is comparable to the minimum angle requirement for the arctan function.
Case II.
Example 1. Set $w^i = -x^i$, $v^i = c^i - x^i$. Then
$$(\mathcal{L}_v w)^i = v^i \cdot (-1) - w^i \cdot (-1) = -v^i + w^i = -c^i.$$
The magnitude of the Lie derivative increases with the offset c, as long as the edge of the basin is not crossed. This relation is sufficient to produce a case II maximum.
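The component formula for the Lie derivative can be checked numerically. The following is a minimal sketch that approximates the partial derivatives by central differences and evaluates the linear fields of Example 1; the function name `lie_derivative`, the step size `eps`, the offset `c` and the evaluation point are illustrative assumptions.

```python
def lie_derivative(v, w, x, eps=1e-6):
    """Components sum_q v^q dw^i/dx^q - sum_q w^q dv^i/dx^q at the point x.

    v and w map a list of coordinates to a list of vector components;
    the partial derivatives are approximated by central differences.
    """
    n = len(x)
    vx, wx = v(x), w(x)
    result = [0.0] * n
    for q in range(n):
        xp, xm = list(x), list(x)
        xp[q] += eps
        xm[q] -= eps
        dw = [(a - b) / (2.0 * eps) for a, b in zip(w(xp), w(xm))]
        dv = [(a - b) / (2.0 * eps) for a, b in zip(v(xp), v(xm))]
        for i in range(n):
            result[i] += vx[q] * dw[i] - wx[q] * dv[i]
    return result


# Example 1 in two dimensions, with a hypothetical offset c.
c = [0.5, -1.0]


def w(x):
    return [-xi for xi in x]


def v(x):
    return [ci - xi for ci, xi in zip(c, x)]


bracket = lie_derivative(v, w, [0.3, 0.2])
print("Lie derivative components:", bracket)
print("minus the offset:         ", [-ci for ci in c])
```

Because both fields are linear, the result does not depend on the evaluation point, in agreement with the constant value found in Example 1.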
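Example 2. One-dimensional Gaussian functions:
$E = \exp(-x^2)$, $F = \exp(-(x-p)^2)$, $v = \partial F/\partial x$, $w = \partial E/\partial x$.
$$\mathcal{L}_v w = \frac{\partial F}{\partial x}\frac{\partial^2 E}{\partial x^2} - \frac{\partial E}{\partial x}\frac{\partial^2 F}{\partial x^2} = (-8x^2 p + 8xp^2 - 4p)\,\exp(-(x-p)^2 - x^2).$$
Using $\exp(-(x-p)^2 - x^2) = \exp(-2(x - p/2)^2 - p^2/2)$, the substitution $y = x - p/2$, under which the polynomial factor becomes $-8py^2 + 2p^3 - 4p$, and the integrals
$$\int_{-\infty}^{\infty} \exp(-ay^2)\,dy = \frac{\sqrt{\pi}}{\sqrt{a}}, \qquad \int_{-\infty}^{\infty} y\,\exp(-ay^2)\,dy = 0, \qquad \int_{-\infty}^{\infty} y^2 \exp(-ay^2)\,dy = \frac{\sqrt{\pi}}{2a^{3/2}},$$
the overall interaction can be calculated:
$$\int_{-\infty}^{\infty} \left[\frac{\partial E}{\partial x}, \frac{\partial F}{\partial x}\right] dx = \sqrt{2\pi}\,(p^3 - 3p)\,\exp(-p^2/2).$$
This function is 0 for p = 0, has its largest extrema at $p = \pm\sqrt{3 - \sqrt{6}} \approx \pm 0.74$, changes sign at $p = \pm\sqrt{3}$, has smaller extrema of opposite sign at $p = \pm\sqrt{3 + \sqrt{6}} \approx \pm 2.33$, and approaches 0 for $p \to \infty$ and $p \to -\infty$. The zero at p = 0, the extremum at intermediate offset and the decay at large offset are comparable to cases I, II and III with the arctan function, except for the sign.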
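A direct numerical quadrature gives an independent check on the overall interaction as a function of the offset p. The sketch below evaluates the integrand from the analytic derivatives of E and F and integrates it with a composite trapezoidal rule; the function names, the integration limits and the number of steps are illustrative assumptions.

```python
import math


def bracket_density(x, p):
    """F'(x) E''(x) - E'(x) F''(x) for E = exp(-x^2), F = exp(-(x - p)^2)."""
    E = math.exp(-x * x)
    F = math.exp(-(x - p) ** 2)
    dE = -2.0 * x * E
    dF = -2.0 * (x - p) * F
    d2E = (4.0 * x * x - 2.0) * E
    d2F = (4.0 * (x - p) ** 2 - 2.0) * F
    return dF * d2E - dE * d2F


def overall_interaction(p, lo=-12.0, hi=12.0, n=4800):
    """Composite trapezoidal rule for the integral of bracket_density over x.

    The limits are assumed wide enough for the Gaussian tails to be negligible.
    """
    h = (hi - lo) / n
    total = 0.5 * (bracket_density(lo, p) + bracket_density(hi, p))
    for k in range(1, n):
        total += bracket_density(lo + k * h, p)
    return total * h


for p in (0.0, 0.5, 1.0, 2.0, 4.0):
    print(p, overall_interaction(p))
```

Tabulating the result over a finer grid of p values shows where the interaction vanishes and where it is strongest, which can be compared with the three cases discussed for the arctan function.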