Chapter 5: Successive formation of energy basins as a model for transfer effects in learning.

(Updated: 18 Sept. 2002,  15 July, 20 Oct., 5,19 Nov. 2004, 13 May 2006)

Summary.

Introduction.

Part I: Energy basins.

Part II. Application in descriptions of concept learning.

Discussion.

Acknowledgement.

References.

Appendix: Lie brackets of interacting gradients.

 

 

 

Summary.

 

The change in slope required for the formation of a new energy basin in an energy landscape is studied for varying locations relative to an existing basin. A local maximum for  this  change is found when the new basin is located on the edge of the existing energy basin. Such a local maximum may explain cases in learning where previously acquired concepts impair the formation of new ones.

 

 

 

Introduction.

 

Consider a system dx/dt = f(x) with an energy function h, so that dxi/dt = fi(x) = ∂h/∂xi (Goles 1995). The energy function induces slopes in an energy landscape:

       dxi/dt = fi(x) = ∂h/∂xi = tan αi .

To examine the change in  energy landscape and its dependence on the existing landscape, define:

       anglei = arctan (∂ h/∂ xi)

For h → hnew,  anglei → anglei,new.

The difference between arctan (x+Δ) and arctan(x) + arctan(Δ) expresses how a pre-existing landscape influences the ease with which the final situation can be reached. Mathematically, the arctan function is a natural way to express the non-additivity of gradients.
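The non-additivity can be made concrete with a few numbers. The following is a minimal sketch; the function names and the slope values are illustrative assumptions, not taken from the text. It prints the change in angle produced by the same added slope Δ on flat ground, on a slope of the same sign, and on a slope of opposite sign.

```python
import math

def angle_change(existing_slope: float, added_slope: float) -> float:
    """Change in slope angle when added_slope is added to existing_slope."""
    return math.atan(existing_slope + added_slope) - math.atan(existing_slope)

def non_additivity(existing_slope: float, added_slope: float) -> float:
    """arctan(x + D) - (arctan(x) + arctan(D)); zero when the existing slope is zero."""
    return angle_change(existing_slope, added_slope) - math.atan(added_slope)

if __name__ == "__main__":
    delta = 0.8  # slope added by the new basin (illustrative value)
    for x in (0.0, 0.6, -0.6):  # flat ground, same sign, opposite sign
        print(f"x = {x:+.1f}   angle change = {angle_change(x, delta):+.4f}   "
              f"non-additivity = {non_additivity(x, delta):+.4f}")
```

On a slope of the same sign the added slope changes the angle by less than it would on flat ground; on a slope of opposite sign (and smaller magnitude) it changes the angle by more.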

Here, the following question will be studied: given an energy landscape with one basin, how difficult will it be to form a new basin?  The change required to produce a new basin will vary with its location relative to the existing basin. This relation will be explored in the first part of this Chapter, starting with simple basins where trigonometric relations for tan(x) can be used, and extending the results to more complicated forms.

In the second part of this Chapter, the formation of new basins will be compared to learning phenomena, where previously learned concepts qualitatively influence the formation of new ones.

 

 

Part I: Energy basins.

 

Theorem.

If an energy basin with radius r and center c2 is formed in a one-dimensional energy landscape with an existing basin (radius R ≥ r, center c1), for a given volume of this new basin the total change in slope angle is minimal for c2 - c1 = 0, maximal for c2 - c1 = R and intermediate for c2 - c1 ≥ R+r, provided that the slope angle of the new basin is equal to that of the existing basin.

 

Proof. Consider symmetrical basins of triangular shape in a   one-dimensional energy landscape. Three cases for the location of the new basin can be examined:

Case I: new basin with the same center (c2 - c1 = 0) (Fig. 1a).

Case II: new basin centered on the edge of an existing basin   (c2 - c1 = R) (Fig. 1b).

Case III: new basin centered outside existing basin  (c2 - c1 ≥ R + r) (Fig. 1c).

 

 

Figure 1. Triangular basins. Original basin: R=50, slope angle ß=0.17π,  new basin: r=20,  slope angle α3 = 0.22π, as shown in (c). The combined slope angle in (a) is α1 = 0.30π. The combined slope angle on the flank in (b) is α2= 0.06π. Note: all angles and sums of angles are assumed to be between ‑π/2 and +π/2 radians.

 

 

In case I, the energy function changes from:

   f(x) = -(R-|c1- x|) . tan ß          (for:  |c1- x| ≤  R,   0  elsewhere.)

 to:

    fnew(x) = f(x) + ∆f(x)

where  ∆f(x) =  -(r-|c2- x|) . tan α3      (for:   |c2- x| ≤ r  and   0  elsewhere.)

In the overlapping area,  the resulting angle α1  is defined by setting :  

    fnew(x)   =  -(r-|c2- x|) . tanα1       (assuming r < R,  and  c1=c2).

(α1,2,3  and ß are defined positive, e.g. the left-hand side of the existing basin has angle - ß)

The volume of the added  basin is the area between the new and the old energy functions:

   Δ area = 2r.(fnew -f) = r².(tan α1 ‑ tan ß)

The total change in angle is calculated by taking the absolute value of the local change  across the basin:  |arctan(dfnew/dx) - arctan(df/dx) |  =  α1 – ß    over the entire width of the new basin.

In case II:

   Δ area = r².(tan α3)/2 + r²(tan ß)/2 + r²(tan α2)/2  

The change in angle is α2 + ß over one half and α3 over the other half of the new basin.

In case III:

   Δ area = r² (tan α3)

The change in angle is α3 over the entire new basin.

 

 

The change in area should be equal in all cases. By comparing case I and III:

   tan α1 ‑ tan ß  =  tan α3

Since tan(α1 - ß) = (tan α1 - tan ß) / (1 + tan α1 . tan ß) < tan α1 - tan ß = tan α3, it follows that α3 > α1 - ß for all positive ß. This implies that the change in angle is greater in case III than in case I. For cases II and III, the requirement of equal change in area now determines:

   tan α2 + tan ß  =  tan α3

If  α3 > ß  (for a given angle ß of the existing basin),  then  α2 > 0, so that:

   tan (α2 + ß) > tan α2 + tan ß = tan α3

and therefore: (α2 + ß) > α3. In total, the required change in angle is greater in case II than in case III.
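The comparison of the three cases can be checked numerically. The sketch below is an illustration only: the function name is hypothetical, and the values are those of the caption of Fig. 1. It computes α1 and α2 from the equal-area relations above and returns the total change in angle for each case.

```python
import math

def case_changes(beta: float, alpha3: float, r: float) -> dict:
    """Total change in slope angle for cases I, II and III (triangular basins).

    beta   : slope angle of the existing basin (radians, > 0)
    alpha3 : slope angle of the new basin in neutral territory (radians, > 0)
    r      : half-width of the new basin
    """
    # Case I: slopes of equal sign add, tan(alpha1) = tan(alpha3) + tan(beta).
    alpha1 = math.atan(math.tan(alpha3) + math.tan(beta))
    # Case II: slopes of opposite sign, tan(alpha2) = tan(alpha3) - tan(beta).
    alpha2 = math.atan(math.tan(alpha3) - math.tan(beta))
    return {
        "I": 2.0 * r * (alpha1 - beta),            # alpha1 - beta over the full width
        "II": r * (alpha2 + beta) + r * alpha3,    # one half on the flank, one half outside
        "III": 2.0 * r * alpha3,                   # alpha3 over the full width
    }

if __name__ == "__main__":
    changes = case_changes(beta=0.17 * math.pi, alpha3=0.22 * math.pi, r=20.0)
    for name, value in changes.items():
        print(f"case {name:3s}: total change in angle = {value:.2f} rad")
```

With α3 > ß the ordering case I < case III < case II is obtained, in agreement with the derivation above.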

 

Case I represents a global minimum and case II a global maximum. This can be shown by examining other locations for the new basin; for R > 2r,  consider a location shifted  ε  to the right from case I,   as shown in Fig. 2.

 

 

Figure 2: Shifted location of new basin compared to case I.

 

 

In this case, the right half of the new basin follows case I. In Fig. 2a, the left part has an overlap of 1 - ε with the left part of case I and of ε with case II (with ε expressed as a fraction of r). The outcome is a weighted average between case I and II. The situation for ε = 1 is shown in Fig. 2b. In total, the change in angle for shifted cases can be summarized as shown in Fig. 3.

 

If the new basin is wider than in the examples given above (i.e. r > R/2), the different cases will overlap. For example, if r=R and the new basin is ε to the left of case II, the change is

   2.ε.(case I) + 2.(1-ε).(case II)

Since (case I) < (case II), the maximal change is found for ε → 0. A similar reasoning applies to locations to the right of case II. Therefore, the location where a maximal change is required remains the same. Similarly, the location of minimal change will remain the center of the existing basin (case I).

 

Figure 3. Change in angle (in π/100 rad.) for various positions of the new basin. R=50, r=20, ß=0.14π, α3=0.22π.
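The dependence on position can also be traced by direct integration. The following sketch assumes that the new basin is formed by adding the same triangular function at every position, so that the added volume is automatically equal in all cases. The function names are hypothetical; the parameter values are those of the caption of Fig. 3, and no claim is made that the output reproduces the scaling of the figure.

```python
import math

def existing_slope(x: float, R: float, beta: float) -> float:
    """Slope of the existing triangular basin, centred at x = 0."""
    if -R < x < 0:
        return -math.tan(beta)
    if 0 < x < R:
        return math.tan(beta)
    return 0.0

def added_slope(x: float, d: float, r: float, alpha3: float) -> float:
    """Slope contributed by the new triangular basin, centred at x = d."""
    if d - r < x < d:
        return -math.tan(alpha3)
    if d < x < d + r:
        return math.tan(alpha3)
    return 0.0

def total_angle_change(d: float, R: float, r: float, beta: float, alpha3: float) -> float:
    """Integral of |arctan(f' + df') - arctan(f')| over the new basin.

    Both slopes are piecewise constant, so evaluating each segment between
    consecutive breakpoints at its midpoint gives the exact integral.
    """
    breakpoints = {-R, 0.0, R, d - r, d, d + r}
    points = sorted(p for p in breakpoints if d - r <= p <= d + r)
    total = 0.0
    for left, right in zip(points, points[1:]):
        mid = 0.5 * (left + right)
        before = existing_slope(mid, R, beta)
        after = before + added_slope(mid, d, r, alpha3)
        total += (right - left) * abs(math.atan(after) - math.atan(before))
    return total

if __name__ == "__main__":
    R, r = 50.0, 20.0
    beta, alpha3 = 0.14 * math.pi, 0.22 * math.pi
    for d in range(0, 81, 5):
        change = total_angle_change(float(d), R, r, beta, alpha3)
        print(f"c2 - c1 = {d:2d}   total change in angle = {change:6.2f} rad")
```

For R > 2r the profile rises from the minimum at c2 - c1 = 0 to a plateau, peaks at c2 - c1 = R and falls to the case III level at c2 - c1 = R + r.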

 

 

For r > R, the analysis discussed above can be applied with the existing basin and the new basin interchanged. The location of  maximal  change  (case II)  will therefore be at c2 - c1 = r.

 

 

The result can be generalized to basins with shapes other than the triangular shape. For 'trapezium elements' as shown in Figure 4:

 

Figure 4: Trapezium element, case I. R=50, r=20, ß=0.17π, α1=0.30π over horizontal distance z=10.

 

 

   Δ area = 2. Δ area(triangle, with r replaced by z) + 2.(r ‑ z).z.(tan α1 ‑ tan ß)

 

It can be easily seen that the central areas (over which there is no change in the angle of the existing basin) follow the same proportionality as the triangles, and the same conditions apply as in the triangular case. Asymmetrical cases can be treated as the average of two symmetrical trapezium elements. Since the proportionality is not affected by z, the result can be generalized to infinitesimal trapezium elements (z → 0) and their sum, and thereby to any shape for which α3 > ß everywhere. ∎

 

Remark: Under the condition discussed above: α3 > ß, case II represents a global maximum and case I a global minimum for the change in angle. A different picture emerges if the condition is not met. 

If  α3 = ß then  case II = case III.

If  α3 < ß  then  case II < case III. 

In the latter situation, a small change in angle can be easier on the edge of an existing basin than in neutral territory. However, as soon as the slope is made more shallow (smaller new ß), the condition    α3 < ß will be more restrictive. Eventually, the existing basin is neutralized by successively shallower new basins.

These results also imply that the condition: α3 > ß everywhere (that is: α3i > ßi, for every stretch i) cannot be relaxed to a condition for the sum of angles:  Σ α3i > Σ ßi, since this would not ensure that the change in angle in case II for the entire basin exceeds that in case III.  The sufficient condition to ensure a case II maximum over a sum of stretches is

   ∑ | arctan( tan(-α3i) + tan ßi ) - ßi |   >   ∑ | α3i |

(Here, α3i and  ßi  may be positive as well as negative;  note that in the overlapping area α3i has a sign opposite to ßi )

 

 

Gaussian functions.

 

As an additional illustration, the analysis discussed above can also be applied to Gaussian functions. The extrema of the change in angle when one function is added to another

   f(x) = g(x) + h(x‑s)

are given by:

   \[ \frac{d}{ds} \int_{-\infty}^{\infty} \left| \arctan\big(g'(x)+h'(x-s)\big) - \arctan\big(g'(x)\big) \right| \, dx = 0 \]

where g'(x) = dg(x)/dx.

For Gaussian functions: 

   g(x) = - exp(‑x²/2σ1²)  and 

   h(x‑s) = - exp(‑(x‑s)²/2σ2²).

All terms in the integrand approach 0 for s → ±∞,  which is the relative minimum discussed before as case III.

 

The equation for the extrema can be simplified to :

   \[ 0 = \int_{-\infty}^{0} \frac{d}{ds} \arctan\big(g'(x)+h'(x-s)\big) \, dx \; - \; \int_{0}^{\infty} \frac{d}{ds} \arctan\big(g'(x)+h'(x-s)\big) \, dx \]

 

By  differentiating  with  respect to s,  it can be seen that all functions and derivatives are even,   when  s = 0, so that both terms cancel. The solution s = 0 represents the global minimum discussed before as case I.

 

Under certain conditions, there is also a global maximum, discussed before as case II. Numerical analysis shows that the location of this maximum is approximately:

   s  ≈  √(2π).(σ1 + σ2)/2.

This value for s differs from that where the steepest parts of both functions are added (s =  σ1 + σ2 ).  However, the approximation for s can be understood from the analogy with case II for the triangular basins.  The condition:  α3 = ß   with  r = R   is comparable to a constant sum:

   g(x) + h(x-s) over the interval 0 to s,   with σ1 = σ2   (see Fig. 5).  

When the tail parts of the functions are ignored, the area defined by g+h in the interval 0 to s is:

   width . maximal depth  =  s . 1,

so that: 

   s ≈  (area g)/2 + (area h)/2  =  √(2π).(σ1 + σ2)/2.

The constant depth shown in Fig. 5 is the average for the sum of two Gaussian functions. The symmetry of the two functions in Fig. 5 is an approximation for Gaussian functions.  
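The numerical analysis mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the original computation: the integration window, the trapezoidal rule, the grid of shifts and all function names are choices made here. The script scans the shift s and prints the location of the largest total change in angle next to the approximation √(2π).(σ1 + σ2)/2, so that the two can be compared for given σ1, σ2 and amplitudes.

```python
import math

def gaussian_slope(x: float, sigma: float, amplitude: float = 1.0) -> float:
    """Derivative of -amplitude * exp(-x**2 / (2 * sigma**2))."""
    return amplitude * x / sigma**2 * math.exp(-x * x / (2.0 * sigma**2))

def total_angle_change(s: float, sigma1: float, sigma2: float,
                       a1: float = 1.0, a2: float = 1.0, n: int = 4001) -> float:
    """Trapezoidal estimate of the integral of
    |arctan(g'(x) + h'(x - s)) - arctan(g'(x))| over a window covering both functions."""
    half_width = abs(s) + 8.0 * max(sigma1, sigma2)  # tails beyond this are negligible
    dx = 2.0 * half_width / (n - 1)
    total = 0.0
    for k in range(n):
        x = -half_width + k * dx
        g1 = gaussian_slope(x, sigma1, a1)
        h1 = gaussian_slope(x - s, sigma2, a2)
        weight = 0.5 if k in (0, n - 1) else 1.0
        total += weight * abs(math.atan(g1 + h1) - math.atan(g1))
    return total * dx

def predicted_shift(sigma1: float, sigma2: float) -> float:
    """Approximation from the text: s = sqrt(2 pi) (sigma1 + sigma2) / 2."""
    return math.sqrt(2.0 * math.pi) * (sigma1 + sigma2) / 2.0

if __name__ == "__main__":
    sigma1 = sigma2 = 0.4
    shifts = [0.05 * k for k in range(61)]  # s from 0 to 3
    changes = [total_angle_change(s, sigma1, sigma2) for s in shifts]
    best = shifts[changes.index(max(changes))]
    print(f"largest change on the grid at s = {best:.2f}")
    print(f"approximation from the text:  s = {predicted_shift(sigma1, sigma2):.2f}")
    print(f"change at s = 0: {changes[0]:.3f},  far away (s = 10): "
          f"{total_angle_change(10.0, sigma1, sigma2):.3f}")
```

The amplitudes a1 and a2 can be varied to explore the conditions under which a maximum on the flank appears.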

 

Figure 5. Schematic diagram of the sum of two functions that are approximately Gaussian.

 

 

For steep functions (σ=0.4), the amplitude of the new function (A2) should be at least that of the existing function (A1) to produce a case II maximum. For shallower functions (greater σ), the tail parts of the function, where α3 < ß, are proportionally larger, so that the required amplitude increases. For example, when σ=1 a case II maximum occurs only when A2 ≥ 1.8 * A1.

The location of the maximum is not affected by amplitude factors. This can be seen by calculating the area between 0 and s (with A2 > A1):

   \[ \text{area} = s \, A_1 + \int_{0}^{s} \big( A_2 \, h(x-s) + A_1 \, g(x) - A_1 \big) \, dx \]

With g(x) + h(x-s) ≈ 1, the integrand is: (A2 - A1).h(x-s).

For both functions approximately half of the total area is between 0 and s, so that the right-hand side becomes:

   s.A1 + (A2 - A1).√(2π).σ2/2,

and the left-hand side:

   A1.√(2π).σ1/2 + A2.√(2π).σ2/2.

Therefore:   s.A1 ≈ A1.√(2π).σ1/2 + A1.√(2π).σ2/2,

                          s ≈ √(2π).σ1/2 + √(2π).σ2/2,

which is the same expression for s  as found before.

When σ1 and σ2 are strongly different, the approximation g(x) + h(x-s) = constant cannot be used and has to be replaced by symmetry in deviations above and below this constant level. However, numerical analysis shows that also in these cases a case II maximum occurs. In general, the steeper the new function, the lower the amplitude ratio A2/A1 has to be to give a case II maximum. The location of this maximum is shifted to higher values of s (by some 10% for σ1 = 5σ2), to offset the lower angles in the tail portion of h.

 

 

Extension to higher‑dimensional cases.

 

The extension to higher-dimensional cases proceeds by treating each dimension separately:

    tan(ηi) ≡ ∂E/∂xi   →   tan(ζi) ≡ ∂Enew/∂xi          by      ηi → ζi

For a case II maximum a sufficient condition is therefore that  α3i  >  ßi  holds  for every dimension i.

[ This requirement can be relaxed - if a length is defined -  to:  the length of the vector consisting of  the angles of the new basin should be at least the length of the vector with the angles of the existing basin:

    ║anglesnew║ >  ║anglesexisting║.

For example: consider a  two-dimensional space and  add a new basin with slopes   tan(p)   and   tan(q) to either a neutral territory,  or  the flank of an existing basin with slopes   tan(x)   and   tan(y).   The vector  with elements  ∆tanxi  is the same in both cases and therefore has the same length. 

For the change in angle,  set:  c =  arctan(tan(x)+tan(p)),   d = arctan(tan(y)+tan(q)).

The change in angle becomes   (c-x, d-y)  vs. (p,q) 

The length of the first vector in Euclidean space is    ((c-x)² + (d-y)²)^(1/2)

while the length of the second vector is:    (p² + q²)^(1/2)

For a case II maximum  p and q have to fulfill:

   ( arctan(tan(x)+tan(p)) - x )²  +  ( arctan(tan(y)+tan(q)) - y )²   >   p² + q² .

If |p| > |x| (the same condition as α3 > ß in the geometric examples), then, with p and x of opposite sign and x chosen positive, it follows that p < -x, so that tan(x).tan(p) < 0 and:

   |tan(x) + tan(p)|   =   |tan(x+p)| . (1 - tan(x)tan(p))   >   |tan(x+p)|

Hence |c| > |x+p| = |p| - x, so that |c - x| = |c| + x > |p|; with a similar condition for q and y the above requirement is fulfilled.

Even if p is such that tan(x) + tan(p) < tan(x+p), q can be chosen arbitrarily greater than y to compensate for p. The exact compensation between different dimensions depends on the metric. Since it is not clear beforehand that a Euclidean metric is most appropriate, the stricter requirement, formulated above, will be used. ]
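The two-dimensional requirement under the Euclidean metric can be tested for given angles. The sketch below is illustrative: the function names and the example angles are assumptions. It compares the squared length of the vector of angle changes on the flank, (c-x, d-y), with that in neutral territory, (p, q).

```python
import math

def combined_angle(existing: float, added: float) -> float:
    """Angle of the slope obtained by adding tan(added) to tan(existing)."""
    return math.atan(math.tan(existing) + math.tan(added))

def flank_exceeds_neutral(x: float, y: float, p: float, q: float) -> bool:
    """True if the Euclidean length of the angle change on the flank (existing
    angles x, y) exceeds that in neutral territory for added angles p, q."""
    c = combined_angle(x, p)
    d = combined_angle(y, q)
    return (c - x) ** 2 + (d - y) ** 2 > p ** 2 + q ** 2

if __name__ == "__main__":
    # New slopes opposite in sign to the existing ones, with |p| > |x| and |q| > |y|.
    print(flank_exceeds_neutral(x=0.3, y=0.2, p=-0.5, q=-0.4))
    # New slopes with the same sign as the existing ones (as in case I).
    print(flank_exceeds_neutral(x=0.3, y=0.2, p=0.5, q=0.4))
```

The first call corresponds to the flank situation treated above; the second to slopes that reinforce each other, for which the change in angle is smaller than in neutral territory.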

 

The requirement α3i  >  ßi  can be demonstrated  using  geometric examples  of energy basins  with  rotational symmetry, i.e. cones or 'hypercones'. Consider cone‑shaped energy basins over a two-dimensional space. The comparison of the cases I and III is analogous to that in one dimension, by the rotational symmetry of the cones.  This can be seen as follows.   Add the basins

    h1(x,y) = - (R - (x²+y²)^(1/2)) . tan ß    and    h2(x,y) = - (r - (x²+y²)^(1/2)) . tan α .

Take ∂ (h1+h2)/∂ x   at  y=p,  the result  is:  ( tan ß  + tan α ) . x / (x²+p²)^(1/2).

The result is analogous to that in the central plane (p=0) with a multiplicative factor that does not affect the comparison between case I and case III.   In the y-direction, the same result is found, by the rotational symmetry.

In case II, the center of the new cone is located over the edge of the existing cone (Fig. 6). As in one dimension, the angle of the cone outside the cross‑section is equal to α3.

 

Figure 6. Cone-shaped energy basins, seen from below the horizontal plane; case II.

 

 

In the plane through the centers of the two basins (the ''central plane''), the one-dimensional situation is recreated. However, for the overall change, parallel planes have to be considered. In parallel planes, the basins have a hyperbolic shape, the slopes of both basins are more shallow and their overlap is smaller.

 

To examine such cases, the effect of a shift has to be studied. For two‑dimensional shifted cases,    align the x-dimension with the direction of the shift, so that the basins are :

   h1(x,y) = -(R - (x²+y²)^(1/2)) . tan ß,      and       h2(x,y) = -(r - ((x-ε)²+y²)^(1/2)) . tan α.

The angle change in the x-direction is:     |arctan(h'1+h'2) - arctan(h'1)|.

With     h'1 = x.tan ß / (x²+y²)^(1/2)    and   h'1+h'2 =  x.tan ß / (x²+y²)^(1/2) +  (x-ε).tan α / ((x-ε)²+y²)^(1/2),

   h'1 < 0    for   x < 0   and   h'1 > 0   for    x > 0,      

   h'2 < 0    for   x < ε    and   h'2> 0   for    x > ε,

   h'1+h'2 < 0 for   x < ε/2 and   h'1+h'2 > 0  for    x > ε/2

(the symmetry for h'1+h'2  shifts to lower values of x depending on   tan α > tan ß.)

Combining this information to evaluate the absolute value for angle change, the total angle change becomes:

   \[ \int_{\varepsilon-r}^{\varepsilon} \big( \arctan(h'_1) - \arctan(h'_1+h'_2) \big) \, dx \; + \; \int_{\varepsilon}^{\varepsilon+r} \big( \arctan(h'_1+h'_2) - \arctan(h'_1) \big) \, dx \]

Case I is a minimum if the angle change increases with  ε. To test this,  the expression can be differentiated with respect to ε.  The result is:

   2 . arctan(h'1(ε)) -  2 . arctan(h'1(ε) + h'2(ε))

   +  arctan(h'1(ε+r) + h'2(ε+r)) + arctan(h'1(ε-r) + h'2(ε-r))

   -  arctan(h'1(ε+r)) - arctan(h'1(ε-r)).

Note that h'2 is antisymmetric around ε.   Write  +∆  for  h'2(ε+r)  and  -∆  for  h'2(ε-r).  The first two terms cancel since  h'2(ε) = 0.  The remaining terms are:

   arctan(h'1(ε+r) + ∆)  +  arctan(h'1(ε-r) - ∆)   -   arctan(h'1(ε+r)) - arctan(h'1(ε-r)).

The asymmetry in the first two  terms is greater than that in the last two terms,  due to the  ±∆ factors.  In conclusion:

   ∂ (total angle change) / ∂  ε  > 0.

This means that case I is a minimum for angle change in the x-direction.

 

In the y-direction, the angles are

    ∂h1/∂y =  y.tan ß / (x²+y²)^(1/2)   and   ∂h2/∂y = y.tan α / ((x-ε)²+y²)^(1/2)

For  y > 0  both angles are positive. For   y < 0   both angles are negative, with the same absolute value for the difference.  Take y > 0,  the total angle change is:

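   \[ \int_{\varepsilon-r}^{\varepsilon+r} \left( \arctan\!\left( \frac{\partial h_1}{\partial y} + \frac{\partial h_2}{\partial y} \right) - \arctan\!\left( \frac{\partial h_1}{\partial y} \right) \right) dx \]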

Taking the derivative of this expression with respect to   ε:

    ∂ (total angle change) / ∂ ε =

       arctan(∂h1/∂y + ∂h2/∂y)│x=ε+r  -  arctan(∂h1/∂y + ∂h2/∂y)│x=ε-r

       -  arctan(∂h1/∂y)│x=ε+r  +  arctan(∂h1/∂y)│x=ε-r

Writing  ∂h2/∂y  (which takes the same value at x=ε+r and x=ε-r)  as  ∆,   ∂h1/∂y│x=ε+r  as  p   and   ∂h1/∂y│x=ε-r  as   q,  the expression  becomes:

     arctan(p+∆)  -  arctan(q+∆)  -  arctan(p)  +  arctan(q)

For  ε > 0:     p < q,  and the above expression is positive, due to the decreasing slope of the arctan function.  Therefore:

      ∂ (total angle change) / ∂  ε  > 0.

This confirms that case I is a minimum.

 

For the x-direction the angle change decreases again for  ε > R,  approaching case III. (More precisely, for every level of  y, the trend reverses around   ε  > Ry, where Ry  is the width of the existing basin at the given level of y.)  The angle change in the y-direction continues to increase up to case III.

Coming in from case III, two opposite trends are encountered.  Since the angles in the x-direction are greatest at the edge of the new basin, while the angles in the y-direction are still close to 0, the increase in angle change in the x-direction will outweigh the facilitation in the y-direction. The initial trend will therefore be towards greater angle change.

The description above will vary with Ry and ry (as well as α and ß). For example, case II will not be a maximum since the angle change in parallel planes will still increase with further overlap. The exact location of the maximum is not easily calculated. However, for the purpose of the discussion in Part II, it is sufficient that there is a maximum on the slope of the existing basin.
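Since the location of the maximum is not easily calculated, a numerical scan may be helpful. The sketch below rests on several assumptions made here for illustration: the angle change is integrated over the disc covered by the new cone, using a midpoint rule on a square grid; the x- and y-components are reported separately, so that no metric for combining them is implied; the function names and the step size are hypothetical. The parameter values are those of the caption of Fig. 1.

```python
import math

def cone_gradient(x: float, y: float, cx: float, radius: float, angle: float):
    """Gradient of the cone -(radius - rho) * tan(angle), where rho is the
    distance to the centre (cx, 0); the cone is zero outside its radius."""
    dx, dy = x - cx, y
    rho = math.hypot(dx, dy)
    if rho == 0.0 or rho >= radius:
        return 0.0, 0.0
    t = math.tan(angle)
    return t * dx / rho, t * dy / rho

def total_angle_change(eps: float, R: float, r: float, beta: float,
                       alpha: float, step: float = 0.5):
    """Angle change in the x- and y-direction, integrated over the new cone
    (centre (eps, 0), radius r) added to the existing cone (centre (0, 0), radius R)."""
    n = int(round(2.0 * r / step))
    total_x = total_y = 0.0
    for i in range(n):
        x = eps - r + (i + 0.5) * step
        for j in range(n):
            y = -r + (j + 0.5) * step
            if math.hypot(x - eps, y) >= r:
                continue  # outside the new basin
            g1x, g1y = cone_gradient(x, y, 0.0, R, beta)
            g2x, g2y = cone_gradient(x, y, eps, r, alpha)
            total_x += abs(math.atan(g1x + g2x) - math.atan(g1x)) * step * step
            total_y += abs(math.atan(g1y + g2y) - math.atan(g1y)) * step * step
    return total_x, total_y

if __name__ == "__main__":
    R, r = 50.0, 20.0
    beta, alpha = 0.17 * math.pi, 0.22 * math.pi
    for eps in range(0, 81, 5):
        tx, ty = total_angle_change(float(eps), R, r, beta, alpha)
        print(f"shift = {eps:2d}   x-direction: {tx:8.1f}   y-direction: {ty:8.1f}")
```

The printed profile shows how each component varies between case I (shift 0) and case III (shift R + r or greater) for the chosen parameters.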

 

Similar conclusions can be reached for n-dimensional cases (n>2). Again, the comparison between  case I  and  case III  holds,  because  of  rotational  symmetry. The calculations for the additional dimensions in shifted cases are similar to those for y in the two-dimensional example.

 

For absolute coordinates (i.e. there is no rotation possible to align ε-shift with x-direction), the reasoning above proceeds with the ε-shift projected onto different dimensions. Each of these dimensions is comparable to the x-direction in the example above, with a shift equal to the projection of the ε-shift onto this dimension.

 

 

 

 

Part II. Application in descriptions of concept learning.

 

Perceptual categories.

Perceptual categories, in which perceptual inputs are grouped according to their similarity, can be considered basic concepts (Ashby and Maddox 2005). The formation and the use of perceptual categories can be studied in the context of neural network models.  Here, classification of inputs is linked to the system dynamics, by describing the attractors for the system (Amari 1983, Cohen and Grossberg 1983). Slightly differing inputs are classified as similar if they lead the system to the same basin of attraction. For human perception this means that nearly closed figures are perceived as closed (i.e. the Gestalt principle of "closure", Köhler 1940, Kanizsa 1979), since there is no separate attractor for this type of figure.

In general, according to this description, perception is guided by basins of attraction within the phase space of the perceiving system.  

 

Concepts.

Concepts can be considered more general than perceptual categories, since they can include mental representations of input and more abstracted items that are not necessarily linked to perception (Hilgard and Bower 1975, p5-6, Schyns and Rodet 1995, Ashby and Maddox 2005). As a result, the system-theoretical description will become more general: since a concept can be vague and can incorporate other concepts, it corresponds to an attracting region within the phase space rather than an attractor itself (Dekker 1998,  Chapter 4).

The absence of the appropriate concept may be deduced from failure or errors in problem solving.  Examples are the failure to find an alternative function of an object ('functional fixedness', Duncker 1945) and the failure to recognize a more efficient problem solving method after repeated use of a less efficient one ('set', Luchins 1942).

 

 

Concept formation.

Concept formation or concept learning has been studied experimentally under a variety of headings (Hilgard and Bower 1975, p6). Where the task involves creating subsets of stimuli, the learning task is called stimulus categorization or classification. This learning can be supervised, with experimenter-defined categories, or unsupervised where the subject has to produce a categorization. If the stimuli are patterns, the learning of one or more categories of patterns is studied in pattern recognition. If the stimuli have to be classified on a single dimension, the task is usually called stimulus discrimination. In such tasks the response that is reinforced depends on the presence of different stimuli.

In the process of concept formation, new inputs are classified through a combination of (exemplar-based) generalization and (rule-based) discrimination (Ashby and Maddox 2005, for examples see Roberts 1996,  Saunders et al. 1996). These basic mechanisms also occur in stimulus category formation by animals (Roberts 1996, Zentall 1996).

 

 

Transfer.

In general, the formation of a new concept will be influenced by concepts learned previously (transfer). Both positive and negative transfer in concept formation or stimulus discrimination have been demonstrated experimentally.

For example, in animals a difficult discrimination can be learned faster after training on an easy version of the task than after an equal amount of training on the task itself (transfer along a continuum, Mackintosh 1974, p593-597). 

An example of negative transfer is the finding that it is hard to learn that a previously irrelevant attribute in categorization becomes a valid predictor of category membership (Estes 1994, p163-166). A similar phenomenon occurs in discrimination learning in animals. After discriminating between compound stimuli, subsequent discrimination on a previously redundant attribute is impaired ('blocking', Mackintosh 1974, p582-583). This is even more dramatically demonstrated (both for animals and for human subjects) by first explicitly training on one attribute of the stimuli and then switching to a previously irrelevant attribute. This switch (called extradimensional shift) is more difficult than learning a new - or even reversed - discrimination for the trained attribute (intradimensional shift) (Mackintosh 1974,  p597-598). Interestingly, the difference  between these two types of shift is reduced when the presentation of irrelevant attributes is minimized during initial training (Mackintosh 1974, p598) or when the extradimensional shift is to values different from those presented previously (Kruschke 1996).

Negative transfer can involve completely new categories: it is difficult to form a subcategory of stimuli that have already been categorized as similar in a previous task (Estes 1994, p166), even though they can be distinguished by untrained subjects (Estes 1994, p166, compare blocking vs. concurrent training, Hilgard and Bower 1975, p572).  This may also explain a finding from maze learning in rats: if one of two paths with the same starting point and end point is blocked, the rat will choose a completely different path, rather than distinguishing between the two (Deutsch and Clarkson 1959, Hilgard and Bower, 1975, p147).

 

The strength of the previous categories can be manipulated. With increasing categorization, stimuli within a category are perceived as more similar (Homa et al. 1979). In addition, the dissimilarity between categories increases (Homa et al. 1979).  Changing the rules of categorization early or late in learning does not affect the rate of learning itself, but the drop in performance (and the extent of new learning that is required) is greater after a late change in the categorization rule (Estes 1994, p62-63). Similarly, overtraining on a discrimination will in general facilitate an intradimensional shift, but has a neutral to negative effect on extradimensional shifts (Mackintosh 1974, p607). Finally, it has been found that 'set' becomes stronger over trials: after presentation of a greater number of examples in which only one solution could be used, discovery of a more efficient solution in subsequent examples is less likely (Luchins 1942).

 

 

Interpretation.

If the categorization of inputs is allowed to follow basins of attraction, it is expected that new inputs are accommodated within existing concepts (deepening the existing basins, case I minimum), or by the creation of new concepts in neutral territory (case III relative minimum).  The modification of existing concepts will be avoided (case II maximum).  Positive transfer is expected for strengthening existing concepts (case I minimum), while negative transfer is expected when new learning requires the formation of a concept on the edge of an existing one (case II maximum).  The finding that extradimensional shift is easier if it involves shifting to values not present during training suggests that the new concepts do not overlap the existing ones on the previously irrelevant dimension (case III relative minimum).  Finally, the positive transfer effect observed in transfer along a continuum may be explained by noting that a case II maximum can be avoided by starting the creation of the new basin in the region of the case III relative minimum.

 

 

 

 

Discussion.

 

The choice for arctan(x) has been influenced by the tan(α) in energy landscapes. This choice can be generalized to cases where there is  a local energy function (but no global energy function). At first glance, there would be no difference in taking arctan(fi(x)) even when the system is not a gradient system, but this would no longer allow a geometric interpretation. 

The arctan function is neutral to the mechanism of change of the system dynamics and can therefore be used when this mechanism is not yet known. (In cases where it is allowed to make assumptions about the changing process, a more familiar mathematical formulation can be used, see below.)

 

The arctan function expresses the action of a  process operating on f(x). Although one could identify f(x) with the input and arctan(f(x)) with the sigmoid activation function of a higher-order system, the nature of the changing process is left unspecified. At this point it is even left open whether the changing process is additive or gradient-sensitive. Consider for example the addition of energy:

    E1  → E1 + E2 

The gradient vector field can be resolved into additive components:

    ∂ (E1 + E2)/ ∂ xi  = ∂ E1/∂ xi  +  ∂ E2/∂ xi,

and this is similar for case I, II or III.  However, the length and direction of the gradient vector differ across cases I, II and III.  If the changing process depends on E2, it is not gradient-sensitive; if it depends on the gradient vectors, it is gradient-sensitive.

Another example is the addition of g(x) to dx/dt = f(x) to produce dx/dt = f(x) + g(x). In other words:

     fnew = fold + g.

The addition of g(x) is independent of f(x) and any gradient-sensitivity has to come from dx/dt and appear in the state-transition equations. (In contrast, if   fnew = fold . g,  the changing process would be gradient-sensitive at all levels.)

Brockett (1976) discusses systems of the form:

    dx/dt = f(x) + u(t).g(x).

where u(t) is a 'control';  u(t) can be seen as an input, and g(x) the immediate result of this input. Under certain conditions, f(x) and g(x) remain separable, also at the level of the integrated  system equations: the state-transition equations, where f(x) generates an additive term in the exponent exp( ....+ tf) x0. In these cases, there is no gradient-sensitivity. In other cases, the functions f and g interact: their Lie bracket  [f,g]  is not zero and generates additional terms in the state transition equation (Brockett 1976, Choquet-Bruhat et al. 1977, p158). Therefore, for systems of this form, gradient-sensitivity can be expressed as the Lie bracket between functions. Examples of calculations of the Lie brackets will be given in the Appendix.
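As a small illustration of a vanishing and a non-vanishing bracket, the sketch below evaluates the Lie bracket of two scalar functions by central differences. It assumes the convention [f,g] = (dg/dx).f - (df/dx).g, which is common for vector fields but may differ in sign from the convention in the references cited; the function name and the example functions are hypothetical.

```python
def lie_bracket(f, g, x: float, h: float = 1e-5) -> float:
    """[f, g](x) = g'(x) f(x) - f'(x) g(x) for scalar functions f and g,
    with the derivatives estimated by central differences of step h."""
    df = (f(x + h) - f(x - h)) / (2.0 * h)
    dg = (g(x + h) - g(x - h)) / (2.0 * h)
    return dg * f(x) - df * g(x)

if __name__ == "__main__":
    f = lambda x: x
    proportional = lambda x: 3.0 * x   # g proportional to f: bracket vanishes
    quadratic = lambda x: x * x        # bracket is x**2 and does not vanish
    print(lie_bracket(f, proportional, 2.0))
    print(lie_bracket(f, quadratic, 2.0))
```

In the first case f and g do not interact; in the second case the bracket contributes an additional term.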

 

For multidimensional cases, the length of a difference vector according to Euclidean metric has been selected. There is some experimental evidence that an alternative metric is more appropriate.

Blough (1972) studied generalization gradients for stimuli varying in one or two dimensions. The gradient for two-dimensional stimuli was shallower than for one-dimensional stimuli (Blough 1972, Fig. 1 vs. Fig. 8 or 10). Assuming that the intensity of a response to a stimulus is determined by its distance to a learned stimulus, these data provide information as to the metric involved in the calculation of distance. The dominance metric, in which the distance along one dimension can outweigh the distance along another:

     ds = max ( |x|, |y| )

could be excluded by these findings. The city-block metric:

     ds = |x| + |y|

and the Euclidean metric

     ds² = x² + y²

gave a better fit, but the response rates suggested a multiplicative interaction between dimensions. Although not considered by Blough (1972), the metric for a non-orthogonal frame

     ds² = x² + y² +  a.|x|.|y|           (a constant)

includes a multiplicative term and would therefore be more appropriate than the Euclidean metric.
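The four metrics can be compared side by side. In the sketch below the dominance and city-block metrics are taken in their usual first-power form, and the Euclidean and non-orthogonal metrics as the square root of the quadratic expressions; these readings, the function names and the example values are assumptions made for illustration.

```python
import math

def dominance(x: float, y: float) -> float:
    """Distance determined by the largest difference along any dimension."""
    return max(abs(x), abs(y))

def city_block(x: float, y: float) -> float:
    """Sum of the differences along each dimension."""
    return abs(x) + abs(y)

def euclidean(x: float, y: float) -> float:
    """Square root of x**2 + y**2."""
    return math.hypot(x, y)

def non_orthogonal(x: float, y: float, a: float) -> float:
    """Square root of x**2 + y**2 + a |x| |y|, with a constant a >= 0."""
    return math.sqrt(x * x + y * y + a * abs(x) * abs(y))

if __name__ == "__main__":
    x, y = 3.0, 4.0
    print("dominance     :", dominance(x, y))
    print("city-block    :", city_block(x, y))
    print("Euclidean     :", euclidean(x, y))
    for a in (0.0, 1.0, 2.0):
        print(f"non-orthogonal, a = {a}:", non_orthogonal(x, y, a))
```

With a = 0 the non-orthogonal form reduces to the Euclidean metric and with a = 2 to the city-block metric, so that the constant a interpolates between the two.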

 

The angle change is integrated over the volume of the basin. This integration represents the total effect of the changing process over all possible trajectories in the basin. The effect of volume can be demonstrated experimentally: categories that occupy a wider volume are more difficult to learn than narrower concepts (Homa and Cultice 1984).

 

Since the mechanism of change is not specified, the conclusions in part I can also be used to explore the formation of energy basins by an unknown system. If the ease with which a basin can be formed depends on its location relative to existing basins, it can be deduced that there is a sensitivity to the change required to create a new basin (case I ≠ case II ≠ case III). If the sensitivity to change avoids greater angle changes (ease of formation: case I > case III > case II), it is likely that there is a resistance to change. Note that passive diffusion would tend to reduce steeper slopes and produce results opposite to those discussed in the first part of this chapter. Resistance to change therefore points to control and/or feedback in the changing process.   A similar reasoning can be applied to volume sensitivity, mentioned above. If it is more difficult to create a basin of greater volume, there is some feedback on the changing process that acts more strongly against changes over a greater volume.

 

Cone-shaped basins are an illustration of basins with a global minimum at a point. Their generalization to arbitrary energy basins with a global minimum at a single point includes energy basins around point attractors for dynamical systems. However, the description in Part I also applies to changes in the slopes of basins around higher-dimensional attractors, for attracting regions in general and for basins around transient attractors. With these additional examples, the description can be applied to concept formation, discussed in Part II, and to transfer effects in concept learning in general.

 

The experimental examples in Part II illustrate, but do not verify, the principles described in Part I, since they provide no measurements of the system variables. Similarly, verification will not be possible through simulation with models (e.g. neural network models, Buhmann 1995, Schyns and Rodet 1995). Although such models can re-create the principles (for example, similarity criteria for classifying new inputs will produce case I minima, while resistance to change in the system parameters, or inertia, can be introduced to create case II maxima), they do not show that such mechanisms occur in practice.

A more controlled test is possible by restricting the experimental situation to one in which the distance between inputs can be manipulated systematically along a single, defined dimension (e.g. unidimensional concept formation, Feldman 1997). By presenting selected examples, the formation of categories can be directed at various places along this dimension, and a specific difficulty is expected for the formation of a new category at the flank of an existing one. In addition, the size of a category can be manipulated, and greater difficulty is expected for the formation of a category that requires change over a greater volume. By manipulating the overlap between categories, additional aspects of the changing process can be detected. The difficulty will increase with the overlapping volume, i.e. it will vary with the nth power of the overlap distance R + r − |c1 − c2| in n dimensions (with overlap distance < r). This principle may be used to estimate the number of dimensions along which changes have to take place to form a new category.
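
The estimate of n suggested above can be sketched as a log-log regression. This is a minimal illustration under stated assumptions: difficulty is taken to be exactly proportional to the nth power of the overlap distance, R and r are assumed to be the radii of the existing and the new category, c1 and c2 their centres along the manipulated dimension, and the "difficulty" values are synthetic numbers generated inside the script, not experimental data. The function names are hypothetical.

```python
import math


def overlap_distance(R, r, c1, c2):
    """Overlap distance R + r - |c1 - c2|; zero when the categories do not overlap."""
    return max(0.0, R + r - abs(c1 - c2))


def estimate_dimensions(overlaps, difficulties):
    """Least-squares slope of log(difficulty) against log(overlap distance).

    If difficulty = k * overlap**n, the slope of the log-log relation is n.
    All overlaps and difficulties must be strictly positive.
    """
    xs = [math.log(d) for d in overlaps]
    ys = [math.log(v) for v in difficulties]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    R, r, c1 = 2.0, 1.0, 0.0
    centres = [2.2, 2.4, 2.6, 2.8]  # positions of the new category (assumed)
    overlaps = [overlap_distance(R, r, c1, c2) for c2 in centres]
    # Synthetic difficulties for a hypothetical three-dimensional change.
    difficulties = [0.7 * d ** 3 for d in overlaps]
    print("estimated n:", estimate_dimensions(overlaps, difficulties))
```

With measured difficulties (for example, trials to criterion) in place of the synthetic values, the slope would only be an estimate, and its interpretation as a number of dimensions depends on the power-law assumption holding.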

It would be interesting to measure functional neuroimaging responses during the classification of single stimuli; both qualitative and quantitative differences are expected depending on the location of the stimulus. Specifically, a higher energy use and/or activity in additional brain regions can be predicted for stimuli at the location of a case II maximum.   

 

 

 

Acknowledgement.

 

The author gratefully acknowledges helpful suggestions by Prof. Dr. D. Siersma and Dr. E. P. van den Ban, who commented on previous drafts of this manuscript. 

 

 

 

References.

 

Amari S.-I. (1983): Field theory of self-organizing neural nets. IEEE Trans. Syst. Man Cybern. 13, 741-748.

 

Ashby F. G., Maddox W. T. (2005): Human Category Learning. Ann. Rev. Psychol. 56, 149-178.

 

Blough D. S. (1972): Recognition by the pigeon of stimuli varying in two dimensions. J. Exp. Anal. Behav. 18, 345-367.

 

Brockett R. W. (1976): Nonlinear systems and differential geometry. Proc. IEEE 64, 61-72.

 

Buhmann J.  M. (1995): Data clustering and learning. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

Choquet-Bruhat Y., DeWitt-Morette C., Dillard-Bleick M. (1977): Analysis, Manifolds and Physics. North-Holland Publishing Co., Amsterdam.

 

Cohen M. A., Grossberg S. (1983): Absolute stability of global pattern formation and parallel memory storage by competitive neural networks. IEEE Trans. Syst. Man Cybern. 13, 815‑826.

 

Dekker A. J. (1998): Neurochemical networks, nonlinear systems and functional neuroimaging. Report series of the Faculty of Mathematics and Technical Informatics, Delft University of Technology, Delft, the Netherlands.

 

Deutsch J. A., Clarkson J. K. (1959): Reasoning in the hooded rat. Quart. J. Exp. Psychol. 11, 150-154.

 

Duncker K. (1945): On problem solving. (Translated by L. S. Lees from the 1935 original). Psychol. Monogr. 58, no. 270.

 

Estes W. K. (1994): Classification and Cognition. Oxford University Press, Oxford.

 

Feldman J. (1997): The structure of perceptual categories. J. Math. Psychol. 41, 145-170.

 

Goles E. (1995): Energy functions for neural networks. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

Hilgard E. R., Bower G. H. (1975): Theories of Learning. Prentice-Hall Inc. Englewood Cliffs, New Jersey.

 

Homa D., Rhoads D., Chambliss D. (1979): Evolution of conceptual structure. J. Exp. Psychol. (Human Learning and Memory). 5, 11‑23.

 

Homa D., Cultice J. (1984): Role of feedback, category size and stimulus distortion on the acquisition and utilization of  ill-defined categories. J. Exp. Psychol. (Learning, Memory and Cognition) 10, 83-94.

 

Kanizsa G. (1979): Organization in Vision: Essays on Gestalt Perception. Praeger Publishers, New York.

 

Köhler W. (1940): Dynamics in Psychology. Liveright Publishing Corp. New York.

 

Kruschke J. K. (1996): Dimensional relevance shifts in category learning. Connection Science 8, 225-248.

 

Luchins A. S. (1942): Mechanization in problem solving: the effect of 'Einstellung'. Psychol. Monogr. 54, no. 248.

 

Mackintosh N. J. (1974): The Psychology of Animal Learning. Academic Press, New York.

 

Roberts W. A. (1996): Stimulus generalization and hierarchical structure in categorization by animals. In: Zentall T. R. and Smeets P. M. (Eds.) Stimulus Class Formation in Humans and Animals. (Advances in Psychology 117). Elsevier Science B.V. Amsterdam.

 

Saunders K. J., Williams D. C., Spradlin J. E. (1996): Derived stimulus control: are there differences among procedures and processes? In: Zentall T. R. and Smeets P. M. (Eds.) Stimulus Class Formation in Humans and Animals. (Advances in Psychology 117). Elsevier Science B.V. Amsterdam.

 

Schyns P. G., Rodet L. (1995): Concept learning. In: Arbib M. A.  (Ed.)  The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

Zentall T. R. (1996): An analysis of stimulus class formation in animals. In: Zentall T. R. and Smeets P. M. (Eds.) Stimulus Class Formation in Humans and Animals. (Advances in Psychology 117). Elsevier Science B.V. Amsterdam.

 

 

 

 

Appendix.  Lie brackets of interacting gradients.

 

Consider systems of the form dx/dt = f(x), to which a control or input is added: dx/dt = f(x) + u(t)·g(x).

Let f(x) generate the vector field w and g(x) the vector field v; then [f,g] = L_v w, the Lie derivative of w along v.

 

(L_v w)^i = Σ_q v^q ∂w^i/∂x^q − Σ_q w^q ∂v^i/∂x^q      (Choquet-Bruhat et al. 1977, p. 148).

[Note: superscript indices for contravariant variables.]

 

Case III.

If the basins are only defined within a certain radius, w and v are 0 outside that radius and the Lie derivative L_v w equals 0 for case III. If the vector fields are defined as unit vector fields outside the basins, case III is purely additive.

 

 

Case I.

The Lie derivative L_v w equals zero for identical vector fields in case I, and differs from zero if the added basin is steeper than the existing one. This is comparable to the minimum angle requirement for the arctan function.

 

 

Case II.

Example 1. Set w^i = −x^i,  v^i = c^i − x^i. Then

(L_v w)^i = v^i·(−1) − w^i·(−1) = −v^i + w^i = −c^i. The magnitude of the Lie derivative increases with the offset c, as long as the edge of the basin is not crossed. This relation is sufficient to produce a case II maximum.
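
The component formula for the Lie derivative can be checked numerically for this example. The sketch below is an illustration, not part of the original derivation. The helper `lie_derivative` is a hypothetical name, the partial derivatives are approximated by central differences, and the offset c and the evaluation point are arbitrary values chosen for the check.

```python
def lie_derivative(v, w, x, h=1e-5):
    """Approximate (L_v w)^i = sum_q v^q dw^i/dx^q - w^q dv^i/dx^q at point x.

    v and w are functions mapping a list of coordinates to a list of
    vector components; derivatives use central differences with step h.
    """
    n = len(x)
    vx, wx = v(x), w(x)
    result = [0.0] * n
    for q in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[q] += h
        x_minus[q] -= h
        dw = [(a - b) / (2 * h) for a, b in zip(w(x_plus), w(x_minus))]
        dv = [(a - b) / (2 * h) for a, b in zip(v(x_plus), v(x_minus))]
        for i in range(n):
            result[i] += vx[q] * dw[i] - wx[q] * dv[i]
    return result


# Example 1: existing basin centred at the origin, added basin centred at c.
c = [0.5, -1.0]  # arbitrary offset, assumed for illustration


def w(x):
    return [-xi for xi in x]


def v(x):
    return [ci - xi for ci, xi in zip(c, x)]


if __name__ == "__main__":
    # Expected to be close to -c at any point, since both fields are linear.
    print(lie_derivative(v, w, [0.3, 0.7]))
```

Because both fields are linear, the result does not depend on the evaluation point, which matches the constant value −c^i obtained analytically.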

 

Example 2. One-dimensional Gaussian functions:

E = exp(−x²),  F = exp(−(x−p)²),  v = ∂F/∂x,  w = ∂E/∂x.

L_v w = ∂F/∂x · ∂²E/∂x² − ∂E/∂x · ∂²F/∂x² = (−8x²p + 8xp² − 4p) · exp(−(x−p)² − x²).

Using:    exp(−(x−p)² − x²) = exp(−2(x − p/2)² − p²/2),

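             ∫_{−∞}^{∞} exp(−ax²) dx = √(π/a),

             ∫_{−∞}^{∞} x · exp(−ax²) dx = 0,

             ∫_{−∞}^{∞} x² · exp(−ax²) dx = √π / (2a^(3/2)),

and the substitution x = y + p/2, which expresses the polynomial factor in the variable centred on the shifted Gaussian, the overall interaction can be calculated:

             ∫_{−∞}^{∞} [∂E/∂x, ∂F/∂x] dx = (p³ − 3p) · √(2π) · exp(−p²/2).

This function has the value 0 for p = 0, reaches its largest magnitude at p ≈ +0.74 and p ≈ −0.74 (p² = 3 − √6), and approaches 0 for p → ∞ and p → −∞, cases comparable to case I, II and III with the arctan function, except for the sign. Unlike the arctan case, it also changes sign at p = ±√3, with weaker secondary extrema at p ≈ ±2.33 (p² = 3 + √6).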
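
The integrated interaction of the two Gaussian basins can be cross-checked numerically. The sketch below is an illustration under stated assumptions: it evaluates ∂F/∂x · ∂²E/∂x² − ∂E/∂x · ∂²F/∂x² directly from the definitions of E and F, integrates it with the trapezoidal rule over a finite window (chosen by assumption to be wide enough for the Gaussian tails to be negligible), and compares the result with the closed form (p³ − 3p)·√(2π)·exp(−p²/2) that follows when the shift x = y + p/2 is carried through the polynomial factor. The function names are hypothetical.

```python
import math


def bracket(x, p):
    """dF/dx * d2E/dx2 - dE/dx * d2F/dx2 for E = exp(-x^2), F = exp(-(x-p)^2)."""
    E = math.exp(-x * x)
    F = math.exp(-(x - p) ** 2)
    dE = -2.0 * x * E
    d2E = (4.0 * x * x - 2.0) * E
    dF = -2.0 * (x - p) * F
    d2F = (4.0 * (x - p) ** 2 - 2.0) * F
    return dF * d2E - dE * d2F


def integrated_bracket(p, half_width=10.0, steps=4000):
    """Trapezoidal integral of bracket(x, p) over a window centred at x = p/2."""
    lower = p / 2.0 - half_width
    upper = p / 2.0 + half_width
    h = (upper - lower) / steps
    total = 0.5 * (bracket(lower, p) + bracket(upper, p))
    for k in range(1, steps):
        total += bracket(lower + k * h, p)
    return total * h


def closed_form(p):
    """(p^3 - 3p) * sqrt(2*pi) * exp(-p^2 / 2)."""
    return (p ** 3 - 3.0 * p) * math.sqrt(2.0 * math.pi) * math.exp(-p * p / 2.0)


if __name__ == "__main__":
    for p in (0.0, 0.5, 0.74, 1.0, 2.0, 3.0):
        print(p, integrated_bracket(p), closed_form(p))
```

Scanning p over a finer grid with the same functions gives a direct picture of where the interaction vanishes and where it is strongest.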

 

 
