Chapter 3: Gauging the number of variables in a description of brain processes: levels of abstraction.   

(Updated 16 May 2003, 28 April, 16 Dec. 2006, 21 March 2007).

 

Contents:

Summary.

Introduction.

1. Reducing the number of variables in the system description; reducing the number of variables that have to be observed.

2. Increasing the number of variables in the system description; increasing the number of variables that have to be observed.

3. Levels in the system equations, levels of resolution and levels of abstraction.

4. Neurochemical network.

Acknowledgement.

References.

Appendix: levels of abstraction and graph theory.

 

 

Summary.

 

This chapter provides an overview of criteria and techniques to increase or reduce the number of variables in a description of a (nonlinear) system whose system equations are not known. The algorithms for increasing and reducing the number of variables in a network model offer the best route for gauging the required number of variables.

Levels in the system equations are defined, which in combination with levels of resolution lead to admissible levels of abstraction. The practical importance of an admissible level of abstraction is that it allows variables on a deeper level to be ignored. For a description of brain processes, a neurochemical network model is introduced that can be used at various levels of resolution.

 

 

 

Introduction.

 

Descriptions of brain processes span a range of levels of abstraction. At the most detailed level, a description of molecular processes such as reaction kinetics is provided by neurochemistry. At the most abstract level, a description is based on brain regions, to interpret results from functional neuroimaging (e.g. Coull 1998). A complete description at the molecular level would involve an unwieldy number of variables (both in the description and as the number of variables that have to be observed to test a description at this level of detail). A reduction in the number of variables will be required, which will lead to a more abstracted description. On the other hand, the abstracted description leaves out the basic mechanics, and will have to be expanded when more subtle behavioral phenomena are studied. To test these more specific descriptions, a greater number of variables will have to be observed. To gauge the number of variables starting from either extreme, a type of description is needed that can serve at various levels of abstraction.

 

 

 

1. Reducing the number of variables in the system description; reducing the number of variables that have to be observed.

 

In general, a variable can be redundant in two possible ways. First, a variable has a different time constant: it changes at a much slower or faster rate than the remaining variables and can be replaced by a constant or equilibrium value. Second, a variable is dynamically independent: it does not affect the dynamics of the remaining variables and the system description can be based on the remaining variables.

 

 

Different time constant.

 

If there are different time constants in the system, a variable in a process that is slow relative to the duration of observation can be approximated by a constant. In general, the accuracy of this approximation decreases with the duration of observation and increases with the time constant of the slow dynamics. However, even a small change in a variable can affect the dynamics of the others. Therefore, this approach should be tested with dynamic (in)dependence methods (see below).

 

On the other hand, if the time constants are sufficiently different, the description of the slower process can be based on the equilibrium values for the faster process (e.g. Zaslavsky 1985, p98, Amari 1983, Hirsch 1989). Variables of the fast processes can then be considered redundant (e.g. Schauer and Heinrich 1983). By taking a suitable ensemble of similar systems or repeated observations of the same system, the faster variables can be treated as white noise - and statistics, such as state transition probabilities and stationary distribution, can be derived (Goel and Richter-Dyn, 1974, p36 ff.).
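The replacement of a fast variable by its equilibrium value can be illustrated with a minimal sketch. The two-variable system below, its parameter eps (the ratio of time constants) and the function names are hypothetical choices made for illustration only; they are not taken from the cited literature.

```python
# Illustrative fast-slow system (an assumption, not a model from the chapter):
#   dx/dt = -x + y             slow variable, time constant 1
#   dy/dt = (x**2 - y) / eps   fast variable, time constant eps << 1
# Reduced description: y is replaced by its equilibrium value y = x**2,
# which leaves the single equation dx/dt = -x + x**2.

def simulate_full(x0, y0, eps, dt, steps):
    """Forward-Euler integration of the full two-variable system; returns final x."""
    x, y = x0, y0
    for _ in range(steps):
        dx = -x + y
        dy = (x * x - y) / eps
        x, y = x + dt * dx, y + dt * dy
    return x

def simulate_reduced(x0, dt, steps):
    """Forward-Euler integration of the reduced one-variable system; returns final x."""
    x = x0
    for _ in range(steps):
        x += dt * (-x + x * x)
    return x

def reduction_error(eps, x0=0.5, t_end=2.0, dt=1e-4):
    """Absolute difference in the slow variable between full and reduced description."""
    steps = int(round(t_end / dt))
    full = simulate_full(x0, x0 * x0, eps, dt, steps)
    reduced = simulate_reduced(x0, dt, steps)
    return abs(full - reduced)
```

Calling reduction_error with a smaller eps should give a smaller error, in line with the requirement that the time constants are sufficiently different. The step size dt has to stay well below eps for the Euler scheme to remain stable.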

 

Even when the faster variables are not in equilibrium, but cause a periodic disturbance, the system deduced from the average of repeated observations on a longer time scale can still accurately show the behavior of the slow variables.

Formally, averaging means that a system of differential equations with slow variables evolving on ℝ^m and fast variables evolving on a smooth compact immersed manifold can be described by the slow variables in ℝ^m with accuracy increasing for t→∞ (Dumas 1995, Guckenheimer and Holmes, 1983, p166‑178). Many aspects of the system behavior are preserved under averaging. However, codimension‑2 bifurcations are not correctly preserved and only under certain conditions does the global behavior carry over (for discussion, see Guckenheimer and Holmes, 1983, p180).

If a variable is dropped, the behavior of the system is viewed in a projected space of the remaining variables. The above reasoning holds within ℝ^m and it also holds when the immersed manifold is the result of a projection across additional dimensions. Therefore, it is even more appropriate in ℝ^(m+p) with p being the number of additional dimensions. With the limitations outlined above, the faster variables can be dropped and the system description can be based on the slower variables.

 

 

 

Dynamical independence.

 

Independent Component Analysis on a set of variables groups the variables into components that have the same time course (Bell and Sejnowski 1995, McKeown et al. 1998). This technique can suggest candidates for redundancy. However, although there is maximal independence between components (there is no direct or higher-order correlation between them), the variables that are grouped together within components may have subtle differences in their dynamics. In addition, independence according to time course does not imply that the variables are dynamically independent.

 

If the dynamics of one variable do not depend on a given other variable - and vice versa - the two variables are dynamically independent and can be grouped into different subsystems. Such a grouping offers a more interesting way of reducing the number of variables: in this case, a reduced system description involves only the dynamics of the variables that affect one another and leaves out the internal dynamics of subsystems.

The distinction between systems and subsystems does not have to match distinctions between fast and slow dynamics. The choice for a reduced system description therefore does not have to depend on the duration or temporal resolution of the observations, but is linked to the functional resolution of the observation technique: the most detailed level of the system where all variables can still be observed. Possible levels of resolution are further discussed in Section 3.    

 

Dynamical dependence and independence can be explored by delay embedding techniques (Takens 1981, Kantz and Schreiber 1997, see Section 2). Overall dependence between n variables can be demonstrated if the dimensionality of the manifold on which the system moves is less than n. Pair-wise dependence between variables of a system can be investigated by applying the Takens embedding theorem to combinations of two observables. [An observable is a smooth function from the state space to ℝ and is therefore comparable to an output variable. The theorems for observables will be applied here only to the more restricted class of observed system variables.] The procedure is analogous to that described in Section 2, but with pairs of time points and multiples of two dimensions, since plotting two variables in different dimensional spaces would distort their possible dependence. Variables from the same subsystem would yield the dimensionality of their common attractor, while variables from different subsystems would yield the sum of dimensionalities of two attractors. However, this procedure would not detect two independent attractors with the same periodicity.

A more direct technique is to assess pair-wise dependence between variables by nonlinear Granger causality (Baek and Brock 1992, Hiemstra and Jones 1994, Chavez et al. 2003, Gourevitch et al. 2006). This technique detects whether preceding values from one variable provide extra information on the subsequent values of another variable - given the information provided by its own preceding values. An alternative technique is cross-approximate entropy, which compares the irregularity of one variable relative to the other (Pincus and Singer 1996). [This technique is more restrictive in that it tests whether the values of two variables stay within a given distance of each other - rather than each staying individually within a certain range, cf. definition 5 in Pincus and Singer (1996) with equation 6 in Hiemstra and Jones (1994).]
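The idea behind Granger causality can be sketched with its simplest linear form, which is used here as a stand-in for the nonlinear tests cited above and is not the method of Hiemstra and Jones (1994). The sketch compares the one-step prediction error of a target variable with and without the preceding value of a source variable. The test data, the lag of one time step and all function names are assumptions made for illustration.

```python
import random

def demo_series(n, seed):
    """Hypothetical data: x drives y with a one-step delay, y does not drive x."""
    rng = random.Random(seed)
    x, y = [0.0], [0.0]
    for _ in range(n - 1):
        x_next = 0.5 * x[-1] + rng.gauss(0, 1)
        y_next = 0.5 * y[-1] + 0.8 * x[-1] + rng.gauss(0, 0.5)
        x.append(x_next)
        y.append(y_next)
    return x, y

def _rss_own(y, own):
    """Residual sum of squares of y[t] ~ b * own[t] (zero-mean data, no intercept)."""
    b = sum(a * v for a, v in zip(own, y)) / sum(a * a for a in own)
    return sum((v - b * a) ** 2 for a, v in zip(own, y))

def _rss_own_and_other(y, own, other):
    """Residual sum of squares of y[t] ~ b1 * own[t] + b2 * other[t] via the normal equations."""
    saa = sum(a * a for a in own)
    scc = sum(c * c for c in other)
    sac = sum(a * c for a, c in zip(own, other))
    say = sum(a * v for a, v in zip(own, y))
    scy = sum(c * v for c, v in zip(other, y))
    det = saa * scc - sac * sac
    b1 = (say * scc - scy * sac) / det
    b2 = (scy * saa - say * sac) / det
    return sum((v - b1 * a - b2 * c) ** 2 for a, c, v in zip(own, other, y))

def granger_gain(source, target):
    """Relative reduction in prediction error of target[t] when source[t-1]
    is added to target[t-1] as a predictor (0 means no extra information)."""
    y = target[1:]
    own = target[:-1]
    other = source[:-1]
    restricted = _rss_own(y, own)
    full = _rss_own_and_other(y, own, other)
    return (restricted - full) / restricted
```

For the hypothetical data, granger_gain(x, y) should be clearly positive while granger_gain(y, x) should stay close to zero, which reflects the direction of the dependence. In practice the gain would have to be assessed with a statistical test and with more than one lag.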

 

 

A systematic procedure can be used to determine the number of redundant variables when independent data are available in the form of output that the system has to mimic. The intrinsic dimensionality of this dataset can be determined as the smallest number of parameters needed to generate this set (Bennett 1969, Verveer and Duin 1995). This represents the theoretical minimum to which the number of system variables (and thereby the number of variables that have to be observed) can be reduced. An algorithmic approach to delete variables is only available for neural networks. Ash and Cottrell (1995) discuss pruning mechanisms that reduce the number of hidden units in a 3-layer neural network, depending on the error in the output.

 

 

 

 

2. Increasing the number of variables in the system description; increasing the number of variables that have to be observed.

 

In starting from a minimal description the question arises whether extra variables have to be included (in the system description and/or in the number of variables that have to be observed to test this system description) and whether there is a systematic procedure to incorporate these additional variables.

 

 

a) No independent data set available.

 

If an independent data set is not available, it has to be deduced from the observations themselves whether the given set of observation variables should be extended. 

 

A time series of a single observable can be used to estimate a lower bound on the dimensionality of the attractor to which the system converges. A block of m successive time points is taken as a point in an m-dimensional space (the embedding space). The distance to each of the other possible blocks of m successive time points is calculated. The number of blocks within a given radius r divided by the total number of distances is called the correlation integral C. The limit of ln C/ln r for decreasing radius and increasing number of points gives a lower bound for the dimensionality of the space from which these data derive (delay embedding theorem, Takens 1981, Wolf et al. 1985, Pincus 1991, Kantz and Schreiber 1997, p129). [Different authors give slightly different calculations. Usually distance is calculated using the dominance metric or maximum norm. The dimensionality of the embedding space has to be chosen sufficiently high.]
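The calculation of the correlation integral can be sketched as follows. This is a minimal version under stated assumptions: distances use the maximum norm, all pairs of blocks are counted (no exclusion of temporally close blocks), and the limit of ln C/ln r is replaced by the slope between two finite radii. The function names and the choice of radii are hypothetical.

```python
import math

def delay_blocks(series, m):
    """Blocks of m successive time points, each taken as a point in m-dimensional space."""
    return [series[i:i + m] for i in range(len(series) - m + 1)]

def correlation_integral(series, m, r):
    """Fraction of pairs of blocks whose maximum-norm distance is below r."""
    blocks = delay_blocks(series, m)
    n = len(blocks)
    close = 0
    for i in range(n):
        for j in range(i + 1, n):
            dist = max(abs(a - b) for a, b in zip(blocks[i], blocks[j]))
            if dist < r:
                close += 1
    return close / (n * (n - 1) / 2)

def dimension_estimate(series, m, r_small, r_large):
    """Finite-size proxy for the limit of ln C / ln r: slope of ln C against ln r."""
    c_small = correlation_integral(series, m, r_small)
    c_large = correlation_integral(series, m, r_large)
    return (math.log(c_large) - math.log(c_small)) / (math.log(r_large) - math.log(r_small))
```

As a usage example, a sampled sine wave such as [math.sin(1.6 * t) for t in range(400)] with m = 2 traces a closed curve in the embedding space, so the estimate between r = 0.1 and r = 0.4 should be close to 1. The radii have to be small relative to the size of the attractor, yet large enough for a sufficient number of pairs to fall within them.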

An analogous method uses the values of n observables in n-dimensional space (Grassberger and Procaccia 1983). These procedures can be combined to determine if n observables, measured at p time points and embedded in np dimensions, reflect dynamics with a dimensionality greater than n (Kantz and Schreiber 1997, p142).

 

Not all time series can be analyzed in this fashion: a limit cycle with a period of twice the sampling interval (or where the sampling interval is a multiple of the cycle period) cannot be reconstructed by a delay embedding (Kantz and Schreiber 1997, p129). A further point of caution is the distinction between chaotic dynamics and noise. The embedding methods were introduced to characterize attractors for chaotic systems, which as a rule should have a fractional dimensionality. However, this rule does not hold for all chaotic systems (Glass 1995) and fractional dimensionality is also observed for filtered noise (Stam et al. 1995 and references therein). Therefore, the dimensionality should be considered a relative measure of complexity, to be used for comparisons between the dynamics of different systems.

With these limitations, the measure of dimensionality can show the need to incorporate more variables in the system description: that is, if the intrinsic dimensionality is higher than the number of observed variables, the number of observed variables has to be increased. A special case of adding new variables arises when the order of the system is underestimated. An analogous procedure to the ones mentioned above can be used in these cases, where the additional variables can be identified with time derivatives (cf. Takens 1981, theorem 3).

 

Finally, a control u(t) can be rewritten as an additional variable that has a purely multiplicative effect on the remaining variables, i.e. u(t) = v(x^(n+1)(t)) in the system description:

    f(x^1, ..., x^n)  +  u(t)·g(x^1, ..., x^n)  =  h(x^1, ..., x^(n+1))                          (1)

   [Note: superscript indices for contravariant variables.]

This situation can occur if the system description has to be extended to parts of the system that were incorrectly left out.

 

 

The algorithms mentioned above do not indicate which variables should be added. If the system description includes variables that are not observable, these are obvious candidates. In addition, the system description itself can be insufficient, in which case additional variables have to be added. A further problem occurs when the intrinsic dimensionality of the extended set of observed variables reaches the level indicated by the algorithms. If this level is n, the Whitney embedding theorem (Kantz and Schreiber 1997, p126) suggests that maximally 2n+1 variables are needed as global coordinates.

Simulation has to be used to determine which variables should be added to the set of observed variables. For example, when two values of dy/dt are found for the same value of x in testing a system description of the form: dy/dt = f(x), the link between dy/dt and x involves an unobserved variable. This unobserved variable may be part of the ordinary dynamics or part of slower updating dynamics. 

[The 'two values' of dy/dt may in fact be subject to nonparametric statistical testing, such as cluster analysis. In addition, alternative explanations have to be excluded: dy/dt  should not depend on one of the other observed variables and  f(x) should have no discontinuity for this value of x. ]
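The check for 'two values' of dy/dt at the same value of x can be sketched in a simple form. The version below replaces the cluster analysis mentioned above by a plain range test within bins of x, which is an assumption made for brevity. The bin width, the tolerance and the function names are hypothetical, and the tolerance has to exceed both the measurement noise and the variation of f(x) within one bin.

```python
import math

def max_spread_per_bin(xs, rates, bin_width):
    """Largest range of observed dy/dt values among observations that share a bin of x."""
    bins = {}
    for x, rate in zip(xs, rates):
        key = math.floor(x / bin_width)
        lo, hi = bins.get(key, (rate, rate))
        bins[key] = (min(lo, rate), max(hi, rate))
    return max(hi - lo for lo, hi in bins.values())

def suggests_hidden_variable(xs, rates, bin_width, tolerance):
    """True if dy/dt takes clearly different values for (nearly) the same x."""
    return max_spread_per_bin(xs, rates, bin_width) > tolerance
```

For example, observations generated by dy/dt = 2x give a small spread per bin, whereas observations generated by dy/dt = 2x + z with an unobserved z that switches between 0 and 1 give a spread close to 1 and are flagged. The alternative explanations listed above (dependence on another observed variable, a discontinuity in f) are not excluded by this test.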

 

 

 

b) Independent data set available.

 

If independent data are available, for example: output data that the system has to mimic, and the number of independent components within the output data is greater than that of the output generated by the system description, the system description should be extended. Alternatively, if the system can perform certain tasks, it can be deduced from the type and the size of the task how many components the system has. Although the required dimensionality is clearly defined in these cases, the way in which the output is generated or the input is processed is a source of ambiguity. Only within the context of a given system architecture can the extension of a system be derived from the  prescribed  input or output.

  

For example, a perceptron (a two-layer feedforward neural network) can learn only linearly separable problems (Engel and Zippelius 1995); it cannot learn the Boolean function exclusive OR; this task requires either a hidden layer or higher-order connections (Li and Chow 1996). However, the multi-layer perceptron is a universal approximator in the sense that it can learn any function (Engel and Zippelius 1995, Kurkova 1995).
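The limitation of the perceptron can be demonstrated with a minimal sketch of the classical perceptron learning rule, applied to the Boolean functions AND (linearly separable) and exclusive OR (not linearly separable). The function name, the threshold unit and the limit on the number of training epochs are choices made for this illustration.

```python
def train_perceptron(samples, epochs=100):
    """Perceptron learning rule on ((x1, x2), target) pairs with a threshold unit.
    Returns True if a full pass over the samples is classified without error."""
    w1 = w2 = bias = 0
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            output = 1 if w1 * x1 + w2 * x2 + bias > 0 else 0
            delta = target - output
            if delta != 0:
                # Move the decision boundary towards the misclassified sample.
                w1 += delta * x1
                w2 += delta * x2
                bias += delta
                errors += 1
        if errors == 0:
            return True
    return False

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
```

Here train_perceptron(AND) converges within a few epochs, while train_perceptron(XOR) does not reach zero errors for any number of epochs, since no single linear threshold separates the two classes.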

 

For a given type of task, the size of  the task determines the network needed to perform the task.

For example, the parity problem requires log(n) depth (layers) for n inputs; it cannot be solved with a fixed depth (Parberry 1995). In the context of approximation of functions, Kurkova (1995) discusses the possibility to determine the necessary number of hidden units in a three-layer neural network depending on the class of function that has to be generated. If the task is pattern classification, the maximum number of patterns that can be stored (and thereby classified) can be calculated for simple two-state networks, such as the perceptron and the Hopfield network (Zippelius and Engel 1995, Engel and Zippelius 1995). For more general categories of neural networks, the Vapnik-Chervonenkis dimension has to be estimated. For any neuron, this dimension is the maximum degrees of freedom to map input to the set {0,1} and determines the maximal set of inputs that can be distinguished. The Vapnik-Chervonenkis dimension can be extended to networks, multiple layers, different activation functions and to real-valued output sets. Upper bounds on this dimension can be used as criteria for extending the network if a more extensive input set has to be classified. However, the upper bound holds only when the activation function for a neuron belongs to a given class of functions (linear, piecewise polynomial, sigmoid), since smooth step functions can be constructed that give the network the ability to classify an infinite number of stimuli (Maass 1995).

 

 

Only in a few cases can a systematic procedure be followed for adding new variables. Ash and Cottrell (1995) give a short overview of algorithms to expand the number of hidden units in a three-layer neural network, depending on the output error. However, these algorithms would not work for systems in general (for example when the input of one unit cannot be written as the sum of outputs from other units) and an analogous algorithm for networks with higher-order connections still has to be developed (cf. Li and Chow 1996).

 

 

If the process of adding new variables is compared to approximation of functions in general, it is clear that there does not have to be a unique route to optimal approximation [this holds only within the saturation class of an approximation method] (Lorentz 1966). Therefore, in practice, when a new variable has to be included, plausible candidates have to be added to the set of observed variables (and, if not already included, to the system description). Based on simulation, the candidate with the highest improvement of fit has to be selected. 

Since in practice the external data are subject to noise (which can be considered infinite-dimensional), the improvement of fit has to include measures against 'over-fitting'. This can be accomplished by setting smoothness criteria (Wahba 1995, Marroquin 1995) and combining growing algorithms with pruning algorithms (see Section 1).

 

 

 

3. Levels in the system equations, levels of resolution and levels of abstraction.

 

The system equations:

   dx^i/dt = f^i(x^1, ..., x^n)   (i = 1...n)                (2)

incorporate variables at various levels. The number of levels can be traced as follows. A directed graph can be defined with variables as its vertices and with links between variables: a link from x^i to x^j exists if x^j is an argument of f^i. The number of levels of the system is the maximum pathlength in such a directed graph:

   max_{i,j} ( min_{paths} pathlength(x^i, x^j) )                      (3)

The level of a variable can be defined as the pathlength from a root point in the directed graph (see appendix).
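The number of levels according to (3) can be computed with a breadth-first search over the directed graph. The sketch below assumes that the graph is given as a dictionary that maps each variable to the arguments of its rate function, and that pairs of variables without a connecting path are ignored; both are assumptions of this illustration, as are the function names.

```python
from collections import deque

def shortest_path_lengths(links, start):
    """Breadth-first search: shortest pathlength from start to every reachable variable."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def number_of_levels(links):
    """Maximum over all connected pairs of the minimal pathlength, as in (3)."""
    return max(max(shortest_path_lengths(links, v).values()) for v in links)
```

For a chain in which the rate of x1 depends on x2, the rate of x2 on x3 and the rate of x3 on itself, written as {"x1": ["x2"], "x2": ["x3"], "x3": ["x3"]}, the function returns 2. Adding a direct link from x1 to x3 shortens the longest minimal path and the result becomes 1.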

A similar approach has been discussed by Hirsch (1989) for networks. First, circuits are collapsed by reducing the maximal connected subnetworks (of which every unit is connected directly or indirectly to every other unit) to units in an abstracted network. In a feedforward network of subnetworks, the first level is that of networks that receive no links from other networks, and successive levels are defined by tracing the links between subnetworks. Using the graph defined above, this approach can be applied to systems of subsystems, to find the level of a subsystem (rather than that of a single variable).
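The collapse of circuits and the assignment of levels to subsystems can be sketched as follows. The sketch is one possible reading of the procedure, not Hirsch's own algorithm: two units belong to the same subnetwork if each can be reached from the other, a subnetwork that receives no links from other subnetworks has level 1, and the level of any other subnetwork is taken as one more than the highest level among the subnetworks that send links to it. The last rule and the function names are assumptions of this illustration.

```python
def reachable(links, start):
    """All units that can be reached from start by following links (start included)."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in links.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def subsystem_levels(links):
    """Level of each unit after collapsing mutually reachable units into subsystems."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    reach = {n: reachable(links, n) for n in nodes}
    # A subsystem is the set of units that can reach each other.
    component = {n: frozenset(m for m in reach[n] if n in reach[m]) for n in nodes}
    # Links between distinct subsystems define the abstracted feedforward network.
    preds = {c: set() for c in component.values()}
    for src, targets in links.items():
        for dst in targets:
            if component[src] != component[dst]:
                preds[component[dst]].add(component[src])
    levels = {}
    def level(c):
        if c not in levels:
            levels[c] = 1 + max((level(p) for p in preds[c]), default=0)
        return levels[c]
    return {n: level(component[n]) for n in nodes}
```

For the links {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}, the units a and b form one subsystem at level 1 and the units c and d form a second subsystem at level 2. The reachability computation is quadratic in the number of units, which is adequate for small graphs only.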

 

The number of levels cannot be determined from observations if general system equations are allowed, since f(x,y,z) is not distinguishable in observations from f(g(x,y),z). Similarly, the procedures to translate general systems into networks do not by themselves indicate (sub)levels.

[  f(x,y,z) and f(g(x,y),z) result in different numbers of summation terms in Kolmogorov superposition. In the case of Bernstein polynomials, the composite function results in a polynomial of polynomials. Since the polynomial includes a term equal to 1, the only difference is the degree of the composite polynomial, at a given level of approximation, which vanishes in the limit. Moreover, no difference should be expected in cases such as: 

   g1(x,y)=x+y,   g2(x,y,z)=x+y+z,   h(x,y)=x+y,    where:   g1(h(x,y),z) = g2(x,y,z).

Finally, as discussed before, a three-level network is a universal approximator.  ]

Additional restrictions to the network model may be introduced: e.g.  the form of the response function, a limit to the 'fan-in' etc. However, it is not clear if these restrictions would allow a plausible model, especially in the case of abstracted networks.

An alternative procedure is to test nonlinear causality between variables (Baek and Brock 1992, Hiemstra and Jones 1994, Chavez et al. 2003, Gourevitch et al. 2006). Since these techniques give information on the direction of the causality, an ordering of the levels can be obtained. Although one cannot determine the  absolute level for each variable or the mathematical form of the relation between variables, the ordering is sufficient for the applications discussed below.

[ Note that the correlation dimension on different sets of variables (see Section 1) only gives information on the existence of a relation between variables and not its direction. Therefore, with this technique one can only establish the maximum difference between the levels of variables.] 

 

If variables on various levels form a hierarchy, a description at a deeper level is derived by including additional variables that form a refinement of the existing set, i.e. are hierarchically subordinate to members of the existing set. This would greatly facilitate the search for new variables. However, the dynamics are in general not hierarchically organized. Therefore, such an organization has to be set up along a different route. The levels of resolution, discussed next, allow a hierarchical organization of variables.

 

Levels of resolution.

With higher temporal resolution or (when the system is extended over space) higher spatial resolution, variables can be introduced that have a faster dynamics or diversity on a smaller spatial scale. On the other hand, observation over longer time periods or on a wider spatial domain can show processes that are slower or are not constant on a wider spatial scale.

The increase in number of variables follows the discussion in Section 2. To examine the effect of including additional variables on the system description, the arguments for reduction discussed in Section 1 can be applied in the opposite direction. For example, if the difference in time scale of dynamics is sufficient, an approximation can be used that the faster dynamics have reached equilibrium. However, the levels of convergence between which the above approximation can be used  are not necessarily the same as those in the system equations: a subsystem does not have to show faster convergence than a higher level system. In fact, the system theoretical level designation may be completely rearranged if additional variables are included and this rearrangement may not even be uniquely determined.

 

Levels of abstraction.

It is proposed here to define levels of abstraction as those levels of resolution that are meaningful in terms of the levels in the system equations. More specifically, a level of resolution is an admissible level of abstraction if further resolution does not lead to a rearrangement of the levels in the system equations found up to that point (technical details are discussed in the appendix). In practice this means that a description at a given admissible level of abstraction can leave out all variables at a deeper level.

                   

If the level of abstraction can be selected beforehand (on the basis of additional knowledge) this level can be used to limit the process of increasing resolution. In this case, the system description serves as an integrative model for additional data. If the level of abstraction is not set beforehand, a systematic procedure to increase the level of resolution together with the system theoretical levels provides a systematic way to gauge the appropriate level of abstraction. Such a procedure will be discussed in the next Section.

 

  

 

4. Neurochemical network.

 

Brain processes can be described at various levels of resolution, for example: the molecular, cell, cell group, and brain region level. In practice, these levels of resolution are used heuristically as a level of abstraction. (A formal use would involve showing that they are admissible levels of abstraction, see Section 3 and appendix). In order to use these known levels of abstraction or to gauge the appropriate level of abstraction, a family of descriptions should be used that are compatible with both the lowest and highest admissible levels of abstraction and allow a systematic procedure between these levels. This requirement also deals with cases where it cannot be excluded beforehand that a variable at a low level of abstraction survives on a higher level.

In the previous sections, procedures for determining whether the number of variables should be increased or reduced have been discussed: systematic procedures in the form of growing or pruning algorithms are available for network models. In addition, a network model is required to derive levels in the system equations from observations. Given the discussion in Section 3, it is of importance to develop a network model that can be used at various levels of resolution.

Neurochemical networks (Dekker 1998) fulfill these conditions and can therefore be used to gauge the number of variables needed in a description of brain processes.

 

The basic variables of a neurochemical network are the concentrations of M compounds at N locations. The dynamics are described by system equations of the type:

 

    dc_ij/dt = f_ij(c_1, ..., c_(M·N))                               (4)

 

where c_ij is the concentration of compound i at location j. The network contains first-order spatial connections and higher-order connections due to reaction kinetics. It can be seen as a combination of biochemical network and neural network. Therefore the system equations are of the form:

                                                      

   dc_ij/dt = Σ_n A1_{i(j)n(j)} g_n(c_n1, ..., c_nM)   +   Σ_m A2_{(i)j(i)m} c_im       (5)

(where A1 and A2 are two 2‑D sets of constants and the higher-order biochemical terms have been omitted for clarity).

By rewriting the functions of n arguments as single-argument functions (Lorentz 1966, p4, 9, 10, Li and Chow 1996) the system can be brought into the form of a network with higher-order connections.
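The structure of (5) can be made concrete with a simplified simulation sketch. Several assumptions are made that go beyond the text: the reaction functions are replaced by a single-argument function g (tanh by default) applied to each concentration, the biochemical constants a1 link compounds within one location, the spatial constants a2 link locations for one compound, and integration uses the forward Euler method. All names and parameter values are hypothetical.

```python
import math

def step(conc, a1, a2, dt, g=math.tanh):
    """One Euler step for concentrations conc[i][j] (compound i at location j).
    a1[i][n]: effect of compound n on compound i within a location (biochemical links).
    a2[j][m]: transfer of a compound from location m to location j (spatial links)."""
    n_compounds = len(conc)
    n_locations = len(conc[0])
    new = [[0.0] * n_locations for _ in range(n_compounds)]
    for i in range(n_compounds):
        for j in range(n_locations):
            reaction = sum(a1[i][n] * g(conc[n][j]) for n in range(n_compounds))
            transport = sum(a2[j][m] * conc[i][m] for m in range(n_locations))
            new[i][j] = conc[i][j] + dt * (reaction + transport)
    return new

def simulate(conc, a1, a2, dt, steps):
    """Repeated Euler steps; returns the final concentrations."""
    for _ in range(steps):
        conc = step(conc, a1, a2, dt)
    return conc
```

As a usage example, one compound that is exchanged between two locations without any reaction, simulate([[1.0, 0.0]], [[0.0]], [[-1.0, 1.0], [1.0, -1.0]], 0.01, 2000), ends with a concentration of 0.5 at both locations while the total amount is conserved. Note that the sketch has no cross-connections between compounds and locations, in line with the form of (5).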

 

The number of variables can be reduced as discussed in Section 1 for networks. For example, not all compounds or all locations have to be included. For reaction kinetics, only the most determining reactions need to be incorporated. Since within a chain or cycle of reactions the slowest reactions are rate determining, it is possible to reduce the number of equations to that of the slow reactions (Schauer and Heinrich 1983). Similarly, for the locations, only the most determining regions have to be included.

 

The procedures for expanding the number of variables can be applied as described for networks in Section 2. Some heuristic principles can be derived from the system equations:

1) The system is Markov (cf. Gibson and Bruck 2000), so if correlations are found across time points with the values for intermediate time points held constant, extra variables should be added.  [A non-parametric version of this criterion is that any Granger causality at lag length > 1 that differs from the causality at lag length = 1 suggests that additional variables have to be included.]

2) The system has only first-order time differentials, so higher-order dependencies have to be specified by additional variables.

3) The system has no cross-connections between compounds and locations. To remove such connectivity, the level of resolution should be increased (see below).

4) On the most detailed level, the order of connectivity is at most 3 (Lehninger 1975, p187), so higher-order connections have to be reduced by including other variables.
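The first heuristic principle can be sketched in a linear form: for a single observed variable, the correlation between values two time steps apart, with the intermediate value held constant, is the partial autocorrelation at lag 2. The sketch below uses this linear measure as a stand-in for the Granger-causality version mentioned under 1). The autoregressive test series and the function names are assumptions made for illustration.

```python
import random

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((v - mean) ** 2 for v in series)
    cov = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return cov / var

def partial_autocorrelation_lag2(series):
    """Correlation between x(t) and x(t+2) with x(t+1) held constant."""
    r1 = autocorrelation(series, 1)
    r2 = autocorrelation(series, 2)
    return (r2 - r1 * r1) / (1 - r1 * r1)

def ar_series(coeffs, n, seed):
    """Hypothetical test data: autoregressive series with the given coefficients."""
    rng = random.Random(seed)
    values = [0.0] * len(coeffs)
    for _ in range(n):
        nxt = sum(c * values[-k - 1] for k, c in enumerate(coeffs)) + rng.gauss(0, 1)
        values.append(nxt)
    return values[len(coeffs):]
```

A series generated by ar_series([0.7], 5000, 1) depends on the preceding value only and should give a lag-2 partial autocorrelation near zero. A series generated by ar_series([0.3, 0.5], 5000, 1) also depends on the value two steps back and should give a value near 0.5, which under the first principle would suggest that an extra variable has to be added to the description.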

In resolving discrepancies between repeated observations, information on the stoichiometry of a reaction, possible cofactors and possible enzyme regulation has to be compared with the nature and time scale of possible updating mechanisms: morphological plasticity and biochemical plasticity, such as enzyme induction (for reviews, see: Kolb and Whishaw 1998, Kumer and Vrana 1996).

 

Levels in the system equations can be distinguished as discussed for networks in Section 3. In a description of brain processes, the different levels are not necessarily hierarchically organized. An example of this is the 'cross-talk' between signal transduction mechanisms.

 

 

Levels of resolution.

In the above description (5), the size of the spatial units has been left unspecified. Similarly, the level within the biochemical network of the concentrations is not specified. Therefore, a neurochemical network can be described at various levels of resolution.

In biochemistry, the level can vary from overall parameters of energy consumption, parameters of neurotransmission (e.g. spike frequency), turnover of individual neurotransmitters, to  reaction rates.

The spatial units can be chosen to be a brain structure, a set of neurons, an individual cell (glial cell or neuron) or a subcellular compartment. Since biochemistry is strongly compartmentalized in the brain, it will be assumed that the minimal spatial unit in the neurochemical network is a compartment. Therefore, the neurochemical network excludes a general formulation of reaction‑diffusion equations (Cross and Hohenberg 1993), but allows a discretized version.

 

Increasing the level of resolution.

A higher resolution introduces additional locations and/or compounds. The indices of the sets of constants A1 and A2 are subdivided to describe biochemical or spatial diversity on a smaller scale. Cross-specification can also occur, where concentrations are split off at either end of a spatial connection (factors from A1 enter A2 in a description at a deeper level), or locations are split off from a biochemical connection (factors from A2 enter A1 in a description at a deeper level).

This splitting of A1 and A2 (and the corresponding splitting of higher-order terms left out from (5)) may be considered an extension of the growing algorithms discussed in Section 2.

The description in (5) requires that there are no cross-connections between compounds and locations. This requirement can define the need to further increase the resolution. A level of resolution where (5) can be used can then be pragmatically taken as an admissible level of abstraction.

 

Independent data.

To test a neurochemical network model, the independent data have to combine both spatial and biochemical information.

The spatial resolution of functional neuroimaging yields data of high dimensionality, such that a direct comparison with the dimensionality of independent data (Section 2) will not show that additional variables have to be measured. The opposite is more usual: the range of locations can be restricted to a 'region of interest', even for a single biochemical variable.

However, given the spatial overlap of different neurotransmitter systems in the brain, it is expected that more biochemical information will be needed for a further distinction between brain states. With the techniques currently available such a biochemically diversified picture has to be constructed by combining data from different techniques (e.g. nuclear magnetic resonance spectroscopy, where spatial resolution is sacrificed, and/or positron emission tomography, with different tracers).  In this case, the neurochemical network model forms a natural integration model to combine several biochemical variables across locations. 

 

 

 

Acknowledgement.

 

The author gratefully acknowledges helpful suggestions by Dr. J. Nijhuis and Dr. B. Kappen, and Prof. Dr. F. Takens who reviewed the statements on the use of the Takens embedding theorem for more than one observable.     

 

 

 

References.

 

Amari S-I. (1983): Field theory of self-organizing neural nets. IEEE Trans. Systems, Man, Cybern. SMC-13, 741-748.

 

Ash  T.,  Cottrell G.  (1995):  Topology‑modifying neural network algorithms.  In:  Arbib M.  A. (Ed.) The Handbook of Brain Theory and Neural Networks. The MIT Press, Cambridge, Massachusetts.

 

Baek E., Brock W. (1992): A general test for nonlinear Granger causality. Bivariate model. Working paper. Iowa State University and University of Wisconsin at Madison.

 

Bell A. J., Sejnowski T. J. (1995): An information-maximization approach to blind separation and blind deconvolution. Neural Comp. 7, 1129-1159.

 

Bennett  R.  S.  (1969):  The intrinsic dimensionality of  signal collections. IEEE Trans. Inform. Theory, 15, 517‑525.

 

Chavez M., Martinerie J., Le Van Quyen M. (2003): Statistical assessment of nonlinear causality: application to epileptic EEG signals. J. Neurosci. Meth. 124, 113-128.

 

Coull J. T. (1998): Neural correlates of attention and arousal: insights from electrophysiology, functional neuroimaging and psychopharmacology. Progr. Neurobiol. 55, 343-361.

 

Cross M. C., Hohenberg P. C. (1993): Pattern formation outside of equilibrium. Rev. Mod. Phys. 65, 851‑1112.

 

Dekker A. J. (1998): Neurochemical networks, nonlinear systems and functional neuroimaging. Faculty report Technical informatics and mathematics, Delft University of Technology.

 

Dumas H.  S.  (1995):  A new proof of Anosov's averaging theorem. In:   Dumas H.  S., Meyer K. R., Schmidt D. S. (Eds.) Hamiltonian Dynamical Systems.  History,  Theory and Applications.  Springer, New York.

 

Engel A., Zippelius A. (1995): Statistical mechanics of learning. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

Flament C. (1963): Applications of Graph Theory to Group Structure. Prentice Hall Inc. Englewood Cliffs, New Jersey.

 

Gibson M. A., Bruck J. (2000): Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A. 104, 1876-1889.

 

Glass L. (1995): Chaos in neural systems. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

Goel N. S., Richter-Dyn N. (1974): Stochastic Models in Biology. Academic Press, New York.

 

Gourevitch B., Le Bouquin-Jeannes R., Faucon G. (2006): Linear and nonlinear causality between signals: methods, examples and neurophysiological applications. Biol. Cybern. 95, 349-369.

 

Grassberger P., Procaccia I. (1983): Characterization of strange attractors. Physical Rev. Lett. 50, 346-349.

 

Guckenheimer J.  M., Holmes P. J. (1983): Nonlinear Oscillations, Dynamical  Systems and Bifurcations of Vector  Fields.  Springer, New York.

 

Hiemstra C., Jones J. D. (1994): Testing for linear and nonlinear Granger causality in the stock price-volume relation. J. Finance  49, 1639-1664.

 

Hirsch M. W. (1989): Convergent activation dynamics in continuous time networks. Neural Networks, 2, 331-349.

 

Kantz H., Schreiber T. (1997): Nonlinear Time Series Analysis. Cambridge University Press, Cambridge.

 

Kolb B., Whishaw I. Q. (1998): Brain plasticity and behavior. Ann. Rev. Psychol. 49, 43-64.

 

Kumer S. C., Vrana K. E. (1996): Intricate regulation of tyrosine hydroxylase activity and gene expression. J. Neurochem. 67, 443-462.

 

Kurkova V.  (1995):  Kolmogorov's theorem.  In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

Lehninger A. L. (1975): Biochemistry. Worth Publishers Inc. New York.

 

Li J-Y., Chow T. W. S. (1996): Functional approximation of higher-order neural networks. J. Intell. Systems, 6, 239-260.

 

Lorentz G. G. (1966): Approximation of Functions. Holt, Rinehart and Winston, New York.

 

Gillespie D. T. (2000): The chemical Langevin equation. J. Chem.

 

Maass W. (1995): Vapnik-Chervonenkis dimension of neural networks. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

Marroquin J. L. (1995): Regularization theory and low-level vision. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

McKeown M. J., Jung T-P., Makeig S., Brown G., Kindermann S. S., Lee T-W., Sejnowski T. J. (1998): Spatially independent activity patterns in functional MRI data during the Stroop color-naming task. Proc. Natl. Acad. Sci. USA, 95, 803-810.

 

Parberry I. (1995): Structural complexity and discrete neural networks. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

Pincus S. M. (1991): Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA. 88, 2297-2301.

 

Pincus S.,  Singer B. H. (1996): Randomness and degrees of irregularity. Proc. Natl. Acad. Sci. USA. 93, 2083-2088.

 

Schauer M.,  Heinrich R. (1983): Quasi steady state approximation in  the mathematical modeling of biochemical reaction networks. Math. Biosci. 65, 155‑170.

 

Stam C.  J.,  Jelles B.,  Achtereekte H. A. M., Rombouts S. A. R. B., Slaets J. P. J., Keunen R. W. M. (1995): Investigation of EEG non‑linearity in dementia and Parkinson's disease. Electroenceph. and clin. neurophysiol. 95, 309‑317.

 

Takens F. (1981): Detecting strange attractors in turbulence. In: Rand D. A. and Young L. S. (Eds.) Dynamical Systems and Turbulence. Lecture Notes in Mathematics. 898, Springer, Berlin.

 

Verveer P.  J.,  Duin R. P. W. (1995): An evaluation of intrinsic  dimensionality  estimators.   IEEE  Trans.  Patt.  Anal.  Machine Intell. 17, 81‑86.

 

Wahba G. (1995): Generalization and regularization in nonlinear learning systems. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

Watt R. C., Hameroff S. R. (1988): Phase space electroencephalography (EEG): a new mode of intraoperative EEG analysis. Int. J. Clin. Monit. Computing 5, 3-13.

 

Wolf A.,  Swift J.  B.,  Swinney H.  L.,  Vastano J.  A.  (1985): Determining  Lyapunov exponents from a time series.  Physica 16D, 285‑317.

 

Zaslavsky  G.  M.  (1985):  Chaos  in  Dynamic  Systems.  Harwood Academic Publishers, New York.

 

Zippelius A., Engel A. (1995): Statistical mechanics of neural networks. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks.  The MIT Press, Cambridge, Massachusetts.

 

 

 

 

Appendix: levels of abstraction and graph theory.

 

A level of resolution is an admissible level of abstraction if further resolution does not lead to a rearrangement of the levels in the system equations found up to that point. This can be specified as follows: for any pair of variables x and y, where a path exists from x to y and level_n(x) < level_n(y) at resolution n, there is no deeper resolution m > n at which level_m(x) > level_m(y).

[ The level of a variable is the length of the shortest path from a given root point (Flament 1963, p25). If there are alternative root points, the minimum of the total number of levels should be taken. The result is not necessarily a minimal arborescence (Flament 1963, p68), since variables can have multiple links from lower-level variables. If the level designation of variables is not unique before and/or after an increase in resolution, the criterion should hold for at least one of the possible numberings of variables. However, a single numbering should be used in all comparisons across levels of resolution. ]

Since there can always be deeper levels of resolution, this criterion can only be applied in practice if there is a maximum to the resolution, e.g. for the brain: the molecular level.
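The criterion can be sketched in a few lines of code. The sketch below assumes a single root point, a graph stored as a dictionary from each variable to the variables it links to, and a comparison restricted to variables that carry the same name at both resolutions. The function names and the example graphs are hypothetical and serve only as illustration.

```python
from collections import deque

def levels(graph, root):
    """Level of a variable = length of the shortest directed path from root."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in level:
                level[w] = level[v] + 1
                queue.append(w)
    return level

def is_admissible(coarse, fine, root):
    """True if no pair x, y with a path from x to y and level(x) < level(y)
    at the coarse resolution has level(x) > level(y) at the fine resolution."""
    lc, lf = levels(coarse, root), levels(fine, root)
    for x in lc:
        if x not in lf:
            continue  # x has been split up at the finer resolution
        downstream = set(levels(coarse, x)) - {x}  # variables reachable from x
        for y in downstream:
            if y in lf and lc[x] < lc[y] and lf[x] > lf[y]:
                return False
    return True

# Coarse description: r -> U, U -> x -> y and U -> w -> y.
coarse = {"r": ["U"], "U": ["x", "w"], "x": ["y"], "w": ["y"]}

# Finer description 1: only a side-branch s is added; levels are unchanged.
fine_side = {"r": ["U"], "U": ["x", "w"], "x": ["y", "s"], "w": ["y"]}

# Finer description 2: U is resolved into a chain u1 -> u2 -> u3, with w
# fed from u1 and x from u3, so x ends up deeper than y.
fine_split = {"r": ["u1"], "u1": ["u2", "w"], "u2": ["u3"], "u3": ["x"],
              "x": ["y"], "w": ["y"]}

side_ok = is_admissible(coarse, fine_side, "r")    # True
split_ok = is_admissible(coarse, fine_split, "r")  # False
```

In the second refinement x moves from level 2 to level 4 while y stays at level 3, although a path from x to y still exists. The coarse description is therefore not an admissible level of abstraction with respect to that refinement. The sketch compares one coarse and one fine description; applying the criterion in full would require repeating the check for every deeper resolution up to the maximum.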

 

Using definitions from graph theory (e.g. Flament 1963, p22-23), the nature of admissible levels of abstraction can be described further. 

A graph is a set of vertices and the (directed) links between them.

A subgraph is a subset of the vertices with their internal links. If graph G is a subgraph of G', then G' will be called a supergraph of G.

A reduced graph combines vertices into new units. The links between vertices within such a unit are collapsed; the links to outside vertices are preserved as links between these new units. If G is a reduced graph of G', then G' will be called a specified graph of G.

The subnets defined by Hirsch (1989) are units in a reduced graph, collapsed over maximal connected sets of vertices. (The maximal connected sets are the largest sets in which there is a path from any vertex to any other.)
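As an illustration, the maximal connected sets in this sense are what graph theory calls strongly connected components, and collapsing them gives the corresponding reduced graph. The sketch below uses a simple mutual-reachability test, which is adequate for small graphs; the function names and the example graph are hypothetical.

```python
def reach(graph, start):
    """All vertices reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in graph.get(v, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def hirsch_subnets(graph):
    """Partition the vertices into maximal sets in which there is a path
    from any vertex to any other."""
    vertices = set(graph) | {w for ws in graph.values() for w in ws}
    reach_from = {v: reach(graph, v) for v in vertices}
    units, assigned = [], set()
    for v in sorted(vertices):
        if v in assigned:
            continue
        # w belongs to the unit of v if each can be reached from the other
        unit = frozenset(w for w in reach_from[v] if v in reach_from[w])
        units.append(unit)
        assigned |= unit
    return units

def reduce_graph(graph, units):
    """Collapse each unit to a single vertex; links inside a unit disappear,
    links between units are preserved. The units must cover all vertices."""
    unit_of = {v: u for u in units for v in u}
    reduced = {u: set() for u in units}
    for v, ws in graph.items():
        for w in ws:
            if unit_of[v] != unit_of[w]:
                reduced[unit_of[v]].add(unit_of[w])
    return reduced

# a -> b, b <-> c (a circuit), c -> d
graph = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
units = hirsch_subnets(graph)
reduced = reduce_graph(graph, units)

# Adding one overall feedback link d -> a merges everything into one unit.
with_feedback = dict(graph, d=["a"])
units_feedback = hirsch_subnets(with_feedback)
```

Here the circuit between b and c is collapsed into one unit, while a and d remain units of their own. With the feedback link from d to a, all four vertices fall into a single unit, so no levels can be distinguished any more. For larger graphs a linear-time algorithm such as Tarjan's would be the usual choice.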

The relation between these types of graphs and admissible levels of abstractions can be summarized in a Venn-diagram (Figure 1).

 

 

Figure 1. Venn diagram of categories of graphs. S = subgraphs, A = admissible levels of abstraction, R = reduced graphs, H = Hirsch subnets.

 

Specifically:

1) R ∩ S ≠ ∅. Reduced graphs and subgraphs are partially overlapping. They coincide if the reduced units can be thought of as produced by leaving out vertices. If the vertices that are left out to produce a subgraph would have added external links for a reduced unit, subgraphs are not reduced graphs. If the reduced units combine vertices that are not connected when the units are specified, reduced graphs are not subgraphs.

 

2) A ⊃ (R ∩ S). Admissible abstractions include the intersection of reduced graphs and subgraphs. Namely, in these cases the supergraphs or specified graphs have additional vertices that form only side-branches on paths of the existing vertices. Therefore, the levels of the existing variables are not changed by adding new variables. The admissible abstractions include additional cases, since there can be subgraphs (that are not reduced graphs) where additional vertices do not rearrange the level of variables. Similarly, there are reduced graphs (that are not subgraphs) where specification does not rearrange levels.

 

3) A ⊂ (R ∪ S). Admissible abstractions are included in the union of reduced graphs and subgraphs. This inclusion is strict, since specifying graphs or adding extra vertices can rearrange the levels of variables.

 

4) H ⊂ R. The Hirsch subnets are included in the reduced graphs. Within this set they include some subgraphs, admissible abstractions and remaining reduced graphs. [ The Hirsch subnets do not span these three categories, since the non-connected subgraphs and the reduced graphs resulting from combining non-connected vertices form the additional elements of these categories. ]

In particular, Hirsch subnets form admissible abstractions if the 'fan-out' = 1, that is: if there is no downstream branching. For multiple downstream paths, the subnet may form connecting paths of different lengths and so rearrange the levels.

 

 

In theory, too many levels of resolution may be admissible abstractions. The criterion can be made more strict in several ways. First, one can use the subnets defined by Hirsch (1989). However, as this criterion collapses circuits, it does not allow a distinction in levels within a circuit (only between circuits). In the extreme, all levels are collapsed if a single overall feedback loop is introduced. A second way to sharpen the criterion is to drop the requirement that there is a path from x to y: the order of the levels should be unaffected for any pair of variables. This will exclude cases where increasing resolution produces subnets etc. of different sizes in different paths. However, this will also have the effect that fewer subnets will be admissible levels of abstraction. It is therefore expected that the present criterion will be more suitable.     

 
