Chapter 3: Gauging the number of variables in a description of brain processes: levels of abstraction.
(Updated 16 May 2003, 28 April, 16 Dec. 2006, 21 March 2007).
Contents:
Summary.
Introduction.
1. Reducing the number of variables in the system description; reducing the number of variables that have to be observed.
2. Increasing the number of variables in the system description; increasing the number of variables that have to be observed.
3. Levels in the system equations, levels of resolution and levels of abstraction.
4. Neurochemical network.
Acknowledgement.
References.
Appendix: levels of abstraction and graph theory.
Summary.
This Chapter provides an overview of
criteria and techniques to increase or reduce the number of variables in a
description of a (nonlinear) system of which the system equations are not
known. The algorithms for increasing and reducing the number of variables in a
network model offer the best route for gauging the required number of variables.
Levels in the system equations are
defined, which in combination with levels of resolution lead to admissible
levels of abstraction. The practical importance of an admissible level of
abstraction is that it allows variables on a deeper level to be ignored. For a
description of brain processes, a neurochemical network model is introduced
that can be used at various levels of resolution.
Introduction.
Descriptions of brain processes span a
range of levels of abstraction. At the most detailed level, a description of
molecular processes such as reaction kinetics is provided by neurochemistry. At the most abstract level,
a description is based on brain regions, to interpret results from functional
neuroimaging (e.g. Coull 1998). A complete description at the molecular level
would involve an unwieldy number of variables (both in the description and as
the number of variables that have to be observed to test a description at this
level of detail). A reduction in the number of variables will be required,
which will lead to a more abstracted description. On the other hand, the
abstracted description leaves out the basic mechanics, and will have to be
expanded when more subtle behavioral phenomena are studied. To test these more
detailed descriptions, a greater number of variables will have to be observed.
To gauge the number of variables starting from either extreme, a type of
description is needed that can serve at various levels of abstraction.
1. Reducing the number of variables in the system description; reducing the number of variables that have to be observed.
In general, a variable can be redundant
in two possible ways. First, a variable has a different time constant: it
changes at a much slower or faster rate than the remaining variables and can be
replaced by a constant or equilibrium value. Second, a variable is dynamically
independent: it does not affect the dynamics of the remaining variables and the
system description can be based on the remaining variables.
Different time constant.
If there are different time constants in
the system, a variable in a process that is slow relative to the duration of
observation can be approximated by a constant. In general, the accuracy of this
approximation decreases with the duration of observation and increases with the
time constant of the slow dynamics. However, even a small change in a variable
can affect the dynamics of the others. Therefore, this approach should be
tested with dynamic (in)dependence methods (see below).
On the other hand, if the time constants
are sufficiently different, the description of the slower process can be based
on the equilibrium values for the faster process (e.g. Zaslavsky 1985, p98, Amari 1983, Hirsch 1989). Variables of
the fast processes can then be considered redundant (e.g. Schauer and Heinrich
1983). By taking a suitable ensemble of similar systems or repeated
observations of the same system, the faster variables can be treated as white
noise, and statistics such as state transition probabilities and the stationary distribution
can be derived (Goel and Richter-Dyn, 1974, p36ff).
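The sketch below makes the equilibrium approximation concrete. It uses a hypothetical linear two-variable system in which a fast variable y relaxes towards the slow variable x with time constant eps, while x decays slowly. The reduced description replaces y by its equilibrium value y = x. The model, its constants and the forward-Euler integration are assumptions chosen for illustration and are not taken from the cited works.

```python
import math

def simulate_full(eps, x0=1.0, t_end=4.0, dt=1e-4):
    """Forward-Euler integration of the hypothetical fast-slow system
        dx/dt = -0.5 * y        (slow variable x)
        eps * dy/dt = x - y     (fast variable y, time constant eps)
    started on the slow manifold (y = x). Returns x at t_end."""
    x, y = x0, x0
    for _ in range(round(t_end / dt)):
        dx = -0.5 * y
        dy = (x - y) / eps
        x, y = x + dt * dx, y + dt * dy
    return x

def reduced(x0=1.0, t_end=4.0):
    """Reduced description: y is replaced by its equilibrium value x,
    so dx/dt = -0.5 * x, which is solved exactly."""
    return x0 * math.exp(-0.5 * t_end)

for eps in (0.2, 0.01):
    error = abs(simulate_full(eps) - reduced())
    print(f"eps = {eps}: |x_full - x_reduced| at t = 4 is {error:.4f}")
```

In this example, the error of the reduced description shrinks as eps, the time constant of the fast variable, becomes small compared with the time scale of the slow variable.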
Even
when the faster variables are not in equilibrium, but cause a periodic
disturbance, the system deduced from
the average of repeated observations on a longer time
scale can still accurately show the behavior of the slow variables.
Formally, averaging means that a system of
differential equations with slow variables evolving on $\mathbb{R}^m$ and
fast variables evolving on a smooth compact immersed manifold can be described
by the slow variables in $\mathbb{R}^m$ alone, with an accuracy that increases as the ratio ε of the slow to the fast rate of change goes to zero, over time intervals of order 1/ε (Dumas 1995, Guckenheimer and
Holmes, 1983, p166-178). Many aspects of the system behavior are preserved
under averaging. However, codimension‑2 bifurcations are not correctly
preserved and only under certain conditions does the global behavior carry over
(for discussion, see Guckenheimer and Holmes, 1983, p180).
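The following is a minimal numerical illustration of averaging. Its assumptions are chosen only for this sketch. The fast variable is the phase t of a periodic forcing, which lives on a circle (a compact manifold), and the slow variable obeys dx/dt = eps*(-x + sin(t)^2). Averaging sin(t)^2 over one period gives 1/2, so the averaged equation dx/dt = eps*(-x + 1/2) has the solution x(t) = (1 - exp(-eps*t))/2 for x(0) = 0. The code compares the two on the slow time scale, where t is of order 1/eps.

```python
import math

def max_averaging_error(eps, slow_time=5.0, dt=0.01):
    """Integrate dx/dt = eps * (-x + sin(t)**2), x(0) = 0, with forward Euler
    up to t = slow_time / eps and return the largest deviation from the
    averaged solution x_avg(t) = (1 - exp(-eps * t)) / 2."""
    x, t, worst = 0.0, 0.0, 0.0
    for _ in range(round(slow_time / (eps * dt))):
        x += dt * eps * (-x + math.sin(t) ** 2)
        t += dt
        x_avg = 0.5 * (1.0 - math.exp(-eps * t))
        worst = max(worst, abs(x - x_avg))
    return worst

for eps in (0.5, 0.01):
    print(f"eps = {eps}: max |x - x_avg| = {max_averaging_error(eps):.4f}")
```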
If a variable is dropped, the behavior of the system is
viewed in a projected space of the remaining variables. The above reasoning
holds within $\mathbb{R}^m$ and it also holds when the immersed manifold
is the result of a projection across additional dimensions. Therefore, it is
even more appropriate in $\mathbb{R}^{m+p}$, with p
the number of additional dimensions. With the limitations outlined
above, the faster variables can be
dropped and the system description can be based on the slower variables.
Dynamical independence.
Independent Component Analysis on a set
of variables groups the variables into components that have the same time
course (Bell and Sejnowski 1995, McKeown et al. 1998). This technique can
suggest candidates for redundancy. However, although there is maximal
independence between components (there is no direct or higher-order correlation
between them), the variables that are grouped together within components may
have subtle differences in their dynamics. In addition, independence according
to time course does not imply that the variables are dynamically independent.
If the dynamics of one variable do not
depend on a given other variable - and vice versa - the two variables are
dynamically independent and can be grouped into different subsystems. Such a
grouping offers a more interesting way of reducing the number of variables: in
this case, a reduced system description involves only the dynamics of the
variables that affect one another and leaves out the internal dynamics of
subsystems.
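Suppose the dependencies between variables are known, for example from the pair-wise techniques discussed below. Finding the dynamically independent subsystems then amounts to finding the connected components of the dependency graph with the link directions ignored. The sketch assumes the dependencies are given as a hypothetical dictionary that maps each variable to the variables appearing in its equation, and it uses a union-find structure.

```python
def subsystems(depends_on):
    """Group variables into dynamically independent subsystems.

    depends_on: dict mapping each variable to an iterable of the variables that
    appear in its equation. Direction is ignored: if either variable affects the
    other, both belong to the same subsystem.
    Returns a list of sets (connected components), sorted by smallest member.
    """
    parent = {v: v for v in depends_on}
    for deps in depends_on.values():
        for d in deps:
            parent.setdefault(d, d)

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps the trees shallow
            v = parent[v]
        return v

    for v, deps in depends_on.items():
        for d in deps:
            parent[find(v)] = find(d)  # merge the components of v and d

    groups = {}
    for v in parent:
        groups.setdefault(find(v), set()).add(v)
    return sorted(groups.values(), key=min)

system = {
    "x1": ["x2"],          # the equation for x1 contains x2
    "x2": ["x1", "x3"],
    "x3": [],
    "y1": ["y2"],
    "y2": [],
}
print(subsystems(system))  # two subsystems: x1, x2, x3 and y1, y2
```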
The distinction between systems and
subsystems does not have to match distinctions between fast and slow dynamics.
The choice of a reduced system description therefore does not have to depend
on the duration or temporal resolution of the observations, but is linked to
the functional resolution of the observation technique: the most detailed level
of the system where all variables can still be observed. Possible levels of
resolution are further discussed in Section 3.
Dynamical dependence and independence can be explored by
delay embedding techniques (Takens 1981, Kantz and Schreiber 1997, see Section
2). Overall dependence between n variables can be demonstrated if the
dimensionality of the manifold on which the system moves is less than n.
Pair-wise dependence between variables of a system can be investigated by
applying the Takens embedding theorem to combinations of two observables. [An
observable is a smooth function from the state space to $\mathbb{R}$ and is therefore comparable to an output
variable. The theorems for observables will be applied here only to the more
restricted class of observed system variables.] The procedure is analogous to that
described in Section 2, but with pairs of time points and multiples of two
dimensions, since plotting two variables in different dimensional spaces would
distort their possible dependence. Variables from the same subsystem would
yield the dimensionality of their common attractor, while variables from
different subsystems would yield the sum of dimensionalities of two attractors.
However, this procedure would not detect two independent attractors with the
same periodicity.
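Here is a minimal sketch of this pair-wise procedure for noise-free data. Each of the two observables is delay-embedded with p time points, and the two embeddings are concatenated into 2p dimensions. The slope of ln C(r) against ln r is then estimated over a hand-picked range of radii using the maximum norm. The test signals, the unit lag, p = 2 and the radii are all assumptions for illustration. One test pair is the sine and cosine of a single oscillator; the other is two oscillators with incommensurate frequencies. With real data, the scaling range has to be chosen with care.

```python
import numpy as np

def joint_delay_embedding(x, y, p, tau=1):
    """Stack p delayed copies of x and of y: rows are points in 2p dimensions."""
    n = len(x) - (p - 1) * tau
    cols = [s[k * tau : k * tau + n] for s in (x, y) for k in range(p)]
    return np.column_stack(cols)

def correlation_dimension(points, radii):
    """Slope of ln C(r) versus ln r, with C(r) the fraction of distinct pairs
    whose maximum-norm distance is below r."""
    n = len(points)
    dist = np.zeros((n, n))
    for col in points.T:  # build the max-norm distance matrix one coordinate at a time
        np.maximum(dist, np.abs(col[:, None] - col[None, :]), out=dist)
    # remove the n zero self-distances on the diagonal, normalize by ordered pairs
    c = np.array([(np.count_nonzero(dist < r) - n) / (n * (n - 1)) for r in radii])
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope

t = np.arange(2000)
radii = np.geomspace(0.15, 0.45, 6)

# Same subsystem: two observables of one oscillator lie on one closed curve.
same = joint_delay_embedding(np.sin(t), np.cos(t), p=2)
# Different subsystems: two independent oscillators fill a 2-torus.
indep = joint_delay_embedding(np.sin(t), np.sin(np.sqrt(2) * t), p=2)

print(correlation_dimension(same, radii), correlation_dimension(indep, radii))
```

The first estimate should come out near 1, for a single closed curve. The second should come out near 2, for a torus. This matches the contrast above between the dimensionality of a common attractor and the sum of two dimensionalities.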
A more direct technique is to assess pair-wise dependence
between variables by nonlinear Granger causality (Baek and Brock 1992, Hiemstra
and Jones 1994, Chavez et al. 2003, Gourevitch et al. 2006). This technique
detects whether preceding values from one variable provide extra information on
the subsequent values of another variable
- given the information provided by its own preceding values. An alternative technique is
cross-approximate entropy, which compares the irregularity of one variable
relative to the other (Pincus and Singer 1996). [This technique is more
restrictive in that it tests whether the values of two variables stay within a
given distance of each other - rather than each staying individually within a
certain range, cf. definition 5 in
Pincus and Singer (1996) with equation 6 in Hiemstra and Jones (1994).]
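The core comparison of the Hiemstra and Jones (1994) test can be sketched with correlation integrals. Given that past values of both x and y were close, what is the probability that future values of x also stay close? The test compares this with the same probability given only the past of x. The sketch uses lead length 1 and lag lengths 1, standardizes both series, and reports only the difference between the two probabilities. It omits the asymptotic variance of the published method, and therefore its significance test. The distance e = 0.5 (in standard deviations) and the simulated driver-response pair are assumptions.

```python
import numpy as np

def _close(v, e):
    """Boolean matrix with entry (t, s) True when |v[t] - v[s]| < e."""
    return np.abs(v[:, None] - v[None, :]) < e

def granger_ratio_difference(x, y, e=0.5):
    """Descriptive statistic for 'y Granger-causes x' (lead 1, lags 1 and 1):
    C1/C2 - C3/C4. Positive values mean the past of y makes the future of x
    more predictable than the past of x alone. No significance test included."""
    x = (np.asarray(x, float) - np.mean(x)) / np.std(x)
    y = (np.asarray(y, float) - np.mean(y)) / np.std(y)
    past_x, future_x, past_y = x[:-1], x[1:], y[:-1]
    near_px, near_fx, near_py = _close(past_x, e), _close(future_x, e), _close(past_y, e)
    np.fill_diagonal(near_px, False)  # exclude the trivial pairs t == s everywhere
    c1 = np.count_nonzero(near_px & near_fx & near_py)  # past x, future x, past y
    c2 = np.count_nonzero(near_px & near_py)            # past x, past y
    c3 = np.count_nonzero(near_px & near_fx)            # past x, future x
    c4 = np.count_nonzero(near_px)                      # past x
    return c1 / c2 - c3 / c4

rng = np.random.default_rng(1)
n = 1000
y = rng.uniform(-1.0, 1.0, n)      # driver: independent noise
x = np.zeros(n)
for t in range(n - 1):             # response: x depends on the past of y
    x[t + 1] = 0.5 * x[t] + y[t] + 0.1 * rng.normal()

print(granger_ratio_difference(x, y))  # y -> x: clearly positive
print(granger_ratio_difference(y, x))  # x -> y: close to zero
```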
A systematic procedure can be used to
determine the number of redundant variables when independent data are available
in the form of output that the system has to mimic. The intrinsic
dimensionality of this dataset can be determined as the smallest number of
parameters needed to generate this set (Bennett 1969, Verveer and Duin 1995).
This represents the theoretical minimum to which the number of system variables
(and thereby the number of variables that have to be observed) can be reduced.
An algorithmic approach to delete variables is only available for neural
networks. Ash and Cottrell (1995) discuss pruning mechanisms that reduce the
number of hidden units in a 3-layer neural network, depending on the error in
the output.
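Pruning driven by the output error can be illustrated with a simple greedy backward elimination. This is not one of the specific algorithms reviewed by Ash and Cottrell. The sketch starts from a three-layer network with a fixed random hidden layer and a least-squares output layer. At each step it removes the hidden unit whose removal raises the training error least, as long as the error stays below a tolerance. The network, the target (built from three of the ten hidden units) and the tolerance are hypothetical.

```python
import numpy as np

def fit_error(hidden, target, units):
    """Mean squared error of a least-squares output layer using only `units`."""
    h = hidden[:, sorted(units)]
    w, *_ = np.linalg.lstsq(h, target, rcond=None)
    return float(np.mean((h @ w - target) ** 2))

def prune_hidden_units(hidden, target, tol):
    """Greedy backward elimination of hidden units (columns of `hidden`)."""
    kept = set(range(hidden.shape[1]))
    while len(kept) > 1:
        # training error after removing each remaining unit in turn
        trials = {u: fit_error(hidden, target, kept - {u}) for u in kept}
        best = min(trials, key=trials.get)
        if trials[best] > tol:
            break  # every further removal would degrade the output too much
        kept.remove(best)
    return kept

rng = np.random.default_rng(0)
inputs = rng.uniform(-1.0, 1.0, size=(200, 2))
weights, biases = rng.normal(scale=2.0, size=(2, 10)), rng.normal(size=10)
hidden = np.tanh(inputs @ weights + biases)                   # 10 hidden units
target = hidden[:, [1, 4, 7]] @ np.array([1.0, -2.0, 0.5])    # uses only 3 of them

kept = prune_hidden_units(hidden, target, tol=1e-10)
print(sorted(kept), fit_error(hidden, target, kept))
```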
2. Increasing the number of variables in the system description; increasing the number of variables that have to be observed.
Starting from a minimal description,
the question arises whether extra variables have to be included (in the system
description and/or in the number of variables that have to be observed to test
this system description) and if there is a systematic procedure to incorporate
these additional variables.
a) No independent data set available.
If an independent data set is not
available, it has to be deduced from the observations themselves whether the
given set of observation variables should be extended.
A time series of a single observable can be used to
estimate a lower bound on the dimensionality of the attractor to which the
system converges. A block of m successive time points is taken as a
point in an m-dimensional space
(the embedding space). The distance between each pair of such blocks is
calculated. The number of pairs of blocks closer than a given radius r, divided by the total number of pairs, is
called the correlation integral C(r).
The limit of ln C(r)/ln r for decreasing radius and increasing number
of points gives a lower bound for the dimensionality of the space from which
these data derive (delay embedding theorem, Takens 1981, Wolf et al. 1985,
Pincus 1991, Kantz and Schreiber 1997, p129). [Different authors give slightly
different calculations. Usually distance is calculated using the dominance
metric or maximum norm. The dimensionality of the embedding space has to be
chosen sufficiently high.]
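For a scalar time series, the computation can be sketched as follows. Blocks of m successive samples form the points in the embedding space. Maximum-norm distances are computed between all pairs of points, and C(r) is the fraction of pairs closer than r. The slope of ln C(r) against ln r over a fixed range of radii then stands in for the limit. The radii, m = 2 and the two test signals are assumptions for this sketch: a sampled sine (a limit cycle) and uniform white noise.

```python
import numpy as np

def delay_vectors(x, m, tau=1):
    """Rows are blocks (x[t], x[t + tau], ..., x[t + (m - 1) * tau])."""
    x = np.asarray(x, dtype=float)
    n = len(x) - (m - 1) * tau
    if n < 2:
        raise ValueError("time series too short for this embedding")
    return np.column_stack([x[k * tau : k * tau + n] for k in range(m)])

def correlation_integrals(points, radii):
    """C(r) for each r: fraction of distinct pairs closer than r (maximum norm)."""
    n = len(points)
    dist = np.abs(points[:, None, :] - points[None, :, :]).max(axis=2)
    return np.array([(np.count_nonzero(dist < r) - n) / (n * (n - 1)) for r in radii])

def dimension_estimate(x, m, radii, tau=1):
    """Slope of ln C(r) against ln r for the m-dimensional delay embedding of x."""
    c = correlation_integrals(delay_vectors(x, m, tau), radii)
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope

radii = np.geomspace(0.03, 0.2, 6)
cycle = np.sin(np.arange(1000))                      # limit cycle: close to 1
noise = np.random.default_rng(0).uniform(size=1000)  # noise: close to m = 2
print(dimension_estimate(cycle, 2, radii), dimension_estimate(noise, 2, radii))
```

The estimate for the sine stays near 1 as m is increased, whereas for noise it keeps growing with m.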
An analogous method uses the values of n
observables in n-dimensional space (Grassberger and Procaccia 1983).
These procedures can be combined to determine if n observables, measured
at p time points and embedded in np dimensions,
reflect dynamics with a dimensionality greater than n (Kantz and
Schreiber 1997, p142).
Not all time series can be analyzed in
this fashion: a limit cycle with a period of twice the sampling interval (or
where the sampling interval is a multiple of the cycle period) cannot be
reconstructed by a delay embedding (Kantz and Schreiber 1997, p129). A further
point of caution is the distinction between chaotic dynamics and noise. The
embedding methods were introduced to characterize attractors for chaotic
systems, which as a rule have a fractional dimensionality. However, this
rule does not hold for all chaotic systems (Glass 1995) and fractional
dimensionality is also observed for filtered noise (Stam et al. 1995 and
references therein). Therefore, the dimensionality should be considered a
relative measure of complexity, to be used for comparisons between the dynamics
of different systems.
With these limitations, the measure of
dimensionality can show the need to incorporate more variables in the system
description: that is, if the intrinsic dimensionality is higher than the number
of observed variables, the number of observed variables has to be increased. A
special case of adding new variables arises when the order of the system is
underestimated. An analogous procedure to the ones mentioned above can be used
in these cases, where the additional variables can be identified with time
derivatives (cf. Takens 1981, theorem 3).
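As a minimal example of this case, suppose a single observed variable x in fact obeys a second-order equation, where F is a generic placeholder function. A first-order description in x alone then underestimates the order of the system. Introducing the time derivative as an extra variable restores a first-order system. As elsewhere in this chapter, superscripts are indices, not powers:

```latex
\frac{d^2 x}{dt^2} = F\!\left(x, \frac{dx}{dt}\right)
\quad\Longrightarrow\quad
\begin{cases}
\dfrac{dx^1}{dt} = x^2, \\[4pt]
\dfrac{dx^2}{dt} = F(x^1, x^2),
\end{cases}
\qquad x^1 = x,\quad x^2 = \dfrac{dx}{dt}.
```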
Finally, a control u(t) can be rewritten as an
additional variable that has a purely multiplicative effect on the remaining
variables, i.e. $u(t) = v(x^{n+1}(t))$ in the system description:
$$ f(x^1, \ldots, x^n) + u(t)\, g(x^1, \ldots, x^n) = h(x^1, \ldots, x^{n+1}) \tag{1} $$
[Note: superscript indices denote contravariant variables.]
This situation can occur if the system description has
to be extended to parts of the system that were incorrectly left out.
The algorithms mentioned above do not indicate which variables
should be added. If the system description includes variables that are not
observable, these are obvious candidates. In addition, the system description
itself can be insufficient, in which case additional variables have to be
added. A further problem occurs when the intrinsic dimensionality of the
extended set of observed variables reaches the level indicated by the
algorithms. If this level is n, the Whitney embedding theorem (Kantz and
Schreiber 1997, p126) suggests that maximally 2n+1 variables are needed
as global coordinates.
Simulation
has to be used to determine which variables should be added to the set of
observed variables. For example, when two
values of dy/dt are found for the same value of x in testing a
system description of the form: dy/dt = f(x), the link between dy/dt
and x involves an unobserved variable. This unobserved variable may be
part of the ordinary dynamics or part of slower updating dynamics.
[The 'two values' of dy/dt may in fact be subject
to nonparametric statistical testing, such as cluster analysis. In addition,
alternative explanations have to be excluded: dy/dt should not depend on one of the other
observed variables and f(x)
should have no discontinuity for this value of x. ]
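A crude screen for such cases can be sketched as follows, assuming paired samples of x and dy/dt are available. The samples are sorted by x and split into bins. Within each bin, a straight line is fitted to dy/dt. A bin is flagged when its largest residual exceeds a fraction of the overall range of dy/dt. The number of bins and the threshold are hypothetical settings. This is only a descriptive check, not the cluster analysis or formal test mentioned in the note.

```python
import numpy as np

def flag_multivalued(x, dydt, n_bins=20, threshold=0.2):
    """Return the x-intervals in which dy/dt does not behave as a single smooth
    function of x: a straight line is fitted to (x, dy/dt) within each bin, and
    the bin is flagged when its largest residual exceeds `threshold` times the
    overall range of dy/dt."""
    x, dydt = np.asarray(x, float), np.asarray(dydt, float)
    order = np.argsort(x)
    x, dydt = x[order], dydt[order]
    scale = dydt.max() - dydt.min()
    flagged = []
    for xs, ds in zip(np.array_split(x, n_bins), np.array_split(dydt, n_bins)):
        coef = np.polyfit(xs, ds, 1)
        resid = ds - np.polyval(coef, xs)
        if np.max(np.abs(resid)) > threshold * scale:
            flagged.append((xs[0], xs[-1]))
    return flagged

t = np.linspace(0.0, 4 * np.pi, 2000)
# dy/dt = sin t observed against x = cos t: two values of dy/dt for most x
print(len(flag_multivalued(np.cos(t), np.sin(t))))
# dy/dt = x**2: a single-valued relation, no bins flagged
xs = np.linspace(-1.0, 1.0, 2000)
print(flag_multivalued(xs, xs ** 2))
```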
b) Independent data set available.
If independent data are available, for
example: output data that the system has to mimic, and the number of
independent components within the output data is greater than that of the
output generated by the system description, the system description should be
extended. Alternatively, if the system can perform certain tasks, it can be
deduced from the type and the size of the task how many components the system
has. Although the required dimensionality is clearly defined in these cases,
the way in which the output is generated or the input is processed is a source
of ambiguity. Only within the context of a given system architecture can the
extension of a system be derived from the
prescribed input or output.
For example, a
perceptron (a two-layer feedforward neural network) can learn only linearly
separable problems (Engel and Zippelius 1995); it cannot learn the Boolean
function: exclusive OR; this task requires either a hidden layer or
higher-order connections (Li and Chow 1996). However, the multi-layer perceptron
is a universal approximator in the sense that, given enough hidden units, it can approximate any continuous function to arbitrary accuracy (Engel
and Zippelius 1995, Kurkova 1995).
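The exclusive-OR example can be checked with the classical perceptron learning rule. On the four XOR patterns, the rule never reaches zero errors. Adding a single higher-order (product) input x1*x2 makes the problem linearly separable. The ±1 coding of inputs and outputs and the epoch limit are assumptions of this sketch.

```python
import itertools

def train_perceptron(patterns, labels, epochs=100):
    """Classical perceptron rule with a bias weight; labels are +1 or -1.
    Returns the number of misclassified patterns after training."""
    w = [0.0] * (len(patterns[0]) + 1)  # last entry is the bias weight

    def predict(p):
        s = sum(wi * xi for wi, xi in zip(w, p)) + w[-1]
        return 1 if s > 0 else -1

    for _ in range(epochs):
        errors = 0
        for p, target in zip(patterns, labels):
            if predict(p) != target:
                for i, xi in enumerate(p):
                    w[i] += target * xi
                w[-1] += target
                errors += 1
        if errors == 0:
            break
    return sum(predict(p) != t for p, t in zip(patterns, labels))

inputs = list(itertools.product([-1, 1], repeat=2))
labels = [-x1 * x2 for x1, x2 in inputs]              # XOR in +/-1 coding
with_product = [(x1, x2, x1 * x2) for x1, x2 in inputs]
print(train_perceptron(inputs, labels))       # at least one error remains
print(train_perceptron(with_product, labels))  # 0 errors
```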
For a
given type of task, the size of the
task determines the network needed to perform the task.
For
example, the parity problem requires log(n)
depth (layers) for n inputs; it
cannot be solved with a fixed depth (Parberry 1995). In the context of
approximation of functions, Kurkova
(1995) discusses the possibility of determining the necessary number of hidden
units in a three-layer neural network depending on the class of function that
has to be generated. If the task is pattern classification, the
maximum number of patterns that can be stored (and thereby classified) can be
calculated for simple two-state networks, such as the perceptron and the
Hopfield network (Zippelius and Engel 1995, Engel and Zippelius 1995). For more
general categories of neural networks, the Vapnik-Chervonenkis dimension has to
be estimated. For a single neuron with output in the set {0,1}, this dimension is
the size of the largest set of inputs that the neuron can map onto every possible
assignment of 0s and 1s (i.e. that it can shatter); it thereby determines the maximal set of inputs that can
be distinguished. The Vapnik-Chervonenkis dimension can be extended to
networks, multiple layers, different activation functions and to real-valued
output sets. Upper bounds on this dimension can be used as criteria for
extending the network if a more extensive input set has to be classified. However, the upper bound holds only when the activation
function for a neuron belongs to a given class of functions (linear, piecewise
polynomial, sigmoid), since smooth step functions can be constructed that give
the network the ability to classify an infinite number of stimuli (Maass
1995).
Only in a few cases can a systematic
procedure be followed for adding new variables. Ash and Cottrell (1995)
give a short overview of algorithms to expand the number of hidden units in a
three-layer neural network, depending on the output error. However,
these algorithms would not work for systems in general (for example when
the input of one unit cannot be written as the sum of outputs from other
units) and an analogous algorithm for networks with higher-order connections
still has to be developed (cf. Li and Chow 1996).
If the process of adding new variables is
compared to approximation of functions in general, it is clear that there does
not have to be a unique route to optimal approximation [this holds only within
the saturation class of an approximation method] (Lorentz 1966). Therefore, in
practice, when a new variable has to be included, plausible candidates have to
be added to the set of observed variables (and, if not already included, to the
system description). Based on simulation, the candidate with the highest
improvement of fit has to be selected.
Since in practice the external data are
subject to noise (which can be considered infinite-dimensional), the
improvement of fit has to include measures against 'over-fitting'. This can be
accomplished by setting smoothness criteria (Wahba 1995, Marroquin 1995) and
combining growing algorithms with pruning algorithms (see Section 1).
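The selection step can be sketched as greedy forward selection. Each candidate variable is added in turn, the model is refitted, and the candidate that most improves the fit is kept. As a simple guard against over-fitting, the sketch scores candidates on a held-out half of the data. It stops when no candidate lowers the held-out error by more than min_gain. The held-out split is a stand-in chosen for simplicity, not the smoothness criteria cited above. The linear model, min_gain and the simulated data are hypothetical.

```python
import numpy as np

def heldout_error(x_train, y_train, x_test, y_test, cols):
    """Fit a linear model (intercept + chosen columns) by least squares on the
    training part and return its mean squared error on the held-out part."""
    def design(data):
        return np.column_stack([np.ones(len(data))] + [data[:, c] for c in cols])
    w, *_ = np.linalg.lstsq(design(x_train), y_train, rcond=None)
    return float(np.mean((design(x_test) @ w - y_test) ** 2))

def forward_select(x, y, min_gain=0.01):
    """Greedily add candidate columns of x while the held-out error drops by
    more than min_gain. Returns the selected column indices in order."""
    half = len(y) // 2
    split = (x[:half], y[:half], x[half:], y[half:])
    selected, current = [], heldout_error(*split, [])
    remaining = set(range(x.shape[1]))
    while remaining:
        scores = {c: heldout_error(*split, selected + [c]) for c in remaining}
        best = min(scores, key=scores.get)
        if current - scores[best] <= min_gain:
            break  # no candidate improves the held-out fit enough
        selected.append(best)
        remaining.remove(best)
        current = scores[best]
    return selected

rng = np.random.default_rng(0)
x = rng.normal(size=(400, 6))                  # six candidate variables
y = 2.0 * x[:, 0] - x[:, 3] + 0.3 * rng.normal(size=400)
print(forward_select(x, y))                    # expected: columns 0 and 3
```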
3. Levels in the system equations, levels of resolution and levels of abstraction.
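The system equations:
$$ \frac{dx^i}{dt} = f^i(x^1, \ldots, x^n), \qquad i = 1, \ldots, n \tag{2} $$
incorporate variables at various levels. The number of
levels can be traced as follows. A directed graph can be defined with variables
as its vertices and with links between variables: a link from $x^i$
to $x^j$ exists if $x^j$ is an argument of $f^i$.
The number of levels of the system is the maximum pathlength in such a directed
graph, where the pathlength between two variables is that of the shortest path
connecting them (pairs without a connecting path are left out):
$$ \max_{i,j}\ \min_{\text{paths}}\ \mathrm{pathlength}(x^i, x^j) \tag{3} $$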
The level of a variable can be defined as
the pathlength from a root point in the directed graph (see appendix).
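Equation (3) can be evaluated with a breadth-first search from every variable. The sketch follows the convention above: there is a link from x^i to x^j when x^j is an argument of f^i. It takes the system as a hypothetical dictionary from each variable to the arguments of its right-hand side, and it skips pairs without a connecting path. The distances from a chosen root give the level of each variable.

```python
from collections import deque

def shortest_path_lengths(graph, source):
    """Breadth-first search: number of links on the shortest path from source
    to every variable that can be reached (source itself at distance 0)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def number_of_levels(graph):
    """Maximum over connected pairs (i, j) of the shortest pathlength, eq. (3)."""
    return max((d for v in graph for d in shortest_path_lengths(graph, v).values()),
               default=0)

# f1(x2, x3), f2(x3), f3(x4), f4() with no arguments among the variables
system = {"x1": ["x2", "x3"], "x2": ["x3"], "x3": ["x4"], "x4": []}
print(number_of_levels(system))                # 2
print(shortest_path_lengths(system, "x1"))     # levels relative to root x1
```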
A similar approach has been discussed by
Hirsch (1989) for networks. First, circuits are collapsed by reducing the
maximal connected subnetworks (of which every unit is connected directly or
indirectly to every other unit) to units in an abstracted network. In a
feedforward network of subnetworks, the first level is that of networks that
receive no links from other networks, and successive levels are defined by
tracing the links between subnetworks. Using the graph defined above, this
approach can be applied to systems of subsystems, to find the level of a
subsystem (rather than that of a single variable).
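Hirsch's construction can be sketched in two steps. First, each strongly connected component (a maximal set of units that can all reach one another) is collapsed into a single unit. Second, levels are assigned in the resulting acyclic network, with level 1 for components that receive no links from other components. The sketch uses Kosaraju's algorithm for the first step. As an assumption, it takes the level of a component as one more than the highest level among the components that link to it. Links are read here as "sends output to", so a system written in the convention of equation (3) would need its links reversed first. The example network is hypothetical.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm; graph maps each unit to the units it links to.
    Returns a dict mapping each unit to a representative of its component."""
    nodes = set(graph) | {w for ws in graph.values() for w in ws}
    order, seen = [], set()
    for start in nodes:  # first pass: iterative DFS recording finishing order
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            v, it = stack[-1]
            for w in it:
                if w not in seen:
                    seen.add(w)
                    stack.append((w, iter(graph.get(w, ()))))
                    break
            else:  # all successors explored: v is finished
                order.append(v)
                stack.pop()
    reverse = {v: [] for v in nodes}
    for v, ws in graph.items():
        for w in ws:
            reverse[w].append(v)
    component = {}
    for root in reversed(order):  # second pass on the reversed graph
        if root in component:
            continue
        component[root] = root
        stack = [root]
        while stack:
            v = stack.pop()
            for w in reverse[v]:
                if w not in component:
                    component[w] = root
                    stack.append(w)
    return component

def subsystem_levels(graph):
    """Level of every unit, taken as the level of its collapsed component."""
    comp = strongly_connected_components(graph)
    preds = {c: set() for c in comp.values()}
    for v, ws in graph.items():
        for w in ws:
            if comp[v] != comp[w]:
                preds[comp[w]].add(comp[v])
    level = {}

    def get_level(c):
        if c not in level:
            level[c] = 1 + max((get_level(p) for p in preds[c]), default=0)
        return level[c]

    return {v: get_level(comp[v]) for v in comp}

network = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c", "e"], "e": []}
print(subsystem_levels(network))  # {a, b} at level 1, {c, d} at 2, e at 3
```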
The number of levels cannot be determined from
observations if general system equations are allowed, since f(x,y,z) is
not distinguishable in observations from f(g(x,y),z). Similarly, the
procedures to translate general systems into networks do not by themselves
indicate (sub)levels.
[ f(x,y,z)
and f(g(x,y),z) result in different numbers of summation terms in
Kolmogorov superposition. In the case of Bernstein polynomials, the composite
function results in a polynomial of polynomials. Since the polynomial includes
a term equal to 1, the only difference is the degree of the composite
polynomial, at a given level of approximation, which vanishes in the limit.
Moreover, no difference should be expected in cases such as:
g1(x,y)=x+y, g2(x,y,z)=x+y+z, h(x,y)=x+y, where:
g1(h(x,y),z) = g2(x,y,z).
Finally, as discussed before, a three-level network is a
universal approximator. ]
Additional restrictions to the network
model may be introduced: e.g. the form
of the response function, a limit to the 'fan-in' etc. However, it is not clear
if these restrictions would allow a plausible model, especially in the case of
abstracted networks.
An alternative procedure is to test
nonlinear causality between variables (Baek and Brock 1992, Hiemstra and Jones
1994, Chavez et al. 2003, Gourevitch et al. 2006). Since these techniques give
information on the direction of the causality, an ordering of the levels can be
obtained. Although one cannot determine the
absolute level for each variable or the mathematical form of the
relation between variables, the ordering is sufficient for the applications
discussed below.
[ Note that the correlation dimension on
different sets of variables (see Section 1) only gives information on the existence
of a relation between variables and not its direction. Therefore,
with this technique one can only establish the maximum difference between the
levels of variables.]
If variables on various levels form a
hierarchy, a description at a deeper level is derived by including additional
variables that form a refinement of the existing set, i.e. are hierarchically
subordinate to members of the existing set. This would greatly facilitate the
search for new variables. However, the dynamics are in general not hierarchically
organized. Therefore, such an organization has to be set up along a different
route. The levels of resolution, discussed next, allow a hierarchical
organization of variables.
Levels of resolution.
With higher temporal resolution or (when
the system is extended over space) higher spatial resolution, variables can be
introduced that have faster dynamics or diversity on a smaller spatial scale.
On the other hand, observation over longer time periods or on a wider spatial domain
can show processes that are slower or are not constant on a wider spatial
scale.
The increase in number of variables
follows the discussion in Section 2. To examine the effect of including
additional variables on the system description, the arguments for reduction
discussed in Section 1 can be applied in the opposite direction. For example,
if the difference in time scale of dynamics is sufficient, an approximation can
be used that the faster dynamics have reached equilibrium. However, the levels
of convergence between which the above approximation can be used are not necessarily the same as those in the
system equations: a subsystem does not have to show faster convergence than a
higher level system. In fact, the system theoretical level designation may be
completely rearranged if additional variables are included and this
rearrangement may not even be uniquely determined.
Levels of abstraction.
It is proposed here to define levels of abstraction as
those levels of resolution that are meaningful in terms of the levels in the
system equations. More specifically, a level of resolution is an admissible
level of abstraction if further resolution does not lead to a rearrangement of
the levels in the system equations found up to that point (technical details
are discussed in the appendix). In practice this means that a description at a
given admissible level of abstraction can leave out all variables at a deeper
level.
If the level of abstraction can be
selected beforehand (on the basis of additional knowledge) this level can be
used to limit the process of increasing resolution. In this case, the system
description serves as an integrative model for additional data. If the level of
abstraction is not set beforehand, a systematic procedure to increase the level
of resolution together with the system theoretical levels provides a systematic
way to gauge the appropriate level of abstraction. Such a procedure will be
discussed in the next Section.
4. Neurochemical network.
Brain processes can be described at
various levels of resolution, for example: the molecular, cell, cell group, and
brain region level. In practice, these levels of resolution are used
heuristically as a level of abstraction. (A formal use would involve showing that
they are admissible levels of abstraction, see Section 3 and appendix). In
order to use these known levels of abstraction or to gauge the appropriate
level of abstraction, a family of descriptions should be used that are
compatible with both the lowest and highest admissible levels of abstraction
and allow a systematic procedure between these levels. This requirement also
deals with cases where it cannot be excluded beforehand that a variable at a
low level of abstraction survives on a higher level.
In the previous sections, procedures for
determining whether the number of variables should be increased or reduced have
been discussed: systematic procedures in the form of growing or pruning
algorithms are available for network models. In addition, a network model is
required to derive levels in the system equations from observations. Given the
discussion in Section 3, it is of importance to develop a network model that
can be used at various levels of resolution.
Neurochemical networks (Dekker 1998)
fulfill these conditions and can therefore be used to gauge the number of
variables needed in a description of brain processes.
The basic variables of a neurochemical network are the
concentrations of M compounds at N locations. The dynamics are
described by system equations of the type:
$$ \frac{dc_{ij}}{dt} = f_{ij}(c_{11}, \ldots, c_{MN}) \tag{4} $$
where $c_{ij}$ is the concentration of
compound i at location j. The network contains first-order
spatial connections and higher-order connections due to reaction kinetics. It
can be seen as a combination of a biochemical network and a neural network.
Therefore the system equations are of the form:
$$ \frac{dc_{ij}}{dt} = \sum_{n} A1_{i(j)\,n(j)}\, g_{n}(c_{n1}, \ldots, c_{nM}) + \sum_{m} A2_{(i)j\,(i)m}\, c_{im} \tag{5} $$
(where A1 and A2 are two 2-D sets of constants and the higher-order biochemical terms have
been omitted for clarity).
By rewriting the functions of n arguments as
single-argument functions (Lorentz 1966, p4, 9, 10, Li and Chow 1996) the
system can be brought into the form of a network with higher-order connections.
The number of variables can be reduced as
discussed in Section 1 for networks. For example, not all compounds or all
locations have to be included. For reaction kinetics, only the most determining
reactions need to be incorporated. Since within a chain or cycle of reactions
the slowest reactions are rate determining, it is possible to reduce the
number of equations to that of the slow reactions (Schauer and Heinrich 1983).
Similarly, for the locations, only the most determining regions have to be
included.
The procedures for expanding the number
of variables can be applied as described for networks in Section 2. Some
heuristic principles can be derived from the system equations:
1) The system is Markov (cf. Gibson and
Bruck 2000), so if correlations are found across time points with the values
for intermediate time points held constant, extra variables should be
added. [A non-parametric version of
this criterion is that any Granger causality at lag length > 1 that differs
from the causality at lag length = 1 suggests that additional variables have to
be included.]
2) The system has only first-order time derivatives, so
higher-order dependencies have to be specified by additional variables.
3) The system has no cross-connections
between compounds and locations. To remove such connectivity, the level of
resolution should be increased (see below).
4) On the most detailed level, the order
of connectivity is at most 3 (Lehninger 1975, p187), so higher-order
connections have to be reduced by including other variables.
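Heuristic 1) can be screened with a linear stand-in, which is an assumption of this sketch; the bracketed note allows a nonparametric version. The quantity is the partial correlation between x(t) and x(t+2) with x(t+1) held fixed, computed from regression residuals. For a first-order Markov series such as an AR(1) process, it should be close to zero. For the AR(2) series in the example, it is clearly nonzero, which indicates that a variable is missing from a description in x alone. The simulated series and their coefficients are hypothetical.

```python
import numpy as np

def lag2_partial_correlation(x):
    """Correlation of x[t] and x[t+2] after regressing both on x[t+1]
    (with intercept), i.e. with the intermediate time point held fixed."""
    x = np.asarray(x, dtype=float)
    a, mid, b = x[:-2], x[1:-1], x[2:]
    design = np.column_stack([np.ones_like(mid), mid])
    res_a = a - design @ np.linalg.lstsq(design, a, rcond=None)[0]
    res_b = b - design @ np.linalg.lstsq(design, b, rcond=None)[0]
    return float(np.corrcoef(res_a, res_b)[0, 1])

rng = np.random.default_rng(0)
n = 5000
noise = rng.normal(size=n)
ar1, ar2 = np.zeros(n), np.zeros(n)
for t in range(2, n):
    ar1[t] = 0.7 * ar1[t - 1] + noise[t]                     # Markov in x alone
    ar2[t] = 0.5 * ar2[t - 1] - 0.4 * ar2[t - 2] + noise[t]  # needs a second variable

print(lag2_partial_correlation(ar1), lag2_partial_correlation(ar2))
```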
In resolving discrepancies between repeated
observations, information on the stoichiometry of a reaction, possible
cofactors and possible enzyme regulation has to be compared with the nature and
time scale of possible updating mechanisms: morphological plasticity and
biochemical plasticity, such as enzyme induction (for reviews, see: Kolb and
Whishaw 1998, Kumer and Vrana 1996).
Levels in the system equations can be
distinguished as discussed for networks in Section 3. In a description of brain
processes, the different levels are not necessarily hierarchically organized. An
example of this is the 'cross-talk' between signal transduction mechanisms.
Levels of resolution.
In the above description (5), the size of
the spatial units has been left unspecified. Similarly, the level within the
biochemical network of the concentrations is not specified. Therefore, a
neurochemical network can be described at various levels of resolution.
In biochemistry, the level can vary from
overall parameters of energy consumption, parameters of neurotransmission (e.g.
spike frequency), turnover of individual neurotransmitters, to reaction rates.
The spatial units can be chosen to be a
brain structure, a set of neurons, an individual cell (glial cell or neuron) or
a subcellular compartment. Since biochemistry is strongly compartmentalized in
the brain, it will be assumed that the minimal spatial unit in the
neurochemical network is a compartment. Therefore, the neurochemical network
excludes a general formulation of reaction‑diffusion equations (Cross and
Hohenberg 1993), but allows a
discretized version.
Increasing the level of resolution.
A higher resolution introduces additional locations
and/or compounds. The indices of the sets of constants A1 and
A2 are subdivided to describe biochemical or spatial diversity
on a smaller scale. Cross-specification can also occur, where concentrations
are split off at either end of a spatial connection (factors from A1
enter A2 in a description at a deeper level), or locations
are split off from a biochemical connection (factors from A2
enter A1 in a description at a deeper level).
This splitting of A1 and A2
(and the corresponding splitting of higher-order terms left out from (5)) may
be considered an extension of the growing algorithms discussed in Section 2.
The description in (5) requires that
there are no cross-connections between compounds and locations. This
requirement can define the need to further increase the resolution. A level of
resolution where (5) can be used can then be pragmatically taken as an
admissible level of abstraction.
Independent data.
To test a neurochemical network model,
the independent data have to combine both spatial and biochemical information.
The spatial resolution of functional
neuroimaging yields data of high dimensionality, such that a direct comparison
with the dimensionality of independent data (Section 2) will not show that
additional variables have to be measured. The opposite is more usual: the range
of locations can be restricted to a 'region of interest', even for a single biochemical
variable.
However, given the spatial overlap of
different neurotransmitter systems in the brain, it is expected that more
biochemical information will be needed for a further distinction between brain
states. With the techniques currently available, such a biochemically
diversified picture has to be constructed by combining data from different
techniques (e.g. nuclear magnetic resonance spectroscopy, in which spatial
resolution is sacrificed, and/or positron emission tomography with different
tracers). In this case, the
neurochemical network model forms a natural integration model to combine
several biochemical variables across locations.
Acknowledgement.
The author gratefully acknowledges
helpful suggestions by Dr. J. Nijhuis and Dr. B. Kappen, and Prof. Dr. F.
Takens who reviewed the statements on the use of the Takens embedding theorem
for more than one observable.
References.
Amari S.-I. (1983): Field theory of self-organizing neural nets. IEEE Trans. Systems, Man, Cybern. SMC-13, 741-748.
Ash T.,
Cottrell G. (1995): Topology‑modifying neural network
algorithms. In: Arbib M.
A. (Ed.) The Handbook of Brain Theory and Neural Networks. The MIT
Press, Cambridge, Massachusetts.
Baek E., Brock W. (1992): A general test for nonlinear
Granger causality. Bivariate model. Working paper. Iowa State University and
University of Wisconsin at Madison.
Bell
A. J., Sejnowski T. J. (1995): An information-maximization approach to blind
separation and blind deconvolution. Neural Comp. 7, 1129-1159.
Bennett R.
S. (1969): The intrinsic dimensionality of signal collections. IEEE Trans. Inform.
Theory, 15, 517‑525.
Chavez M., Martinerie J., Le Van Quyen M. (2003):
Statistical assessment of nonlinear causality: application to epileptic EEG signals.
J. Neurosci. Meth. 124, 113-128.
Coull
J. T. (1998): Neural correlates of attention and arousal: insights from
electrophysiology, functional neuroimaging and psychopharmacology. Progr.
Neurobiol. 55, 343-361.
Cross
M. C., Hohenberg P. C. (1993): Pattern formation outside of equilibrium. Rev.
Mod. Phys. 65, 851‑1112.
Dekker
A. J. (1998): Neurochemical networks, nonlinear systems and functional
neuroimaging. Faculty report Technical informatics and mathematics, Delft
University of Technology.
Dumas
H. S.
(1995): A new proof of Anosov's
averaging theorem. In: Dumas H. S., Meyer K. R., Schmidt D. S. (Eds.)
Hamiltonian Dynamical Systems.
History, Theory and
Applications. Springer, New York.
Engel
A., Zippelius A. (1995): Statistical mechanics of learning. In: Arbib M. A.
(Ed.) The Handbook of Brain Theory and Neural Networks. The MIT Press, Cambridge, Massachusetts.
Flament C. (1963): Applications of Graph Theory to Group Structure. Prentice Hall Inc., Englewood Cliffs, New Jersey.
Gibson M. A., Bruck J. (2000): Efficient exact stochastic simulation of chemical systems with many species and many channels. J. Phys. Chem. A 104, 1876-1889.
Gillespie D. T. (2000): The chemical Langevin equation. J. Chem.
Glass L. (1995): Chaos in neural systems. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks. The MIT Press, Cambridge, Massachusetts.
Goel N. S., Richter-Dyn N. (1974): Stochastic Models in Biology. Academic Press, New York.
Gourevitch B., Le Bouquin-Jeannes R., Faucon G. (2006): Linear and nonlinear causality between signals: methods, examples and neurophysiological applications. Biol. Cybern. 95, 349-369.
Grassberger P., Procaccia I. (1983): Characterization of strange attractors. Phys. Rev. Lett. 50, 346-349.
Guckenheimer J. M., Holmes P. J. (1983): Nonlinear Oscillations, Dynamical Systems and Bifurcations of Vector Fields. Springer, New York.
Hiemstra C., Jones J. D. (1994): Testing for linear and nonlinear Granger causality in the stock price-volume relation. J. Finance 49, 1639-1664.
Hirsch M. W. (1989): Convergent activation dynamics in continuous time networks. Neural Networks 2, 331-349.
Kantz H., Schreiber T. (1997): Nonlinear Time Series Analysis. Cambridge University Press, Cambridge.
Kolb B., Whishaw I. Q. (1998): Brain plasticity and behavior. Ann. Rev. Psychol. 49, 43-64.
Kumer S. C., Vrana K. E. (1996): Intricate regulation of tyrosine hydroxylase activity and gene expression. J. Neurochem. 67, 443-462.
Kurkova V. (1995): Kolmogorov's theorem. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural Networks. The MIT Press, Cambridge, Massachusetts.
Lehninger A. L. (1975): Biochemistry. Worth Publishers Inc., New York.
Li J-Y., Chow T. W. S. (1996): Functional approximation of higher-order neural networks. J. Intell. Systems 6, 239-260.
Lorentz G. G. (1966): Approximation of Functions. Holt, Rinehart and Winston, New York.
Maass W. (1995): Vapnik-Chervonenkis dimension of neural
networks. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural
Networks. The MIT Press, Cambridge,
Massachusetts.
Marroquin J. L. (1995): Regularization theory and
low-level vision. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural
Networks. The MIT Press, Cambridge,
Massachusetts.
McKeown M. J., Jung T-P., Makeig S., Brown G.,
Kindermann S. S., Lee T-W., Sejnowski T. J. (1998): Spatially independent
activity patterns in functional MRI data during the Stroop color-naming task.
Proc. Natl. Acad. Sci. USA, 95, 803-810.
Parberry I. (1995): Structural complexity and discrete
neural networks. In: Arbib M. A. (Ed.) The Handbook of Brain Theory and Neural
Networks. The MIT Press, Cambridge,
Massachusetts.
Pincus S. M. (1991): Approximate entropy as a measure of
system complexity. Proc. Natl. Acad. Sci. USA. 88, 2297-2301.
Pincus S.,
Singer B. H. (1996): Randomness and degrees of irregularity. Proc. Natl.
Acad. Sci. USA. 93, 2083-2088.
Schauer M.,
Heinrich R. (1983): Quasi steady state approximation in the mathematical modeling of biochemical
reaction networks. Math. Biosci. 65, 155‑170.
Stam C. J., Jelles B.,
Achtereekte H. A. M., Rombouts S. A. R. B., Slaets J. P. J., Keunen R. W.
M. (1995): Investigation of EEG non‑linearity in dementia and Parkinson's
disease. Electroenceph. and clin. neurophysiol. 95, 309‑317.
Takens F. (1981): Detecting strange attractors in
turbulence. In: Rand D. A. and Young L. S. (Eds.) Dynamical Systems and
Turbulence. Lecture Notes in Mathematics. 898, Springer, Berlin.
Verveer P.
J., Duin R. P. W. (1995): An
evaluation of intrinsic
dimensionality estimators. IEEE
Trans. Patt. Anal.
Machine Intell. 17, 81‑86.
Wahba G. (1995): Generalization and regularization in
nonlinear learning systems. In: Arbib M. A. (Ed.) The Handbook of Brain Theory
and Neural Networks. The MIT Press,
Cambridge, Massachusetts.
Watt R. C., Hameroff S. R. (1988): Phase space electroencephalography (EEG): a new mode of intraoperative EEG analysis. Int. J. Clin. Monit. Computing 5, 3-13.
Wolf
A., Swift J. B., Swinney H. L.,
Vastano J. A. (1985): Determining Lyapunov exponents from a time series. Physica 16D, 285‑317.
Zaslavsky
G. M. (1985): Chaos in
Dynamic Systems. Harwood Academic Publishers, New York.
Zippelius
A., Engel A. (1995): Statistical mechanics of neural networks. In: Arbib M. A.
(Ed.) The Handbook of Brain Theory and Neural Networks. The MIT Press, Cambridge, Massachusetts.
Appendix: levels of abstraction and graph theory.
A level of resolution is an admissible level of
abstraction if further resolution does not lead to a rearrangement of the
levels in the system equations found up to that point. This can be specified as
follows: for any pair of variables $x$ and $y$ such that a path exists from $x$ to $y$ and
$\mathrm{level}_n(x) < \mathrm{level}_n(y)$ at resolution $n$, there is no deeper resolution
$m > n$ at which $\mathrm{level}_m(x) > \mathrm{level}_m(y)$.
[ The level of a variable is the length
of the shortest path from a given root point (Flament 1963, p25). If there are
alternative root points, the minimum of the total number of levels should be
taken. The result is not necessarily a minimal arborescence (Flament 1963,
p68), since variables can have multiple links from lower-level variables. If
the level designation of variables is not unique before and/or after an
increase in resolution, the criterion should hold for at least one of the
possible numberings of variables. However, a single numbering should be used in
all comparisons across levels of resolution. ]
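The level definition and the admissibility criterion can be checked mechanically on small directed graphs. The sketch below makes several assumptions:

- there is a single given root point;
- the links of the system equations are stored as a dictionary of successors;
- one coarse resolution n is compared with one deeper resolution m, so only a single m is tested, not all m;
- every coarse variable keeps its name in the deeper resolution and stays reachable from the same root;
- paths are taken at the coarse resolution.

The function names are hypothetical.

```python
from collections import deque


def levels(graph, root):
    """Level of each variable: length of the shortest path from the root (BFS).

    graph: dict mapping each variable to its successors (directed links)
    """
    lev, queue = {root: 0}, deque([root])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in lev:
                lev[w] = lev[v] + 1
                queue.append(w)
    return lev


def reachable(graph, x):
    """All variables reached from x by a directed path of length >= 1."""
    seen, stack = set(), list(graph.get(x, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, ()))
    return seen


def keeps_level_order(coarse, fine, root):
    """Check the criterion against one deeper resolution.

    A pair (x, y) joined by a path in `coarse` with level_n(x) < level_n(y)
    must not end up with level_m(x) > level_m(y) in `fine`.
    """
    lev_n, lev_m = levels(coarse, root), levels(fine, root)
    for x in lev_n:
        for y in reachable(coarse, x):
            if lev_n[x] < lev_n[y] and lev_m[x] > lev_m[y]:
                return False
    return True


chain = {"r": ["a"], "a": ["b"], "b": ["c"], "c": ["d"]}
side_branch = {"r": ["a"], "a": ["b", "s"], "b": ["c"], "c": ["d"]}
shortcut = {"r": ["a", "t"], "a": ["b"], "b": ["c"], "c": ["d"], "t": ["d"]}
print(keeps_level_order(chain, side_branch, "r"))  # True
print(keeps_level_order(chain, shortcut, "r"))     # False
```

The side branch via s leaves the existing levels intact. The added route from r to d via t gives d a lower level number than c, although a path leads from c to d.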
Since there can always be deeper levels
of resolution, this criterion can only be applied in practice if there is a
maximum to the resolution, e.g. for the brain: the molecular level.
Using definitions from graph theory (e.g.
Flament 1963, p22-23), the nature of admissible levels of abstraction can be
described further.
A graph is a set of vertices and the
(directed) links between them.
A subgraph is a subset of the vertices
with their internal links. If graph $G$ is a subgraph of $G'$, then $G'$ will be
called a supergraph of $G$.
A reduced graph combines vertices into
new units. The links between vertices within such a unit are collapsed, the
links to outside vertices are preserved as links between these new units.
If $G$ is a reduced graph of $G'$, then $G'$ will be called a specified graph of $G$.
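The two operations can be contrasted in a few lines. The sketch assumes a graph is stored as a dictionary from each vertex to its successors. It also assumes a reduction is given by a hypothetical mapping `unit_of` from vertices to unit labels; the helper names are likewise hypothetical. In the example, collapsing x and y into one unit U keeps the link to z. Leaving out y to form a subgraph loses that link.

```python
def subgraph(graph, keep):
    """Subset of the vertices together with the links among them."""
    keep = set(keep)
    return {v: {w for w in graph.get(v, ()) if w in keep} for v in keep}


def reduced_graph(graph, unit_of):
    """Collapse vertices into units given by unit_of (vertex -> unit label).

    Links inside a unit disappear; links to outside vertices are kept as
    links between units (parallel links are merged).
    """
    reduced = {}
    for v, succs in graph.items():
        reduced.setdefault(unit_of[v], set())
        for w in succs:
            reduced.setdefault(unit_of[w], set())
            if unit_of[w] != unit_of[v]:
                reduced[unit_of[v]].add(unit_of[w])
    return reduced


g = {"x": ["y"], "y": ["z"], "z": []}
print(reduced_graph(g, {"x": "U", "y": "U", "z": "z"}))  # {'U': {'z'}, 'z': set()}
print(subgraph(g, ["x", "z"]))  # x and z without links: y -> z is lost
```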
The subnets defined by Hirsch (1989) are
units in a reduced graph, collapsed over maximal connected sets of vertices.
(The maximal connected sets are the largest sets in which there is a path from
any vertex to any other.)
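In graph-theoretic terms, these maximal connected sets are the strongly connected components of the directed graph. The reduced graph over them is usually called the condensation. The sketch below computes both with Kosaraju's algorithm, a standard choice that Hirsch (1989) does not prescribe. It uses the same dictionary representation, and the function names are hypothetical.

```python
def strongly_connected_components(graph):
    """Kosaraju's algorithm; returns vertex -> representative of its component."""
    vertices = set(graph) | {w for succs in graph.values() for w in succs}
    # Pass 1: iterative depth-first search, recording vertices when finished.
    order, visited = [], set()
    for s in vertices:
        if s in visited:
            continue
        visited.add(s)
        stack = [(s, iter(graph.get(s, ())))]
        while stack:
            v, successors = stack[-1]
            for w in successors:
                if w not in visited:
                    visited.add(w)
                    stack.append((w, iter(graph.get(w, ()))))
                    break
            else:
                order.append(v)
                stack.pop()
    # Pass 2: search the reversed graph in reverse finishing order.
    reverse = {v: [] for v in vertices}
    for v, succs in graph.items():
        for w in succs:
            reverse[w].append(v)
    component = {}
    for s in reversed(order):
        if s in component:
            continue
        component[s] = s
        stack = [s]
        while stack:
            v = stack.pop()
            for w in reverse[v]:
                if w not in component:
                    component[w] = s
                    stack.append(w)
    return component


def hirsch_subnets(graph):
    """Collapse each strongly connected component into one unit."""
    comp = strongly_connected_components(graph)
    reduced = {c: set() for c in comp.values()}
    for v, succs in graph.items():
        for w in succs:
            if comp[v] != comp[w]:
                reduced[comp[v]].add(comp[w])
    return comp, reduced


# Two circuits, {a, b} and {c, d}, joined by the single link b -> c.
g = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}
comp, reduced = hirsch_subnets(g)
```

Adding a link from d back to a creates one overall feedback loop. All four variables then fall into a single subnet, which is the collapse of levels noted at the end of this Appendix.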
The relation between these types of
graphs and admissible levels of abstraction can be summarized in a
Venn diagram (Figure 1).

Figure 1. Venn diagram of categories of graphs: $\mathcal{S}$ = subgraphs, $\mathcal{A}$ = admissible levels of abstraction, $\mathcal{R}$ = reduced graphs, $\mathcal{H}$ = Hirsch subnets.
Specifically:
1) $\mathcal{R} \cap \mathcal{S} \neq \emptyset$. Reduced
graphs and subgraphs partially overlap. They coincide if the
reduced units can be thought of as produced by leaving out vertices. If the
vertices that are left out to produce a subgraph would have added external
links to a reduced unit, the subgraph is not a reduced graph. If the reduced
units combine vertices that are not connected when the units are specified,
the reduced graph is not a subgraph.
2) $\mathcal{A} \supset (\mathcal{R} \cap \mathcal{S})$. Admissible abstractions
include the intersection of reduced graphs and subgraphs. In these
cases, the supergraphs or specified graphs have additional vertices that form
only side-branches on paths of the existing vertices. Therefore, the levels of
the existing variables are not changed by adding new variables. The admissible
abstractions include additional cases, since there can be subgraphs (that are
not reduced graphs) where additional vertices do not rearrange the levels of
variables. Similarly, there are reduced graphs (that are not subgraphs) where
specification does not rearrange levels.
3) $\mathcal{A} \subset (\mathcal{R} \cup \mathcal{S})$. Admissible abstractions are
included in the union of reduced graphs and subgraphs. This inclusion is
strict, since specifying graphs or adding extra vertices can rearrange the
levels of variables.
4) $\mathcal{H} \subset \mathcal{R}$. The Hirsch
subnets are included in the reduced graphs. Within this set they include some
subgraphs, admissible abstractions and remaining reduced graphs. [ The Hirsch
subnets do not span these three categories, since the non-connected subgraphs
and the reduced graphs resulting from combining non-connected vertices form
the additional elements of these categories. ]
In particular, Hirsch subnets form admissible
abstractions if the 'fan-out' equals 1, that is, if there is no downstream
branching. With multiple downstream paths, the subnet may form connecting paths
of different lengths and so rearrange the levels.
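Downstream branching can be read off from the links that leave each unit. The sketch below takes 'fan-out' to mean the number of distinct outside vertices a unit links to. That reading and the helper name `unit_fan_out` are assumptions, not definitions from Hirsch (1989). Units with a fan-out of 1 meet the condition stated above. Units with a larger fan-out need a closer check, for instance by comparing levels before and after collapsing.

```python
def unit_fan_out(graph, unit_of):
    """Count, for each unit, the distinct outside vertices it links to.

    graph:   dict vertex -> successors (directed links)
    unit_of: dict vertex -> unit label, e.g. the Hirsch subnets
    """
    targets = {}
    for v, succs in graph.items():
        targets.setdefault(unit_of[v], set())
        for w in succs:
            targets.setdefault(unit_of[w], set())
            if unit_of[w] != unit_of[v]:
                targets[unit_of[v]].add(w)
    return {unit: len(ws) for unit, ws in targets.items()}


# Circuit {a, b} feeding both c and d: downstream branching.
g = {"a": ["b"], "b": ["a", "c", "d"], "c": [], "d": []}
units = {"a": "S", "b": "S", "c": "c", "d": "d"}
print(unit_fan_out(g, units))  # {'S': 2, 'c': 0, 'd': 0}
```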
In theory, too many levels of resolution may qualify as
admissible abstractions. The criterion can be made stricter in several ways.
First, one can use the subnets defined by Hirsch (1989). However, as this
criterion collapses circuits, it does not allow a distinction in levels within
a circuit (only between circuits). In the extreme, all levels are collapsed if
a single overall feedback loop is introduced. A second way to sharpen the
criterion is to drop the requirement that there is a path from x to y, so that
the order of the levels must be unaffected for any pair of variables. This will
exclude cases where increasing resolution produces subnets etc. of different
sizes in different paths. However, it will also have the effect that fewer
subnets are admissible levels of abstraction. It is therefore expected that
the present criterion will be more suitable.
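The two criteria can be contrasted on a small example. The sketch below repeats a breadth-first level computation and flags a reversal of level order in one of two ways: only for pairs joined by a path at the coarse resolution (the present criterion), or for all pairs (the stricter variant). It assumes a single root and that the coarse variables keep their names at the finer resolution. The function names and example graphs are hypothetical.

```python
from collections import deque
from itertools import permutations


def levels(graph, root):
    """Shortest-path distance from the root to each reachable vertex (BFS)."""
    lev, queue = {root: 0}, deque([root])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in lev:
                lev[w] = lev[v] + 1
                queue.append(w)
    return lev


def has_path(graph, x, y):
    """True if a directed path of length >= 1 leads from x to y."""
    seen, stack = set(), list(graph.get(x, ()))
    while stack:
        v = stack.pop()
        if v == y:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, ()))
    return False


def levels_rearranged(coarse, fine, root, require_path=True):
    """True if some pair ordered at the coarse resolution is reversed in `fine`."""
    lev_n, lev_m = levels(coarse, root), levels(fine, root)
    for x, y in permutations(lev_n, 2):
        if require_path and not has_path(coarse, x, y):
            continue  # present criterion: only pairs joined by a path count
        if lev_n[x] < lev_n[y] and lev_m[x] > lev_m[y]:
            return True
    return False


coarse = {"r": ["a", "c"], "a": ["b"], "c": ["d"]}
# The link r -> c is specified into the chain r -> e -> f -> c.
fine = {"r": ["a", "e"], "e": ["f"], "f": ["c"], "a": ["b"], "c": ["d"]}
print(levels_rearranged(coarse, fine, "r"))                      # False
print(levels_rearranged(coarse, fine, "r", require_path=False))  # True
```

Resolving the link from r to c makes c deeper than b. No path joins c and b, so only the stricter variant rejects this refinement, which is why that variant admits fewer levels of abstraction.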