Parametric Modeling of the Temporal Dynamics of Neuronal Responses Using Connectionist Architectures

Steven C. Bankes and Daniel Margoliash

Summary and Conclusions
   1. We describe an exploratory approach to the parametric modeling of dynamical (time-varying) neurophysiological data. The models use stimulus data from a window of time to predict the neuronal firing rate at the end of that window. The most successful models were feedforward 3-layered networks of input, hidden, and output 'nodes' connected by weights that were adjusted during a training phase by the backpropagation algorithm. The memory in these models was represented by delay lines of varying length propagating activation between the layers. Connectionist models with no memory (1 sequential node per layer) as well as zero-memory non-linear non-connectionist models were also tested.

   2. Models were tested with recordings of neuronal activity from the auditory thalamic nucleus ovoidalis of urethane-anesthetized zebra finches (Taeniopygia guttata). All cells reported here showed phasic/tonic responses. Extensive modeling of one neuron (Cell 1) defined a 'canonical' architecture which was most successful in modeling this cell. The canonical model had a zero-memory input layer, a hidden layer with 29 bins representing 185.6 msec, and a single output node whose value as a function of sequential bin position represented the output of the neuron as a function of time. The canonical model achieved convergence on the entire data set for Cell 1, including responses to single tone bursts and zebra finch songs. The 'frequency' weights of the canonical model matched well the excitatory and inhibitory frequencies of Cell 1, as determined by the cell's frequency tuning curve. The 'memory' weights of the canonical model were dominated by excitation over the first 25 msec followed by inhibition.

   3. When trained with only the dynamical responses to tone bursts, the canonical model also accurately predicted the responses to song (average R2 = 0.823). Thus, for this neuron the responses to single tone bursts were sufficient to predict most of the responses to 6 different songs, each presented at 3 different amplitudes, although a Monte Carlo procedure indicated the residual variance was not just due to noise (p < 0.001).

   4. The model's ability to predict the responses to song was further explored by altering the tone burst (training) data. For Cell 1, the responses to songs were most strongly related to the phasic/tonic details of the temporal responses to tone bursts. Changes in the characteristic frequency, frequency tuning curves, and rate/intensity function of the neuron had less effect. These experiments would be difficult or impossible to conduct electrophysiologically. The modeling of neuronal temporal dynamics for this cell therefore gave insight into relationships between response properties that would otherwise not have been experimentally tractable.

   5. A total of 16 other neurons were tested with the canonical architecture originally derived for Cell 1. For 9 of these, the canonical architecture converged on the entire stimulus set with good (mean R2 = 0.774) to excellent (mean R2 = 0.924) results. When a match between frequency weights and a cell's frequency tuning curve was explicitly implemented, one of the remaining 7 cells converged well and two others partially converged. Presumably, further attempts to adjust the model architecture would have produced better results for more of the cells.

   6. A Monte Carlo procedure indicated that for all models, whether trained just with tone bursts or the entire stimulus set available for that cell, the remaining variance between a model's prediction and the corresponding neuronal response could not be explained on the basis of random fluctuations in the neuronal response. It is unclear whether the remaining variance results from higher-order non-linear interactions (e.g., two-tone interactions) and/or from inadequate model architectures.

    7. The ability to model neurons with similar classical properties using a single parametric model demonstrates that connectionist modeling approaches can provide insight into relationships among complex sets of electrophysiological data (e.g., responses to tone bursts and complex stimuli) that would have otherwise proven difficult to obtain. With the techniques reported here, for models that converge it is possible to apply quantitative and statistical rigor to assertions regarding the predictive power of a stimulus repertoire.


Introduction
   The interpretation of experimental neurophysiological data is often challenging due to the complexity and variability of CNS behavior and the complexity and variability of the stimuli necessary to assess that behavior. Computational techniques for assisting with this problem are welcome, as data that is technically difficult to obtain may not be fully exploited and may contain hidden implications that are not readily apparent. One approach to providing computational assistance is to use experimental data to train parametric models to mimic observed input/output relationships. A parametric model that captures the input/output behavior of the data to within observed variability can serve as a terse representation of observed patterns. Insights about the data may be drawn both from the sufficiency of a model architecture to capture the relationships inherent in the data and from the parameter values of the trained model. A survey of the range of model architectures adequate to model the data can further serve to constrain speculation about the representational significance of the observed neuron and plausible neural mechanisms responsible for the observed behavior.

While parametric models may be constructed using a variety of mathematical structures, there has been increasing use of non-linear connectionist models for interpreting neurophysiological data (e.g., Lockery et al., 1989) since the discovery of a simple method known as backpropagation for sub-optimally setting parameter values (often called weights) (Rumelhart et al., 1986). Some of the previous studies using this approach have either developed models of average firing rates (Lehky and Sejnowski, 1988; Zipser and Andersen, 1988) or have developed temporally varying models that imitate the general character of temporal patterns of activity observed in neurons (Anastasio, 1991; Zipser, 1991). In contrast, in this paper we report on models that capture the details of the temporal dynamics of the responses of neurons in the auditory thalamus of zebra finches (Taeniopygia guttata).

Significant information may be contained in the dynamical (time varying) behavior of single neurons. Numerous acoustic behaviors involve recognition of temporal patterns in vocal signals, and in bats, songbirds, and other systems neurons selective for frequency modulations and/or temporal combinations of sounds have been described (see Konishi, 1978). Analyzing the temporal properties of neuronal activity has in general been impeded, however, by difficulties in constructing models capable of capturing salient response properties from physiological data.

In this paper we demonstrate a modeling approach capable of capturing the temporal dynamics of neuronal responses. In the work reported here, we use connectionist models that represent temporal dynamics by delay lines between nodes. These models are fundamentally similar to the Time Delay Neural Network (TDNN) connectionist models that have been used in studies of speech recognition (Waibel, 1989). We have constructed parametric models of the time varying responses of cells in the auditory system of zebra finches to artificial and natural stimuli. This paper describes in detail the methodology and results of these models on data from one cell, and the general behavior of the most successful model on data from a collection of similar cells.


Methods
Biological data

   The modeling studies reported here are based upon single unit extracellular response data from neurons in the thalamic auditory nucleus ovoidalis of urethane-anesthetized zebra finches (Diekamp and Margoliash, 1991, unpublished data). Details of the recording procedures can be found elsewhere (Margoliash and Fortune, in press). Ovoidalis neurons exhibit vigorous responses or profound inhibition to tone and noise bursts, harmonic stacks, FMs, and other artificial stimuli, depending on frequency and amplitude. In the zebra finch, several classes of neuronal responses to tones have recently been described. The frequency tuning curves (FTCs) of ovoidalis neurons may be sharply tuned or broadband with single or multiple peaks (Diekamp and Margoliash, 1991, unpublished data). Additionally, responses can be tonic, phasic, or phasic/tonic. This report examines the responses of the seemingly simplest class of neurons in ovoidalis: those with tonic or phasic/tonic responses and single-peaked FTCs. We have yet to make a systematic attempt to model other classes of neurons.

In this paper we report in detail the modeling of a single ovoidalis neuron (zf_grn23/5/1: "Cell 1"). The most successful model for Cell 1 was applied to 16 other cells. All 17 cells were chosen on the basis of similar response morphologies and FTCs (see Results). The available database of ovoidalis neurons represented the times of spikes in response to a range of both artificial and natural auditory stimuli. These stimulus-response pairs were used to train connectionist models using supervised learning procedures, described below.

Cell 1 was presented with 97 different stimuli, including 45 tone bursts (200 msec duration, 500 Hz and 10 dB resolution) that spanned the neuron's FTC (excitation at 4 kHz to 7.5 kHz at 80 dB, characteristic frequency at 6.0 kHz, 20 dB; see Fig. 1), 3 tone bursts of different duration (25, 50, 100 msec at 70 dB), 6 broadband noise bursts (200 msec duration, 30-80 dB, 10 dB steps), 20 presentations of various sequences of two tones, 6 presentations of various two-tone combinations, 20 presentations of 6 zebra finch songs including the bird's own song (1212 - 2148 msec; 60, 70, 80 dB), 1 presentation each of the conspecific song zf_hv_17 at 80 dB reversed and with a modified amplitude envelope (constant 70 dB amplitude with 10 msec onset/offset ramps for each syllable), and finally 1 presentation of broadband noise with the amplitude envelope of zf_hv_17. (All amplitudes are referenced to 20 µPa = 0 dB.) Each presentation comprised 20 repetitions. The stimuli were sampled at 20 kHz and digitally recorded/synthesized with a resolution of 12 bits. (In later experiments, for cells other than Cell 1, songs were recorded with a resolution of 15 bits and stimuli were delivered with a resolution of 16 bits.) Stimuli were delivered once per second (all artificial stimuli) or once every 3 sec (natural and modified songs). The times of occurrence of spikes were recorded with 50 µsec resolution.

Cell 1 was a relatively high spontaneous rate neuron that exhibited a tonic response to excitatory tone bursts with a slight phasic onset (e.g., Fig. 1a). The neuron had a single-peaked FTC (Fig. 1b) with a characteristic frequency of about 6.0 kHz, weak inhibitory sidebands, and a sloping ('soft') saturation. Inhibition was stronger on the high-frequency side. Zebra finch songs contain a wide range of complex acoustic elements, including high notes, harmonic stacks, FM sweeps, broadband but structured sounds, etc. (e.g., Fig. 1c). In response to such complex stimuli, Cell 1 exhibited complex responses (e.g., Fig. 1c). Essentially, the goal of this study was to develop insight into the relationship between the types of responses observed in Figures 1a and 1c.

[ Insert Figure 1 around here ]


Modeling procedure
    Prior to being used in modeling experiments, the data for each cell was preprocessed to produce spectrogram and amplitude representations of the input, and firing rate (average and variance over all presentations) vs. time representations for the output. For spectrograms, auditory stimuli were parceled into 50% overlapping windows of 256 samples each. Each window was fast Fourier transformed (FFT) using a Hanning window to produce power spectra from 0 to 10 kHz with 78.125 Hz bins and 12.8 msec temporal resolution. (Note that the time interval between successive windows was 6.4 msec.) Only power bins in the range 500 Hz to 8 kHz were used in the modeling. For amplitude, the root-mean-square (RMS) value for each window was encoded as a "thermometer" code. Each node of the thermometer code was set either on (1.0) or off (0.0) depending upon whether the RMS amplitude for that window exceeded a specified dB level. Finally, for firing rates, histograms of neuronal response were generated with 6.4 msec resolution. The 20 stimulus presentations were used to compute both an average response and a variance for each 6.4 msec window. For Cell 1, this resulted in 97 pairs of stimulus and associated response, with variance. Each pair consisted of stimulus and response at a series of points in time for the duration of the repetition (1 or 3 sec for Cell 1).
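
To make the preprocessing concrete, the following sketch reproduces these steps in NumPy. (Our analysis code was written in C and MATLAB; this version is illustrative only, and the function names and the amplitude calibration constant are placeholders, not taken from that code.)

```python
import numpy as np

FS = 20000           # stimulus sampling rate (Hz)
NFFT = 256           # 256 samples = 12.8 msec analysis window
HOP = NFFT // 2      # 50% overlap: one frame every 6.4 msec

def spectrogram_inputs(stimulus):
    """Hanning-windowed power spectra with 78.125 Hz bins, 500 Hz to 8 kHz.
    (Combining triples of bins into the model's 234.375 Hz bins is omitted.)"""
    win = np.hanning(NFFT)
    freqs = np.fft.rfftfreq(NFFT, d=1.0 / FS)          # 78.125 Hz spacing
    keep = (freqs >= 500.0) & (freqs <= 8000.0)
    n_frames = (len(stimulus) - NFFT) // HOP + 1
    frames = []
    for i in range(n_frames):
        seg = stimulus[i * HOP : i * HOP + NFFT] * win
        frames.append(np.abs(np.fft.rfft(seg))[keep] ** 2)
    return np.array(frames)                             # (n_frames, n_bins)

def thermometer_code(stimulus, levels_db=np.arange(30, 85, 5)):
    """One 0/1 node per dB level: on if the frame's RMS exceeds that level."""
    CAL_DB = 0.0     # placeholder: calibration of RMS to dB SPL re 20 uPa
    n_frames = (len(stimulus) - NFFT) // HOP + 1
    code = np.zeros((n_frames, len(levels_db)))
    for i in range(n_frames):
        seg = stimulus[i * HOP : i * HOP + NFFT]
        rms_db = 20.0 * np.log10(np.sqrt(np.mean(seg ** 2)) + 1e-12) + CAL_DB
        code[i] = (rms_db > levels_db).astype(float)
    return code

def psth(spike_times_per_rep, duration_s, bin_s=0.0064):
    """Mean and variance of firing rate across repetitions, 6.4 msec bins."""
    edges = np.arange(0.0, duration_s + bin_s, bin_s)
    rates = np.array([np.histogram(t, edges)[0] / bin_s
                      for t in spike_times_per_rep])
    return rates.mean(axis=0), rates.var(axis=0)
```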

The resulting stimulus-response pairs for Cell 1 were used for a series of modeling experiments. An individual experiment consisted of: 1) selecting a model architecture (see below), 2) segregating stimulus-response pairs into training and test data sets, 3) using the stimulus-response pairs in the training set to determine values for model parameters, 4) evaluating the model's performance on members of the test set. Steps 1 and 2 specify a modeling experiment, which was carried out in step 3, and evaluated in step 4. For a particular architecture selected in step 1, several different experiments could result from the variation of the training set/test set partition (step 2). For example, a model might initially be assessed by incorporating all stimuli in the training set. If the model can achieve good performance when trained on all data, subsequent experiments could investigate the minimum training set that suffices to allow the model to predict the cell's response to stimuli in the test set.

A particular modeling experiment involved computing values for free parameters in the model that minimize the squared error between model prediction and actual response. This (local) minimization was accomplished using a version of gradient descent based on back-propagation (Rumelhart et al., 1986). This procedure involves iteratively presenting the stimuli in the training set to the model and comparing the model's outputs with the cell's actually observed firing rate. Model parameters are adjusted incrementally in a direction chosen to reduce the sum squared error between model prediction and observed response. The squared error is computed for all time bins for all members of the training set. Thus, the optimization algorithm attempts to match the model output with the trajectory of the neuronal response over time.

Across many cycles through the training set ('epochs') the parameters settle to values that minimize the sum squared error over some local region of parameter space. In many experiments, the squared error of the training set continues to decline to asymptotic limits for as many epochs as the experiment is continued. Squared errors of the test set may exhibit similar asymptotic trajectories, but often reach a minimum at some point and thereafter increase over subsequent epochs. In the experiments reported here, we use model parameters taken from the point of best performance on the test set. This occurred at points varying from 100 to 5000 epochs.
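
The following sketch illustrates this procedure with a single sigmoid unit standing in for the full model: the incremental update that reduces the sum squared error, initial weights drawn from [-0.25, +0.25] (see below), and retention of the weights from the point of best test-set performance. It is a schematic NumPy rendering, not our C implementation; the learning rate and all names are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sse(pairs, w, b):
    """Sum squared error over all time bins of all stimulus-response pairs."""
    return sum(np.sum((sigmoid(x @ w + b) - y) ** 2) for x, y in pairs)

def train(train_pairs, test_pairs, n_inputs, lr=0.01, max_epochs=5000, seed=0):
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.25, 0.25, n_inputs)      # initial weights (see Methods)
    b = rng.uniform(-0.25, 0.25)
    best_err, best_params = np.inf, (w.copy(), b)
    for epoch in range(max_epochs):
        for x, y in train_pairs:       # x: (n_bins, n_inputs), y: (n_bins,)
            out = sigmoid(x @ w + b)               # predicted rate per time bin
            delta = (out - y) * out * (1.0 - out)  # error through the sigmoid
            w -= lr * x.T @ delta                  # step down the error gradient
            b -= lr * delta.sum()
        test_err = sse(test_pairs, w, b)
        if test_err < best_err:                    # keep weights from the point
            best_err = test_err                    # of best test-set performance
            best_params = (w.copy(), b)
    return best_params, best_err
```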

In general, the backpropagation technique will find multiple solutions for a given architecture and training set, corresponding to multiple local minima in the weight space, depending on the initial values of the weights. In our experiments, the initial weight values in the range [-0.25,+0.25] were chosen based on a pseudorandom sequence. In the series of experiments reported below, as other architectural features of the models were varied, the initial values of the weights were not altered. We also tested the effects of changing the initial weights while holding other details constant. For most cases, our best results as measured by the average squared error (ASE: the square root of the sum squared error divided by the number of time bins) could be obtained with several different pseudorandom sequence seeds, giving nearly identical final ASE scores. Where different results were obtained, performance was typically much worse than the better result. These observations suggest that the architectures we used typically described a weight space dominated by a single minimum. The existence of a prominent minimum may be related to the relative sparseness of the architectures we explored (e.g., 2 nodes, 66 free parameters; see below) compared to the massive quantity of data to be modeled. Because of the complexity of these data and the resultant models, however, even where large numbers of different initial weight values are used it is impossible to categorically assert that the global minimum has been achieved.

The goodness of fit of the model (step 4) was evaluated using a test set of stimulus-response pairs not used in training, which provides a check against overfitting of the data. In addition to subjective evaluation of fit based upon comparison of the predicted vs. actual response profile, several statistical measures were employed to evaluate the performance of models. ASE was used for relative evaluation of model alternatives. ASE is useful in comparing between two modeling experiments on the same data set, but may vary markedly between datasets. Hence, ASE is not a good measure to compare results from different cells. Instead, we used R2 (defined as the regression sum of squares divided by the total sum of squares1) to evaluate how much of the variability in a cell's response at different times was explained by any given model. The R2 measure was useful both in comparing results between cells (Table 1) and in comparing the effects of parametric alterations of the training set for Cell 1 (Figs. 5-8).
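
For reference, these two figures of merit can be computed as follows (a minimal sketch written directly from the definitions above and in footnote 1; function names are ours):

```python
import numpy as np

def ase(observed, predicted):
    """Average squared error: the square root of (sum squared error
    divided by the number of time bins)."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    return np.sqrt(np.sum((observed - predicted) ** 2) / observed.size)

def r_squared(observed, predicted):
    """Regression sum of squares over total sum of squares (footnote 1)."""
    observed, predicted = np.asarray(observed), np.asarray(predicted)
    ss_reg = np.sum((predicted - observed.mean()) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return ss_reg / ss_tot
```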

A Monte Carlo approach was developed to evaluate the null hypothesis that a particular model completely explained the data within observed variability. The logic of the Monte Carlo procedure is as follows. If the fitted model fully captures the data, then deviations of the data from the fit should be no larger than the sampling variability. As the estimate of sampling variability we used the variance among the responses to the 20 repetitions of each stimulus. On this basis, 1000 simulated test data sets were generated by adding sampling variability to the real data set. For each such simulated data set, we computed a measure of merit (ASE), thereby obtaining a simulated probability distribution. Assuming the fitted model is true (the null hypothesis), the observed (real) measure of merit should be near the mean of the simulated distribution. If the observed measure of merit is in the tail of the simulated distribution, then a statistically significant component of variability is not being captured by the model. This approach mimics the usual logic of significance tests and is an example of the bootstrap procedure (Efron, 1982; Efron and Tibshirani, 1986). While more sophisticated statistical measures would be welcome, the methods used here have proved successful and are of general applicability.
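
A sketch of this procedure follows. The Gaussian form of the added noise and the exact variance scaling are our assumptions, since the text above specifies only that sampling variability was added to the real data set.

```python
import numpy as np

def monte_carlo_p(observed_mean, observed_var, model_prediction,
                  n_sims=1000, seed=0):
    """Test the null hypothesis that the model plus sampling variability
    fully accounts for the data. observed_var is the per-bin variance
    across the 20 repetitions of each stimulus."""
    rng = np.random.default_rng(seed)
    sd = np.sqrt(observed_var)
    # the model's figure of merit against the real (averaged) data
    real_ase = np.sqrt(np.mean((observed_mean - model_prediction) ** 2))
    # null distribution: since simulated = real + noise, the ASE of a
    # simulated data set against the real data is just the RMS of the noise
    sim_ases = np.array([np.sqrt(np.mean(rng.normal(0.0, sd) ** 2))
                         for _ in range(n_sims)])
    # if the real ASE exceeds all n_sims draws, report p < 1/n_sims
    return np.mean(sim_ases >= real_ase)
```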


Architecture of the canonical model
   A variety of architectures were tested using this procedure. The architecture that has proven most capable of modeling this data (the 'canonical' model) is depicted in Figure 2. This architecture can be described as a two node, feedforward network, utilizing an array of delay lines with differing lengths to provide the output node with the outputs of the hidden node over a window of time. (Here we adopt terminology to be consistent with the connectionist literature.) Each output of the hidden node (input to the output node) is multiplied by a weight (free parameter) and the weighted sum of these inputs (plus a bias weight adjusted to produce the spontaneous rate of the cell in the absence of any input) is transformed by the sigmoidal transfer function f(x) = 1/(1 + e^(-x)) to produce the model's prediction for that time bin. (Note that the sigmoid is a form of a saturating nonlinearity.) For the output node, the range of the sigmoid is 0.0 to 1.0. Model predictions are therefore in the range 0.0 to 1.0 and are compared to actual neuronal responses scaled to lie in the range 0.1 to 0.9. This scaling is chosen because the net input (and hence the weights) must approach ±∞ for the sigmoid to approach 0.0 or 1.0. The input representation sampled by the hidden node for each time bin consists of nodes representing the bins of the power spectra of the stimulus plus additional nodes encoding the RMS amplitude of the stimulus represented as the thermometer code. The value of each of these input nodes for a given stimulus at a given time bin is multiplied by its associated weight, and the transformed weighted sum is the output of the hidden unit for that time bin. The range of the sigmoid for the hidden node is -0.5 to 0.5, and there is no bias term for the hidden node.
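
As a concrete rendering of this architecture, the sketch below implements our reading of Figure 2 in NumPy. The dimensions are those of the canonical model reported in Results; the ordering of the delay weights and all names are our choices, not details of the original C code.

```python
import numpy as np

N_FREQ, N_AMP, WINDOW = 32, 11, 29   # canonical dimensions (see Results)

def hidden_sigmoid(x):               # hidden node: range -0.5 to 0.5, no bias
    return 1.0 / (1.0 + np.exp(-x)) - 0.5

def output_sigmoid(x):               # output node: range 0.0 to 1.0
    return 1.0 / (1.0 + np.exp(-x))

def canonical_forward(spec, amp, w_freq, w_amp, w_delay, bias):
    """spec: (n_frames, N_FREQ) power spectra; amp: (n_frames, N_AMP)
    thermometer code; w_delay: (WINDOW,) weights on the delay lines,
    ordered oldest input first; bias: output bias weight."""
    h = hidden_sigmoid(spec @ w_freq + amp @ w_amp)  # hidden output per frame
    out = np.full(len(h), np.nan)    # first WINDOW-1 bins have no prediction
    for t in range(WINDOW - 1, len(h)):
        window = h[t - WINDOW + 1 : t + 1]           # hidden outputs over the
        out[t] = output_sigmoid(window @ w_delay + bias)  # preceding window
    return out   # compare to neuronal responses rescaled to 0.1-0.9
```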

[ Insert Fig. 2 around here ]

The architecture of the canonical model is temporally invariant, capable of responding appropriately to stimuli that arrive asynchronously. The outputs of the hidden unit are integrated together across a window of time bins (29 bins wide in the canonical model; see below) by an output node whose output is the model's prediction for the firing rate of the last bin2. The output of the output node represents the model's prediction of a firing rate for each time bin, based upon the inputs for a window of previous bins. Note that it is equivalent to think of the model either spatially, with the model's temporal window being "slid" across the transformed input data to produce a response profile, or temporally, with inputs for points in time being presented sequentially to the model and the time-varying model output producing the response profile (as in Fig. 2a).

For the canonical architecture shown in Figure 2, the width of the window of temporal integration for the output node, the coarseness of the power spectra bins, and the resolution of the thermometer code, are all parameters whose optimal values were determined through experimentation. Various alternative connectionist architectures were also examined including variations of the input representation and variations of the numbers of hidden units and their individual windows of temporal integration (these widths are 1 in the canonical model; see Figure 2). Non-connectionist (linear static tuning curve based) and static (non-TDNN) connectionist models were also tested. Additional computational experiments were conducted to explore the impact of the choice of training set on the outcome of the modeling and on the salience of particular characteristics of the responses to training set stimuli for successful prediction on the test stimuli set.

The backpropagation code was developed in the C programming language. Non-connectionist models were developed in Mathematica (Wolfram Research), and additional signal processing was implemented with the MATLAB program (The Math Works), both running on a DECStation 5000 (Digital Equipment Corp.). Experiments were run on Sun Microsystems computers, a Sparcstation II (University of Chicago) or a Sparcstation I (RAND). On the faster machine, a large run with a 400 element training set required approximately 10 hrs of CPU time for 1000 epochs.


Results
Canonical model

   Best results were obtained using the canonical architecture of Figure 2. The model giving best performance had a temporal window 185.6 msec (29 bins) wide. Inputs in a given time frame consisted of 32 nodes whose real values represent the power density in frequency bins plus an additional 11 nodes that coded for the RMS amplitude of the signal in that frame. The frequency bins were 234.375 Hz wide (derived from combining three FFT bins with 78.125 Hz resolution) and spaced between 500-8000 Hz. The RMS amplitude was represented by a thermometer code in 11 steps of 5 dB, from 30 dB to 80 dB.

The model generalized well when the training set contained tone and noise bursts (see below). The final values of the free parameters for the case where training was on both noise and tone bursts are shown graphically in Figure 3. Output weight values (Fig. 3a) show strong excitatory influence of inputs with small delays, and varying smaller effects for inputs with greater delay times (see Discussion). The input weights for frequency power bins show large positive values surrounding the characteristic frequency of Cell 1 (6.25 kHz), and small positive values for weakly excitatory frequencies (2 kHz) (Fig. 3b). A center-surround organization is apparent such that weights with positive values are surrounded by weights with negative values. The values for the frequency input weights, however, do not exactly match the frequency-dependent response of the cell at any given amplitude. When the frequency data is collapsed across all amplitudes, the magnitude of the resultant measure of total spikes per frequency (adjusted for spontaneous rate) matches the positive input weights better than the negative input weights (Fig. 3b). Alternation also occurs in the input weights for the RMS amplitude (Fig. 3c; see Discussion).

[ Insert Fig. 3 around here ]


Ability to generalize across stimuli
    When trained on all stimuli, the canonical model achieved a close fit to the data, with an ASE of 0.0685. Almost identical performance was achieved when the model was trained only on tone and noise bursts, with an ASE across all stimuli of 0.0626. (The ASE for the test set, not used for training, was 0.0742.) A graphical comparison of model prediction versus actual response for a selection of stimuli from the test set for this case is shown in Figure 4. The ability of the model to predict much of the variability in the cell's response to birdsong when trained only on tone and noise bursts is confirmed by statistical measures. R2 values for all birdsongs (test stimuli) were in the range 0.7211 to 0.9628. (The stimuli in Figure 4 are typical, not best results, with R2 values of 0.8387 to 0.9094.) However, the residual differences between the cell's responses and the model's predictions are not completely explained by neuronal variability. A Monte Carlo simulation rejects the null hypothesis that the cell is the model plus noise (p<0.001)3.

[ Insert Fig. 4 around here ]

Training the model only on tone bursts results in somewhat poorer generalization, with an ASE on the test set of 0.0885. Excluding from the training set tone bursts that were not strongly excitatory has a small effect when the training set includes noise bursts (test set ASE of 0.0768 vs. 0.0742) but has a much greater effect when it does not (test set ASE = 0.0943).


Relative importance of temporal dynamics
   Modeling the temporal profile of the cell's response to artificial stimuli appears critical for accurate prediction of response to bird song. This conclusion is supported by both manipulations of the canonical model and experiments with alternative model architectures. Reducing the resolution of the input representation or the width of the temporal window produced poorer results, although the deterioration was gradual. When the temporal window of the output layer was reduced to a single time bin (zero memory), however, performance degraded sharply (ASE = 0.0989). Similar results were obtained using a non-connectionist model based on Cell 1's tuning curve. In this model the tuning curve derived from average firing rate responses to tones was used to predict responses to complex stimuli by summing across frequencies. Stimuli were analyzed as the sum of tones at various frequencies and amplitudes (the Fourier transform), and the response was predicted to be the sum of the responses to those individual tones. Such models also make poor predictions, even when saturation effects are accounted for by the use of a squashing function (ASE = 0.1329). Equivalent models make similarly poor predictions for cat DCN type IV neurons (Spirou and Young, 1989).

Using the canonical model trained only on tone bursts, we also examined the relative impact of alterations to the training set upon performance of the model. The performance was assessed by comparing predictions of responses to stimuli not used for training (the test set) based on unaltered and altered training sets. In one set of experiments, the model was trained on data constructed to mimic a cell with dynamical responses identical to those of Cell 1 but with a tuning curve with a different characteristic frequency or amplitude sensitivity. The training data for these tests were produced by associating observed outputs for tone bursts with different tone stimuli so that the net effect was to shift the frequency tuning curve (FTC) of the data used to train the model. Shifting the FTC in amplitude up to 20 dB in either direction had a modest and gradual effect on model performance (Fig. 5a). Shifting the FTC in frequency by ±500 Hz produced a modest degradation in model performance, but greater shifts in frequency had a large effect (Fig. 5b). The width of the "frequency shift tuning curve" is thus comparable to that of the FTC itself.
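
A sketch of this manipulation follows, assuming the tone-burst training data are indexed by (frequency, amplitude); dropping shifted pairs that fall outside the tested grid is our assumption.

```python
def shift_ftc_frequency(tone_responses, df_hz):
    """Mimic a cell whose FTC is shifted by df_hz: re-associate each
    observed tone-burst response with the tone stimulus df_hz away.
    tone_responses maps (freq_hz, amp_db) -> response histogram."""
    shifted = {}
    for (f, a), resp in tone_responses.items():
        if (f + df_hz, a) in tone_responses:   # stay within the tested grid
            shifted[(f + df_hz, a)] = resp
    return shifted
```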

[ Insert Fig. 5 around here ]

In a related set of experiments, the rate/intensity function presented to the model was varied without altering the temporal profile of the response or the Q of the FTC. This test was accomplished by scaling firing rates in the training set depending upon the amplitude of the stimulus. Model performance was also relatively insensitive to these alterations (Fig. 6).

[ Insert Fig. 6 around here ]

In contrast, the model's predictive power was dramatically degraded by manipulations of temporal aspects of the training set. Figure 7 shows the results of altering the training set to mimic cells with varying latencies. This was accomplished by altering the temporal registration of the stimulus-response pairs. Model performance was highly sensitive to these alterations, with strongly degraded performance for shifts of one or more bins (1 bin = 6.4 msec). We also altered the adaptation characteristics of the data used for training the model. This was accomplished by multiplying the response profile of the training set members (responses to tone bursts) by various time-varying masks. Multiplication by linear ramps had an impact on performance that varied with the slope of the ramp (Fig. 8). Performance was less degraded for ramps with negative slopes (preserving the initial, phasic portion of the response) than for ramps with positive slopes (Fig. 8; compare 1 to 3, 2 to 4). Directly eliminating the phasic part of the response, however, had less impact than did eliminating the tonic plateau (Fig. 8; compare 5 to 6, 7 to 8).
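
Both temporal manipulations can be sketched as follows (illustrative NumPy; padding the re-registered response with its first bin, taken as the spontaneous rate, is our assumption):

```python
import numpy as np

def shift_latency(response, n_bins):
    """Mimic a longer-latency cell by re-registering the response
    against the stimulus by n_bins (1 bin = 6.4 msec)."""
    if n_bins == 0:
        return response.copy()
    pad = np.full(n_bins, response[0])   # assume bin 0 ~ spontaneous rate
    return np.concatenate([pad, response[:-n_bins]])

def ramp_mask(response, start_gain, end_gain):
    """Alter apparent adaptation by multiplying the tone-burst response
    profile by a linear ramp (a negative slope preserves the phasic onset)."""
    return response * np.linspace(start_gain, end_gain, len(response))
```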

[ Insert Figs. 7 and 8 around here ]


Ability to generalize across cells
    As a preliminary test of the general applicability of the canonical architecture, we tested 16 other ovoidalis cells. The 16 cells were chosen based on having relatively simple phasic/tonic excitation with dominant single-peaked FTCs. Within the sample, there was a broad range of classical physiological properties, including characteristic frequency (CF), sharpness of frequency tuning (as assessed by Q10 and Q50; see Suga and Tsuzuki, 1985), spontaneous rates, and latencies (Table 1). (Latencies were measured with 0.1 msec resolution as the time to 50% of the first peak of response above spontaneous activity to the tone at CF that elicited the greatest number of spikes, typically 60-80 dB.) Each cell had been presented with a very broad range of stimuli, including complete testing with 50 msec tone bursts (250-7250 Hz; 10-80 dB, Δf=250 Hz, Δa=10 dB) and 5 songs presented at 5 different amplitudes (40-80 dB, Δa=10 dB). The total number of stimuli ranged from 358 to 422. For each cell, we trained with the entire data set. That is, there was no test of generalization. There was also no attempt made to adjust the canonical architecture derived for Cell 1 on a cell-by-cell basis.

[Insert Table 1 around here]

Of a total of 17 cells (including Cell 1), good convergence was achieved for 10 cells with the canonical model and randomly assigned weights (see below), as assessed by visual inspection of responses to songs. We verified our subjective assessment by calculating R2 values for all responses to songs for all 10 cells. In all cases, average R2 values were very high (> 0.75; Table 1). Note that the R2 value for Cell 1 was about average for the group, which is consistent with our subjective impression of the goodness-of-fit of the data. Thus, the match between neuron and model depicted in Figure 4 is about average for these 10 cells.

The 7 cells for which the model initially failed to converge (Cells 4, 6-10, 13) tended to include cells that had FTCs with higher Q values than Cell 1 (Q10: 3 of 5 cells; Q50: 5 of 9 cells) (Table 1). There were no other obvious classical measures of response properties common to all cells for which the model converged or failed to converge. All four cells with higher Q50 values than Cell 1 that converged exhibited some degree of matching between the input weights for frequency power bins for the cell and the FTC of the cell, with a substantially better match for Cell 5 than the other three cells (Cells 2, 11, 12). In contrast, cells that converged that had lower Q50 values than Cell 1 exhibited only poor matches or no simple relationship between a cell's input weights for frequency power bins and FTC. Based on these observations, we ran a second test on all 7 cells, initializing the input weights for frequency power bins based on each cell's FTC4. With this adjustment, Cell 6 also showed good convergence (mean R2: 0.789). The FTC of Cell 6 also had a high Q (Table 1). After the adjustment, two other cells (Cells 4 and 9) showed substantial improvement, with excellent convergence for some songs, but poor convergence for others.
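
Footnote 4 gives the details of this initialization; a sketch follows (np.interp stands in for the interpolation and decimation routines, and the target frequency grid is our assumption):

```python
import numpy as np

def ftc_initial_weights(ftc_spikes):
    """Initialize the 32 frequency-power input weights from a cell's FTC
    collapsed across amplitudes: 29 spike counts (250-7250 Hz, 250 Hz steps)
    resampled to 32 values and normalized to the range 0.1-0.9."""
    f_old = np.linspace(250.0, 7250.0, 29)   # frequencies of the FTC samples
    f_new = np.linspace(250.0, 7250.0, 32)   # the model's 32 input bins
    w = np.interp(f_new, f_old, np.asarray(ftc_spikes, dtype=float))
    lo, hi = w.min(), w.max()
    return 0.1 + 0.8 * (w - lo) / (hi - lo)  # normalize to 0.1-0.9
```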

In summary, we were able to demonstrate convergence for 11 of 17 (65%) cells, and partial convergence for 2 of 17 (12%). Given that we had explored numerous models before arriving at the canonical architecture, and that the duration of tone burst stimuli for the 16 cells other than Cell 1 was 50 msec rather than 200 msec, it is likely that further analysis would yield convergence for more of the cells, and better convergence overall.


Discussion
Details of the modeling

   The architecture. The canonical model architecture was arrived at through extensive experimentation with models using the data from Cell 1. In addition to variation of the window of temporal integration and the resolution of input representations in the canonical architecture, more extreme architectural variants were also investigated. In particular, we examined architectures that integrated over time at the hidden layer or had multiple hidden units whose activities were integrated at the output layer. The canonical architecture had better performance than any variants tried. The superior performance of the canonical architecture may result from having the fewest free parameters of any model of equal temporal window size that covers the same range of frequencies and amplitudes. Minimizing the number of free parameters will generally improve the chances that a parametric model such as those used here will successfully interpolate to data it has not been trained on, or extrapolate to data that is different in kind from any data used in training.

The need for the 'thermometer code' (instantaneous amplitude) component of the input representation was arrived at empirically. Architectures with only 'sonograph' inputs gave some evidence of generalization for some songs, but not at all amplitudes. Since the hidden node sums the power across all frequency bins of the sonograph (i.e. derives the amplitude information), the need for independent information about the waveform may seem surprising. Notice, however, that adjusting the weights of the sonograph input on the basis of amplitude criteria results in frequency-by-amplitude interactions. Addition of the thermometer code representation permits the model to independently adjust response to both frequency and amplitude. However, the amplitude information in the sonograph inputs is also important to achieve optimal performance of the canonical model. Subsequent experiments where the sonograph inputs were set to 0.0 or 1.0 depending upon whether the instantaneous energy exceeded a threshold (20, 30, or 40 dB) produced results substantially worse than the canonical model.

An explanation of the significance of all weight values in the model is the ultimate description of the model's behavior. Although we cannot provide an explanation for the precise values of each of the weights, some conclusions can be reached for Cell 1. The output bias weight of the model is directly related to the background firing rate of the cell. With no inputs, the model's output will be the sigmoidal transform of the output bias weight. A negative bias weight is required to produce a background firing rate less than 0.5. The other output weight values show strong excitatory influence of inputs with small delay times, and varying smaller effects for inputs with greater delay times. Understanding this pattern is facilitated by considering the cell's responses to tone bursts, which exhibit a phasic onset followed by a gradually decaying tonic excitation throughout the duration of the stimulus. While a recurrent model (model with feedback loops) or an exponential process might allow for more precise modeling of such a phenomenon, a TDNN model with finite window width must approximate an adapting tonic response by a finite series of terms. The large initial peak in the weights produces the phasic portion of the cell's response. In order to correctly predict the declining response immediately after the phasic peak, the immediately succeeding weights must be inhibitory. Subsequent weights are essentially a series of "correction terms" balancing inhibition against excitation to approximate the gradually adapting tonic response. A similar consideration applies to the trailing edge of the stimulus. For time frames in the offset inhibition after the trailing edge of the stimulus, the model's predictions must be based upon the weights associated with the greatest delay. Hence, the final output weights must be negative. In general, the values of the output weights are consistent with a process that balances excitation and inhibition, together with edge effects. Notice that for training on noise and tone bursts for Cell 1, the window size is smaller than the duration of the stimuli in the training set. Consequently, the weights must optimize model performance over both those time frames where the window is entirely within the stimulus and those where the window includes one edge of the stimulus.

Similarly, the input weights for frequency power bins and RMS amplitude must approximate the actual response surface through a series of weighted terms. The values of frequency power bin weights in the model for Cell 1 match the FTC of Cell 1 reasonably well. The weights have large positive values for strongly excitatory frequencies, smaller negative values for surrounding inhibition, and weights of smaller and varying sign elsewhere. Initializing the frequency weights to match the FTC of a cell can help the model to converge, thus this relationship is not trivial. The match between the magnitude of positive weights and total spikes across all amplitudes elicited for frequencies near the characteristic frequency is relatively good. In contrast, the negative weights have greater magnitude than predicted by spike counts at inhibitory (or weakly excitatory) frequencies. In general, in neurophysiological recordings the dynamic range available to assess inhibitory phenomena is low, and the results of the model suggest that Cell 1 exhibits stronger inhibitory influences than a simple spike count might suggest.

Alternation of sign occurs in the input weights for the RMS amplitude, which are presumably capturing an approximation to the rate/intensity function for Cell 1. Note however that the stimuli in the training set all had amplitudes that were some multiple of 10 dB. Thus the large values for weights at 40, 60, and 80 dB and the alternation of weights could have been in part an artifact of the structure of the training set. To test this hypothesis, two subsequent experiments adjusted the range of the thermometer code (25-75 dB, Δa=10 dB, and 27.5-77.5 dB, Δa=10 dB) so as to not match the amplitudes of the training set stimuli. The former manipulation changed the phase of the alternation (at what weight it started) whereas the latter manipulation did not, but in both cases alternations at 10 dB intervals were still prominent. When the canonical model was trained only with stimuli at 30, 50, and 70 dB, then the alternations occurred at 20 dB intervals. Apparently, the frequency of alternation of input weights for the RMS amplitude is related to the amplitude granularity of the input stimulus data. Alternation may result from a need to balance the influence of strongly positive weights with negative weights in a fashion analogous to curve fitting, but obviously this explanation can only be considered tentative.

Modeling methodology. The use of a test set ensures that model results are not artifacts of overfitting. In many modeling studies, the use of test sets has been confined to checking that the model successfully interpolates. Typical studies have used 90% of the data for training, reserving only 10% for test purposes. In our modeling studies we have a surfeit of data, making overfitting less of a danger. For example, the data for Cell 1 contains 9734 data points (time bins) while the canonical model has only 66 free parameters. Since the data points are not necessarily independent, however, goodness of fit must still be carefully assessed (see below).

In the results reported here, we have made use of the partition into training set and test set for an additional purpose. The successful extrapolation of the canonical model, predicting the responses of Cell 1 to complex bird song stimuli when trained only on the responses of Cell 1 to relatively simple artificial stimuli, is a more profound result than successful interpolation to similar types of stimuli. This has allowed us to reach conclusions regarding the sensitivity of Cell 1 to perturbations of its temporal, frequency, and amplitude response characteristics that would otherwise not have been possible. In general, the search for minimal training sets still capable of generalization can be rewarding, since demonstrations of extrapolation have greater implications for the meaningfulness of the model than do demonstrations of interpolation. Note in this regard that while we have demonstrated that the canonical model can fit data obtained from 11 of 17 (65%) phasic/tonic cells and partially fit data from two of the remaining 6 cells, at this time the extrapolation result has only been demonstrated for Cell 1. Successful convergence across so many cells in the absence of any attempts to adjust the canonical architecture for each cell suggests that the architecture is reasonably robust for this class of ovoidalis neurons. Presumably, detailed exploration of each cell can lead to better and more universal convergence, and will yield a number of cells which exhibit the generalization (extrapolation) demonstrated for Cell 1.

Statistical verification. Given the intrinsic variability of neuronal response data, the use of statistically valid measures of goodness of fit is critical for any parametric modeling effort. Without such a figure of merit, modeling efforts must appeal to subjective assessments of significance, which are prone to both positive and negative bias. Many previous attempts to use connectionist architectures to model static or dynamic neuronal response properties have not assessed the statistical significance of the results (Anastasio, 1991; Lehky and Sejnowski, 1988; Zipser, 1991; Zipser and Andersen, 1988). This may result in part from the inapplicability of existing statistical distribution theory to connectionist models.

Classical statistical measures of significance were not useful for our modeling due to a variety of problems. For example, use of the chi-squared distribution requires an accurate estimate of the correct number of degrees of freedom in the data. The number of data points in our training data set, however, was a function of the total duration of all stimuli. Presumably, the underlying 'true' number of degrees of freedom is not linearly related to the duration of tone burst stimuli used to test Cell 1. Furthermore, instantaneous neuronal firing rates are dependent on previous events; successive PSTH bins do not represent statistically independent samples. Without a reliable means to estimate the degrees of freedom in the data, the chi-squared distribution cannot be used to assess significance.

Although the R2 figure of merit does not provide an absolute measure of significance it was useful because it provided quantitative support for the visual impression of the success or failure of a model. R2 is in general more useful than ASE because R2 can be compared across cells, but both R2 and ASE are useful in comparing the relative effects of different data sets for a given cell. Interpretation of the absolute values of such figures of merit should be approached carefully, however. For example, the largest effect across all parametric manipulations of the training data (Figs. 5-8) would appear to be only a factor of two (Fig. 8 part 7, ASE = 0.143 vs. ASE = 0.0742 baseline). This interpretation is incorrect, however, because it assumes that the lowest achievable error is zero. A model with zero error would perfectly capture every detail of all the PSTHs of the training set, including the background and other sources of noise, spontaneous activity for those particular presentations, as well as all the specifics of the neuronal variance for those particular presentations. Such a model would be maximally overtrained to specific stimulus/response pairs, and would exhibit the minimum ability to generalize.

To help in estimating the significance of the model fit, we developed a statistic based on Monte Carlo simulation (see Methods). This measure has been useful in demonstrating that the residual variance between the fit of the canonical model and the actual neuronal data is not a result of neuronal variance alone. The Monte Carlo simulation also provides an estimate of the minimum error possible (mean of distribution of ASE from simulated test data set). For Cell 1, the mean ASE predicted by neuronal variance alone is 0.0288 ± 0.0038 SD. This can be compared with ASE = 0.0742 for the canonical model. This figure represents a lower bound on the minimum error, as the Monte Carlo simulation used here assumes statistical independence of successive PSTH bins. Thus, the difference between observed error and predicted minimum possible error, as well as the statistical deviation of the canonical model from a complete fit of the data, are both likely to be exaggerated by this statistic. Clearly, further development of appropriate statistical measures is necessary to provide such parametric studies with a firm statistical basis.


Biological implications
   In the work described here we have modeled temporal effects using an array of delays between nodes in a connectionist network. While delay lines may actually occur neurophysiologically (e.g., Carr and Konishi, 1988), temporal details of neuronal behavior result from synaptic physiology, dendritic integration, recurrent connections, and other processes that we had no specific knowledge of for the modeled cells. We did not attempt to capture such details with the model. Thus, while our models can potentially provide useful insights into response properties of ovoidalis (and other) neurons, it would be incautious to assume a priori that details of the connectionist architecture have any relationship whatsoever to neuronal organization. For example, the separation of frequency and amplitude integration from temporal integration in the canonical model was motivated in part by the desire to minimize the number of free parameters, and is therefore an artifact of modeling constraints. In general, successful modeling requires a decision as to the appropriate level of detail to incorporate (Bankes and Margoliash, 1992).

Implications for peripheral nonlinearities. The canonical architecture converged with a high R2 on the entire data set for about two-thirds of the neurons tested. In all cases, the entire data set included tone bursts and songs. Thus, the quantitative relationship between stimulus input and neuronal output we have described predicts both responses to single tone bursts and responses to songs. The form of this relationship suggests that for these neurons it is necessary to posit only compression and threshold nonlinearities to describe the neuronal response within the variance captured by the models. Frequency-by-frequency nonlinear interactions are not required to establish a quantitative relationship between responses to single tone bursts and songs. Although for cells other than Cell 1 we have not determined the minimal training set necessary to establish such a relationship, we have demonstrated that for a majority of cells in our sample such a relationship exists.

The implications of these results should be viewed cautiously, however. The nonlinear mechanics of the basilar membrane has a significant effect on the response of auditory nerve fibers to two tones presented simultaneously in mammals (e.g., Sachs and Kiang, 1968; Sokolowski et al., 1989), and there is evidence for such nonlinearities in birds (Manley, 1990). Thus, three conclusions may obtain. First, it may be that the residual variance of the results for each cell is explained by forms of nonlinear interactions that the canonical architecture did not model. Second, it may be that the threshold and compression nonlinearities of the canonical model approximate other nonlinearities; given the response variance of single neurons it could potentially be difficult to distinguish the specific forms of various nonlinearities within physiologically feasible recording times. It is possible that transfer functions that better mimic known biological nonlinearities than the sigmoidal function used in this study could produce better results. Finally, it could be that nonlinearities in the periphery are canceled by central processes. Thalamic cells need not exhibit peripheral nonlinearities. Furthermore, all three conclusions may obtain.

Importance of temporal dynamics. Explorations with alternative model architectures and sensitivity analysis on the most successful model both suggest that modeling the details of the temporal dynamics of a cell's response was critical to its success. In particular, explorations of the behavior of the model when trained using modified response data reveal that the response of Cell 1 to song is most sensitive to modifications of the temporal dynamics of its response to tone bursts. Modest alterations to the amplitude tuning curve or to the rate/intensity profile have relatively small effects.

The major component of temporal characteristics of Cell 1's response was determined by time delays in the range of 0-25 msec. A similar range of delays has been cited for a broad range of phenomena in the primary auditory pathway of birds and mammals (Langner, 1981; Langner and Schreiner, 1988; Suga and Horikawa, 1986). It is noteworthy that most weights in the 'tail' of the time window of the canonical model (long delays) had small values. Although the best performance was obtained with a time window of 29 bins (representing 185.6 msec), experiments where the time window was systematically changed indicate less than a 1% degradation in performance for a time window of only 19 bins (121.6 msec). For TDNN architectures, the length of the time window may be related not only to the time constants of the cells being modeled, but also to the approximation of the shape of the cells' dynamical responses by windows of fixed duration (see above). Thus, the small weight values at long delays may not reflect the actual memory of Cell 1.

These results suggest that, while temporally dynamic models are more difficult to construct than models based on average firing rates, they can be more useful. Dynamic models of time varying data can capture important information missing in models using only average firing rates, even for cells with relatively simple response properties. Further, as this result is not obvious in the raw data, it provides a demonstration that parametric models of neurophysiological data can reveal implications of the data that otherwise would have remained hidden. Although explorations of the significance of amplitude, frequency, and temporal parameters for a single cell do not prove that this approach is widely applicable, the demonstration of convergence for 11/17 cells and partial convergence for two other cells is certainly promising.


Concluding remarks
   Our choice of model architectures using delay lines (TDNN models) was based on ease of interpretation and superior computational efficiency compared with recurrent models. In our experience, the delay line format is useful in providing the opportunity to explicitly vary windows of temporal integration and allow the investigation of the contributions of various parts of an architecture to observed neuronal properties. Recurrent models, however, have several properties to recommend them. In particular, they allow temporal profiles such as exponential decay to be modeled with greater accuracy and many fewer parameters than are required for a TDNN model. Future experimentation with recurrent architectures and other extensions such as models using "tunable" widths of temporal windows might produce models capable of fitting a wider range of cells than the results reported here.

The limitations of the current models (remaining unexplained variance) and the failures to converge for some cells are at least as instructive as the successes. It will also be interesting to see if extensions of the current architecture can effectively model more ovoidalis cells that are tuned to a narrow range of frequencies, and other classes of cells such as those that exhibit more phasic response profiles. Both properties presumably result from strongly suppressive inputs. Adoption of more biologically realistic 'front-end' processing (e.g., Sachs et al., 1989) may help these efforts.


FOOTNOTES

1 R2 = Σ(ŷi − ȳ)^2 / Σ(yi − ȳ)^2, where yi are the observed (cell response) values, ŷi are the predicted (model output) values, ȳ is the mean of the observed values, and the summation runs over all time frames for a stimulus/response pair.

2 The TDNN architecture can be understood either as having a single output unit with a fixed temporal window which slides across the input to produce a response profile, or as an architecture with an output unit for each time bin of response data where the weights for each output unit are constrained to be identical. During the training phase the backpropagation algorithm is executed by calculating the weight deltas as normal, summing across time frames, and applying the total delta to each weight of the one actual unit at the end of each stimulus in the training set.

3 For the ASE figure of merit, the model's score against the actual data (average of 20 presentations) was greater than all 1000 Monte Carlo draws from the null distribution. Thus, the null hypothesis is rejected with p < 0.001.

4 For each cell, the FTC was collapsed across amplitudes. The resulting function, which has 29 spike counts as a function of frequency (250-7250 Hz, Δf=250 Hz), was resampled to 32 values using interpolation and decimation routines. The frequency power bin input weights were initialized to the resultant function, normalized to the range 0.1-0.9.
 

Acknowledgments
We thank Edward Z. Hall, who contributed considerable programming expertise to early phases of this project, and James S. Hodges, who advised us on statistical methods.

This work was supported by a grant from the ONR (N00014-89-J-1509). The physiological experiments were conducted under a grant to DM from the NIH (PHS 1R01 NS25677). DM gratefully acknowledges the support of the Searle Scholars Program/Chicago Community Trust.


References
Anastasio, T. J. Neural network models of velocity storage in the horizontal vestibulo-ocular reflex. Biological Cybernetics 64: 187-196, 1991.

Bankes, S. C., and Margoliash, D. Methods in computational neurobiology. Amer. Zool., in press, 1992.

Carr, C. E., and Konishi, M. Axonal delay lines for time measurement in the owl's brainstem. Proc. Nat. Acad. Sci. USA 85: 8311-8315, 1988.

Diekamp, B., and Margoliash, D. Auditory responses in the nucleus ovoidalis are not so simple. Soc. Neurosci. Abstr. 17: 446, 1991.

Efron, B. The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia: SIAM, 1982.

Efron, B., and Tibshirani, R. Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science 1(1): 54-77, 1986.

Konishi, M. Ethological aspects of auditory pattern recognition. In: Handbook of Sensory Physiology. Vol. VIII: Perception (Held R, Leibowitz HW, Teuber H-L, ed), pp 289-309. Berlin: Springer Verlag, 1978.

Langner, G. Neuronal mechanisms for pitch analysis in the time domain. Exp. Brain Res. 44: 450-454, 1981.

Langner, G., and Schreiner, C. E. Periodicity coding in the inferior colliculus of the cat. I. Neuronal mechanisms. J. Neurophysiol. 60: 1799-1822, 1988.

Lehky, S. R., and Sejnowski, T. J. Network model of shape-from-shading: Neural function arises from both receptive and projective fields. Nature 333: 452-454, 1988.

Lockery, S. R., Wittenberg, G., Kristan, W. B., and Cottrell, G. W. Function of identified interneurons in the leech elucidated using neural networks trained by back-propagation. Nature 340: 468-471, 1989.

Manley, G. A. Peripheral Hearing Mechanisms in Reptiles and Birds. Berlin: Springer-Verlag, 1990.

Margoliash, D., and Fortune, E. S. Temporal and harmonic combination-sensitive neurons in the zebra finch's HVc. J. Neurosci., in press.

Rumelhart, D. E., Hinton, G. E., and Williams, R. J. Learning internal representations by error propagation. In: Parallel Distributed Processing: Explorations in the Microstructure of Cognition (Rumelhart DE, McClelland JL, ed), pp 318-362. MIT Press, 1986.

Sachs, M. B. and Kiang, N. Y. S. Two-tone inhibition in auditory-nerve fibers. J. Acoust. Soc. Am. 43: 1120-1128, 1968.

Sachs, M. B., Winslow, R. L., and Sokolowski, B. H. A. A computational model for rate-level functions from auditory-nerve fibers. Hear. Res. 41: 81-89, 1989.

Sokolowski, B. H. A., Sachs, M. B., and Goldstein, J. L. Auditory nerve rate-level functions for two-tone stimuli: Possible relation to basilar membrane nonlinearity. Hear. Res. 41: 115-124, 1989.

Spirou, G. A., and Young, E. D. Organization of dorsal cochlear nucleus type IV unit response maps and their relationship to activation by bandlimited noise. J. Neurophysiol. 66: 1750-1768, 1989.

Suga, N., and Tsuzuki, K. Inhibition and level-tolerant frequency tuning in the auditory cortex of the mustached bat. J. Neurophysiol. 53: 1109-1145, 1985.

Suga, N., and Horikawa, J. Multiple time axes for representation of echo delays in the auditory cortex of the mustached bat. J. Neurophysiol. 55(4): 776-805, 1986.

Waibel, A. Modular construction of time-delay neural networks for speech recognition. Neural Computation 1: 39-46, 1989.

Zipser, D. Recurrent network model of the neural mechanism of short-term active memory. Neural Computation 3: 179-193, 1991.

Zipser, D., and Andersen, R. A. A back-propagation programmed network that simulates response properties of a subset of posterior parietal neurons. Nature 331: 679-684, 1988.

Table 1. Response properties and modeling results for 17 ovoidalis cells.

Cell  ID            Spon        Latency  CF    Amp†  Q10dB  Q50dB  R2 (± SD)
                    (spikes/s)  (msec)   (Hz)  (dB)
1     grn23/5/1/0   45.4         7.0     6000   20    6.3    1.7   0.861 ± 0.066
2     ol60/0/3/1    35.6         6.5     2250   10    9.0    4.5   0.846 ± 0.040
3     ol60/0/3/2    11.0        11.4     2000   20    2.7    1.0   0.774 ± 0.116
4     ol60/0/3/3    11.8         5.1     2250   20    1.8    1.5   nc
5     ol60/1/1/1    25.8        14.5     3250    5   13.0   13.0   0.864 ± 0.063
6*    ol60/1/2/1    23.9        11.3     3250    5   13.0   13.0   0.789 ± 0.980
7     ol60/5/1/1    49.0         8.3     4000   15    8.0    5.3   nc
8     ol60/5/1/2    39.2         8.1     3500   10   14.0    4.7   nc
9     ol60/5/2/1    10.5        22.1     3625   10    2.9    2.9   nc
10    ol60/6/2/1    24.2         7.2     5250   25    3.5    1.9   nc
11    ol60/6/3/1    27.9         7.7     5250   25    3.5    2.1   0.833 ± 0.06
12    or74/3/3/1    19.4         7.9     4000   15    2.7    1.8   0.916 ± 0.013
13    or74/4/1/2     1.0         7.3     1500   20    1.0    0.8   nc
14    or74/7/1/2    10.0         7.0     3250   25    0.7    0.7   0.924 ± 0.02
15    or74/16/1/2   12.6         7.3     1500   35    0.7    0.7   0.875 ± 0.038
16    pn60/1/1/1     2.1        10.1     2000   25    0.6    0.7   0.895 ± 0.048
17    pn60/1/1/2     0.5         9.9     2500   25    1.1    0.8   0.846 ± 0.086

* converged after explicit initialization of frequency weights (see text).
† threshold amplitude at CF.
nc - no convergence.


Figure legends
Figure 1. Responses of Cell 1. A) Raster and PSTH responses (10 msec bins) to a 200 msec tone burst of 6.0 kHz at 70 dB. The bold line underneath the PSTH represents the stimulus duration. Note the phasic onset, slowly adapting tonic response, and post-stimulus inhibition. Recovery from inhibition requires about 200-300 msec. B) Total spikes over 200 msec tone bursts as a function of frequency and amplitude. The FTC has a major response peak centered at about 6.0 kHz and a minor response peak at 2.0 kHz. Note the weak inhibitory sidebands and sloping saturation. C) Raster and PSTH response to the bird's own song, peak amplitude 80 dB. Oscillograph and sonograph appear time-aligned beneath the response data. The time axis represents 3 sec (same scale as A). All stimuli were presented 20 times.

Figure 2. Alternative diagrams of the canonical model architecture. Stimulus parameters at a given point in time, represented as an array of frequency bins (sonogram) plus the RMS amplitude represented by a thermometer code, are combined via a weighted sum. This weighted sum is transformed by a sigmoidal activation function to produce the output of the hidden unit for that time. An array of delay lines provides hidden-unit outputs over a window of times as input to the output unit. These are combined by a weighted sum and then transformed by a sigmoidal function to produce the predicted firing rate. A) Model architecture represented as a single hidden node connected to a single output node by an array of delay lines of varying lengths. This architecture can be imagined to slide across a spatial representation of the input to produce an output profile, or equivalently to receive inputs sequentially, producing a sequence of outputs. B) Equivalent view of the same architecture, with an array of hidden nodes over a window of time, which slides across the input representation to produce output values sequentially. Analogous weights of the various hidden nodes are constrained to have identical values. C) Model architecture interpreted as relationships among the temporal profiles of the input, hidden, and output layers. A separate hidden node for each time bin integrates the spectral and amplitude information for that time bin. Each node in the output layer integrates a temporal window of hidden-node outputs. Weights of the hidden or output nodes at different times are constrained to be identical. D) Detail showing how information is integrated at a hidden or output node. The bias weight is not present for the hidden node (see text).
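For readers who prefer code to diagrams, a minimal sketch of the forward computation the legend describes (illustrative names; the treatment of the first bins, before the delay lines fill, is an assumption):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def canonical_forward(stim, w_in, w_out, bias_out):
    """Forward pass sketched from Figure 2.
    stim:  (T, F) stimulus parameters per time bin (frequency bins + amplitude code)
    w_in:  (F,)   shared input->hidden weights (no hidden bias, per panel D)
    w_out: (K,)   output weights over K delay taps (K = 29 in the canonical model)"""
    hidden = sigmoid(stim @ w_in)            # one hidden value per time bin
    K, T = len(w_out), len(hidden)
    rates = np.zeros(T)
    for t in range(K - 1, T):                # slide the K-bin window across time
        window = hidden[t - K + 1 : t + 1]   # delay-line contents at time t
        rates[t] = sigmoid(window @ w_out + bias_out)
    return rates                             # predicted normalized firing rates
```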

Figure 3. Final weight values for the canonical architecture trained on tone bursts from Cell 1. A) Weights for the output unit are strongly excitatory for small delays and have small values elsewhere, except for the final weight (see text). The bias weight is strongly negative to produce Cell 1's normalized background firing rate (approximately 0.2). B) Input weights for frequency bins (filled bars) show a strong excitatory peak around Cell 1's characteristic frequency of 6.25 kHz and a weak excitatory peak around 2.0 kHz, the second response peak of Cell 1's FTC. Inhibition surrounds the excitatory response peaks. At each frequency, the total spikes for Cell 1 (corrected for spontaneous rate) summed across all amplitudes are also plotted (open circles). C) Input weights for signal amplitude are large and positive at 40, 60, and 80 dB, and smaller elsewhere. Note: this structure may be an artifact of the set of amplitudes included in the training set (see text).

Figure 4. Predicted responses (dashed line) and actual responses (solid line) of the canonical model and Cell 1, respectively, for six zebra finch songs at 80 dB peak amplitude. The model was trained on tone bursts and noise bursts. Note the close match. Where the fit is poor, the model tends to over-respond.

Figure 5. A) Amplitude shift tuning curve. Altering the training data to effect an amplitude shift in the FTC has only a modest impact on model performance, except for large changes in amplitude. B) Frequency shift tuning curve. Altering the training data to shift the effective FTC by 500 Hz produces a modest degradation in model performance. Larger shifts have a greater impact.

Figure 6. Changes in rate/intensity function. This histogram shows the effect of scaling the firing rates in the training set data according to the instantaneous amplitude of the stimulus. This procedure alters the effective rate/intensity function. The inset graphs show the masks used and the resultant rate/intensity function for each case, including the baseline of no alteration. All changes have relatively small effects.

Figure 7. Latency shift tuning curve. Altering the temporal registration of stimulus/response pairs in the training set mimics a cell with a different latency. Model performance was highly sensitive to these alterations, with any shift of more than one bin producing severely degraded performance on the test set.

Figure 8. Adaptation changes. This histogram displays the effects of modifications to the temporal response profile of members of the training set. Columns 1 through 6 show the effects of multiplying response profiles by linear ramps (shown in insets). Column 7 shows the effect of reducing the tonic phase of responses to background rates, mimicking a primarily phasic cell. Column 8 shows the effect of reducing the phasic peak at the initiation of the response to tonic firing rates, mimicking a primarily tonic cell. The baseline column shows performance with no alteration.
 
