Spatio-temporal networks for speech and visual pattern recognition

I am interested in the representational, computational, and adaptive properties of spatio-temporal networks and the use of such nets in speech and visual pattern recognition.

A spatio-temporal neural net differs from other neural networks in two ways.

Propagation delays along links and recurrence play an important role in the computations carried out by the network.

The representational state of the network depends not only on which nodes are firing, but also on the relative firing times of nodes.

Consequently, the representational significance of a node varies with time and depends on the firing state of other nodes. The use of recurrence and multiple links with variable propagation delays provides a rich mechanism for integration, context sensitivity, feature extraction and pattern recognition. Recurrent links enable nodes to integrate and differentiate inputs, detect the onset of features and measure their duration. At the same time, multiple links with variable propagation delays between nodes serve as a short-term memory and allow the network to maintain context over a window of time.
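The roles of delayed and recurrent links described above can be illustrated with a minimal sketch. It is not the Temporal Flow Model itself: the units are linear, time is discrete, and the function names, weights, and example signal are assumptions chosen for illustration. A unit fed by links with delays 0 and 1 and opposite weights responds to the onset and offset of a feature, while a unit with a self-recurrent link accumulates its input and so measures how long the feature has been present.

```python
def delayed_sum(signal, weights_by_delay):
    """Weighted sum over delayed copies of the input (hypothetical helper).

    weights_by_delay maps a propagation delay d (in time steps) to the
    weight of the link that carries the input with that delay.
    """
    out = []
    for t in range(len(signal)):
        total = 0.0
        for d, w in weights_by_delay.items():
            if t - d >= 0:  # nothing has arrived over this link before time d
                total += w * signal[t - d]
        out.append(total)
    return out


def recurrent_unit(signal, self_weight):
    """Linear unit with a self-recurrent link of delay 1 (hypothetical helper).

    Computes y[t] = x[t] + self_weight * y[t-1].
    """
    out = []
    prev = 0.0
    for x in signal:
        prev = x + self_weight * prev
        out.append(prev)
    return out


# A feature that is present for three consecutive time steps.
signal = [0, 0, 1, 1, 1, 0, 0]

# Differentiation: x[t] - x[t-1] is positive at onset, negative at offset.
onset = delayed_sum(signal, {0: 1.0, 1: -1.0})

# Integration: with self_weight 1.0 the unit counts the active time steps.
duration = recurrent_unit(signal, 1.0)

print(onset)
print(duration)
```

With a self-weight below 1.0 the same recurrent unit becomes a leaky integrator whose response decays once the feature disappears. Adding more entries to weights_by_delay widens the window of past input the unit can see, which is the short-term memory role of multiple delayed links.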

The combination of the above characteristics makes spatio-temporal neural networks a potentially powerful mechanism for pattern recognition. Needless to say, spatio-temporal neural networks have a sound basis in biology.

In our research we use the Temporal Flow Model (TFM) [Watrous and Shastri, 1986][Watrous, 1988]. For some details on how we train the TFM, look here.

The need for spatio-temporal networks arises naturally when dealing with problems such as speech recognition and time series prediction, where the input signal has an explicit temporal aspect. In work with Thomas Fontaine [Fontaine and Shastri, 1993][Shastri and Fontaine, 1995] we have demonstrated that certain tasks that do not have an explicit temporal aspect can also be processed advantageously with neural networks capable of dealing with temporal information. In particular, we have proposed that converting static patterns into time-varying (spatio-temporal) signals by scanning the image would lead to a number of significant advantages.
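As a rough illustration of converting a static pattern into a time-varying signal, the sketch below scans a small binary image one column at a time, so that each column becomes the input vector for one time step. The left-to-right column scan, the function name scan_columns, and the toy image are assumptions made for this example. The scanning scheme used in the cited work may differ.

```python
def scan_columns(image):
    """Convert a 2D image (a list of rows) into a sequence of column vectors.

    Hypothetical helper: frame t holds the pixels of column t, top to bottom,
    so the horizontal axis of the image becomes the time axis of the signal.
    """
    n_rows = len(image)
    n_cols = len(image[0])
    return [[image[r][c] for r in range(n_rows)] for c in range(n_cols)]


# A toy 3x3 binary pattern (an upside-down "T").
image = [
    [0, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
]

frames = scan_columns(image)
for t, frame in enumerate(frames):
    print(t, frame)
```

Under this scheme the network needs only as many input units as the image has rows, and patterns of different widths simply yield signals of different durations.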

The effectiveness of the above ideas has been demonstrated in the dissertation work of Thomas Fontaine, who designed and trained a system for recognizing sequences of handwritten digits. The system has a 96% recognition rate on a dataset of 2,700 isolated digits provided by USPS and a 96.5% recognition rate on a set of 207,000 isolated digits provided by NIST. On a set of 540 real-world ZIP code images provided by USPS, the system achieved a raw accuracy of 66.0%. A PostScript paper describing this work may be found here.

Another paper (in PostScript form) describing work done with C. Privitera on a hierarchical self-organizing model for visual trajectory classification may be found here. This paper appeared in the Proceedings of ICANN-96 -- the 1996 International Conference on Artificial Neural Networks, Bochum, Germany.

An extended version of the above paper is also available.


This research has been supported by NSF grant SBR-9720398, ONR grant N00014-93-1-1149 to L. Shastri, and ARO grants DAA29-84-9-0027 and DAAL03-89-C-0031 to the AI Research Center, the University of Pennsylvania.