I've mainly summarized the Structured/Sparse references and included some notes for some of the other papers. I could not obtain a few of the papers and have not had a chance to read others. The comments are minimal, but please tell me if I've misunderstood or misrepresented your work and I will correct the error. If you have seen more recent publications than the ones found here, I would appreciate the information and will include it in this list. My thanks to D.R. Mani, Stan Franklin, Joydeep Ghosh, S.C. Kak, Phil Kohn, Jacob M. Murre, Soheil Shams and Jim Steck for responding to my request for references. As noted elsewhere, most of the references are from D.R. Mani's dissertation proposal.

Overview Papers
---------------

Nordstrom92 - survey of implementations of backprop, recurrent and Kohonen nets on a variety of architectures, including specialized machines.

Singer90 - survey of CM-2 implementations.

Structured/Sparse Connectionist Networks
========================================

Model of Computation
--------------------

There does not seem to be much literature on mapping semantic networks onto parallel machines. The work by chung92 and evett89 (and other PARKA references which I have not yet read) is targeted at semantic nets, while the work in blell87, wei91 and shams92 is mainly intended for sparse backprop-style networks but might be applied to semantic nets as well.

Since large, sparse networks are often specified hierarchically, ghosh89 provides a model that describes the net hierarchically, with the highest connectivity locally (within a "core"), lower within an "influence" region, and lowest in the remote region. In addition, conditional connection probabilities model the "clumpiness" of connections. These conditional probabilities reflect the fact that (roughly speaking) a connection between two clusters of a network affects the likelihood that there are other connections as well.

murre93b investigates general sparsity of connections and concludes that low uniform sparsity is not very helpful and that processor topology does not matter much. However, if we partition the neural network into modules, and if the connections between modules are sparse, it does help. The conditional probability used by ghosh89 generates clumpy links, somewhere in between uniform sparsity and "modular" sparsity.

SIMD Mappings
-------------

In different SIMD mappings of sparse nets onto the CM-2, blell87 assigns two processors per weight and one per node, while chung92 puts all the outgoing weights on the same processor. Lange90 describes a simulator (DESCARTES) that is implemented on the CM-2 and uses the same mapping as blell87. The simulator has mechanisms for dealing with control - which nets are active and with what frequency they are cycled.

evett89 and chung92 both represent semantic nets with arbitrary topologies and labelled arcs between nodes. Chung's system also provides for weights on the links while evett89's PARKA system does not. The data structures used by evett89 and chung92 are actually quite similar (both essentially store a set of out-pointers from every node), but evett89 forces every node to have a pointer (even if it is null) for every possible arc label available, whereas chung92 explicitly labels arcs with a (4-bit) tag. Clearly there are various tradeoffs, like the ease of finding all arcs with a particular label, the number of labels allowed (maximum of 16 in chung92), space requirements, etc.
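To make the tradeoff concrete, here is a small Python sketch of the two storage schemes as I understand them; the label names, the LABEL_CODE encoding and the arcs_with_label helper are all mine and purely illustrative, not taken from either paper:

    LABELS = ["isa", "part-of", "color", "owner"]        # hypothetical global label set

    # evett89/PARKA-style: every node carries one slot per possible label,
    # with None marking an absent arc.  Space grows with the label set.
    parka_node = {"isa": "mammal", "part-of": None, "color": "brown", "owner": None}

    # chung92-style: a node stores only the arcs it actually has, each
    # explicitly tagged with a small integer code (4 bits, so at most 16 labels).
    LABEL_CODE = {name: i for i, name in enumerate(LABELS)}
    chung_node = [(LABEL_CODE["isa"], "mammal"), (LABEL_CODE["color"], "brown")]

    def arcs_with_label(node, label):
        """Find targets of outgoing arcs with a given label, either representation."""
        if isinstance(node, dict):                        # slot-per-label: direct lookup
            return [] if node[label] is None else [node[label]]
        code = LABEL_CODE[label]                          # tagged list: scan the arcs
        return [target for (tag, target) in node if tag == code]

Here arcs_with_label(parka_node, "isa") is a single slot lookup, while the chung92-style list has to be scanned but only stores the arcs that actually exist.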
wei91 is meant for backprop on sparse networks and takes advantage of an efficient permutation algorithm on a mesh-structured machine (certain types of communication must be permitted) to shuffle the activations around to the weights and the results back to the activations. shams92 develops his own mesh SIMD machine, the DREAM machine, and a constraint-net optimization of the assignment and scheduling problems.

MIMD Machines
-------------

lange90 deals with massively parallel MIMD machines.

A very different sort of paper is hinton81, which reformulates the semantic net problem as one of associative recall of missing members of triplets (node1 rel node2), with properties like inheritance falling out of his representation. The resulting neural net is fully connected. bic87 recasts the problem in a predicate logic framework which captures both the structure of the network and the nature of permissible queries.

Data Parallel Approach
----------------------

Training set parallel approaches may be (usually trivially) applied to any style of network.

Speed of Implementation
-----------------------

Many of the fastest implementations of backprop-style networks turn out to be training set parallel, when the speed is measured in connections per second. However, as pugh93 makes clear and has been discussed on connectionists, this may not translate into any decrease in training time. Singer90 uses training set parallelism to obtain > 1 GCPS on the CM-2. Other CM-2 approaches sometimes combine a training set parallel approach with another mapping method.

There is a well-known trade-off between the speed of convergence of backprop and the amount of training-set parallelism used. Batch updating, the extreme case, leads to much slower convergence, and this effect dilutes the speedup. pugh93 explores this tradeoff and obtains empirical rules for the learning rate (not considering momentum) and block size to maximize speedup.

Multi-Layer Backprop Style Networks
-----------------------------------

Analysis
--------

chu92 applies integer programming techniques to obtain mappings of a layered net (full connectivity between layers) that are within epsilon of optimal. It formulates the problem with linear constraints and a non-linear objective, and simplifies by partitioning the processor graph and making assumptions about the dominance of computation within a partition.

SIMD Machines
-------------

In addition to the CM-2, there are a large number of specialized processors, and implementations are surveyed in Nordstrom92. blell87 may be used for the dense case as well as the sparse case. Zhang90 uses a "vertical" partitioning of the network, i.e. a processor gets a single node from each layer (as well as the accompanying weights); the basic communication is cycling the activation vector around within a layer (a rough sketch follows at the end of this subsection). singer90 surveys other methods and also proposes a very fast data parallel approach which would be most applicable to relatively small networks. chinn90 applies systolic techniques to implement a dense net on a MasPar MP-1. kak88 deals with an efficient systolic-style matrix-vector algorithm. shams92a deals with the design of a SIMD mesh architecture with specialized routing switches that permit efficient implementation of multiple communication rings, and with algorithms for mapping networks onto the DREAM machine.
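As promised above, here is a rough Python sketch of how I picture the Zhang90-style vertical partitioning working. The function name ring_forward, the uniform layer width and the sigmoid nonlinearity are my own simplifying assumptions, so treat this as an illustration of the idea rather than a description of the actual implementation:

    import math

    def ring_forward(weights, activations):
        """Vertical partitioning sketch: processor p owns node p of every layer
        plus the weights into that node in the next layer.  The current layer's
        activation vector is rotated around a ring of processors; after a full
        cycle every processor has accumulated its node's complete weighted sum.

        weights[p][j]  - weight from node j (current layer) into node p (next layer)
        activations[p] - activation initially held by processor p
        """
        n = len(activations)             # number of processors = layer width
        held = list(activations)         # value currently sitting on each processor
        src = list(range(n))             # which node's activation each processor holds
        acc = [0.0] * n                  # partial sums, one per processor

        for _ in range(n):
            for p in range(n):           # all processors act in lock-step (SIMD)
                acc[p] += weights[p][src[p]] * held[p]
            # rotate the activation vector one step around the ring
            held = held[1:] + held[:1]
            src = src[1:] + src[:1]

        # apply the nonlinearity locally; processor p now holds node p of the next layer
        return [1.0 / (1.0 + math.exp(-a)) for a in acc]

For example, ring_forward([[0.1, 0.2], [0.3, 0.4]], [1.0, -1.0]) produces the two next-layer activations, each computed entirely on the processor that owns the corresponding node, with only nearest-neighbour ring communication.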
MIMD Machines
-------------

There are an enormous number of implementations on MIMD machines, especially the transputer, e.g. Morgan90, chu92, shams92a, kung89 and many others.

Hopfield Style Networks
-----------------------

Forrest87 implements the Durbin and Willshaw elastic net, an associative memory, and the Geman and Geman algorithm on the DAP.

wang89 - implementation on a shared-memory vector processor.

Manner89 - Hopfield net considerations are used to derive constraints on a parallel processor built from 68020s. Different updating schemes are considered (a generic sketch of what such schemes look like appears at the end of this list).

Biologically Realistic Simulations
----------------------------------

Nelson89 deals with general issues in parallel algorithm analysis and proceeds to describe the implementation of compartmental models, a chief issue being the partitioning of the compartmental model.
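For readers unfamiliar with what "different updating" refers to, here is a generic Python sketch of the standard textbook Hopfield update rule in its synchronous and asynchronous forms. It is not taken from Manner89 or any other paper above, and the names are mine:

    import random

    def hopfield_step(weights, state, asynchronous=True):
        """Textbook Hopfield update: s_i <- sign(sum_j w[i][j] * s_j).
        Asynchronous updating changes one randomly chosen unit at a time;
        synchronous updating recomputes every unit from the old state at once."""
        n = len(state)

        def new_value(i):
            total = sum(weights[i][j] * state[j] for j in range(n))
            return 1 if total >= 0 else -1

        if asynchronous:
            i = random.randrange(n)
            updated = list(state)
            updated[i] = new_value(i)
            return updated
        return [new_value(i) for i in range(n)]

The choice matters for a parallel implementation: synchronous updating maps naturally onto lock-step hardware but has different convergence behaviour from the asynchronous rule.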