Eliasmith, Chris (unpublished). LT-Frame: A replacement for logical inference as a basis for human cognition. DRAFT. Dec. 1996.

LT-Frame: A Replacement for Logical Inference as a Basis for Human Cognition

Philosophy-Neuroscience-Psychology Program, Department of Philosophy, Washington University in St. Louis, Campus Box 1073, One Brookings Drive, St. Louis, MO 63130-4899, chris@twinearth.wustl.edu

Dec, 1996

Introduction

A number of prominent psychologists have argued for the centrality of formal logic in human cognition. Included in this group are Lance Rips, Martin Braine, and Irvin Rock (see Thagard 1996). In particular, Irvin Rock (1977) is interested in defending the position that the process of perception is highly logic-like. However, he weakens his claim slightly by insisting that it is not necessarily the case that syllogistic reasoning is taking place in the brain, but rather that "thinking can be translated into a form of logical inference, as if it occurs in precisely that way, not that it does follow that form" (Rock 1977, p. 364). Of course, if thinking can be translated into logical inference, it must not break the rules of logical inference, so such a claim is still testable with regard to human cognition.

However, a number of early psychological experiments have clearly shown the great difficulty with which human subjects are able to perform some simple inferences. For example, in an experiment by Shapiro (reported in Wason and Johnson-Laird 1972) a group of college students was presented with four forms of logical arguments: modus ponens; modus tollens; denying the antecedent; and affirming the consequent. The first two are valid arguments, while the last two are logical fallacies. These arguments are respectively of the form:

Modus ponens: a → b, a, therefore b.
Modus tollens: a → b, ~b, therefore ~a.
Denying the antecedent: a → b, ~a, therefore ~b.
Affirming the consequent: a → b, b, therefore a.

The results of this experiment were summarized by noting that: "The subjects certainly show a susceptibility to the fallacies, and a reliable difficulty in appreciating that modus tollendo tollens is valid" (Wason and Johnson-Laird 1972, p. 44).

It is interestingly odd that subjects who, according to Rock's theory, have basic perceptual mechanisms which "abhor" (Rock 1977, p. 367) logical contradiction seem to be unmoved by logical fallacies with which they are presented. It is true, of course, that the fallacies are being committed at a different 'level' of cognition. Nevertheless, if logic is fundamental to some types of cognition, and not others (especially where the former, in many ways, precedes the latter) a satisfactory cognitive theory must explain this inconsistency in human cognition. Rock (1977) does not attempt to address these sorts of difficulties with his theoretical approach.

In the remainder of this paper, I will present a different approach to modeling cognition which relies on learned transformations rather than logical inference. I will attempt to show how this theoretical framework is better able to explain both the functioning of the human perceptual system and the performance of human cognizers on tasks involving logical inference. I will begin the discussion by briefly describing the representational foundations of this new theoretical framework, which I will refer to as LT-Frame (Learned Transformation Framework). Subsequently, I will describe an example of human visual perception taken from Rock (1977) and show how this framework is better able to account for human performance on this task than is Rock's own. Finally, I will present evidence that this framework is also able to explain human performance (both good and bad) on logical inference tasks. Thus, I will propose a framework that is able to provide a unified explanation of human cognition that is not available under Rock's program.

Holographic Reduced Representations

The representational medium of LT-Frame is central to its ability to provide unified accounts of human cognition. The representations used in LT-Frame are called Holographic Reduced Representations or HRRs. HRRs are a particular form of distributed representation. The concept of distributed representation is a product of joint developments in the neurosciences and in connectionist work on recognition tasks (Churchland and Sejnowski 1992). Distributed representations have a number of important advantages over traditional forms of representation. Briefly, distributed representations:

Have been successfully applied to visual (Qian and Sejnowski 1988), olfactory (Skarda and Freeman 1987), auditory (Lazzaro and Mead 1989) and tactile problems;
Have been proved to degrade gracefully with noise (Churchland 1992) and are commonly tested with simulated lesions (i.e. a removal of part of the representation);
Are the natural result of organization of statistical input and thus provide a natural means to capturing semantic information (Smolensky 1995);
Represent concepts continuously;
Are processed in parallel, and;
Can be learned using proven methods (Hinton 1986).

The greatest shortcoming of distributed representations has been, until recently, their inability to capture complex structural relations. However, HRRs are both distributed, and structurally sensitive (Plate 1994). These representations are constructed, or encoded, using a form of vector multiplication called circular convolution and are related to the better-known tensor products of Smolensky (Smolensky 1990). Decoding of HRRs is performed using the approximate inverse of circular convolution, an operation called correlation (see appendix A for the algebraic details of these operations).

The HRR representations are "holographic" because the encoding and decoding operations (i.e. convolution and correlation) used to manipulate these complex distributed representations are the same as those which underlie explanations of holography (Borsellino and Poggio 1973). The convolution of two HRRs creates a third unique HRR which encodes the information present in the two previous HRRs. Importantly, this new HRR is not similar to either of its components, though the components may be retrieved through decoding the new HRR with the correlation operator. These operations are easiest to understand through a simple illustration. Let A, B and C be distributed, 3-dimensional HRR vectors. If C = A ⊗ B (read: C equals A convolved with B) then A # C ≈ B (read: A correlated with C approximately equals B) and B # C ≈ A (see figure 1).

Figure 1. HRR operations depicted in three dimensions. Vector C is the circular convolution of vectors A and B. Vector D is the normalized superposition of A and B.
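The encoding and decoding steps just described can be sketched in a few lines of Python. This is an illustrative sketch only, not the software used for the simulations reported in this paper. The helper names (cconv, ccorr, random_hrr, cosine), the dimension of 512, and the use of Gaussian-distributed elements are assumptions made for the example. The argument order of ccorr follows the definition d = a # c given in appendix A, so the vector being "removed" is the first argument.

```python
import math
import random

def cconv(a, b):
    """Circular convolution: c[j] = sum_k a[j-k] * b[k], subscripts modulo n."""
    n = len(a)
    return [sum(a[(j - k) % n] * b[k] for k in range(n)) for j in range(n)]

def ccorr(a, c):
    """Circular correlation: d[j] = sum_k a[k] * c[k+j], subscripts modulo n."""
    n = len(a)
    return [sum(a[k] * c[(k + j) % n] for k in range(n)) for j in range(n)]

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def random_hrr(n, rng):
    """Assumed item-vector distribution: Gaussian elements scaled to unit length."""
    return normalize([rng.gauss(0.0, 1.0) for _ in range(n)])

def cosine(u, v):
    """Dot product of the two vectors after normalizing both."""
    return sum(x * y for x, y in zip(normalize(u), normalize(v)))

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 512                  # assumed dimension, chosen for the example
A = random_hrr(n, rng)
B = random_hrr(n, rng)

C = cconv(A, B)          # encode: C = A convolved with B
B_noisy = ccorr(A, C)    # decode: A correlated with C, a noisy copy of B

sim_decoded_to_B = cosine(B_noisy, B)   # expected to be well above zero
sim_C_to_A = cosine(C, A)               # expected to be near zero

print("similarity of decoded vector to B:", round(sim_decoded_to_B, 3))
print("similarity of C to A:", round(sim_C_to_A, 3))
```

The decoded vector is only approximately equal to B, which is why a recognition step is needed before the result can be used.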

Along with convolution another operation, superposition, can be used to combine two HRR vectors. Superposition of two vectors simply results in their sum, and is written: D = A + B (see figure 1). Superimposing two vectors results in a vector which is very similar to the original two vectors but cannot provide a perfect reconstruction of the original vectors. This contrasts with convolution, in which the resulting vector is nothing like the original two vectors; in fact, the expected similarity of either of the original vectors to their convolution is zero (Plate 1994, p. 57). In both cases, some of the information in the original HRR vectors is lost in their combination. Hence, HRRs are considered 'reduced' representations. Upon encoding a new representation from a number of others, the new representation does not contain all of the information present prior to encoding. In other words, the new representation is noisy. Nevertheless, these representations are extremely effective at capturing complex relational information (Eliasmith and Thagard forthcoming). Furthermore, the noise occurring in HRRs seems to have a neurological counterpart (Andersen, personal comment).

When decoding an HRR vector the resultant vector must be recognized even though it is not expected to be identical to the vector that was encoded. The process of recognizing a vector is accomplished through use of another operator called the dot product and represented as '·'. The dot product of two vectors is the sum of the products of the corresponding elements of the vectors. For normalized vectors, the resulting scalar is equivalent to the length of one of the vectors projected on to the other. This relative length value can be used as a measure of the vectors' similarity. Because all of the vectors are normalized to the unit radius, we can use the dot product operation to determine the similarity of any two vectors.
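The contrast between superposition and convolution, as measured by the dot product, can be illustrated with a short Python sketch. The dimension (512), the random seed, and the Gaussian item vectors are assumptions made for the example, and the helper names are hypothetical.

```python
import math
import random

def cconv(a, b):
    """Circular convolution with subscripts taken modulo n."""
    n = len(a)
    return [sum(a[(j - k) % n] * b[k] for k in range(n)) for j in range(n)]

def dot(u, v):
    """Sum of the products of corresponding elements."""
    return sum(x * y for x, y in zip(u, v))

def normalize(v):
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

rng = random.Random(2)
n = 512  # assumed dimension for the example
A = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
B = normalize([rng.gauss(0.0, 1.0) for _ in range(n)])

D = normalize([x + y for x, y in zip(A, B)])  # normalized superposition
C = normalize(cconv(A, B))                    # normalized convolution

sim_D_A = dot(D, A)  # superposition stays similar to its components
sim_C_A = dot(C, A)  # convolution is dissimilar to its components

print("D . A =", round(sim_D_A, 3))
print("C . A =", round(sim_C_A, 3))
```

For two nearly orthogonal unit vectors the normalized superposition has a dot product of roughly 0.7 with each component, whereas the dot product of the convolution with either component stays close to zero.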

A number of properties of HRRs make them promising candidates for modeling human cognition. First, HRRs are distributed representations. This means that they have all of the benefits associated with distributed representations.

Second, HRRs accommodate arbitrary variable binding through the use of convolution. Third, HRRs can effectively capture embedded structure (Plate, 1994; Eliasmith and Thagard, forthcoming). Fourth, unlike tensor products, and most other distributed representations which use vector multiplication, HRRs are fixed dimension vectors. Thus, convolving two three-dimensional vectors results in another three-dimensional vector -- not a six- or nine-dimensional vector. Consequently, HRRs are not subject to an explosion in the size of representations as the structures represented become more complex. This property also allows HRRs of various structural depths to be easily comparable to each other without "padding" the representation, as is necessary with tensor products. Fifth and finally, convolution can be implemented by a recurrent connectionist network (Plate, 1993). The potential for implementation in a recurrent network supports the neurological plausibility of HRRs. Though the extent of neurological realism of any such artificial neural networks may be disputed, it is indisputable that they are more neurologically realistic than either localist connectionist or symbolic models (Smolensky, 1995).
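The fixed-dimension property can be made concrete by comparing the size of a tensor product binding with the size of a convolution binding. The following Python sketch uses arbitrary three-dimensional example vectors; the function names are hypothetical and the tensor product is simply flattened into a list.

```python
def tensor_product(a, b):
    """Flattened tensor (outer) product: one element per pair of components."""
    return [x * y for x in a for y in b]

def cconv(a, b):
    """Circular convolution with subscripts taken modulo n."""
    n = len(a)
    return [sum(a[(j - k) % n] * b[k] for k in range(n)) for j in range(n)]

a = [1.0, 0.0, 2.0]   # arbitrary example values
b = [0.5, 1.0, -1.0]

flat = tensor_product(a, b)           # 9 elements
nested_flat = tensor_product(flat, a) # 27 elements after a second binding

bound = cconv(a, b)                   # 3 elements
nested_bound = cconv(bound, a)        # still 3 elements

print(len(flat), len(nested_flat))    # prints: 9 27
print(len(bound), len(nested_bound))  # prints: 3 3
```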

An Example of Visual Perception

As an example of unconscious inference, Rock (1977) proposes the following example (p. 358):

The line is perceived as egocentrically oblique at a 40° clockwise angle with respect to the head (proximal mode).
Information is available that the head is tilted 50° clockwise with respect to gravity.
Therefore the line in the environment producing the image must be 90° from the direction of gravity or horizontal.

Rock believes that the process of visual perception "is much like syllogistic reasoning...the premises and conclusions entail relations" (p. 359). As can be seen from this example, the conclusion of these sorts of syllogisms is a percept in the proximal mode (e.g., perceived angle).

There are two major drawbacks to Rock's account of visual perception in this example. First, Cooper and Shepard (1973) have shown that the amount of rotation of an object in the environment affects the time it takes to make decisions concerning orientation of the object (as in Thagard 1996). However, if perception of orientation is always a three-step syllogism, it is not clear why the amount of rotation would have such an effect. Solution of the syllogism should take the same amount of time given that the premises are of the same complexity; only the values (i.e. 40°, 50°, and 90°) would vary.

Second, there is evidence that our perceptual systems do not provide accurate absolute measures of external stimuli. In particular, there is evidence that our vestibular system provides more accurate information concerning our orientation relative to the vertical the more upright we are, and fails completely at an angle of about 40°. Similarly, other sensory systems seem unable to provide 'veridical' information concerning the environment (Akins 1996). Thus, Rock is making, at the very least, controversial assumptions in claiming that absolute measures of orientation are readily available for the syllogistic reasoning process.

Using the LT-Frame, perception of orientation is understood in a very different, non-logical manner. On the basis of past experience, a general rotational transformation vector can be derived and later applied to novel situations. Thus, given the vector representation of an image (for simplicity's sake, a point), say a = {0 1 0 0}, and the results of a right-rotation, b = {0 0 1 0}, it is possible to derive the vector which must be convolved with a to produce b as follows:

t_right = a # b

In other words, the circular correlation of a with b provides the transformational vector for right rotations (i.e. t_right). Similarly, the left-rotation transformation can be derived to give t_left. Once the transformation vectors have been formed, novel transformations can be produced. For instance, b circularly convolved with t_right (i.e. b ⊗ t_right) will produce the vector {0 0 0 1}. As well, the transformations can be performed one after the other to produce 'more distant' or 'further' rotations. Thus, b ⊗ t_left ⊗ t_left will produce the vector {1 0 0 0} (see appendix B for details).
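The derivation above can be checked by hand or with a short Python sketch. The vectors a and b are those given in the text; the function names cconv and ccorr are hypothetical, and the operations follow the definitions in appendix A.

```python
def cconv(a, b):
    """Circular convolution: c[j] = sum_k a[j-k] * b[k], subscripts modulo n."""
    n = len(a)
    return [sum(a[(j - k) % n] * b[k] for k in range(n)) for j in range(n)]

def ccorr(a, c):
    """Circular correlation: d[j] = sum_k a[k] * c[k+j], subscripts modulo n."""
    n = len(a)
    return [sum(a[k] * c[(k + j) % n] for k in range(n)) for j in range(n)]

a = [0, 1, 0, 0]   # the original point
b = [0, 0, 1, 0]   # the same point after one right-rotation

t_right = ccorr(a, b)   # learned right-rotation: [0, 1, 0, 0]
t_left = ccorr(b, a)    # learned left-rotation:  [0, 0, 0, 1]

rotated_right = cconv(b, t_right)                    # [0, 0, 0, 1]
rotated_left_twice = cconv(cconv(b, t_left), t_left)  # [1, 0, 0, 0]

print(t_right, t_left)
print(rotated_right, rotated_left_twice)
```

Because these example vectors have a single non-zero element, the transformations here are exact rather than approximate.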

Notably, the LT-Frame avoids both of the short-comings of the unconscious inference framework. First, the amount of rotation of an object will affect the time it takes to perform the transformation since more transformations will have to be applied to the representation of the object the farther it is from the vertical. This property fits well with experimental evidence (Kosslyn 1994). Second, the transformation that is applied is based on the past experience of the perceiver. Thus, we do not need to assume that the perceptual system provides accurate absolute measures of object orientation, only that it is consistent.
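The first point can be illustrated with a toy Python sketch in which the number of applied transformations stands in for processing time. Everything specific here is an assumption made for illustration: eight discrete orientations coded as one-hot vectors, orientation 0 treated as upright, and one unit of time per applied transformation. None of these details is taken from the simulations in appendix B.

```python
def cconv(a, b):
    """Circular convolution with subscripts taken modulo n."""
    n = len(a)
    return [sum(a[(j - k) % n] * b[k] for k in range(n)) for j in range(n)]

def ccorr(a, c):
    """Circular correlation with subscripts taken modulo n."""
    n = len(a)
    return [sum(a[k] * c[(k + j) % n] for k in range(n)) for j in range(n)]

def one_hot(i, n):
    return [1 if k == i else 0 for k in range(n)]

n = 8  # assumed number of discrete orientations
orientations = [one_hot(i, n) for i in range(n)]
upright = orientations[0]

# Learn a single left-rotation from one observed pair of adjacent orientations.
t_left = ccorr(orientations[1], orientations[0])

def steps_to_upright(start, transform, target, max_steps):
    """Count how many times the transformation is applied before the target is reached."""
    current, steps = start, 0
    while current != target:
        if steps == max_steps:
            raise ValueError("target not reached")
        current = cconv(current, transform)
        steps += 1
    return steps

steps = [steps_to_upright(orientations[i], t_left, upright, n) for i in range(5)]
print(steps)  # prints: [0, 1, 2, 3, 4]
```

Under these assumptions the number of transformations grows linearly with the distance from upright, which is the qualitative pattern the argument relies on.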

Examples of Logical Inference

As noted in the introduction, it seems odd that though, according to Rock's theory, our perceptual system is fundamentally logical, evidence at the psychological level shows that people have great difficulty with simple logical operations. It is not clear how Rock would explain this apparent inconsistency. Whatever the explanation was, it would likely introduce notions foreign to syllogistic logic, and render Rock's overall theory of cognition far more complex, and perhaps result in a piecemeal explanation of the data.

In contrast, the LT-Frame is able to account for both perceptual and psychological phenomena. The previous section discussed the former. This section will address how the LT-Frame accounts for human logical performance -- both its strengths and weaknesses. Together, these sections demonstrate the ability of the LT-Frame to provide a unified account of human cognition.

Unlike the four-dimensional vector used in the rotational simulations, a 4096-dimensional vector is used to perform the logical operations. To encode the logical sentence "a → b" the following HRR is constructed (see appendix C, input):

C1 = rel ⊗ implies + ante ⊗ a + cnsq ⊗ b

This encodes that the relation between the antecedent and the consequent is one of implication. This is a general propositional encoding scheme which has been used successfully to perform complex analogical mappings (see Eliasmith and Thagard forthcoming). Once the simple logical sentence has been encoded, it is necessary to construct transformations that will change the sentence into one which can be decoded as required (e.g. to perform modus ponens). For example, a modus ponens transformation is:

tmp1 = ante # (cnsq # C1)

When convolved with C1, this transformation will replace the ante designator with whatever is in the cnsq position of C1. So, letting C2 = C1 ⊗ tmp1, we can imagine the transformation to have made C2 as:

C2 = rel ⊗ implies + b ⊗ a

Upon correlating C2 with a (the antecedent of the conditional), the 'answer' is b. In other words, when the transformation is applied to C1, the correlation of the result with the antecedent of C1 will be approximately equal to the consequent of C1. This is how the system performs modus ponens on any sentence of the form "a → b".
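The modus ponens procedure can be sketched in Python as follows. This is a re-implementation under stated assumptions, not the simulation software used for appendix C: the vectors are Gaussian with a fixed random seed, the dimension is reduced to 1024 so that the plain-Python convolution runs quickly, the negated items are left out of the clean up memory, and the helper names are hypothetical.

```python
import math
import random

def cconv(a, b):
    """Circular convolution with subscripts taken modulo n."""
    n = len(a)
    return [sum(a[(j - k) % n] * b[k] for k in range(n)) for j in range(n)]

def ccorr(a, c):
    """Circular correlation with subscripts taken modulo n (a # c)."""
    n = len(a)
    return [sum(a[k] * c[(k + j) % n] for k in range(n)) for j in range(n)]

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def cosine(u, v):
    return sum(x * y for x, y in zip(normalize(u), normalize(v)))

rng = random.Random(1)
n = 1024  # assumed dimension; the simulations in the paper use 4096
v = {name: normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
     for name in ("a", "b", "rel", "implies", "ante", "cnsq")}

# C1 = rel (x) implies + ante (x) a + cnsq (x) b
terms = [cconv(v["rel"], v["implies"]),
         cconv(v["ante"], v["a"]),
         cconv(v["cnsq"], v["b"])]
C1 = normalize([sum(xs) for xs in zip(*terms)])

tmp1 = ccorr(v["ante"], ccorr(v["cnsq"], C1))  # tmp1 = ante # (cnsq # C1)
C2 = cconv(C1, tmp1)                           # transformed sentence
answer = ccorr(v["a"], C2)                     # query with the antecedent a

# Clean up memory: compare the noisy answer with every stored item.
memory = {name: v[name] for name in ("a", "b", "implies", "ante", "cnsq")}
memory["C1"] = C1
scores = {name: cosine(answer, vec) for name, vec in memory.items()}
best = max(scores, key=scores.get)
print("best match:", best)
```

The exact similarity values depend on the dimension and on the random vectors drawn, so they should not be expected to reproduce the figures reported in appendix C.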

For modus tollens, another transformation must be constructed (tmt1) such that correlation of ~b with C1 ⊗ tmt1 will be approximately equal to ~a. Appendix C provides two such transformations. Notice that being able to provide multiple ways of performing the same logical inference is a possible means of accounting for individual differences in performance on this task. In this case, tmt1 seems to be a less reliable transformation for performing modus tollens than the second transformation (tmt3 in appendix C). As well, the modus tollens transformation is far more complex than that necessary for performing modus ponens. This difference fits well with the psychological data, which show that people have far more difficulty correctly performing modus tollens than modus ponens (Wason and Johnson-Laird 1972).

Also, as appendix C demonstrates, these transformations provide a means of explaining the tendency subjects have to accept logical fallacies (in particular affirming the consequent and denying the antecedent). Since these transformations can also be used with an invalid query (e.g., "a → b, b, ?"), it is possible to see how fallacies arise. The reason that fallacies are not always reported as valid would thus be due to learning the form of a valid query.
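The way an invalid query produces a confident but fallacious answer can be seen in a Python sketch that applies the modus ponens transformation and then queries the result with both the antecedent and the consequent. As before, this is a sketch under assumptions rather than the original simulation: Gaussian vectors with a fixed seed, a dimension of 1024 instead of 4096, a reduced clean up memory, and hypothetical helper names.

```python
import math
import random

def cconv(a, b):
    """Circular convolution with subscripts taken modulo n."""
    n = len(a)
    return [sum(a[(j - k) % n] * b[k] for k in range(n)) for j in range(n)]

def ccorr(a, c):
    """Circular correlation with subscripts taken modulo n (a # c)."""
    n = len(a)
    return [sum(a[k] * c[(k + j) % n] for k in range(n)) for j in range(n)]

def normalize(v):
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]

def cosine(u, v):
    return sum(x * y for x, y in zip(normalize(u), normalize(v)))

rng = random.Random(3)
n = 1024  # assumed dimension, smaller than the 4096 used in the paper
v = {name: normalize([rng.gauss(0.0, 1.0) for _ in range(n)])
     for name in ("a", "b", "rel", "implies", "ante", "cnsq")}

terms = [cconv(v["rel"], v["implies"]),
         cconv(v["ante"], v["a"]),
         cconv(v["cnsq"], v["b"])]
C1 = normalize([sum(xs) for xs in zip(*terms)])   # encodes a -> b
tmp1 = ccorr(v["ante"], ccorr(v["cnsq"], C1))     # modus ponens transformation
C2 = cconv(C1, tmp1)

memory = {name: v[name] for name in ("a", "b", "implies", "ante", "cnsq")}
memory["C1"] = C1

def best_match(probe):
    """Name of the stored vector most similar to the probe."""
    return max(memory, key=lambda name: cosine(probe, memory[name]))

best_valid = best_match(ccorr(v["a"], C2))     # valid query:   a -> b, a, ?
best_fallacy = best_match(ccorr(v["b"], C2))   # invalid query: a -> b, b, ?
print("valid query answers:", best_valid)
print("invalid query answers:", best_fallacy)
```

Nothing in the transformation itself distinguishes the two queries, which is consistent with the suggestion that avoiding the fallacy depends on having learned which form of query is valid.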

Conclusion

Though the results of both the perceptual and logical examples discussed are preliminary, they provide a promising route for providing a unified account of human cognition. As well, these applications of the LT-Frame for explaining cognition fit better with empirical data than the unconscious inference framework proposed by Rock (1977) and others.

Appendix A - The Details of HRR Operations

Consider a set E of elements which are holographic reduced representations (HRRs). A member of E is an n-dimensional vector whose contents may represent an image, a proposition, a concept, etc. The prima facie similarity of two vectors is captured by their dot product. The operations necessary to encode and decode HRRs can be understood as follows:

Item vectors and stored vectors are both n-dimensional. Three operations are defined on these vectors: ⊗, the encoding operation (circular convolution); #, the decoding operation (circular correlation); and +, the superposition operation (addition). These three operations make it possible to store any relations necessary for generating the network of relations amongst elements of E.

The circular convolution operation ⊗ is often referred to simply as convolution and consists of the following operations for c = a ⊗ b, where a, b, and c are n-dimensional vectors:

$c_0 = a_0 b_0 + a_{n-1} b_1 + a_{n-2} b_2 + \dots + a_1 b_{n-1}$

$c_1 = a_1 b_0 + a_0 b_1 + a_{n-1} b_2 + \dots + a_2 b_{n-1}$

$c_{n-1} = a_{n-1} b_0 + a_{n-2} b_1 + \dots + a_0 b_{n-1}$

In general, $c_j = \sum_{k=0}^{n-1} a_{j-k} b_k$ for j = 0 to n-1 (subscripts are modulo n).
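As a worked check of this definition, the following Python sketch computes a three-dimensional convolution both from the general sum and from the expanded terms. The example values and the function name cconv are arbitrary choices made for illustration.

```python
def cconv(a, b):
    """Circular convolution: c[j] = sum_k a[j-k] * b[k], subscripts modulo n."""
    n = len(a)
    return [sum(a[(j - k) % n] * b[k] for k in range(n)) for j in range(n)]

a = [1, 2, 3]
b = [4, 5, 6]

c = cconv(a, b)

# The same result written out term by term for n = 3.
c0 = a[0] * b[0] + a[2] * b[1] + a[1] * b[2]   # 4 + 15 + 12 = 31
c1 = a[1] * b[0] + a[0] * b[1] + a[2] * b[2]   # 8 + 5 + 18 = 31
c2 = a[2] * b[0] + a[1] * b[1] + a[0] * b[2]   # 12 + 10 + 6 = 28

print(c)             # prints: [31, 31, 28]
print([c0, c1, c2])  # prints: [31, 31, 28]
```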

This operation can be represented as:

Figure 2. Visual representation of circular convolution (adapted from Plate (1994)).

Similarly, the circular correlation operation # is often referred to simply as correlation and consists of the following operations for d = a # c:

$d_0 = a_0 c_0 + a_1 c_1 + \dots + a_{n-1} c_{n-1}$

$d_1 = a_{n-1} c_0 + a_0 c_1 + \dots + a_{n-2} c_{n-1}$

$d_{n-1} = a_1 c_0 + a_2 c_1 + \dots + a_0 c_{n-1}$

In general, $d_j = \sum_{k=0}^{n-1} a_{k-j} c_k$ for j = 0 to n-1 (subscripts are modulo n).

This operation can be represented as:

Figure 3. Visual representation of circular correlation (adapted from Plate (1994)).

Notably, the correlation of two vectors a # c can be written as a* ⊗ c, where a* is the approximate inverse of a, which is defined as:

Let

$a = \{a_0, a_1, \dots, a_{n-1}\}$

then

$a^* = \{a_0, a_{n-1}, \dots, a_1\}$

Though the exact inverse, $a^{-1}$, could be used to decode a ⊗ c exactly, this process results in a lower signal-to-noise ratio in the retrieved vectors in most instances.
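The identity between correlation and convolution with the approximate inverse can be checked numerically. The following Python sketch does so for a small example; the values are arbitrary and the function names (cconv, ccorr, involution) are hypothetical.

```python
def cconv(a, b):
    """Circular convolution: c[j] = sum_k a[j-k] * b[k], subscripts modulo n."""
    n = len(a)
    return [sum(a[(j - k) % n] * b[k] for k in range(n)) for j in range(n)]

def ccorr(a, c):
    """Circular correlation: d[j] = sum_k a[k] * c[k+j], subscripts modulo n."""
    n = len(a)
    return [sum(a[k] * c[(k + j) % n] for k in range(n)) for j in range(n)]

def involution(a):
    """Approximate inverse a*: keep the first element, reverse the rest."""
    return [a[0]] + a[:0:-1]

a = [1, 2, 3]
c = [4, 5, 6]

a_star = involution(a)          # [1, 3, 2]
direct = ccorr(a, c)            # a # c
via_inverse = cconv(a_star, c)  # a* convolved with c

print(a_star)       # prints: [1, 3, 2]
print(direct)       # prints: [32, 29, 29]
print(via_inverse)  # prints: [32, 29, 29]
```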

Appendix B - The Rotation Simulation

These simulations were performed on a PowerMac 8500. The software was written by Tony Plate in C. I ported the software to the Mac environment using Metrowerks CodeWarrior.

Input

# Very simple simulation of visual rotation.

#The vectors (only a and b need be specified, but

#entering them all clarifies the output)

a = {1 0 0 0}

b = {0 1 0 0}

c = {0 0 1 0}

d = {0 0 0 1}

#Calculating the transformations

t_left = <b'*a>

t_right = <a'*b>

#Storing the possible answers in clean up memory

store(mem, a, b, c, d)

#Provide the top two answers

match.capacity = 2

Output

To understand the output, take the following as an example:

1. > match(mem, a*t_left)

d 1

c 1.22e-16

In this case, the transformation a ⊗ t_left is matched to clean up memory to determine which known vector it most closely matches. The top two answers are displayed, with the similarity measure provided to the right of the vector's name. Therefore, vector d = {0 0 0 1} is the best match for this transformation with a similarity of 1 (i.e. 100%). The following examples can be understood in the same manner.

2. > match(mem, a*t_right)

b 1

a 1.22e-16

3. > match(mem, b*t_right*t_right*t_right)

a 1

b 3.67e-16

4. > match(mem, d*t_right)

a 1

b 1.22e-16

5. > match(mem, c*t_right*t_left)

c 1

b 2.45e-16
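For readers without access to the original software, the five queries above can be approximated with the following Python sketch. It is a re-implementation under assumptions, not a port of the original code: the match function here is a hypothetical stand-in that ranks stored vectors by dot product, and only the name of the best match is meant to be compared with the output above.

```python
def cconv(a, b):
    """Circular convolution with subscripts taken modulo n."""
    n = len(a)
    return [sum(a[(j - k) % n] * b[k] for k in range(n)) for j in range(n)]

def ccorr(a, c):
    """Circular correlation with subscripts taken modulo n."""
    n = len(a)
    return [sum(a[k] * c[(k + j) % n] for k in range(n)) for j in range(n)]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

# The vectors from the input listing.
mem = {"a": [1, 0, 0, 0], "b": [0, 1, 0, 0],
       "c": [0, 0, 1, 0], "d": [0, 0, 0, 1]}
a, b, c, d = mem["a"], mem["b"], mem["c"], mem["d"]

t_left = ccorr(b, a)    # corresponds to <b'*a>
t_right = ccorr(a, b)   # corresponds to <a'*b>

def match(memory, probe, capacity=2):
    """Return the names and similarities of the best-matching stored vectors."""
    scored = sorted(((dot(probe, vec), name) for name, vec in memory.items()),
                    reverse=True)
    return [(name, score) for score, name in scored[:capacity]]

queries = [
    cconv(a, t_left),                                   # 1
    cconv(a, t_right),                                  # 2
    cconv(cconv(cconv(b, t_right), t_right), t_right),  # 3
    cconv(d, t_right),                                  # 4
    cconv(cconv(c, t_right), t_left),                   # 5
]
best_names = [match(mem, q)[0][0] for q in queries]
print(best_names)  # prints: ['d', 'b', 'a', 'a', 'c']
```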

Appendix C - The Logical Inference Simulations

These simulations were performed on a PowerMac 8500. The software was written by Tony Plate in C. I ported the software to the Mac environment using Metrowerks CodeWarrior.

Input

#Logical inference using LT-Frame

#Generate the vectors needed to encode the logical statements

a = randvec()

b = randvec()

not = randvec()

rel = randvec()

impl = randvec()

ante = randvec()

cnsq = randvec()

#Encode the negation of a and b

not_a = <not * a>

not_b = <not * b>

#Encode a->b

c1 = <rel * impl + ante*a + cnsq*b>

#Store the vectors in clean up memory

store(mem, a, b, impl, ante, cnsq, c1, not_a, not_b)

store(mem, not)

#Show the 2 best answers to any query

match.capacity = 2

#MODUS PONENS

#Construct the modus ponens transformation

tmp1 = <ante'*<cnsq'*c1>>

# The transformation can also be done as follows (they are the same)

tmp2 = <cnsq'*<ante'*c1>>

#MODUS TOLLENS

#Version 1 using the modus ponens transformation

t1 = <<ante'*c1>'*cnsq' * c1 * not>

t2 = <<cnsq'*c1>' * ante'*c1 * not>

t3 = <t1+t2>

tmp3 = <ante'*<cnsq'*<c1*t3>>>

tmt1= <t3*tmp3>

#same as above in one step

tmt2 = <<<<<ante'*c1>'*cnsq' * c1 * not> + <<cnsq'*c1>' * ante'*c1 * not>> * tmp2>>>>>*<ante'*<cnsq'*<c1 * <<<<ante'*c1>'*cnsq' * c1 * not> + <<cnsq'*c1>' * ante'*c1 * not>> * tmp2>>>>>>

#Version 2, not using the modus ponens transformation

t5 = <<ante'*c1>' * ante'*c1 * not>

t6 = <ante'*cnsq'*c1*not>

tmt3 = <t5*t6>

#same as above in one step

tmt4 = <<<ante'*c1>' * ante'*c1 * not> * <ante'*cnsq'*c1*not>>

# Perform the Transformations

c2 = <c1*tmp1>

c3 = <c1*tmt1>

c4 = <c1*tmt3>

Output

Note that '* is the same as #, i.e. the operator for circular correlation. Thus, the modus ponens example shows that the best match to a # C2 is b. C2 is equal to C1 convolved with the operator tmp1 (i.e. C2 = C1 ⊗ tmp1). Thus it is the transformed version of the proposition "a implies b" which will perform modus ponens upon correlation with a. The output can be similarly understood for the modus tollens and other examples.

Modus Ponens

> match(mem, a'*c2)

b 0.770

???

Modus Tollens

Transformation 1

> match(mem, not_b'*c3)

not_a 0.109

c1 0.104

Transformation 2

> match(mem, not_b'*c4)

not_a 0.279

not 0.0719

Affirming the Consequent

> match(mem, b'*c2)

a 0.607

???

Denying the Antecedent

Transformation 1

> match(mem, not_a'*c3)

not_b 0.109

ante 0.0758

Transformation 2

> match(mem, not_a'*c4)

not_b 0.279

not_a 0.223

References

Akins, K. (1996). Of sensory systems and the "aboutness" of mental states. Journal of Philosophy: 337-72.

Borsellino, A. and T. Poggio (1973). Convolution and correlation algebras. Kybernetik 13: 113-122.

Churchland, P. M. (1992). A feed-forward network for fast stereo vision with a movable fusion plane. Android Epistemology: Proceedings of the 2nd International Workshop on Human and Machine Cognition, Cambridge, MA, AAAI Press/MIT Press.

Churchland, P. S. and T. Sejnowski (1992). The computational brain. Cambridge, MA, MIT Press.

Eliasmith, C. and P. Thagard (forthcoming). Integrating structure and meaning: A distributed model of analogical mapping. Psychological Review (under review).

Hinton, G. E. (1986). Learning distributed representations of concepts. Eighth Conference of the Cognitive Science Society, Lawrence Erlbaum Associates.

Kosslyn, S. (1994). Image and brain: The resolution of the imagery debate. Cambridge, MA, The MIT Press.

Lazzaro, J. and C. Mead (1989). A silicon model of auditory localization. Neural Computation 1: 47-57.

Plate, T. A. (1994). Distributed representations and nested compositional structure, PhD Thesis. University of Toronto.

Qian, N. and T. J. Sejnowski (1988). Learning to solve random-dot stereograms of dense and transparent surfaces with recurrent backpropagation. Connectionist Models Summer School, San Mateo, Morgan Kaufmann Publishers.

Rock, I. (1977). In defense of unconscious inference. Stability and constancy in visual perception. W. Epstein. New York, NY, Wiley.

Skarda, C. A. and W. J. Freeman (1987). How brains make chaos in order to make sense of the world. Behavioral and Brain Sciences 10: 161-195.

Smolensky, P. (1990). Tensor product variable binding and the representation of symbolic structures in connectionist systems. Artificial Intelligence 46: 159-217.

Smolensky, P. (1995). Computational models of mind. A companion to the philosophy of mind. S. Guttenplan. Cambridge, MA, Blackwell.

Thagard, P. (1996). Mind: Introduction to cognitive science. Cambridge, MA, MIT Press.

Wason, P. and P. Johnson-Laird (1972). Psychology of reasoning: Structure and content. Cambridge, MA, Harvard University Press.